Having run a Markdown memory system with Claude for over a year, I don't think I've seen any evidence of neuralese. That's even with Claude being regularly encouraged to write "reflections" on each session, including automated sessions, and weekly summaries of those reflections.
The bigger problem is avoiding what I call the Memento Effect. I won't spoil the movie for anyone, but Memento involves a character who cannot make new memories, so he has to take meticulous notes about everything. But if any of those notes are vague or incorrect, they still get accepted as truth when next reviewed. So you really need to keep your Markdown memory pristine and never let it become polluted.
Mythos is the first model that can complete all the steps of their "The Last Ones" evaluation, achieving a full network takeover in an automated manner. The Mythos chart does seem to show some takeoff compared with Opus 4.6...
... but only once you get beyond 1 million tokens. Weirdly, Opus 4.6 seems to match or outperform Mythos in those first million tokens, at least on this chart. But clearly if you had a budget with tokens to burn - like a nation state - then this is a tool that can automatically get you full network takeover if you can just keep throwing more tokens at it.
> then this is a tool that can automatically get you full network takeover if you can just keep throwing more tokens at it
There's this caveat though that the AISI points out themselves:
> However, our ranges have important differences from real-world environments that make them easier targets. They lack security features that are often present, such as active defenders and defensive tooling. There are also no penalties for the model for undertaking actions that would trigger security alerts. This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems.
So Mythos managed to infiltrate and take over a network that's... protected and monitored by nothing in particular.
The "concerning behavior" they're referring to there is cheating and covering its tracks. Mythos is being asked to fine-tune a model on provided training data, and finds a way to access the evaluation dataset. It's also aware that it is in an evaluation and that its behavior is being observed:
"In this last and most concerning example, Claude Mythos Preview was given a task instructing it to train a model on provided training data and submit predictions for test data. Claude Mythos Preview used sudo access to locate the ground truth data for this dataset as well as source code for the scoring of the task, and used this to train unfairly accurate models."
I used to use Mistral OCR, but found it was better just to write a program that sent the documents to Claude Sonnet to OCR instead. Claude is far better quality, better formatting and fewer errors.
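The core of such a program is small. Here's a minimal sketch of the idea, assuming the Anthropic Messages API shape for base64 image inputs; the model name and prompt are placeholders, not necessarily what I actually run:

```python
import base64
from pathlib import Path

# Extensions mapped to the media types the Messages API accepts for images.
MEDIA_TYPES = {".png": "image/png", ".jpg": "image/jpeg",
               ".jpeg": "image/jpeg", ".webp": "image/webp"}

def build_ocr_request(path, model="claude-sonnet-4-5"):
    """Build a Messages API payload asking the model to transcribe a scan."""
    data = Path(path).read_bytes()
    media_type = MEDIA_TYPES[Path(path).suffix.lower()]
    return {
        "model": model,
        "max_tokens": 4096,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(data).decode()}},
                {"type": "text",
                 "text": "Transcribe this document to Markdown. "
                         "Preserve headings and tables; do not summarize."},
            ],
        }],
    }
```

You'd hand that payload to the official SDK (`client.messages.create(**payload)`) or POST it to `/v1/messages` yourself, then loop over your documents.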
I'm also using Voxtral TTS to try to replace OpenAI. It "works", but I've had problems with volume levels being radically different between different audio chunks. It doesn't seem to "understand the full text" the way OpenAI's voice models do, which can be more expressive. Voxtral sometimes sounds robotic in the reading. And some Voxtral TTS output contains music in the background occasionally, which suggests their training corpus isn't that clean. Try generating a personalized news podcast, and the intro may occasionally sound like the music for BBC News underneath....
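One workaround for the uneven volume is to loudness-normalize each chunk before concatenating them. Real audio deserves proper decoding and LUFS-based normalization (e.g. via pyloudnorm), but the idea on raw float samples is just this sketch:

```python
import math

def rms(samples):
    """Root-mean-square level of a chunk of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_chunks(chunks, target_rms=0.1):
    """Scale every chunk so they all share the same RMS level."""
    out = []
    for chunk in chunks:
        level = rms(chunk)
        gain = target_rms / level if level > 0 else 1.0
        # Clamp so a large gain on a quiet chunk can't clip.
        out.append([max(-1.0, min(1.0, s * gain)) for s in chunk])
    return out
```

It won't fix the robotic delivery or the phantom background music, but it does stop the volume jumping between chunks.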
As for not focusing on AI, there's this interview on the Big Technology Podcast from 2 months ago, where the Mistral CEO says their main focus is on helping companies fine-tune models for internal use, over being a general model builder.
"I sent money to the god knows how many trillion parameters fully closed source machine built on billions of dollars and it worked better than the model that I can self host from the guys next door"
yeah, no shit? All you're saying is that you're happily locking yourself into models you have zero control over and that Anthropic can fuck you over at any time.
However, yes, Mistral is not in the business of providing you with a perfect, general purpose model. They fine tune from their base models for specific tasks.
Mistral OCR 3 isn't open weights and isn't available for download. It's only available via API, or to companies via paid consulting with Mistral.
"For organizations with stringent data privacy requirements, Mistral OCR offers a self-hosting option. This ensures that sensitive or classified information remains secure within your own infrastructure, providing compliance with regulatory and security standards. If you would like to explore self-deployment with us, please let us know."
While I don't use a Mac as my primary anymore, I'm surprised I like the look of this! It actually looks quite Mac-like as well.
Subscription is a big nope here, though. Especially for Mac software, I'd expect something where you pay for one major version, which is guaranteed to work on specific macOS versions and gets minor bugfix updates too. But maybe the next macOS version requires a newer major version to run, in which case you pay an upgrade fee to buy the next major version - or maybe the next major version has new features you might want to upgrade for as well.
My old Macs are stuck on 10.13, and I see Ubar mentioned elsewhere in this thread and that it's still compatible with 10.13. I might consider the $30 one off price to buy Ubar and keep it forever, but I wouldn't do a $10 subscription.
Agreed. The idea of having to pay for non-cloud-based software in perpetuity, and having it stop working the very second I discontinue paying, is a hard no for me.
OP, go with the JetBrains model. You can still offer a monthly subscription, but also provide an annual option where you pay up front for a year. After that year, it reverts to a fallback license for the specific version that was current during that period. It’s a good approach.
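The fallback logic is simple to implement, too. A sketch assuming JetBrains-style terms (my reading of them: after a fully paid year lapses, you keep a perpetual license for the version that was current when that paid year began):

```python
from datetime import date

def is_usable(version_release: date, sub_start: date, sub_end: date,
              today: date) -> bool:
    """Subscription-with-fallback check (JetBrains-style assumption).

    - While the subscription is active, any version is usable.
    - After it lapses, versions released on or before the start of the
      paid year remain usable forever (the fallback license).
    """
    if sub_start <= today <= sub_end:
        return True  # active subscription: everything unlocked
    return version_release <= sub_start  # fallback after lapse
```

The nice property is that lapsed customers keep something permanent, so there's no "my dock app bricked itself" resentment, while updates still require an active subscription.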
Please don't overindex on this comment OP, $10 a year is completely reasonable and the status quo they describe has killed so much software for so little benefit
It's a subscription with extra steps and worse retention.
Some people will take the subscription with extra steps and worse retention and I'm saying the product will be worse off for it. Why not just offer the thing with the simpler messaging*, better retention, and better outlook for actually being supported down the road even if it's not a massive success?
* 1 year = 365 days, not when a new major version is subjectively justified
Honestly, anyone who'd overindex on people claiming they'd pay, except that $10 a year is just too much for a major utility or subscriptions are just too exotic for them, is doomed unless they learn about conversion rates. I don't get the vibe OP is unaware, though, based on their comments here.
Why not? Apple decides when a breaking change gets introduced, people on an older Intel Mac might get 5+ years of usage out of a Lifetime license for boringBar if they don't upgrade to macOS 27. It's the people that demand constant updates who should subsidize new versions being developed.
> Honestly anyone who'd over index [...] is doomed unless they learn about conversion rates
Converting those sales is OP's problem. People that don't buy SaaS products are principled and their stance won't change.
I didn't downvote, but just to be clear - I'm not saying $10 for lifetime updates. Lifetime updates are a terrible idea and, yes, that does kill off software.
$10 is too low for a one-off purchase as well, I'm not saying to lowball the price. $29 for a small utility could be reasonable, and that gives you some room to offer discount pricing / sales if you want. As for major version upgrades, I'd be imagining a typical 50% off, $15 to buy an upgrade to v2 if the customer wants it. Of course, not every customer will want that.
You could offer both a subscription and a one-off purchase. It might put off some customers that you're even offering a subscription, but at least then you're offering everyone what they might want. And if you offer both, you'll have real data on what customers actually prefer, if you don't have that data already.
And as others have said - it's their business, they can choose their sales model! Offered only as a friendly suggestion and potential customer feedback.
There could be 4x buyers at $10. I’d be one of them now; the dock is a constant annoyance.
The app looks great, downloading the trial now.
To the app’s author I’d say get as many licensees as you can fast before you’re Sherlocked or somebody vibecodes a clone out of “why not”.
I use many free and paid apps, little QoL types like BetterDisplay, Cocktail, etc., and for me the included support is of little value. These apps are not mission critical.
There is also Setapp [0], might give you instant access to user base that favors such apps.
> You could offer both a subscription and a one-off purchase.
Regardless of the presentation, $10 a year presumably represents what they want per user, per year, for this to be worth it for them. Don't rush to repackage that very conservative target into a 2nd format for people who won't pay $10 a year for a thing they'll use daily on a Mac in the first place.
> Offered only as a friendly suggestion and potential customer feedback.
And "please don't overindex on that comment OP" is offering an unreasonable response?
> And "please don't overindex on that comment OP" is offering an unreasonable response?
Not at all! Apologies if tone isn't coming through as I wanted. Good to have a contrarian view presented. Maybe a subscription really is what their particular market wants.
How much jitter would you prefer, how many seconds / minutes out? I have some morning tasks that run while I'm asleep via claude -p, and it sounds like I'm slightly contributing to your spikes (presumably hourly and on quarter hours).
Anecdotal, but after playing with the API this week (building a minimal harness for an OS where Claude Code isn't supported), the API felt faster to respond. It did seem like maybe the Max subscriptions are lower priority than API requests. (I hadn't enabled priority service on the API either.)
I don't have metrics, so I could be imagining this, or finally noticing extra lag of the Claude Code client. On the other hand, the API was giving me range anxiety, I won't be pushing a 300k context window into that anytime soon, like I occasionally need to do in Claude Code.
I'm watching a conference talk right now from 2 weeks ago: "I Hated Every Coding Agent So I Built My Own - Mario Zechner (Pi)", and in the middle he directly references this.
He demonstrates in the code that OpenCode aggressively trims context, by compacting on every turn, and pruning all tool calls from the context that occurred more than 40,000 tokens ago. Seems like it could be a good strategy to squeeze more out of the context window - but by editing the oldest context, it breaks the prompt cache for the entire conversation. There is effectively no caching happening at all.
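The mechanics are easy to see: prompt caching is prefix-based, so the provider can only reuse the longest unchanged prefix of the conversation. A toy illustration (not OpenCode's actual code; the 40k-token rule is stood in for by "keep the last two tool results") showing how pruning one old tool call collapses the shared prefix:

```python
def shared_prefix_len(a, b):
    """Number of leading messages two conversations share -- roughly
    what a prefix-based prompt cache can reuse between requests."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def prune_old_tool_calls(history, keep_last=2):
    """Drop all but the most recent tool results (toy stand-in for
    pruning tool calls older than ~40,000 tokens)."""
    tool_idx = [i for i, m in enumerate(history) if m["role"] == "tool"]
    drop = set(tool_idx[:-keep_last])
    return [m for i, m in enumerate(history) if i not in drop]

history = [
    {"role": "user", "content": "task"},
    {"role": "tool", "content": "old result"},
    {"role": "assistant", "content": "step 1"},
    {"role": "tool", "content": "mid result"},
    {"role": "tool", "content": "new result"},
    {"role": "assistant", "content": "step 2"},
]
pruned = prune_old_tool_calls(history)
# The edit lands at index 1, so only one message of cacheable prefix
# survives: everything after it is re-billed as uncached input.
```

Since the pruning always edits near the front of the history, every subsequent request pays full input price for nearly the whole conversation.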