
mkozlows · 2026-05-28T21:51:49 1780005109

I was hoping that the web UI would be better -- I like Anthropic better than OpenAI from a values perspective and want to use their products, but ChatGPT in thinking mode has been just vastly better than claude.ai. So my fingers were crossed that these changes would bring it up to par.

But trying it out... alas, no. Simple factual questions, where ChatGPT would do a quick search and report the facts back to me, get a "Great question! [totally invented bullshit]" from Claude, even with this new model and thinking set to high. I have to explicitly tell it to search before it will look up basic facts, rather than it recognizing that it needs to, like GPT does.

Paracompact · 2026-05-28T23:03:30 1780009410

What are some examples?

mkozlows · 2026-05-25T03:50:49 1779681049

5.2 still had a Codex variant, which this doesn't describe using. It also notably is not using the Codex harness -- it does everything with open source harnesses (which obviously are worse). And while it uses two harnesses with its cheap models, it only uses the worse-performing one of those with GPT 5.2 for cost reasons. (They also don't specify effort/thinking level used for GPT 5.2, but given that it performs worse in their baseline testing than obviously non-SOTA models, I'm guessing it wasn't set to anything high.)

mkozlows · 2026-05-23T05:02:15 1779512535

Yeah, they conflate Microsoft's actions (which are not about cost) with a random quote from the "vice president of applied deep learning at Nvidia," who says that compute costs more than people on his team -- but his team isn't using LLMs for software development, they're literally a deep learning team that is burning compute in deep learning development ways.

If people would do even a little bit of math, they'd see that Microsoft can't possibly be paying more for AI than for developers: They have about 80K employees in product development roles. Senior developers probably cost them $400K all-in. At that rate, that's roughly $32 billion a year in developer costs.

Do they have a $32 billion Claude bill? I suspect they do not.
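
For anyone who wants to check the multiplication, here's a quick back-of-envelope sketch. Both inputs are my rough guesses from above (about 80K product-development employees, about $400K all-in each), not reported figures, and the variable names are just for illustration.

```python
# Back-of-envelope check. Both inputs are rough guesses from the comment,
# not reported figures.
headcount = 80_000        # approx. employees in product development roles
cost_per_dev = 400_000    # assumed all-in annual cost per developer, in USD

total_dev_cost = headcount * cost_per_dev
print(f"${total_dev_cost / 1e9:.0f} billion per year")
```

So an AI bill would have to clear that number before it cost more than the developers.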

mkozlows · 2026-05-11T23:12:28 1778541148

The big thing on their roadmap is rearchitecting for something that can handle the increased load, though. Like, they're clearly paranoid that if they don't move fast, they're going to be just as busted as GitHub.

mkozlows · 2026-05-06T16:26:45 1778084805

The comments aren't an LLM thing, they're a Claude thing. Codex doesn't write those gross hyper-verbose comments.

user34283 · 2026-05-06T23:13:31 1778109211

In my experience Codex barely writes any comments, despite my attempts to encourage it in the AGENTS.md.

mkozlows · 2026-05-04T15:52:08 1777909928

https://www.theverge.com/2022/5/21/23079058/apple-self-servi...

79-pound hyper-elaborate repair kit. Expensive for them to send out, but since only two people will ever want them to, probably amortizes well.

mkozlows · 2026-05-01T16:51:11 1777654271

Everything in this article is purely fake. The numbers don't add up, don't match any reported info, and are just fiction.

mkozlows · 2026-05-01T16:37:47 1777653467

This terrible unsourced article seems to be citing this piece from The Information: https://www.theinformation.com/newsletters/applied-ai/uber-c...

... but the key fact about "$500-$2000" per engineer does not appear there, and seems to be fabricated.

dcre · 2026-05-01T17:58:45 1777658325

Thank you for the link.

mkozlows · 2026-04-12T06:01:13 1775973673

Most Kickstarters have a fake low goal so that they can hit it and "blow past it by 1000%!!!" If a Kickstarter hits its goal, but then still cancels, that typically wasn't their real goal.

mkozlows · 2026-03-30T01:35:24 1774834524

I just run them in separate terminals. The only real gap was that I couldn't tell the robot to open files in nvim when I wanted to look at them, the way it could in other IDEs, so I whipped up a quick skill (https://github.com/mkozlows/nvim-skill) to do that.