notnullorvoid · 2026-06-16T20:00:40 1781640040

It's much more likely that SpaceX will keep building more ground data centers and use its satellite relays to make global connections faster than ground connections allow.

notnullorvoid · 2026-06-16T17:02:31 1781629351

Quantized Gemma 4 26B is as smart as or smarter than GPT 5 in most of my testing. Granted, GPT 5 is nearly a year old at this point, but I can run Gemma 4 on a ~6-year-old consumer GPU (RTX 3090) and get 140 t/s.

notnullorvoid · 2026-06-13T14:06:01 1781359561

It can be both.

notnullorvoid · 2026-06-13T13:52:02 1781358722

The value should already be dropping with this news. If this is happening now, it seems likely more will come.

notnullorvoid · 2026-06-13T12:26:00 1781353560

It's a stupid strategy that will put the rest of the world ahead of the US on AI. Anthropic's value will suffer for it.

notnullorvoid · 2026-06-12T01:15:02 1781226902

This sounds like a classic case of "you're using it wrong." If they had said it was done in smaller tasks, you would very likely have people here saying that was wrong too.

notnullorvoid · 2026-06-12T01:12:00 1781226720

Maybe a flaw in the labeling, but not the core methodology.

Verbatim code snippets like this imply the model is overfitting to its training data.

notnullorvoid · 2026-06-12T01:04:05 1781226245

While I probably wouldn't classify it as cheating, it is an even bigger signal of concern for model quality.

Cheating by breaking the rules at least implies some learned patterns.

Repeating training data verbatim for narrow cases like this implies that the model is overfitting.

Spartan-S63 · 2026-06-12T18:03:07 1781287387

If we're evaluating a person, rote recall is not necessarily cheating. It's expected, but then you'd expect them to apply that rote-memorized information in a novel way later on and prove they understand how they applied their priors to the new situation.

Models don't actually reason in the same sense, so rote recall from their training data is "cheating" in the sense that the training data cheated, not the model. So many of those benchmarks have made their way into training data that they have become less useful as benchmarks. That, I think, is going to be a long-term difficulty in quantitatively assessing model quality and "intelligence." So it is cheating in terms of what we expect from the models and their training data, but not in a human sense.

notnullorvoid · 2026-06-09T20:02:14 1781035334

If it were truly an arms race to AGI, they would've stopped relying on the data/param scaling law BS ages ago.

notnullorvoid · 2026-06-09T19:49:57 1781034597

The way I see it, the benefit of the benchmark isn't to take Simon's results at face value. It's a template for your own benchmarks that are easy to evaluate visually.