Hacker News | notnullorvoid's comments

It's much more likely SpaceX will continue building more ground data centers and using their sat relays to make global connection faster than ground connections can allow.

Quantized Gemma 4 26B is as smart as or better than GPT 5 in most of my testing. Granted, GPT 5 is nearly a year old at this point, but I can run Gemma 4 on a ~6-year-old consumer GPU (RTX 3090) and get 140 t/s.

It can be both.

The value should already be dropping with this news. If this is happening now, it seems likely more will come.

It's a stupid strategy that will put the rest of the world ahead of the US on AI. Anthropic's value will suffer for it.

This sounds like the classic "you're using it wrong". If they had said it was done in smaller tasks, you would very likely have people here saying that was wrong too.

Maybe a flaw in the labeling, but not the core methodology.

Verbatim code snippets like this imply the model is overfitting to its training data.


While I probably wouldn't classify it as cheating, it is an even bigger signal of concern for model quality.

Cheating by breaking the rules at least implies some learned patterns.

Repeating training data verbatim for narrow cases like this implies that the model is overfitting.


If we're evaluating a person, rote recall is not necessarily cheating. It's expected, but then you'd expect them to apply that rote-memorized information in a novel way later on and prove they understand how they applied their priors to the new situation.

Models don't actually reason in the same sense, so rote recall from their training data is "cheating" in the sense that the training data cheated, not the model. So many of those benchmarks have snaked their way into training data that they have become less useful as benchmarks. That, I think, is going to be a long-term difficulty in quantitatively assessing model quality and "intelligence." So it is cheating in the sense of what we expect from the models and training data, but not in a human sense.


If it was truly an arms race to AGI they would've stopped relying on the data/param scaling law BS ages ago.

The way I see it, the benefit of the benchmark isn't in taking Simon's results at face value. It's a template for your own benchmarks that are easy to visually evaluate.
