Hacker News
PunchTornado
10 months ago
on: Grok 4 Launch [video]
What? Nobody looks at those benchmarks. You use whatever works for your task, in most cases either Gemini or Claude. Those benchmarks don't mean anything, as models overfit on them.
esafak
10 months ago
Come on, the benchmarks do mean something, even if companies overfit them. Models are indisputably improving together with their benchmark scores.