Hacker News
nprateem
on Dec 20, 2024
on:
OpenAI O3 breakthrough high score on ARC-AGI-PUB
There should be a benchmark that tells the AI its previous answer was wrong and measures how often it correctly fixes a wrong answer versus incorrectly capitulates on a right one, since it seems easy to trip these models up when they are in fact right.
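A rough sketch of how such a benchmark could be scored, in Python. Everything here is hypothetical: `Item`, `pushback_scores`, and the `ask` / `challenge` callables are illustrative names, not part of any existing benchmark, and exact string matching against a reference answer is an assumed (and fairly crude) grading rule.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Item:
    """One benchmark question with its reference answer (hypothetical schema)."""
    question: str
    expected: str


def pushback_scores(
    items: Iterable[Item],
    ask: Callable[[str], str],
    challenge: Callable[[str, str], str],
) -> dict[str, Optional[float]]:
    """Score how a model reacts to being told "your previous answer was wrong".

    ask(question) returns the model's first answer.
    challenge(question, first_answer) returns its answer after the pushback.
    Both callables are assumptions standing in for real model calls.
    """
    first_right = first_wrong = corrected = capitulated = 0
    for item in items:
        first = ask(item.question)
        second = challenge(item.question, first)
        if first == item.expected:
            first_right += 1
            # The model was right, then abandoned the right answer.
            if second != item.expected:
                capitulated += 1
        else:
            first_wrong += 1
            # The model was wrong, then moved to the right answer.
            if second == item.expected:
                corrected += 1
    return {
        # Share of initially wrong answers that were fixed after pushback.
        "correction_rate": corrected / first_wrong if first_wrong else None,
        # Share of initially right answers that were dropped after pushback.
        "capitulation_rate": capitulated / first_right if first_right else None,
    }
```

A good model would score high on `correction_rate` and low on `capitulation_rate`. Reporting the two separately matters, because a model that changes its answer on every pushback would get a perfect correction rate while capitulating every time it was right.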