Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
whimsicalism
on Dec 20, 2024
|
parent
|
context
|
favorite
| on:
OpenAI O3 breakthrough high score on ARC-AGI-PUB
We need to start making benchmarks in memory & continued processing over a task over multiple days, handoffs, etc (ie. 'agentic' behavior). Not sure how possible this is.
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: