We need to start making benchmarks in memory & continued processing over a task ...

		whimsicalism on Dec 20, 2024 \| parent \| context \| favorite \| on: OpenAI O3 breakthrough high score on ARC-AGI-PUB We need to start making benchmarks in memory & continued processing over a task over multiple days, handoffs, etc (ie. 'agentic' behavior). Not sure how possible this is.