
We need to start building benchmarks for memory and for sustained work on a single task across multiple days, handoffs, etc. (i.e., 'agentic' behavior). Not sure how feasible this is.

