
None whatsoever.

It's a "let's find a task humans are decent at, but modern AIs are still very bad at" kind of adversarial benchmark.

The exact coverage of this one is: spatial reasoning across multiple turns, agentic explore/exploit with rule inference and preplanning. It is targeted directly at the current generation of LLMs.
