
>Achieving human-level performance on the ARC benchmark, as well as top human performance on GPQA, Codeforces, AIME, and FrontierMath, strongly suggests the model can potentially solve any problem at the human level if it possesses essential knowledge about it.

The article notes, "o3 still fails on some very easy tasks". What explains these failures if o3 can solve "any problem" at the human level? Do these failed cases require some essential knowledge that has eluded the massive OpenAI training set?



Great point. I'd love to see what these easy tasks are and would be happy to revise my hypothesis accordingly. o3's intelligence is unlikely to be a strict superset of human intelligence: it is certainly superior to humans in some respects and probably inferior in others. Whether it is sufficiently generally intelligent is both a matter of definition and an empirical question.


Chollet has a few examples here:

https://x.com/fchollet/status/1870172872641261979

https://x.com/fchollet/status/1870173137234727219

I would definitely consider them legitimately easy for humans.


Thanks! I added some comments on this at the bottom of the post above.



