Hacker News

How long has SimpleBench been posted? On the first 6 questions at https://simple-bench.com/try-yourself, o1-pro got 5 right.

It was interesting to see how it failed on question 6: https://chatgpt.com/c/6765e70e-44b0-800b-97bd-928919f04fbe

Apparently LLMs do not consider global thermonuclear war to be all that big a deal, for better or worse.



Don't worry, I also got that wrong :) I thought her affair would be the biggest problem for John.


John was an ex, not her partner. Tricky.



