
I think the "semi-private" numbers here already measure that: https://arcprize.org/2024-results

For example, Claude 3.5 gets 14% on the semi-private eval vs 21% on the public eval. I remember reading an explanation of what "semi-private" means earlier, but I cannot find it now.


