Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There should be a benchmark that tells the AI it's previous answer was wrong and test the number of times it either corrects itself or incorrectly capitulates, since it seems easy to trip them up when they are in fact right.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: