Hacker News

Fun! The benchmarks are so interesting because real-world use is so variable. Sometimes 4o will nail a pretty difficult problem; other times o1 pro mode will fail 10 times on what I would think is a pretty easy programming problem, and I end up wasting more time trying to do it with AI.

