I have the opposite experience: random HN/Reddit comments saying “this sucks” or “whoa this is a huge improvement” are the only benchmark that means anything. Standard benchmarks are all gamed and don’t capture the complexity of the real world.
Total gimmick. I guess we're "making progress", but this will never lead to any useful application other than "Yes, you're absolutely right" bots. What's needed for real applications is 10,000× the input token context and 10× the output token speed, so we're off by a factor of ... 100,000×?
Correct, and as the context grows, the conversation can't continue at its initial speed either. Gimmick or not, this is very sci-fi compared to 10-20 years ago.
It's fast. It's cheap compared to employees. It's really the latter that people are upset about.
As for good. Well, how much software is really good? A lot of it is sewn together APIs and electron-like runtimes and 5,000 dependencies someone else wrote. Not exactly hand-crafted and artisanal.
I'm sure everyone's projects here are the exception, but engineering is always about meeting the design requirements. Either it does or it doesn't.
Have you ever programmed with AI? It needs a lot of hand holding for even simple things sometimes. Forgets basic input, does all kinds of brain dead stuff it should know not to do.
Both the curl and SQLite projects have been overburdened by AI bug reports.
Unless the Google engineers take great care to review each potential bug for validity, the same fate might befall this project. There has been a lot of news about open source projects being stuffed to the brim with low-effort, high-cost merge requests and issues.
You just don't see all the work that is caused unless you have to deal with the fallout...
This project has nothing to do with bug reports... it's an opt-in tool for reviewing proposed changes that kernel developers can decide to use (if they find it useful).