Hacker News | sillyfluke's comments

3 in 100 is a lot, yet I have never met anyone in person who disclosed that they are face-blind. Your flock needs to break the stigma and mention it more if possible I guess.

As every educated American knows, nothing good ever comes from asking, "What about the twinkie?"

https://www.youtube.com/watch?v=ZuwC3ESdwZA


>I'd still place a bet that the SOTA models make _far_ less mistakes than humans.

Genuine question: your top coder seems to be producing the most error-free code from your perspective, has the deepest knowledge of the architecture and codebase, and is faster on the trigger than the others.

But your top coder has proven and verifiable dementia, where they will confidently assume the existence of APIs and code that do not exist, mix up the purpose of others and forget other things, and you can't predict when and how they will introduce errors into the system or the severity of such errors.

Are you really comfortable letting this person with dementia generate most of your codebase in the airline and health industry?

I also hope you have an iron-clad agreement that prevents the model provider from doing silent updates, because all the evidence of correctness you have collected thus far goes out the window in that case.

Another genuine question:

You have witnessed a human coder and the AI you're using make the same important mistake. Assuming you do not have the time and resources to retrain, fine-tune, and test your frontier model:

Who would you trust not to make the same mistake multiple times in the future after you have warned them that their job depends on it, the AI or the human?


Your top coder has guard rails in place to prevent him autonomously going free - right? This is how you should approach agentic development with LLMs. Like it or not, we are the final bastion, the gatekeepers. The hallucination thing I think is mostly overblown and from speaking to colleagues it seems to vary wildly depending on which model and harness you are using - always go for SOTA. In the last 3 months I can count on one hand the times it's done something wrong, and that's primarily because I'm operating it with guard rails and giving it context.

>Your top coder has guard rails in place to prevent him autonomously going free - right?

The parent is implying they would prefer an AI when working in the airline and health industry because it makes fewer errors. Read the comment again.

They have not said, "Hey, I work in the airline and health industry and I'd love to use AI for a couple of the bullshit IT UIs we have as long as we can put guardrails on the AI to stay in its lane."

I asked a yes or no question. The guardrails you can put in place to mitigate errors are the same guardrails used pre-AI for humans (tests, regressions, reviews). If you were wary of employing a top lead engineer with verifiable dementia prior to AI for a mission-critical system, logic implies you should think twice about giving that much responsibility to an AI as well.

> The hallucination thing I think is mostly overblown

Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.

>from speaking to colleagues it seems to vary wildly depending on which model and harness you are using

You have partially answered my question it would seem.


> Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.

No, but the same can be said for your colleagues. You might call what the LLM does hallucinations, I'd call them mistakes. I think we have totally forgotten that humans make them all the time and are confidently wrong too.

Your original question doesn't really get to the bottom of the point I'm trying to make, and I don't really feel it fairly represents the issue we are talking about here. They are not the same things.


This is such a tired, meaningless argument. I've never seen a human in 10 years of professional software engineering at a large company ever so confidently, consistently create and send out seemingly well-reasoned code that's as wrong as what SOTA models using CC or Codex do. If a human did this, they would be fired or perpetually remain a junior who no one wants to work with.

Also, if a human does this, you can replace them and get a human who will not do it. The default for an LLM is to generate plausible-looking text that may or may not be completely incoherent. That is not the default for a human. Again, if you find that your colleague consistently fabricates APIs, you can hire someone who isn't crazy instead, but you cannot do the same with LLMs.


If a human was hallucinating and polluting a codebase with errors, they would be fired and possibly treated for dementia. Even worse, an LLM is trained to produce plausible-looking results, so it's harder to detect the mistakes.

>No, but the same can be said for your colleagues.

That's absolutely false. My colleagues don't routinely and confidently invent APIs that are not there, or spectacularly and repeatedly misunderstand the purpose of certain functions or exhibit extreme forgetfulness. Especially when I've warned them. Hallucinations and confabulations in otherwise healthy individuals are mental disorders. When I ask them why they made a certain kind of error, I can expect to get a reasonable answer. No one has uttered the phrase "Bob hallucinated again while writing those tests" when the Bob in question is a human.


Well, your experience doesn't align with mine. I have been using, and am part of an organisation that is extensively using, Claude with Opus for everything for about 3 months now and I am not experiencing the problems you describe. We'll have to agree to disagree here.

That is fine. "Your experience may vary" is the crux of my argument, amusingly. You can't have just realized that people are having different experiences using AI, or even that the same person has different experiences when they change domains or technical contexts. There have been lots of comments scattered across this forum to that effect.

Calling hallucinations simply mistakes does not seem to me to be a healthy way to reason about LLMs. I can ask a colleague how well they can program in Ada and adjust my expectations on productivity and bug rates. I can't ask an LLM how well it can code in Ada (just a throwaway example), or even how much of Ada was in its training data. I have to actually spend money and spend time code reviewing before I can even formulate any expectations at all.


Not only have I never run across a hallucination in the past ~6 months or so; the latest Opus models have gotten to the point where they can emit inline assembly that is _superior_ to what gcc or clang can generate from optimized C++. Had it rewrite a hot SIMD loop that took it from ~10 flops/cyc to ~14 by shaving off broadcasts. I _could not_ get any compiler to do this, no matter which flags I tried to use. So I literally have no idea what these people are talking about when they claim that SOTA models hallucinate constantly.

Last week, Opus gave me a decrement instead of an increment, on one particular line. Where I already had the decrement, but it was changing the width of the datatype everywhere.

And it took "convincing" that it had made a mistake.


>But your top coder has proven and verifiable dementia

Dementia gets worse. AI gets better. Nothing matters except d/dt.


>a monoculture facilitates easier economic growth

The why of this is not explained by you nor the original comment referring to the lack of multiculturalism, it is simply asserted. That's why the original comment came off as nonsensical. More than a quarter of Singapore's population are foreign workers, and they make up at least 40% of Singapore's entire workforce. Seeing as the workforce that is driving Singapore's economic growth is not a monoculture, your claim needs a little tweaking I would think.

I'm not claiming this means Singapore is embracing multiculturalism, in the same way I don't claim the UAE embraces multiculturalism due to similar foreign-worker dynamics, but not putting a disclaimer involving these stats while talking about the benefits of monoculture and lauding a country for its realism is a ridiculous sidestepping of reality. Both Singapore and the UAE are extremely cosmopolitan.

>Multiculturalism is essentially pacification because obviously this is unpopular

Why would it be unpopular to the dominant monoculture to maintain the monoculture? Who is being pacified when a country embraces multiculturalism? Please explain.


Having a diverse population != multiculturalism in the way that we were discussing. Singapore promotes "racial harmony" where every "race" is given equal public holidays, equity, and inclusion in government etc. This is not wholly true, as the majority ethnic group is overrepresented in positions of power, and foreign workers making up (the bottom) 40% of the workforce does not change that. This extends beyond ethnic lines - the monoculture is not only Chinese over the rest, but also the dampening of any freedom in expression, art, etc. - this is easily verifiable by 1. speaking to any Singaporean 2. looking at the education system and 3. looking at how much cultural impact Singapore has had compared to other "global" cities. You are also clearly misrepresenting my comment about pacification, and I do not appreciate it. I am clearly not referring to some conspiracy where people are disappeared. I am talking about a decades-long government campaign to suppress alternative voices and _pacify_ (in the correct sense of the word) dissent.

>asserting that AI will botch software might hold more weight with people who have already forgotten how dogshit software was pre-AI.

You're responding to an assertion with an assertion. It has been empirically proven that SOTA models can create more dogshit software than pre-AI software. It is also trivially known that the user is unable to predict when and how the AI will introduce dogshit into the software. We literally had a study posted on this forum claiming models give more accurate answers if you're mean to them. This is the shit we're dealing with. Stuff you couldn't make up in a dystopian Douglas Adams novel.

>you can encode these things into prompts

Is this satire? SOTA models randomly disobey rules in prompts all the time.

When a dev drops a production db I can warn them. If they do it multiple times during their employment I can change their roles or fire them.

I can count the number of companies providing SOTA models with the fingers on my hands. Imagine having an employee pool of only 5 savant coders with dementia to choose from to hire to your company. That's it. That's the entire applicant pool. You can only fire one of them by hiring one of the other four to replace them with. And you can't really fire them for dropping production dbs if you can't prevent the other ones from making the same mistake. This is the current AI-first hellscape as it stands.


The AI cartel's hope is that the market will stay irrational longer than the naysayers can stay solvent both financially and intellectually.

Putting it a different way, it won't matter if the firms who went too deep at the very beginning are fucked if the rational are forced to succumb to the market pressures created by the irrational and thus are reluctantly pushed to adopt AI-first workflows for appearance's sake in order to survive anyway. Because then everybody will be likewise fucked and completely dependent on AI, despite it being a subpar development paradigm with respect to robustness of the systems under development. History has taught us that it is adoption dynamics, not capability, that determines the winning paradigm or technology (Betamax vs VHS is one historical example. JavaScript vs everything else is another one).

(We know it's a subpar development paradigm with respect to robustness because the entire coding agent paradigm turns the most knowledgeable programmer into a person "who doesn't know what they don't know": development speed far exceeds their ability to reason about the codebase, and the underlying SOTA models that they depend on to fix the bugs that the model itself has introduced are at best unreliable narrators with no objective evidence of correctness or deterministic behavior.)


In what context did you witness all these cases?

It's one thing to know these cases exist because they have all been reported in the news or because you made a note of them through separate and unrelated word-of-mouth interactions, but one person having direct experience of all these cases is unusual for a civilian (i.e. non-medical or healthcare professional).


3 would be either direct or friends/relatives with experience and I got involved to help, other 3 would be through news and incidentally knowing some people.

> but one person having direct experience of all these cases is unusual for a civilian

Sure, still, indirect stories I have a lot more, just stopped at those 6


It's a bit weird when the article in question is predominantly about software development in a professional setting and the top comment is about how some people in thread are disregarding this context and opining unrelatedly about their unique solo development or personal project development experiences, to then respond to said comment by insistently going on about how AI is great for your personal projects, when people are unable to assess the value of your AI-assisted personal projects and whether they would concur with the high opinion you have of them. A turd with a CI pipeline is still a turd, I think we can all agree on that. If someone said AI is great because they can now expand test coverage and build a CI pipeline for their todo app in Rust, it wouldn't exactly be the proof you're looking for I don't think.

But I agree fully with your last paragraph, and said something similar in a comment elsewhere where I stated my tangible bar as being a Ladybird-like browser built from scratch achieving Chrome parity in six months while doing continuous stable releases with coding agents in tow. Otherwise, as you said, the jury is still out.


I support this take especially since you added the "I don't care if nobody else uses what I make", but you should at least acknowledge what you're talking about is pretty unrelated to the article, as the author's entire context seems to be making something for other people to use and building it together with other people.

Since you said you want to make those things that you list, I assume none of these things have been built yet. If so, I would encourage you to consider how excited you will be to constantly maintain those things you build. But even if the maintenance cycle won't be as exciting, since you are the sole user you have the advantage of being able to proceed at a leisurely pace even while doing maintenance work.

In a professional setting, the dopamine hit of being able to build something quickly that works in an area that you have little to no knowledge in makes you more dependent on the AI in the maintenance cycle as you want to chase that dopamine high by maintaining the same development speed. This in turn leads to a bigger burnout crash after that peak dopamine hit. Maintenance is a phase of diminishing returns even without AI, but when your coding agents are introducing new bugs at record pace with their bugfixes, with no new features to write home about, you are in a special place in Hell.

I'm all for using AI to build ambitious projects. I have yet to see a person/company/organization continuously release huge software endeavours in a stable professional manner day in and day out with a coding agent harem in tow.

If something like the Ladybird browser, or any browser that is "built from scratch", achieved Chrome parity in six months and consistently maintained the same level of stability with continuous releases then I would see that as proof that this approach has become professionally sustainable.

The reason people are getting away with so much using AI is the open secret in most enterprise engineering practices: the customer cares more about the response time for fixes they report than they do about overall or long-term product quality.


> I would encourage you to consider how excited you will be to constantly maintain those things you build.

Why should I consider that?

It's funny how the default with programming is that the piece of software exists forever. I've been learning to play the piano lately. The default with piano is that every piece is ephemeral. If I don't go out of my way to record something I play, after the notes have run out, the piece is gone forever. The same is true of cooking - except you can't record a meal at all. Once you eat it, it's gone. Lots of art forms are like this - theatre. Dance. The circus. They're no less beautiful for being ephemeral.

Why do we assume software has to be maintained indefinitely? Why even think about that right now? Maybe I'll work on these projects for a while, maintain them as long as I want, and then in a few years someone will make something way better and I'll use that instead? Would that make the effort I put in pointless? I don't think so. I think it would make programming more like playing the piano. How lovely.

> I'm all for using AI to build ambitious projects. I have yet to see a person/company/organization continuously release huge software endeavours in a stable professional manner day in and day out with a coding agent harem in tow.

Yes, I've burned through enough Claude tokens now that I find myself agreeing with you. I wouldn't use an LLM to make and maintain Google Chrome. But I wasn't planning on doing that anyway. There are also a lot more options than (1) write everything yourself and (2) vibe code the whole thing.

LLMs are good at small-to-medium scope tasks right now. Fine. I'll use them - or not use them - with their limitations in mind.


>Why would I consider [maintenance]...It's funny how the default with programming is that the piece of software exists forever.

By all means don't consider it if you don't plan on using them for a considerable amount of time, but there's a lot of distance between a decent amount of time and "forever". You listed a mini OS and a UI toolkit among your projects, I hope you can forgive me for assuming you were planning to use those things to build more things, which would in turn often entail improving and maintaining these building blocks while they are actively used.


Ah fair. When you say “maintenance”, my mind goes to handling pull requests and keeping on top of filed issues on GitHub. If the software has an audience of 1, a lot of that work goes away.

Adding the features I have a need for over time is the fun part as far as I’m concerned.


>I guess I thought this should be obvious

People in this thread are talking past and misunderstanding each other and making unrelated points.

The point of the response to the top level comment was questioning the conflict of interest in model providers creating separate revenue streams for themselves by selling a product that fixes problems their other product created, akin to OS providers selling anti-virus software back in the day.

Similarly, it should be obvious to you that a software engineer can trivially get into the mindset of writing more exploitable code by pretending the production code they're tasked with writing is hobby code or prototype code.

If profitable revenue streams with adversarial products are in place, no one should be surprised when model providers are disincentivised to improve the "garbage code quality, but hey it works!" nature of their most used code generators.

>And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.

...it should also be obvious people in this forum have wildly different experiences with respect to the code quality the LLMs they use generate. I personally find it difficult to find anyone that argues that the LLMs they are using are consistently generating high-quality code across a vast codebase.

