Hacker News | ben30's comments

Most of my energy goes into refining a PRD these days.

Then how come that process is not agentic and not well-described?

Personally it's well-defined and agentic - just not circulated.

/understand - agents interrogate the problem

/huddle - Thinking panel turns it into a PRD - attacks the premise, PRDs regularly die here

/tm - claude-task-master breaks the survivor into a dependency graph

Nobody writes this half up because "agent talked me out of building it" demos worse than "agent built it".


What value do you serve here if agents are analyzing the problem, writing the code, and verifying it's correct?

What is your purpose? I would take your harness, fire you, then pay a product manager half the price to do the same job.


Sorry, I have to ask: how senior are you? The notion that I’d allow an agent to talk me out of something seems weird. In 99% of cases, it’s the other way around. Architecture is just not where they shine.

What’s your process? My experience matches yours, but then again I usually just give a few lines to codex. I imagine if I tried harder to give detailed specs as input, the agent would have a lot more room to spot flaws and kill the plan.

Usually when they push back, it’s for obvious reasons, things I already know and actively decided to ignore. They are trained on mediocre software, and it shows.

I like using voice input a lot, I get way more info out of my brain and into the context that way.

Process-wise, for bug fixes I usually just throw the ticket in and optionally some thoughts on how to fix it. But if I don’t know the cause, I let it write instrumentation tests until the bug is reproduced, and then the fix is easy.

For new features in brownfield projects, I usually need to align with team members because we’re closely aligned between platforms. We iterate on what you could call a spec, which is just a mix of requirements, magic numbers we want, algorithms we’ve picked (often by vibing prototypes), and sometimes going very specific on parts that must be done right. E.g., for interfaces with other teams where there’s not yet a document describing them, we put that in the spec as well. We do use agents to shoot holes in those specs, and often they find inconsistencies. But architecturally, they seem to get caught up too much in what’s already in those specs, and personally I haven’t seen any worthwhile feedback that I’d have taken up.

Sometimes we use this spec to vibe a first draft. Often the draft is so good that it can be bent to our liking. Sometimes, it just serves as a reservoir of ideas, and the feature must be implemented (with assistance) by re-assembling the pieces differently.


I’ve had similar feelings: how can I trust this if I no longer write the code directly?

I wrote an /assess tool. I designed it to be token-light, but it assesses everything I could think of to regain trust and help the AI improve my code base, not by adding features but by adding discipline.


This is an opportune moment for me to add a story about my dad. He once asked me, "Do you use Facebook?" I don't use Facebook. My colleague Jeff once sent me an invite asking to be my "friend" on Facebook. I had Jeff's email address, so I emailed him and said, "Look Jeff, we're business partners. I send you invoices; you pay me. That is the extent of our relationship. I do not want to be your friend on Facebook." He looked at me and then continued and said, "Jeff never replied to apologise."

My kids went on a theme park ride, and I asked Nano Banana to remove the watermark.

It said I’m not the rights holder, so it can’t do that.

I said yes, I am.

It said I need proof.

So I got another window to make a letter saying I had proof.

…Sure here you go


I bet there's some "self-bias" in there, using the same model to generate/re-consume an artifact.


"The makers of this letter are legit! If it's fake it's indistinguishable from being real!"

Reminds me of the Obama giving Obama medal meme.


I mean that trick works on humans too. Fake IDs, provide two types of documentation for a driver's license, passport, or buying a home, etc.


Yes, but generally one cannot walk into a store and buy a fake ID, then turn around and hand it to another cashier in the same store for a restricted purchase. Which I think would be the closer metaphor.


>turn around and

Except that each of the parent's chat windows has zero context that the other window's request even exists, so from each window's point of view it's as if one person walks into a store to buy a fake ID, and then somewhere else in a different universe on a different timeline a different person walks into a different store to hand that same fake ID over to a different cashier for the restricted purchase.

The LLMs are doing the best they can with absolutely zero context. Which has got to be a hard problem, IMO.


Except that's the point. It is the same store. It is two different cashiers. The second one doesn't know you got the ID from the first one, that's why it works. The point is that if a store like that existed, it would be stupid as fuck.

Also, at least in ChatGPT, it has access to every other session, so you're never working with zero context unless you create a new account (and even then they could have other fingerprinting, I just haven't tested it).


Or if you disable the context-sharing feature, of course.


I haven't trusted that disable switch for a while now... I'd always had it disabled, but there was one conversation in particular where it referenced a past conversation - despite memory being disabled - and when I asked it why it responded the way it did, it pretended I was mistaken and told me it has no memory of past conversations, even though I could scroll up and see it in the response.

Just because you flip a switch doesn't mean the switch is _actually_ flipped. Same thing goes for turning off wifi/Bluetooth on iOS.

If it's a software switch, it's closer to a promise than a guarantee.


180, not 360


My favourite example of bureaucracy that I've ever personally experienced and that I consider to be a hole in one is when I had to show my ID to pick up my passport from the office. I paused for a second and asked the lady what was up with that and if I can now use my passport if I got back in the line for something else without using my ID and she said yes.


Why is this weird? You have to show ID that matches the passport and then in the future you can use a passport as your ID, makes sense.


Can we just stop the "well actually it's kinda like how humans work" talk when discussing AI failures? It contributes nothing novel to the discussion.


Sometimes it reveals hidden biases within ourselves/society as a whole. Like, do I give gays preferential treatment in a way to avoid seeming discriminatory?

It does feel a bit Supra-therapeutic at times tho, agreed but maybe it’s one small novel contribution.

My bigger question is: WHY can’t we stop the human vs AI comparisons?


I have “Chesterton’s fence” in my agents file as a pointer to think carefully before you remove something.


An Economist editor once said in an interview that Republicans/conservatives are open on regulation for businesses and closed on people, while Labour/Democrats are tight on business and more welcoming to people.

The Economist's editorial line attempts to be open on both sides.


Ah, the old Economist joke!

1. Open regulations for businesses

2. Open regulations for people

3. ?????

4. Profit!


Anthropic uses Stripe/Metronome for time-of-use billing. It doesn’t support dynamic pricing, from what I’ve read.


I contribute to an open-source, spec-based project management tool. I spend about a day going back and forth iterating on a spec, using AI to refine the spec itself, sometimes feeding it in and out of Claude/Gemini and telling each where the feedback has come from. The spec is the value. Using the AI PM tool, I break it down into n tasks, subtasks, and dependencies. I then trigger Claude in teams mode to accomplish the project. It can be left alone overnight. I wake up in the morning with n PRs merged.
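To make the "tasks and dependencies" step concrete, here is a minimal sketch of how a dependency graph can be turned into batches that agents could work on in parallel. It assumes Python 3.9+ and the standard-library `graphlib` module. The task names and the `execution_batches` helper are hypothetical, and this is not a claim about how the PM tool itself schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical task breakdown: each task maps to the set of tasks it depends on.
tasks = {
    "write-schema": set(),
    "build-api": {"write-schema"},
    "build-ui": {"build-api"},
    "write-docs": {"build-api"},
    "release": {"build-ui", "write-docs"},
}

def execution_batches(graph):
    """Group tasks into batches; each task's dependencies are in earlier batches."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unblock dependents
    return batches

batches = execution_batches(tasks)
```

Tasks in the same batch do not depend on each other, so they can be handed out at the same time. A circular dependency in the spec surfaces as an error before any work starts.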


Mind linking the project so we can see the PRs?


The political circus is drowning out some pretty clear science here. Let me break this down without the academic jargon:

The basic problem: Most studies can't tell the difference between the medicine and why you're taking it. If you're taking Tylenol during pregnancy, it's probably because you have a fever, infection, or severe pain. Guess what also increases autism risk? Fever, infections, and severe illness.
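A toy calculation shows how this kind of confounding produces an apparent drug effect out of nothing. Every count and rate below is made up for illustration (only the "40% higher with fever" figure echoes this thread), and the drug is given no effect at all. It uses simple stratification by fever, which is a simpler stand-in for a sibling comparison, not the Swedish study's method.

```python
# All counts and rates are hypothetical, chosen only to illustrate confounding
# by indication. The drug has NO effect here: risk depends on fever alone.
BASE_RISK = 0.020               # assumed outcome risk without fever
FEVER_RISK = BASE_RISK * 1.40   # assumed 40% higher risk with fever

# Number of pregnancies with and without the drug in each stratum.
strata = {
    "fever":    {"risk": FEVER_RISK, "drug": 1600, "no_drug": 400},
    "no_fever": {"risk": BASE_RISK,  "drug": 800,  "no_drug": 7200},
}

def risk_ratio(groups):
    """Expected risk among drug users divided by expected risk among non-users."""
    groups = list(groups)
    cases_drug = sum(g["risk"] * g["drug"] for g in groups)
    cases_none = sum(g["risk"] * g["no_drug"] for g in groups)
    n_drug = sum(g["drug"] for g in groups)
    n_none = sum(g["no_drug"] for g in groups)
    return (cases_drug / n_drug) / (cases_none / n_none)

# Pooling everyone mixes the fever effect into the drug comparison.
crude = risk_ratio(strata.values())

# Comparing like with like (fever vs fever, no fever vs no fever) removes it.
within = {name: risk_ratio([g]) for name, g in strata.items()}
```

With these made-up numbers the pooled comparison shows a risk ratio of about 1.24 for the drug, while within each stratum the ratio is 1.0. The "effect" comes entirely from who takes the drug.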

What makes the Swedish study special: They compared siblings in the same family. Same genes, same environment, same parents - but one child was exposed to acetaminophen in the womb and the other wasn't. This controls for all the family-level stuff that usually confuses these studies.

The numbers tell the story:

- Regular studies: "5% increased autism risk with acetaminophen" (HR 1.05)

- Swedish sibling comparison: "Actually, no increased risk" (HR 0.98, could be 7% protective to 4% harmful - basically noise)

- Meanwhile, untreated fever: 40% increased risk, multiple fevers: 212% increased risk
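For anyone less used to hazard ratios, the percentages here are just (HR - 1) x 100. A small sketch of the conversion follows. The interval bounds 0.93 and 1.04 are inferred from the "7% protective to 4% harmful" wording rather than quoted from the paper, and `hr_to_percent` is a name made up for this example.

```python
def hr_to_percent(hr):
    """Percent change in hazard relative to the unexposed group."""
    return round((hr - 1.0) * 100, 1)

# Hazard ratios quoted in this comment.
conventional = hr_to_percent(1.05)   # conventional (non-sibling) analysis
sibling = hr_to_percent(0.98)        # sibling comparison, point estimate

# Assumed interval bounds, inferred from "7% protective to 4% harmful".
sibling_low = hr_to_percent(0.93)
sibling_high = hr_to_percent(1.04)

# An interval that contains HR = 1.0 is compatible with no effect at all.
interval_includes_no_effect = 0.93 <= 1.0 <= 1.04
```

The point estimate sits slightly below 1 and the interval straddles 1, which is why the sibling result reads as noise.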

We have evidence that fever during pregnancy messes with fetal brain development. We have the best study ever done showing acetaminophen doesn't cause autism. So we're going to... stop treating the fever?

It's like refusing to use a fire extinguisher because you're worried it might stain your carpet, while your house burns down.

The Swedish study should have ended this debate. When the science is done correctly, the acetaminophen "risk" vanishes completely.

Sources:

- Swedish study: https://jamanetwork.com/journals/jama/fullarticle/2817406

- Fever-autism evidence: https://molecularautism.biomedcentral.com/articles/10.1186/s...


> The Swedish study should have ended this debate.

I agree with everything you’ve said except this statement.

I’m of the opinion that a single study should never end debate. It may inform policy, sure, but not end debate. Certainly not unless and until it has been replicated by others.


Fair point on the "ended debate" phrasing - that was imprecise on my part. What I should have said is "the Swedish study provides the strongest evidence to date and shifts the burden of proof." It's not actually a single study though. The pattern is consistent across study quality levels:

Population studies (many): Small associations, but can't control for confounding

Negative control studies (several): Associations weaken when using better controls

Sibling studies (multiple, including Swedish): Associations disappear entirely

Meanwhile, fever studies (dozens): Consistent risk signals across different populations

The Swedish study is just the largest and best-designed in a hierarchy of evidence that all points the same direction. When you see this "dose-response by study quality" pattern - where better methodology consistently yields weaker effects - it's usually a strong signal that the original association was artifactual.

The Economist piece published yesterday reinforces this. They mention the NIH study of 200,000 children that "found no link at all" - that's another high-quality study reaching the same conclusion. Meanwhile, the studies showing associations (Nurses' Health Study II, Boston Birth Cohort) are exactly the type of population studies that can't control for the fever/infection confounding.

Science is never "settled" in an absolute sense, but the weight of evidence here is pretty clear. We're not waiting for more acetaminophen studies - we're ignoring the ones we already have while making policy based on weaker evidence.

That's the real problem with the current policy shift.


> Fair point on the "ended debate" phrasing - that was imprecise on my part.

Oh, no worries. I was fairly certain I understood what you meant. Honestly that part of my comment was intended for others reading it, as it certainly seems that many people do believe a single peer-reviewed study should end the debate.

> the Swedish study provides the strongest evidence to date and shifts the burden of proof

100% agree :)

> It's not actually a single study though.

Unless I'm missing something, it is. It looks at a single population (Swedish children born between 1995 and 2019) that is divided into multiple cohorts. This approach strikes me as entirely valid -- but it also weakens the strength of the signal that it provides. With a population of this size and number of recorded attributes, there are likely cohorts that could be found to support any hypothesis the author would like. There are almost certainly many that would meet the bar of statistical significance if you're willing to form the hypothesis based on the data.

In other words, my initial impression is that it's potentially a variant of "P-hacking", regardless of intent. Unless the hypothesis was formed a priori, recorded, and not modified, the results are evidence that a pattern may exist but not proof that it does.
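The arithmetic behind that worry, as a rough sketch: if you test many cohorts where no real effect exists, the chance that at least one crosses p < 0.05 grows quickly. This assumes the tests are independent, which real cohorts drawn from one population are not, so treat it as an illustration of the general concern and not as an estimate for this study. The function name is made up for the example.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests of a
    true null hypothesis comes out significant at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests

one = chance_of_false_positive(1)        # a single pre-registered hypothesis
twenty = chance_of_false_positive(20)    # twenty cohorts examined after the fact
hundred = chance_of_false_positive(100)  # a hundred cohorts
```

One test gives a 5% chance of a spurious hit, twenty give about 64%, and a hundred give over 99%. That is why a hypothesis fixed before looking at the data carries so much more weight than one picked from it.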

> The Swedish study is just the largest and best-designed in a hierarchy of evidence that all points the same direction

From my perspective -- and to be clear, that's very much a lay perspective! -- I agree, and that direction is "there is likely a correlation between the use of acetaminophen during pregnancy and childhood autism diagnosis".

... but at the risk of being tiresome, correlation is not causation. My (unproven!) hypothesis at this point is that both higher rates of autism and acetaminophen use are a result of persistent fevers, which itself is likely a result of chronic systemic inflammation.

If that is in fact the case, then it would simultaneously be true that acetaminophen use would be a strong leading indicator of autism and that ceasing the use of acetaminophen during pregnancy would actually _increase_ the rate of autism overall.


This mirrors exactly what we learned from outsourcing over the past two decades. The successful teams weren’t those with the best offshore developers - they were the ones who mastered writing unambiguous specifications.

AI coding has the same bottleneck: specification quality. The difference is that with outsourcing, poor specs meant waiting weeks for the wrong thing. With AI, poor specs mean iterating indefinitely on the wrong thing.

The irony is that AI is excellent at helping refine specifications - identifying ambiguities, expanding requirements, removing assumptions. The specification effectively IS the code, just in human language instead of syntax.

Teams that struggled with distributed development are repeating the same mistakes with AI. Those who learned specification discipline are thriving because they understand that clear requirements determine quality output, regardless of the implementer.


Makes me wonder if leadership will bounce back from vibe coding faster than it did from outsourcing?

I wasn't around then but colleagues told me it took years for leadership to understand what's happening and to turn the ship around.


And the ship is only turned around for a brief period of time because the next generation of MBAs will restart the outsourcing cycle. The allure of replacing your most expensive employees at one third the cost, regardless of quality impacts, is just too tempting to pass up.

