javawizard · 2026-04-14T20:53:41 1776200021

Love this.

How did you implement gates? Are they simply tasks Claude itself has to confirm it ran, or are they scripts that run to check that the thing in question actually happened, or do they spawn a separate AI agent to check that the thing happened, or what?

giancarlostoro · 2026-04-14T21:01:30 1776200490

Claude (or whatever agent) gets a message when it tries to close a task, telling it which gates are not resolved yet, at which point the agent will instinctively want to read the task. I did run into an issue where I forgot to add gates to a new project, so Claude smooshed over it by making a blanket gate. Otherwise I have never had an issue when I defined what the gate is; Claude usually honors it. I haven't worked on big updates recently, but I noticed other tools like rtk (Rust Token Killer) will add their own instructions to your Claude's instructions.md file, so I think I need to craft one to tack on with sane instructions, including never closing tasks without having the user create gates for them first.

In a nutshell, a gate is an entry in the DB with arbitrary text, and Claude is good about following whatever it says. Trying to close a task forces Claude to read it.
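For readers who want something concrete, here is a minimal sketch of that idea: gates as rows of free text that block a task from closing. It is not taken from the GuardRails repo. The table layout, the `close_task` name, and the refusal when no gates exist are all assumptions made for illustration.

```python
import sqlite3

def open_db():
    """Create an in-memory database with a hypothetical tasks/gates schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT,
                            closed INTEGER DEFAULT 0);
        CREATE TABLE gates (id INTEGER PRIMARY KEY,
                            task_id INTEGER REFERENCES tasks(id),
                            text TEXT, resolved INTEGER DEFAULT 0);
    """)
    return db

def close_task(db, task_id):
    """Try to close a task. Returns (closed, message_for_the_agent)."""
    rows = db.execute(
        "SELECT text, resolved FROM gates WHERE task_id = ?", (task_id,)
    ).fetchall()
    if not rows:
        # Assumed policy: a task with no gates cannot be closed at all.
        return False, "No gates defined; ask the user to create gates first."
    pending = [text for text, resolved in rows if not resolved]
    if pending:
        # The agent sees the gate text at the moment it tries to close.
        return False, "Unresolved gates:\n" + "\n".join(f"- {t}" for t in pending)
    db.execute("UPDATE tasks SET closed = 1 WHERE id = ?", (task_id,))
    return True, "Task closed."
```

The point of the design is only that the gate text is returned at close time, so the agent reads it right when it matters.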

Life's gotten slightly busy, but you can see more on the repo. I've been debating giving it a better name, I feel like GuardRails implies security, when the goal is just to validate work slightly.

https://github.com/Giancarlos/GuardRails

skybrian · 2026-04-14T22:28:47 1776205727

It sounds like a gate is a prompt that shows up at the appropriate time, which works because LLMs pay more attention to the last thing they read.

It seems like a lot of coding agent features work that way?

giancarlostoro · 2026-04-15T02:49:20 1776221360

I suppose. I mean, the LLM is still reading it; the issue is that Beads gives the model a task, and then the model finishes and never checks anything. I kept running into this repeatedly: sometimes I'd go to compile the project after it said "hey I finished" and it wouldn't compile at all, where if it had just tried to build the project, it would have just worked.

0x457 · 2026-04-16T05:05:36 1776315936

From my understanding the way Gas Town uses beads is that it's not only "what to do" but also contains a workflow.

maleldil · 2026-04-14T21:38:10 1776202690

Who closes the gate? Is it Claude itself after it runs the verification? Who makes sure the verification did in fact run?

giancarlostoro · 2026-04-15T02:44:37 1776221077

I usually have Claude confirm with me, but I've seen it close it if it's a unit test that passed, for example.

maleldil · 2026-04-15T08:43:43 1776242623

You can't trust it 100%. Sometimes it will just refuse to fix a compiler or lint warning (often saying "This was a pre-existing issue...") or write a trivial test that does nothing and always passes.

0x457 · 2026-04-16T05:08:00 1776316080

> writes code with a lot of warnings
> compacts
> "This was a pre-existing issue..."

I still take this over writing code myself though.

giancarlostoro · 2026-04-15T14:47:50 1776264470

Point being that there are multiple gates to one story, including human testing as one of them.

wyre · 2026-04-15T00:22:02 1776212522

I built something similar with verifiable gates on tasks. The agent has a command to mark the task as done, which runs a bash script: if it passes, the task closes; if it doesn't, it appends the failure information to the task description for the agent's next attempt at the task.
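A rough sketch of how a "mark done" command like that could work, assuming each task carries its own check command. The `Task` fields and the `mark_done` name are hypothetical and not taken from the tool described above.

```python
import subprocess
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    check_cmd: list  # hypothetical verification command, e.g. ["bash", "check.sh"]
    closed: bool = False

def mark_done(task, timeout=300):
    """Run the task's check. Close the task on success; on failure, record
    the output in the description so the next attempt can see it."""
    result = subprocess.run(
        task.check_cmd, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode == 0:
        task.closed = True
        return True
    failure = (result.stdout + result.stderr).strip()
    task.description += (
        f"\n\nPrevious attempt failed (exit {result.returncode}):\n{failure}"
    )
    return False
```

Because the script's exit code decides the outcome, the agent cannot close the task just by claiming the check passed.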

javawizard · 2026-04-12T09:43:25 1775987005

The trouble with that argument, though, is that it works the other way as well: how do I, a random internet citizen, know that you're not doing the same thing for Anthropic with this comment?

(FWIW I have definitely noticed a cognitive decline with Claude / Opus 4.6 over the past month and a half or so, and unless I'm secretly working for them in my sleep, I'm definitely not an Anthropic employee.)

iLoveOncall · 2026-04-12T09:48:38 1775987318

Oh it's pretty clear to me that Anthropic employs the same tactics and uses bots on socials to push its products too. On Reddit a couple of months ago it was simply unbearable with all the "Claude Opus is going to take all the jobs".

You definitely shouldn't trust me, as we're way beyond the point where you can trust ANYTHING on the internet that has a timestamp later than 2021 or so (and even then, of course people were already lying).

Personally I use Claude models through Bedrock because I work for Amazon, and I haven't noticed any decline. Instead it's always been pretty shit, and what people describe now as the model getting lost in infinite loops of talking to itself has happened since the very start for me.

felixgallo · 2026-04-12T13:03:27 1775999007

https://isitnerfed.org/

In short, it looks like nothing has been nerfed, but sentiment has definitely been negative. I suspect some of the openclaw users have been taking out their frustrations.

javawizard · 2026-04-12T13:41:02 1776001262

That's fascinating.

Any idea what their test harness looks like? My experience comes primarily from Claude Code; this makes me wonder if recent CC updates could be more to blame than Opus 4.6 itself.

javawizard · 2026-04-02T19:00:56 1775156456

> I know software quality has been going down in recent versions of macOS

Note that this particular problem has existed for well over a decade. It's atrocious, but let's not pretend it's anything new.

paxys · 2026-04-02T19:16:14 1775157374

The macbook notch has existed for a decade?

javawizard · 2026-04-02T20:00:51 1775160051

No, menu bar items being hidden when there are too many of them has happened for a decade.

The notch has just made menu bar space more scarce than it used to be.

data-ottawa · 2026-04-02T19:45:58 1775159158

If you opened an app like Xcode with a lot of menu options, they would extend across the screen and cover up your menu bar icons.

If I open Xcode today on a 14" MacBook, two menu items extend past the notch, and they still hide your menu bar icons.

This has been the case for a long long time, and it's always been an obvious failure case.

simonh · 2026-04-02T19:31:32 1775158292

Menu bar icons overflowing. The notch just makes it a problem quicker, and in an exciting new way.

javawizard · 2026-03-29T14:21:01 1774794061

Yes, and: If I recall correctly, Cloudflare is sinking all the extra traffic for him, so it doesn't actually impact him.

Last I heard it's a morally objectionable thing at this point rather than something that's having any practical impact.

(Which of course doesn't make it ok... I'm just a little less inclined to judge people that still use archive links when needed.)

javawizard · 2026-02-26T16:42:13 1772124133

I wonder if that was an automated HN edit?

Similarly to how titles that start with "how" usually have that word automatically removed.

some_furry · 2026-02-26T17:19:00 1772126340

Usually HN only auto-edits on first submission. If you go in and undo it manually as the submitter, you can force it to read how you intend.

meatmanek · 2026-02-26T20:01:35 1772136095

Maybe I'm only noticing the times when it messes things up, but it kinda seems like these auto-edits cause a lot of confusion that could be avoided if they were shown up-front to submitters, who would then have the option to undo them.

Or maybe judicious use of an LLM here could be helpful. Replace the auto-edits with a prompt? Ask an LLM to judge whether the auto-edited title still retains its original meaning? Run the old and new titles through an embedding model and make sure they still point in roughly the same direction?
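The embedding idea could look roughly like this. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model (it only measures word overlap, not meaning), and the 0.8 threshold is arbitrary.

```python
import math
from collections import Counter

def toy_embed(title):
    """Stand-in for a real embedding model: lowercase bag-of-words counts."""
    return Counter(title.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def edit_preserves_meaning(old, new, threshold=0.8, embed=toy_embed):
    """True if the edited title still points in roughly the same direction."""
    return cosine(embed(old), embed(new)) >= threshold
```

With a real model you would pass its embedding function as `embed` (returning a dict or adapting `cosine` to dense vectors) and tune the threshold on known good and bad auto-edits.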

pinkmuffinere · 2026-02-26T18:53:00 1772131980

oh interesting, TIL I can go edit my submission titles! That's useful, I've definitely submitted stuff and gotten a less-good title due to the automated fixes, so I'll have to pay attention to this next time

javawizard · 2026-02-09T17:51:38 1770659498

Oh now that would be a fun version 2 challenge: have all the clocks in one household synchronize such that they're all early by the same amount at any given time.

Easy enough for wifi enabled ones: a UDP broadcast to discover other clocks on the network, then sync how you will.
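A minimal sketch of the discovery part, using only the standard library. The port number, the message format, and the "lowest id wins" rule are all made up for illustration; any agreement rule would do.

```python
import json
import socket

PORT = 47474             # hypothetical port
MAGIC = "clock-sync-v1"  # hypothetical protocol tag

def encode_announce(clock_id, offset_s):
    """Build the datagram a clock broadcasts about itself."""
    return json.dumps({"magic": MAGIC, "id": clock_id, "offset_s": offset_s}).encode()

def decode_announce(data):
    """Parse a datagram; return None for anything that is not ours."""
    try:
        msg = json.loads(data.decode())
    except ValueError:  # covers bad UTF-8 and bad JSON
        return None
    if not isinstance(msg, dict) or msg.get("magic") != MAGIC:
        return None
    if "id" not in msg or "offset_s" not in msg:
        return None
    return msg

def announce(clock_id, offset_s):
    """Broadcast this clock's current offset to the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(encode_announce(clock_id, offset_s), ("255.255.255.255", PORT))

def listen(timeout=2.0):
    """Collect announcements until no packet arrives for `timeout` seconds."""
    peers = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", PORT))
        s.settimeout(timeout)
        try:
            while True:
                data, _addr = s.recvfrom(1024)
                msg = decode_announce(data)
                if msg:
                    peers[msg["id"]] = msg["offset_s"]
        except socket.timeout:
            pass
    return peers

def shared_offset(own_id, own_offset_s, peers):
    """One arbitrary convention: every clock adopts the offset of the lowest id."""
    everyone = dict(peers)
    everyone[own_id] = own_offset_s
    return everyone[min(everyone)]
```

Each clock would call `announce` periodically, run `listen` to collect its peers, and then set its displayed offset to `shared_offset(...)`.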

For non-wifi-enabled clocks, perhaps something like a CH572 would do the trick: a $0.20 RISC-V microcontroller with BLE support that all the clocks in the same vicinity could use to talk to each other.

You could really mess with your neighbors if they had the same clocks and you were within range...

seg_lol · 2026-02-09T18:41:52 1770662512

You don't already do this with the NTP servers under your control?

javawizard · 2026-02-09T19:00:02 1770663602

If I had any NTP servers under my control, I probably would :)

javawizard · 2026-02-02T13:23:22 1770038602

I used to work at a place that had the famous Antoine de Saint-Exupéry quote painted near the elevators where everyone would see it when they arrived for work:

  Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.

I miss those days.

bookofjoe · 2026-02-02T14:49:39 1770043779

Original French: "Il semble que la perfection soit atteinte non quand il n'y a plus rien à ajouter, mais quand il n'y a plus rien à retrancher".

rkomorn · 2026-02-02T14:51:16 1770043876

"Il semble" sure gives the quote a different tone to me.

javawizard · 2026-02-02T00:29:56 1769992196

The article and the submitted HN post both appear to have the same title to me: "Margin Call".

What are you talking about? Did one or the other have a different title when you wrote your comment?

caminante · 2026-02-02T01:36:47 1769996207

Yes, an admin changed the headline.

Original headline was misleading and editorialized.

javawizard · 2026-02-02T01:39:39 1769996379

Got it. Out of curiosity, what was it?

caminante · 2026-02-02T01:42:34 1769996554

From memory, "Apple has 76% margin in Services"

javawizard · 2026-02-02T04:38:53 1770007133

Ah, very fair. Without otherwise taking a position on the article, I can totally see how that's editorialized.

conorcleary · 2026-02-02T10:59:10 1770029950

Would prefer it if dang/admins added a subtitle or bracketed the altered title.

caminante · 2026-02-02T12:20:02 1770034802

Usually, admins make a simple comment upon edit. But it's not really required, given it's an explicit policy in the HN guidelines.

javawizard · 2026-01-29T08:44:53 1769676293

Jet engines do not strike me as being inherently simpler than muscles, not by a long shot.

They're still the best way we know of going about the business of building a flying machine, for various reasons.

rightbyte · 2026-01-29T09:20:28 1769678428

Piston engines surely are more complex than jet engines though? Which replaced the "flapping engines".

readmodifywrite · 2026-01-29T13:48:46 1769694526

They are not. Turbine engines require much higher quality manufacturing and tolerances and operate at much higher speeds and pressures. There is more to it than the perceived number of moving parts.

javawizard · 2026-01-10T20:41:24 1768077684

> But the other way around is not possible due to the closed nature of GPT-5.

At risk of sounding glib: have you heard of distillation?

dust42 · 2026-01-10T22:27:24 1768084044

Distilling from a closed model like GPT-4 via API would be architecturally crippled.

You’re restricted to the model's outputs (text and, at best, top-k log-probabilities rather than full logits), with no access to attention patterns, intermediate activations, or layer-wise representations, which are needed for proper knowledge transfer.

Without alignment of Q/K/V matrices or hidden state spaces, the student model cannot learn the teacher model's reasoning inductive biases, only its surface behavior, which will likely amplify hallucinations.

In contrast, open-weight teachers enable multi-level distillation: KL on logits + MSE on hidden states + attention matching.
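As a toy illustration of those three terms, here is a sketch in plain Python. It assumes the student's hidden states and attention maps have already been projected to the teacher's shapes and flattened to lists of floats; the weights and temperature are arbitrary placeholders, and a real setup would use a tensor library.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_loss(t_logits, s_logits, t_hidden, s_hidden, t_attn, s_attn,
                 temperature=2.0, w_hidden=1.0, w_attn=1.0):
    """KL on softened logits + MSE on hidden states + MSE on attention maps."""
    kl = kl_divergence(softmax(t_logits, temperature),
                       softmax(s_logits, temperature))
    return kl + w_hidden * mse(t_hidden, s_hidden) + w_attn * mse(t_attn, s_attn)
```

With API-only access to a teacher, only something like the first term is available, and only approximately, which is the limitation being argued here.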

Does that answer your question?