There are two different answers to this question, and which one is "correct" depends entirely on the context of who is asking it.
1. It's the code that is specific to this program that sits above the run-time layer (internal view, that most programmers would take).
2. It's the code in the binary that is executed (external view, that most users would take).
The key question does not seem to be "was the proof correct?" but rather "did the proof cover everything in the program?". The answer depends on whether you are looking at it from the perspective of a programmer or a user. Given the overly strong framing that the article is responding to, highlighting the difference in this way does seem useful. The title is correct from the perspective that most users would take.
Yes but, without wishing to be snarky, did you read the article? There is no program as such, in either sense - the announcement from Lean only mentions "a C compression library" (zlib). Not only that, but since we're talking about formal verification, a programmer would likely understand that it is about proving a bounded, specific codebase at source code level, and not operating on a binary along with its associated dependencies (again, caveat my limited understanding of these things).
My feeling is that if you told the average non-technical user that some person/organisation had produced a formally verified version of a C compression library, you would likely get a blank look, so I think it's reasonable to assume that both Lean's intended audience, and the audience of the blog post linked here, correspond with number 1. in your list.
The article describes fuzzing the library, which requires compiling and executing a program. Typically fuzzing involves a minimal harness around the payload (here, a single call into the library). There is clearly a bug in this program, and it does not exist in the minimal harness, so it must be in the library code, which was covered by the proof.
The bounded, specific codebase that you refer to is typically the library *and all of its dependencies*, which in this case includes the Lean runtime. This is why formal verification is difficult: the proof chain needs to extend all the way down to the foundations. In this case the original gushing claim that everything was verified is incorrect and premature. The article seems like a good exposition of why.
Thank you, I understand what fuzzing is; that test harness was presumably provided either by the blog post author or generated by Claude somehow, and therefore would not have been part of the proven code, nor part of the original claim by the Lean devs. That's what I meant by saying there is no program as such.
> The bounded, specific codebase that you refer to is typically the library and all of its dependencies, which in this case includes the Lean runtime.
How does that work? I thought the main idea is to write code in the Lean language which has some specific shape conducive to mathematical analysis, along with mathematical proofs that operate on that code. How then does a system like this handle a third party dependency? I've searched around and I can't find any information about how it works. I assumed that the boundary of the proof was the source code - surely they can't also be proving things like, say, DirectX?
> This is why formal verification is difficult: the proof chain needs to extend all the way down to the foundations.
The difficulty is not in explosions of computational complexity due to problems of incompleteness, undecidability, the halting problem, those kinds of things? As I said, this is not something I know much about, but it's surprising to me if 'analysing all the code' is really the difficult bit.
There are standard conservative assumptions to handle incompleteness, decidability etc., i.e. the results are an over-approximation that terminates. Picking an approximation that is precise enough in the properties that you care about is part of the challenge, but it is an in-band problem. There are no hard edges between the theory and reality.
As with most engineering problems the out-of-band issues tend to be the hardest to solve. Models of the part underneath the interesting part need to be complete/accurate enough to make the results useful. Compare it to crypto where people do not usually try to break the scheme - they try to break the specific implementation of the scheme because the weakest points will be at the interface between the theoretical construction and the actual concrete instantiation of the device that it will run on.
I am pretty sure you could tell a teenager "there's a ZIP compression program that's scientifically proven to have no bugs" and they'd understand you. People don't have to be CS experts to understand that. (Technically it's Gzip but that's mostly irrelevant to understanding the claim here)
Gentle reminder about this excerpt from HN Guidelines:
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".
I've watched Patrick's videos for enough years that I know he is not, but I still wonder from time to time. His voice is incredibly flat and uniform, he always uses fake backgrounds and there is extremely high use of jump cuts in his edits.
You may be surprised by how much easier it is to dump the framework/stack and just write it from scratch. I say this because I too work on compilers and have a crud app as a personal project. The first versions were a nightmare in various frameworks and since I switched to a C++ backend / vanilla .js frontend it has been incredibly easy to write.
No database for this project - the data model has a simple text representation, so it gets serialized out to a folder/file layout on disk that goes into version control. Single self-contained binary: contains the web/websockets server, backend logic, parser/serialization. There is a separate component in Python that sits behind an internal network connection to handle an execution sandbox.
This comment doesn't seem to fit the discussion at all?
The discussion is not about humans using LLMs to write papers. It is about humans who agreed not to use LLMs in reviewing papers, then did exactly that.
There's a lot of irony in a defensive comment being written based on misreading / inattentive reading of a post about reviewing papers (requiring attentive reading).
It might be that paper authors required others not to use LLMs for reviewing their work. Then, by the rule of reciprocity, they shouldn't use LLMs for reviewing others' work. The article is unclear on whether this implied reciprocity rule was explicitly stated or not.
In addition to being a reviewer, they also submitted their own research to this journal. So it raises the question: if they were willing to cheat on the review side, where there is less incentive, why wouldn't they cheat on the side that provides more incentive?
(Meaning, your career doesn’t get boosted much for reviewing papers, but much more so for publishing papers)
As the DJ is an interface to shuffle, and the author specifically wants to listen to unshuffled music, the lack of intelligence may not be entirely in the AI.