Hacker News | heikkilevanto's comments

I use SQLite for a small hobby project, and it is fine for that. I wanted to read the article to see why I should not, but it attacked me with a "subscribe" popup, so I stopped there. The comments here seem to be based on daydreaming about scaling to a lot of users who need 24/7 uptime, which is not always the case.

I guess the EU will have to retaliate and forbid any US-based routers, for exactly the same kind of reasons.


> Saying you are not your work is wishful thinking. Try giving it up and check in on how much of you is still the same.

I retired a few years ago, and I believe and insist that I am very much the same person.

To see a person only as what they do at work seems awfully limiting. Even when I was working, I was also a sailor, musician, woodworker, home brewer, cat person, chess player, leather guy, and a good number of other things. And yes, even after retiring, I am still a computer guy. I even like hobby coding projects more than I did.


Well said. I'm nearing retirement age and planning what I'll do, and yes, setting up my hobby room with computers and whatnot. And, like you, I've also been many other things, including some on your list, and more.


And before that, COBOL was supposed to allow computer users to write in almost plain English without even knowing the machine instruction set.

It did change the programming landscape, but there was still a huge need for this new kind of programmer.


I like their privacy policy


> I find it interesting that a substantial number of people seem to think it's wrong or unethical to cold-email someone about a potential recruitment or business opportunity if they post their email in a public place

I find it interesting that some fucking spammers think that just because they found out my email somewhere, they should be allowed to waste my time and resources for their shit.

That is explicitly illegal here in the EU. Unless I have clearly given you my consent, you are not allowed to spam me. Is informed consent really such a difficult concept to understand?


> Not everything is a state secret.

No, but almost everything is a potential DDOS. And slight modifications to emails, documents, and calendars can cause a lot of havoc that may be hard to detect.


Well, if you have a perfect evaluation function, you don't need to search. And if you can do a perfect search to the end, you don't need an evaluation function. Un(?)fortunately neither of these extremes seems reasonable for a game like chess (and even less for go). So most software uses both search and evaluation. And a whole lot of optimizing and other tricks. With impressive results.
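
To make the search/evaluation trade-off concrete, here is a minimal sketch on a toy Nim game (take 1 to 3 stones, whoever takes the last stone wins) rather than chess. The game, the `evaluate` heuristic, and all function names are assumptions chosen for illustration; real engines are far more elaborate. The search runs to a fixed depth and falls back on the heuristic only where it has to stop early.

```python
from typing import List

def moves(stones: int) -> List[int]:
    """Legal moves in the toy game: take 1, 2, or 3 stones."""
    return [k for k in (1, 2, 3) if k <= stones]

def evaluate(stones: int) -> float:
    """Heuristic guess for the side to move, used only at the depth cutoff.
    Hypothetical rule of thumb: a multiple of 4 is bad for the mover."""
    return -0.5 if stones % 4 == 0 else 0.5

def negamax(stones: int, depth: int) -> float:
    """Score of the position from the point of view of the side to move."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone and won
    if depth == 0:
        return evaluate(stones)  # search budget exhausted: trust the heuristic
    return max(-negamax(stones - k, depth - 1) for k in moves(stones))

def best_move(stones: int, depth: int) -> int:
    """Pick the move whose resulting position is worst for the opponent."""
    return max(moves(stones), key=lambda k: -negamax(stones - k, depth - 1))
```

With a large enough `depth` the heuristic is never consulted (perfect search); with `depth=1` the choice rests almost entirely on `evaluate`. In this tiny game both agree, which is exactly what does not hold for chess or go.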


I just bought Kampot peppers from https://www.unclespepper.com/ which is in Germany, the name notwithstanding. And yes, I paid with my Danish Visa card. No problems except that I had to adjust my ad blocker once.


If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time. A traditional compiler will produce a program that behaves the same way, given the same source and options. Some even go out of their way to guarantee they produce the same binary output, which is a good thing for security and package management. That is why we don't need to store the compiled binaries in the version control system.

Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!


> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.

There’s a related issue that gives me deep concern: if LLMs are the new programming languages we don’t even own the compilers. They can be taken from us at any time.

New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different. And who knows what edge cases we’ll run into when being forced to upgrade models?

(and that’s putting aside what an enormous step back it would be to rent a compiler rather than own one for free)


> New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different.

IIUC, same model with same seed and other parameters is not guaranteed to produce the same output.

If anyone is imagining a future where your "source" git repo is just a bunch of highly detailed prompt files and "compilation" just needs an extra LLM code generator, they are signing up for disappointment.


> IIUC, same model with same seed and other parameters is not guaranteed to produce the same output.

Models are so large that random bit flips make such guarantees impossible with current computing technology:

https://aclanthology.org/2025.emnlp-main.528.pdf


Presumably, open models will work almost, but not quite, as well and you can store those on your local drive and spin them up in rented GPUs.


Greedy decoding gives you that guarantee (determinism). But I think you'll find it to be unhelpful. The output will still be wrong the same % of the time (slightly more, in fact) in equally inexplicable ways. What you don't like is the black box unverifiable aspect, which is independent of determinism.
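
As a rough illustration of the distinction, the sketch below compares greedy selection with temperature sampling over a fixed list of logits. The logits and function names are made up for the example, and it assumes the logits themselves are bit-for-bit identical between runs, which a hosted model may not guarantee.

```python
import math
import random
from typing import List

def softmax(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = 0.0
    for e in exps:
        total += e
    return [e / total for e in exps]

def greedy_pick(logits: List[float]) -> int:
    """Greedy decoding step: always take the highest-scoring token index."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits: List[float], temperature: float, rng: random.Random) -> int:
    """Sampling step: draw a token index according to the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [1.0, 3.0, 2.0]  # hypothetical scores for three candidate tokens
greedy_choices = {greedy_pick(logits) for _ in range(100)}
sampled_choices = {sample_pick(logits, 1.0, random.Random(seed)) for seed in range(100)}
```

Here `greedy_choices` contains a single index, while `sampled_choices` can contain several. Neither says anything about whether the chosen token is the right one, which is the point made above.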


If you’re using a model from a provider (not one that you’re hosting locally), greedy decoding via temperature = 0 does not guarantee determinism. A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to a lack of batch invariance [1].

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
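
The floating-point half of this is easy to reproduce without any model. The sketch below is not taken from the linked post; it only shows that floating-point addition is not associative, so a reduction that groups its terms differently (for instance because a batch was split differently) can return a different number. The `chunked_sum` helper and the sample values are assumptions made for the illustration.

```python
from typing import List

# Same three numbers, two groupings, two different results.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)

def chunked_sum(values: List[float], chunk: int) -> float:
    """Sum values in chunks of the given size, then add up the partial sums.
    An explicit loop is used so the order of additions is exactly what is written."""
    partials = []
    for start in range(0, len(values), chunk):
        acc = 0.0
        for v in values[start:start + chunk]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total

values = [1e16, 1.0, -1e16, 1.0]
one_chunk = chunked_sum(values, 4)   # additions happen strictly left to right
two_chunks = chunked_sum(values, 2)  # same data, different grouping
```

`left` and `right` differ in the last bit, and `one_chunk` and `two_chunks` differ by a whole unit even though the inputs are identical. A tiny difference like this in a logit is enough to flip an argmax between two near-tied tokens.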


What people don’t like is that the input-output relation of LLMs is difficult, if not impossible, to reason about. While determinism isn’t the only factor here (you can have a fully deterministic system that is still unpredictable in practical terms), it is still a factor.


The question is: if we keep the same context and model, and the same LLM configuration (quantization etc.), does it provide the same output for the same prompt?

If the answer is no, then we cannot be sure to use it as a high-level language. The whole purpose of a language is providing useful, concise constructs to avoid something not being specified (undefined behavior).

If we can't guarantee that the behavior of the language is going to be the same, it is no better than prompting someone some requirements and not checking what they are doing until the date of delivery.
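
One way to put that question to a concrete setup is simply to measure it. The sketch below is a minimal repeatability check; `generate` is a hypothetical stand-in for whatever function calls your model with a fixed context and configuration, not a real API, and the two stub generators exist only to exercise the check.

```python
import hashlib
from itertools import count
from typing import Callable, Set

def output_digests(generate: Callable[[str], str], prompt: str, runs: int = 5) -> Set[str]:
    """Call the generator several times and collect the SHA-256 of each output."""
    return {
        hashlib.sha256(generate(prompt).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }

def is_repeatable(generate: Callable[[str], str], prompt: str, runs: int = 5) -> bool:
    """True if every run produced byte-identical output."""
    return len(output_digests(generate, prompt, runs)) == 1

# Stub generators standing in for a model call (assumptions for the example).
def stable_stub(prompt: str) -> str:
    return "def add(a, b):\n    return a + b\n"

_calls = count()

def drifting_stub(prompt: str) -> str:
    return f"# variant {next(_calls)}\ndef add(a, b):\n    return a + b\n"
```

`is_repeatable(stable_stub, "write add")` holds and `is_repeatable(drifting_stub, "write add")` does not. Passing such a check only shows that the sampled runs agreed; it is evidence, not a guarantee of the kind a language specification gives.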


Mario Zechner has a very interesting article where he deals with this problem (https://mariozechner.at/posts/2025-06-02-prompts-are-code/#t...). He's exploring how structured, sequential prompts can achieve repeatable results from LLMs, which you still have to verify. I'm experimenting with the same, though I'm just getting started. The idea I sense here is that perhaps a much tighter process of guiding the LLM, with current models, can get you repeatable and reliable results. I wonder if this is the way things are headed.


> I want to see some assurance we get the same results every time

Genuine question, but why not set the temperature to 0? I do this for non-code related inference when I want the same response to a prompt each time.


A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to a lack of batch invariance [1].

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...


Thank you for this, this was a really interesting read about batch invariance, something I didn't even know about.


This still doesn't help when you update your compiler to use a newer model


Anyone doing benchmarks with managed runtimes, or serverless, knows it isn't quite true.

Which is exactly one of the examples the AOT-only, no-GC crowd uses for why theirs is better.


Reproducible builds exist. AOT/JIT and GC are just not very relevant to this issue, not sure why you brought them up.


Because they are dynamic compilers!


But there is functional equivalence. While I don't want to downplay the importance of performance, we're talking about something categorically different when comparing LLMs to compilers.


Not when those LLMs are tied to agents, replacing what would be classical programming.

Using low code platforms with AI based automations, like most iPaaS are now doing.

If the agent is able to retrieve the required data from a JSON file, fill an email with the proper subject and body, sending it to another SaaS application, it is one less integration middleware that was required to be written.

From every practical business point of view, it is an application.


Even those are way more predictable than LLMs, given the same input. But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.


> But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.

They are, actually. A "fresh chat" with an LLM is non-deterministic but also stateless. Of course agentic workflows add memory, possibly RAG etc. but that memory is stored somewhere in plain English; you can just go and look at it. It may not be stateless but the state is fully known.


Using the managed runtime analogy, what you are saying is that, if I wanted to benchmark LLMs like I would do with runtimes, I would need to take the delta between versions, plus that between whatever memory they may have. I don’t see how that helps with reproducibility.

Perhaps more importantly, how would I quantify such “memory”? In other words, how could I verify that two memory inputs are the same, and how could I formalize the entirety of such inputs with the same outputs?


Are you certain you can predict the JIT-generated machine code given the JVM bytecode?

Without taking anything else into account that the JIT uses on its decision tree?


For a single execution, to a certain extent, yes.

But that’s not the point I’m trying to make here. JIT compilers are vastly more predictable than LLMs. I can take any two JVMs from any two vendors, and over several versions and years, I’m confident that they will produce the same outputs given the same inputs, to a certain degree, where the input is not only code but GC, libraries, etc.

I cannot do the same with two versions of the same LLM offering from a single vendor, that had been released one year apart.


Good luck mapping OpenJDK with Azul's cloud JIT, in generated machine code.


The output being the actual program output, not the byte code. No one is arguing that in the scope of LLMs.


Enough so that I've never had a runtime issue because the compiler did something odd once, and correct the next time. At least in C#. If Java is doing that, then stop using it...

If the compiler had an issue like LLMs do, then half my builds would be broken, running the same source.


> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.

Give a spec to a designer or developer. Do you get the same result every time?

I’m going to guess no. The results can vary wildly depending on the person.

The code generated by LLMs will still be deterministic. What is different is the product team tools to create that product.

At a high level, does using LLMs to do all or most of the coding ultimately help the business?


This comparison holds up to me only in the long standing debate "LLMs as the new engineer", not "LLMs as a new programming language" (like here).

I think there are important distinctions there, predictability being one of them.


Even as a SSWE I do often wonder if I am but a high-level language.

