I use SQLite for a small hobby project, and it's fine for that. I wanted to read the article to see why I shouldn't, but it attacked me with a "subscribe" popup, so I stopped there. The comments here seem to be based on daydreaming about scaling to a lot of users who need 24/7 uptime, which is not always the case.
> Saying you are not your work is wishful thinking. Try giving it up and check in on how much of you is still the same.
I retired a few years ago, and I believe and insist that I am very much the same person.
To see a person only as what they do at work seems awfully limiting. Even when I was working, I was also a sailor, musician, woodworker, home brewer, cat person, chess player, leather guy, and a good number of other things. And yes, even after retiring, I am still a computer guy. I even like hobby coding projects more than I did.
Well said. I'm nearing retirement age and planning what I'll do, and yes, setting up my hobby room with computers and whatnot. And, like you, I've also been many other things, including some on your list, and more.
> I find it interesting that a substantial number of people seem to think it's wrong or unethical to cold-email someone about a potential recruitment or business opportunity if they post their email in a public place
I find it interesting that some fucking spammers think that just because they found out my email somewhere, they should be allowed to waste my time and resources for their shit.
That is explicitly illegal here in the EU. Unless I have clearly given you my consent, you are not allowed to spam me. Is informed consent really such a difficult concept to understand?
No, but almost everything is a potential DDoS. And slight modifications to emails, documents, and calendars can cause a lot of havoc that may be hard to detect.
Well, if you have a perfect evaluation function, you don't need to search. And if you can do a perfect search to the end, you don't need an evaluation function. Un(?)fortunately, neither of these extremes seems feasible for a game like chess (and even less so for go). So most engines use both search and evaluation. And a whole lot of optimizing and other tricks. With impressive results.
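To make the tradeoff concrete, here's a toy minimax sketch (hypothetical game tree and scores, not a real chess engine): with enough depth to reach the terminal positions you only need the exact game results, but with a depth limit you are forced to call an evaluation function at the horizon.

```python
# Toy minimax. Inner lists are positions; ints are exact game-over scores.
# With a depth limit, an evaluation function must estimate at the horizon.

def minimax(node, depth, maximizing, evaluate):
    """node: either a terminal score (int) or a list of child nodes."""
    if isinstance(node, int):   # terminal position: exact result, no eval needed
        return node
    if depth == 0:              # horizon reached: must fall back on evaluation
        return evaluate(node)
    children = (minimax(c, depth - 1, not maximizing, evaluate) for c in node)
    return max(children) if maximizing else min(children)

# A small hypothetical tree.
tree = [[3, [5, 2]], [[1, 4], 0]]

# Crude stand-in evaluation: average of the leaves below a position.
def leaves(n):
    return [n] if isinstance(n, int) else [x for c in n for x in leaves(c)]

def evaluate(node):
    ls = leaves(node)
    return sum(ls) / len(ls)

full = minimax(tree, depth=10, maximizing=True, evaluate=evaluate)    # deep enough: eval never used
shallow = minimax(tree, depth=1, maximizing=True, evaluate=evaluate)  # relies on eval at horizon
print(full)     # 3
```

The two results differ: the deep search finds the exact minimax value, while the shallow search's answer is only as good as its evaluation function.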
I just bought Kampot peppers from https://www.unclespepper.com/ which is in Germany, the name notwithstanding. And yes, I paid with my Danish Visa card. No problems except that I had to adjust my ad blocker once.
If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time. A traditional compiler will produce a program that behaves the same way, given the same source and options. Some even go out of their way to guarantee they produce the same binary output, which is a good thing for security and package management. That is why we don't need to store the compiled binaries in the version control system.
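"Same binary output" is a checkable property: build twice (ideally on different machines) and compare hashes. A minimal sketch of that check, with hypothetical artifact paths:

```python
# Sketch of a reproducible-build check: build the same source twice,
# hash both artifacts, and require bit-identical output.
import hashlib

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_reproducible(artifact_a, artifact_b):
    """True iff the two build artifacts are byte-for-byte identical."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```

If the prompts-as-source vision ever materializes, it is exactly this kind of check that LLM "compilation" would need to pass.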
Until LLMs start to get there, we still need to save the source code they produce, and review and verify that it does what it says on the label, and not in a totally stupid way. I think we have a long way to go!
> If we consider the prompts and LLM inputs to be the new source code, I want to see some assurance we get the same results every time.
There’s a related issue that gives me deep concern: if LLMs are the new programming languages we don’t even own the compilers. They can be taken from us at any time.
New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different. And who knows what edge cases we’ll run into when being forced to upgrade models?
(and that’s putting aside what an enormous step back it would be to rent a compiler rather than own one for free)
> New models come out constantly and over time companies will phase out older ones. These newer models will be better, sure, but their outputs will be different.
IIUC, same model with same seed and other parameters is not guaranteed to produce the same output.
If anyone is imagining a future where your "source" git repo is just a bunch of highly detailed prompt files and "compilation" just needs an extra LLM code generator, they are signing up for disappointment.
Greedy decoding gives you that guarantee (determinism). But I think you'll find it to be unhelpful. The output will still be wrong the same % of the time (slightly more, in fact) in equally inexplicable ways. What you don't like is the black box unverifiable aspect, which is independent of determinism.
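For anyone unfamiliar: "greedy decoding" just means taking the argmax over the model's output distribution at every step, so identical logits always yield the identical token, whereas sampling at a nonzero temperature varies with the RNG state. A toy sketch (made-up logits, no real model):

```python
# Greedy decoding vs. temperature sampling over toy logits.
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample_token(logits, temperature, rng):
    if temperature == 0:  # greedy: deterministic argmax, RNG unused
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax([x / temperature for x in logits])
    return rng.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]
greedy = {sample_token(logits, 0, random.Random(i)) for i in range(100)}
sampled = {sample_token(logits, 1.0, random.Random(i)) for i in range(100)}
print(greedy)   # {0} -- the same token every time
print(sampled)  # varies with the RNG state
```

Given identical logits, greedy decoding is deterministic by construction; the catch, as the replies below note, is whether the logits themselves come out identical across runs.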
If you’re using a model from a provider (not one that you’re hosting locally), greedy decoding via temperature = 0 does not guarantee determinism. A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to the lack of batch invariance [1].
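The floating-point part is easy to demonstrate with toy numbers: addition isn't associative in floating point, so summing the same values in a different order (as different batch layouts effectively do) can flip the argmax between two near-tied logits, even at temperature 0.

```python
# Floating-point addition is not associative: the same three numbers
# summed in two orders give two different floats.
def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])

a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False

# Two tokens whose "true" logits are tied; which one greedy decoding
# picks now depends on the reduction order that produced the logit.
competitor = 0.6000000000000001
print(argmax([a, competitor]))  # 0  (tie, first index wins)
print(argmax([b, competitor]))  # 1  (b came out smaller)
```

Real serving stacks hit this through batch-dependent reduction order in GPU kernels rather than a three-term sum, but the mechanism is the same.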
What people don’t like is that the input-output relation of LLMs is difficult, if not impossible, to reason about. While determinism isn’t the only factor here (you can have a fully deterministic system that is still unpredictable in practical terms), it is still a factor.
The question is: if we keep the same context and model, and the same LLM configuration (quantization etc.), does it provide the same output at same prompt?
If the answer is no, then we cannot really use it as a high-level language. The whole purpose of a language is to provide useful, concise constructs and to avoid leaving behavior unspecified (undefined behavior).
If we can't guarantee that the behavior of the language is going to be the same, it is no better than prompting someone some requirements and not checking what they are doing until the date of delivery.
Mario Zechner has a very interesting article where he deals with this problem (https://mariozechner.at/posts/2025-06-02-prompts-are-code/#t...). He's exploring how structured, sequential prompts can achieve repeatable results from LLMs, which you still have to verify. I'm experimenting with the same, though I'm just getting started. The idea I sense here is that perhaps a much tighter process of guiding the LLM, with current models, can get you repeatable and reliable results. I wonder if this is the way things are headed.
> A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to the lack of batch invariance [1]
But there is functional equivalence. While I don't want to downplay the importance of performance, we're talking about something categorically different when comparing LLMs to compilers.
Not when those LLMs are tied to agents, replacing what would be classical programming.
Using low-code platforms with AI-based automations, as most iPaaS products are now doing.
If the agent is able to retrieve the required data from a JSON file, fill in an email with the proper subject and body, and send it to another SaaS application, that's one less piece of integration middleware that has to be written.
From any practical business point of view, it is an application.
Even those are way more predictable than LLMs, given the same input. But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.
> But more importantly, LLMs aren’t stateless across executions, which is a huge no-no.
They are, actually. A "fresh chat" with an LLM is non-deterministic but also stateless. Of course agentic workflows add memory, possibly RAG etc. but that memory is stored somewhere in plain English; you can just go and look at it. It may not be stateless but the state is fully known.
Using the managed runtime analogy, what you are saying is that, if I wanted to benchmark LLMs like I would do with runtimes, I would need to take the delta between versions, plus that between whatever memory they may have. I don’t see how that helps with reproducibility.
Perhaps more importantly, how would I quantify such “memory”? In other words, how could I verify that two memory inputs are the same, and how could I formalize the entirety of such inputs with the same outputs?
But that’s not the point I’m trying to make here. JIT compilers are vastly more predictable than LLMs. I can take any two JVMs from any two vendors, and over several versions and years, I’m confident that they will produce the same outputs given the same inputs, to a certain degree, where the input is not only code but GC, libraries, etc.
I cannot do the same with two versions of the same LLM offering from a single vendor, that had been released one year apart.
Enough so that I've never had a runtime issue because the compiler did something odd once and was correct the next time. At least in C#. If Java is doing that, then stop using it...
If the compiler had an issue like LLMs do, then half my builds would be broken while running the same source.