Hacker News | theanonymousone's comments

Have you seen the 8bit quantisation matter a lot? The "consensus" in r/LocalLlama is that up to 4 bits the loss is tolerable.

Absolutely. The difference between Q6 and Q8 is not as immediately noticeable, but if I test by starting from a blank-slate context and giving it the same complicated task with a Q4 vs a Q8 GGUF file loaded, the difference is apparent. The Q4 will struggle or do 'stupid' things with even simple bash or Python. The degradation at Q4 might not be as noticeable in purely conversational, one-on-one text interaction with an LLM, but when you dig into something that's more esoteric in a training dataset than a chat conversation, there is absolutely a big gap.

I think some of the folks in the local LLM social media communities are using them for things like company-hosted customer service chat bots, or purely English text writing stuff where Q4 will probably not cause a problem. For more discrete technical work I stick pretty much exclusively to Q8.


Thanks a lot. How about Q8 vs FP16/BF16? Have you checked them too?

I have not spent a lot of time running FP16 'full precision' versions of some things, but as the other commenter says, it's not much difference. There's a really wide array of benchmarks and tests from a lot of third parties unrelated to the trainer of the AI models that shows at most a two percent difference in score and capability between BF16 and Q8.

A Q8 quant shows very minimal fall-off in terms of KLD against the lab's 16-bit model. If you have the memory for a BF16 KV-cache (which is usually easier to stomach), then the Q8 is very close. But even a Q8 quant model with a Q8 KV-cache is very close.

Smaller quants for the model start to fall off, but more importantly, smaller KV-cache quants fall off much faster, so avoid anything less than Q8 there.
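For anyone unsure what "KLD against the 16-bit" means in practice, here is a minimal sketch of the idea: run the same prompts through the reference model and the quantized one, turn each position's logits into a probability distribution, and average the KL divergence between the two. The logits below are made-up numbers purely for illustration, not measurements from any real model, and the function names are my own.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far q (quantized) drifts from p (reference)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(ref_logits, quant_logits):
    """Average per-position KL divergence over a sequence of next-token logits."""
    klds = [
        kl_divergence(softmax(r), softmax(q))
        for r, q in zip(ref_logits, quant_logits)
    ]
    return sum(klds) / len(klds)

# Hypothetical logits over a 3-token vocabulary at two positions.
reference = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
slightly_off = [[2.01, 0.98, 0.12], [0.52, 2.48, 0.01]]  # stands in for a mild quant
far_off = [[1.2, 1.5, 0.6], [1.4, 1.6, 0.5]]             # stands in for an aggressive quant

print("mild:", mean_kld(reference, slightly_off))
print("aggressive:", mean_kld(reference, far_off))
```

A lower number means the quantized model's next-token distribution stays closer to the reference one; identical distributions give exactly zero.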


It’s not a general rule, and depends highly on the model and the quantisation used. Don’t guess: Unsloth sometimes publish graphs in their tutorials showing the error rate vs file size… sometimes Q4 is great, other times I go for Q6.

My question as well. Isn't Tencent a very well-known company? Maybe the mystery is in the model itself?

This is a big deal when/if it's working, to me at least. Where can I contribute?

https://github.com/evmar/theseus

Looks like just enough was supported to run minesweeper. Impressive though.


Isn't this link a duplicate? Or am I having déjà vu?

Hi. Is there a significance to that date or that commit? The commit doesn't look very special, and Git was apparently being used from early April already: https://en.wikipedia.org/wiki/Git

Or I may be entirely wrong.


I have always said please and thank you to LLMs, not to increase accuracy or because I'm stupid. I believe it is more about me than about the LLM, and this is anyway a habit I don't want to lose.

Thomas Aquinas believed cruelty to animals was wrong not because animals have souls (and with that all the standard moral rights), but because it can teach us cruelty to other humans.

Snarky morning: "spiritual souls" as opposed to "mere animal souls". Sorry, could not control myself.

Spiritual or not, anyone watching cattle in an abattoir will recognize symptoms of the kind of foreboding that I would suffer prior to execution.

Genuine question: do you add 'please' and 'thank you' to Google searches? If not, what sets them apart?

Google searches being keyword based, rather than simulated conversations?

The same reason you wouldn't put in an entire actual question/sentence, unless you either don't know how to use Google, are pissed off, or have an actual reason to suspect that it would yield proper hits (e.g. looking up an excerpt).


Google has been optimized for sentence-like questions so much that for a good 6+ years now it has been completely useless as a keyword search.

To clarify: sentence search got slightly better at the cost of keyword search. So the result is unusable garbage.


It is rather hard to lose the habit of using a search engine with keywords, given the change took place without much fanfare. I have no problem using sentences with the current AI tools, though.

Genuine question: do you write Google search queries in natural language?

I didn't use to, but I do now that the searches go straight to an LLM. I almost always find the model output to be much more useful than the list of search results.

I don't. I was recently doing some searching for information I thought AI would be good for: fuzzy natural language search with some conditions. And it was, but ...

Gemini at least is not great at citing and picking sources. Or providing multiple sources for the same thing.

It tends to stop at threes. So if you want more, you have to prompt it uselessly, like: "any more?"


LLMs seem more human-like, so if you were to treat them badly, you would be more likely to condition yourself to treat other living creatures badly.

Google isn’t conversational.

I searched for "Hey Google" and got this in response:

  Hey! I'm here and ready to help. What’s on your mind today? Whether you need to look up information, plan a trip, or get things done, just let me know!

That's only because Google is an LLM now.


One of the dumbest things supposedly clever people keep bringing up.

Is it worth getting worse results for that reason? From the article:

"Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts. These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation."

I am not polite to LLMs because I do not want to anthropomorphise them.


I guess it's about habit. In the end you are communicating. If I get into the habit of being rude while communicating with a machine, I would be afraid of this habit spilling over to my communication with other humans.

I don't feel like trying to get information from an LLM is a different kind of "communication" from, say, writing code. And I don't use INTERCAL, so.

What about the risk that talking to a machine as though it's human leads to thinking of it as human? That leads down a lot of dangerous paths.

> Is it worth getting worse results for that reason?

> accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts

I can live with that, for now at least.


I also remember reading, a long time ago, someone who wrote that they wanted to be polite to an LLM because, after they asked it whether politeness improves the accuracy of responses, they got an answer that led them to conclude that politeness could probably help. That seems a bit odd to me: I have heard a lot about people using LLMs' responses about themselves to learn about LLMs, but that seems like a suspicious approach.

Me too! You've said exactly what I was about to say. Anyone else feel that way?

There's also awareness of the basilisk...

Isn't caching a server-side thing? How does the agent affect it, significantly at least?

Say you put the current time, down to the second, in the system prompt, which is the message that goes in front of the entire conversation. Then basically nothing will be cached, and every agent turn needs to ingest the entire session over and over. Contrast that with not doing so: the backend can leverage caching all the way up to the latest message, as nothing before it has changed.
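A rough sketch of why that happens, assuming the backend can only reuse the longest prefix that is identical to the previous request. Everything here is a toy: whitespace splitting stands in for a real tokenizer, the function names are hypothetical, and real caches typically work on token blocks with expiry rather than exact per-token matching.

```python
def shared_prefix_len(prev, curr):
    """Number of leading tokens that are identical in both requests."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n

def build_request(system_prompt, messages):
    """Flatten system prompt + history into one token list (toy tokenizer)."""
    tokens = system_prompt.split()
    for message in messages:
        tokens.extend(message.split())
    return tokens

def cached_fraction(requests):
    """Share of tokens, over all turns after the first, reusable from the previous turn."""
    reused = total = 0
    for prev, curr in zip(requests, requests[1:]):
        reused += shared_prefix_len(prev, curr)
        total += len(curr)
    return reused / total

def simulate(turns, with_timestamp):
    """Replay a session, optionally putting a per-second timestamp in the system prompt."""
    history, requests = [], []
    for i, message in enumerate(turns):
        history.append(message)
        system = "You are a coding agent."
        if with_timestamp:
            # The timestamp changes every turn, so the prefix diverges almost immediately.
            system = f"Current time: 12:00:{i:02d}. " + system
        requests.append(build_request(system, history))
    return cached_fraction(requests)

turns = ["list the files", "open main.py", "run the tests"]
stable = simulate(turns, with_timestamp=False)
volatile = simulate(turns, with_timestamp=True)
print(f"stable system prompt: {stable:.0%} of tokens reusable")
print(f"timestamp in system prompt: {volatile:.0%} of tokens reusable")
```

With the stable system prompt, each request is the previous one plus the new message, so everything except the newest message can be reused. With the timestamp up front, the shared prefix ends at the first changed token.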

Surely other agent CLIs are not dumb enough to invalidate cache on every turn over something so obvious?

I don't think any of the agents break caching on every turn, but they might include things like the current list of files, or the available tools depending upon plan/build mode... or lots of other things that break caching multiple times during a session.

Probably not that exactly, but there is a tradeoff between effectiveness of the prompt and cache hit rate. If putting the user’s datetime in the middle of the prompt scores higher on evals but worsens cache hits, versus at the end of the prompt where it’s cache friendly but may not be as effective, what do you do?

This is still art as much as science and the different harnesses take different approaches.


Obviously not, most agents properly keep previous messages unchanged, at least the major ones whose source I've been digging into. Also, everything would get so much slower that even developers creating their own agents would quickly notice how much slower theirs is if they fuck this up.

That's not necessarily true, you can have multiple cache points, see e.g. https://platform.claude.com/docs/en/build-with-claude/prompt...

Yes, of course you can destroy it. But how far can you "improve" it, beyond decent "common sense" behaviour?

I'm wondering whether ReactOS can exploit Claude et al. to their fullest and "recreate" Windows 2000/95. I may donate some tokens for that cause.

That sounds like a terrifying legal minefield that they would not want to tread

Is it not safe to assume Windows source code is not present in the LLM training data?


There are repos on GitHub. Which, technically, means you can download Windows source from Microsoft. Just not legally.

Slap a fair use on it and call it a day.

> Anthropic offers a formal copyright indemnification policy for its enterprise customers using the Claude API. The policy protects businesses from copyright infringement claims arising from authorized use of Claude or its generated outputs

So just claim it is Claude


What's that phrase, "derivative work" or something?

But surely anything the LLM outputs is clear of licensing requirements /s

Or would Microsoft like to argue otherwise in court?


I've used Claude to fix/reconstruct & build leaked Win2k3 on Linux with the original toolchain via Wine. This approach included a full reconstruction of the GDI sources. I just don't know what to do with this, it's kinda difficult to "wash" on this scale.

Run it as your daily driver and trust your data to it. /s

A weird feeling tells me that this "keeping only in name" was done because someone at Google was cross with killedbygoogle.com.

Is it not the same thing as/very similar to falsifiability in philosophy of science?

Yes, I think it is, now that you mention it. But maybe "non-decidable" could be dropped into a casual conversation more easily?

But if I say that I think an issue is non-decidable, my interlocutor is likely to think that I simply can't make up my mind!

