Hacker News | om8's comments

cargo/uv/go have lock files though


with curl | sh you could use a checksum you download with curl!


https://docs.vllm.ai/en/v0.20.0/api/vllm/model_executor/laye...

`vllm.model_executor.layers.quantization.turboquant`

> The technique implemented here consists of the scalar case of the HIGGS quantization method (Malinovskii et al., "Pushing the Limits of Large Language Model Quantization via the Linearity Theorem", NAACL 2025; preprint arXiv:2411.17525): rotation + optimized grid + optional re-normalization, applied to KV cache compression. A first application of this approach to KV-cache compression is in "Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models" (Shutova et al., ICML 2025; preprint arXiv:2501.19392). Both these references pre-date the TurboQuant paper (Zandieh et al., ICLR 2026).


Those works did cite DRIVE/EDEN :)

HIGGS is an extension of EDEN (using the well known method for blockwise Lloyd-Max).

The proper framing of this "TurboQuant" layer in vLLM (which does not include QJL) is precisely EDEN '22 without the scale correction.
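
For anyone following along, here is a rough sketch of the "rotation + grid + optional scale" pipeline being discussed. It is not the vLLM code or any paper's exact algorithm: the function names are made up for illustration, a randomized Hadamard transform stands in for the random rotation, the grid is the standard 2-bit Lloyd-Max grid for a unit Gaussian, and the scale formula (||z||^2 / <z, q>) is my reading of the EDEN-style correction, so treat it as an assumption.

```python
import math
import random

# Lloyd-Max reconstruction levels for a unit Gaussian at 2 bits (4 levels).
GRID = [-1.5104, -0.4528, 0.4528, 1.5104]


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [x * s for x in v]


def rotate(x, signs):
    """Randomized Hadamard rotation: H * diag(signs) * x."""
    return fwht([xi * si for xi, si in zip(x, signs)])


def unrotate(y, signs):
    """Inverse rotation: diag(signs) * H * y (H is orthonormal and symmetric)."""
    return [yi * si for yi, si in zip(fwht(y), signs)]


def quantize(x, signs, scale_correction=True):
    """Return (codes, scale) for a nonzero vector x (hypothetical helper)."""
    n = len(x)
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm == 0.0:
        raise ValueError("x must be nonzero")
    # After rotation and normalization the coordinates look roughly N(0, 1).
    z = [ri * math.sqrt(n) / norm for ri in rotate(x, signs)]
    # Scalar quantization: nearest grid point per coordinate.
    codes = [min(range(len(GRID)), key=lambda k: abs(GRID[k] - zi)) for zi in z]
    scale = norm / math.sqrt(n)
    if scale_correction:
        # Assumed EDEN-style correction: multiply by ||z||^2 / <z, q>, with ||z||^2 = n.
        dot = sum(zi * GRID[c] for zi, c in zip(z, codes))
        scale *= n / dot
    return codes, scale


def dequantize(codes, scale, signs):
    """Map codes back to grid values, rescale, and undo the rotation."""
    return unrotate([scale * GRID[c] for c in codes], signs)
```

To try it, draw `signs` as random +/-1 values of power-of-two length, call `quantize(x, signs)`, then `dequantize(codes, scale, signs)`. With `scale_correction=True` the reconstruction satisfies <x_hat, x> = ||x||^2 by construction; with `scale_correction=False` you get the plain rotate-and-round variant.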


EDEN is clearly relevant prior work for HIGGS. But reducing HIGGS to “an extension of EDEN” seems unfair to the authors of HIGGS. Similar primitive, different problem setting, different constraints, different contribution.

Curious: where do you draw the line between “related prior work” and “an extension of EDEN”?


In the vLLM documentation quoted above, TurboQuant (which is a restricted version of EDEN) is referred to as a specific case of HIGGS. Note the symmetry: EDEN acts as a special case of HIGGS; hence, HIGGS functions as a generalization of EDEN.

In any case, the quantizer is indeed an extension, regardless of whether it was explicitly framed that way in the paper. I say this not to diminish their contribution at all, but just to clarify the relationship, as it was also stated in the vLLM doc.


These are very different media types with very different goals.


Is there a way to disable it? Sometimes I value the agent not knowing that it needs to cut corners.


90-98% of the time I want the LLM to only have the knowledge I gave it in the prompt. I'm actually kind of scared that I'll wake up one day and the web interface for ChatGPT/Opus/Gemini will pull information from my prior chats.


They already do this

I've had Claude reference prior conversations when I'm trying to get technical help on thing A, and it will ask me if this conversation is because of thing B that we talked about in the immediate past.


You can disable this at Settings > Capabilities > Memory > Search and reference chats.


I'm fairly sure OpenAI/GPT does pull prior information in the form of its memories


Ah, that could explain why I've found myself using it the least.


All of these providers support this feature. I don’t know about ChatGPT but the rest are opt-in. I imagine with Gemini it’ll be default on soon enough, since it’s consumer focused. Claude does constantly nag me to enable it though.


Had ChatGPT reference 3 prior chats a few days ago. So if you are looking for a total reset of context you would probably need to do a small bit of work.


Gemini has this feature but it’s opt-in.


Claude told me he can disable it by putting instructions in the MEMORY.md file to not use it. So only a soft disable AFAIK and you'd need to do it on each machine.


I ran into this yesterday and disabled it by changing permissions on the project’s memory directory. Claude was unable to advise me on how to disable it. You could probably write a global hook for this. Gross though.



... Yet.


FYI you can run just `uvx pdm`


Interesting! I honestly didn't even know that this shorthand existed


Also got it, found this thread by googling "ycombiinator"


I have a similar project. It's also written in Rust and runs in the browser using WebAssembly.

In-browser demo: https://galqiwi.github.io/aqlm-rs

Source code: https://github.com/galqiwi/demo-aqlm-rs

