om8 · 2026-04-30T01:36:20 1777512980

cargo/uv/go have lock files though

dnnddidiej · 2026-04-30T04:04:10 1777521850

with curl | sh you could use a checksum you download with curl!
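For contrast with the joke above, a minimal sketch of the check that does help: compare the downloaded script against a SHA-256 digest pinned out of band (for example, committed to your repo), not one fetched from the same server as the script. The function names `sha256_hex` and `verify_script` are made up for illustration.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def verify_script(script: bytes, pinned_sha256: str) -> bool:
    """Check downloaded script bytes against a digest pinned in advance.

    The pinned digest must come from a channel the download server does
    not control; otherwise the check only detects transfer corruption.
    """
    expected = pinned_sha256.strip().lower()
    # compare_digest avoids leaking the mismatch position via timing.
    return hmac.compare_digest(sha256_hex(script), expected)
```

Only run the script if `verify_script` returns `True`. This is the same idea a lock file applies to dependencies: the hash is recorded once and checked on every later fetch.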

om8 · 2026-04-27T10:32:44 1777285964

https://docs.vllm.ai/en/v0.20.0/api/vllm/model_executor/laye...

`vllm.model_executor.layers.quantization.turboquant`

> The technique implemented here consists of the scalar case of the HIGGS quantization method (Malinovskii et al., "Pushing the Limits of Large Language Model Quantization via the Linearity Theorem", NAACL 2025; preprint arXiv:2411.17525): rotation + optimized grid + optional re-normalization, applied to KV cache compression. A first application of this approach to KV-cache compression is in "Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models" (Shutova et al., ICML 2025; preprint arXiv:2501.19392). Both these references pre-date the TurboQuant paper (Zandieh et al., ICLR 2026).

amitport · 2026-04-27T12:46:59 1777294019

Those works did cite DRIVE/EDEN :)

HIGGS is an extension of EDEN (using the well-known method for blockwise Lloyd-Max).

The proper framing of this "TurboQuant" layer in vLLM (which does not include QJL) is precisely EDEN (2022) without the scale correction.
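For readers following along, here is a rough NumPy sketch of the shared primitive being discussed: random rotation, a Lloyd-Max scalar grid fitted to a standard normal, and an optional rescale. It is an illustration under assumptions, not the vLLM, HIGGS, EDEN, or TurboQuant code. All function names are made up, a dense QR-based rotation stands in for whatever fast transform a real implementation uses, and the `rescale` branch is a simple norm-matching step rather than the exact correction defined in any of the papers.

```python
import numpy as np


def random_rotation(d, rng):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flip column signs so the result is unbiased


def lloyd_max_grid(samples, levels, iters=50):
    """1-D Lloyd-Max (k-means) grid fitted to the given samples."""
    grid = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        idx = np.abs(samples[:, None] - grid[None, :]).argmin(axis=1)
        for k in range(levels):
            if np.any(idx == k):
                grid[k] = samples[idx == k].mean()
    return np.sort(grid)


def quantize(x, rot, grid, rescale=False):
    """Rotate, normalize, and snap each coordinate to the nearest grid point.

    Assumes x is nonzero. Returns the grid indices and one float scale.
    """
    norm = np.linalg.norm(x)
    # After rotation and normalization, coordinates are roughly N(0, 1).
    y = (rot @ x) * (np.sqrt(x.size) / norm)
    idx = np.abs(y[:, None] - grid[None, :]).argmin(axis=1)
    scale = norm / np.sqrt(x.size)
    if rescale:
        # Assumed variant: make the reconstruction's norm match the input's.
        scale = norm / np.linalg.norm(grid[idx])
    return idx, scale


def dequantize(idx, scale, rot, grid):
    """Look up grid values, undo the scaling, and rotate back."""
    return rot.T @ (grid[idx] * scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64
    rot = random_rotation(d, rng)
    grid = lloyd_max_grid(rng.standard_normal(20000), levels=16)  # 4 bits
    x = rng.standard_normal(d)
    idx, scale = quantize(x, rot, grid)
    x_hat = dequantize(idx, scale, rot, grid)
    print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```

The point of the rotation is that one fixed grid works for every input vector, since the rotated coordinates look Gaussian regardless of how the original values were distributed.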

kumarhn · 2026-04-27T14:38:25 1777300705

EDEN is clearly relevant prior work for HIGGS. But reducing HIGGS to “an extension of EDEN” seems unfair to the authors of HIGGS. Similar primitive, different problem setting, different constraints, different contribution.

Curious: where do you draw the line between “related prior work” and “an extension of EDEN”?

amitport · 2026-04-27T15:22:20 1777303340

In the vLLM documentation quoted above, TurboQuant (which is a restricted version of EDEN) is referred to as a specific case of HIGGS. Note the symmetry: EDEN acts as a special case of HIGGS; hence, HIGGS functions as a generalization of EDEN.

In any case, the quantizer is indeed an extension, regardless of whether it was explicitly framed that way in the paper. I say this not to diminish their contribution at all, but just to clarify the relationship, as it was also stated in the vLLM doc.

om8 · 2026-03-25T09:58:53 1774432733

These are very different media types with very different goals.

om8 · 2026-02-05T18:23:41 1770315821

Is there a way to disable it? Sometimes I value the agent not knowing that it needs to cut corners.

nerdsniper · 2026-02-05T19:19:31 1770319171

90-98% of the time I want the LLM to only have the knowledge I gave it in the prompt. I'm actually kind of scared that I'll wake up one day and the web interface for ChatGPT/Opus/Gemini will pull information from my prior chats.

pdntspa · 2026-02-05T21:27:58 1770326878

They already do this

I've had Claude reference prior conversations when I'm trying to get technical help on thing A, and it will ask me if this conversation is because of thing B that we talked about in the immediate past.

sanxiyn · 2026-02-06T00:53:52 1770339232

You can disable this at Settings > Capabilities > Memory > Search and reference chats.

hypercube33 · 2026-02-05T19:48:30 1770320910

I'm fairly sure OpenAI/GPT does pull prior information in the form of its memories

nerdsniper · 2026-02-05T19:50:39 1770321039

Ah, that could explain why I've found myself using it the least.

vineyardmike · 2026-02-05T21:18:12 1770326292

All of these providers support this feature. I don’t know about ChatGPT but the rest are opt-in. I imagine with Gemini it’ll be default on soon enough, since it’s consumer focused. Claude does constantly nag me to enable it though.

sumtechguy · 2026-02-06T19:52:34 1770407554

Had ChatGPT reference 3 prior chats a few days ago. So if you are looking for a total reset of context you probably would need to do a small bit of work.

sharifhsn · 2026-02-05T19:53:04 1770321184

Gemini has this feature but it’s opt-in.

kzahel · 2026-02-05T20:17:57 1770322677

Claude told me he can disable it by putting instructions in the MEMORY.md file to not use it. So only a soft disable AFAIK and you'd need to do it on each machine.

jsw97 · 2026-02-06T08:48:58 1770367738

I ran into this yesterday and disabled it by changing permissions on the project’s memory directory. Claude was unable to advise me on how to disable it. You could probably write a global hook for this. Gross though.