deepseek-1b, qwen2.5-coder:1.5b, and starcoder2-3b are all pretty fast on cpu du...

ghthor · on Dec 19, 2024

StarCoder 3B works great on my second-hand RTX 2080. I can't run 7B, just a hair too little VRAM, but the completions are still great.

thot_experiment · on Dec 19, 2024

You should definitely be able to run 7B at Q6_K, and that might be outperformed by 15B with a sub-4 bpw imatrix quant; IQ3_M should fit into your VRAM. (I personally wouldn't bother with sub-4 bpw quants on models under ~70B parameters.)
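To make the sizing concrete, here is a rough weight-only estimate in Python. It assumes nominal parameter counts (7e9 and 15e9), approximate llama.cpp bits-per-weight figures (about 6.56 bpw for Q6_K and about 3.66 bpw for the IQ3_M mix), and the 8 GB of VRAM on an RTX 2080. The 1 GiB headroom for KV cache and runtime overhead is a hypothetical allowance, not a measured value.

```python
# Rough weight-only memory estimate for quantized models.
# Assumptions: nominal parameter counts, approximate llama.cpp bits-per-weight
# figures, an 8 GiB card, and a hypothetical 1 GiB headroom for KV cache and
# runtime overhead (real usage depends on context length and backend).

GIB = 1024 ** 3

# Approximate bits per weight for each format (assumed values).
BPW = {
    "FP16": 16.0,
    "Q6_K": 6.5625,
    "IQ3_M": 3.66,
}


def weight_gib(n_params: float, quant: str) -> float:
    """Return the approximate size of the model weights in GiB."""
    return n_params * BPW[quant] / 8 / GIB


def fits(n_params: float, quant: str, vram_gib: float = 8.0, headroom_gib: float = 1.0) -> bool:
    """True if the weights fit while leaving `headroom_gib` free for KV cache etc."""
    return weight_gib(n_params, quant) + headroom_gib <= vram_gib


if __name__ == "__main__":
    for label, n, q in [("7B", 7e9, "Q6_K"), ("15B", 15e9, "IQ3_M"), ("7B", 7e9, "FP16")]:
        size = weight_gib(n, q)
        print(f"{label} @ {q}: {size:.2f} GiB of weights, fits in 8 GiB: {fits(n, q)}")
```

With these numbers, 7B at Q6_K comes to about 5.35 GiB and 15B at IQ3_M about 6.39 GiB of weights, while 7B at FP16 needs about 13.04 GiB.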

That said, if it all works great for you, there's no reason to mess with it. But if you want to tinker, you can absolutely run larger models at smaller quant sizes: Q6_K is basically indistinguishable from FP16, so there's no real downside.
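The "larger models at smaller quants" trade-off can be sketched by inverting the size arithmetic: for a fixed VRAM budget, fewer bits per weight raise the largest parameter count whose weights fit. This Python sketch assumes approximate llama.cpp bits-per-weight figures, an 8 GiB card, and a hypothetical 1 GiB reserved for KV cache and runtime; real memory use varies with context length and backend.

```python
# Largest model (by parameter count) whose weights fit in a VRAM budget.
# Assumptions: approximate bits-per-weight values and a hypothetical
# 1 GiB headroom; KV cache size is not modelled explicitly.

GIB = 1024 ** 3

# Approximate bits per weight for each format (assumed values).
BPW = {"FP16": 16.0, "Q6_K": 6.5625, "IQ3_M": 3.66}


def max_params_billions(vram_gib: float, quant: str, headroom_gib: float = 1.0) -> float:
    """Return the largest parameter count (in billions) whose weights fit."""
    budget_bytes = (vram_gib - headroom_gib) * GIB
    return budget_bytes * 8 / BPW[quant] / 1e9


if __name__ == "__main__":
    for q in ("FP16", "Q6_K", "IQ3_M"):
        print(f"{q}: up to ~{max_params_billions(8.0, q):.1f}B parameters")
```

Under these assumptions it prints roughly 3.8B for FP16, 9.2B for Q6_K and 16.4B for IQ3_M, which lines up with 7B at Q6_K or 15B at IQ3_M fitting on an 8 GB card while 7B at FP16 does not.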