Hacker News

The KV cache is still quite small compared to the weights. It can stay in memory for reasonable context lengths, or be streamed to storage as a last resort. Streaming it doesn't hurt performance much, since throughput was already limited by having to stream in the much larger weights.
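
For a rough sense of scale, here is a back-of-envelope sketch of the size comparison. The model shape (80 layers, 8 KV heads of dimension 128, 70B parameters), the 8192-token context, and 16-bit storage are all hypothetical numbers picked for illustration, not figures from this thread, and the function names are made up for the example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Approximate KV-cache size for a decoder-only transformer.

    The leading 2 counts one K and one V tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


def weight_bytes(n_params, bytes_per_param=2):
    """Approximate size of the weights at a given precision."""
    return n_params * bytes_per_param


# Hypothetical model shape and context length (assumptions, not measurements).
kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=8192)
w = weight_bytes(70_000_000_000)

print(f"KV cache: {kv / 2**30:.2f} GiB")
print(f"weights:  {w / 2**30:.2f} GiB")
print(f"ratio:    {kv / w:.2%}")
```

Under these assumptions the cache comes out at a few percent of the weights. It grows linearly with context length and batch size, so very long contexts or models without grouped-query attention shift the balance.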