
The software has real software engineers working on it instead of researchers.

Remember when people were arguing about whether to use mmap? What a ridiculous argument.

At some point someone will figure out how to tile the weights and the memory requirements will drop again.




The real improvement will be when the software engineers get into the training loop. Then we can have MoE models that use cache-friendly expert utilisation and maybe even learned prefetching for what the next experts will be.

> maybe even learned prefetching for what the next experts will be

Experts are selected per layer, and the individual layer reads are quite small, so this is not really feasible. There's just not enough information to guide a prefetch.


It's feasible to put the expert routing logic in a previous layer. People have done it: https://arxiv.org/abs/2507.20984
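To make the idea concrete, here is a minimal sketch of routing one layer ahead. It is not taken from the linked paper; `plan_prefetch`, the toy router weights, and the hidden states are all hypothetical names and values chosen for illustration. The only point it shows is that if layer l's experts are scored from the input to layer l-1, the expert list is known one layer early, which is the window a prefetch would need.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def top_k(scores: Vector, k: int) -> List[int]:
    """Indices of the k largest scores, highest first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product: one router score per expert (row of w)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def plan_prefetch(hidden_states: List[Vector],
                  router_weights: List[Matrix],
                  k: int) -> List[Tuple[int, List[int]]]:
    """Choose experts for layer l from the input to layer l-1 (assumed scheme).

    hidden_states[l] is the input to layer l, and router_weights[l] scores
    the experts of layer l. Conventional routing would score layer l from
    hidden_states[l]; here the decision is made one layer earlier, so the
    load for layer l could be issued while layer l-1 is still computing.
    Layer 0 has no earlier layer and is skipped.
    """
    plan = []
    for layer in range(1, len(router_weights)):
        scores = matvec(router_weights[layer], hidden_states[layer - 1])
        plan.append((layer, top_k(scores, k)))
    return plan


# Toy example: 2 layers, 3 experts per layer, hidden size 2.
hidden = [[1.0, 0.0], [0.0, 1.0]]
routers = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # layer 0 router (unused by the plan)
    [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]],  # layer 1 router
]
print(plan_prefetch(hidden, routers, k=2))  # [(1, [1, 2])]
```

Whether routing from the earlier hidden state costs model quality is exactly what the replies below argue about; the sketch only shows where the scheduling slack would come from.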

Manually, no. It would have to be learned, and the unpredictability of the expert selection would need to be a training metric to minimize.
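As a rough sketch of what such a metric could look like: one possible proxy (an assumption, not something proposed in this thread) is how much the routing distribution changes from one token to the next. `routing_churn` below is a hypothetical name; in a real training run the same quantity would be computed on the router's logits in an autodiff framework and added to the loss with a small weight.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_churn(logits_per_token: List[List[float]]) -> float:
    """Mean squared change in routing probabilities between consecutive tokens.

    0.0 means every token routes identically (fully predictable from the
    previous token); larger values mean the expert mix changes more often.
    This is an illustrative proxy only, under the assumption that
    "predictable" means "similar to the previous token's routing".
    """
    probs = [softmax(logits) for logits in logits_per_token]
    if len(probs) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(probs, probs[1:]):
        total += sum((a - b) ** 2 for a, b in zip(prev, cur))
    return total / (len(probs) - 1)


stable = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
jumpy = [[5.0, 0.0], [0.0, 5.0]]
print(routing_churn(stable))                         # 0.0
print(routing_churn(jumpy) > routing_churn(stable))  # True
```

Driving this term all the way to zero would make every token pick the same experts, so any weight on it trades routing quality for predictability.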

Making the expert selection more predictable also means making it less effective. There's no real free lunch.


