There’s honestly so much interesting stuff here, especially the LLM-related work: large concept models (operating on and predicting concepts rather than tokens), dynamic byte latent transformers (a byte-level alternative to standard tokenization), and sparse memory layers (scaling key-value memory layers without increasing compute requirements).
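For a rough intuition of the sparse memory layer idea: only a handful of the best-matching memory slots are activated per query, so compute stays roughly constant even as the value table grows. Here's a minimal NumPy sketch of that kind of top-k key-value lookup; this is my reading of the general technique, not Meta's actual implementation (sizes and the scoring scheme are toy assumptions).

```python
# Toy sketch of a sparse key-value memory lookup: score the query against all
# keys, keep only the top-k, and take a softmax-weighted sum of just those
# k value rows. Real systems use product keys to avoid even the full scoring.
import numpy as np

rng = np.random.default_rng(0)
num_keys, d = 1024, 16   # toy sizes; real memory layers use millions of slots
k = 4                    # number of memory slots activated per query

keys = rng.standard_normal((num_keys, d))
values = rng.standard_normal((num_keys, d))

def memory_lookup(query):
    scores = keys @ query                   # similarity to every key
    top = np.argpartition(scores, -k)[-k:]  # indices of the k best keys
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                            # softmax over the k winners only
    return w @ values[top]                  # weighted sum of just k value rows

out = memory_lookup(rng.standard_normal(d))
print(out.shape)  # (16,)
```

Only `k` rows of `values` are ever read per query, which is why the parameter count can grow without a matching growth in FLOPs.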
Here they are presented as separate things, each of which apparently improves quality / efficiency. I wonder what the quality / efficiency increase is of all those methods put together? Maybe that’s what Llama 4 will be?
This looks like a lot of innovation is happening at Meta in those areas, really cool!
I hope that Llama 4 or 5 will have a different architecture. All the released Llamas are more or less the same at inference time, just with better training pipelines. The downside is that llama.cpp will probably not be able to run the new models, and if the required rewrite is too big, we'll need new C, C++, Go, and Rust programs.
I'd put a table of contents-like page up front with some exciting short description of each section and use hyperlinks, allowing the user to navigate to the section and back