
Likely not.

We’re seeing a massive slowdown in the value gained from all that additional training. Folks don’t like to talk about that, but absent a completely new breakthrough, the current math of LLMs has largely run its course.

We simply don’t need massive training runs forever. We’re getting to the point where “good enough” models will solve most use cases. And the demonstrated business value is still broadly missing at the level required to keep funding all this training for much longer.

I dunno, I thought that for a while too, but there are a lot of new ideas in architecture that may warrant massive training runs. Mamba and state space models are pretty interesting, but they haven’t had their transformer moment yet because I haven’t really seen anyone go for broke training one on a huge dataset at a huge model size. Some of the more fundamental changes, like Kolmogorov–Arnold Networks or the ideas behind continuous backpropagation, haven’t really had the opportunity to be pushed to the limit either. I think it’s still early days on what these models can do. And I say this as someone who bought a Mac M3 Max with 128 GB of RAM on the hope that on-device training and inference work would eventually move locally. It’s encouraging to see the progress, and I hope it does.

> but there are a lot of new ideas in terms of architecture that may warrant massive training runs

I don't think the argument is that that isn't true; it's that the gains from those massive training runs are diminishing. Eventually it won't be worth doing a full run for each new idea; you'll have to bundle several together to get any noticeable change.
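To make the diminishing-returns point concrete, here's a quick sketch using a Chinchilla-style scaling law, L(N, D) = E + A/N^α + B/D^β. The constants below roughly match the published Chinchilla fit, but treat the whole thing as illustrative, not a prediction for any particular model:

```python
# Chinchilla-style scaling law: loss as a function of params (N) and tokens (D).
# Constants approximate the Hoffmann et al. fit; illustrative only.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# Double params and data repeatedly; each doubling buys less than the last.
n, d = 1e9, 2e10
prev = loss(n, d)
for _ in range(5):
    n, d = 2 * n, 2 * d
    cur = loss(n, d)
    print(f"{n:.0e} params: loss {cur:.4f} (improvement {prev - cur:.4f})")
    prev = cur
```

Each doubling shrinks the power-law terms by a fixed factor (2^-0.34 and 2^-0.28), so the absolute loss improvement decays geometrically toward the irreducible term E.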


Same here. Then you see SOTA in a browser from Ex0byt, online 10x training (JIT-Lora), TurboQuant (Google), etc. Just saw KV prediction mentioned in this thread, so looking into that too.

I'm adapting all of this to Rust+WGPU with compute shaders if you want to follow along.

See this repo: https://github.com/tmzt/shady-thinker

Goal is Qwen3.5 27b on a Pixel 10 Pro running GrapheneOS.



