
Everything DeepMind published at this year's ICML would be a good start.

Transformers (or rather the QKV attention mechanism) have taken over ML research at this point; they just scale and work in places they really shouldn't. E.g. you'd think convnets would make more sense for vision because of their translation invariance, but ViT works better even without this inductive bias.
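For concreteness, here's a minimal sketch of the scaled dot-product (QKV) attention these models are built on. The shapes, names, and toy projection matrices are purely illustrative, not taken from any particular implementation:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(q, k, v):
        """q, k, v: (seq_len, d) arrays; returns (seq_len, d)."""
        d = q.shape[-1]
        scores = q @ k.T / np.sqrt(d)       # similarity of every query to every key
        weights = softmax(scores, axis=-1)  # each token attends to all tokens
        return weights @ v                  # weighted sum of values

    # Toy usage: 4 tokens (e.g. ViT image patches) with 8-dim embeddings.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = attention(x @ Wq, x @ Wk, x @ Wv)
    print(out.shape)   # (4, 8)

Note that nothing in this computation knows the tokens came from a 2D image grid; that's the inductive bias ViT gives up and has to learn from data instead.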

Even in things like diffusion models, attention layers are crucial to making the model work.


