
Thanks for the information. I know Google had custom TPUs made a long time ago, and that the concept has existed for a LONG TIME. I assumed that a technical hurdle (i.e. VRAM) was finally behind us, allowing this theoretical speedup (1 token/sec on a CPU vs 100 tokens/sec on a GPU) to become practical.

Thanks for the links too!


