As for bandwidth: matrix multiplications happen mostly in cache, which has a lot more bandwidth than RAM. Blocks of the matrix are loaded into cache (explicitly, via shared memory, in CUDA) and reused multiple times there.
I'd exploit the better multi-level cache hierarchy on CPUs and make the code NUMA-aware. But I still wouldn't bet against a recent GPU card.
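Roughly what I mean by "explicitly": a minimal sketch of a shared-memory tiled kernel (tile size, names and the square-matrix assumption are just for illustration, not a tuned implementation):

    // Sketch of a tiled matrix multiply C = A * B for square N x N matrices,
    // assuming N is a multiple of TILE. Each tile of A and B is loaded from
    // global memory once and then reused TILE times out of shared memory.
    #define TILE 16

    __global__ void matmul_tiled(const float* A, const float* B, float* C, int N) {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < N / TILE; ++t) {
            // The explicit "load block into cache": one global read per element...
            As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
            __syncthreads();

            // ...followed by TILE multiply-adds served from shared memory (the reuse).
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        C[row * N + col] = acc;
    }

Each global load is amortized over TILE flops, which is why the kernel stops being limited by DRAM bandwidth once the tiles fit in shared memory.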
> As for bandwidth: matrix multiplications happen mostly in cache, which has a lot more bandwidth than RAM. Blocks of the matrix are loaded into cache (explicitly, via shared memory, in CUDA) and reused multiple times there.
The post is about dot products, not matrix multiplies. A dot product has no data reuse: every element is read exactly once, so it's limited by memory bandwidth no matter how you block it.
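For contrast, a minimal dot-product kernel (grid-stride loop with an atomic accumulation, purely for illustration) makes the lack of reuse visible:

    // Sketch: each element of x and y is fetched from global memory exactly once,
    // so there is nothing worth staging in shared memory and reusing.
    // *result must be zero-initialized before launch.
    __global__ void dot(const float* x, const float* y, float* result, int n) {
        float acc = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            acc += x[i] * y[i];     // one load of x[i], one of y[i], never touched again
        atomicAdd(result, acc);     // crude reduction, fine for the point being made
    }

One multiply-add per two loads, so the arithmetic units mostly sit idle waiting on DRAM; blocking or caching can't change that ratio.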