So everyone is supposed to do all of their testing on H100s?


The 4090 (2022) on PCIe 4.0 x16 is quite decent; the major limit is memory capacity, not bandwidth. The 3090 (2020) is also PCIe 4.0 x16, and used cards are a bargain. You can pair two 3090s with NVLink (the 4090 dropped NVLink support).
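For a ballpark, you can measure the link yourself. A minimal sketch, assuming PyTorch with CUDA; pinned host memory matters, since pageable copies run noticeably slower. PCIe 4.0 x16 is ~32 GB/s theoretical, ~25 GB/s in practice:

    # Rough host-to-device bandwidth check (untested sketch; PyTorch + CUDA assumed).
    import torch

    n_bytes = 1 << 30  # 1 GiB payload
    src = torch.empty(n_bytes, dtype=torch.uint8, pin_memory=True)
    dst = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    dst.copy_(src, non_blocking=True)  # warm-up transfer
    torch.cuda.synchronize()

    start.record()
    for _ in range(10):
        dst.copy_(src, non_blocking=True)
    end.record()
    torch.cuda.synchronize()

    seconds = start.elapsed_time(end) / 1000  # elapsed_time reports milliseconds
    print(f"{10 * n_bytes / seconds / 1e9:.1f} GB/s host -> device")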

Nvidia is holding back new releases, but the current hardware has more legs thanks to new kernel implementations: FlashAttention, for instance, delivers a significant improvement every six months or so.
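To illustrate, a minimal sketch assuming PyTorch 2.x: the built-in scaled_dot_product_attention op dispatches to a fused FlashAttention-style kernel when the GPU supports one, so the same old card gets faster as the kernels improve, with no model changes:

    # SDPA picks a fused attention kernel where available (sketch; PyTorch 2.x assumed).
    import torch
    import torch.nn.functional as F

    # (batch, heads, seq_len, head_dim); fp16 on CUDA enables the fused paths
    q = torch.randn(8, 16, 4096, 64, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)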

Nvidia could make consumer chips with a combined CPU-GPU. I guess they are too busy making money in the big cloud. Maybe somebody else will pick up the idea; Apple is already doing something like it, even on laptops.


Get a GH100 on Lambda and behold: 900 GB/s between CPU memory and GPU. Forget PCIe.


That doesn't add up, because you'd then be exceeding the bandwidth of the CPU's own memory controller. That is, it would be faster to do everything, including the CPU-only algorithms, out of the far-away VRAM.


Latency might be higher, though, and latency usually dominates CPU-side algorithms.


Where are you seeing 900 GB/s?


900 GB/s is the figure often cited for Hopper-based SXM boards; it's the aggregate of 18 NVLink connections. So it's more of a many-to-many, GPU-to-GPU bandwidth figure (see the back-of-the-envelope below the link).

https://www.datacenterknowledge.com/data-center-hardware/nvi...
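Back-of-the-envelope, assuming the commonly cited NVLink 4 per-link numbers (18 links, 25 GB/s per direction):

    # How the 900 GB/s headline number decomposes (sketch; NVLink 4 figures assumed)
    links = 18
    per_direction = 25                # GB/s per link, each direction
    print(links * per_direction * 2)  # 900 GB/s aggregate, counting both directions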


That's bidirectional bandwidth, and it's the GH200, not GH100.



