The 4090 (2022) has PCIe 4.0 x16, which is quite decent. The major limit is VRAM capacity, not PCIe bandwidth. The 3090 (2020) is also PCIe 4.0 x16, and used cards are a bargain. Unlike the 4090, you can link them with NVLink.
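To put "quite decent" in numbers, here is a back-of-the-envelope sketch. It assumes PCIe 4.0 signals at 16 GT/s per lane with 128b/130b encoding and ignores packet and protocol overhead, so real transfers will come in somewhat lower. The function name `pcie_gb_per_s` is just for illustration.

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int) -> float:
    """Per-direction PCIe throughput in GB/s, accounting only for 128b/130b encoding."""
    encoding_efficiency = 128 / 130
    return gt_per_s * encoding_efficiency * lanes / 8  # 8 bits per byte

# PCIe 4.0: 16 GT/s per lane, x16 link.
pcie4_x16 = pcie_gb_per_s(16.0, 16)

# Both the 3090 and the 4090 have 24 GB of VRAM.
vram_gb = 24
seconds_to_fill = vram_gb / pcie4_x16

print(f"PCIe 4.0 x16: ~{pcie4_x16:.1f} GB/s per direction")
print(f"Filling {vram_gb} GB of VRAM: ~{seconds_to_fill:.2f} s")
```

Under these assumptions, the whole card can be filled in well under a second. That is why the 24 GB ceiling, rather than the link, tends to be what you hit first.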
Nvidia is holding back new releases, but current hardware still has legs thanks to new implementations of matrix operations. FlashAttention, for example, has delivered a significant improvement roughly every six months.
Nvidia could make consumer chips that combine a CPU and a GPU. I guess they're too busy making money from the big cloud providers. Maybe somebody else will pick it up; Apple is already doing something like this, even on laptops.
That doesn't add up, because at that point you're exceeding the bandwidth of the CPU's memory controller. In other words, it would be faster to do everything, including the CPU-only algorithms, in the far-away VRAM.
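A rough sketch of the gap. It assumes a hypothetical desktop CPU with dual-channel DDR5-6000 (two 64-bit channels) and uses the RTX 4090's 384-bit GDDR6X bus at 21 Gbps per pin as the VRAM example. Both figures are theoretical peaks: per-pin transfer rate times bus width, divided by 8.

```python
def peak_bandwidth_gb_s(rate_gt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak in GB/s: per-pin transfer rate (GT/s) times bus width, over 8 bits/byte."""
    return rate_gt_s * bus_width_bits / 8

# Hypothetical desktop memory controller: DDR5-6000, 2 channels x 64 bits.
cpu_dram = peak_bandwidth_gb_s(6.0, 2 * 64)

# RTX 4090 VRAM: GDDR6X at 21 Gbps per pin on a 384-bit bus.
gpu_vram = peak_bandwidth_gb_s(21.0, 384)

print(f"CPU DRAM: {cpu_dram:.0f} GB/s")
print(f"GPU VRAM: {gpu_vram:.0f} GB/s ({gpu_vram / cpu_dram:.1f}x)")
```

With numbers in this range, any link much faster than about 100 GB/s into host memory would be capped by the CPU side. That is the point being made above.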
The 900 GB/s figure is often cited for Hopper-based SXM boards; it's the aggregate of 18 NVLink connections, counting both directions. So it's more of a many-to-many, GPU-to-GPU bandwidth figure.
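The aggregate can be reconstructed as simple arithmetic. This sketch assumes the fourth-generation NVLink figure of 25 GB/s per link in each direction (50 GB/s per link bidirectional). It shows that 900 GB/s is the sum over all links and both directions, not a single point-to-point rate.

```python
links = 18
per_link_per_direction_gb_s = 25  # assumed NVLink 4 per-link rate, one direction

# Sum across all links, one direction at a time.
per_direction_total = links * per_link_per_direction_gb_s
# The headline number adds both directions together.
bidirectional_total = per_direction_total * 2

print(f"One direction, all links: {per_direction_total} GB/s")
print(f"Both directions, all links: {bidirectional_total} GB/s")
```

Any single transfer in one direction therefore sees at most half the headline figure, and less if it only uses some of the links.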