> 500 core The newest fully E-core based Xeon CPUs have reached that figure by n...

timschmidt · 2026-03-15T16:14:57 1773591297

Yup. And high-end GPU compute now has on-package HBM like Knights had a decade ago, and those new Intel CPUs are finally shipping with AVX reliably again. We lost a decade for workloads that would benefit from both.

fc417fc802 · 2026-03-16T04:56:01 1773636961

But I'm surprised PCIe-based CPU+RAM modules aren't a "thing", since that's basically what a GPU is if you ignore all the rather fundamental differences. Seems like it would be convenient to cheaply attach additional compute without worrying about all the other stuff.

I suppose I'm just reinventing SXM at this point. The BC-250 comes close, but despite the form factor it isn't actually a PCIe card. Although if it integrated a 100 Gbit SFP slot it might actually be superior to a solution that resided in a host system. But the BC-250 is very much an anomaly rather than the norm.

zozbot234 · 2026-03-16T05:00:45 1773637245

You need CXL to extend the cache coherency properties of actual RAM over a remote link. That's costly tech. Otherwise, you're relying on the OS (and even the compiler and basic libraries, since fences and the like must be made OS-visible) to paper over the differences with its own implementation of distributed shared memory. This is known as an 'SSI', or single-system-image, approach; it has significant challenges and is closer in spirit to setting up swap.

fc417fc802 · 2026-03-16T05:17:58 1773638278

I didn't mean anything like that. Just the equivalent of a GPU with the ability to run arbitrary CPU-oriented programs.

Of course GPUs do many tasks very well but there are also plenty of problems that aren't well suited to them. Well I suppose I've answered my own question at this point. There probably just aren't enough real world problems that aren't amenable to running on a GPU while also being either compute or memory bandwidth bound.

Still the near-monoculture does strike me as odd. I guess GPUs have bifurcated into enterprise versus consumer at this point but otherwise all we've got is a single CPU example from over a decade ago and a single alternative take on the concept from Fujitsu. Is it just due to the obscene cost of masks for modern process nodes?

akvadrako · 2026-03-16T09:31:04 1773653464

Things like that existed in the category of accelerator cards. Xeon Phi (Knights) is one example, focused on core count. Some from HP have soldered-on SSDs too. You also had blade servers, which are more focused on that use case, though they're going out of style.

I don't think PCIe is really a good fit for general CPU tasks. You need big heatsinks and power and can't fit that much RAM on board.