Making a chip this large is difficult, expensive, and error-prone.
It blows past the reticle limit, so you end up having to do multiple (carefully aligned) exposures for adjacent chunks of the design. Signals can't really travel more than a few mm to a cm or so without significant degradation, so you end up needing to add buffers all over the place. Good yield gets exponentially harder with larger designs, since a larger area has a higher probability of overlapping with a defect. So you end up needing to harden the design with redundancy and with parts that can be selectively disabled if they turn out to be defective, so you don't have to throw out the whole part.
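To put rough numbers on the yield point, here is a minimal sketch using the simple Poisson yield model (yield = exp(-area * defect density)). The model is a common first-order approximation, not necessarily what any particular fab uses, and the function names, block sizes, and the defect density of 0.1 per cm^2 are all made-up assumptions for illustration.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Probability that a region of the given area contains zero defects,
    under the simple Poisson model (an assumption, not a fab-specific model)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def yield_with_spares(n_blocks, n_required, block_area_cm2, defects_per_cm2):
    """Probability that at least n_required of n_blocks identical blocks are
    defect-free, assuming defects hit blocks independently."""
    p = poisson_yield(block_area_cm2, defects_per_cm2)
    return sum(
        math.comb(n_blocks, k) * p**k * (1 - p) ** (n_blocks - k)
        for k in range(n_required, n_blocks + 1)
    )


if __name__ == "__main__":
    d0 = 0.1  # hypothetical defect density, defects per cm^2
    for area in (1, 8, 100, 400):
        print(f"{area:>4} cm^2 monolithic: {poisson_yield(area, d0):.3g}")
    # Hypothetical design: 100 blocks of 1 cm^2 each.
    print("100 blocks, all must work:", f"{yield_with_spares(100, 100, 1.0, d0):.3g}")
    print("100 blocks, 80 must work: ", f"{yield_with_spares(100, 80, 1.0, d0):.3g}")
```

Requiring every block to work is the same as the monolithic case, which is why the ability to disable bad blocks matters so much at this size.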
FPGAs have been pushing towards this kind of craziness, but there aren't huge advantages to doing this with a CPU, since the only thing you can do with so much more area is add cores and cache. The cost of going off-die to additional cores isn't so bad. On the other hand, trying to synthesize a multi-FPGA design (while meeting timing) is torture.
Why not, then, build a motherboard with many processor slots that can handle the linking, and then use traditionally sized processors? I don’t have the background here and am curious.
That's possible already. But once you get over certain limits, it is usually done at the machine level rather than at the processor level (2, 4 and even 8 slot machines exist but are expensive). The kinds of problems people tend to solve on such installations (typically clusters of commodity hardware) are quite different from the kinds of programs you run on your day-to-day machine: think geological analysis, weather prediction and so on.
At some point the cost of the interconnect hardware dominates the cost of the CPUs. Lots of parties, for instance Ivan Sutherland (https://news.ycombinator.com/item?id=723882), have tried their hand at this, but so far nobody has been able to pull it off successfully.
Eventually it will happen though; this is an idea that's too good to remain without sponsors for long.
2-socket servers are actually the norm and dominate datacenters. 4 is still common but usually only done for the ability to address a large amount of RAM in a single machine, or for niche commercial workloads. 8-socket x86 servers are very unusual.
To add context, this is mostly because the NUMA properties get weird. With 2 sockets, all of the inter-socket links can go directly to the other processor, and Xeons currently have 2-3 of those. With 4 and 8 sockets you end up with strange memory topologies whose hop counts are less predictable, which hurts unless your application was written for it.
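A rough way to see the hop problem is to treat sockets as graph nodes and count the worst-case number of link hops between any two of them. The sketch below does that with a breadth-first search. The three topologies are hypothetical layouts chosen to fit a budget of 2-3 links per socket, not actual Xeon platform designs, and `max_hops` is just an illustrative helper.

```python
from collections import deque


def max_hops(links):
    """Worst-case hop count between any two sockets.

    links maps each socket id to the sockets it is directly wired to.
    Assumes the graph is connected.
    """
    worst = 0
    for start in links:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # plain breadth-first search from `start`
            node = queue.popleft()
            for neighbor in links[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        worst = max(worst, max(dist.values()))
    return worst


# Hypothetical topologies (assumptions for illustration only).
two_socket = {0: [1], 1: [0]}  # direct link
four_socket_ring = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}  # 2 links each
eight_socket_cube = {i: [i ^ 1, i ^ 2, i ^ 4] for i in range(8)}  # 3 links each

if __name__ == "__main__":
    for name, topo in [
        ("2 sockets", two_socket),
        ("4 sockets (ring)", four_socket_ring),
        ("8 sockets (cube)", eight_socket_cube),
    ]:
        print(name, "worst-case hops:", max_hops(topo))
```

With these layouts the worst case grows from 1 hop to 2 and then 3, so memory latency depends on which socket owns the page, and that is what an unaware application trips over.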