I agree that specialization is often necessary, but I would prefer to think in terms of add-ons to general-purpose commodity architectures. For example, GPUs are specialized for the task of 3D math, but their particular purpose is decided by the CPU. Similarly, FPGAs can be dynamically programmed for a special circuit. Neither is ideal, but in hindsight I would have liked to see something similar develop for the task of routing - and the economic pressure applied by not buying ASICs (that break the stack separation) would have helped (yes, we have benefited from them, but we have benefited from taking a specific path - we don't know if there was a better one). Routing probably would have benefited from an earlier and stronger push towards multicore processors - only there was no reason to do it in consumer CPUs, because we view routing and computing largely as exclusive domains (even though the Internet's success has come from connecting many general-purpose endpoints... Why is the fabric not also general purpose?).
The specialised chips that routers use are CAMs and TCAMs: basically associative arrays (hashes or dictionaries, as they're called in most scripting languages) implemented in hardware. You can program an FPGA to run as a CAM. Mostly what they're doing is repeated lookups of where to switch or route packets, based on header fields. If you're in a CPU world with RAM you can try to optimise the data structure that holds the lookup tables so you get decent speed, but a CAM does the lookup in hardware, so the lookup times are fixed and known. It's not really a computational limit that more cores would help with; the CPUs can be pretty slow. Parallel processing on routers is typically handled by distributing the workload to packet processors that each handle a single port or a group of ports, all looking at the same CAM (or copies of it in the more distributed chassis-based architectures). Modern routers are basically supercomputers in a box. Of course there's an inherent limit at the PHY level, where you can only receive or transmit a single packet at a time on an interface, so you basically can't multiprocess beyond a CPU per port, which limits the complexity of the design. 1000 cores on a GPU won't help you if you only have 100 ports to route traffic on.
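To make the "dictionary in hardware" point a bit more concrete, here's a rough Python sketch of what a TCAM does for IP routing: each entry is a value plus a mask (the "don't care" bits), and the highest-priority matching entry wins. The class name `SoftTcam`, the routes, and the port names are all made up for illustration, and a real TCAM compares every entry in parallel in a fixed number of cycles rather than scanning a list like this does.

```python
import ipaddress


class SoftTcam:
    """Toy software stand-in for a TCAM holding IPv4 routes (hypothetical)."""

    def __init__(self):
        # Each entry is (value, mask, prefix_len, next_hop).
        self.entries = []

    def add_route(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        self.entries.append(
            (int(net.network_address), int(net.netmask), net.prefixlen, next_hop)
        )
        # Keep longest prefixes first, mimicking priority-ordered TCAM entries.
        self.entries.sort(key=lambda e: e[2], reverse=True)

    def lookup(self, addr):
        key = int(ipaddress.ip_address(addr))
        # Hardware checks all entries at once; software has to scan in order.
        for value, mask, _prefix_len, next_hop in self.entries:
            if key & mask == value:
                return next_hop
        return None  # no matching entry


# Assumed example table: a default route plus two nested prefixes.
tcam = SoftTcam()
tcam.add_route("0.0.0.0/0", "port0")
tcam.add_route("10.0.0.0/8", "port1")
tcam.add_route("10.1.0.0/16", "port2")
print(tcam.lookup("10.1.2.3"))  # most specific match is 10.1.0.0/16
```

The linear scan is exactly the part that costs you on a CPU and that smarter data structures (tries and so on) try to avoid; the hardware version gets the same answer in constant time regardless of table size.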
On a typical Intel PC, the MMU's TLB is implemented in CAM as well, but it's tiny in comparison to the ones in routers. CAMs are pretty power hungry compared to RAM. They are becoming commodities though, and the router manufacturers are getting away from doing custom ASIC designs. Intel bought Fulcrum, which makes these packet processors, and companies like Big Switch are doing cool stuff with white box network hardware. It's really going to bring the cost of the hardware down pretty dramatically in the next couple of years.
Going forward, I suspect GPUs could also be applied to make routing decisions at acceptable speeds. Have a look at http://www.date-conference.com/proceedings/PAPERS/2010/DATE1... for some trials.