Pushing the Limits of Kernel Networking (redhat.com)
50 points by jsnell on Oct 2, 2015 | hide | past | favorite | 10 comments


It's important to note that this is pushing the limits of x86 kernel networking. All commercial and enterprise routers (like the ones in your colo) use an ASIC for transit traffic. Management traffic usually gets routed by the ASIC to a general-purpose processor such as an x86.

> On a 10 Gbps link it is possible to pass packets at a rate of 14.88 Mpps. At that rate that we have just 67.2ns per packet. Also you have to keep in mind that an L3 cache hit, yes that is a “hit” and not a miss, costs us something on the order of 12 nanoseconds.

It's for this reason that transit traffic is handled by the ASIC, where rules for the session have already been established, and it can just match and forward/drop/reject. The management packets (of which there are far fewer, but they require a great deal more processing) are handed off to the general purpose (GP) processor. In Junipers they do use x86; I'm not sure what Cisco uses.
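The quoted figures fall straight out of line-rate arithmetic: a minimal 64-byte Ethernet frame occupies 84 bytes on the wire once you count the preamble, start-of-frame delimiter, and inter-frame gap. A quick sanity check:

```python
# Back-of-the-envelope check of the numbers quoted above.
# A minimal 64-byte frame occupies 84 bytes on the wire:
# 7B preamble + 1B SFD + 64B frame + 12B inter-frame gap.
LINK_BPS = 10e9              # 10 Gbps link
WIRE_BYTES = 7 + 1 + 64 + 12  # 84 bytes per minimal frame

pps = LINK_BPS / (WIRE_BYTES * 8)
ns_per_packet = 1e9 / pps

print(f"{pps / 1e6:.2f} Mpps")           # -> 14.88 Mpps
print(f"{ns_per_packet:.1f} ns/packet")  # -> 67.2 ns/packet
```

So the 67.2ns budget is only a handful of L3 hits, and not even one DRAM access, per packet.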


"With all of the changes mentioned above the overall processing time per packet for small packet routing is reduced by over 100ns per packet."

Right. This is for the case where the machine isn't doing anything with the packet except sending it back out. But it might also help when the machine is performing a service which requires very little CPU time but moves huge amounts of data. Like serving video. Are the big video servers still x86 machines, or does Netflix now use custom hardware?

If it's a web server constructing pages on the fly from a database, that work will dominate the time spent getting the packets in and out of the machine.


> But it might also help when the machine is performing a service which requires very little CPU time but moves huge amounts of data. Like serving video. Are the big video servers still x86 machines, or does Netflix now use custom hardware?

Netflix is using x86 for their video servers. They're running FreeBSD, and they've published recently[1] about moving TLS session crypto into the kernel so they can serve video over TLS without as much of a performance loss. The PDF mentions they were able to get 40 Gbps out of their machines with TLS disabled; with TLS enabled, they only got close to 10 Gbps. The CPU is an E5-2650L (Sandy Bridge), serving from SSD, with a quad-port 10G NIC.

[1] https://people.freebsd.org/~rrs/asiabsd_2015_tls.pdf
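You can see why bulk crypto hurts at those rates with a rough cycle budget. Assuming an E5-2650L v1 at its 1.8 GHz base clock with 8 cores (figures from memory, so treat them as an assumption):

```python
# Rough cycle budget for bulk work at the 40 Gbps they hit without TLS.
# Assumed hardware: E5-2650L (Sandy Bridge), 8 cores at 1.8 GHz base.
CORES = 8
HZ = 1.8e9
TARGET_BPS = 40e9

bytes_per_sec = TARGET_BPS / 8
budget = (CORES * HZ) / bytes_per_sec
print(f"{budget:.2f} cycles/byte across all cores")  # -> 2.88 cycles/byte
```

AES-GCM with AES-NI runs on the order of a few cycles per byte, so crypto alone can consume most of that budget, leaving little for the TCP stack and disk I/O, which is consistent with throughput falling toward 10 Gbps.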


It's interesting: that CPU has the AES-NI instructions, so it can hardware-accelerate AES operations, yet TLS still has that much impact.


Even with the intrinsics, AES is pretty expensive:

https://software.intel.com/en-us/articles/intel-aes-ni-perfo...


> I'm not sure what Cisco uses.

Probably PowerPC, though I hear Freescale's testing out ARM cores.

http://www.freescale.com/applications/networking:IFNWNTWRKNG...


Nexus uses x86, Celerons specifically (at least all of them I'm aware of). You can see it if you watch one boot over serial. It's in the example text from this upgrade notice: http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nex...


Why can't we just do it in hardware/FPGAs at that point?


That depends what "it" is. Routing and switching are good candidates for ASICs and simple end-host networking has good NIC offload. But Linux can be configured to perform arbitrarily complex network stuff that specialized hardware does not support.
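As a sketch of "arbitrarily complex network stuff" that fixed-function forwarding hardware generally can't express (interface names and addresses here are made up for illustration):

```shell
# Network emulation: add latency and random loss on an interface.
tc qdisc add dev eth0 root netem delay 50ms loss 0.5%

# Policy routing: traffic sourced from one subnet consults its own
# routing table and leaves via a different next hop.
ip rule add from 10.0.0.0/24 table 100
ip route add default via 192.168.1.1 dev eth1 table 100
```

A switching ASIC gives you what the vendor baked in; the Linux stack lets you compose features like these freely, at the cost of doing the work on a general-purpose CPU.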


That's about what I was going to say. Basically, you can keep adding more features to your ASIC, but the more things your ASIC supports, the more it starts to resemble a poorly-designed CPU.



