I am in the same boat you are. I also appreciate your practical and concise point of view.
Yes, it ought to work quite easily, but in practice it doesn't yet, does it?
I'm planning to spin up a few servers in Amazon to test it there, just to get some benchmarks. What kind of testing and experimentation have you done? In my use case I would put the traffic through a single machine across a ten-gig link to another machine, so I could ensure that all the traffic passes through it.
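For what it's worth, here's the flavor of single-stream test I have in mind, as a minimal Python sketch (the port and duration are placeholders, and for serious numbers I'd reach for something like iperf3 rather than this):

```python
# Minimal single-stream TCP throughput probe between two hosts.
# Run server() on one box, client("<server-ip>") on the other.
import socket
import time

CHUNK = 1 << 20   # 1 MiB per send/recv
DURATION = 10     # seconds the client transmits

def server(port=5001):
    """Sink: accept one connection, count bytes, report Gbit/s."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while (data := conn.recv(CHUNK)):
            total += len(data)
        elapsed = time.time() - start
        print(f"{total * 8 / elapsed / 1e9:.2f} Gbit/s over one stream")

def client(host, port=5001):
    """Source: blast zero-filled buffers over a single TCP stream."""
    payload = b"\x00" * CHUNK
    with socket.create_connection((host, port)) as conn:
        deadline = time.time() + DURATION
        while time.time() < deadline:
            conn.sendall(payload)
```

Run it once over the plain link and once with the tunnel up, and the per-stream penalty should be obvious.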
The practical use case is to encrypt traffic between data centers. I don't see a feasible way to do that a megabyte at a time; it almost certainly has to be done on a per-packet basis, either at the application (which is a little difficult to do in the short term) or in IPsec or some similar solution.
No question that the right solution here is to make all the applications do encryption, but again, I need a practical solution now, and the developers can add that later.
I've only messed with a single tunnel at a time. However, there's an Intel whitepaper from several years ago suggesting that multiplexing over 6-12 tunnels might work.
The question that remains is how to perform routing for this. It appears that running Quagga on both endpoints, speaking BGP to each other, will allow ECMP over the 6-12 IPsec tunnels. Each individual stream will still be capped at ~1Gbit, but at least the total capacity could be used, and no traffic between the two endpoints would go unencrypted.
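As a toy model of why that helps aggregate throughput but not single streams: ECMP hashes each flow's 5-tuple onto one tunnel, so any one TCP connection stays pinned to a single path. (The hash function and tunnel count here are illustrative, not what Quagga or the kernel actually use.)

```python
# Toy ECMP: deterministically map a flow's 5-tuple to one of N tunnels.
import hashlib

TUNNELS = 8  # illustrative; somewhere in the whitepaper's 6-12 range

def ecmp_pick(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Hash the 5-tuple to a tunnel index; same flow -> same tunnel."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % TUNNELS

# A single flow always lands on the same tunnel, so it is capped at
# that tunnel's ~1Gbit rate...
print(ecmp_pick("10.0.0.1", "10.0.1.1", 40000, 443))

# ...but many flows (here, varying source ports) spread across tunnels,
# which is where the aggregate gain comes from.
print({ecmp_pick("10.0.0.1", "10.0.1.1", p, 443) for p in range(40000, 40064)})
```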
However, getting Quagga + BGP + ECMP + multiple IPsec tunnels + pinning tunnels to particular cores + setting up the RSS flows from the NIC to particular cores... well, it was a bit more than I could bite off at the time. The pieces feel sooo close to working, but it becomes a Rube Goldberg machine of systems software, with each piece operating at the limits of its design.
And honestly, I kind of want two orthogonal encryption technologies: TLS seems to have bugs every other day, IPsec has always seemed a little suspicious, and who really knows whether my random numbers are actually random. So I wanted two layers.
I hesitate to post as my experience is a bit dated as well, but this topic keeps coming up in my circles lately without a whole lot of authoritative information behind it.
Single-stream IPsec is still a major performance issue. Maybe someone out there is getting more than about 2.5Gbps per stream in the real world, but I haven't talked to them. And that's with some pretty idealistic assumptions; I would expect most folks' real-world performance to be orders of magnitude lower.
Scaling it over multiple cores does work, but introduces the complexity you mention. And it still completely ignores the bigger problem: single flows are stuck with early-2000s levels of network performance. With Ethernet moving to 40G in high-performance applications, being stuck with 2Gbps TCP connections inside the datacenter is not ideal, and it makes global-infrastructure-level IPsec a non-starter. I briefly spent some time considering going 100% IPsec for all inter-cluster communications for a product we were building, but the performance implications put that dead in the water almost immediately. Moving it up the stack into the application was far less expensive, but of course it carries the cost that you are no longer operating in a "fail safe" environment.
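For concreteness, "moving it up the stack" is just per-connection TLS in the application, something like this minimal Python sketch (the hostname, port, and CA path are placeholders; a real deployment needs proper certificate management):

```python
# Application-level encryption: wrap an ordinary TCP socket in TLS
# instead of relying on IPsec underneath the stack.
import socket
import ssl

# Trust only our internal CA (placeholder path).
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile="ca.pem")

with socket.create_connection(("service.internal.example", 8443)) as raw:
    with ctx.wrap_socket(raw, server_hostname="service.internal.example") as tls:
        tls.sendall(b"hello over TLS\n")
```

The "fail safe" point is exactly that last bit: with IPsec, traffic is encrypted whether or not the application remembers to ask; here, one forgotten wrap_socket() and the data goes out in the clear.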
You could make the argument that anything high-performance needs to scale horizontally across multiple streams and links, but that's too idealistic for the current state of technology. Sometimes you just really need to make that legacy NFS transfer go faster.
With the advent of hardware acceleration built into "commodity" CPUs these days, I really did expect more real-world progress on this front, both on the Linux side and in custom ASICs (e.g. switches/routers). Given the number of colleagues and customers who have asked for similar solutions ("I want 10Gbps single-stream IPsec"), it's not an uncommon problem.