I am in the same boat you are. I also appreciate your practical and concise point of view.
Yes, it ought to work quite easily, but in practice it doesn't yet, does it?
I'm planning to spin up a few servers in Amazon to test it there, just to get some benchmarks. What kind of testing and experimentation have you done? In my use case I would put the traffic through a single machine across a ten-gig link to another machine, so I could ensure that all the traffic passes through it.
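For what it's worth, here's the flavor of single-stream test I have in mind, as a minimal Python sketch (the port and duration are placeholders, and for serious numbers I'd reach for something like iperf3 rather than this):

```python
# Minimal single-stream TCP throughput probe between two hosts.
# Run server() on one box, client("<server-ip>") on the other.
import socket
import time

CHUNK = 1 << 20   # 1 MiB per send/recv
DURATION = 10     # seconds the client transmits

def server(port=5001):
    """Sink: accept one connection, count bytes, report Gbit/s."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while (data := conn.recv(CHUNK)):
            total += len(data)
        elapsed = time.time() - start
        print(f"{total * 8 / elapsed / 1e9:.2f} Gbit/s over one stream")

def client(host, port=5001):
    """Source: blast zero-filled buffers over a single TCP stream."""
    payload = b"\x00" * CHUNK
    with socket.create_connection((host, port)) as conn:
        deadline = time.time() + DURATION
        while time.time() < deadline:
            conn.sendall(payload)
```

Run it once over the plain link and once with the tunnel up, and the per-stream penalty should be obvious.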
The practical use case is to encrypt traffic between data centers. I don't see a feasible way to do that a megabyte at a time; it almost certainly has to be done on a per-packet basis, either at the application (which is a little difficult to do in the short term) or in IPsec or some similar solution.
No question that the right solution here is to make all the applications do encryption, but again, I need a practical solution now, and the developers can add that later.
I've only messed with a single tunnel at a time. However, there's an Intel whitepaper from several years ago suggesting that multiplexing over 6-12 tunnels might work.
The question that remains is how to perform routing for this. It appears that running Quagga on both endpoints, speaking BGP to each other, will allow ECMP over the 6-12 IPsec tunnels. Each individual stream will still be capped at ~1Gbit, but at least the total capacity could be used, and no traffic between the two endpoints would go unencrypted.
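As a toy model of why that helps aggregate throughput but not single streams: ECMP hashes each flow's 5-tuple onto one tunnel, so any one TCP connection stays pinned to a single path. (The hash function and tunnel count here are illustrative, not what Quagga or the kernel actually use.)

```python
# Toy ECMP: deterministically map a flow's 5-tuple to one of N tunnels.
import hashlib

TUNNELS = 8  # illustrative; somewhere in the whitepaper's 6-12 range

def ecmp_pick(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Hash the 5-tuple to a tunnel index; same flow -> same tunnel."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % TUNNELS

# A single flow always lands on the same tunnel, so it is capped at
# that tunnel's ~1Gbit rate...
print(ecmp_pick("10.0.0.1", "10.0.1.1", 40000, 443))

# ...but many flows (here, varying source ports) spread across tunnels,
# which is where the aggregate gain comes from.
print({ecmp_pick("10.0.0.1", "10.0.1.1", p, 443) for p in range(40000, 40064)})
```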
However, getting Quagga + BGP + ECMP + multiple IPsec tunnels + pinning tunnels to particular cores + setting up the RSS flows from the NIC to particular cores... well, it was a bit more than I could bite off at the time. The pieces feel sooo close to working, but it becomes a Rube Goldberg machine of systems software, with each piece operating at the limits of its design.
And honestly, I kind of want two orthogonal encryption technologies: TLS seems to have bugs every other day, IPsec has always seemed a little suspicious, and who really knows whether my random numbers are actually random. So I wanted two layers.
I hesitate to post as my experience is a bit dated as well, but this topic keeps coming up in my circles lately without a whole lot of authoritative information behind it.
Single-stream IPsec is still a major performance issue. Maybe someone out there is getting more than about 2.5Gbps per stream in the real world, but I haven't talked to them. And that's with some pretty idealistic assumptions; I would expect most folks' real-world performance to be orders of magnitude lower.
Scaling it over multiple cores does work, but introduces the complexity you mention. And it still completely ignores the bigger problem: single flows are stuck with early-2000s levels of network performance. With Ethernet moving to 40G in high-performance applications, being stuck with 2Gbps TCP connections inside the datacenter is not ideal, and it makes global-infrastructure-level IPsec a non-starter. I briefly spent some time considering going 100% IPsec for all inter-cluster communications for a product we were building, but the performance implications put that dead in the water almost immediately. Moving it up the stack into the application was far less expensive, but of course it carries the cost that you are no longer operating in a "fail safe" environment.
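For concreteness, "moving it up the stack" is just per-connection TLS in the application, something like this minimal Python sketch (the hostname, port, and CA path are placeholders; a real deployment needs proper certificate management):

```python
# Application-level encryption: wrap an ordinary TCP socket in TLS
# instead of relying on IPsec underneath the stack.
import socket
import ssl

# Trust only our internal CA (placeholder path).
ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile="ca.pem")

with socket.create_connection(("service.internal.example", 8443)) as raw:
    with ctx.wrap_socket(raw, server_hostname="service.internal.example") as tls:
        tls.sendall(b"hello over TLS\n")
```

The "fail safe" point is exactly that last bit: with IPsec, traffic is encrypted whether or not the application remembers to ask; here, one forgotten wrap_socket() and the data goes out in the clear.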
You could make the argument that anything high-performance needs to scale horizontally across multiple streams and links, but that's too idealistic for the current state of technology. Sometimes you just really need to make that legacy NFS transfer go faster.
With the advent of hardware acceleration built into "commodity" CPUs these days, I really did expect more real-world progress on this front, both on the Linux side and in custom ASICs (e.g. switches/routers). Given the number of colleagues and customers who have asked for similar solutions ("I want 10Gbps single-stream IPsec"), it's not an uncommon problem.