> AES instruction sets implemented in hardware are the main factor in making HTTPS viable.
Not true. AES without specialized instructions costs something like 30-40 cycles per byte. That makes saturating 100 Mbit/s trivial, and saturating gigabit one of the easier problems. How much traffic can your toaster and router possibly be terminating?
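For anyone who wants to check figures like these, here is the back-of-the-envelope arithmetic as a small Python sketch. The function name is made up for illustration, and it only counts cipher cycles at the quoted 30-40 cpb; it ignores MACs, handshakes, and everything else the CPU is doing.

```python
def cycles_per_second(link_bits_per_second: float, cycles_per_byte: float) -> float:
    """CPU cycles per second needed to encrypt at full line rate."""
    bytes_per_second = link_bits_per_second / 8
    return bytes_per_second * cycles_per_byte


# Hypothetical link speeds and the 30-40 cpb range quoted above.
for name, bps in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
    for cpb in (30, 40):
        ghz = cycles_per_second(bps, cpb) / 1e9
        print(f"{name} at {cpb} cpb: {ghz:.3f} billion cycles/s")
```

The result is a total cycle budget per second, which can be spread across however many cores the box has.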
Power usage matters a lot on low-end devices, especially smartphones, which talk to lots of HTTPS resources. That's the complaint at the opposite end of the spectrum from the people who need ASIC accelerators, for whatever reason, to keep up with bandwidth/latency/CPU concerns.
Just as an example of the difference: I wrote a naive implementation of ChaCha20 in C with zero optimization effort and it does 5 cpb out of the gate (Sandy Bridge). Just using vector types and letting GCC/Clang vectorize brings that down to ~3 cpb on Sandy Bridge, with no effort. The Krovetz implementation of ChaCha20 is closer to 1.2 cpb on my machine, with AES-256 doing 1.0 cpb using AES-NI (again, my own naive implementation). The ChaCha20 numbers are all software.
Even the most hand-optimized, secure AES software implementations are still in the realm of ~15-20 cpb (IIRC), three to four times worse than the unoptimized competitor. As I linked elsewhere in this thread, non-scientific tests show ChaCha20 running about 3x faster in software on some mobile phones. That's a lot of extra cycles per byte for your battery to chew through using AES-256, and I'd guess I easily churn through a low number of gigs of HTTPS data every month.
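For context on why a naive ChaCha20 holds up so well without hardware help: the core of the cipher is nothing but 32-bit add, xor, and rotate, with no lookup tables. Below is a Python sketch of the quarter round as specified in RFC 8439 (section 2.1), checked against the test vector in section 2.1.1. This is not the C code mentioned above, and the helper names are just for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha quarter round: four add/xor/rotate steps on 32-bit words."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 8439, section 2.1.1.
out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(w) for w in out])
```

Every operation here runs in constant time on an ordinary CPU and maps directly onto SIMD lanes, which fits with compilers being able to vectorize it with little help.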