If the bottleneck is storage bandwidth, that's not "slow". It's only slow if you insist on interactive speeds; the point is that you can run cheap inference in bulk on very low-end hardware.
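For a sense of scale, here's a rough back-of-envelope sketch. The figures (a 7B-parameter model at 4-bit quantization, ~3 GB/s sustained NVMe read) are illustrative assumptions on my part, not numbers from this thread:

    # Rough throughput estimate for storage-bandwidth-bound inference.
    # Assumed: 7B params quantized to 4 bits (~3.5 GB of weights),
    # streamed from an NVMe SSD at ~3 GB/s sustained read.
    model_bytes = 7e9 * 0.5   # 4 bits/param = 0.5 bytes/param
    ssd_bandwidth = 3e9       # bytes/s, sustained sequential read

    # Autoregressive decoding is bandwidth-bound: each generated
    # token requires streaming the full set of weights once.
    single_stream = ssd_bandwidth / model_bytes
    print(f"single stream: {single_stream:.2f} tok/s")  # ~0.86 tok/s

    # Batching amortizes one pass over the weights across many
    # concurrent requests: each pass yields one token per stream.
    for batch in (8, 64, 256):
        print(f"batch={batch:>3}: {batch * single_stream:.0f} tok/s aggregate")

So a single stream crawls at under one token per second, but batched bulk workloads can get real aggregate throughput out of the same cheap hardware.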
You're simply pointing out that most people who use AI today expect interactive speeds. You're right that the point here is not raw power efficiency (having to read from storage will hurt energy per operation, and datacenter-scale AI hardware beats edge hardware on that metric anyway), but the ability to repurpose cheaper, smaller-scale hardware is also compelling.