This seems to have only affected one region. Am I missing something?

dangrossman · on Nov 19, 2014

Yes. It started as a failure in one region and propagated to others as it overloaded the "control plane" (the stuff that runs "the cloud"), and EBS tried to replicate "failed" disks to the point that Amazon ran out of disk space in the cluster. At the time, I was paying for RDS Multi-AZ, which runs your database in multiple availability zones at once, with hot failover if the primary goes offline. It failed to fail over despite that. Many large sites went down for a very long time that day, and people couldn't spawn replacement instances even in AZs other than the one where the failure started.

jedberg · on Nov 19, 2014

You're confusing region with AZ. They've never had a multi-region outage (yet).

parhamn · on Nov 19, 2014

It was one region, multiple availability zones. You're right that multi-AZ != multi-region (for things like Sandy and natural disasters), but the AZs have mostly separate infrastructure, which does make multi-AZ failures very unlikely. Because multi-AZ is so simple to set up (with VPC and such), some people (perhaps wrongly) treat it like multi-region.