Introduction
On March 1, 2026, AWS confirmed that drone strikes had damaged its Middle East data centers in the UAE (me-central-1) and Bahrain (me-south-1), causing one of the largest cloud outages in AWS history.
We were running a 100K-user Open edX platform in me-south-1. This is the story of how we migrated to us-east-1 in 48 hours.
[Image: AWS Middle East region — me-south-1 (Bahrain) was our primary production region]
What Failed
At 03:14 UTC, all EC2 instances in me-south-1 became unreachable. RDS went into a failover loop. S3 cross-region replication to us-east-1 was our only saving grace — course content and media were already mirrored.
The platform was completely down for 6 hours before we made the call: do not wait for AWS, migrate now.
Migration Steps
Hour 0–6: Assessment
- Confirmed S3 data was intact in us-east-1 (see the commands after this list)
- Took latest RDS snapshot (from 2 hours before outage)
- Inventoried all services: EC2, RDS, ElastiCache, ALB, Route 53
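The first two checks boil down to a couple of CLI calls. A rough sketch of what this looked like for us (the bucket name below is illustrative, not our real one; the snapshot lookup only worked because the me-south-1 control plane was still answering API calls):

# Confirm the replicated course content actually landed in us-east-1
aws s3 ls s3://platform-course-content --recursive --summarize | tail -n 2

# Find the newest automated snapshot of the production database
aws rds describe-db-snapshots --region me-south-1 \
    --db-instance-identifier platform-prod --snapshot-type automated \
    --query 'reverse(sort_by(DBSnapshots,&SnapshotCreateTime))[0].[DBSnapshotIdentifier,SnapshotCreateTime]'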
Hour 6–24: Rebuild in us-east-1
Spun up a new VPC with matching CIDR blocks, created new EC2 instances from our Ansible playbooks, and restored RDS from the snapshot (after first copying it cross-region, as shown below).
# Snapshots are regional: copy ours from me-south-1 to us-east-1 first
# (account ID is a placeholder; encrypted snapshots also need --kms-key-id)
aws rds copy-db-snapshot --region us-east-1 --source-region me-south-1 \
    --source-db-snapshot-identifier arn:aws:rds:me-south-1:123456789012:snapshot:rds:platform-prod-2026-03-01-01-00 \
    --target-db-snapshot-identifier platform-prod-2026-03-01-copy
aws rds wait db-snapshot-available --region us-east-1 \
    --db-snapshot-identifier platform-prod-2026-03-01-copy
# Restore the copied snapshot as the new production DB instance
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier platform-prod-restored \
    --db-snapshot-identifier platform-prod-2026-03-01-copy \
    --db-instance-class db.r6g.xlarge --region us-east-1
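Worth knowing if you ever have to do this: the restored instance comes up on a brand-new endpoint hostname, so whatever carries the database address into the app config (in our setup, a variable in those same Ansible playbooks) has to be repointed before the platform comes up.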
Hour 24–48: DNS Cutover
Registered the new instances behind a fresh ALB, updated the Route 53 health checks, dropped the record TTL to 60 seconds, waited for the old TTL to expire, then flipped DNS.
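The flip itself is a single API call. A minimal sketch, with a placeholder hosted zone ID, record name, and ALB hostname (ours differ):

# Repoint the learner-facing hostname at the new us-east-1 ALB
aws route53 change-resource-record-sets \
    --hosted-zone-id Z0000000000000 \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "learn.example.com",
          "Type": "CNAME",
          "TTL": 60,
          "ResourceRecords": [{"Value": "platform-prod-1234.us-east-1.elb.amazonaws.com"}]
        }
      }]
    }'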
[Image: Running emergency migration scripts at 2AM during the outage]
What We Learned
- S3 cross-region replication is non-negotiable. It saved us. Everything else could be rebuilt. Data cannot. (There is a setup sketch after this list.)
- Ansible playbooks for everything. We rebuilt 12 EC2 instances in 4 hours because every config was in code.
- Cold standby is not enough. We now run a warm standby in us-east-1: scaled down, but ready to go live in 30 minutes.
- RTO vs RPO. Our RPO (how much data we could lose) was 2 hours, the age of the last snapshot. Acceptable. Our RTO (how long recovery took) was 42 hours. Not acceptable. Now targeting 4 hours.
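On the first lesson: turning on replication is roughly two calls per bucket. A sketch with placeholder bucket names and IAM role (cross-region replication also requires versioning on both the source and destination buckets):

# Replication requires versioning on the source bucket
aws s3api put-bucket-versioning --bucket platform-course-content \
    --versioning-configuration Status=Enabled

# Mirror everything into the us-east-1 replica bucket
aws s3api put-bucket-replication --bucket platform-course-content \
    --replication-configuration '{
      "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
      "Rules": [{
        "ID": "mirror-all-to-us-east-1",
        "Status": "Enabled",
        "Priority": 1,
        "Filter": {},
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {"Bucket": "arn:aws:s3:::platform-course-content-replica"}
      }]
    }'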
[Image: Multi-region failover in action — what we wished we had set up earlier]
Current DR Setup
After this incident we implemented:
- Warm standby in us-east-1 (always running, scaled down)
- RDS Multi-AZ with cross-region read replica
- Route 53 health checks with automatic failover (see the sketch after this list)
- S3 cross-region replication on every bucket
- Monthly DR drills, with a real cutover to the standby and back every quarter
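For the health-check-plus-failover piece, the shape of the setup is roughly the sketch below, assuming production back in the primary region with the standby in us-east-1. The zone ID, health check ID, and hostnames are placeholders, and a matching SECONDARY record pointing at the standby ALB completes the pair; /heartbeat is Open edX's built-in health endpoint.

# Health check probing the primary region's Open edX heartbeat endpoint
aws route53 create-health-check --caller-reference failover-hc-001 \
    --health-check-config '{"Type": "HTTPS", "FullyQualifiedDomainName": "primary.learn.example.com",
        "Port": 443, "ResourcePath": "/heartbeat", "RequestInterval": 30, "FailureThreshold": 3}'

# PRIMARY failover record tied to that check; once it goes unhealthy,
# Route 53 starts answering with the SECONDARY (standby) record instead
aws route53 change-resource-record-sets --hosted-zone-id Z0000000000000 \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "learn.example.com", "Type": "CNAME", "TTL": 60,
          "SetIdentifier": "primary", "Failover": "PRIMARY",
          "HealthCheckId": "11111111-2222-3333-4444-555555555555",
          "ResourceRecords": [{"Value": "primary-alb-1234.me-south-1.elb.amazonaws.com"}]
        }
      }]
    }'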