
Post: AWS US-EAST-1 thermal event + the DR drill that was not #1272

Merged
bobbyonmagic merged 1 commit into main from content/post/aws-use1-az4-thermal-event
May 15, 2026

Conversation

@bobbyonmagic
Collaborator

Operational post (not a CVE) on the May 7-8 AWS US-EAST-1 outage. A thermal event in a single hall of use1-az4 took out cooling, and a power loss damaged EC2 instances and EBS volumes. Coinbase, FanDuel, and CME Group were all hit for hours.

The hook is Coinbase's Head of Platform publicly saying their primary exchange runs in a single AZ for latency and that their backup systems "did not work as expected during the incident, extending the outage and forcing engineers to manually execute disaster recovery procedures." That sentence is the post.

Post covers:

  • The technical timeline from AWS's own status updates
  • Why "EC2 + EBS damaged on impacted hardware" is the worst-case AWS wording and what it means for runbooks
  • A multi-AZ checklist per service (EC2 ASGs, ALB cross-zone, RDS Multi-AZ + the connection-pool trap, EKS topology spread, MSK acks=all, EBS snapshots, what S3/DynamoDB do for you for free)
  • An AWS Fault Injection Simulator template that kills an AZ for 30 min with a CloudWatch stop-condition tied to a real customer-impact alarm
  • The single-AZ-for-latency exception and the four concrete requirements that make it survivable
  • A five-step "what to do this week" checklist
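The template itself is not shown in this PR, so what follows is only a minimal sketch of what an experiment of that shape might look like, not the post's actual template. It approximates "killing an AZ" by stopping every running EC2 instance in one zone for 30 minutes, with a CloudWatch alarm as the stop condition. The function name, target and action labels, and all ARNs are hypothetical; the action ID, filter paths, and stop-condition source follow the FIS template format as I understand it and should be checked against the current AWS documentation.

```python
import json


def build_az_outage_template(az_name, alarm_arn, role_arn, minutes=30):
    """Return a dict shaped like an AWS FIS experiment template.

    Assumption: stopping all running instances in one AZ is an acceptable
    stand-in for an AZ outage. A fuller drill would also disrupt subnet
    networking and fail over managed services.
    """
    target = "instances-in-az"  # hypothetical target label
    return {
        "description": f"Stop all running EC2 instances in {az_name} for {minutes} min",
        "roleArn": role_arn,
        # FIS halts the experiment if this alarm goes into ALARM state.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
        "targets": {
            target: {
                "resourceType": "aws:ec2:instance",
                "filters": [
                    # This filter takes the AZ *name* (e.g. us-east-1a), not the AZ ID.
                    {"path": "Placement.AvailabilityZone", "values": [az_name]},
                    {"path": "State.Name", "values": ["running"]},
                ],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "stop-az-instances": {  # hypothetical action label
                "actionId": "aws:ec2:stop-instances",
                # ISO 8601 duration; instances are restarted afterwards.
                "parameters": {"startInstancesAfterDuration": f"PT{minutes}M"},
                "targets": {"Instances": target},
            }
        },
    }


if __name__ == "__main__":
    template = build_az_outage_template(
        az_name="us-east-1a",  # placeholder; map your AZ ID to a name first
        alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:customer-impact",
        role_arn="arn:aws:iam::111122223333:role/fis-experiment-role",
    )
    print(json.dumps(template, indent=2))
```

Note that use1-az4 is an AZ ID, and the AZ name it maps to (us-east-1a, us-east-1b, and so on) differs per account, so the name passed here has to be looked up for each account. The resulting dict could plausibly be passed to the FIS `create_experiment_template` API, but that call is deliberately left out of the sketch.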

OG image generated. 5014/5016 tests green (2 pre-existing skipped). No em dashes, no AI-cadence words. Featured = true.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 15, 2026

Deploying devops-daily with Cloudflare Pages

Latest commit: 879b649
Status: ✅  Deploy successful!
Preview URL: https://741eda32.devops-daily.pages.dev
Branch Preview URL: https://content-post-aws-use1-az4-th.devops-daily.pages.dev


@bobbyonmagic bobbyonmagic merged commit 0fc5584 into main May 15, 2026
4 checks passed
@bobbyonmagic bobbyonmagic deleted the content/post/aws-use1-az4-thermal-event branch May 15, 2026 15:00
