
AWS data center outage hits trading on FanDuel, Coinbase

A four-hour Amazon Web Services (AWS) data center outage in the US East region disrupted trading on FanDuel and Coinbase, exposing the fragility of cloud-based financial infrastructure and underscoring the need for robust disaster recovery and redundancy in cloud-based systems.

An Amazon Web Services (AWS) data center outage in the US East region caused a four-hour disruption affecting multiple financial platforms, including FanDuel and Coinbase. The incident, which impacted EC2 instances and S3 storage, exposed vulnerabilities in cloud-dependent trading systems and raised concerns about infrastructure resilience.

Overview

The outage occurred on May 8, 2026, in AWS’s US East (Northern Virginia) region, one of the company’s largest and most widely used cloud zones. According to AWS’s status dashboard, the disruption began at approximately 10:14 AM ET and lasted until 2:18 PM ET. The root cause was traced to a failure within a cluster of EC2 compute instances, which in turn triggered cascading issues with S3 storage availability and latency.

Services relying on low-latency access to cloud infrastructure experienced significant degradation. FanDuel, a sports betting platform, reported delays in odds updates and bet settlement during peak trading hours. Coinbase, a major cryptocurrency exchange, confirmed intermittent API failures and delayed transaction confirmations, affecting high-frequency trading operations.

What failed

The affected AWS components—EC2 and S3—are foundational to most cloud-hosted applications:

  • EC2 (Elastic Compute Cloud): Provides scalable virtual servers for running applications. The outage impacted a subset of instances, leading to failed launches, unexpected terminations, and elevated error rates.
  • S3 (Simple Storage Service): Offers object storage with high durability. During the incident, S3 experienced increased error rates and latency spikes, particularly for cross-region replication and retrieval operations.

The combined failure disrupted backend systems that depend on rapid data access and compute availability, especially those with tight latency requirements such as financial trading platforms.
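
When a region starts returning elevated error rates rather than failing outright, SDK-level retries with backoff and client-side rate limiting are the first line of defense. A minimal sketch in Python, assuming boto3; the adaptive retry mode is a documented botocore feature, while the bucket and key names are placeholders:

    import boto3
    from botocore.config import Config
    from botocore.exceptions import ClientError, EndpointConnectionError

    # Adaptive mode layers client-side rate limiting on top of exponential
    # backoff, which helps when a region returns elevated error rates.
    retry_cfg = Config(
        region_name="us-east-1",
        retries={"max_attempts": 10, "mode": "adaptive"},
        connect_timeout=2,
        read_timeout=5,
    )
    s3 = boto3.client("s3", config=retry_cfg)

    def fetch_object(bucket: str, key: str) -> bytes | None:
        """Read an object, returning None instead of blocking when the region degrades."""
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except (ClientError, EndpointConnectionError) as exc:
            # Let the caller degrade gracefully (serve cached odds, pause
            # settlement) rather than hang on an unhealthy region.
            print(f"S3 read failed after retries: {exc}")
            return None

    # Hypothetical usage; bucket and key are placeholders.
    payload = fetch_object("example-odds-feed", "latest.json")

Bounded timeouts matter as much as the retries: during the outage window, calls that hung indefinitely would have tied up worker threads across every dependent service.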

Tradeoffs

The incident highlights systemic tradeoffs in cloud architecture:

  • Centralization risk: The US East region hosts a disproportionate share of critical infrastructure, making it a single point of failure for many services.
  • Dependency chains: Even services not directly hosted on AWS can be affected if they rely on third-party tools or APIs that are.
  • Recovery limitations: While AWS offers tools for multi-region failover, not all customers implement them due to cost, complexity, or performance constraints (a minimal failover sketch follows this list).
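
A common middle ground is an active-passive read path: try the primary region, and fall back to a cross-region replica only when calls fail. A minimal sketch, assuming boto3 and two hypothetical buckets kept in sync via S3 cross-region replication:

    import boto3
    from botocore.config import Config
    from botocore.exceptions import ClientError, EndpointConnectionError

    # Primary in us-east-1, replica in us-west-2; bucket names are placeholders.
    REGIONS = [
        ("us-east-1", "example-trades-primary"),
        ("us-west-2", "example-trades-replica"),
    ]

    def read_with_failover(key: str) -> bytes:
        """Try each region in order; raise only if every region fails."""
        last_exc = None
        for region, bucket in REGIONS:
            client = boto3.client("s3", config=Config(
                region_name=region,
                retries={"max_attempts": 3, "mode": "standard"},
                connect_timeout=2,
                read_timeout=5,
            ))
            try:
                return client.get_object(Bucket=bucket, Key=key)["Body"].read()
            except (ClientError, EndpointConnectionError) as exc:
                last_exc = exc  # note the failure and try the next region
        raise RuntimeError(f"all regions failed for {key}") from last_exc

The sketch also illustrates the cost side of the tradeoff: the replica doubles storage spend, and replication lag means a failover read may return slightly stale data.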

Neither FanDuel nor Coinbase reported data loss, but both acknowledged service degradation during the window. AWS stated that no physical damage occurred and that the issue was resolved through internal recovery procedures.

Takeaways

This event underscores the importance of:

  • Designing for region failure: Implementing active-passive or active-active architectures across multiple AWS regions.
  • Testing disaster recovery plans: Regular failover drills can expose gaps in redundancy setups.
  • Monitoring third-party dependencies: Organizations should assess the cloud footprint of their vendors and partners (see the probe sketch after this list).
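
Probing vendor endpoints does not require heavy tooling. A stdlib-only Python sketch, with hypothetical URLs standing in for each vendor's documented health-check endpoint:

    import time
    import urllib.request
    from urllib.error import URLError

    # Hypothetical dependency endpoints; in practice, use each vendor's
    # documented status or health-check URL.
    DEPENDENCIES = {
        "pricing-feed": "https://api.example-vendor.com/health",
        "settlement-api": "https://settle.example-partner.com/ping",
    }

    def probe(name: str, url: str, timeout: float = 2.0) -> None:
        """One round-trip check: records latency or flags the dependency as down."""
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                latency_ms = (time.monotonic() - start) * 1000
                print(f"{name}: HTTP {resp.status}, {latency_ms:.0f} ms")
        except URLError as exc:
            print(f"{name}: DOWN ({exc.reason})")

    for dep_name, dep_url in DEPENDENCIES.items():
        probe(dep_name, dep_url)

Recording latency as well as up/down status matters here, since this incident showed up as latency spikes and elevated error rates rather than a clean outage.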

AWS has not announced changes to its service-level agreements or compensation for affected customers.

Bottom line: Cloud infrastructure remains critical to financial operations, but reliance on a single provider or region introduces measurable risk. Proactive redundancy planning is no longer optional for latency-sensitive applications.
