
AWS explains why it suffered a network outage on 7th December

Robin-Leigh Chetty

13th December 2021

On 7th December, Amazon Web Services (AWS) suffered a significant network outage which knocked out several services linked to or hosted by the hyperscaler, including its own digital assistant Alexa and the third-party Disney+ streaming service.

As neither of these services is officially available in South Africa, local customers remained relatively unaffected by the outage, but it is still noteworthy as AWS is the second big tech firm to suffer a major outage in recent months, following Facebook in October.

According to AWS, which explained the cause of the outage in a detailed press statement this week, an “automated activity” is to blame.

“At 7:30 AM PST, an automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network,” it explained.

“This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks. These delays increased latency and errors for services communicating between these networks, resulting in even more connection attempts and retries. This led to persistent congestion and performance issues on the devices connecting the two networks,” added the hyperscaler.
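The feedback loop AWS describes, where errors prompt retries that add yet more load, is commonly dampened on the client side with exponential backoff and random jitter. The sketch below is purely illustrative and is not taken from the AWS statement: the function names, the default delays and the choice of `ConnectionError` as the retryable failure are all assumptions made for the example.

```python
import random
import time


def backoff_delays(max_retries, base=0.1, cap=10.0, rng=None):
    """Return one randomised wait (in seconds) per retry attempt.

    Hypothetical helper: the ceiling doubles on every attempt up to `cap`,
    and each wait is drawn uniformly from [0, ceiling] so that many clients
    do not all retry at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


def call_with_backoff(operation, max_retries=5, base=0.1, cap=10.0,
                      sleep=time.sleep, rng=None):
    """Call `operation`, retrying on ConnectionError with jittered backoff.

    Assumption for this sketch: only ConnectionError is treated as
    retryable. After `max_retries` failed retries the error is re-raised
    instead of retrying forever.
    """
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_retries:
                raise  # give up rather than keep adding load
            ceiling = min(cap, base * (2 ** attempt))
            sleep(rng.uniform(0, ceiling))
```

A client that wraps its network calls this way waits progressively longer after each failure and eventually gives up, rather than reconnecting immediately and adding to the congestion.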

The issue was so severe that, according to AWS, its ability to track the outage in real time with monitoring software was also impaired, which contributed to the lengthy downtime last week.

Either way, there appears to be no indication that data was compromised, which is often a worry whenever a network the size of Amazon’s is affected.

“We have taken several actions to prevent a recurrence of this event. We immediately disabled the scaling activities that triggered this event and will not resume them until we have deployed all remediations,” says AWS.

“We want to apologize for the impact this event caused for our customers. While we are proud of our track record of availability, we know how critical our services are to our customers, their applications and end users, and their businesses. We know this event impacted many customers in significant ways. We will do everything we can to learn from this event and use it to improve our availability even further,” it concluded.

As trusted a vendor as AWS is, this most recent incident is a reminder that companies should always have a plan B in case an issue like this arises.

About Author

Robin-Leigh Chetty

Editor of Hypertext. Covers smartphones, IoT, 5G, cloud computing and a few things in between. Also a keen photographer and dabbles in console games when not taking the hatchet to stories.