CrowdStrike explains how it bricked millions of PCs

  • CrowdStrike has published a Root Cause Analysis following an error that bricked millions of PCs in July.
  • The problem was introduced in February and went undetected until July, when a software update triggered a memory read error.
  • CrowdStrike has detailed several ways it will improve how it deploys its software into the wild.

On a Friday morning last month, CrowdStrike deployed a software update that would remind the world why software updates should never be pushed out just before the weekend.

What followed was a global outage that affected hospitals, airlines and countless other businesses that use Windows and CrowdStrike. Now, weeks after the event, CrowdStrike has published a Root Cause Analysis of the error that crashed millions of PCs.

The problem began in February, when CrowdStrike introduced a new Template Type to detect novel attack techniques that abuse Windows’ interprocess communication mechanisms.

The Template Type and the Content Validator both defined 21 input parameter fields, but only 20 input values were actually supplied to the Content Interpreter, which uses the Template Type, to be matched against. This mismatch passed through validation and testing and went undiscovered until 19th July.

On that fateful day, two additional Template Instances for Windows’ interprocess communication mechanisms were deployed, and one of them introduced criteria for matching a 21st parameter. Suddenly, CrowdStrike’s software was looking for an input that had never been supplied. The attempt to read it triggered an out-of-bounds memory read, and systems crashed.
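The mismatch described above can be illustrated with a minimal Python sketch. It is not CrowdStrike's code: the `matches` function, the `WILDCARD` marker and the sample values are all hypothetical, and it assumes that a wildcard criterion skips the comparison entirely, which is why the missing 21st input could go unnoticed. Python raises an `IndexError` where a kernel driver would perform an invalid memory read and crash the machine.

```python
WILDCARD = "*"  # hypothetical marker meaning "match anything"


def matches(criteria, inputs):
    """Return True if every non-wildcard criterion equals the input at the same index."""
    for index, expected in enumerate(criteria):
        if expected == WILDCARD:
            continue  # wildcard: the input at this index is never read
        # Unchecked read: fails if index is beyond the end of the inputs
        if inputs[index] != expected:
            return False
    return True


# Only 20 input values are supplied, although 21 fields are defined
inputs = [f"value{i}" for i in range(20)]

# Earlier instance (assumed): the 21st criterion is a wildcard, so it is skipped
old_instance = [WILDCARD] * 21
# New instance (assumed): the 21st criterion is a concrete value
new_instance = [WILDCARD] * 20 + ["pipe"]

old_result = matches(old_instance, inputs)

try:
    matches(new_instance, inputs)
    outcome = "no error"
except IndexError:
    outcome = "out-of-bounds read"

print(old_result, outcome)
```

In this sketch the first instance matches without ever touching the missing input, while the second fails the moment it tries to read index 20 of a 20-element list.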

The above is a CliffsNotes-style summary, so we highly recommend you read CrowdStrike’s full Root Cause Analysis.

“In summary, it was the confluence of these issues that resulted in a system crash: the mismatch between the 21 inputs validated by the Content Validator versus the 20 provided to the Content Interpreter, the latent out-of-bounds read issue in the Content Interpreter, and the lack of a specific test for non-wildcard matching criteria in the 21st field. While this scenario with Channel File 291 is now incapable of recurring, it also informs process improvements and mitigation steps that CrowdStrike is deploying to ensure further enhanced resilience,” explains CrowdStrike.

While CrowdStrike owns up to its own errors and failings, it also took a moment to criticise Windows and the need for companies like CrowdStrike to run software at the kernel level.

“Significant work remains for the Windows ecosystem to support a robust security product that doesn’t rely on a kernel driver for at least some of its functionality. We are committed to working directly with Microsoft on an ongoing basis as Windows continues to add more support for security product needs in userspace,” the company said.

In response to the incident, CrowdStrike says it is taking the following actions:

  • Update Content Configuration System test procedures. This work has been completed. This includes upgraded tests for Template Type development, with automated tests for all existing Template Types. Template Types are part of the sensor and contain predefined fields for threat detection engineers to leverage in Rapid Response Content.
  • Add additional deployment layers and acceptance checks for the Content Configuration System. This work has been completed with an updated deployment ring process, ensuring Template Instances pass successive deployment rings before rollout into production.
  • Provide customers additional control over the deployment of Rapid Response Content updates. New capabilities have been implemented and deployed to our cloud that allow customers to control how Rapid Response Content is deployed, with additional functionality planned for the future.
  • Prevent the creation of problematic Channel 291 files. Validation for the number of input fields has been implemented to prevent this issue from happening.
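Two of the measures above, validating the number of input fields and passing updates through successive deployment rings, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than CrowdStrike's implementation: the function names, the ring names and the health check are all hypothetical.

```python
def validate_instance(criteria, supplied_field_count):
    """Reject an instance that has criteria for more fields than are supplied."""
    if len(criteria) > supplied_field_count:
        raise ValueError(
            f"instance defines {len(criteria)} criteria but only "
            f"{supplied_field_count} input fields are supplied"
        )


def roll_out(instance, rings, is_healthy):
    """Deploy ring by ring and stop at the first ring that reports a problem."""
    reached = []
    for ring in rings:
        reached.append(ring)  # deploy to this ring
        if not is_healthy(ring, instance):
            break  # halt so later rings never receive the update
    return reached


# Hypothetical ring names, ordered from smallest to largest audience
rings = ["internal", "early-adopter", "production"]

# A 21-criteria instance is rejected when only 20 fields are supplied
try:
    validate_instance(["*"] * 21, supplied_field_count=20)
    rejected = False
except ValueError:
    rejected = True

# Simulated health check: the update misbehaves in the second ring
reached = roll_out("instance-a", rings, lambda ring, inst: ring != "early-adopter")

print(rejected, reached)
```

Here the count check stops the faulty instance before it ships at all, and the ring loop shows how a problem caught in an early ring keeps an update away from production.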

Chief executive officer George Kurtz also apologised to customers who were affected by the outage. The CEO reports that 99 percent of Windows sensors are back online, so it looks as though most PCs running CrowdStrike are operational once more.

“We are deeply sorry for the impact this had on you. Nothing is more important than regaining your trust and confidence. Since our founding, we have always put customer protection at the forefront. This has been our North Star, and it continues to be our focus every single day,” said Kurtz.

Last week, investors filed suit against CrowdStrike, claiming to have lost money as CrowdStrike’s share price fell 32 percent in just 12 days.

If anything, CrowdStrike’s analysis of the event will likely fuel further lawsuits. We can’t say we’re surprised at that.
