CrowdStrike hires outside security outfits to review troubled Falcon code

And reveals more and more about small mistake that bricked 8.5M Windows boxes

CrowdStrike has hired two outside security firms to review its threat-detection suite Falcon that sparked a global IT outage last month – though it may not have an awful lot to find, because CrowdStrike has identified the simple mistake that caused the meltdown.

News of the external review emerged in a root causes analysis [PDF] published on Tuesday by the infosec vendor.

As we learned from CrowdStrike's earlier post-incident write-up of the flawed Falcon update, which boot-looped millions of Windows machines worldwide, the problem began back in February.

That was when the developer added to Falcon the ability to spot and block the novel exploitation of named pipes and other Windows interprocess communication (IPC) mechanisms; seeing such attacks occur in the wild is a strong indication that a box has been compromised, which is a good thing to flag up and stop.

That new detection functionality went through the usual development and testing before CrowdStrike pushed it as a new "template type" to customers' Falcon installations in sensor version 7.11.

These template types are as the name suggests: Templates. They are generalized software routines, each picking up a different type of potentially bad activity on a system. For Falcon to use them to detect specific threats, so-called "template instances" are defined and issued by CrowdStrike that customize the template code to identify particular forms of exploitation, intrusions, and other bad stuff.

CrowdStrike explains this architecture thus: "Template types represent a sensor capability that enables new telemetry and detection, and their runtime behavior is configured dynamically by the template instance."

Since March, CrowdStrike has pushed from its cloud to remote Falcon deployments a few template instances that made use of the IPC template type code to detect specific threats. These updates, delivered as so-called Rapid Response Content, were stored in a channel file numbered 291. Falcon would download an updated channel 291 file when made available, and parse its data.

The template instances in that data would tell Falcon how to use the relevant template types to detect particular threats. The instances would do this by passing parameters in regex format to their template type. The template types, with the help of a C++-based content interpreter, use those regex – yes, regular expression – parameters against whatever resources the types are monitoring to determine whether a successful detection was made.
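To make that split concrete, here's a minimal Python sketch of a generic template type configured by a regex-carrying template instance. Every name in it (`IPC_FIELDS`, `TemplateInstance`, `matches`, the field names, and the patterns) is our own invention for illustration; the real template type has 21 fields whose names aren't public, and the real interpreter is C++ rather than Python.

```python
import re
from dataclasses import dataclass

# Hypothetical field names for an IPC-style template type; the real
# template type defines 21 fields whose names are not public.
IPC_FIELDS = ("pipe_name", "process_image", "command_line")


@dataclass(frozen=True)
class TemplateInstance:
    """One detection rule: a regex for each field of the template type."""
    name: str
    patterns: tuple  # one regex string per entry in IPC_FIELDS


def matches(instance: TemplateInstance, event: dict) -> bool:
    """Return True only if every field of the event matches its regex."""
    return all(
        re.fullmatch(pattern, event.get(field, "")) is not None
        for field, pattern in zip(IPC_FIELDS, instance.patterns)
    )


# Illustrative instance: flag a made-up pipe name from any process.
rule = TemplateInstance(
    name="example-suspicious-pipe",
    patterns=(r"evil_.*", r".*", r".*"),
)
```

The generic matching code never changes here; only the data in `rule` does, which is roughly the appeal of shipping detections as content rather than as new sensor builds.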

The root causes analysis provides a deeper look at what went wrong next:

The new IPC template type defined 21 input parameter fields, but the integration code that invoked the content interpreter with channel file 291's template instances supplied only 20 input values to match against.

This parameter count mismatch evaded multiple layers of build validation and testing, as it was not discovered during the sensor release testing process, the template type (using a test template instance) stress testing, or the first several successful deployments of IPC template instances in the field.

In part, this was due to the use of wildcard matching criteria for the 21st input during testing and in the initial IPC template instances.

From what we can tell, this means: The template type detecting malicious IPC use had 21 possible input values to customize its actions, though the code plugging the channel file's instance parameters into the interpreter to use with that template type only provided 20. For the initial instances, this wasn't a problem as the instances didn't cause the interpreter to use the missing 21st parameter. All seemed fine. Early testing and validation also missed this.

Then, as CrowdStrike previously explained, two further IPC template instances were automatically deployed to Falcon users in that fateful channel 291 file update on July 19.

One of these instances instructed the interpreter, for the first time, to make use of the 21st parameter, but only 20 were provided to that code. That caused the content interpreter, running in Windows kernel mode unfortunately, to use an uninitialized field – the missing 21st parameter – as a pointer, which caused it to touch unallocated memory and ultimately crash the operating system.
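The failure mode can be modeled in a few lines of Python. This is a sketch under loud assumptions: the `evaluate` function, the exact-match comparison, and the idea that a wildcard criterion skips the read entirely are ours, not CrowdStrike's code. Python also raises an `IndexError` where unchecked kernel-mode C++ would simply read whatever sits past the end of the array.

```python
FIELDS_DEFINED = 21   # fields the template type declares
VALUES_SUPPLIED = 20  # values the integration code actually passed

WILDCARD = ".*"


def evaluate(criteria: list, inputs: list) -> bool:
    """Compare each non-wildcard criterion with the input at the same index.

    Assumption: wildcard criteria are skipped without reading the input,
    which is what lets a short input array go unnoticed.
    """
    for index, criterion in enumerate(criteria):
        if criterion == WILDCARD:
            continue  # the input at this index is never touched
        if inputs[index] != criterion:  # unchecked read
            return False
    return True


inputs = ["value"] * VALUES_SUPPLIED

# Early instances: wildcard in the 21st field, so index 20 is never read.
early_instance = ["value"] * (FIELDS_DEFINED - 1) + [WILDCARD]

# July 19 style instance: a real criterion in the 21st field.
later_instance = ["value"] * (FIELDS_DEFINED - 1) + ["specific"]
```

Calling `evaluate(early_instance, inputs)` returns `True` without ever touching index 20, while `evaluate(later_instance, inputs)` raises `IndexError` on the 21st read.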

"The attempt to access the 21st value produced an out-of-bounds memory read beyond the end of the input data array and resulted in a system crash," the security shop summarized in its analysis.

CrowdStrike updated its sensor content compiler to ensure that, in future, template types get the correct number of inputs from instances, and this went into production on July 27.
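A build-time check of that sort might look something like the following Python sketch. The `TemplateType` record, `compile_instance`, and `ContentCompileError` are hypothetical names of ours; CrowdStrike hasn't published its compiler, so treat this as one plausible shape for the validation rather than the actual patch.

```python
from dataclasses import dataclass


class ContentCompileError(ValueError):
    """Raised when content fails validation before release."""


@dataclass(frozen=True)
class TemplateType:
    name: str
    declared_fields: int  # fields the type definition lists
    supplied_inputs: int  # values the sensor code passes at runtime


def compile_instance(template: TemplateType, criteria: list) -> tuple:
    """Refuse to build content whose field counts disagree."""
    if template.declared_fields != template.supplied_inputs:
        raise ContentCompileError(
            f"{template.name}: declares {template.declared_fields} fields "
            f"but the sensor supplies {template.supplied_inputs}"
        )
    if len(criteria) != template.declared_fields:
        raise ContentCompileError(
            f"{template.name}: instance has {len(criteria)} criteria, "
            f"expected {template.declared_fields}"
        )
    return tuple(criteria)
```

With a check like this, a 21-field type fed by 20-value integration code is rejected when content is built, well before any instance reaches a customer machine.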

CrowdStrike also wrote that it has added runtime bounds checking to the content interpreter for Rapid Response updates, to ensure it doesn't read off the end of its input array again. This fix and another check that the array size is correct are being backported to all Windows sensor versions 7.11 and above with a sensor software hotfix. The release will be generally available by August 9.
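As a rough illustration of those two runtime checks, here's a Python sketch. The function names and the choice to fail closed (report no detection rather than crash) are our assumptions; the analysis doesn't spell out what the interpreter does when a check trips.

```python
from typing import Optional, Sequence


def read_input(inputs: Sequence[str], index: int) -> Optional[str]:
    """Return inputs[index], or None when the index is out of range."""
    if 0 <= index < len(inputs):
        return inputs[index]
    return None


def safe_evaluate(
    criteria: Sequence[str], inputs: Sequence[str], expected_count: int
) -> bool:
    """Report no detection rather than read past the input array."""
    # Array size check: the caller must supply exactly the expected count.
    if len(inputs) != expected_count:
        return False
    for index, criterion in enumerate(criteria):
        # Bounds check on every individual read.
        value = read_input(inputs, index)
        if value is None or value != criterion:
            return False
    return True
```

The size check catches a short input array up front, and the per-read bounds check covers the case where an instance carries more criteria than the array has values.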

Additionally, the chastened security vendor is doing more internal testing to ensure flawed files aren't pushed to Falcon customers in the future. CrowdStrike's validation engine missed the parameter mismatch, and allowed a faulty channel file to go out to users.

Further, as CrowdStrike had noted in its earlier analysis, every template instance will henceforth be deployed to customers in a staged roll-out, rather than being pushed to everyone at once. That will reduce the blast radius of any more broken updates.
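One common way to implement a staged roll-out is to hash each host into a stable bucket and widen the eligible range stage by stage. The Python sketch below does that; the stage percentages, the `host_bucket` and `should_receive` names, and the hashing scheme are all our assumptions, as CrowdStrike hasn't detailed how its rings are chosen.

```python
import hashlib

# Hypothetical ring sizes: cumulative share of hosts eligible per stage.
STAGES = (0.01, 0.10, 0.50, 1.00)


def host_bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1)."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # Use the first 32 bits so the division is exact in floating point.
    return int.from_bytes(digest[:4], "big") / 2**32


def should_receive(host_id: str, stage: int) -> bool:
    """True if this host is inside the rollout ring for the given stage."""
    return host_bucket(host_id) < STAGES[stage]
```

Because a host's bucket never changes, any machine that got the update at one stage stays eligible at every later stage, and a bad file stopped at stage 0 reaches roughly one percent of hosts instead of all of them.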

It's worth noting the biz is being sued by investors for not using this type of phased approach in the first place.

"Looking ahead, CrowdStrike is focused on using the lessons learned from this incident to better serve our customers," a spokesperson declared. "CrowdStrike remains steadfast in our mission to protect customers and stop breaches."

But not so steadfast that it’s naming the partners it hired to review its programming. Those reviews have commenced, and are focused on the code and processes that led to the July 19 fiasco.

"We are not providing information on the vendors who are doing work for us beyond what is referenced in the root causes analysis," the CrowdStrike spokesperson told The Register. ®
