CrowdStrike’s global outage on Friday was bad. It could have been a lot worse.
This article was featured in CNN Business’ Nightcap newsletter.
New York (CNN)
If recent events such as an assassination attempt, a new Republican vice presidential candidate, and a sitting president contracting COVID-19 in the middle of his reelection campaign weren’t enough to make you anxious about the fragility of the world order, consider this: a cybersecurity company you’ve probably never heard of made a huge mistake that showed how much of the internet can suddenly go down without warning.
If you hadn’t heard of CrowdStrike before, you won’t soon forget it: It was the company behind arguably the largest computer outage in history, triggering the kind of technology collapse its products are designed to prevent, thanks to a single bug in a routine software update.
CrowdStrike says the flawed update has been rolled back, but the problems it caused cannot be solved with the old-fashioned “turn it off, turn it back on” approach that many of us are familiar with. As my colleague Brian Fung reported, the bug that sent Windows computers into the blue screen of death can be fixed, but the fix often requires painstaking human effort.
Now might be a good time to treat your IT staff to some nice coffee and a bagel spread, because administrators will likely need to assess every affected device (which could number in the thousands, depending on your organization), reboot each one into safe mode, and then manually delete the problematic files.
“You can’t automate that,” Kevin Beaumont, a security researcher and former Microsoft threat analyst, said in a post on X. “So this is going to be very painful for CrowdStrike customers.”
Even if your business had nothing to do with CrowdStrike, the outage could have ruined your day.
Imagine a cafe that uses a third-party online reservation service, outsources delivery orders, and accepts credit and debit cards through a point-of-sale system that’s connected to a payment processor’s back-end system. You don’t have to be a CrowdStrike customer to be affected by the company’s mistakes, and Friday’s outage was frustrating in exactly that way.
There have been terrible outages before, and there will surely be more in the future, but the scale of the CrowdStrike outage highlights just how interconnected the world is through networks that most of us don’t understand and that are largely self-regulated.
“There are institutions on which we depend so much that we don’t realize how dependent we are until they fail,” said Stuart Madnick, a professor of information technology at MIT’s Sloan School of Management.
Microsoft estimated that the CrowdStrike outage affected about 8.5 million Windows devices. Airlines canceled 5,000 flights worldwide on Friday, with delays continuing through the weekend and into Monday. Hospital and government services were limited, and 911 communications were knocked out in some areas.
It’s easy to pin all the blame on a single culprit, whether CrowdStrike’s sloppy system update, airlines’ failure to build robust backup protocols, or Microsoft’s monopoly on the personal computing market, but IT experts told me there are broader systemic issues at play here.
The concentration among cybersecurity companies now “creates some big points of failure,” said Anil Khurana, executive director of the Baratta Center for International Business at Georgetown University’s McDonough School of Business. “That’s not a bad thing in itself, because the proliferation of companies makes it more difficult to diagnose [problems].”
But companies need “better models of operational redundancy and backup,” Khurana said. “Our technology platforms have a mix of legacy and modern systems, and the weakest link determines the performance of the whole system. I call this the ‘Trump of Trumps model.'”
While safeguards are currently in place, regulators around the world are failing to manage cybersecurity risks. Khurana said IT systems are indeed critical infrastructure and “should be subjected to the same rigor, testing and oversight that we see in companies like Boeing and JP Morgan.”
I asked Madnick whether the world should expect more major outages.
“It’s pretty bad as it is,” he said. “Can it get any worse? Yes, it can.”
Manually rebooting millions of devices is a tedious and time-consuming task, but Friday’s outage was ultimately a one-time mistake by the company, which responded quickly to fix it.
Bad actors looking to do serious damage could use software to “cause computers or other equipment to explode, catch fire or burst into flames, destroyed rather than simply restarted.”
Well, there’s one nightmare scenario that would make us all long to live in a cave. But before we stock up on canned goods, Madnick offers a different look at our modern predicament.
“These technologies give us a lot of advantages that are really beneficial 99 percent of the time,” he said. The important thing, he said, is to prepare for the 1 percent of the time that things go wrong.