“You don’t push updates on a Friday,” a computer scientist told a BBC reporter after an information technology failure caused a global outage that disrupted not only air travel but also hospital and emergency service systems, court proceedings, financial and banking services, and restaurants. More than 1,500 flights a day have been canceled for the past three days.
An estimated 8.5 million electronic devices were affected by a flaw in a security update deployed Friday by cybersecurity firm CrowdStrike, rendering them unusable and causing costly delays, communications outages and technical problems around the world.
The outage was caused by an incomplete update to CrowdStrike’s Falcon sensor, which is designed to catch potential communications between information technology (IT) hackers and malicious software they may have installed. “The configuration was fundamentally not ready for public release, and the errors we’re seeing are due to interactions with Microsoft products, particularly the Windows operating system,” said Yameen Huq, director of cybersecurity at the Aspen Institute. Apple and Linux systems that received the same CrowdStrike update were not affected. But when the update was rolled out to millions of Windows devices, those devices were taken offline, causing major disruption and a complicated recovery process.
As technology experts rush to bring millions of affected devices back online, is this global outage a temporary phenomenon that serves as a cautionary tale of the digital age, or will it become more commonplace as we become more digitally advanced?
Coding errors and glitches are nothing new, but what turned this mistake from a minor disruption into a continent-spanning disaster? “It’s likely a combination of three big factors,” Huq said. “The process, the human talent, and the underlying technology. … Where they’ll spend their time now is looking at which of these forces were at play here, and how big a factor.” At this point, it’s too early to tell which factor was responsible for the mistake. CrowdStrike traced the issue to a bug, or coding mistake, that it called a “logic error,” which caused the Windows system to crash. “It’s going to be really important to look at that particular process and see where in the steps we could have detected and remedied this,” Huq said.
In an ideal scenario, CrowdStrike could have fixed the earlier update’s logic error, and the widespread failures it caused, by simply rolling out a new update. And rebooting after CrowdStrike’s fix did bring some Microsoft users’ devices back online. But only some. “Many of our customers are rebooting their systems, and they’ll boot up and become operational,” CrowdStrike CEO George Kurtz said in an interview. But some unlucky users “may experience some systems not automatically recovering, so it may take some time.”
That’s the fundamental problem facing the resolution process: If a device doesn’t automatically respond to a new update, it will likely have to be fixed manually. “That’s the hard part, right? That’s the part that right now has to be done as manually as possible by IT,” Huq said. “If a blue screen appears, [which] is a typical result of this error, it’s not easy to go online and fix that problem, for example.”
CrowdStrike said its “team is fully mobilized” and “actively assisting customers,” while Microsoft also announced that it has hundreds of experts working directly with customers to resolve the issues.
Technology disruptions have impacted the economy and will continue to do so. Travel delays (more than 1,500 flights were canceled in three days) undoubtedly inconvenienced customers who had important and expensive events planned. Who will end up footing the bill? “CrowdStrike has insurance, Microsoft has insurance, the airlines have insurance,” says Betsy Cooper, director of the Aspen Institute’s Technology Policy Hub and founding executive director of the Center for Long-Term Cybersecurity at the University of California, Berkeley. That said, “It’s going to be very complicated to determine where the legal liability lies, and we’re going to see years of litigation.”
But there are also macroeconomic implications: A single update had ripple effects across geographic boundaries and across various manufacturing and service industries. The error highlighted the interconnectedness of emerging technologies and the global economy. “A single mistake by one company that works with the big tech companies can have outsized effects around the world, in part because these systems are increasingly interconnected, and a change in one system can affect many different industries and software types,” Cooper said. “I think these kinds of disruptions are inevitable in the future,” she said. “All we can really do to get ahead of them is prepare.”
If coding errors are inevitable, as Cooper suggests, how do you properly prepare for such a scenario? By compartmentalizing, Cooper said. “You want to make sure all your systems aren’t dependent on one complex piece of software,” she explained. To limit its exposure to risk, a company should use different software for financial services and data storage, for example. “If you have an issue with one system, you want to make sure the impact is limited and doesn’t ripple through the entire organization.”
But others blame the tech industry’s concentration in the hands of a few companies, saying the impact of the flawed update would not have been so widespread if there had been more viable alternatives to Microsoft and CrowdStrike. “Microsoft’s massive global outage today is the result of its software monopoly becoming a single point of failure for many parts of the global economy,” George Rakis, executive director of NextGen Competition, a tech industry group that opposes market consolidation, said in a statement. “For decades, Microsoft has pursued a strategy of vendor lock-in that has stifled the diversification of IT capabilities in the public and private sectors.”
Just as biological diversity helps prevent an entire species from being wiped out by a single disease, would it be a good thing for the tech industry to produce more diverse technological systems? “There are both costs and benefits to that,” says Huq. One advantage of having a market dominated by several large companies (rather than a bunch of smaller ones) is simply scale. Big companies have more resources, which means more time, attention, and investment in services. “They end up using software that has more oversight,” he says. But if a mistake slips through that oversight, the impact can be more widespread. “Risky behavior is going to be at a much larger scale, by definition.”