The hackathon's mission is to create a program that can scan millions of lines of open source code, identify security flaws, and fix them without human intervention. Successful competitors stand to win millions of dollars in the two-year contest, sponsored by the Defense Advanced Research Projects Agency (DARPA).
The contest is one of the clearest signals yet that the government sees flaws in open source software as one of the country’s biggest security risks, and that it believes artificial intelligence is essential to solving them.
Free, open-source programs such as the Linux operating system help run everything from websites to power plants, and while the code is not inherently inferior to the code found in proprietary programs from companies like Microsoft or Oracle, there is a shortage of skilled engineers to test it.
As a result, poorly managed free code has been at the root of some of the most expensive cybersecurity breaches in history, such as the 2017 Equifax disaster, in which the personal information of half of the U.S. population was exposed. The incident led to the largest data breach settlement in history, costing the company more than $1 billion in remediation and fines.
While humans may not be able to keep up with all the code woven into every industry sector, DARPA hopes machines can.
“The goal is to build an end-to-end ‘cyber reasoning system’ that leverages large language models to find vulnerabilities, prove that they are vulnerabilities, and patch them,” explained one of the project’s professors, Yan Shoshitaishvili of Arizona State University.
To get there, the team is wrestling with the often harsh realities behind its lofty AI ambitions: the students impose “sanity checks” to catch AI hallucinations, verify that patches actually fix the problems they’re supposed to, and have two AI systems debate the best way to fix something before a third AI picks the winner.
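In highly simplified form, that verification loop might look something like the sketch below. All names here are illustrative stand-ins, not Shellphish's actual code: the idea is simply that an AI-generated patch is accepted only if the original proof-of-vulnerability input no longer triggers the crash and the program's normal tests still pass.

```python
def crashes(program, pov_input):
    # Stand-in for running the target under a crash detector
    # with the proof-of-vulnerability (PoV) input.
    return program(pov_input) == "crash"

def passes_tests(program, test_inputs):
    # Stand-in for the project's regression test suite.
    return all(program(t) != "crash" for t in test_inputs)

def sanity_check_patch(patched_program, pov_input, test_inputs):
    """Accept an AI-generated patch only if it demonstrably fixes the
    PoV and does not break legitimate functionality."""
    if crashes(patched_program, pov_input):
        return False   # hallucinated fix: the bug is still reachable
    if not passes_tests(patched_program, test_inputs):
        return False   # the "fix" broke normal behavior
    return True

# Toy target: crashes on a magic input until patched.
buggy   = lambda data: "crash" if data == b"MAGIC" else "ok"
patched = lambda data: "ok"

bad = sanity_check_patch(buggy,   b"MAGIC", [b"hello"])
ok  = sanity_check_patch(patched, b"MAGIC", [b"hello"])
```

The point of the double check is exactly the "actionable feedback" Dresel describes below: a bare claim from the model that a patch works is never trusted on its own.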
“AI is like a 3-year-old with infinite knowledge,” says Lucas Dresel, a graduate student at the University of California, Santa Barbara and co-captain of the team. “You need to give it actionable feedback.”
Team Shellphish is one of about 40 participants in a competition called the Artificial Intelligence Cyber Challenge (AIxCC), run by DARPA, the Defense Department research agency charged with creating, and guarding against, technological surprise.
“Open source is so prevalent in critical infrastructure sectors that we want to redefine how we secure widely used and valuable code bases,” said Andrew Carney, DARPA project manager for the competition.
DARPA helped create the internet to overcome communications bottlenecks, but it became painfully clear that the net also brought with it major weaknesses.
With no built-in security, pervasive interconnectivity allows anyone, or anything, anywhere to find a way into the machines that run our modern world. Once inside, an intruder can pose as an employee or system administrator, steal national or trade secrets, and shut systems down or hold them for ransom.
Hackers are claiming more victims than ever: the number of data breaches reported to the U.S. Internet Crime Complaint Center, run by the FBI, tripled between 2021 and 2023. Attackers have reached power plants and water supplies, and criminal organizations, flush with illicit profits, are not afraid to shut down hospitals and send desperate patients elsewhere.
Open source software, whether written by students or visionary geniuses, is as pervasive as the Internet itself and, by some estimates, is embedded in 90% of commercial software.
Like all software, this software has bugs, some of which can be exploited to take control of your machine.
Several large open source projects are run by volunteer armies akin to Wikipedia’s and are generally in good health, some with maintainers whose work is subsidized as a job by large corporate users.
And then there’s everything else, including programs written as homework assignments by authors who barely remember them.
“Open source has always been ‘use at your own risk,’” says Brian Behlendorf, who started the Open Source Security Foundation after decades of maintaining the pioneering free server software Apache and other projects at the Apache Software Foundation.
“It’s not free as in speech, or even free as in beer,” he said. “It’s free as in puppy: it needs to be cared for and fed.”
Two very different incidents have recently highlighted the risks.
The first involved a vulnerability in Log4j, a small program for logging system activity that is used by thousands of software developers and installed on millions of machines.
In 2013, a user proposed adding code to Log4j, and the small team at the Apache Software Foundation that maintains it approved the change. In November 2021, a Chinese engineer noticed that the added section contained a serious design flaw that could allow a system takeover, and reported the issue to Apache.
While Apache was working on a patch to fix the problem, an unidentified researcher discovered the pending changes and developed a malicious tool to take control of computers running Log4j. Apache rushed to release the patch, setting off a race between thousands of defenders and anyone trying to exploit the flaw before it was fixed.
Many Log4j instances have yet to be patched, and on Thursday the National Security Agency and others warned that North Korean spies were compromising U.S. web servers that were still running older versions.
The White House Cyber Security Review Board concluded that only better coding and thorough audits could have stopped the spread of the Log4j flaws, and that open source efforts like Apache’s “need continued financial support and expertise.”
The Department of Homeland Security’s Cybersecurity and Infrastructure Security Agency (CISA) has offered small grants to startups and encouraged companies to open up about what’s inside their software, but these efforts have been slow to move forward.
The second incident came to light in March, when a Microsoft engineer traced a slight increase in processor usage to a recently updated open source tool for Linux. He discovered that the tool’s official maintainer had planted a backdoor, and blew the whistle in time to stop it from being built into the most popular versions of Linux.
In what security experts describe as a nightmare scenario, the pseudonymous maintainer had gained control of the project after years of contributions, helped by a secret ally who lobbied the previous maintainer to hand over the reins.
Open source security was already becoming a top priority for CISA and national security agencies when OpenAI and Microsoft unleashed ChatGPT and generative artificial intelligence on the world.
By democratizing programming, new tools have made it possible for non-programmers to create software. AI has also helped existing programmers, including criminal hackers, who can pick up tricks to exploit vulnerabilities more quickly and deliver more convincing bait, such as emails that appear to be sent from regular contacts with shared interests.
AI can also facilitate defensive activities, such as analyzing large volumes of logs to detect anomalous behavior and summarize security incidents, and it can also flag security mistakes during programming.
But finding the security holes in open source programs before attackers do is the Holy Grail for DARPA and the AIxCC participants.
DARPA ran its Cyber Grand Challenge at the Def Con hacking convention in 2016, pitting programs against one another in a “capture the flag” contest inside an artificial environment.
In this year’s competition, teams will use AI-enhanced programs to understand and improve millions of lines of real-world code.
Shellphish was one of seven teams whose papers outlining their approach were strong enough to win $1 million in funding. The semifinals at Def Con in August attracted about 40 entries; the teams that advance will each be awarded an additional $2 million and compete in the 2025 finals.
Part of Shellphish’s initial $1 million went towards an Airbnb home in Brea where the hackers stayed for three weeks in June and another two in July. More money was spent on a massive test environment using 5,000 central processing unit cores.
Shellphish is more than just a collective of hackers: despite a shifting membership, the team has deep ties to two public universities, has been around for 20 years, and its founders are still involved.
Giovanni Vigna, originally from Italy, taught computer security, including offensive and defensive techniques, at the University of California, Santa Barbara. He founded the capture-the-flag team in 2003 to generate interest and develop students’ skills. The team won the Def Con competition in 2005 and hosted it for the next four years.
As his students graduated and spread across Arizona and elsewhere, some stayed involved or got their own students involved.
Shellphish competed in the original Cyber Grand Challenge in 2016 but was eliminated before the finals.
“We had all these great tools, but we didn’t have time to integrate them,” Shoshitaishvili recalls. “So my No. 1 advice was ‘don’t get nerd-baited.’” (Nerd-baiting refers to distracting engineers with interesting problems.)
Central to this effort are tools known in the security world as “fuzzers,” which feed all kinds of data into a program to see how it responds when something unexpected happens.
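The core loop of a fuzzer can be sketched in a few lines. The toy below is a bare-bones mutation fuzzer, not any tool Shellphish actually uses: it randomly corrupts known-good inputs, feeds them to a target, and records the ones that make it crash.

```python
import random

def mutate(data: bytes) -> bytes:
    """Flip one random byte — the simplest possible mutation strategy."""
    b = bytearray(data)
    b[random.randrange(len(b))] = random.randrange(256)
    return bytes(b)

def fuzz(target, seed: bytes, rounds: int):
    """Feed mutated inputs to `target`, collecting any that crash it."""
    crashing, corpus = [], [seed]
    for _ in range(rounds):
        candidate = mutate(random.choice(corpus))
        try:
            target(candidate)
            corpus.append(candidate)     # keep inputs the target accepts
        except Exception:
            crashing.append(candidate)   # record inputs that blow up
    return crashing

# Toy target with a planted bug: dies on any input starting with b"%".
def parser(data: bytes):
    if data[:1] == b"%":
        raise RuntimeError("boom")

found = fuzz(parser, seed=b"hello", rounds=50_000)
```

Real fuzzers such as AFL and libFuzzer add coverage feedback, so mutations that reach new code paths are kept and mutated further, which is what lets them dig past the shallow bugs this naive version finds.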
Even the most sophisticated fuzzing tools have a hard time finding the subtlest flaws and deliberate backdoors, team members acknowledge. At their best, Shellphish’s program and others like it will be able to quickly find much of the low-hanging fruit and remove it before malicious hackers can exploit it.
“AI will be able to solve problems that would take humans months to solve,” Dresel said.
Under the terms of the DARPA competition, all finalists must release their programs as open source so that software vendors and consumers can run them.
Shoshitaishvili compared the expected progress to security milestones such as mandatory software updates and browser “sandboxes” that prevent web programs from escaping the browser and running elsewhere on a user’s device.
AI won’t be able to make all software safe, he says, but it will give humans a fighting chance to keep up.
After spending one final, sleepless night debugging and hastily making last-minute fixes, Shellphish submitted their program just before the 9 a.m. deadline. They’ll find out in a few weeks at the next Def Con in Las Vegas whether they’ve made it to the finals. Win or lose, their AI-assisted code will be available to others, improving security for everyone.