In response to the lawsuit, defendants including Meta, OpenAI, and Bloomberg have argued that their actions constitute fair use. The lawsuit against EleutherAI, which initially scraped and published the books, was voluntarily dismissed by the plaintiffs.
The remaining lawsuits are still in their early stages, leaving questions about permission and payment unresolved. The Pile has since been removed from its official download site, but it is still available on file-sharing services.
“Tech companies continue to exercise tyranny,” said Amy Keller, a consumer protection lawyer and partner at the law firm DiCello Levitt, who has filed lawsuits on behalf of creators whose work was allegedly taken by AI companies without their permission.
“People are concerned about the fact that they didn’t have a choice in this,” Keller said. “I wonder if that’s really the problem.”
Imitating a parrot
Many creatives feel anxious about their future.
Professional YouTubers monitor unauthorized uses of their work and regularly file takedown notices, and some worry it’s only a matter of time before AI starts producing content similar to what they create, or even outright knockoffs.
Pakman, of The David Pakman Show, saw the power of AI recently while scrolling through TikTok. He came across a video labeled as a Tucker Carlson clip and was astonished when he watched it. It sounded like Carlson, but it was word for word what Pakman had said on his YouTube show, down to the cadence. He was equally surprised that only one of the video’s commenters seemed to realize it was fake: a clone of Carlson’s voice reading Pakman’s script.
“This is going to be a problem,” Pakman said in a YouTube video he made about the fakes. “You can do this to basically anyone.”
EleutherAI co-founder Sid Black wrote on GitHub that he created the YouTube Subtitles dataset using a script. The script downloads subtitles from YouTube’s API in the same way that a YouTube viewer’s browser downloads them when playing a video. According to the GitHub documentation, Black used 495 search terms to select the videos, including “funny blogger,” “Einstein,” “black protestant,” “social protection services,” “Infowars,” “quantum chromodynamics,” “Ben Shapiro,” “Uighur,” “fruitarian,” “cake recipes,” “Nazca Lines,” and “flat Earth.”
Although YouTube’s terms of service prohibit accessing its videos by “automated means,” more than 2,000 GitHub users have bookmarked or endorsed the code.
“If that’s what YouTube wants to do, there are lots of ways to prevent this module from working,” machine learning engineer Jonas Depoix wrote in a GitHub discussion, where he published the code Black used to access YouTube subtitles. “So far, that hasn’t happened.”
In an email to Proof News, Depoix said he hadn’t used the code since he wrote it for a project as a college student several years ago and was surprised that people were finding it useful. He did not respond to questions about YouTube’s rules.
In an emailed response to a request for comment, Google spokesman Jack Malon said the company has taken “steps to prevent abusive, unauthorized scraping for many years.” He did not respond to questions about whether other companies use the material as training data.
Among the videos used by AI companies are those of Einstein Parrot, a channel with about 150,000 subscribers. Marcia, the parrot’s caretaker, who didn’t want to use her last name for fear of endangering the famous bird’s safety, said she initially thought it was funny when she learned that AI models had ingested the mimicking parrot’s words.
“Who wants to use the voice of a parrot?” Marcia said. “But I know that parrots are very good at speaking. They speak in my voice. So the parrot is speaking in my voice. And the AI is imitating the parrot’s voice.”
Once data has been ingested by an AI, it cannot be unlearned. Marcia was worried that her bird’s information could be used in a variety of unknown ways, such as to create a digital replica of her parrot or to make it curse.
“We’re entering uncharted territory,” Marcia said.