Apple, NVIDIA, and Anthropic reportedly used YouTube transcripts without permission to train AI models

Some of the world’s largest technology companies trained AI models on a dataset that included transcripts of more than 173,000 YouTube videos, used without permission, according to an investigation by Proof News. The dataset, created by the nonprofit EleutherAI, contained transcripts of videos from more than 48,000 channels and was used by companies including Apple, NVIDIA, and Anthropic. The findings highlight an uncomfortable truth about AI: the technology is largely built on data siphoned from creators without their consent or compensation.

The dataset does not include any YouTube videos or images, but it does include transcripts of videos from some of the platform’s biggest creators, such as Marques Brownlee and MrBeast, as well as from major news publishers such as The New York Times, the BBC, and ABC News. Subtitles from Engadget videos are also part of the dataset.

“Apple sources data for its AI from multiple companies,” Brownlee wrote on X. “One of them scraped a ton of data and transcripts from YouTube videos, including mine,” he added. “But this will be an evolving issue for a long time.”

Apple sources data for its AI from multiple companies

One of them scraped a ton of data and transcripts from YouTube videos, including mine.

Apple technically avoids “fault” here because it isn’t the one scraping.

But this will be an evolving issue for a long time https://t.co/U93riaeSlY

— Marques Brownlee (@MKBHD) July 16, 2024

A Google spokesperson told Engadget that earlier comments from YouTube CEO Neal Mohan, that companies using YouTube data to train AI models would violate the platform’s terms of service, still stand. Apple, NVIDIA, Anthropic, and EleutherAI did not respond to Engadget’s requests for comment.

AI companies are generally not transparent about the data they use to train their models. Earlier this month, artists and photographers criticized Apple for not disclosing the sources of the training data for Apple Intelligence, the company’s own generative AI features, which are set to ship on millions of Apple devices this year.

YouTube, the world’s largest video repository, is a treasure trove of audio, video, and images, as well as transcripts, which makes it an attractive source of training data for AI models. Earlier this year, The Wall Street Journal asked OpenAI’s Chief Technology Officer Mira Murati whether the company had used YouTube videos to train Sora, its upcoming AI video-generation tool. “I won’t go into the details of the data used, but it was publicly available or licensed data,” Murati said at the time. Alphabet CEO Sundar Pichai has also said that companies using YouTube data to train AI models would be violating the platform’s terms of service.

To check whether the dataset includes subtitles from your own YouTube videos or your favorite channels, use Proof News’ search tool.

Update, July 16, 2024 3:17 PM PST: This story has been updated to add a statement from Google.
