Microsoft’s research team has unveiled VALL-E 2, a new AI speech synthesis system capable of generating “human-level” voices that are indistinguishable from the source speaker, from just a few seconds of audio.
“(VALL-E 2 is) the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech (TTS), achieving human parity for the first time,” the research paper reads. The system builds on its predecessor, VALL-E, which was introduced in early 2023. Neural codec language models represent speech as sequences of discrete codes, letting a language model predict audio tokens much the way text models predict words.
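Under the hood, that means audio is first compressed into discrete tokens by a neural codec; the VALL-E line of work uses Meta’s open-source EnCodec for this step. Below is a minimal sketch, written against the published `encodec` Python package, of how a short prompt recording becomes the token sequence such a model conditions on. The file name is a placeholder, and VALL-E 2 itself is not publicly available.

```python
# Sketch: turning raw audio into the discrete codec tokens a neural codec
# language model operates on. Uses Meta's open-source `encodec` package;
# "prompt.wav" is a placeholder path.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()   # pretrained 24 kHz codec
model.set_target_bandwidth(6.0)              # 6 kbps -> 8 codebooks per frame

wav, sr = torchaudio.load("prompt.wav")      # a few seconds of the speaker
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))  # list of (codes, scale) tuples

codes = torch.cat([c for c, _ in frames], dim=-1)
print(codes.shape)  # (batch, n_codebooks, n_frames): speech as token IDs
```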
According to the team, what sets VALL-E 2 apart from other voice cloning techniques is its “Repetition Aware Sampling” decoding method, which adaptively switches between sampling strategies. These changes improve stability and address the most common failure modes of earlier generative voice systems, such as getting stuck in loops on repeated words.
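The paper describes the idea as a per-step decoding rule: draw a token by nucleus (top-p) sampling, and if that token has been repeating too often in a recent window, fall back to plain random sampling from the full distribution. Here is a minimal sketch of that rule; the function name, window size, and threshold defaults are illustrative stand-ins, not the paper’s exact hyperparameters.

```python
# Sketch of the repetition-aware sampling rule described in the VALL-E 2
# paper. Default values (top_p, window, max_ratio) are illustrative.
import numpy as np

def repetition_aware_sample(probs, history, top_p=0.9, window=16,
                            max_ratio=0.5, rng=None):
    """Pick the next codec token given the model's distribution `probs`
    (assumed to sum to 1) and the tokens decoded so far (`history`)."""
    rng = rng or np.random.default_rng()

    # Step 1: default decoding is nucleus (top-p) sampling.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    nucleus = order[:cutoff]
    token = int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))

    # Step 2: if the sampled token already dominates the recent window,
    # adaptively switch to random sampling from the full distribution,
    # breaking the repetition loops that plague autoregressive decoders.
    recent = list(history[-window:])
    if recent and recent.count(token) / len(recent) > max_ratio:
        token = int(rng.choice(len(probs), p=probs))

    return token
```

In a full decoder, a function like this would be called once per autoregressive step of the codec language model, with `probs` coming from the model’s output at that step.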
“VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally difficult due to their complexity or repetitive phrases,” the researchers wrote, noting that the technology could help generate speech for people who are losing the ability to speak.
As impressive as it is, the tool will not be made available to the public.
“We currently have no plans to integrate VALL-E 2 into a product or broaden public access,” Microsoft said in its ethics statement, noting that such tools carry risks such as impersonating voices without consent and using convincing AI voices in scams and other criminal activities.
The research team stressed the need for a standard way to digitally label AI-generated audio, acknowledging that detecting such content with high accuracy remains a challenge.
“If the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves of the use of his or her voice and a model for detecting synthesized speech,” they wrote.
That being said, VALL-E 2’s results stand out against other tools. In the research team’s evaluations, its generated speech surpassed human benchmarks for robustness, naturalness, and speaker similarity.
VALL-E 2 was able to achieve these results with just three seconds of audio, though the research team noted that “using 10-second speech samples resulted in even better quality.”
Microsoft isn’t the only AI company that has showcased cutting-edge AI models without publishing them. Meta’s Voicebox and OpenAI’s Voice Engine are two impressive voice cloners facing similar restrictions.
“There are many interesting use cases for generative speech models, but due to the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time,” a Meta AI spokesperson told Decrypt last year.
OpenAI, likewise, said it wants to address safety concerns before releasing its synthetic voice model more broadly.
“Consistent with our approach to AI safety and our voluntary commitments, we are choosing to preview this technology, but not to release it broadly at this time,” OpenAI explained in an official blog post.
This call for ethical guidelines is spreading throughout the AI community, especially as regulators begin to worry about the impact of generative AI on our daily lives.
Edited by Ryan Ozawa.