Kyutai, a French AI company, has developed a new AI-powered chatbot called “Moshi” that offers similar features to ChatGPT’s now-delayed GPT-4o “advanced voice mode.” Moshi can understand the tone of your voice and interpret it. It can also be used offline.
Built on Helium, a 7-billion-parameter large language model (LLM), the chatbot is currently available to everyone and can speak with different accents and in 70 different emotional and speaking styles. Moshi can also handle two audio streams simultaneously, meaning it can listen and speak at the same time.
The AI chatbot, named after the Japanese way of answering a phone call, has a response time of just 200 milliseconds, making it faster than GPT-4o’s advanced voice mode, which typically takes between 232 and 320 milliseconds.
Kyutai says its goal was to teach Moshi different nuances and tones of human conversations. To improve the voice quality, the company even collaborated with a professional voice artist.
However, unlike GPT-4o, Moshi is quite small and was developed from scratch in six months by a team of just eight researchers. It was reportedly trained on 100,000 synthetic dialogues generated with text-to-speech technology.
Kyutai says its goal is to make the chatbot an open-source project, meaning the model’s code and framework would be available to everyone, so that users can use the chatbot safely without having to worry about privacy. While Moshi is faster than GPT-4o, the company says it is a research prototype meant to showcase the bot’s response time and its ability to reproduce not just sentences but also tones and voices.
Kyutai is also working on an AI-based audio identification, watermarking, and signature tracking system that will eventually be integrated into Moshi. And while Moshi may not be the ChatGPT competitor we were hoping for, it’s certainly a big step in developing open-source models that can work offline.
© IE Online Media Services Pvt Ltd
First posted on: 06-07-2024 at 16:53 IST