OpenAI’s delay in launching ChatGPT’s impressive voice mode has upset many fans of the AI chatbot, but the company may have been beaten to the punch. French AI developer Kyutai has unveiled a real-time AI voice assistant called Moshi.
Moshi is designed to hold realistic voice conversations with users, like Alexa or Google Assistant, but it is powered by the same kind of large language model that underlies ChatGPT and its rivals, in this case the Helium 7B model. According to Kyutai, Moshi can speak with different accents and has 70 different emotional and speaking styles. The AI can even handle two audio streams at once, allowing Moshi to listen and speak simultaneously.
Kyutai’s development of Moshi involved fine-tuning on more than 100,000 synthetic dialogues made using text-to-speech (TTS) technology. The goal was to help Moshi learn the nuances and tones of human communication. The company even collaborated with a professional voice artist to improve the quality of Moshi’s voice.
The assistant is trained on both text and audio and is optimized for multiple backends, meaning it can run on devices like laptops without needing to connect to the cloud. The company pitches this as a way to preserve privacy and security by preventing sensitive data from being transmitted over the internet. You can see a demo of Moshi here.
Open discussion
Kyutai announced that Moshi would be an open-source project, including the model’s code and framework, providing a foundation for future innovations. The open-source approach could also help mitigate the security and ethics complaints that large AI companies face about their closed-source models. Kyutai’s backers, including French billionaire Xavier Niel, support the open-source approach.
Kyutai is also working on AI audio identification, watermarking, and signature tracking systems that will be integrated into Moshi. These features will identify AI-generated audio files, promoting accountability and traceability while ensuring that AI-generated content can be monitored and verified.
Moshi is still in development, but the voice mode shown in the presentation is impressive. If Moshi grows and becomes popular, its approach could serve as a catalyst for voice-activated versions of ChatGPT rivals or accelerate the addition of LLMs to Alexa and other voice assistants.
If you’d like to try Moshi, a demo is available online, and you can also sign up for early access to the full chatbot.