A new, open source text-to-speech model called Dia has arrived to challenge ElevenLabs, OpenAI and more
Summary
Start-up Nari Labs, comprising just two engineers, has developed Dia, a 1.6-billion-parameter text-to-speech (TTS) model designed to produce dialogue that closely resembles natural speech.
Accessible to anyone via GitHub or Hugging Face, the model supports a range of nuanced controls, including emotional tone, speaker tagging and nonverbal audio cues, all from a plain text prompt.
According to its creators, it rivals the podcast feature of Google’s NotebookLM and surpasses the quality of offerings from the likes of ElevenLabs and Sesame.
Dia is released under the open source Apache 2.0 licence, and its use for impersonation, misinformation or unlawful activities is prohibited.
The developers encourage responsible experimentation and have created a consumer version aimed at non-technical users who want to remix or share content.