Summary

  • Start-up Nari Labs, comprising just two engineers, has developed Dia, a 1.6-billion-parameter text-to-speech (TTS) model designed to produce dialogue that closely resembles natural speech.
  • Accessible to anyone via GitHub or Hugging Face, the model supports a range of nuanced controls, including emotional tone, speaker tagging and nonverbal audio cues, all from a plain text prompt.
  • According to its creators, it rivals the podcast feature of Google’s NotebookLM and surpasses the quality of competing offerings from the likes of ElevenLabs and Sesame.
  • Dia is released under the open-source Apache 2.0 licence, and its use for impersonation, misinformation or unlawful activities is prohibited.
  • The developers encourage responsible experimentation and have created a consumer version aimed at non-technical users who want to remix or share content.
  • Dia can be accessed here: https://github.

By Carl Franzen
