Summary

  • New York-based startup Hume AI has created an AI voice model called Octave, which can produce lifelike, emotionally nuanced speech for use in various forms of content, from audiobooks to video game dialogue.
  • Octave is a combined large language and speech model trained on text, speech and emotion tokens, so it can understand words in context and adjust tone, rhythm and cadence accordingly.
  • Users can adjust the voice at the sentence level with text prompts, and the model can interpret character traits and style from a script, adjusting vocal inflections to match implied emotions.
  • Hume AI is charging roughly half what competitors do for usage of Octave, which is designed to produce offline text-to-speech content.
  • In a benchmark study carried out by Hume AI, the model was preferred for audio quality, naturalness and how well the speech matched descriptions of the desired voice.

By Carl Franzen