ElevenLabs’ new speech-to-text model Scribe launches with the highest accuracy rate so far (96.7% for English)
Summary
ElevenLabs, the AI voice cloning and generation start-up founded by Palantir alumni, has released Scribe v1, a speech-to-text model that boasts the highest accuracy across multiple languages.
The model outperforms offerings from Google, OpenAI and Deepgram in accuracy, achieving record-low error rates, and can transcribe and distinguish between up to 32 speakers in the same audio file.
Scribe is currently available through the ElevenLabs website and API. Pricing is set at $0.40 per hour of audio, with a 50% discount for the next six weeks.
The start-up has also unveiled plans for a low-latency version aimed at real-time applications, including communication tools.
The launch of Scribe alongside Octave, a text-to-speech model from rival Hume AI that works in the opposite direction, highlights the growing competition in AI-driven audio models.
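To make the quoted pricing concrete, here is a minimal sketch of the arithmetic in Python. It uses only the two figures given in the summary ($0.40 per hour of audio and the 50% launch discount). The function name `transcription_cost` is hypothetical, and the sketch assumes simple per-hour billing with no plan tiers, minimum charges or taxes, none of which the summary describes.

```python
from decimal import Decimal

# Figures quoted in the summary; everything else here is an assumption.
PRICE_PER_HOUR = Decimal("0.40")   # USD per hour of audio
LAUNCH_DISCOUNT = Decimal("0.50")  # 50% off during the launch window


def transcription_cost(audio_hours, discounted=False):
    """Estimate the cost in USD of transcribing `audio_hours` of audio.

    Hypothetical helper: assumes flat per-hour billing and rounds the
    result to whole cents.
    """
    rate = PRICE_PER_HOUR
    if discounted:
        rate = PRICE_PER_HOUR * (1 - LAUNCH_DISCOUNT)
    # Convert via str() so float inputs such as 2.5 stay exact.
    return (Decimal(str(audio_hours)) * rate).quantize(Decimal("0.01"))
```

Under these assumptions, 100 hours of audio would come to $40.00 at the standard rate and $20.00 during the discount period.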