A new AI translation system for headphones clones multiple voices simultaneously
Summary
A new AI system called Spatial Speech Translation enables headphones to translate multiple speakers’ words into the wearer’s language in real time.
The system works with off-the-shelf noise-cancelling headphones; it identifies individual speakers by dividing the space around the wearer into sections and using a neural network to pinpoint the direction each voice is coming from (a conceptual sketch follows below).
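The article gives no implementation details, so the following is only a minimal, hypothetical sketch of the sector-based localization idea: the space around the wearer is split into azimuth sections, and a small neural network scores which sections currently contain active speech. Every name, shape and constant here (NUM_SECTORS, NUM_MICS, FEATURE_DIM, SectorLocalizer) is an assumption for illustration, not the researchers' actual architecture.

```python
# Hypothetical sketch of sector-based speaker localization (not the system's code).
# Idea: divide the surrounding area into azimuth sectors and let a small neural
# network estimate which sectors currently contain active speech.
import torch
import torch.nn as nn

NUM_SECTORS = 12      # assumption: 360 degrees split into 30-degree sections
NUM_MICS = 2          # assumption: stereo mics on noise-cancelling headphones
FEATURE_DIM = 257     # assumption: per-channel spectral features per frame

class SectorLocalizer(nn.Module):
    """Toy network: multichannel spectral frame -> per-sector speech probability."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_MICS * FEATURE_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, NUM_SECTORS),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        # frame: (batch, NUM_MICS * FEATURE_DIM)
        return torch.sigmoid(self.net(frame))  # independent probability per sector

if __name__ == "__main__":
    model = SectorLocalizer()
    dummy_frame = torch.randn(1, NUM_MICS * FEATURE_DIM)  # stand-in for real features
    sector_probs = model(dummy_frame)
    active = (sector_probs > 0.5).nonzero(as_tuple=True)[1].tolist()
    print("sectors with likely active speakers:", active)
```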
It then translates the speech using models trained on publicly available data sets and clones each speaker’s vocal tone and characteristics, so the translated audio sounds as if it came from the original speaker.
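Conceptually, the per-speaker chain is speech recognition, text translation, then speech synthesis conditioned on the original speaker's voice. The sketch below strings those stages together with placeholder stubs; every function name (transcribe, translate, synthesize, translate_speaker) is hypothetical and stands in for whatever models the researchers actually use.

```python
# Hypothetical per-speaker translation pipeline (placeholder stubs, not a real API):
# recognize -> translate -> synthesize in the original speaker's cloned voice.
from dataclasses import dataclass

@dataclass
class SpeakerAudio:
    speaker_id: int
    samples: list[float]  # separated audio for one localized speaker

def transcribe(audio: SpeakerAudio) -> str:
    """Stub: speech-to-text in the source language."""
    return "hola, ¿cómo estás?"  # dummy output for illustration

def translate(text: str, target_lang: str = "en") -> str:
    """Stub: text translation trained on public data sets."""
    return "hello, how are you?"  # dummy output for illustration

def synthesize(text: str, voice_profile: SpeakerAudio) -> list[float]:
    """Stub: text-to-speech conditioned on the speaker's voice characteristics."""
    return [0.0] * 16000  # dummy one-second waveform

def translate_speaker(audio: SpeakerAudio) -> list[float]:
    source_text = transcribe(audio)
    target_text = translate(source_text)
    return synthesize(target_text, voice_profile=audio)  # keeps the speaker's voice

if __name__ == "__main__":
    out = translate_speaker(SpeakerAudio(speaker_id=0, samples=[0.0] * 16000))
    print(f"generated {len(out)} translated samples in the speaker's cloned voice")
```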
The technology is a significant improvement on existing systems, which handle only a single speaker, do not run in real time and deliver the translation in a robotic, automated voice.
The researchers’ next step is to reduce the delay between a person speaking and the AI producing the translated audio.