Developers: ElevenLabs
Release date: February 2025
Industries: Information Technology
History
2025: Product Announcement
At the end of February 2025, ElevenLabs introduced Scribe v1, an artificial intelligence model designed to convert speech into text. According to the developers, the neural network delivers very high accuracy and outperforms many comparable models on this metric.
Scribe v1 supports 99 languages, including Russian. The highest accuracy tier, with an error rate below 5%, is reached for 25 languages, including English (declared accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish and Vietnamese. The remaining languages are grouped into high (5% to 10% errors), good (10% to 25% errors) and moderate (25% to 50% errors) accuracy tiers. ElevenLabs says that Scribe v1 outperformed Google's Gemini 2.0 Flash and OpenAI's Whisper Large V3 in multiple languages on the FLEURS and Common Voice benchmarks.
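The error rates quoted above are most naturally read as word error rates (WER), the usual metric on benchmarks such as FLEURS and Common Voice, although the announcement as summarized here does not name the metric. The following is a minimal sketch of how WER is typically computed, assuming simple whitespace tokenization and lowercasing; the function name `word_error_rate` is illustrative and is not part of any ElevenLabs tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace; real benchmarks usually apply heavier normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    # One substituted word out of six reference words.
    print(f"WER = {wer:.3f}; below the 5% threshold: {wer < 0.05}")
```

Under this reading, a language falls into the top tier when the average WER over the test set stays below 0.05, that is, fewer than one word in twenty is substituted, dropped or inserted.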
"Scribe v1 does not just convert speech into text; it understands the audio stream. The model can detect nonverbal events such as laughter, sound effects, music and background noise, and it analyzes long audio contexts to diarize accurately (that is, to separate speakers) even in the most difficult conditions," says Flavio Schneider, lead researcher at ElevenLabs.
The Scribe v1 model can add word-level timestamps for subtitles and automatically label audio events such as audience laughter. ElevenLabs claims that the model can distinguish up to 32 speakers in a single audio stream. The company also lets customers transcribe video content directly. Developers can access Scribe v1 through an application programming interface (API). [1]
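To illustrate why word-level timestamps matter for subtitles, here is a minimal sketch that groups timestamped words into SubRip (SRT) cues. It does not call the ElevenLabs API. The input shape, a list of dictionaries with `text`, `start` and `end` keys measured in seconds, is an assumption made for this example, and the helper names `format_timestamp` and `words_to_srt` are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues.

    Assumption: each item looks like {"text": str, "start": float,
    "end": float}; this is an illustrative shape, not a documented schema.
    """
    cues = []
    for number, i in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[i:i + max_words]
        start = format_timestamp(chunk[0]["start"])   # first word opens the cue
        end = format_timestamp(chunk[-1]["end"])      # last word closes it
        text = " ".join(w["text"] for w in chunk)
        cues.append(f"{number}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [
        {"text": "Hello", "start": 0.0, "end": 0.4},
        {"text": "world", "start": 0.5, "end": 0.9},
        {"text": "again", "start": 1.0, "end": 1.25},
    ]
    print(words_to_srt(sample, max_words=2))
```

Splitting on a fixed word count keeps the sketch short; a production subtitler would more likely break cues at pauses, punctuation or speaker changes.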