Developers: | Soft Systems Bank (BSS) |
Last Release Date: | 2024/02/21 |
Technology: | Speech technology |
The main articles are:
- Speech Recognition (Technology, Market)
- Speech technology: On the path from recognition to understanding
- Speech synthesis
2024: Optimizing Voice Assistant Creation
BSS has updated TTS, its software solution for speech synthesis. Creating a new voice now requires only one hour of recordings by a voice actor instead of the 20 hours previously needed. BSS announced this on February 21, 2024.
TTS (Text To Speech) technology makes it possible to recreate (synthesize) the voice of a particular person for a virtual assistant, for example the voice of a voice actor or of a well-known personality who is part of the company's brand image.
A unique voice is usually created with a hybrid TTS approach, which combines recordings of a real person with synthesized speech. The voice actor records the static lines, which preserves the naturalness and rich intonation of live speech, while synthesis is used for variable content such as dates, surnames, tariffs and addresses. This approach allows the voice assistant to convey emotion and intonation correctly in conversation with a client.
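The hybrid approach can be pictured as a planning step that splits each assistant reply into recorded and synthesized segments. The sketch below is a minimal illustration in Python under that assumption, not BSS's implementation: the template format, the `Segment` type, the `plan_utterance` function and clip ids such as `greeting_hello` are all hypothetical, and audio playback and the actual synthesis call are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    kind: str     # "recorded" (pre-recorded studio clip) or "synth" (generated by TTS)
    payload: str  # clip id for recorded segments, text to speak for synthesized ones


def plan_utterance(template: List[Tuple[str, str]], values: Dict[str, str]) -> List[Segment]:
    """Turn a reply template into an ordered list of recorded and synthesized segments.

    Hypothetical template format: ("static", clip_id) for lines voiced by the actor,
    ("slot", name) for variables such as dates or surnames that must be synthesized.
    """
    plan: List[Segment] = []
    for part_type, content in template:
        if part_type == "static":
            plan.append(Segment("recorded", content))
        elif part_type == "slot":
            plan.append(Segment("synth", values[content]))
        else:
            raise ValueError(f"unknown template part: {part_type}")
    return plan


# Example reply: static greeting and wording, variable surname and date.
template = [
    ("static", "greeting_hello"),
    ("slot", "surname"),
    ("static", "payment_due_on"),
    ("slot", "date"),
]
plan = plan_utterance(template, {"surname": "Ivanova", "date": "March 5"})
for seg in plan:
    print(seg.kind, seg.payload)
```

In a real system, recorded segments would presumably be fetched from the studio recordings and synthesized segments sent to a TTS model trained on the same voice, so that both parts sound alike when played back in sequence.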
Previously, creating a unique voice required 15 to 20 hours of audio material. For reference, 1 to 2 hours of material can be recorded in one studio day. As a result, development usually took longer, because the schedules of the studio and the voice actor had to be coordinated with the project timeline. Customers became dependent on the voice actor's availability: 10-15 free days had to be found in the actor's schedule to record the material.
Saving customers time and reducing their costs were the BSS team's main goals for the updated version, so the team prepared a major update that changes the underlying training technology. The new version of TTS requires 1-2 hours of audio to create a unique voice, which is just one working day in the studio.
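The studio-time figures above can be checked with simple arithmetic: the number of studio days equals the hours of audio needed divided by the hours recorded per day, rounded up. The `studio_days` helper below is a hypothetical illustration that uses only the ranges quoted in this article.

```python
import math


def studio_days(audio_hours: float, hours_per_day: float) -> int:
    """Studio days needed to record `audio_hours` of material at `hours_per_day` per day."""
    return math.ceil(audio_hours / hours_per_day)


# Old approach: 15-20 hours of audio, 1-2 hours recorded per studio day.
old_best = studio_days(15, 2)   # smallest corpus, most productive days
old_worst = studio_days(20, 1)  # largest corpus, least productive days

# New approach: 1-2 hours of audio.
new_best = studio_days(1, 2)
new_worst = studio_days(2, 1)

print(old_best, old_worst, new_best, new_worst)  # 8 20 1 2
```

For the old approach this gives 8 to 20 studio days, a range that contains the 10-15 days mentioned above. For the new one, 1-2 hours of audio fits into a single studio day at 2 hours per day, or takes at most 2 days at 1 hour per day.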
BSS developers sped up the creation of a unique voice while preserving recording quality, which also helps customers reduce costs.
"Customers like how natural the speech sounds when static phrases recorded by the voice actor are combined with synthesized dynamic fragments in the same voice. A growing number of companies are choosing this approach when introducing voice assistants. As of February 2024, we are running several projects that use hybrid TTS trained on one hour of audio, and we expect this to become widespread. At the same time, we are working to improve recording quality and the emotional expressiveness of synthesized speech," said Alexander Krushinsky, director of the voice digital technologies department at BSS.