Base system (platform): | Artificial intelligence (AI) |
Developers: | SalyutDevices (formerly SberDevices) |
Initial release date: | 2021/05/14 |
Last Release Date: | 2023/06/07 |
Technology: | Speech technology |
Main article: Speech technology: on the path from recognition to understanding
2023: Full Access for All
On June 7, 2023, Sberbank announced that it had opened full access to its SaluteSpeech speech synthesis and recognition platform to everyone. Previously, the platform was available for commercial use only to legal entities, while private users could use it only for non-commercial projects and in a limited format.
The Freemium tariff for individuals lets several categories of users solve tasks related to speech generation and audio transcription. Using the application programming interface (API), authors can create audiobooks, and bloggers can produce voice content or, conversely, turn audio recordings into text, for example to quickly transcribe interviews.
Small and medium-sized businesses can use the new tariff to create mobile and web applications, make pre-recordings for telephony, add voice-over to presentation materials, transcribe audio recordings of meetings and much more. Large businesses can quickly and easily test the SaluteSpeech platform before deciding whether to implement speech technologies further in their business processes.
{{quote 'author = Denis Afanasyev, Director of the B2B products division of Sberbank Salute | Understanding the value of speech technology and the demand for it, we set out to give people quick and easy access to the SaluteSpeech platform. Now all users can implement any project, including commercial ones, involving speech transcription and generation in the shortest possible time.}}
With the Freemium tariff, users get 100 minutes of audio recognition and 200,000 characters of speech generation per month. The limit resets each month. If a user exhausts the basic limit before the month ends, they can buy additional packages and continue working with the SaluteSpeech platform. In that case, 1,000 minutes of speech recognition cost 1,200 rubles, and speech generation for 1,000,000 characters costs 1,000 rubles.
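The tariff arithmetic above can be sketched in a few lines of Python. This is only an illustrative estimate: the function and constant names are made up for the example, and it assumes that additional packages are sold only whole and that any remainder of a package is simply unused, which the announcement does not spell out.

```python
import math

# Figures quoted in the announcement.
FREE_MINUTES = 100          # free speech recognition per month
FREE_CHARS = 200_000        # free speech generation per month
ASR_PACK_MINUTES = 1000     # one paid recognition package
ASR_PACK_RUB = 1200
TTS_PACK_CHARS = 1_000_000  # one paid generation package
TTS_PACK_RUB = 1000


def monthly_cost_rub(minutes_used: float, chars_used: int) -> int:
    """Estimate the monthly bill in rubles.

    Assumption (not stated in the source): packages are bought whole.
    """
    extra_minutes = max(0, minutes_used - FREE_MINUTES)
    extra_chars = max(0, chars_used - FREE_CHARS)
    asr_packs = math.ceil(extra_minutes / ASR_PACK_MINUTES)
    tts_packs = math.ceil(extra_chars / TTS_PACK_CHARS)
    return asr_packs * ASR_PACK_RUB + tts_packs * TTS_PACK_RUB
```

Under these assumptions, a user who transcribes 350 minutes in a month but stays within the free generation limit would need one recognition package, so `monthly_cost_rub(350, 200_000)` returns 1200.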
The promotion and distribution of the solution is carried out by the partner company SberDevices (SalyutDevices LLC).
2022
Public access to SaluteSpeech for non-commercial projects
The SberDevices team has opened public access to its SaluteSpeech speech synthesis and recognition platform for non-commercial projects. Previously, only legal entities and individual entrepreneurs could access the platform. Non-commercial use of the platform's speech technologies requires only registration on the portal. Sberbank announced this on November 29, 2022. Open access to the SaluteSpeech platform will help users solve many tasks quickly and efficiently: students can transcribe lectures, bloggers can voice videos, and authors can create audiobooks.
There are also scenarios for using the public version for business. Medium and small businesses can voice presentation materials or transcribe audio recordings of meetings or interviews for subsequent work with text. Big business gets the opportunity to quickly test Sberbank's technology before deciding on further cooperation.
"The team often speaks to students at events and tells them about speech technologies; now they can fully test these technologies and create their first projects, which, I am sure, will grow into something big and important in the future. It is also an excellent tool for people who have an everyday need to convert text to speech and vice versa. I note that there are many of them. And, of course, another important goal of this project is to give businesses the opportunity to test the SaluteSpeech platform. Thanks to free access, anyone can connect to the platform at a convenient moment and assess the quality of its technologies."
Public access to the platform is available only for non-commercial use. Speech recognition in this format is limited to 100 minutes per month for individuals. For reference, one lecture lasts 60 minutes, and a business meeting lasts 40 to 60 minutes on average. You can synthesize up to 200,000 characters of text per month, enough to create a small audiobook of about 100 pages.
Access to SaluteSpeech for commercial projects removes these limits. When working with the cloud version of the platform, the client pays per second of speech recognition and per character of speech generation. If the customer deploys the system on-premises, the customer pays for the number of licenses.
The SaluteSpeech platform lets you add voice technologies to an interactive voice menu (IVR), answering machine, chats, telemarketing campaigns and voice interaction interfaces, and use them to voice content and commands or to enable voice input on devices and sites. The platform's services recognize and synthesize speech and can also use hints that help interpret the user as accurately as possible in a specific situation. Speech recognition and synthesis technologies can be used together or separately.
Integration with VS Robotics Voice Analytics
The speech recognition technology of the SaluteSpeech platform, developed by the SberDevices team, has been integrated with the voice analytics system from VS Robotics, which improves the efficiency of work with customers and increases business sales. Sberbank announced this on November 22, 2022.
SDK for connecting SmartSpeech technologies in AR and VR applications
The SberDevices team has created a software development kit (SDK) that lets developers of augmented and virtual reality applications add high-quality voice control in Russian. The solution connects SmartSpeech speech recognition and generation technologies from SberDevices to applications, including those that work with VR headsets, without additional development.
SmartSpeech YourVoice service - creating the company's own voice
On June 9, 2022, the SberDevices team presented SmartSpeech YourVoice, a service that lets you quickly voice non-standard text of any length and complexity, taking intonation and pauses into account, and use the synthesized speech in telephony, on a website, in a mobile application and in other customer communication channels.
For companies that want a unique voice that consumers associate with their brand, the development team offers a regularly updated catalog of ready-made voices. A company can buy out a particular voice, after which it is removed from the catalog and becomes unavailable to other customers.
In addition, with SmartSpeech YourVoice you can create your own voice within one month, based on a specific announcer, a well-known personality or, for example, the company's CEO. This requires a four-hour recording of the speaker whose voice will be used for synthesis. A recording of this length is what allows the system to learn to voice any amount of text of varying complexity.
With SmartSpeech technology from SberDevices, a custom or ready-made voice is turned into natural-sounding speech that can be used in all of the company's channels, saving the time, money and other resources that speech synthesis and content voicing would otherwise require. An announcer is not always available for studio recording, and each studio session entails additional costs. SmartSpeech YourVoice solves this problem and lets you voice the necessary content at any time.
In addition, the SmartSpeech API offers seven publicly available voices, including in English, from which everyone can choose the appropriate option for their tasks and needs.
"SmartSpeech YourVoice is another marketing tool for business that increases brand value for the end consumer. After all, using the same voice in all customer communication channels builds trust and increases overall loyalty to the business. It is also important that a unique voice can serve as an additional means of protection. When contacted by fraudsters, the client will be able to quickly identify them by voice and end the conversation in time, which is especially important for the banking sector. Our technology also makes it possible to overlay background sound on the speaker's speech: birdsong, the sound of the sea or another soundtrack that helps create the desired atmosphere," said Denis Afanasyev, Director of the Salute B2B products division at SberDevices.
The technology behind Sberbank's Telegram bot for transcribing audio files and voice messages
On March 31, 2022, Sberbank announced that the SberDevices team had launched a free Telegram bot that converts voice messages in Russian into text. The bot runs on SmartSpeech speech recognition technology and will be a useful tool for journalists, copywriters, translators and other professionals who work with text. It will save hours, reduce routine work and significantly increase efficiency in content production.
2021
Availability in the SberCloud cloud
The SmartSpeech service from SberDevices, which allows speech technologies to be used in business without additional costs for their development and purchase of equipment, has become a partner product based on machine learning and artificial intelligence technologies available on the high-performance SberCloud infrastructure. Sberbank announced this on August 11, 2021.
SmartSpeech is capable of highly accurate recognition and synthesis of speech in Russian. With it, you can create new and equip existing products and services with advanced speech technologies.
SberCloud customers can use the SmartSpeech service when creating chat bots, voice assistants, interactive voice menus, call centers, online stores, support services, site voicing and a huge number of other scenarios.
SberCloud customers can now develop, deploy and scale their products on the ML Space platform and integrate them with the ready-made SmartSpeech speech synthesis and recognition service.
If a product requires further training of the speech synthesis or recognition model, the DataHub marketplace of ML tools in SberCloud's ML Space already offers a pre-trained speech recognition model, as well as Golos, a large manually labeled Russian-language speech dataset that the SberDevices team worked on.
SberCloud's SmartSpeech service and ML Space platform greatly simplify and accelerate the introduction of speech technologies and interfaces into products and services.
Until the end of 2021, the SmartSpeech service will be available for free, and as of August 2021 it can already be used for full-fledged work.
{{quote "Any Russian entrepreneur, startup or large company can receive a grant of up to a million rubles from SberCloud to create and introduce machine learning technologies on the ML Space platform into their products.
The launch of the first AI partner service is an important stage in the development of SberCloud, and it is no coincidence that this product is the SmartSpeech service, developed by colleagues from SberDevices using our ML Space cloud platform. Such a partnership once again shows the strengths of the Sberbank ecosystem. We are confident that the integration of SmartSpeech and SberCloud cloud services will give our customers the best tools for working with speech technologies, a key area of artificial intelligence and machine learning.
Speech recognition and synthesis are already actively used in ecosystem products and services, and the cloud implementation of SmartSpeech in SberCloud makes it available to the maximum number of business users from various sectors of the economy," said Evgeny Kolbin, CEO of SberCloud.}}
"Speech technologies are in great demand: implementing and using them is much cheaper than the work of operators and announcers, and it streamlines customer interaction and makes it much more efficient. Switching to an interactive voice menu or answering machine helps handle routine requests faster and relieves contact center employees, allowing them to devote more time to customers. For example, when the caller says what service they need, they are either transferred to the right operator or immediately receive a response from the virtual operator. SmartSpeech can also be used on sites, in applications or on smart devices to voice content and commands or for voice input. We at SberDevices are very glad that our service became the first AI partner service in SberCloud, and we are confident that cloud users will find suitable scenarios for integrating with SmartSpeech," said Denis Filippov, CTO of SberDevices.
Launch of SmartSpeech service
On May 14, 2021, Sberbank launched SmartSpeech, a service that will allow businesses to connect voice technologies without special equipment, for example, in the interactive voice menu (IVR), answering machine, chats, telemarketing campaigns or in voice interaction interfaces. Until the end of 2021, access to the service developed by the SberDevices team will be free.
SmartSpeech can be used on sites, in applications or on smart devices to voice content and commands or for voice input. The speech synthesis and recognition technologies in the service are also used to build IVR (interactive voice menu) systems and answering machines, which optimizes the operation of call centers. The service recognizes and synthesizes speech and can also use "prompts" that help it understand the user accurately in a specific situation. SmartSpeech is also used within Sberbank itself: for example, it underlies the Salute family of virtual assistants, and it lets customers find out their bank card balance by calling 900 at any time of day without waiting for an operator.
One of the business tasks SmartSpeech can handle is resolving a customer's request quickly, efficiently and cost-effectively. The caller says what service they need and either immediately reaches the right operator or receives an answer from the robot. Special models for recognizing silence and noise, along with the ability to detect the end of an utterance and the emotions of the speaker, make interaction with the robot lively and empathetic, and training acoustic models on a large amount of data helps recognize speech accurately even in telephone conversations.
You don't have to record "live" speech in advance: just upload the text, and the robot will read it aloud. The service already offers several voices, and the library continues to expand, offering a growing choice of tones, timbres and moods to fully suit a specific business. At the same time, SmartSpeech generates highly natural speech: its own stress placement model significantly reduces the number of phonetic errors in synthesis, so even complex text such as numbers, addresses and names is voiced easily.
SmartSpeech uses the latest developments in deep learning. Neural networks are trained on huge amounts of data using the computing power of Sberbank's Christofari supercomputer. The services are written in C++, and the neural networks run on GPUs for very fast operation. Speech recognition uses high-accuracy architectures such as Jasper, QuartzNet and others.
To achieve high-quality speech synthesis, the SberDevices team modified the Tacotron 2 architecture, adding control over the fundamental frequency (pitch) of speech and over pauses, as well as intonation that changes depending on the topic of the text. This relies on information from a BERT model pre-trained on a large corpus of Russian-language texts; as a result, the synthesized speech is hard to distinguish from that of a real person.
"The speech technologies underlying the SmartSpeech service are being actively deployed in call centers and support services, and using them costs several times less than the work of an operator or announcer. For example, many companies automate call handling by recording ready-made phrases, but work that a person does can instead be handed to a service based on speech technologies. In the same way, companies can voice the text of sites and applications or add a voice input option, which significantly speeds up the user's interaction with the resource and lets them use a service even while driving," said Denis Filippov, CTO of SberDevices.
Companies wishing to test SmartSpeech are given an application programming interface (API) for connecting and using speech services in their products. The API supports the HTTP and gRPC protocols, so it can be integrated quickly into almost any system. The HTTP REST and gRPC APIs are convenient when a business has its own integration, for example its own client for a telephony platform. If you need to integrate the TTS API into a site or application, HTTP is the easiest and fastest way to do so.
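As a rough illustration of the HTTP option, the sketch below builds (but does not send) a text-to-speech request using only the Python standard library. The endpoint URL, the `voice` and `format` query parameters, the bearer-token header and the function name `build_tts_request` are all hypothetical placeholders rather than the documented SmartSpeech API; the platform documentation should be consulted for the actual values.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: a placeholder, not the real SmartSpeech URL.
TTS_ENDPOINT = "https://api.example.com/tts/synthesize"


def build_tts_request(text: str, token: str, voice: str = "demo_voice") -> Request:
    """Build an HTTP POST request asking a TTS service to voice `text`.

    Parameter names and headers are assumptions made for illustration.
    """
    query = urlencode({"voice": voice, "format": "wav"})
    return Request(
        url=f"{TTS_ENDPOINT}?{query}",
        data=text.encode("utf-8"),  # the text to synthesize goes in the body
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )


request = build_tts_request("Hello, world", token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
```

If a service followed this shape, passing the request to `urllib.request.urlopen` would return the synthesized audio bytes, which a site or application could then play back or store.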