Yandex SpeechKit Cloud

Product
Developers: Yandex, Yandex.Cloud
Last Release Date: 2023/12/19
Branches: Internet services
Technology: SaaS - Software as a Service, Information security - Biometric identification, Speech technologies

Main article: Voice biometrics

2023

Introduction of 8 additional voices

On December 19, 2023, the Yandex Cloud platform introduced 8 new voices with different emotions in the Yandex SpeechKit service. Companies can now use a friendly or strict intonation, or even a whisper, in synthesized speech. This allows development companies to change the tone of synthesized speech depending on the business scenario and to increase customer satisfaction and conversion in voice channels. In addition, the service now has a parameter for changing the pitch of the voice.

Companies will be able to choose the optimal voice for each business scenario. For example, they can use friendly speech for a satisfaction survey and choose an empathetic, serious intonation to collect feedback on service quality after a client's complaint. Different types of voices affect sales conversion and how customers perceive voice communications, according to research by voice robot developer Tomoru. For instance, a female voice works best in recruiting (68% of conversions), while a male voice is used more often in online education (53% of conversions).
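The scenario-based choice described above can be pictured as a simple lookup table. The sketch below is only an illustration: the class, the field names (`intonation`, `pitch_shift`), the scenario keys and the numeric values are all hypothetical and are not taken from the Yandex SpeechKit API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceStyle:
    intonation: str   # hypothetical label, e.g. "friendly", "serious", "whisper"
    pitch_shift: int  # hypothetical pitch offset; 0 leaves the voice unchanged


# Hypothetical mapping from business scenario to voice style (values are illustrative).
STYLE_BY_SCENARIO = {
    "satisfaction_survey": VoiceStyle(intonation="friendly", pitch_shift=0),
    "post_complaint_feedback": VoiceStyle(intonation="serious", pitch_shift=-50),
}

DEFAULT_STYLE = VoiceStyle(intonation="neutral", pitch_shift=0)


def style_for(scenario: str) -> VoiceStyle:
    """Return the style configured for a scenario, or the default one."""
    return STYLE_BY_SCENARIO.get(scenario, DEFAULT_STYLE)
```

Keeping the mapping in one place makes it straightforward to compare conversion between styles for the same scenario.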

The variety of voices in Yandex SpeechKit has made robot dialogues less formulaic and more natural. When creating the new voices, the Yandex SpeechKit developers changed not only how the machine learning model works but also the text corpus read by the voice actors. This improved how the voices sound in interrogative and exclamatory sentences, which are difficult for speech synthesis.

"Speech synthesis is a popular technology for automating communications in contact centers and beyond. As developers, it is also important for us that dialogues with voice robots are human and comfortable for ordinary people. In the future, we plan to provide users with even more new voices," said Grigory Atrepyev, CPO of the Yandex Cloud platform.

Speech recognition in Uzbek

On June 15, 2023, Yandex Cloud announced the development of a neural network that can recognize and synthesize speech in Uzbek. Companies in both Russia and Uzbekistan can already use the additional language in the Yandex SpeechKit service to create voice assistants and to automate call centers and voice analytics.

The speech synthesis models were trained on pre-recorded speech of a real voice actor. They can synthesize speech not only from text but also from its phonemic transcription, which specifies all the features of the sounds in each word. This allows customers to adjust the pronunciation of individual words in synthesis, such as complex surnames, product names and borrowed expressions. To do this, they need to specify the phonemic transcription of the desired word in the text using a special syntax.
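As a rough illustration of the idea, a pronunciation lexicon could be applied to the text before it is sent for synthesis. The sketch below is hypothetical: the function name, the `[[ ... ]]` delimiters and the phoneme string are placeholders and should not be read as the syntax that SpeechKit actually documents.

```python
import re


def apply_pronunciations(text, lexicon, open_mark="[[", close_mark="]]"):
    """Replace whole words found in `lexicon` with their marked-up phoneme form.

    `lexicon` maps a written word to a phoneme string. The delimiters are
    placeholder assumptions, not a documented markup.
    """
    if not lexicon:
        return text
    # Longest entries first so that a longer word is tried before a shorter one.
    words = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(
        lambda m: f"{open_mark}{lexicon[m.group(1)]}{close_mark}", text
    )


# Placeholder phoneme string, for illustration only.
marked = apply_pronunciations("Call Tomoru today", {"Tomoru": "t o m o r u"})
```

Only whole-word matches are replaced, so a lexicon entry does not alter longer words that merely contain it.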

As with speech synthesis, the recognition neural networks learn from examples. To train them, Yandex Cloud specialists collected a dataset with thousands of hours of audio and the corresponding transcripts, including short and long phrases as well as names, addresses, dates and numbers.

The neural networks for Uzbek synthesis and recognition work with the Latin alphabet, which has been used in Uzbekistan for more than 20 years. The algorithms also had to learn some pronunciation features of individual letters. For example, the letter "X" denotes a hard [kh] sound in the Uzbek alphabet, while in foreign words it can be pronounced as [ks].

"Scenarios related to artificial intelligence, and to speech technologies in particular, are actively developing in Uzbekistan. According to one of our partners, the volume of the speech technology market in Uzbekistan can reach up to 395 million minutes of processed speech per year. The addition of a sixteenth language to Yandex SpeechKit is a big step toward creating dialogue scenarios for both Uzbek and Russian companies," said Grigory Atrepyev, Product Director at Yandex Cloud.

The models are available on the site and are configured with the standard tools in the API. A number of companies have already begun using Yandex SpeechKit for Uzbek speech synthesis and recognition.

2022

Integration with Just AI Conversational Platform

On September 20, 2022, Just AI announced the integration of its solutions with Yandex Cloud to launch voice AI projects within the customer's own infrastructure. Users of the Yandex SpeechKit speech recognition and synthesis service can now deploy full-scale voice AI projects not only in the cloud but also within their own infrastructure.

Addition of automatic punctuation

Yandex SpeechKit, a service for speech synthesis and recognition, can now automatically place punctuation marks when converting voice into text. The text recognized by the neural network is as close as possible to literary text and is easier for the reader to follow. Yandex announced this on April 20, 2022.

This will improve the user experience in scenarios where a person interacts directly with speech technologies, for example when communicating with a voice assistant, transcribing audio automatically or generating subtitles.

The punctuator is built from two sequential machine learning models. The first converts the voice into text, and the second places punctuation marks in accordance with the norms of the Russian language. As of April 2022, the model places all the main punctuation marks of the Russian language.
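The two-stage arrangement described above can be sketched as a simple function composition. Everything below is an assumption made for illustration: the type aliases, `transcribe` and the two toy stand-ins are hypothetical, and the stand-ins use fixed or rule-based output in place of the machine learning models the article refers to.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]  # audio -> unpunctuated text
Punctuator = Callable[[str], str]    # unpunctuated text -> punctuated text


def transcribe(audio: bytes, recognize: Recognizer, punctuate: Punctuator) -> str:
    """Run the two stages in sequence: recognition first, then punctuation."""
    raw_text = recognize(audio)
    return punctuate(raw_text)


# Toy stand-ins so the pipeline can be run without any real models.
def fake_recognize(audio: bytes) -> str:
    return "hello how can i help you"


def fake_punctuate(text: str) -> str:
    # Capitalize the first letter and close the sentence with a period.
    return text[:1].upper() + text[1:] + "."


result = transcribe(b"", fake_recognize, fake_punctuate)
```

Because the stages only share plain text, either one can be replaced or evaluated independently of the other.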

2020: Yandex SpeechKit Pro

On September 23, 2020, the Yandex.Cloud platform introduced a specialization of the SpeechKit service: Yandex SpeechKit Pro. This is a program for development companies whose participants get access to new tools for creating robots and voice assistants focused on a particular industry or company. According to Yandex.Cloud, such robots will be able to recognize words and commands on a given topic with the maximum level of accuracy. The new tools will help optimize service scenarios in delivery, banking or medicine. SpeechKit Pro also makes it possible to customize individual voice characteristics, such as the robot's intonation and manner of communication.

By 2020, speech synthesis and recognition had become the most popular ML service on the Yandex.Cloud platform. According to the developers, SpeechKit consumption has grown by 120% since the beginning of the year, and the number of active projects has exceeded 500. An ecosystem of solution developers and integrators has already formed in Russia. Commissioned by companies from various fields, they create and implement voice robots that help process incoming and outgoing calls, voice control systems for applications and customer service terminals, and solutions for analyzing the effectiveness of business communications. As of September 2020, there are more than 20 such companies, most of which are long-standing partners of the Yandex.Cloud platform. According to the partners, over the past two years the main motives for introducing voice robots in Russian companies have been cost reduction and rapid scaling of solutions.

"Together with our partners, we have come a long way: in two years we have turned speech technologies from an exotic service into an applied business tool. Now we are taking the next step and opening up the next level of Yandex speech technologies to partners. Development companies will have access to the advanced capabilities of SpeechKit, and solution customers will be able to choose a supplier with the most suitable expertise," commented Alexey Bashkeev, head of the Yandex.Cloud platform.

As business interest in speech technologies has grown, so have the requirements for recognition accuracy in specific scenarios of interaction between voice robots and humans, and for the ability to quickly adapt solutions to new tasks. For example, it is critically important for a delivery company that the robot does not confuse the meanings of the phrases "transfer the order" and "enter the order," and for telecommunications companies that it distinguishes the phrases "turn on the service" and "turn off the service" without errors. The priority for a business is accuracy in its own domain and the ability to build up experience in a specific business scenario based on objective indicators.

To solve these problems, Yandex.Cloud provides partners with additional development tools within the SpeechKit Pro specialization. Partner companies will now be able to use audio annotation, train customized speech recognition models on customer data, monitor speech recognition quality metrics, and tailor recognition models to a specific data stream.
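The article does not say which quality metrics SpeechKit Pro exposes. A widely used one for speech recognition is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference, divided by the number of words in the reference. A minimal sketch, with a hypothetical function name, is shown below.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("turn on the service", "turn off the service")
```

In this example one word out of four is wrong, so the WER is 0.25, even though the meaning of the command is reversed. This is why domain-specific evaluation of critical phrases matters in addition to an aggregate metric.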

Neuro.net, Just.ai, AtsAero, Naumen, Robovoice and Voximplant have already received the SpeechKit Pro specialization.

2019: Inclusion in the Standalone IP PBX from MCN Telecom

On July 29, 2019, MCN Telecom announced that it had added the Yandex SpeechKit service from Yandex.Cloud to its Standalone IP PBX, which made it possible to offer large customers the Voice Assistant product in Russian. This functionality can be useful for banks, financial organizations, online stores and other companies that use artificial intelligence (AI) in sales.

2014: SpeechKit Cloud Announcement

On August 4, 2014, Yandex introduced SpeechKit Cloud, a cloud speech recognition service. With its help, developers can teach their products to understand a person's voice.

The company said that SpeechKit Cloud support can be added to various programs, services and devices: from a computer game to a car navigation system.

SpeechKit Cloud is based on Yandex SpeechKit speech recognition technology, which Yandex launched in 2013. As of August 4, 2014, it is used in 400 mobile applications for Android, iOS and Windows Phone.

SpeechKit Cloud "understands" Russian and Turkish. Voice requests are processed on high-load Yandex servers.

The service infrastructure is designed with high loads in mind to ensure the availability and uptime of the system with a large number of simultaneous calls.

Interaction Model (2014)

Interaction with SpeechKit Cloud takes place through an HTTP API. Without installing any additional software, developers can implement the following:

  • voice input in computer games and applications;
  • voice control in the car cabin - for example, a navigation system;
  • interactive voice response (IVR) menus in telephony;
  • voice interface of Smart Home systems;
  • voice interface of electronic robots;
  • voice control of household appliances, etc.
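Since the service is reached over plain HTTP, a client only needs to send the recorded audio in a request. The sketch below shows how such a request might be assembled with the Python standard library. The endpoint URL, the query parameter names (`key`, `lang`) and the content type are placeholders chosen for illustration, not the actual SpeechKit Cloud interface.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint: an assumption for illustration, not a real service URL.
ENDPOINT = "https://speech.example.invalid/recognize"


def build_recognition_request(
    audio: bytes, api_key: str, lang: str = "ru-RU"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying audio to be recognized."""
    # Hypothetical query parameters: an API key and the language of the audio.
    query = urllib.parse.urlencode({"key": api_key, "lang": lang})
    return urllib.request.Request(
        url=f"{ENDPOINT}?{query}",
        data=audio,
        headers={"Content-Type": "audio/x-wav"},
        method="POST",
    )


request = build_recognition_request(b"RIFF", "demo-key")
```

Against a real endpoint the request would be sent with `urllib.request.urlopen(request)` and the recognized text read from the response body.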