Smart Voice Services

For all the variety of digital communications, voice remains one of the most important channels of business interaction with customers. How is digitalization changing voice services, and when will it be possible to talk to voice assistants like a human? The article is included in the TAdviser review "Artificial Intelligence Technologies."

Therefore, Mosenergosbyt continues to develop voice communication channels, for the most part in the direction of automation. Already, the share of telephone calls processed automatically is 63%, and the prospects for improving the efficiency of voice communications have not been exhausted.

A phone call remains the most preferred channel for consultation at the moment. But we do note an increase in the share of customers who find it more comfortable to ask questions in multimedia channels.

The Tricolor contact center has moved more than 40% of such calls from the voice channel into this format.

Communication channels that bank customers intend to use for simple and complex requests in the coming years

Source: Deloitte Digital, Global Contact Centre Survey, 2019

This heated, highly competitive market is growing, according to the analytical agency Meticulous Research, by 17.2% every year, and its global volume will reach $26.8 billion by 2025. Experts at the Russian company BSS believe that the growth of the voice segment was largely driven by the economic situation caused by the spread of the coronavirus, which stimulated the introduction of solutions based on speech technologies.

The Russian market for speech technologies and services, according to Frost & Sullivan analysts, may reach 20 billion rubles by 2024. At the same time, according to expert estimates, voice solutions still make up only a small part of the total Russian AI market - about 10%.

According to Dmitry Dyrmovsky, general director of the Speech Technology Center (CST), the largest company in Russia in the field of speech technologies, the highest growth dynamics are shown by intelligent dialogue robots, speech analytics and facial biometrics. Thus, the analytics of dialogue communications segment grew by 48% in 2020 compared with 2019, the volume of sales for projects introducing CST products to improve customer service in banks and financial institutions increased by 76%, in the transport industry eight times, and in construction and the power industry more than ten times. According to CST forecasts, demand for such solutions will continue in fintech, telecom and healthcare.

The Current Technical Level of Voice Recognition

Classical speech recognition, built on separately trained acoustic and language models, rarely showed accuracy above 75% out of the box. At the same time, it required hundreds of hours of annotated recordings of conversations to adapt the models to the subject area and achieve acceptable accuracy, the expert explains.

In current systems based on end-to-end models, voice recognition accuracy exceeds 85% out of the box, and adding 3-5 hours of annotated (labeled) audio recordings from a particular subject area makes it possible to reach recognition accuracy at the 95% level.

This percentage means that on arbitrary audio recordings the system must show a WER (Word Error Rate) of no more than 5%, that is, no more than five words with transcription errors (misrecognized sounds and words) per hundred. Even a system tailored to a specific domain will make more mistakes if, say, a person slurred the ending of a word or uttered something incoherent or ungrammatical. If we talk about an arbitrary domain (text from any subject area), the WER rises to 10-15%, or even higher.
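For illustration (a minimal sketch, not taken from the article), WER is simply the word-level edit distance between the reference transcript and the recognizer's output, divided by the number of reference words:

    # Minimal sketch: Word Error Rate as the word-level edit distance between a
    # reference transcript and a recognition hypothesis, divided by reference length.
    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deleted word
                               dp[i][j - 1] + 1,         # inserted word
                               dp[i - 1][j - 1] + cost)  # substituted word
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)

    # Five wrong words per hundred gives WER = 0.05, i.e. the 5% level mentioned above.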

In this regard, the benchmark testing that Nanosemantics conducted last fall is of interest. It compared 17 voice systems from 14 different vendors (including Yandex, Google, Tinkoff, Amazon, Azure, Sber, 3iTech, etc.) that were run on a test data set with non-specific vocabulary. These systems were compared with four Nanosemantics models: two of them trained on telephony data and two more on live speech from electronic devices.

Given these features, as well as the cost of services and whether the solution can be deployed within the company's own IT loop or works in the cloud, you need to choose the right solution, advises Stanislav Ashmanov.

Speech technologies, and speech recognition in particular, have been actively developing in recent years, and the quality of models keeps growing. This is due, first of all, to the accumulation of large volumes of raw and labeled data by large vendors, as well as the emergence of new open-source architectures that offer new ways to train models.

In particular, a big breakthrough was the wav2vec2 concept (and, later, data2vec), which can be trained on a relatively small amount of labeled data combined with a very large amount of data for self-supervised pre-training.
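As a rough illustration of how such end-to-end models are used in practice, here is a minimal sketch with the open-source transformers library; the public English checkpoint named below is only an example, not one of the models discussed in the article, and a production system would start from a model fine-tuned on a few hours of in-domain annotated audio, as described above:

    # Sketch: transcription with a pre-trained wav2vec 2.0 model via the transformers library.
    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    def transcribe(waveform, sample_rate=16000):
        # waveform: 1-D array of audio samples at 16 kHz
        inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        ids = torch.argmax(logits, dim=-1)
        return processor.batch_decode(ids)[0]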

It is much easier to prepare audio recordings of voice without transcribing them, which lowers the entry barrier for new developers, says Stanislav Ashmanov. And he adds: the remaining barrier is the substantial computing power needed to train such networks, although even with an old graphics card and enough time you can get quite tolerable recognition quality.

Alexander Boltachev, an ML developer at Globus IT, says that all approaches to speech recognition involve solving two main problems. The first is recognizing elementary units called tokens; many approaches use characters as tokens. The second is assembling a meaningful sentence from tokens. At this stage, the elementary units recognized from speech can be used as tokens, but other types of tokens can be used as well.

To assemble a meaningful text, linguistic information is used, extracted from a large set of texts using special language models (LM). Such models can be either a separate part of the system or part of the speech recognition model itself. This is where the main problem lies, says Alexander Boltachev: when creating an LM, it is necessary to decide what to use as tokens.

In many narrowly focused virtual assistants, words are used as the LM tokens. This helps to get a reliably good result within a specific domain without having a very large set of text data, the expert explains. However, because the LM extracts linguistic information connecting specific words, this approach is limited to the provided vocabulary and is not able to form new words.

Yes, such systems are no longer limited by a vocabulary, but to get acceptable quality of meaning recognition, huge data sets are needed, comments Alexander Boltachev. They cannot be found anywhere in the public domain, and the cost of such sets will be very high even for relatively large companies. Also, such systems make mistakes quite often and can invent non-existent words, precisely because they are made too flexible.

The approach based on splitting words into sub-words, in particular the Byte Pair Encoding (BPE) algorithm, has become popular: it breaks the sentences in the training data set down into the most frequently encountered tokens and is similar in spirit to the Huffman algorithm. "These approaches allow you to get away from the problem of vocabulary limitations, require less data to extract the necessary dependencies than the use of characters, and are also less susceptible to the problem of forming non-existent words, since they operate with frequently encountered character combinations," emphasizes Alexander Boltachev.
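A minimal sketch of this sub-word approach, using the open-source tokenizers library (the corpus path and vocabulary size below are placeholders):

    # Sketch: training a BPE sub-word vocabulary with the open-source `tokenizers` library.
    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

    trainer = trainers.BpeTrainer(vocab_size=8000, special_tokens=["[UNK]", "[PAD]"])
    tokenizer.train(files=["corpus.txt"], trainer=trainer)  # corpus.txt is a placeholder

    # Frequent character combinations become single tokens, so even an unseen word
    # can still be represented as a sequence of sub-words instead of falling out of the vocabulary.
    print(tokenizer.encode("speech recognition").tokens)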

This makes it possible to train the robot to understand what was said and extract information from speech based on two or three dozen example requests, whereas a couple of years ago it was necessary to manually label tens of thousands of audio recordings.

… and understanding the meaning of what was said

The second most important technological achievement of speech analysis is associated with the recognition of the meaning of spoken words (Natural Language Understanding, NLU), that is, artificial intelligence (AI) algorithms designed to understand natural language.

This is logical, because recognizing commands is easier than recognizing continuous speech, and the reliability is significantly higher. A lot of things are already voice-controlled, from phones to cars. There is noticeable progress in voice dialing and voice search systems, says the specialist.

With the help of a voice chat bot, you can automatically receive the readings of utility meters or find out the contract number, adds Stanislav Ashmanov.

In general, Alexey Lyubimov considers the widespread introduction of voice interfaces, from medicine to voice biometrics, one of the breakthrough directions of the near future.

The further development of voice technology is being driven by several trends. In particular, tools for processing natural language as it sounds in real-world environments are actively developing. According to Dmitry Dyrmovsky from CST, one of the important trends in speech services is the move of these technologies out of call centers to offices and streets: demand will grow for recognition of a "speech cocktail" - difficult acoustic conditions and spontaneous speech from several speakers, talking at the same time and interrupting each other.

Thus, Nanosemantics plans to expand the range of audio recordings it works with: recordings made in cars, offices and cafes, with background music or a TV playing, as well as speech with various accents.

One of the most significant trends is omnichannel customer service, which implies the integration of voice with other communication channels.

Omnichannel Opportunities

After the most frequent, massive reasons for calls are automated in the voice channel, the question arises of where to develop next, says Anton Sunkin from Mosenergosbyt. Simply complicating dialogue scenarios is ineffective, since a person perceives an extremely limited amount of information by ear. For ourselves, we see a way out in migration between voice and text channels: a subscriber states his request in the voice channel, and if the response involves a large amount of information, or that information is easier to take in as text or a table, the client is offered a transition to the messenger, where the dialogue continues. Voice in its pure form, in communication with a "live" person, remains for those cases when the client is unable to formulate his request.

  • Omnichannel is a tool that can seriously change even such classic services as call analytics: what is analyzed is not just the words or the tone in which they were uttered, but the meaning of the phrase or dialogue. In these situations, natural language processing (NLP) systems are used.

The availability of a base of statistical data of various types will push the market toward multimedia synchronization and voice and video analytics, says Sergey Andronov, director of the Center for Network Solutions at Jet Infosystems. And Dmitry Dyrmovsky is sure that large enterprises and banks already have a huge amount of client data, the analysis of which will create a unique personal experience when a client gets in touch through different communication channels.

This made the contact center more productive and made it possible to select advertising channels more efficiently and to predict and configure advertising campaigns more accurately, says Alexey Lyubimov. And the deployment of 3i TouchPoint Analytics in the IT loop of Home Credit Bank increased the efficiency of client service, telemarketing and the recovery department. In telemarketing, sales conversion doubled. And in the first month after implementation, the quality of customer service increased by 15%.

It is worth noting that the omnichannel approach implies special requirements for the technical component of the voice system - seamless service on all channels and the ability to switch from a voice channel to a multimedia channel without losing any data. In other words, having entered into communication, both contact center operators and software robots must always understand the context of the call and continue the service process in the right direction.
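To make the idea of a seamless channel switch concrete, here is a hypothetical sketch (the class and field names are invented for illustration, not any vendor's API) of a dialogue context object that travels with the customer when a session moves from voice to a messenger:

    # Hypothetical sketch: a dialogue context handed over between channels,
    # so the operator or bot that picks up the conversation keeps its history.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class DialogueContext:
        customer_id: str
        channel: str                                    # "voice", "messenger", "email", ...
        intent: str                                     # e.g. "meter_readings"
        history: List[str] = field(default_factory=list)

        def switch_channel(self, new_channel: str) -> "DialogueContext":
            # Only the channel changes; the request and history travel with the customer.
            self.history.append(f"switched from {self.channel} to {new_channel}")
            self.channel = new_channel
            return self

    ctx = DialogueContext("42-017", "voice", "tariff_question", ["asked about the reason for the debt"])
    ctx.switch_channel("messenger")   # the messenger bot now sees the same context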

AI in the contact center

One of the classic applications of intelligent speech technologies in corporate call centers is known as Voice of Customer. This apt, capacious phrase describes in condensed form a whole class of IT solutions designed to improve call center efficiency: systems for recording operators' conversations with customers and analyzing those conversations.

In advanced contact centers where omnichannel customer service is implemented, natural language technologies apply to all communication channels: e-mail and instant messenger correspondence, messages on corporate sites, forums and social networks. The classic functionality of identifying positive/negative sentiment in customers' statements and recognizing their topics has been extended with identification of the key attributes of a specific request and automatic selection of the employee who should take over the conversation on the issue of interest to the client.

Voice robots strictly follow the given conversation scenario. At the same time, the bank is protected from the negative impact of the human factor: for example, from violations of law 230-FZ, from possible data leaks through a call center employee, or from accidental disclosure of information for which the bank can be fined, emphasizes Ivan Barchuk, director of the data collection, storage and analysis department at VS Lab.

Gartner analysts predict in their Market Guide for Speech-to-Text Solutions 2020 study that by 2025, 40% of all incoming voice calls to call centers will use Speech-to-Text (STT) technology for subsequent analytics and business process improvement.

What aspects of contact centers do companies intend to develop?

Source: Deloitte Digital, Global Contact Centre Survey, May 2019

Market research data confirms that the corporate sector's main expectations for call center development are associated with the ability of AI to extract all kinds of useful data from speech, process automation (for example, redirecting interactions to operators based on a prediction of how the conversation with a robot will end), and advanced business analytics such as Voice of Customer.

Speech analytics

Gartner analysts in their 2019 report Cool Vendors in Speech and Natural Language noted that promising advanced speech recognition solutions can identify a lot of useful information from audio messages, in addition to recognizing specific words.

  • Analysis of emotions. The first thing call centers began to experiment with was identifying dissatisfied or angry customers whose negative reaction needs to be defused immediately. Emotion analysis is still a field of experimentation, both for professional voice developers and for corporate clients (a minimal sketch of this kind of flagging is shown below).
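A minimal sketch of such flagging on transcribed utterances, using a text-classification pipeline from the open-source transformers library; the English sentiment model named here is only a placeholder for whatever domain- and language-specific model a contact center would actually train:

    # Sketch: flagging dissatisfied customers from transcribed utterances.
    # The model name is a public placeholder, not a production contact-center model.
    from transformers import pipeline

    sentiment = pipeline("text-classification",
                         model="distilbert-base-uncased-finetuned-sst-2-english")

    def needs_escalation(utterance: str, threshold: float = 0.9) -> bool:
        result = sentiment(utterance)[0]
        return result["label"] == "NEGATIVE" and result["score"] >= threshold

    print(needs_escalation("I have been waiting for my refund for three weeks!"))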

Tinkoff Bank has created a voice robot with empathy. It knows how to fill pauses in a conversation with human-like interjections ("uh-huh," "I see," etc.) and to ask short open questions such as "Why so?" in response to particular phrases from the interlocutor.

An interesting project of this kind was announced in the spring by Australia's Monash University: it is developing a smartphone application, used together with smart glasses, designed to help translate into another language in real time. The developers of the new solution, focused on interaction between people speaking different languages, are including in its functionality not only automatic translation but also recognition of emotional signals in the interlocutor's speech, body language and facial expression. The system is expected to recognize another person's bewilderment or negative reaction in real time and recommend how to correct the situation. A prototype of the application is planned for release in March 2023.

It is worth noting that the range of "skills" of such robots is achieved through narrow specialization. At Tinkoff Bank, the robot was created for a narrow task: calling customers with a short survey on service quality. Another narrow niche for AI technologies is handling objections. It is not easy for a live employee to steer such a conversation in the right direction, and the software system comes to the rescue by offering tips, for example from the corporate knowledge base, that help turn the conversation the right way.

  • Determining the speaker's gender and age. The Alice voice assistant from Yandex has learned to distinguish adults from children and form different messages for different categories.
  • Identifying intentions from speech. VTB launched predictive models for a smart voice assistant in pilot mode in February: it not only identifies the client at the time of the call, but also analyzes his history of interaction with the bank, including previous requests and issued products, and immediately forms a hypothesis about what the current call is related to.

Stanislav Ashmanov from Nanosemantics says that determining the user's intentions is a task that has long been well solved when the dialogue has a limited number of possible nodes. For example, the Nanosemantics DialogOS dialog platform achieves a high percentage of correctly recognized intentions, in part by combining a rule-based approach with neural network classification. "The latter are based on transformers, which, like wav2vec2, are pre-trained in a self-supervised way on large volumes of text and, in a sense, learn the grammar of the language, its stable constructions, and which words are used in which contexts and forms. They then need only brief training on specific examples for a limited number of dialog nodes, after which they can understand intentions in messages that the rule-based approach does not capture," explains Stanislav Ashmanov.
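A hypothetical sketch of this combination of rules and a neural classifier (the patterns, intent names and the zero-shot model below are illustrative assumptions, not the DialogOS implementation):

    # Hypothetical sketch: rules catch well-known formulations, a pre-trained
    # transformer handles everything the rules do not capture.
    import re
    from transformers import pipeline

    RULES = {
        r"\b(balance|how much do i owe)\b": "check_balance",
        r"\b(meter|readings)\b": "submit_readings",
    }
    INTENTS = ["check_balance", "submit_readings", "talk_to_operator"]

    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    def detect_intent(text: str) -> str:
        for pattern, intent in RULES.items():          # rule-based pass first
            if re.search(pattern, text.lower()):
                return intent
        # neural fallback for formulations the rules do not cover
        return classifier(text, candidate_labels=INTENTS)["labels"][0]

    print(detect_intent("I want to send my meter readings"))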

Voice assistants and bots

Today, Kozma Prutkov's aphorism can be applied to voice assistants: like the specialist he compared to a gumboil, their fullness is one-sided. Smart voice assistants take on routine functions first of all and close most of the standard questions customers bring to the company. Thus, at Mosenergosbyt, customers can use the voice menu to submit electricity meter readings, find out the account status and their electricity tariff, get an explanation of the reasons for a debt, apply for additional services or check the status of such an application. Among other things, contact numbers are verified during the dialogue with the automated system.

The efficiency of debt collection by the automated service is comparable to that of operators, but its throughput is much greater, which makes it possible to completely do without operators for initial calls, says Anton Sunkin.

In banks, a popular field of AI application is debt-collection robots. The first to take this path was Sberbank, which introduced such a voice bot in its subsidiary Asset BC in 2016. A year after it started working, it turned out that the robot was 24% more effective at communicating with debtors than operators. A collector robot has also taken up its post at VTB; the bank especially noted the bot's tirelessness, as it is able to make an almost unlimited number of calls per day.

Unlike a "live" operator, which can easily cover a wide range of topics in a conversation with a client, voice assistants always have a narrow specialization, which is explained by the specifics of their preparation for work - for this, special dictionaries of terms and arrays of texts where these terms are found are used. For example, a head assistant sold at VTB can help the client transfer money by phone number and between their accounts, replenish the account for communication services. Recently, he has acquired a new skill - to order at the request of the client - SIM a card "" and VTB Mobile get all from him all the necessary information time express and address of delivery. It is planned that in the future the voice assistant will learn to buy for users "" VTB Online tickets for or plane train, book a hotel room, book a table in a cafe and purchase movie tickets.

A BSS robot was introduced into the Rent-a-Ride service, which hosts offers for short-term car rentals from private owners. The company faced a problem: a quarter of Rent-a-Ride customers prefer to get in touch by phone, but the operators of the outsourced contact center could not cope with the flow of calls - customers had to wait on the line, and there were errors in registering applications. After the introduction of the voice robot, the load on service managers decreased by more than 20%, customer segmentation improved, conversion grew by 30%, and revenue from applications received by phone increased 1.5 times.

At Home Credit Bank, according to Natalia Bibetko, head of the Service Process Automation Department, bots handle more than 65% of customer issues on the inbound/outbound line and sell new products. Among the distinctive abilities of the voice assistant "Maria" she names voice identification over the phone, help with generating PIN codes, full and partial early repayment, paying a loan by phone, finding out the date and amount of a payment, clarifying credits and debits, and so on. Moreover, this can be done when calling not only from a mobile phone but also from a landline.

A virtual assistant working for Tele2 helps customers in the digital environment: social networks, instant messengers, chats, the mobile application and the company's website. Today it advises clients on two thousand topics.

During the project to create the Anton bot for Rosbank, the developers of the virtual assistant from CTI focused on the personalization of the software robot: it received a personal voice, and not only technical characteristics are used to assess its consumer qualities, but also metrics such as "knowledge," "intelligibility," "sociability" and even "charisma."

In this vein, any voice products and services associated with mass service of citizens are of great importance, he emphasizes. High-quality service not only reduces the burden on employees of a particular sphere (healthcare, social policy, public service centers, state administration, key industries of the economy), but also significantly increases the share of citizens who use self-service.

More complex systems can act as prompters or check the operator's adherence to a dialog scenario, he says. We should expect voice bots to be able to carry out more complex dialogs in the near future, increasing the percentage of call center automation.

This is facilitated, notes Stanislav Maslov, head of robotization and custom development at Softline, by increased competition among speech platform suppliers, which makes the corresponding solutions more accessible to customers, as well as by low-code tools for modeling dialogs, which allow companies to reduce implementation costs and support the solutions on their own.

At the same time, a number of factors significantly affect the effectiveness of voice robots.

The bot cannot be configured once and for all. Without criteria for assessing the bot's performance, it is impossible to track its effectiveness. Without taking into account external and internal factors, without running the system in on real dialogues, and without involving specialized experts who can take into account all the nuances, a technological tool with great potential can turn out to be an expensive trinket, says Leonid Perminov, head of the Contact Centers department at CTI.

Secondly, a successful robot must be able to work in the company's information environment. The fact is that answering a question that really matters to the client usually requires information directly related to that client, information from documents stored in unstructured form, and the ability to logically connect pieces of knowledge to each other. So far, only a few unique voice robot developments are capable of working at this level. The problems lie not only in the lack of the necessary context-sensitive integrations at the logical level, but often in the insufficient digital maturity of the company: business processes are chaotic, and data is not available to applications.

Nowadays, only a very small number of requests are of an abstract, reference-and-information nature that can be answered from the internal knowledge base, says Anton Sunkin from Mosenergosbyt. To fully respond to a request, you need to operate with information related directly to the client and perform logical and arithmetic operations on it; only then can you formulate an answer that will fully satisfy the client. And internal systems are often not ready for this.

Anatoly Dyubanov says that, as part of introducing the voice self-service system into the work of the Unified Registry 122 service, it was necessary to develop interaction algorithms and integration modules with the medical information system (MIS), as well as to change the internal business logic of the MIS so that it could generate the arrays of information used by the voice self-service services.

Home Bot, implemented at Home Credit Bank, is able to independently conduct a dialogue following complex scenarios and record its results in the bank's internal systems without additional checks by employees. It uses information from different systems and independently makes changes to them as a result of the dialogue. To do this, in particular, Home Bot is integrated with RPA bots that help solve customer issues.

For example, an analyst does not need to listen to all calls and manually score each operator, he explains. The system can automate, if not all, then a very large part of the work with high confidence, and for the most part it only occasionally needs to be validated and calibrated. In some cases the level of process automation achieved reaches about 80%, or even higher. The field will undoubtedly develop further, both toward improving the quality of existing solutions and by expanding into new tasks that can be automated.

To improve customer service and recovery efficiency, Home Credit Bank chose speech-to-text technology with subsequent analytics based on the 3iTech product, deployed inside the bank's IT perimeter. The main scope of the six-month implementation was integration into the bank's ecosystems for the end-to-end analysis process. Part of the analytics - call duration, pauses, etc. - is available immediately, while for deeper analysis the results are uploaded to the Big Data store.
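As a hypothetical illustration of that "immediately available" layer of analytics, such basic metrics can be computed straight from the time-stamped segments produced by speech-to-text (the segment format below is an assumption, not the 3iTech data model):

    # Hypothetical sketch: first-pass call metrics from time-stamped speech segments.
    segments = [  # (start_sec, end_sec, speaker)
        (0.0, 4.2, "operator"),
        (5.1, 9.8, "client"),
        (12.0, 15.5, "operator"),
    ]

    def call_metrics(segments, call_end_sec: float):
        talk_time = sum(end - start for start, end, _ in segments)
        return {
            "duration_sec": call_end_sec,
            "talk_time_sec": round(talk_time, 1),
            "silence_share": round(1 - talk_time / call_end_sec, 2),
        }

    print(call_metrics(segments, call_end_sec=16.0))
    # Deeper analysis (topics, script compliance, trends) is then run over the Big Data store.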

Voice assistants, expert systems, and, say, entertainment self-learning systems should not be confused. They have different goals and a different effect from their introduction, urges Anatoly Dyubanov.

It makes sense to process simple requests with the help of smart and fast voice assistants, but if a person in a difficult situation needs a series of in-depth, consistent consultations with a specialist, only a human or a mature expert system can handle that task.

Compared to the initial level, advanced neural network architectures take into account a wide context, bringing the quality of language modeling to a fundamentally new level, he explains. Such solutions are only a step away from passing the Turing test. That is, the methods of forming a dialogue are already adequate; the only question is implementation.

There are companies that spend a long time and a lot of money teaching a neural network the structure of the language on a huge corpus of billions of words. It can then be quickly and cheaply tuned for a specific task, for example the narrow task of dividing reviews into negative and positive ones.

Gartner analysts predict a period of market transformation in their Market Guide for Speech-to-Text Solutions study, published in 2020. Over the next five years we should see the further evolution of vendors' offerings into broader voice services. These will take the form of synergistic packages: multimodal complexes of various natural language technologies (NLT). In other words, the peculiarity of voice solutions is that they develop not through the technical integration of individual technologies, but through their synergistic unification.

Evolution of Speech-to-Text systems towards synergistic packages

Source: Market Guide for Speech-to-Text Solutions, Gartner, 2020

Niche voice solutions, according to Gartner analysts, will remain in demand, but market dominance will shift to NLT ecosystems, and the suppliers of these broad technology packages will be large cloud providers of AI solutions and services. According to Gartner analysts, it is they who will concentrate in their hands linguistic resources and acoustic models, as well as the specific mechanisms for processing natural language: speech to text (STT), text to speech (TTS), extraction of meaning from text, automatic translation, and natural language generation (NLG). There will also be conversational platforms that support dialogue between a person and a robot.

Perhaps achieving this synergy is the main challenge for current voice technologies and services. It is needed both on the part of the service developer (combining technological capabilities within a single customized service) and on the part of customers (the comprehensive readiness of various services and processes to work in the format of a human-machine dialogue with consumers).
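The composition idea behind such packages can be sketched in a few lines; the functions passed in below stand for whatever concrete STT, NLU, NLG and TTS engines a vendor bundles, so this is an illustration of the chaining, not any particular product:

    # Hypothetical sketch: one conversational turn chaining the NLT components.
    def handle_turn(audio_in, transcribe, detect_intent, answer_for, synthesize):
        text = transcribe(audio_in)       # speech to text (STT)
        intent = detect_intent(text)      # meaning extraction (NLU)
        reply = answer_for(intent)        # response generation (NLG)
        return synthesize(reply)          # text to speech (TTS)

    # Each stage can be swapped independently, but the value of the "package" is that
    # all stages share one dialogue context and are delivered and supported together.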

Russian conversational intelligence market: today and tomorrow

The research and consulting company Gartner noted in its report "6 Trends on the Gartner Hype Cycle for the Digital Workplace," published in 2020, that the conversational AI market has passed the peak of inflated expectations. A plateau of productivity should be expected within 5-10 years: for chatbots and virtual assistants by 2022-2025, for conversational user interfaces (CUI) by 2025-2030.

The current moment is notable for the fact, observed by Just AI specialists in their study "The Conversational AI Market in Russia 2020-2025," prepared last summer, that all conversational AI tools and platforms have entered an active development phase:

  • Speech technologies: speech synthesis and recognition, voice cloning, speech biometrics, voice activation, etc.
  • Technologies and platforms for voice processing: NLP (Natural Language Processing), NLU (Natural Language Understanding), DM (Dialog Management), integration, ML models, data.
  • Low-code/No-code bot constructors: means of visual development of dialog scripts in voice or text channels.
  • Speech analytics: speech analysis platforms to determine the quality of dialogue between people.

At the same time, NLP/NLU solutions, virtual assistants and bots are in the phase of active pilots at large corporations with revenues of more than $1 billion. Gartner estimates that for large businesses the risks of introducing conversational AI technologies are already minimal and the benefits are high. Mid-size business is still at the planning stage; it needs solutions that can be customized for a specific need. Small business will be the last to take up the market, relying on boxed solutions that require minimal adaptation, and on service partners.

With the increasing availability of models and datasets, the market is gradually commoditizing, and with the advent of new players and in-house developments it will face significant price pressure, Just AI comments on the current state of the Russian conversational AI market.

They do not always compete with each other: a significant part of the players specializes in individual industries, types of customers and technologies and can dominate their segments, even having a small share in the market as a whole, Just AI explains.

According to the results of the Just AI study, the largest segment in the Russian conversational AI market is solutions for state and municipal institutions; here, the CST group of companies effectively dominates. In speech technologies (ASR/TTS/biometrics), the largest market volume falls on CST, Yandex.Cloud, Tinkoff, ASM Solutions and 3iTech. In the field of speech analytics platforms, the leaders are Speech Analytics, CallScoring, 3iTech, Tinkoff and Rostelecom. In the No-code/Low-code constructor segment, the most notable players at the moment are Just AI and Botmother; in the NLP/NLU/DM platform segment, Just AI, Nanosemantics and CST; in outbound telephony, Neuro.net and Zvonobot.

Thanks to the comfortable barrier to market entry, most players work in the segment of custom assistants, customer support solutions, skills for assistants, inbound IVR, and recruiting and HR solutions. Companies with a wide variety of technological backgrounds and revenues are represented here, from large vendors and integrators to small independent studios.

Conversational AI solutions targeted at specific business tasks and industries - such as voice catalog search for retail, virtual assistants for housing and communal services, robotic calls to win back customers, chatbots for hotels, as well as solutions for medicine, HoReCa, e-commerce, tourism, the beauty industry and others - will add 100-120% annually, says Just AI.

At the same time, the NLP/NLU/DM platform segment will continue to grow, mainly due to new business segments and developers entering the field of conversational AI - retail, insurance, transport, HoReCa - following the leaders: IT companies, banks and telecom operators. In addition, businesses are expanding the scope of NLP platforms: after text chatbots they move to voice channels, start using text and voice prompters in contact centers, replace traditional IVR with natural-language IVR, add voice control to mobile applications and create custom voice assistants, while niche NLP solutions for marketing, HR and other areas appear.

Just AI experts consider the rapid promotion of smart speakers and screens from Yandex, Sberbank and Mail.Ru to be a key trend in the development of the conversational AI market - more than 20 million units by 2025.

Speech synthesis

Most companies will have voice robots, predicts Dmitry Dyrmovsky from CST. Robots that can conduct a dialogue in natural language will have the advantage, which will drive the development of speech synthesis technologies.

Analysts predict annual growth of more than 30% in the global speech synthesis market. Clearly, a lot of discoveries are still to come here as well: in the control of synthesized voices (breathing, pauses, intonation, stress, etc.) and in the ability to create full-fledged voices from a small amount of source data.
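For a bare-bones illustration of programmatic synthesis (not the controllable, hybrid systems described here), a phrase can be rendered with the open-source gTTS wrapper around a public cloud engine; the phrase and file name are placeholders:

    # Minimal sketch: synthesizing a reply with gTTS. The production systems discussed in the
    # article expose much richer, vendor-specific controls (intonation, pauses, emotions).
    from gtts import gTTS

    phrase = "Your request has been registered. An engineer will contact you today."
    gTTS(text=phrase, lang="en").save("reply.mp3")  # the file can be streamed into the telephony channel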

New technologies make synthesized phrases sound natural, and hybrid synthesis allows seamless stitching of phrases recorded by voice actors with generated ones.

The 2022 trend is the transfer of emotions, that is, controllable synthesis, so that the speech of an assistant or virtual character sounds joyful or sad, angry or friendly, depending on the needs of the project.

Hybrid synthesis makes generated phrases indistinguishable from recordings by voice actors and provides seamless stitching of variable fragments with the main part of the phrase, Just AI explains.

This solution can be accessed from bots created in other services via an API. Setting up a script from scratch takes several hours, the company says. Two female voices and one male voice are available, and the service is paid for per synthesized phrase.

A significant trend is the cloning of voices. Analysts predict that the global voice cloning market will grow by more than 30% annually in the coming years.

The Microsoft corporation has presented its platform for creating custom neural voices. The first voice marketplace on the Russian market, Aimyvoice, was launched by Just AI at the end of last year. There you can choose from almost two dozen voices one suitable for your tasks, for example Krosh from Smeshariki. You can also upload to Aimyvoice, in open or closed mode, a model of a specific voice for speech synthesis and receive income from its use in various projects, such as voicing audiobooks, video games, podcasts, voice assistants, bots, telephony and IVR projects. The main thing is that the owner of the voice agrees to its use. For example, the voice of the famous dubbing actress Tatyana Litvinova was published on the Aimyvoice marketplace, and she became the first actress to receive income for each synthesized minute of speech. Krosh's voice, however, is in limited access and is available for synthesis only after concluding an agreement with the copyright holders.

Artificial intelligence technologies interact perfectly with other end-to-end digital business technologies, comments Alexey Lyubimov from 3iTech. Artificial intelligence, including conversational AI, can be integrated into business processes in several directions at once. Thus, speech recognition and synthesis solutions can be used in robotics and sensing components, as well as in virtual and augmented reality. Speech analytics is used in the big data component and other areas. There are examples of the successful introduction of conversational AI in almost all promising areas of industrial AI application.

The only question is the readiness of the business processes themselves to use end-to-end digital technologies, the expert says; without this, even chatbots will be just fashionable features that bring no benefit to the business other than freeing up a couple of employees.
