3i Speech Transcriptor (3i ASR)

Product
Developers: 3iTech (formerly 3i Technologies)
Last Release Date: 2021/03/03
Technology: Information security - Biometric identification, Speech technologies, Application development tools

3i Speech Transcriptor is specialized software that converts speech into text for audio transmitted over media channels (TV, radio) and telephone channels (traditional, cellular, IP telephony).

3i Speech Transcriptor provides the developer with an API.

2021: Reduced probability of speech recognition error by 20%

On March 3, 2021, 3iTech announced an improved acoustic model for its 3i ASR speech recognition engine. The update reduced the probability of a speech recognition error by 20%, which in turn improves the quality of the business solutions built on it.

In speech recognition systems, the acoustic model converts a speech signal into letter hypotheses, which a language model then assembles into coherent text. The accuracy of the business solutions depends on the quality of the resulting text. The updated acoustic model of the 3i ASR engine reduced the error rate (WER, Word Error Rate) by 20% on test samples covering a wide range of acoustic conditions, including difficult ones. To achieve this, the neural network architecture was modified, which not only increased accuracy but also significantly reduced the number of trainable parameters. In addition, the company began the transition to a continuous cycle of model self-training.
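As an illustration of the metric mentioned above, here is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function names are hypothetical and are not part of any 3iTech API. The helper `relative_reduction` assumes the 20% figure is a relative reduction, which the source does not state explicitly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction (assumed interpretation of 'reduced by 20%')."""
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
    print(f"Relative reduction: {relative_reduction(0.10, 0.08):.0%}")
```

Under this interpretation, a model that moved from a WER of 10% to 8% would show a 20% relative reduction.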

The 3i ASR engine is used in the 3i TouchPoint Analytics voice analytics system and in the 3i VOX platform, which address a range of business tasks, from enterprise voice analytics systems to intelligent conversational bots. 3iTech's products are used to monitor service quality and identify best sales practices in banks, retail, and the contact centers of large developers.

"In terms of speech recognition quality, our products are among the best on the Russian market. We use machine learning technology in our products. Our specialists regularly train acoustic and language models. Changing the structure of the acoustic model not only significantly increased the accuracy of voice data processing, but also opened up the possibility of moving to self-training. That is, in the future our systems will be able to improve automatically," says Alexey Lyubimov, founder and CEO of 3iTech.

2020: Optimizing the speech model for understanding youth slang

On May 22, 2020, 3iTech announced that it had optimized the speech model used in the 3i ASR speech recognition system. Platforms built on this engine can now "understand" youth slang and disjointed spoken speech.

The 3i ASR speech engine is used to build a wide range of products, such as chat bots and voice assistants, as well as an intelligent "first line" in contact centers and technical support services. Voice systems often have to deal with slang or incoherent speech, which complicates recognition and the correct "understanding" of what is said. Including low-register, specialized, and other layers of colloquial vocabulary in the language model improves recognition accuracy and broadens the range of uses for both the 3i ASR engine and the platforms based on it, 3iTech emphasized.

"In real-life language, people often use specific words and expressions: youth slang, established abbreviations, and filler words. The way we speak in everyday life is nothing like television broadcasts or dialogue from fiction. That is why intelligent systems sometimes find it hard to 'understand' people. We improved the speech model to include layers of conversational culture," said Alexey Lyubimov, chairman of the board of directors of 3iTech.

3iTech uses 3i ASR to create specialized systems and software suites. For example, the 3i TouchPoint Analytics speech analytics platform and the 3i VOX cloud AI platform are built on it; both are already used in retail, banks, and telecommunications companies. Solutions based on the 3i ASR voice engine are also already in use in contact centers and client offices.

2019

3i ASR 2.0 Development

On September 19, 2019, 3i Technologies announced that its experts had developed the speech recognition engine 3i ASR 2.0, which will significantly improve the quality of the company's products and services. With 3i ASR 2.0, systems can better understand live human speech. The engine will be used both in the products and services of the company, and in those that are already represented on the market.

The engine is based on an end-to-end neural network architecture and uses machine learning. 3i ASR 2.0 was trained on a sample of several thousand hours with data augmentation (introducing different types of distortion). This significantly reduced the relative error rate and improved the quality of live speech recognition.
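The source does not say which distortions were used for augmentation. As one illustrative possibility, the sketch below adds Gaussian noise to an audio signal at a chosen signal-to-noise ratio. The function name, the choice of additive noise, and the parameters are all assumptions made for this example, not a description of the 3i ASR training pipeline.

```python
import math
import random


def add_noise(samples, snr_db, seed=None):
    """Return a copy of `samples` with Gaussian noise added at roughly `snr_db` dB SNR.

    Hypothetical augmentation step: the actual distortions used to train
    3i ASR 2.0 are not described in the source.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)

    # Average power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)

    return [s + rng.gauss(0.0, sigma) for s in samples]


if __name__ == "__main__":
    # One second of a 440 Hz tone sampled at 16 kHz stands in for speech.
    clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    noisy = add_noise(clean, snr_db=10.0, seed=0)
    print(len(noisy), "samples augmented")
```

Training on several noisy variants of each recording, at different SNR levels, is a common way to make a recognizer more robust to real acoustic conditions.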

The computing infrastructure uses GPU acceleration, which yields a severalfold performance gain over the CPU. The engine can recognize bodies of speech data more than a hundred times faster than real time.

"Speech recognition technologies and the systems built on them are changing familiar services. Everyone has already encountered a voice system when calling a contact center or technical support. Electronic devices understand us well when we dictate, for example, a search query by voice. The products of 3i Technologies monitor the conversations of transport company employees with customers, or the communication of retail workers. We identify phone scammers by voice. Every day the scope of speech technology expands, and customers are becoming increasingly demanding about recognition quality and the speed of voice data processing. And our engine is a tangible step forward."

The engine will be integrated into 3i Speech Recognition, the cloud service for professional voice data processing; into 3i TouchPoint Analytics, the cloud platform for voice analytics; and into the company's other products and services. Migration to the new engine will be seamless.

Personal IT Platform Integration

On January 25, 2019, 3i Technologies announced the signing of a cooperation agreement with Prof. IT on creating intelligent voice services and chat bots. Prof. IT developers gained access to the 3i Technologies voice platform and tools.

Technology. Characteristics. Modifications

Technology

As of January 2019, the speech recognition capabilities of 3i Speech Transcriptor are based on DNN and WFST technologies: deep neural networks and weighted finite-state transducers.

Main characteristics

The speech recognition technology provides:

  • high-speed speech signal processing through parallelized computation;
  • appropriate speech recognition quality;
  • flexible adjustment of the speech recognition module to the channel type (television and radio broadcasting, or telephony: traditional, cellular, IP telephony) and/or language, using trained models that are distributed independently;
  • speaker-independent recognition of continuous speech, including speech with an accent and in the presence of external noise, non-speech sounds, and music;
  • recognition of files or voice streams of unlimited length by splitting recordings at pauses in speech and recognizing the resulting pieces in separate CPU threads;
  • a large vocabulary of hundreds of thousands of recognized words, which is enough to recognize almost any text in common vocabulary.
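The approach of splitting a recording at pauses and recognizing the pieces in parallel can be sketched roughly as follows. This is an illustration only: the energy threshold, the frame length, and the `recognize_stub` function are hypothetical placeholders, and the product's actual segmentation method and API are not described in the source.

```python
from concurrent.futures import ThreadPoolExecutor


def split_on_pauses(samples, frame_len=160, threshold=0.01, min_pause_frames=3):
    """Return (start, end) sample indices of speech segments separated by pauses.

    A frame counts as silent when its mean energy is below `threshold`;
    a pause is `min_pause_frames` or more consecutive silent frames.
    """
    segments = []
    seg_start = None      # sample index where the current segment began
    last_voiced_end = 0   # end index of the most recent voiced frame
    silent_frames = 0

    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            if seg_start is None:
                seg_start = pos
            last_voiced_end = pos + len(frame)
            silent_frames = 0
        elif seg_start is not None:
            silent_frames += 1
            if silent_frames >= min_pause_frames:
                # The pause is long enough: close the current segment.
                segments.append((seg_start, last_voiced_end))
                seg_start = None

    if seg_start is not None:
        segments.append((seg_start, last_voiced_end))
    return segments


def recognize_stub(segment):
    # Hypothetical stand-in for a call to a real speech recognizer.
    return f"<{len(segment)} samples>"


def transcribe(samples, recognizer=recognize_stub, max_workers=4):
    """Recognize each speech segment in its own worker thread, keeping the order."""
    segments = split_on_pauses(samples)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda se: recognizer(samples[se[0]:se[1]]), segments))


if __name__ == "__main__":
    audio = [0.0] * 320 + [0.5] * 480 + [0.0] * 640 + [0.5] * 320 + [0.0] * 160
    print(split_on_pauses(audio))
    print(transcribe(audio))
```

Because each segment is independent, the length of the input does not limit recognition; only the number of segments processed at once does. In Python, threads speed up this pattern only if the recognizer releases the interpreter lock (for example, a native library); otherwise a process pool would be the closer analogue.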

It is possible to adapt existing language models and develop new ones to the requirements of the customer.

Modifications

As of January 2019, two product modifications are available, each targeting a different input source:

  • Phone - processing of voice data from a telephone channel
  • Broadcast - processing of voice data from a media (broadcasting) channel

System requirements (minimum)
