3i Speech Transcriptor (3i ASR)

Product
Developers: 3iTech (formerly 3i Technologies)
Last Release Date: 2020/05/22
Technology: Cybersecurity - Biometric identification, Speech technologies, Application development tools

3i Speech Transcriptor is specialized software for converting speech to text. It handles speech transmitted over media channels (TV, radio) and over telephone channels (traditional, cellular, IP telephony).

3i Speech Transcriptor provides an API for developers.

2020: Optimization of the speech model for understanding youth slang

On May 22, 2020, 3iTech announced that it had optimized the speech model used in its 3i ASR speech recognition system. Platforms built on this speech engine will now be able to "understand" youth slang and rambling informal conversation.

The 3i ASR speech engine is used to build a wide range of products, such as chatbots and voice assistants, and to create an intelligent "first line" in contact centers and technical support services. Voice systems often have to deal with slang or rambling speech, which complicates recognition and correct "understanding" of what was said. Including low-register, specialized, and other layers of colloquial vocabulary in the language model improves recognition accuracy and broadens the range of applications for both the 3i ASR speech engine and the platforms built on it, 3iTech emphasized.

«
"In real-life speech, people often use specific words and expressions: youth slang, established abbreviations, and filler words. The way we speak in everyday life is nothing like TV broadcasts or dialogue from fiction, so intelligent systems sometimes find it difficult to 'understand' people. We improved the speech model by including layers of human colloquial culture in it," noted Alexey Lyubimov, chairman of the board of directors of 3iTech.
»

3iTech uses 3i ASR to build specialized systems and software packages. For example, the speech analytics platform 3i TouchPoint Analytics is built on it, as is the cloud AI platform 3i VOX, which is already used in retail, banking, and telecommunications companies. Solutions based on the 3i ASR speech engine are already in use in contact centers and customer offices.

2019

Development of 3i ASR 2.0

On September 19, 2019, 3i Technologies reported that its specialists had developed the 3i ASR 2.0 speech recognition engine, which will considerably improve the quality of the company's products and services. With 3i ASR 2.0, systems will be able to understand live human speech more accurately. The engine will be used both in the products and services the company is releasing and in those already on the market.

The engine is built on an end-to-end architecture using neural networks and machine learning. 3i ASR 2.0 was trained on a dataset of several thousand hours with data augmentation (the introduction of various types of distortion). This considerably reduced the recognition error rate and improved recognition quality for live speech.
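The source does not say which distortions were used for augmentation. As a purely illustrative sketch, the Python below mixes white noise into a signal at a chosen signal-to-noise ratio and applies a random gain. The function names (`add_noise`, `change_gain`, `augment`) and the parameter ranges are assumptions made for this example, not part of 3i ASR.

```python
import math
import random


def add_noise(samples, snr_db, rng=None):
    """Return a copy of `samples` with white Gaussian noise at roughly `snr_db` dB SNR."""
    if rng is None:
        rng = random.Random(0)
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR in dB is 10 * log10(signal_power / noise_power); solve for noise_power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def change_gain(samples, factor):
    """Scale the amplitude of every sample by `factor`."""
    return [s * factor for s in samples]


def augment(samples, rng=None):
    """Apply a randomly parameterized noise + gain distortion (hypothetical ranges)."""
    if rng is None:
        rng = random.Random(0)
    snr_db = rng.uniform(5.0, 20.0)   # assumed range, for illustration only
    gain = rng.uniform(0.5, 1.5)      # assumed range, for illustration only
    return change_gain(add_noise(samples, snr_db, rng), gain)


# A 440 Hz tone, 0.1 s long at an 8 kHz sampling rate, stands in for speech.
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
noisy = augment(clean)
```

Each call to `augment` with a differently seeded generator yields another distorted copy of the same recording, which is how augmentation enlarges the effective training set.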

The computing infrastructure uses GPU acceleration, which yields a multiple-fold performance gain over the CPU. The engine can recognize large volumes of voice data more than a hundred times faster than real time.
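To put the "more than a hundred times faster" figure in concrete terms, the arithmetic can be sketched as follows. The helper names are hypothetical, and the 100x value is used only as the lower bound quoted above.

```python
def speedup_over_real_time(audio_seconds, processing_seconds):
    """How many times faster than playback the audio was processed."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds, speedup):
    """Seconds needed to process `audio_seconds` of audio at a given speed-up."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


ten_hours = 10 * 60 * 60                  # 36,000 seconds of recordings
print(processing_time(ten_hours, 100))    # 360.0 seconds, i.e. 6 minutes
```

At that rate, a ten-hour archive of calls would take about six minutes to transcribe.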

«
"Speech recognition technologies and the systems built on them are changing familiar services. Everyone has already encountered a speech system when calling a contact center or technical support. Electronic devices understand us perfectly when we dictate, for example, a search query by voice. 3i Technologies products monitor conversations between transport company staff and clients, or the communication of retail workers. We identify telephone fraudsters by voice. Every day the scope of speech technologies expands, while customers become more and more demanding about recognition quality and the processing speed of voice data. Our engine is a notable step forward,"
says Alexey Lyubimov, chairman of the board of directors of 3i Technologies.
»

The engine will be integrated into 3i Speech Recognition, a cloud service for professional processing of speech data, into the cloud speech analytics platform 3i TouchPoint Analytics, and into other products and services of the company. Migration to this engine will be seamless.

Integration into the Personal IT platform

On January 25, 2019, 3i Technologies announced that it had signed a cooperation agreement with IT Prof on creating intelligent voice services and chatbots. IT Prof developers gained access to the 3i Technologies voice platform and tools.

Technology. Characteristics. Modifications

Technology

As of January 2019, the speech recognition capabilities of 3i Speech Transcriptor are based on DNN and WFST technologies: deep neural networks and weighted finite-state transducers.

Main Characteristics

The speech recognition technologies used by the company provide:

  • high processing speed for the voice signal, through parallelized computation;
  • adequate speech recognition quality;
  • flexible configuration of the speech recognition module for the channel type (television and radio, or telephony: traditional, cellular, IP telephony) and/or language, using trained models that can be extended independently;
  • speaker-independent continuous speech recognition, including speech with an accent, background noise, nonverbal sounds, and music;
  • recognition of files or speech streams of unlimited length, by splitting recordings at pauses in the speech and recognizing the resulting pieces in separate CPU threads;
  • a large vocabulary of hundreds of thousands of recognizable words, which is sufficient for recognizing almost any general-vocabulary text.
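The approach of splitting a recording at pauses and recognizing the pieces in parallel can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the product's implementation: the amplitude threshold, the `min_pause` length, and the `recognize_segment` placeholder (which only reports the segment length instead of running a recognizer) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_on_pauses(samples, threshold=0.05, min_pause=3):
    """Split `samples` wherever at least `min_pause` consecutive samples are quiet."""
    segments, current, quiet_run = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            quiet_run += 1
            if current:
                current.append(s)
                if quiet_run >= min_pause:
                    # A long enough pause: close the segment without the pause itself.
                    segments.append(current[:-quiet_run])
                    current = []
        else:
            quiet_run = 0
            current.append(s)
    if current:
        # Trim any short trailing silence from the last segment.
        segments.append(current[:len(current) - quiet_run])
    return segments


def recognize_segment(segment):
    """Placeholder for a real recognizer call (hypothetical)."""
    return f"<{len(segment)} samples>"


def transcribe(samples, workers=4, **split_options):
    """Recognize each pause-delimited segment in its own worker thread."""
    segments = split_on_pauses(samples, **split_options)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in segment order, so the transcript stays in sequence.
        return list(pool.map(recognize_segment, segments))


signal = [0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0]
print(transcribe(signal, threshold=0.5, min_pause=3))
# ['<4 samples>', '<2 samples>']
```

Because each segment is bounded by a pause, memory use depends on the longest utterance rather than on the total length of the file or stream. In CPython, a process pool would usually replace the thread pool for CPU-bound recognition; threads are used here only to mirror the description above.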

Existing language models can be adapted, and new ones developed, to meet the customer's requirements.

Modifications

As of January 2019, two product modifications are available, each aimed at a different source of input data:

  • Phone: processing of speech data from a telephone channel
  • Broadcast: processing of speech data from a media (television broadcasting) channel

System requirements (minimum)