3i Speech Recognition API

Product
Developers: 3iTech (formerly 3i Technologies)
System premiere date: 2017/02/14
Last Release Date: 2017/07/20
Technology: Information security - Biometric identification, Speech technologies, Application development tools

3i Speech Recognition API is a cloud-based service that recognizes speech in media content and supports subsequent professional processing of the resulting text.

2017

3i Speech Recognition: final version

On July 20, 2017, the 3i Technologies consortium announced that it had completed the functionality of 3i Speech Recognition, a cloud service for professional voice data processing designed to work with media content and voice traffic in contact centers. The service converts speech to text with more than 90% accuracy and lets you edit the result in the user interface.
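
The announcement does not say how the 90% figure is measured. A common convention in speech recognition is word accuracy, defined as one minus the word error rate (WER), where WER is the word-level edit distance between a reference transcript and the recognizer output, divided by the length of the reference. The following is a minimal sketch under that assumption; the function names and sample sentences are illustrative and are not part of the service:

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Word accuracy = 1 - WER (assumed metric, not confirmed by the source)."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    wer = word_edit_distance(reference, hypothesis) / len(reference)
    return 1.0 - wer


# Invented sample: one substituted word out of ten.
reference = "the cloud service converts speech into text with high accuracy"
hypothesis = "the cloud service converts speech into text with hi accuracy"
print(f"{word_accuracy(reference, hypothesis):.0%}")
```

With one wrong word in a ten-word reference, this sketch reports 90% word accuracy. Note that word accuracy computed this way can become negative when the output contains many inserted words.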

At the final stage, 3i Speech Recognition was supplemented with a personal account, storage, and an editing interface for working with the recognized text. The service handles recordings of TV and radio broadcasts, production studio materials, calls to contact centers, and similar sources. According to the developers, 3i Speech Recognition lets you upload up to 18 hours of audio to personal cloud storage and processes it several times faster than real time. The output is well-formed text, divided into sentences with punctuation in place. When you play back the source material, the system automatically highlights the fragment being spoken in the text block.
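
Highlighting the spoken fragment during playback can be sketched as a lookup of the current playback position in a list of word timings. The source does not describe how the service does this; the data layout, the timings, and the function names below are hypothetical:

```python
from bisect import bisect_right

# Hypothetical word timings: (start_seconds, end_seconds, word).
WORDS = [
    (0.0, 0.4, "Good"),
    (0.4, 0.9, "evening,"),
    (1.2, 1.5, "this"),
    (1.5, 1.7, "is"),
    (1.7, 2.0, "the"),
    (2.0, 2.6, "news."),
]
STARTS = [start for start, _, _ in WORDS]


def word_index_at(position):
    """Index of the word spoken at `position` seconds, or None during silence."""
    i = bisect_right(STARTS, position) - 1
    if i >= 0 and position < WORDS[i][1]:
        return i
    return None


def render(position):
    """Transcript with the currently spoken word wrapped in square brackets."""
    current = word_index_at(position)
    return " ".join(
        f"[{word}]" if i == current else word
        for i, (_, _, word) in enumerate(WORDS)
    )


print(render(1.6))
```

The binary search keeps the lookup fast even for transcripts of many hours, since a player would repeat it several times per second.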

According to Alexey Lyubimov, chairman of the board of directors of the 3i Technologies consortium, the service is aimed at specialists who work with voice information. It will be useful to quality control teams in contact centers, to consultants who create telemarketing scripts, and to TV channel departments that transcribe television broadcasts.

The service features high speech recognition accuracy, automatic punctuation, a convenient interface for editing the recognized text, and the ability to integrate with the most common software platforms for team collaboration.

3i Speech Recognition uses language and acoustic models built with machine learning, using recurrent neural network (RNN) and weighted finite-state transducer (WFST) technologies. The computing infrastructure is GPU-accelerated, which yields a severalfold increase in performance relative to the CPU.
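
To give a sense of what a weighted finite-state transducer does in a recognizer, here is a toy sketch that maps a sequence of phoneme-like symbols to words and picks the cheapest path, treating weights as costs (the tropical semiring: add along a path, take the minimum across paths). Everything in it, including the symbols, words, weights, and function name, is invented for illustration; it says nothing about the service's actual models. Real systems compose far larger transducers and handle empty input labels, which this sketch omits.

```python
EPS = ""  # empty output label: the arc consumes input but emits nothing

# state -> list of (input_symbol, output_label, weight, next_state)
# Weights behave like negative log-probabilities: lower is better.
ARCS = {
    0: [("k", EPS, 0.0, 1), ("d", EPS, 0.0, 3)],
    1: [("ae", EPS, 0.0, 2)],
    2: [("t", "cat", 0.5, 0), ("t", "kat", 2.0, 0), ("p", "cap", 1.25, 0)],
    3: [("ao", EPS, 0.0, 4)],
    4: [("g", "dog", 0.75, 0)],
}
START, FINAL = 0, {0}


def best_transduction(symbols):
    """Return (total_weight, words) for the cheapest accepting path, or None."""
    best = {START: (0.0, [])}  # state -> cheapest (weight, outputs) reaching it
    for symbol in symbols:
        reached = {}
        for state, (weight, outputs) in best.items():
            for in_sym, out_label, arc_weight, nxt in ARCS.get(state, []):
                if in_sym != symbol:
                    continue
                candidate = (
                    weight + arc_weight,
                    outputs + ([out_label] if out_label else []),
                )
                if nxt not in reached or candidate[0] < reached[nxt][0]:
                    reached[nxt] = candidate
        best = reached
    accepting = [path for state, path in best.items() if state in FINAL]
    return min(accepting, key=lambda path: path[0]) if accepting else None


print(best_transduction(["k", "ae", "t", "d", "ao", "g"]))
```

The two arcs labeled "t" model competing spellings for the same sounds; the lower weight on "cat" is what makes the search prefer it over "kat".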

To improve recognition quality, the language models can be adapted to a narrow subject area, for example, to transcribe TV shows on particular topics or to process highly specialized telemarketing scripts.

The service supports Russian, English, Chinese, German and Spanish. In addition, a machine translation service can be integrated into 3i Speech Recognition.

A test version of the service is available online.

3i Speech Recognition Beta Presented

On February 14, 2017, the press service of the 3i Technologies consortium announced the development of 3i Speech Recognition API, a cloud service for professional speech data processing. The service transcribes television and radio broadcasts and the media archives of TV channels and radio stations with more than 90% accuracy.

3i Speech Recognition API works with audio and video of any duration, processes files uploaded to the cloud several times faster than real time, and outputs text split into sentences with punctuation.
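
"Several times faster than real time" is often expressed as a real-time factor (RTF): processing time divided by audio duration, so values below 1 mean faster than real time. The source gives no concrete figure; the 4x speedup and the two-hour broadcast in this sketch are assumed values chosen only to show the arithmetic:

```python
def processing_time_hours(audio_hours, speedup):
    """Estimated processing time for audio processed at `speedup` times real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_hours / speedup


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# Assumed example: a 2-hour broadcast at an assumed 4x speedup.
hours = processing_time_hours(audio_hours=2, speedup=4)
rtf = real_time_factor(processing_seconds=hours * 3600, audio_seconds=2 * 3600)
print(hours, rtf)
```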

Sound oscillogram (2015)

The beta version of 3i Speech Recognition API is open for public testing.

"This is a specialized service focused specifically on processing television and radio content. We developed unique models that made it possible to achieve very high recognition accuracy. We hope the service will be useful to professionals who work with media content. In the future, it could become part of high-tech solutions for the mass consumer, for example, as the basis for real-time translation and subtitling of foreign channels. The consortium's companies already have all the technologies needed to create such a product."

Alexey Lyubimov, Chairman of the Board of Directors of the 3i Technologies consortium

The service uses language and acoustic models created with machine learning, using recurrent neural network (RNN) and weighted finite-state transducer (WFST) technologies. The computing infrastructure is GPU-based, which gives a severalfold increase in performance compared to the CPU.

To improve recognition quality, the language models can be adapted to a narrow subject area, for example, to transcribe "economic" or "industry" programs in which the speakers use professional vocabulary.

The beta version of 3i Speech Recognition supports Russian and English. According to the creators of the service, 3i Speech Recognition will be useful to software developers, system integrators, and specialists who create and process media content (broadcasting companies, production studios, creative agencies, freelancers, etc.).