Developers: Nanosemantics Lab
System premiere date: 2021/03/15
Technology: Information Security - Biometric Identification, Speech Technologies
Main articles:
- Speech technology: On the path from recognition to understanding
- Biometric identification technologies
NLab Speech is a set of neural network algorithms for audio signal processing and text analysis, trained and calibrated on a large volume of hand-labeled speech data.
2021: Bringing the Solution to Market
Nanosemantics, an expert in the field of artificial intelligence and a resident of the Skolkovo Foundation's Information Technology Cluster, has entered the speech recognition market with its NLab Speech technology. With its help, companies can, for example, reduce call center costs, simplify filling out documents, and improve the quality of life of people with disabilities. The Skolkovo Foundation announced this on March 15, 2021.
As of March 2021, NLab Speech's accuracy (word accuracy, i.e. one minus the Word Error Rate) exceeds 82% on noisy telephony data. Processing speed in the Nanosemantics cloud reaches a real-time factor of 6, meaning audio is transcribed six times faster than its playback duration - 40-80% faster than competing cloud services.
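For reference on the metric: Word Error Rate is the word-level edit distance between the recognized text and a reference transcript, divided by the number of reference words, and the accuracy quoted above is 1 - WER. A minimal sketch of the standard computation (illustrative only, not Nanosemantics' code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

ref = "reduce the cost of call centers"
hyp = "reduce cost of call enters"
wer = word_error_rate(ref, hyp)
print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.2f}")  # accuracy = 1 - WER
```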
We are already on a par with the leaders in accuracy among voice technologies working in Russian, and we strive to surpass them qualitatively. All the prerequisites are in place: we are improving our language and acoustic models and our neural-network punctuator, and we are collecting even more high-quality data for training neural networks. To further improve recognition accuracy, we plan to add audio classification in NLab Speech by gender, age, speech rate, pitch, volume, and speaker emotion, as well as classification of the noise in the speaker's environment. In parallel, development of English, Chinese, and Korean ASR is underway, said Stanislav Ashmanov, CEO of Nanosemantics.
For organizations that entrust customer service to machine learning, it is difficult to overstate the improvement in the quality of voice robots built on automatic speech recognition (ASR) from Nanosemantics. With high-quality recognition of voices and words, a voice assistant can replace dozens or even hundreds of call center employees, reducing staff costs and speeding up customer service. Implementing ASR will also simplify and optimize work in other areas of business: health workers, for example, will be able to compile patient histories quickly by filling out documents by voice, and people with disabilities will improve their quality of life through voice technologies.
The team worked on the technology for more than two years. To prepare the large array of training data, Nanosemantics developed its own annotation platform, Nanosemantics Marker, which converts the data into a format suitable for training neural networks.
Unlike a human, the neural network in NLab Speech analyzes the sound signal as an image: each audio clip is mapped to its spectrogram, which the neural network then translates into textual hypotheses about what was said. The best hypothesis is selected using a language model that takes into account how frequently words occur together.
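A hedged sketch of that two-stage pipeline: audio is turned into a spectrogram, and candidate transcripts are then rescored with a word co-occurrence (bigram) language model. The acoustic network itself is stubbed out, and all names, scores, and counts here are illustrative assumptions, not NLab Speech internals:

```python
import numpy as np
from scipy import signal

def log_spectrogram(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Turn raw audio into the time-frequency 'image' the network consumes."""
    _, _, spec = signal.spectrogram(audio, fs=sample_rate,
                                    nperseg=400, noverlap=240)
    return np.log(spec + 1e-10)

def acoustic_hypotheses(spec: np.ndarray) -> list:
    """Stand-in for the acoustic neural network, which would map the
    spectrogram to candidate transcripts with acoustic log-scores."""
    return [("recognize speech", -2.1), ("wreck a nice beach", -2.3)]

# Toy counts of how often word pairs occur together (the joint-occurrence
# statistics a bigram language model relies on).
BIGRAM_COUNTS = {("recognize", "speech"): 500,
                 ("wreck", "a"): 2, ("a", "nice"): 5, ("nice", "beach"): 2}

def lm_score(text: str) -> float:
    words = text.split()
    return sum(np.log(BIGRAM_COUNTS.get(pair, 1))
               for pair in zip(words, words[1:]))

def best_transcript(audio: np.ndarray, sample_rate: int) -> str:
    hyps = acoustic_hypotheses(log_spectrogram(audio, sample_rate))
    # Combine acoustic and language-model scores; the best total wins.
    return max(hyps, key=lambda h: h[1] + 0.5 * lm_score(h[0]))[0]

print(best_transcript(np.random.randn(16000), 16000))  # -> "recognize speech"
```

Real decoders also normalize for hypothesis length and search over far larger hypothesis lattices, but the principle of combining acoustic and language-model evidence is the same.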
To train the acoustic models, more than 12 thousand hours of audio were collected from various sources: call centers, voice messages, audiobooks, and webinars. Datasets were also prepared to train models that perform better on recordings from the microphones of user devices such as smartphones and laptops. Because the recordings came from different sources and were made under different conditions, the team had to account for reverb and equalization.
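A common way to build in that kind of robustness is to augment clean training audio with synthetic reverb and equalization. The sketch below shows the general technique (an assumption for illustration, not Nanosemantics' actual pipeline): convolving audio with a synthetic impulse response and applying a telephony-style low-pass filter:

```python
import numpy as np
from scipy import signal

def add_reverb(audio: np.ndarray, sample_rate: int, decay: float = 0.4) -> np.ndarray:
    """Simulate room reverb by convolving the signal with a synthetic,
    exponentially decaying impulse response (~0.3 s long)."""
    n = int(0.3 * sample_rate)
    impulse = np.random.randn(n) * np.exp(-decay * np.arange(n) / sample_rate * 100)
    impulse[0] = 1.0  # keep the direct sound dominant
    wet = signal.fftconvolve(audio, impulse)[: len(audio)]
    return wet / (np.max(np.abs(wet)) + 1e-9)

def equalize(audio: np.ndarray, sample_rate: int, cutoff_hz: float = 3400.0) -> np.ndarray:
    """Mimic narrowband telephony with a low-pass Butterworth filter."""
    b, a = signal.butter(4, cutoff_hz / (sample_rate / 2), btype="low")
    return signal.lfilter(b, a, audio)

# Augment a clean training clip so models also see "roomy" telephone audio.
clean = np.random.randn(16000)  # placeholder for a real 1 s clip at 16 kHz
augmented = equalize(add_reverb(clean, 16000), 16000)
```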
As of March 2021, Nanosemantics' NLab Speech is a self-sufficient speech recognition technology that replicates human speech-recognition capabilities and does not rely on third-party services. Recognition is fast and scalable and runs on both CPUs and GPUs. NLab Speech supports both file-based and streaming recognition: the former returns only the final result, while the latter also emits intermediate hypotheses after each spoken word and adjusts them as the speech continues (the same principle is used, for example, in Apple Siri).

Among other things, ASR from Nanosemantics works with the main communication protocols - websocket, gRPC, and MRCP - which gives NLab Speech the flexibility to integrate the service into a specific client's systems. Stereo recordings can also be split into dialogue turns by channel, making the ASR output easier to use in voice analytics systems. Finally, NLab Speech automatically normalizes the text, corrects errors, and restores punctuation.
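Since the article confirms a websocket transport and the partial/final distinction, here is a hedged sketch of what a streaming client could look like. The endpoint URL, message schema, and field names are assumptions made for illustration, not the documented NLab Speech API:

```python
import asyncio
import json
import websockets  # pip install websockets

# Hypothetical endpoint and message schema -- illustrative only.
ASR_URL = "wss://example-asr-host/stream"

async def stream_file(path: str) -> None:
    async with websockets.connect(ASR_URL) as ws:
        with open(path, "rb") as f:
            while chunk := f.read(3200):  # ~100 ms of 16 kHz 16-bit mono PCM
                await ws.send(chunk)      # binary audio frames
                # Intermediate hypotheses may arrive while we stream; they
                # are refined as more speech comes in.
                try:
                    msg = json.loads(await asyncio.wait_for(ws.recv(), 0.01))
                    print("partial:", msg["text"])
                except asyncio.TimeoutError:
                    pass
        await ws.send(json.dumps({"event": "end_of_stream"}))
        final = json.loads(await ws.recv())
        print("final:", final["text"])  # adjusted, punctuated transcript

asyncio.run(stream_file("call.wav"))
```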