Base system (platform): | Artificial intelligence (AI) |
Developers: | Nanosemantics Lab |
Date of the premiere of the system: | 2023/11/14 |
Technology: | Speech technology |
The main articles are:
- Speech Recognition (Technology, Market)
- Speech technology: On the path from recognition to understanding
2023: Introduction of key phrase detection technology
Nanosemantics has developed keyword spotting (KWS) technology which, combined with a voice activity detector (Voice Activity Detection, VAD), improves the accuracy of voice request recognition and the quality of digital assistants: smart speakers and assistants on online platforms and in applications. Testing conducted for a commercial customer showed that introducing the combination of Nanosemantics' VAD and KWS technologies made the digital assistant's key phrase recognition 12 times more accurate than with the previous model.
This qualitative improvement was achieved through the chosen CNN-Transformer model architecture, the logic for processing streaming audio, and a large and diverse dataset. The VAD can "distinguish" human speech from other noise and then pass the relevant audio fragments, with margins around them, on to the key phrase detection model (KWS) for recognition.
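The gating step described above (VAD finds speech, then hands padded fragments to KWS) can be sketched as follows. This is a minimal illustration, not Nanosemantics' implementation: a simple frame-energy threshold stands in for their neural VAD, `kws` is a hypothetical detector callable, and the frame length, threshold and padding values are assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame; a trailing partial frame is dropped."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def speech_segments(
    samples: Sequence[float], frame_len: int, threshold: float, pad_frames: int
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of speech, each widened by pad_frames frames per side."""
    flags = [e > threshold for e in frame_energies(samples, frame_len)]
    n = len(flags)
    segments: List[Tuple[int, int]] = []  # kept in frame units until the end
    i = 0
    while i < n:
        if not flags[i]:
            i += 1
            continue
        j = i
        while j < n and flags[j]:
            j += 1  # extend the run of consecutive speech frames
        start, end = max(0, i - pad_frames), min(n, j + pad_frames)
        if segments and start <= segments[-1][1]:
            # Padding made this run touch the previous one: merge them.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
        i = j
    return [(s * frame_len, e * frame_len) for s, e in segments]


def run_pipeline(
    samples: Sequence[float],
    kws: Callable[[Sequence[float]], bool],
    frame_len: int = 160,     # 10 ms at an assumed 16 kHz sample rate
    threshold: float = 1e-3,  # assumed energy threshold
    pad_frames: int = 5,      # assumed 50 ms margin on each side
) -> List[bool]:
    """Run the (hypothetical) KWS detector only on the padded speech segments."""
    return [kws(samples[s:e]) for s, e in speech_segments(samples, frame_len, threshold, pad_frames)]
```

Running KWS only on these fragments means the heavier detector never sees pure silence or background noise. The padding keeps word onsets and endings that a VAD tends to clip.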
The model learns to detect the chosen key phrase that activates the digital assistant. Training uses similar-sounding phrases and a large number of different distortions (augmentations), which make the model robust to interference and to words that sound like the key phrase.
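The article does not list which distortions are used. A minimal sketch of two common audio augmentations follows: mixing in background noise at a random signal-to-noise ratio (SNR), then applying a random overall gain. Both techniques and the function names are illustrative assumptions. So are the default SNR and gain ranges.

```python
import math
import random
from typing import List, Sequence, Tuple


def rms(x: Sequence[float]) -> float:
    """Root-mean-square amplitude of a non-empty signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to clean, scaled so that 20*log10(rms(clean) / rms(added noise)) == snr_db."""
    # Loop (tile) or truncate the noise clip to the length of the clean signal.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    noise_rms = rms(tiled)
    if noise_rms == 0.0:
        raise ValueError("noise must not be silent")
    # Target noise RMS is rms(clean) / 10**(snr_db / 20).
    scale = rms(clean) / (10 ** (snr_db / 20)) / noise_rms
    return [c + scale * n for c, n in zip(clean, tiled)]


def augment(
    clean: Sequence[float],
    noises: Sequence[Sequence[float]],
    rng: random.Random,
    snr_range_db: Tuple[float, float] = (0.0, 20.0),   # assumed range
    gain_range_db: Tuple[float, float] = (-6.0, 6.0),  # assumed range
) -> List[float]:
    """Pick a random noise clip, mix it at a random SNR, then apply a random overall gain."""
    noise = rng.choice(noises)
    mixed = mix_at_snr(clean, noise, rng.uniform(*snr_range_db))
    gain = 10 ** (rng.uniform(*gain_range_db) / 20)
    return [gain * v for v in mixed]
```

Passing a seeded generator, for example `augment(clip, noise_clips, random.Random(0))`, makes a given augmentation run reproducible.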
To improve KWS quality metrics, the Nanosemantics team of data collectors and annotators assembled a database of audio recordings of the key phrase spoken by female and male voices in different manners: neutral, loud, whispered, slow, turned away from the device, and so on. The recordings are also split by sound quality: some were made in ideal "studio" conditions, others with background noise in various rooms and outdoor settings. The total duration of the dataset exceeded 100 hours.
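The breakdown described above (voice, manner of speaking, recording quality) can be tracked in a simple manifest so that each condition stays adequately represented. The sketch below is hypothetical. The `Recording` record, its field names and the category labels are illustrative assumptions and do not describe Nanosemantics' actual data format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Recording:
    speaker_sex: str   # e.g. "female" or "male"
    manner: str        # e.g. "neutral", "loud", "whisper", "slow", "turned_away"
    quality: str       # e.g. "studio" or "noisy"
    duration_s: float  # clip length in seconds


def hours_by(recs: Iterable[Recording], key: str) -> Dict[str, float]:
    """Total hours of audio per value of one manifest field (e.g. key="quality")."""
    totals: Dict[str, float] = defaultdict(float)
    for r in recs:
        totals[getattr(r, key)] += r.duration_s / 3600
    return dict(totals)


def total_hours(recs: Iterable[Recording]) -> float:
    """Total dataset duration in hours."""
    return sum(r.duration_s for r in recs) / 3600
```

For a real corpus, `total_hours` is the figure the article reports as exceeding 100 hours. `hours_by(manifest, "quality")` would show the studio/noisy split.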
VAD and KWS barely drain the battery and can run on most smartphones, including offline, thanks to their small size: the VAD, based on a CNN-BiLSTM model, weighs 0.5 MB, and the KWS weighs 4 MB.
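The article gives only file sizes, not parameter counts. The back-of-the-envelope estimate below assumes "MB" means MiB and that weights are stored as 32-bit floats (4 bytes) or 8-bit integers (1 byte). It ignores file headers and compression. Treat the results as rough orders of magnitude rather than the models' real sizes.

```python
def approx_param_count(size_mib: float, bytes_per_param: int) -> int:
    """Rough parameter count implied by a weights file size (ignores headers, metadata, compression)."""
    return int(size_mib * 1024 * 1024) // bytes_per_param


for name, size_mib in [("VAD", 0.5), ("KWS", 4.0)]:
    # float32 weights vs int8-quantized weights
    print(name, approx_param_count(size_mib, 4), approx_param_count(size_mib, 1))
# prints:
# VAD 131072 524288
# KWS 1048576 4194304
```

Under these assumptions, the VAD holds roughly 0.1 to 0.5 million parameters and the KWS roughly 1 to 4 million. Models that small are plausible to run on-device.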
"Thanks to high-quality work with data and the use of optimal neural network architectures, the combination of Nanosemantics' VAD and KWS modules can significantly improve how well an assistant recognizes the key phrase, which is essential for activating voice assistants. The solution is integrated into applications and platforms across all segments, from retail to banking, and is also used on its own in smart speakers. The accuracy of the KWS technology determines how well the voice assistant 'understands' you and turns on exactly when you actually asked it to," said Pavel Sukhachev, director of Data Science at Nanosemantics.