
GigaAM (Giga Acoustic Model)

Product
The name of the base system (platform): Artificial intelligence (AI)
Developers: SberDevices (SaluteDevices, formerly SberDevices)
Date of the premiere of the system: April 2024
Industries: Information technology

2024: Product Announcement

In early April 2024, SberDevices introduced a set of open-source machine learning models for speech and emotion recognition. The set, available to everyone free of charge, is called GigaAM (Giga Acoustic Model).

One of the models, the Audio Foundation Model, was trained on a diverse corpus of Russian speech. It can be adapted to a range of audio tasks, including speech recognition, emotion recognition, and speaker identification.


According to the developers, GigaAM-CTC, the open model for recognizing Russian-language requests, makes 20-35% fewer word errors on short requests than popular solutions such as NeMo-Conformer-RNNT and Whisper-Large-v3.
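Comparisons like this are usually expressed through word error rate (WER). The sketch below is a minimal illustration, not the developers' evaluation code. It assumes WER is the word-level Levenshtein distance between a reference transcript and the recognizer's output, divided by the number of reference words, and it reads "20-35% fewer word errors" as a relative reduction in WER, which the announcement does not state explicitly. The function names and sample phrases are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance (illustrative sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer
    (assumed reading of "N% fewer word errors")."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical short request: one substituted word out of five
    print(word_error_rate("turn on the kitchen light", "turn on the kitchen lights"))
    # Hypothetical WERs: a drop from 10% to 7% is a 30% relative reduction
    print(relative_reduction(0.10, 0.07))
```

Under this reading, a model with 20-35% fewer word errors would have a `relative_reduction` of roughly 0.20-0.35 against each baseline. Real benchmarks typically normalize text (case, punctuation, numerals) before scoring; this sketch only splits on whitespace.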

Another model in the GigaAM set is GigaAM-Emo, an acoustic emotion recognition model. According to SberDevices, it achieved the best result among well-known models on Dusha, the largest dataset for this task.

All models are publicly available under a non-commercial license and can be used, for example, in theses and scientific articles. SberDevices adds that improved versions of these models are available to businesses through the company's speech synthesis and recognition platform, the SaluteSpeech API, and to individuals in the SaluteSpeech App.

According to the company, one of the challenges of machine learning is collecting training data. For speech technologies the problem is especially acute because the data are complex. For example, it is difficult for a person to determine a speaker's emotion from an audio recording or to make out what is being said in noisy conditions, so several experts may have to annotate the same recording. This slows annotation down and makes it more expensive. GigaAM models help solve this problem, since the pretrained foundation model can be adapted to new tasks instead of each task being trained from scratch on costly annotated data.[1]

Notes