Developers: | HSE - St. Petersburg (St. Petersburg branch of HSE) |
Date of the premiere of the system: | August 2025 |
Branches: | Information Technology |
History
2025: Dataset Launch
Researchers at the National Research University Higher School of Economics (HSE) in St. Petersburg have developed and released a multimodal emotional dataset for teaching artificial intelligence systems to analyze human emotions. The Higher School of Economics reported this at the end of August 2025.
Anastasia Kolmogorova, head of the Language Convergence Laboratory, and researcher Elizabeth Kulikova created a dataset that includes 909 video fragments totaling 173 minutes, annotated with six basic emotions.
The resource presents a new approach to systematizing linguistic knowledge for the digital age: the dataset replaces the traditional word-interpretation pair with a text-fragment-emotion-label format.
Kolmogorova explained that large language models are able to capture hidden patterns that people sense subconsciously but cannot formalize. Well-organized, annotated data is suited to work with neural networks and is changing the professional tasks of linguists.
The study covers four different formats for presenting the same material: the full video as a baseline, isolated audio, a text transcription, and silent video without audio. Each fragment received annotator ratings across six basic emotional categories.
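The layout described above can be sketched as a simple record type: one fragment, one modality in which it was presented to annotators, and a distribution of annotator votes over the six emotion categories. This is a minimal illustrative sketch only; the field names, emotion labels, and structure are assumptions, not the actual HSE schema.

```python
from dataclasses import dataclass, field

# Assumed set of six basic emotions (the article names joy, surprise,
# anger, and fear; the full label set is an assumption here).
BASIC_EMOTIONS = ("joy", "sadness", "anger", "fear", "surprise", "disgust")

@dataclass
class Fragment:
    """One annotated fragment in a hypothetical multimodal emotion dataset."""
    fragment_id: int
    transcript: str                 # text transcription of the clip
    modality: str                   # "video", "audio", "text", or "silent_video"
    # Emotion label -> share of annotators who chose it.
    labels: dict = field(default_factory=dict)

    def dominant_emotion(self) -> str:
        """Return the emotion with the highest annotator agreement."""
        return max(self.labels, key=self.labels.get)

# Example record (illustrative values, not real dataset content).
frag = Fragment(
    fragment_id=1,
    transcript="I can't believe this happened!",
    modality="text",
    labels={"surprise": 0.6, "joy": 0.25, "fear": 0.15},
)
print(frag.dominant_emotion())  # surprise
```

Storing a vote distribution rather than a single label keeps the inter-annotator agreement information that the study's consensus comparisons rely on.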
The experiment revealed unexpected patterns in human emotion recognition. Participants identified emotional coloration most consistently when reading the written text. Listening to audio recordings alone produced a greater scatter of opinions, and silent video showed the worst consensus.
Detailed analysis showed how different emotions manifest across communication modalities. Joy and surprise are recognized most accurately through spoken audio, thanks to intonation cues. Anger is identified from text in 72.9% of cases versus 67.4% for audio.
Fear turned out to be the most verbal emotion: it is recognized from text and audio in 87% of cases, which indicates the critical importance of verbal markers. From facial expressions alone, fear is practically unreadable, with only 3.5% successful recognition.[1]