
MERA (Multimodal Evaluation for Russian-language Architectures): a benchmark for evaluating foundation models

Product
Developers: AI Alliance Russia
Release date: November 22, 2023

2023: Benchmark announcement

On November 22, 2023, Sberbank proposed to the AI Alliance the concept of MERA, an independent benchmark for evaluating foundation models[1]. It is intended to help evaluate large language models objectively, which is especially relevant as their number and capabilities grow.

Models are often tested with benchmarks: sets of complex tasks whose solutions reflect a model's abilities across various domains, tasks, or modalities. Such an assessment helps users understand how to apply a model and gives researchers objective information for further training, adapting, or developing language models.

The language models behind services such as GigaChat and ChatGPT need objective comparison and independent evaluation. The team of SberDevices, a Sberbank company, developed a testing methodology in collaboration with partners from the academic community. It includes 21 tasks in instruction format for testing foundation models. The tests cover knowledge about the world, logic, cause-and-effect relationships, AI ethics, model memory, and much more. Teams from Sber AI, Skoltech AI, and HSE took part in creating the tests. The result is the open MERA benchmark (Multimodal Evaluation for Russian-language Architectures): a single independent leaderboard with fixed, expert-verified tasks and standardized configurations of prompts and parameters.
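To make the "fixed tasks and standardized prompts" idea concrete, below is a minimal sketch of how such an evaluation loop could work. This is not MERA's actual harness: the task data, function names, and exact-match metric are all illustrative assumptions. The point is only that every model answers the same instruction-format prompts and is scored by the same metric, which is what makes a shared leaderboard meaningful.

```python
# Hypothetical sketch of a fixed-configuration benchmark loop (not MERA's code).
from typing import Callable, Dict, List, Tuple

# Each task maps to (instruction-format prompt, reference answer) pairs.
# The two toy examples below are illustrative, not real MERA tasks.
TASKS: Dict[str, List[Tuple[str, str]]] = {
    "world_knowledge": [
        ("Question: What is the capital of Russia?\nAnswer:", "Moscow"),
    ],
    "logic": [
        ("If all A are B and x is an A, is x a B? Answer yes or no.", "yes"),
    ],
}

def exact_match(prediction: str, reference: str) -> bool:
    """Simplest possible metric: normalized string equality."""
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(model: Callable[[str], str]) -> Dict[str, float]:
    """Run a model over every task with the same fixed prompts and metric,
    so scores for different models are directly comparable."""
    scores: Dict[str, float] = {}
    for name, examples in TASKS.items():
        correct = sum(
            exact_match(model(prompt), answer) for prompt, answer in examples
        )
        scores[name] = correct / len(examples)
    return scores

if __name__ == "__main__":
    # Trivial stand-in "model" that always answers "Moscow":
    # it scores 1.0 on world_knowledge and 0.0 on logic.
    print(evaluate(lambda prompt: "Moscow"))
```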

{{quote|Every day the boundaries of artificial intelligence use are expanding. It is more important than ever for us to have an up-to-date picture of the real abilities of language models. This set of tests is an opportunity for the industry and the academic community to explore the abilities of foundation models, evaluate them objectively, and develop collaborations both within the Russian Federation and internationally. We invite other companies, including members of the AI Alliance, to join the discussion of the methodology and to establish generally accepted industry standards.|author=Sergey Markov, Head of the Department of Experimental Machine Learning Systems of Sberbank's Salyut common services division}}

Notes