Developers: AI-Russia Alliance
Initial release date: 2023/11/22
Last release date: 2024/09/25
2024: API Support
The Alliance in the Field of Artificial Intelligence has presented an updated version of the MERA benchmark: it includes a dynamic leaderboard, an updated measurement codebase, a more advanced prompt system, improved datasets, API support, and measurements of dozens of new models, including those created by OpenAI. The Alliance announced this on September 25, 2024.
The updated version of the benchmark includes 15 main tasks, from which the rating is built, and 8 open public datasets.
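As a rough illustration of how a leaderboard rating can be built from per-task results, the sketch below averages hypothetical per-task scores into a single number. The task names, the 0–1 score scale, and the plain-average aggregation are assumptions for illustration only, not MERA's actual scoring methodology.

```python
# Illustrative sketch: aggregating hypothetical per-task scores into one
# leaderboard rating. Task names, scores, and the simple averaging scheme
# are assumptions, not MERA's actual methodology.
from statistics import mean

# Hypothetical per-task scores for one submitted model (0.0-1.0 scale assumed).
task_scores = {
    "world_knowledge": 0.71,
    "logic": 0.64,
    "causal_reasoning": 0.58,
    "ai_ethics": 0.80,
    # ...the remaining main tasks would be listed here
}

def overall_rating(scores: dict[str, float]) -> float:
    """Average the per-task scores into a single leaderboard number."""
    return mean(scores.values())

print(f"Overall rating: {overall_rating(task_scores):.3f}")
```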
Since the release of the first version of the benchmark, dozens of model developers have used it, sending over 1,000 submissions. The improvement of MERA was made possible by user comments and feedback from members of the NLP community. Further development of MERA envisions adding tasks for evaluating image, audio, and video recognition.
2023: Benchmark announcement
On November 22, 2023, Sberbank proposed to the Alliance in the field of AI the concept of MERA, an independent benchmark for evaluating foundation models[1]. It is intended to help objectively evaluate large language models, which is especially relevant as their number and capabilities grow.
Models are often tested with benchmarks: sets of complex tasks whose solutions reflect the models' abilities across various domains, tasks, or modalities. Through this assessment, users can understand how to apply a model, and researchers can gain objective information for further training, adapting, or developing language models.
The language models behind services such as GigaChat, ChatGPT, and others need objective comparison and independent evaluation. The team of SberDevices, Sberbank's partner company, in collaboration with partners from the academic community, developed a testing methodology. It includes 21 tasks in instruction format for testing foundation models. The tests cover knowledge about the world, logic, causal relationships, AI ethics, model memory, and much more. Teams from Sber AI, Skoltech AI, and HSE participated in creating the tests. This is how the open MERA benchmark (Multimodal Evaluation for Russian-language Architectures) appeared: the concept of a single independent leaderboard with fixed, expert-verified tasks and standardized prompt and parameter configurations.
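To make the idea of instruction-format tasks with standardized prompt and parameter configurations concrete, here is a minimal sketch of what such a task record and its fixed generation settings might look like. The field names, the example question, and the parameter values are hypothetical, chosen for illustration rather than taken from the actual MERA format.

```python
# Illustrative sketch only: a hypothetical instruction-format task record and
# standardized generation parameters. Field names, the example task, and the
# values are assumptions, not the actual MERA data format.

# A fixed task item: an instruction template, its inputs, and the expected answer.
# The example is in Russian because the benchmark targets Russian-language models.
task_item = {
    "instruction": "Прочитайте вопрос и выберите правильный вариант ответа: {question}\nВарианты: {options}",
    "inputs": {
        "question": "Какой газ преобладает в атмосфере Земли?",
        "options": "A) кислород; B) азот; C) аргон",
    },
    "outputs": "B",
}

# Standardized generation parameters shared by all submissions,
# so that leaderboard results remain comparable across models.
generation_config = {
    "temperature": 0.0,
    "max_new_tokens": 16,
    "num_few_shot": 0,
}

# Render the final prompt sent to the model under evaluation.
prompt = task_item["instruction"].format(**task_item["inputs"])
print(prompt)
```

Fixing the prompt template and generation parameters for every submission is what keeps the comparison objective: differences in scores then reflect the models themselves rather than differences in prompting.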
{{quote|author=Sergey Markov, Head of the Department of Experimental Machine Learning Systems of the Salyut General Services Division of Sberbank|Every day, the boundaries of artificial intelligence use are expanding. It is more important than ever for us to have an up-to-date picture of the real abilities of language models. A set of tests is an opportunity for the industry and the academic community to explore the abilities of foundation models, objectively evaluate them, and develop collaborations both within the Russian Federation and in the international arena. We invite other companies, including members of the Alliance in the field of AI, to join the discussion of the methodology and establish generally accepted industry standards.}}