
Yandex: RATE (Refined Assessment for Translation Evaluation) Metric for evaluating machine translation

Product
Base system (platform): Artificial intelligence (AI)
Developers: Yandex
System release date: 2025/12/04
Technology: Data Quality

Main Article: Data Quality Management

2025: Presentation of a method for evaluating and improving machine translation

Yandex researchers have developed a new method for assessing the quality of machine translation, the company announced on December 4, 2025. The method can be used to improve models that already translate texts quite accurately but not always naturally. For example, in an informal dialogue a model may render "sorry, my bad" with a phrase as stiff as "I apologize, it's my fault" instead of a casual equivalent ("sorry, my mistake"). The user will notice that the neural network has chosen an overly formal tone, but existing translation evaluation systems ignore such errors. The new method helps draw attention to these shortcomings.

Yandex's evaluation system is called RATE (Refined Assessment for Translation Evaluation). It is not used directly to further train translation models. Instead, RATE makes it possible to assess with high accuracy exactly where modern models go wrong and what needs to be improved so that their translations become more accurate and natural for the user.

Unlike other metrics, RATE evaluates a translation against the three criteria that matter most to the user: accuracy in conveying meaning, naturalness of language, and consistency with the style of the original. This makes the method applicable to any type of text. For example, it can be used to check the accuracy of facts in news, to reveal excessive formality in social media posts, and to evaluate style and fluency in literary texts. RATE not only flags the error itself but also assesses its severity - from minor inaccuracies to severe distortions.
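The three criteria and the severity scale described above can be pictured as a simple annotation record. The sketch below is purely illustrative and is not Yandex's actual schema: the names `Criterion`, `Severity`, `ErrorAnnotation` and `summarize` are hypothetical, and the number of severity levels is an assumption, since the announcement only says that the scale runs from minor inaccuracies to severe distortions.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Criterion(Enum):
    # The three user-facing criteria named in the announcement.
    ACCURACY = "accuracy"        # is the meaning conveyed correctly?
    NATURALNESS = "naturalness"  # does the text read like natural language?
    STYLE = "style"              # does it match the style of the original?


class Severity(Enum):
    # Assumption: three levels; the source does not state how many exist.
    MINOR = 1
    MAJOR = 2
    SEVERE = 3


@dataclass(frozen=True)
class ErrorAnnotation:
    start: int            # character offset where the flagged span begins
    end: int              # character offset just past the flagged span
    criterion: Criterion
    severity: Severity


def summarize(annotations):
    """Count annotated errors per (criterion, severity) pair."""
    return Counter((a.criterion, a.severity) for a in annotations)


# An overly formal rendering of "sorry, my bad" flagged as a style error.
translation = "I apologize, it's my fault"
annotations = [
    ErrorAnnotation(0, len(translation), Criterion.STYLE, Severity.MAJOR),
]
summary = summarize(annotations)
```

Keeping the criterion and the severity as separate fields means the same record can describe a factual slip in a news item and a tone mismatch in a chat message, which is what would let a single scheme cover different types of text.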

According to Yandex, a comparison on data from the international WMT competition showed that RATE detects seven times more errors than other evaluation methods - MQM (Multidimensional Quality Metrics) and ESA (Error Span Annotation). The results of the experiment were evaluated by highly qualified AI trainers. According to the company, the comparison shows that other metrics miss many shortcomings in neural network translations that users notice.

The experiment showed that machine translation models have made significant progress in accuracy. At the same time, human translation remains the standard for naturalness and fluency, although Yandex's large language model has already come close to this level, ahead of models such as Claude-3.5 and GPT-4.

"When we worked with other methods, we lacked detail. MQM is too complex, and ESA only notices gross errors. These metrics help verify accuracy but do not let you assess how natural a translation is. Yet naturalness has now become the main criterion in how users perceive a translation. RATE makes it possible to assess both the accuracy and the naturalness of a translation, gives a more complete picture of its quality, and can tell developers how to improve the model to get better translations," said Ekaterina Yenikeeva, head of the translation quality assessment team at Yandex.

Yandex already uses RATE to improve its models, adapting their translations to different scenarios - from business correspondence to informal communication. RATE also helps Yandex build algorithms oriented toward natural human speech rather than formal criteria alone.