Base system (platform): | Artificial intelligence (AI) |
Developers: | |
Technology: | Speech technologies |
2020: Facebook open-sources M2M-100, a text translation system
In mid-October 2020, Facebook open-sourced the first machine learning system that translates text from one language to another without relying on English as an intermediate step. M2M-100 became the first multilingual machine translation model that can translate directly between any pair of 100 languages.
Until then, multilingual machine translation models had relied on English as an intermediary because English training data is widely available. Such models handle the task reasonably well in most cases, but translations of more complex phrases are often inexact. Facebook states that M2M-100 preserves meaning better because it translates directly instead of passing through English.
Translating between so many language pairs is difficult because models need access to a large volume of high-quality training data. Facebook AI researcher Angela Fan explained that her team created a huge data set containing more than 7.5 billion sentences in 100 languages.
This data was collected using open-source data mining tools such as ccAligned, ccMatrix and LASER, and was then divided into 14 language groups based on parameters such as linguistic classification, geography and cultural similarity. Within each of these 14 groups, Facebook identified one to three "bridge languages" that form the basis for translation into the other languages of the group. Fan's team also used a technique known as "back-translation" to create synthetic data that supplements the parallel data already mined.
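The grouping idea can be illustrated with a small sketch. It is not Facebook's code. It assumes, as a simplification the description above does not spell out, that every pair of languages inside a group is mined and that bridge languages are paired with each other across groups. The group names, language assignments and the function `mining_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mining_pairs(groups, bridges):
    """Return the set of unordered language pairs selected for mining.

    groups:  dict mapping a group name to its list of language codes
    bridges: dict mapping a group name to the bridge languages chosen for it
    (Both structures are hypothetical, not taken from the M2M-100 release.)
    """
    pairs = set()
    # Assumption 1: every pair of languages inside the same group is mined.
    for langs in groups.values():
        pairs.update(frozenset(p) for p in combinations(langs, 2))
    # Assumption 2: bridge languages are paired with each other,
    # which connects the groups.
    all_bridges = [b for group_bridges in bridges.values() for b in group_bridges]
    pairs.update(frozenset(p) for p in combinations(all_bridges, 2))
    return pairs


# Toy example with three groups instead of 14 (illustrative values only).
groups = {
    "germanic": ["de", "nl", "sv"],
    "romance": ["fr", "es", "it"],
    "indic": ["hi", "bn", "ta"],
}
bridges = {"germanic": ["de"], "romance": ["fr", "es"], "indic": ["hi"]}

pairs = mining_pairs(groups, bridges)
all_languages = [lang for langs in groups.values() for lang in langs]
all_pairs = len(list(combinations(all_languages, 2)))

print(f"selected pairs: {len(pairs)} of {all_pairs} possible")
```

In this toy setup 14 of the 36 possible pairs are selected. A pair such as German and Hindi is covered because both are bridge languages, while Dutch and Tamil are not mined directly. With 100 languages there are 4,950 unordered pairs, which is why restricting the mined pairs matters.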
Facebook intends to replace all of its existing models with M2M-100 in order to improve translation quality in its applications.[1][2]
Notes
- ↑ [1] Facebook open-sources its M2M-100 multilingual model to improve translation accuracy
- ↑ [2] M2M-100 on GitHub