Developers: | EGO Translating |
System launch date: | May 2019 |
Technology: | Office applications |
The project "Terminological EGOTech Portal as an Instrument for Normalizing a Text Corpus (Dataset) for Machine Learning in the Field of Natural Language Processing" was included in the Skolkovo Foundation's portfolio of innovative developments in spring 2019. The EGO Translating group of companies is the project's developer. The development area is "Strategic computer technologies and software".
2019: EGOTech project description
The essence of the proposed EgoTech Terminal technology (working project title) is to create a tool for processing, normalizing and analyzing text data for machine learning.
Artificial intelligence is applied mainly in business intelligence, in building computer vision systems, and in health care and natural language processing systems. Machine translation is also considered part of natural language processing (NLP). Neural networks, a basis of artificial intelligence, have to be trained. Training a neural network is a process in which the network's parameters are adjusted by modeling the environment in which the network is embedded. Machine learning of neural networks requires clean data sets, because networks are very sensitive to data quality. The process of cleaning the data is called "normalization". Different normalization criteria can be applied for each specific purpose. The main ones are: handling of non-linguistic elements, consistent use of terminology, deduplication, markup, conversion of data to tabular form, and so on. These steps are performed, among other means, with tools for analyzing and processing linguistic information.
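Three of the normalization criteria listed above (handling of non-linguistic elements, consistent terminology, deduplication) can be illustrated with a minimal Python sketch. It is not the EgoTech implementation: the glossary entries, the regular expressions and the function names are assumptions chosen for illustration, and markup and tabular conversion are left out.

```python
import html
import re
import unicodedata

# Hypothetical glossary: variant term -> preferred term (illustrative only).
GLOSSARY = {
    "neuronet": "neural network",
    "data scrubbing": "data cleaning",
}

TAG_RE = re.compile(r"<[^>]+>")      # leftover HTML/XML tags
URL_RE = re.compile(r"https?://\S+")  # bare URLs
SPACE_RE = re.compile(r"\s+")         # runs of whitespace


def strip_non_language(text):
    """Remove non-linguistic elements: entities, tags, URLs, odd spacing."""
    text = html.unescape(text)                  # "&nbsp;" -> U+00A0, etc.
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = unicodedata.normalize("NFKC", text)  # unify compatibility characters
    return SPACE_RE.sub(" ", text).strip()


def unify_terms(text, glossary):
    """Replace each glossary variant with its preferred term (whole words)."""
    for variant, preferred in glossary.items():
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        text = pattern.sub(preferred, text)
    return text


def normalize_corpus(segments, glossary=GLOSSARY):
    """Clean every segment, then drop empty ones and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in segments:
        text = unify_terms(strip_non_language(raw), glossary)
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned
```

For example, passing `"<p>Training a  neuronet&nbsp;needs clean data.</p>"` and `"Training a neural network needs clean data."` to `normalize_corpus` yields a single segment, because both normalize to the same text. Exact-match deduplication is the simplest choice here; a production pipeline would likely also need near-duplicate detection.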
The EgoTech Terminal tool is used for collecting text data, analyzing it, processing it (in particular, building a thematic corpus) and normalizing it for training neural networks, including machine translation systems. Using this tool, the user receives:
- access to the collected and cleaned text data;
- access to tools for creating and processing a thematic corpus (domain-adaptive dataset) for training machine translation systems;
- access to tools for processing, normalizing and analyzing text data.
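One simple way to form a thematic corpus (domain-adaptive dataset) from already cleaned parallel data is to keep only the segment pairs whose source side mentions terms from a domain term list. The sketch below is an assumption about how such a step could look, not a description of the portal's tools; `build_thematic_corpus`, `domain_terms` and `min_hits` are hypothetical names.

```python
def build_thematic_corpus(pairs, domain_terms, min_hits=1):
    """Keep (source, target) pairs whose source mentions enough domain terms.

    Matching is a plain case-insensitive substring test, so "valve" also
    matches "valves"; this is an illustrative simplification.
    """
    terms = [term.casefold() for term in domain_terms]
    selected = []
    for source, target in pairs:
        lowered = source.casefold()
        hits = sum(1 for term in terms if term in lowered)
        if hits >= min_hits:
            selected.append((source, target))
    return selected
```

Raising `min_hits` makes the selection stricter and the resulting corpus smaller but more focused on the domain.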
The potential customer's main need is to get high-quality, fast translation of large volumes of text at the lowest cost. With this tool, the client gets high-quality, fast industry-specific translation thanks to the cleaned data, along with access to the collected and cleaned text corpus and to tools for creating and editing thematic databases (domain-adaptive datasets).
Although data cleaning (including for machine translation) is a technology in demand in many industries, there are as yet no ready-made, market-proven analogs. For this reason the creators of the product expect to occupy a niche in the artificial intelligence market.
The EGOTech portal is designed for maximum openness and convenience, both for users and for developers of machine translation systems. The plan is to sell subscriptions under the SaaS model and to provide services for training machine translation systems to customer requirements. Demand for such products and services is expected to grow as the market for computer-assisted translation of thematic material grows. Active promotion of the project on the Russian market is to begin in 2020, and from 2023 steps will be taken to enter the international artificial intelligence market.