The name of the base system (platform): | Artificial intelligence (AI) |
Developers: | Moscow Institute of Physics and Technology (MIPT), Institute of Artificial Intelligence (AIRI), Russian Academy of Sciences (RAS) |
Date of the premiere of the system: | 2024/02/16 |
Technology: | Robotics |
Main article: Robots (robotics)
2024: Method announced for training robotic systems to determine an optimal sequence of actions
Scientists from the Moscow Institute of Physics and Technology (MIPT), the Institute of Artificial Intelligence AIRI, and the Federal Research Center "Informatics and Management" of the Russian Academy of Sciences have developed a method for controlling a robotic system that acts on the basis of text instructions and visual information. The work was published in the journal IEEE Access. This was announced on February 16, 2024 by representatives of MIPT.
As reported, the robotic system was able to navigate an unfamiliar environment and independently determine the sequence of actions optimal for solving the task at hand. The scientists believe that further development of the technique will make it possible to create robots that autonomously perform complex multi-step operations without human participation. According to them, this is a non-trivial task that no one in the world has yet solved: all developments in this area are still at the prototype level.
As a model, we used a robotic arm with six degrees of freedom. Our goal was to teach it to sort cubes by color on its own and collect them into a given area. The arm had to plan its actions on the basis of text instructions and data from video cameras, explained Alexey Staroverov, one of the authors of the study and a graduate student at the MIPT Center for Cognitive Modeling |
According to him, the training algorithm for the manipulator works much like a GPT model. But unlike a "smart chatbot," where the user issues a command and receives generated text, here the model produces a sequence of actions. At the same time, as the scientist noted, it is important for the computer controlling the manipulator to receive feedback from the video cameras after each action, so that it can plan its next action on the basis of the information received.
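The closed loop described above (generate one action, execute it, re-observe the scene, then plan the next action) can be sketched as follows. This is an illustrative outline only, not the authors' implementation; the policy and camera are stubs, and all names and the toy cube-sorting world are hypothetical.

```python
# Sketch of an observe -> plan -> act feedback loop, as described in the
# article: the model emits one action at a time, and fresh camera input is
# consulted before each new action. All names here are illustrative.

def stub_policy(instruction, observation):
    """Stand-in for the multimodal language model: maps the text
    instruction plus the latest observation to a single action."""
    for cube in observation["cubes"]:
        if cube["color"] == instruction["color"] and not cube["in_target"]:
            return ("move_to_target", cube["id"])
    return ("done", None)

def stub_camera(world):
    """Stand-in for the video cameras: reports the current scene state."""
    return {"cubes": world}

def control_loop(instruction, world, max_steps=10):
    """Repeatedly observe, plan one action, execute it, then observe again."""
    actions = []
    for _ in range(max_steps):
        observation = stub_camera(world)          # feedback after each action
        action, cube_id = stub_policy(instruction, observation)
        if action == "done":
            break
        for cube in world:                        # "execute" in the toy world
            if cube["id"] == cube_id:
                cube["in_target"] = True
        actions.append((action, cube_id))
    return actions

world = [
    {"id": 0, "color": "red", "in_target": False},
    {"id": 1, "color": "blue", "in_target": False},
    {"id": 2, "color": "red", "in_target": False},
]
plan = control_loop({"color": "red"}, world)
# plan now holds the action sequence for moving the two red cubes.
```

The key design point mirrored here is that planning is interleaved with perception rather than computed once up front, which is what distinguishes this setup from a chatbot that emits its whole answer in one pass.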
The novelty of the work is that we used ready-made language models to train the robot: algorithms that help translate natural speech into code understandable to the control system. These are neural networks trained on large amounts of text data. In our case, the multimodal RozumForme model was applied. Unlike others, it can generate an answer both to text requests and to requests made in the form of images, said Alexey Kovalev, co-author of the work and junior researcher at the Federal Research Center "Informatics and Management" of the Russian Academy of Sciences |
He explained that the work involved fine-tuning the language model: the scientists further trained the neural network so that it could "understand" the colors of the cubes, the distances to them, and other parameters of the surrounding scene. The tuning was carried out in a virtual environment, after which the adapted language model was used to control the manipulator in a real one. This step-by-step adaptation made it possible to adjust the language model so that, receiving feedback from the video cameras, it could independently plan further actions on the basis of what it had learned and solve the tasks assigned to it.
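The two-stage scheme described above (a pretrained model is further trained on task-specific data gathered in simulation, then deployed) can be illustrated with a deliberately minimal sketch. The real work fine-tunes a large multimodal language model; here a single weight and a toy "distance" dataset stand in for it, and every name and number is hypothetical.

```python
# Minimal illustration of fine-tuning: take parameters learned elsewhere
# and continue training them on domain data (here, simulated distance
# measurements following y = 2 * x). Purely a toy stand-in for the
# article's fine-tuning of a multimodal language model.

def fine_tune(weight, data, lr=0.1, epochs=200):
    """Plain gradient descent on squared error for the model y = weight * x."""
    for _ in range(epochs):
        for x, y in data:
            pred = weight * x
            grad = 2 * (pred - y) * x   # d/dw of (w*x - y)^2
            weight -= lr * grad
    return weight

# Stage 1: a "pretrained" starting point from generic data.
pretrained = 0.5
# Stage 2: adapt it on task-specific data collected in simulation.
sim_data = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0)]
tuned = fine_tune(pretrained, sim_data)
# tuned converges toward 2.0, the slope underlying the simulated data.
```

The structural point is the same as in the article: training does not start from scratch, and the expensive adaptation happens on cheap simulated data before the model touches the real manipulator.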
Robotics inherently implies a multimodal approach to information processing. That is, machine intelligence needs to take into account and synchronize, for example, frames from video cameras with data from lidars (devices for measuring distances). This is commonly referred to as information integration. Such tasks are solved by various methods, but the use of language models for this purpose has demonstrated the promise of the approach, commented Alexander Panov, group leader and leading researcher at the Institute of Artificial Intelligence AIRI and the Federal Research Center "Informatics and Management," on the significance of the study |
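A concrete instance of the "information integration" problem mentioned in the quote is that camera frames and lidar scans arrive on separate clocks and must be synchronized before any model can fuse them. The sketch below (not from the article; all names and timestamps are invented) pairs each frame with the nearest-in-time scan, discarding pairs whose clock skew is too large.

```python
# Illustrative time-based synchronization of two sensor streams:
# each (timestamp, frame) from the camera is matched to the lidar
# scan closest in time, within a maximum allowed skew.

def sync_streams(frames, scans, max_skew=0.05):
    """Pair each camera frame with the nearest lidar scan within max_skew seconds."""
    pairs = []
    for t_frame, frame in frames:
        t_scan, scan = min(scans, key=lambda s: abs(s[0] - t_frame))
        if abs(t_scan - t_frame) <= max_skew:
            pairs.append((frame, scan))
    return pairs

frames = [(0.00, "f0"), (0.10, "f1"), (0.20, "f2")]
scans = [(0.01, "s0"), (0.12, "s1"), (0.35, "s2")]
pairs = sync_streams(frames, scans)
# "f2" finds no scan within 50 ms of its timestamp, so it is dropped.
```

Aligning modalities like this is a precondition for the fusion step itself, whatever method (language model or otherwise) is then used to combine the aligned data.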
According to the scientists, the further goal of the work is to teach the model to remember longer sequences of actions. In the future, this will help robots perform tasks that require a non-standard approach and an assessment of the situation: for example, washing dishes, where objects must be distinguished and handled carefully, or tidying up an apartment, where different rooms and objects must be recognized and sorted by purpose.