
AIRI: OmniFusion Multimodal Language Model

Product
Base system (platform): Artificial intelligence (AI)
Developers: Artificial Intelligence Research Institute (AIRI)
Release date: 2024/04/17
Technology: Speech technology

2024: Introduction of OmniFusion 1.1

In April 2024, the Artificial Intelligence Research Institute (AIRI) presented OmniFusion 1.1, an open version of its OmniFusion model. The accompanying technical report has already reached the top of the trending Daily Papers section on Hugging Face. As of April 2024, the training code and model weights are openly available and may be used, among other things, to develop commercial products.

OmniFusion is an advanced multimodal AI model designed to extend the capabilities of traditional language processing systems by integrating additional data modalities: images today and, in the future, audio, 3D and video content.

As of mid-April 2024, the model can recognize and describe images. For example, it can explain what is shown in a photo, suggest a recipe from a photo of the ingredients, analyze a map of a room, or explain how to assemble a device from a photo of its individual parts. The model can also recognize text and solve problems.

As of April 10, 2024, the model can analyze a medical image and point out a possible problem in it. However, before such a model could help make diagnoses, it would have to be further trained on specialized datasets with the involvement of medical experts. Each such expert should be a practicing professor of medicine or surgery with board certification and impeccable credentials. If the model were instead trained on information gathered from search engines, the results could be harmful to its future users.

The model's architecture combines a pretrained large language model with its "eyes": visual encoders that encode the information in an image into a numerical vector called an embedding. OmniFusion is trained by the FusionBrain research group at AIRI with the participation of researchers from Sber AI and SberDevices[1][2].
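The general idea of pairing a language model with visual encoders can be sketched in plain Python. This is a minimal illustration under stated assumptions, not OmniFusion's actual code: the toy `visual_encoder`, the linear `Adapter`, the random embedding table and all dimensions are hypothetical stand-ins. It also assumes, as in common LLaVA-style designs, that image embeddings are projected into the language model's embedding space and prepended to the text token embeddings; the text above does not say how OmniFusion arranges them.

```python
import random
from typing import List

Vector = List[float]

# Hypothetical sizes, chosen for illustration only.
ENC_DIM = 4    # width of the visual encoder's patch embeddings
LLM_DIM = 6    # width of the language model's token embeddings
VOCAB = 10     # toy vocabulary size


def visual_encoder(image: List[List[float]], patch: int = 2) -> List[Vector]:
    """Toy stand-in for a pretrained visual encoder: one embedding per patch.

    Each non-overlapping patch x patch block of a grayscale image is
    summarised by four statistics (mean, min, max, range).
    """
    embeds = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            vals = [image[i][j]
                    for i in range(r, min(r + patch, len(image)))
                    for j in range(c, min(c + patch, len(image[0])))]
            lo, hi = min(vals), max(vals)
            embeds.append([sum(vals) / len(vals), lo, hi, hi - lo])
    return embeds


class Adapter:
    """Hypothetical linear projection from ENC_DIM to LLM_DIM.

    In a trained system these weights would be learned; here they are random.
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.weight = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        # y = W x + b, computed row by row
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def build_embedding_table(vocab: int, dim: int, seed: int = 1) -> List[Vector]:
    """Stand-in for the token-embedding table of a pretrained LLM."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]


def fuse(image: List[List[float]], token_ids: List[int],
         adapter: Adapter, table: List[Vector]) -> List[Vector]:
    """Project image embeddings into the LLM space and prepend them to the text."""
    image_tokens = [adapter(e) for e in visual_encoder(image)]
    text_tokens = [table[t] for t in token_ids]
    return image_tokens + text_tokens  # the sequence the LLM would consume


# Usage: a 4x4 image yields 4 patch embeddings, followed by 3 text tokens.
image = [[0.0, 0.1, 0.9, 1.0],
         [0.2, 0.3, 0.8, 0.7],
         [0.5, 0.5, 0.0, 0.1],
         [0.4, 0.6, 0.2, 0.3]]
adapter = Adapter(ENC_DIM, LLM_DIM)
table = build_embedding_table(VOCAB, LLM_DIM)
sequence = fuse(image, [3, 1, 4], adapter, table)
print(len(sequence), len(sequence[0]))  # prints "7 6"
```

In a real system the encoder and the language model would be large pretrained networks. The adapter is the small piece that maps the encoder's output into the language model's embedding space, which is what lets the two be combined without training either from scratch.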