
Sberbank ruDALL-E Multimodal neural network

Product
Base system (platform): Artificial intelligence (AI)
Developers: Sberbank, SberDevices, Cloud Technologies (SberCloud)
System premiere date: 2021/11/02
Last Release Date: 2021/12/15
Technology: Big Data

2022: Basis for the Kandinsky model

On June 14, 2022, Sberbank presented Kandinsky, a model that generates images from a text description in Russian. It is an improved version of the multimodal neural network ruDALL-E.

2021

Availability on ML Space

On December 15, 2021, Sberbank announced that the ruDALL-E neural network, which generates images from a description in Russian, had become available on the ML Space platform.

The industrial version of the ruDALL-E neural network from SberDevices and Sber AI, which creates images from a text description in Russian, is now available on the ML Space platform in SberCloud's DataHub, a hub of pre-trained models and datasets. It contains 12 billion parameters and is suitable for creating commercial materials: illustrations for advertising, architectural and industrial design, vector and stock images.

ruDALL-E is a multimodal neural network that generates original images from a given Russian-language description by modeling the joint distribution of texts and images. According to Sberbank, the ruDALL-E training project became the largest neural network computing project in Russia and the CIS. The model exists in two versions. The ultra-large ruDALL-E XL, with 1.3 billion parameters, can be used for free by downloading its code and parameters from GitHub. The giant ruDALL-E 12B XXL, with 12 billion parameters, is available in the "ruGPT-3 & family" collection of the ML Space DataHub and can create an unlimited number of new images from a given description, with a lower degree of abstraction and higher quality.

The image is created in several stages: first, one neural network receives the description as input and generates a given number of pictures; then another network selects the pictures that best match the description; finally, the selected pictures are enlarged without loss of quality.

"Two months ago, we released the ruDALL-E XL model to the public, made a demo site and a skill for the Salute assistants so that you can "play" with it, creating any pictures on demand. Now the external audience can use the XXL version of the model, which produces not only abstract images, but also any number of high-quality unique illustrations that can be used in different areas," said David Rafalovsky, CTO of Sberbank Group, Executive Vice President.

Ability to generate pictures from descriptions in foreign languages

On November 11, 2021, Sberbank Group announced that the site of the open neural network ruDALL-E, which generates images based on a text description, can now work with texts not only in Russian but also in other languages. In the Salute mobile application and on Sber devices, a picture can even be created by voice request. The demo site, where the model can be tried out, has received an English translation.

According to the company, in the week since the release of ruDALL-E, users around the world generated more than 3 million images with it, relying on various machine translation systems to form Russian-language requests. Now they can make requests in English and other languages: when text is entered, the model determines the input language on its own and generates the corresponding image.

The prototype for ruDALL-E was the DALL-E neural network for English, introduced by OpenAI in 2021. The researchers from the American company did not release the model publicly, limiting themselves to a general description of the architecture and a wide range of manually selected examples of the model's output. Based on the OpenAI publication, the SberDevices and Sber AI teams, with the assistance of SberCloud, created a similar solution and trained the neural network on the ML Space platform, which runs on the Christofari supercomputer, obtaining a similar result for the Russian language and, later, a multilingual version.

The model exists in two versions: ruDALL-E XL, with 1.3 billion parameters, and ruDALL-E XXL, with 12 billion parameters. The smaller one can be used for free by downloading it from GitHub or Hugging Face. Both models can also be found in ML Space, in SberCloud's DataHub of pre-trained models and datasets.

The large model can be used to create interior design options, stock images or vector illustrations, advertising materials, and copywriting, while the reduced version in the Salute application and on the demo site is intended mainly to entertain users and show them the capabilities of the neural network. To create an image on Sber devices or in the Salute application, it is enough to say: "Open Dally" or "Launch an artist."

"After the launch of ruDALL-E, we saw a lot of interest in the model from the audience. Therefore, we decided to create a multilingual version of the service that makes it simpler for the user to create an image. As of November 11, 2021, almost anyone around the world can use the model. You can give the neural network a task by voice, calling it in the Salute application and on Sber devices," said David Rafalovsky, Executive Vice President of Sberbank Group.

Creation of ruDALL-E

On November 2, 2021, Sberbank informed TAdviser about the creation of ruDALL-E, a neural network capable of creating images from a text description in Russian. It can be used to create interior design options, stock images or vector illustrations, advertising materials, copywriting, and architectural and industrial design.

A neural network has been created in Russia that generates pictures according to the description in Russian

The neural network simultaneously learns from two types of data - pictures and texts, and allows you to create an unlimited number of images according to a given description. There are two model options:

  • ruDALL-E XL containing 1.3 billion parameters;
  • ruDALL-E 12B with 12 billion parameters.

The ruDALL-E XL model can be used for free by downloading it from GitHub. Both models will also soon be available on the ML Space platform, in SberCloud's DataHub of pre-trained models and datasets.

Image creation with ruDALL-E takes place in three stages: first, one neural network takes text as input and generates a given number of pictures; then a second network selects the most successful pictures that best match the description; and a third enlarges them without loss of quality. In this way, an unlimited number of new images matching the given description can be obtained.
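The three stages described above (generate, select, enlarge) can be sketched as a small pipeline. This is only an illustration of the control flow, not Sberbank's code: the function names, the toy "images" (grids of integers), the brightness-based score and the nearest-neighbour enlargement are all hypothetical stand-ins for the three neural networks, which the article does not describe in detail.

```python
import random
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image: rows of pixel values (assumption)


def generate_candidates(prompt: str, count: int, size: int = 4) -> List[Image]:
    """Stage 1 stand-in: return `count` pseudo-random toy images for a prompt."""
    rng = random.Random(prompt)  # seeded by the prompt so runs are repeatable
    return [
        [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
        for _ in range(count)
    ]


def toy_score(prompt: str, image: Image) -> float:
    """Stage 2 stand-in: a made-up relevance score (mean brightness)."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def select_best(
    prompt: str,
    images: List[Image],
    keep: int,
    score: Callable[[str, Image], float] = toy_score,
) -> List[Image]:
    """Rank candidates by score, highest first, and keep the top `keep`."""
    ranked = sorted(images, key=lambda img: score(prompt, img), reverse=True)
    return ranked[:keep]


def upscale(image: Image, factor: int = 2) -> Image:
    """Stage 3 stand-in: enlarge by repeating each pixel `factor` times."""
    enlarged: Image = []
    for row in image:
        wide_row = [p for p in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def text_to_images(prompt: str, count: int = 8, keep: int = 2, factor: int = 2) -> List[Image]:
    """Run the three stages in order: generate, select, enlarge."""
    candidates = generate_candidates(prompt, count)
    best = select_best(prompt, candidates, keep)
    return [upscale(img, factor) for img in best]


if __name__ == "__main__":
    results = text_to_images("кот в шляпе", count=8, keep=2, factor=2)
    print(len(results), "images of size", len(results[0]), "x", len(results[0][0]))
```

Keeping the scoring function as a parameter reflects the article's point that selection is done by a separate network: the generator and the enlarger do not need to change when the ranking model is replaced.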

The architecture of the DALL-E model for English was first introduced by OpenAI in 2021, but the model was never made fully publicly available. Based on the OpenAI publication, the SberDevices and Sber AI teams, with the assistance of SberCloud, reproduced the code and trained the neural network on the ML Space platform, which runs on the Christofari supercomputer, obtaining a similar result for the Russian language. Training took 23 thousand GPU-hours on a dataset of 120 million text-image pairs. Sberbank noted that the ruDALL-E training project became the largest neural network computing project in Russia and the CIS.

"In addition to contributing to progress in the field of AI, image generation meets two important business needs: the ability to get a unique picture for your own description, and the ability to create the required number of licence-free illustrations at any time. At the same time, the creation of "multimodal" neural networks that are trained on several types of data at once will be in high demand even in the era of big data and huge search capabilities, since it solves problems at a fundamentally different level. The technology is still very young: the first steps in this direction were taken only in 2020, and back in 2018-2019 even the formulation of this kind of task could not be imagined. ruDALL-E can be considered a real breakthrough for the Russian-speaking industry," said David Rafalovsky, Executive Vice President of Sberbank, CTO of Sberbank, head of the Technologies block.

Links

ruDALL-E site