
Sberbank Kandinsky - a neural network for generating images from text descriptions

Product
Base system (platform): Sberbank ruDALL-E multimodal neural network
Developer: Sberbank
Premiere date: 2022/06/14
Last release date: 2025/11/20
Technology: Big Data


2025

Kandinsky 5.0 with two models: Image Lite and Video Pro

Sberbank on November 20, 2025 introduced Kandinsky 5.0, a line of next-generation image and video generation models.

The new models expand opportunities for creativity, both in professional work and in personal projects. Users can easily create personalized video greetings, animate photos, or invent original visual stories. For professionals - directors, designers, marketers, animation artists - Kandinsky 5.0 will become a powerful tool for producing promotional materials and commercial video content.

{{quote 'author=said Andrey Belevtsev, Senior Vice President, Head of the Technological Development Unit of Sberbank. | We have significantly improved Kandinsky. The team substantially improved the key parameters - the quality and speed of video generation. Now any user can easily bring their artistic ideas to life both in video and in images. As always, we paid special attention to training the model on a high-quality national dataset. Thanks to this, Kandinsky accurately understands requests in Russian and creates content that matches the cultural context and expectations of users in our country. At the same time, all models of the new line are published in the public domain. This gives engineers and researchers the opportunity to use them in their own developments and stimulates the growth of an open ecosystem of domestic generative technologies.}}

Two models of the Kandinsky 5.0 line have become available to users: Image Lite, a universal model for generating HD images, and Video Pro, a powerful model that generates 5-second SD videos from a text prompt or a starting frame. Both models are well grounded in the Russian cultural context, understand requests in Russian and English equally well, and can render text in both Cyrillic and Latin scripts.

Special focus during model training was placed on the aesthetics and quality of generation and on the expressiveness and artistry of the visual content the models create. To this end, the final stages of training used a dataset of high-quality images and videos selected by a large team of designers, artists, and art directors. The experts carefully chose materials with impeccable composition, style, and visual quality. Thanks to this, Kandinsky 5.0 creates not only detailed and accurate visual materials but also truly expressive, artistic content.

The Kandinsky 5.0 Image Lite and Video Pro models are already available on all GigaChat surfaces.

Sber opens up flagship AI models GigaChat and Kandinsky

On November 19, 2025, the president and chairman of the board of Sberbank, German Gref, announced the release of the bank's flagship artificial intelligence (AI) models into the public domain. These include: GigaChat 3 Ultra Preview, GigaChat Lightning, the next generation of GigaAM speech recognition models, and the Kandinsky 5.0 image and video generation models. Read more here.

Integration with Platform V Product 360

The Russian software developer Sber Tech has integrated the Kandinsky neural network into Platform V Product 360, its product information management system. Users of the solution can now create high-quality images for product catalogs directly in its interface using the built-in AI tool. Sber Tech CEO Maxim Tyatyushev announced this during the session "Game Ahead. Digitalization as a Competitive Asset" at the TsIPR-2025 conference in June 2025. Read more here.

Kandinsky 4.1 Image

On June 5, 2025, Sberbank presented the next version of its image generation model, Kandinsky 4.1 Image - it now creates even better and more detailed images while following users' text instructions more closely.

As reported, the MALVINA AI editor (Multimodal Artificial Language VIsion Neural Assistant) has become available to neuro-artists, allowing them to edit an image by giving text instructions. Remove unnecessary objects or text from a picture, change a person's hair color or age, restore and colorize an old photo, replace the background, turn summer into winter - MALVINA helps realize any creative idea of a neuro-artist, opening new horizons for co-creation between humans and generative AI. Moreover, unlike other photo-editing models, MALVINA tries to preserve the geometry of the original image: the changes affect only the relevant parts of the picture. Thus, even when major changes are made to a scene, the features of people and objects not affected by the edit are preserved. The launch of these models expands the capabilities of Sberbank's GigaChat generative system for creating and editing visual content.

Changing the Time of Day with MALVINA

This version of the Kandinsky image generation model is based on an updated architecture - it is now a diffusion transformer (DiT), which makes it possible to apply established practices for training large transformer models. The architecture scales efficiently, which in practice means improved overall quality and speed for the flagship model.

In addition to training on a large dataset of image-text pairs, the Kandinsky 4.1 Image model was further trained on manually selected images balanced across 9 meta-categories (people, technology, nature, and others). This high-quality data was selected by a team of more than 100 specialists - photographers, artists, and designers who not only have a professional art education but also passed tests for an in-depth understanding of the aesthetic and visual aspects of photography.

As a result of this fine-tuning, the aesthetics and correctness of the generated images improved significantly across all domains. The quality of generating textures and complex objects, such as various kinds of technology, also improved. Kandinsky 4.1 Image follows complex text instructions more precisely - for example, it understands "spatial" queries such as "right/left" and "above/below." If you ask the model to draw "a man in a white cap and a striped jacket sitting on a green chair to the right of a tall birch, in the style of Van Gogh," the neural network will take all the details into account.

MALVINA

MALVINA changed the background and added a basket and glasses

The model understands the styles of famous artists - Aivazovsky, Bosch, Cranach, Kandinsky, and others. In addition, the user can create images in arbitrary artistic styles, from impressionism to pop art, or generate images in the style of the famous animation studio Studio Ghibli, for example.

Kandinsky 4.1 Image handles the Russian cultural code better: the model convincingly generates matryoshkas, bogatyrs, and samovars, knows the heroes of Russian folk tales and films, and can portray various dishes of national cuisines. The model has also become even better at images in the spirit of Russian folk painting, such as Gzhel and Khokhloma - now users can experiment with these styles.

The integration of the GigaChat Telegram bot with the MALVINA AI editor lets any user edit any image with simple commands. In a couple of clicks you can change a picture's background ("make mountains instead of sky"), color ("make the dragon green"), or appearance ("add glasses," "make the hair red"). You can also remove and add objects ("replace the cherries with sweets") and correct defects ("remove the scratches").

The model does not simply work with the original image files - when changing them, it tries to preserve all the important visual characteristics (shapes, faces, background), down to the smallest details and textures of the original frame.

The neural network was trained on a large data array. At the preliminary training stage, the researchers processed more than 10 million examples, and for the further training stage (SFT phase), more than 1.5 million various images were used - both real photos with manual processing and synthetic data generated by special models.

File:Aquote1.png
The updated image editor in GigaChat is a simple and convenient intelligent assistant for realizing any creative idea. It works with pixel accuracy, preserving as much of the original detail as possible, while still allowing you to change the background, objects, and even the style of a photo. Users no longer need to spend hours in graphic editors - it is enough to master a few text commands. We specifically trained the updated Kandinsky model on diverse yet verified and annotated data so that the neural network can work with various scenes: from portraits to landscapes. Integration with GigaChat turns our language model into a universal tool for creativity and work that anyone can master.

Andrey Belevtsev, Senior Vice President, Head of the Technological Development Unit of Sberbank
File:Aquote2.png

2024

Kandinsky 3.1 availability to all users

The capabilities of the Kandinsky 3.1 neural network have become available to all users.

The updated version is further trained on a dataset of aesthetic images, which made it possible to improve the quality of picture generation. This was announced on April 22, 2024, by the First Deputy Chairman of the Board of Sberbank, Alexander Vedyakhin.

Template:Quote 'author=said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.

The addition of a query-enhancement (beautification) feature simplifies image creation. There is no longer any need to be a professional prompt engineer - this function composes a detailed prompt for the user: it is enough to write just a few words describing the desired image, and the rest is done by the GigaChat Pro language model built into the updated version of the neural network, which expands and enriches the prompt with details.

Also, thanks to the new training approach and high-quality data, the inpainting function, which allows editing individual parts of an image, has been significantly improved.

In addition, users now have access in the main Telegram bot to the fast Kandinsky 3.1 Flash model. Image generation time with this version of the model has decreased by more than 10 times compared to the base version.

Kandinsky 3.1 trained on an enlarged image dataset

Sber has improved its neural network that creates images from text descriptions in Russian and English. Sberbank announced this on April 4, 2024. The updated Kandinsky 3.1 is further trained on an enlarged image dataset, which made it possible to improve the quality of generation. A limited circle of users - artists, designers, and bloggers - was the first to get access to Kandinsky 3.1.

File:Aquote1.png
A year ago, Kandinsky 2.1 was released. During this time, we have constantly developed our neural network, which helps people create new images and gives absolutely everyone opportunities for creativity. Compared to the previous model, Kandinsky 3.1 has become even faster, more convenient and more realistic. Kandinsky 3.1 is a flexible, multifunctional and absolutely free tool that will turn any person into an artist and creator. Soon everyone will be able to test the new capabilities of the neural network. Like previous versions, the model will be free and available on different surfaces,
said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.
File:Aquote2.png

One of the key features of the version is faster picture generation: the time of a single generation was reduced by almost 10 times, and generation resolution can be increased up to 4K. It is also possible to improve a text query using the language model. Users again have access to functions for creating various image variations, mixing pictures and text, and creating stickers, as well as the ability to make local changes to a picture without altering the overall scene composition (ControlNet).

Technical details about the model and its training approach, along with generation examples, can be found in the article on Habr.

Kandinsky Video 1.1, a model for generating video from text descriptions, will also be released in the near future. The team managed to significantly improve generation quality by increasing the volume of the text-video training dataset and through architectural improvements to the model. The changes also doubled the video resolution compared to Kandinsky Video 1.0.

2023

More than 200 million generations from text requests

Sberbank on January 18, 2024 summed up the work of the Kandinsky generative model in 2023. According to the developers, the neural network has created more than 200 million generations for text requests, and its audience has exceeded 12 million unique users. The model took first place in terms of growth and became the second most popular among developers after Stable Diffusion according to the AI resource Hugging Face, which contains the best open source solutions.

Kandinsky understands requests on a wide range of topics in more than 100 languages; users can create photorealistic images in unlimited quantities in a variety of styles. The model can also change individual objects and entire areas of a picture, mix several drawings, extend an image, and create pictures in endless canvas mode (inpainting/outpainting). The neural network knows the domestic cultural code well: architectural attractions, objects, and elements of folk art.

In addition, Kandinsky users can create four-second animated videos and full-fledged videos up to eight seconds long. Thus, the generative model of Sberbank helps to implement almost any creative idea.

You can evaluate the capabilities of the neural network on the fusionbrain.ai platform, in the Telegram and VK bots, and on the rudalle.ru website. The model works in the GigaChat service, is available in the SberBank Online and Salute mobile applications, as well as on Sber smart devices via the voice command "Launch the artist." You can generate animation and videos on the fusionbrain.ai platform and in the Telegram bot - to do so, you need to submit a request for access.

The model was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

Introduction to the Pulse HR platform

The Pulse HR platform from Sberbank has integrated the GigaChat service and the Kandinsky neural network; for corporate clients, the functionality of these solutions will become available in the first half of 2024. Sberbank announced this on November 29, 2023. Read more here.

Kandinsky 3.0

On November 22, 2023, Sberbank presented the next version of its generative model for creativity - Kandinsky 3.0, which understands the user's text request better than previous versions. The neural network can now create even more photorealistic images, as well as generate full-fledged art paintings and artwork from sketches. The model handles queries on a wide range of topics.

According to the developers, Kandinsky 3.0 knows elements of the domestic cultural code better than previous versions. The quality of generating famous Russian and Soviet personalities and characters, architectural attractions, cultural objects, and elements of Russian folk art, such as Gzhel painting, has improved significantly. In addition, the updated model has an optimized image editing function and the ability to extend images in infinite canvas mode (inpainting and outpainting).

Kandinsky 3.0 creates high-resolution images - 1024 x 1024 pixels - and can synthesize pictures with a chosen aspect ratio. To train the neural network, the developers used an updated dataset of 1.5 billion text-image pairs that underwent multi-stage filtering procedures, which ultimately led to a noticeable increase in generation quality.

Users of the Kandinsky 3.0 neural network can also create videos from text descriptions in animation mode. From a single request, a four-second video is generated with the selected animation effect, at 24 frames per second and a resolution of 640 x 640 pixels. Synthesizing one second of video takes about 20 seconds on average. To expand the capabilities of the base model, different types of image animation were implemented, allowing objects to move, to be zoomed in and out, and static scenes to come alive in all possible ways. The animation modes are based on the function of redrawing an image according to a text description (image2image).
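The reported figures imply a simple budget for each animated clip. A quick back-of-the-envelope check, assuming the stated 4-second length, 24 fps, 640 x 640 resolution, and ~20 seconds of synthesis per second of video:

```python
# Back-of-the-envelope budget for one animated clip, using the figures
# reported above (4 s clip, 24 fps, 640x640, ~20 s synthesis per video second).

CLIP_SECONDS = 4
FPS = 24
RESOLUTION = (640, 640)
SYNTH_SECONDS_PER_VIDEO_SECOND = 20

total_frames = CLIP_SECONDS * FPS                            # frames to generate
synth_time = CLIP_SECONDS * SYNTH_SECONDS_PER_VIDEO_SECOND   # wall-clock estimate
pixels_per_frame = RESOLUTION[0] * RESOLUTION[1]

print(total_frames)       # 96 frames per clip
print(synth_time)         # about 80 seconds of synthesis per clip
print(pixels_per_frame)   # 409600 pixels per frame
```

So each 4-second clip amounts to 96 generated frames and roughly 80 seconds of synthesis time, matching the averages stated in the article.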

Kandinsky 3.0 understands queries in more than 100 languages, and users can create images in an unlimited number of styles. The model was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

You can evaluate the capabilities of the neural network on the fusionbrain.ai platform, in the Telegram and VK bots, and on the rudalle.ru website. The model works in the GigaChat service, is available in the SberBank Online and Salyut mobile applications, as well as on Sber smart devices via the voice command "Launch the artist." You can generate animated videos in the Telegram bot - to do so, you need to submit a request for access.

Ability to generate screensavers in SberBank Online

In the updated SberBank Online app on Android smartphones, you can realize your creative potential and generate login screensavers using Sberbank's Kandinsky neural network. Sberbank announced this on November 22, 2023. Read more here.

Presenting Kandinsky Video - a generative model for creating full-fledged videos by text

Sber presented the Kandinsky Video neural network - the first generative model in Russia for creating full-fledged videos based on text description. This was announced on November 22, 2023 to TAdviser by representatives of Sberbank.

The Kandinsky Video architecture consists of two key blocks: the first is responsible for creating the keyframes that form the plot structure of the video, and the second generates the interpolation frames that make motion in the final video smooth. Both blocks are based on an updated model for synthesizing images from text descriptions based on Kandinsky 3.0. Read more here.
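The two-block scheme can be sketched in plain Python. The generators below are stand-in stubs and the frame counts are illustrative assumptions, not the actual Kandinsky Video internals:

```python
# Illustrative sketch of a two-stage video pipeline: a keyframe generator
# lays out the plot, then an interpolation block fills the gaps for smooth
# motion. Frame counts and stub generators are assumptions for illustration.

def generate_keyframes(prompt, n_keyframes=5):
    # Stand-in for the text-conditioned keyframe model.
    return [f"keyframe[{i}]<{prompt}>" for i in range(n_keyframes)]

def interpolate(frame_a, frame_b, n_between=3):
    # Stand-in for the interpolation block: frames between two keyframes.
    return [f"interp({frame_a}->{frame_b})[{j}]" for j in range(n_between)]

def render_video(prompt, n_keyframes=5, n_between=3):
    keys = generate_keyframes(prompt, n_keyframes)
    video = []
    for a, b in zip(keys, keys[1:]):
        video.append(a)                       # plot-defining keyframe
        video.extend(interpolate(a, b, n_between))  # smooth the transition
    video.append(keys[-1])
    return video

frames = render_video("a cat on a windowsill")
print(len(frames))  # 5 keyframes + 4 gaps * 3 interpolated frames = 17
```

The split matters for cost: the expensive text-conditioned model runs only for keyframes, while the cheaper interpolation block supplies the remaining frames.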

Kandinsky 2.2 with the ability to create videos by text description in animation mode

Users of Sberbank's Kandinsky 2.2 neural network now have the ability to create videos from text descriptions in animation mode. One text description generates a 4-second video with the selected animation effect, at 24 frames per second and a resolution of 640x640 pixels. Synthesizing one second of video takes about 20 seconds on average. Sberbank announced this on October 12, 2023.

The generation of animated videos works in test mode and is available to the most active users of Kandinsky 2.2, who will receive an invitation in the near future. By the end of 2023, absolutely everyone will be able to assess the capabilities of the neural network.

To generate a video in animation mode, you describe in text what you want to see. The bot then offers a choice of 16 scene animation options, after which the neural network generates an animated video. Generation of composite scenes is also available: the user can enter several text descriptions (up to three), select animation mechanics for each, and the model will then create a "mini-film."
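The composite-scene workflow can be sketched as a simple validation-and-assembly step. The 3-scene and 16-option limits come from the text above; the function and effect names are illustrative assumptions:

```python
# Illustrative sketch of assembling a "mini-film" from up to three scenes,
# each pairing a text description with a chosen animation mechanic.
# The 3-scene and 16-option limits come from the article; names are assumed.

MAX_SCENES = 3
ANIMATION_OPTIONS = [f"effect_{i}" for i in range(16)]  # 16 offered mechanics

def build_mini_film(scenes):
    """scenes: list of (description, animation_effect) tuples."""
    if not 1 <= len(scenes) <= MAX_SCENES:
        raise ValueError(f"expected 1..{MAX_SCENES} scenes, got {len(scenes)}")
    film = []
    for description, effect in scenes:
        if effect not in ANIMATION_OPTIONS:
            raise ValueError(f"unknown animation effect: {effect}")
        film.append({"prompt": description, "effect": effect})
    return film

film = build_mini_film([
    ("a sunrise over the sea", "effect_0"),
    ("a sailboat leaving the bay", "effect_7"),
    ("dolphins jumping at dusk", "effect_15"),
])
print(len(film))  # 3 scenes assembled into one mini-film
```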

File:Aquote1.png
Since the release of the Kandinsky 2.2 model, users have already generated more than 50 million images. Now they have even more opportunities for creativity completely free of charge. The launch of the video function in animation mode is an important step in the development of our neural network and for the entire global industry of multimodal models of artificial intelligence. We will continue to improve Kandinsky further, and the quality will only improve in the next updates,
noted Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.
File:Aquote2.png

The basis for video synthesis is the image generation model based on Kandinsky 2.2 text descriptions. To expand its capabilities, different types of image animation were implemented, allowing objects to move, to be zoomed in and out, and static scenes to come alive in all possible ways. All animation modes are based on the image2image (redrawing an image according to a text description) and inpainting/outpainting (editing part of an image inside or outside its borders) functions, which were already implemented in the base model.

The neural network was developed and trained by Sber AI researchers together with scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

2 million unique users in 6 days

Kandinsky 2.1, Sberbank's free generative model, has become one of the fastest-growing artificial intelligence (AI) services in the world. Sberbank announced this on April 10, 2023. According to the developers, the Russian neural network took only four days to reach 1 million unique users. This is faster than the ChatGPT service from OpenAI, which took five days.

Since the release of Kandinsky 2.1, more than 10 million images have been generated, and the number of unique users has reached 2 million. Kandinsky 2.1 also entered the top 5 trending repositories worldwide on GitHub.

Kandinsky 2.1 can generate images from a natural-language text description in a few seconds. The model knows 101 languages and opens up new opportunities for creativity: it can mix several drawings, extend an image, and create pictures in endless canvas mode (inpainting/outpainting).

The neural network inherited the weights of the previous version, trained on one billion text-image pairs, and was additionally trained on 170 million high-resolution text-image pairs. It was then fine-tuned on a separately assembled dataset of two million high-quality images. This set includes pictures with descriptions in areas traditionally difficult for neural networks, such as text and human faces. The neural network was also improved with a newly trained autoencoder model, which is also used as a decoder for vector representations of images. This dramatically improved the generation of high-resolution images: faces, complex objects, and so on. As a result, the model contains 3.3 billion parameters instead of the two billion in Kandinsky 2.0.

1 million unique users

In just 4 days after the release of the updated version of Sberbank's Kandinsky 2.1 generative model, the neural network's audience reached 1 million unique users, who have already generated over 5 million images. The most popular requests are "cat," "love," "space," and "happiness." Sberbank reported this on April 7, 2023.

You can test the neural network in the Telegram bot, on the model's promo page, on the fusionbrain.ai platform, and on the ML Space platform in the DataHub hub of pre-trained models and datasets. You can also evaluate the capabilities of Kandinsky 2.1 in the Salute mobile application and on Sber smart devices using the "Launch the Artist" command.

Kandinsky 2.1 can generate images from a natural-language text description in a few seconds. The model knows 101 languages and can complete missing parts of an image, mix several drawings, and create a picture in endless canvas mode.

Kandinsky 2.1 with the ability to mix several drawings

On April 4, 2023, Sberbank introduced the Kandinsky 2.1 neural network, capable of creating high-quality images in just a few seconds from a natural-language text description. It can also mix multiple pictures, change them according to a text description, generate images similar to a given one, fill in missing parts of a picture, and form images in endless canvas mode. The model understands requests in 101 languages (including Russian and English) and can draw in various styles.

The neural network was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

The presented Kandinsky 2.1 model inherited the weights of the previous version, trained on 1 billion text-image pairs, and was additionally trained on 170 million high-resolution text-image pairs. It was then fine-tuned on a separately collected dataset of 2 million high-quality images. This set includes pictures with descriptions in areas traditionally difficult for neural networks, such as text and human faces.

The neural network was also improved with a newly trained autoencoder model, which is also used as a decoder for vector representations of images. This dramatically improved the generation of high-resolution images: faces, complex objects, and so on. As a result, the new model contains 3.3 billion parameters instead of the 2 billion in Kandinsky 2.0.

In addition, Kandinsky 2.1 uses not only an encoded text description but also a special image representation produced by the CLIP model. The neural network forms this representation of the picture from the text information and feeds it to the input of the main generative model.
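This two-step conditioning - text is first mapped to a CLIP-style image embedding, which then conditions the main generator alongside the text encoding - can be sketched with stub components. All functions and dimensions below are illustrative assumptions, not the real Kandinsky code:

```python
# Illustrative sketch of the two-step conditioning described above:
# a "prior" maps the text embedding to a CLIP-style image embedding, and the
# main generative model is conditioned on both the text and image embeddings.
# Dimensions and stub functions are assumptions, not the actual internals.

EMB_DIM = 8  # toy embedding size for illustration

def encode_text(prompt):
    # Stand-in for the text encoder: deterministic toy embedding, zero-padded.
    emb = [float(ord(c) % 10) for c in prompt[:EMB_DIM]]
    return emb + [0.0] * (EMB_DIM - len(emb))

def prior(text_emb):
    # Stand-in for the prior: text embedding -> image embedding.
    return [v * 0.5 + 1.0 for v in text_emb]

def decoder(text_emb, image_emb):
    # Stand-in for the main generative model conditioned on both embeddings.
    return {"conditioning_dim": len(text_emb) + len(image_emb)}

text_emb = encode_text("red cat")
image_emb = prior(text_emb)
picture = decoder(text_emb, image_emb)
print(picture["conditioning_dim"])  # 16: both embeddings feed the generator
```

The design point is that the generator never sees raw text alone: it receives a predicted image embedding as well, which tends to anchor the output closer to the visual concept the prompt describes.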

The model can visualize any content and can be used in various industries. For example, in the banking sector it can be used to create personalized marketing solutions, vivid images of products, attract and retain the attention of customers.

{{quote 'author=said Alexander Vedyakhin, First Deputy Chairman of the Board of Sberbank. | While training Kandinsky 2.1, we took user feedback into account and implemented a bold hypothesis, having studied the most advanced concepts. As a result, we have developed a powerful universal solution for a wide range of tasks at the level of the best global counterparts. It opens up great opportunities for both business and the general public. In fact, this is another important step towards AGI - strong artificial intelligence. I think everyone has a task for Kandinsky 2.1, which is why the improved model, like its previous version, is in the public domain: anyone can test it, for free.}}

You can assess the capabilities of the neural network on the model's promo page, via the "Launch the Artist" command on Sber smart devices and in the Salyut mobile application, and on the ML Space platform in the DataHub hub of pre-trained models and datasets. The model is also available on the Fusion Brain platform and in the Telegram bot.

2022

Kandinsky 2.0 - Russian diffusion model for generating images by text in different languages

On November 23, 2022, Sberbank introduced Kandinsky 2.0, a Russian multilingual diffusion model for generating images from text descriptions with 2 billion parameters. The neural network was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices, comprising 1 billion text-image pairs. You can see how it draws using the "Launch the Artist" command on Sber smart devices and in the Salute mobile application.

In Kandinsky 2.0, the developers used the increasingly popular diffusion approach, since models with this architecture, unlike transformers, give good results in almost all tasks of generating multimedia content from text descriptions (synthesis of images, video, 3D, and audio).

The model handles requests in 101 languages equally quickly and efficiently. These include both widespread languages, such as Russian and English, and rarer ones, such as Mongolian. The system will understand the task even if a single request contains words in different languages.

Kandinsky 2.0 differs from its predecessor in a richer, deeper, and more realistic picture and in advanced capabilities. On the FusionBrain website, images can be generated in 20 different styles, including Renaissance, classicism, animation, New Year's, and even Khokhloma. The model also implements inpainting (replacing any part of an image or any object in it with content generated by the neural network) and outpainting (extending a finished image and generating the background around the picture).

In addition, Kandinsky 2.0 lets users see how linguistic constructions and concepts with the same meaning differ depending on language and cultural context. For example, if you request a "national dish" in Russian, the neural network most often draws cabbage soup, while in Japanese it produces miso soup and sushi.

File:Aquote1.png
Sberbank continues to develop solutions for automatically generating images from natural-language descriptions - so-called creative AI. Kandinsky 2.0, which replaced the first version of the model, is a breakthrough in this area. The model makes it possible to get a unique picture for a specific task in a few seconds and distribute it freely without a license, which is very important for business. Generative models are developing very quickly: back in 2018 even formulating such a task was hard to imagine, and in 2022 we have a working model that understands 101 languages and draws realistic images often indistinguishable from those created by people.
File:Aquote2.png

Presenting a model for generating images from text descriptions

On June 14, 2022, Sber presented Kandinsky, a model for generating images from text descriptions in Russian. It is an improved version of the ruDALL-E multimodal neural network, which generates pictures from Russian-language descriptions. It can be used to create any kind of images: illustrations, advertising materials, architectural and industrial design, and even digital art.

In November 2021, the ruDALL-E XL model was released, containing 1.3 billion parameters. Its parameters and code were publicly available, and an image generation service was developed. For six months, this service was used by 2 million unique users, who generated a total of 125 million images. Also in November, an exclusive ruDALL-E XXL model with 12 billion parameters was announced and published in the AI Services marketplace of the SberCloud ML Space platform in December.

In 2022, the Sber AI and SberDevices teams significantly improved the quality of this model by further training it on 179 million text-image examples using the SberCloud ML Space platform and the Christofari Neo supercomputer. The further-trained Kandinsky model can generate images with an arbitrary aspect ratio and can also use a special diffusion-based method to increase the resolution of generated images with a 1:1 aspect ratio (in addition to the standard approach using Real-ESRGAN). The model is now much better at creating realistic images and at faithfully rendering various textures, shadows, and reflections.

Image creation with the Kandinsky model takes place in three stages. First, one neural network (Kandinsky itself) generates a given number of images from the text description. Then a second (ruCLIP Large) selects the most successful pictures that best match the given description, and a third upscales them (both a diffusion model and the generative-adversarial Real-ESRGAN model are available). The result is a set of high-quality generated images. A distinctive advantage of the Kandinsky model compared to previous versions is the higher degree of detail in the created images.
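The three-stage pipeline amounts to generate, rank, then upscale. The scoring and upscaling functions below are illustrative stand-ins for Kandinsky, ruCLIP Large, and the upscalers, not their real interfaces:

```python
# Illustrative sketch of the three-stage pipeline described above:
# 1) generate candidate images, 2) rank them against the prompt (the role
# ruCLIP Large plays), 3) upscale the best ones. All stubs are assumptions.

def generate_candidates(prompt, n=8):
    # Stage 1 stand-in: n candidate images for the prompt.
    return [{"id": i, "prompt": prompt} for i in range(n)]

def clip_score(prompt, image):
    # Stage 2 stand-in: deterministic toy relevance score.
    return (image["id"] * 7) % 10

def upscale(image, method="diffusion"):
    # Stage 3 stand-in: either the diffusion upscaler or Real-ESRGAN.
    assert method in ("diffusion", "real-esrgan")
    return {**image, "upscaled_with": method}

def kandinsky_pipeline(prompt, n_candidates=8, top_k=3):
    candidates = generate_candidates(prompt, n_candidates)
    ranked = sorted(candidates, key=lambda im: clip_score(prompt, im),
                    reverse=True)
    return [upscale(im) for im in ranked[:top_k]]

results = kandinsky_pipeline("a lighthouse in a storm")
print(len(results))  # the 3 best-matching candidates, upscaled
```

Generating several candidates and keeping only the best-scoring ones is a cheap way to trade compute for quality: the ranking model filters out failed generations before the expensive upscaling step runs.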

The model is available in the Salute mobile application and on Sber smart devices via the request "Turn on the artist."

Template:Quote 'author=said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.