
Sberbank Kandinsky - a neural network for generating images from text descriptions

Product
Base system (platform): Sberbank ruDALL-E multimodal neural network
Developer: Sberbank
Premiere date: 2022/06/14
Last release date: 2025/11/20
Technology: Big Data


2025

Kandinsky 5.0 with two models: Image Lite and Video Pro

Sberbank on November 20, 2025 introduced Kandinsky 5.0, a line of next-generation image and video generation models.

The new models expand opportunities for creativity, both in professional work and in personal projects. Users can easily create personalized video greetings, animate photos, or invent original visual stories. For professionals - directors, designers, marketers, animation artists - Kandinsky 5.0 will become a powerful tool for producing promotional materials and commercial video content.

{{quote 'author=said Andrey Belevtsev, Senior Vice President, Head of the Technological Development Unit of Sberbank. | We have significantly improved Kandinsky. The team substantially improved the key parameters - the quality and speed of video generation. Now any user can easily bring their artistic ideas to life both in video and in images. As always, we paid special attention to training the model on a high-quality national dataset. Thanks to this, Kandinsky accurately understands requests in Russian and creates content that matches the cultural context and expectations of users in our country. At the same time, all models of the new line are published in the public domain. This gives engineers and researchers the opportunity to use them in their own developments and stimulates the growth of an open ecosystem of domestic generative technologies.}}

Two models of the Kandinsky 5.0 line have become available to users: Image Lite, a universal model for generating HD images, and Video Pro, a powerful model that generates 5-second SD videos from a text prompt or a starting frame. Both models are well grounded in the Russian cultural context, understand requests in Russian and English equally well, and can render text in both Cyrillic and Latin scripts.

Special focus during model training was placed on the aesthetics and quality of generation and on the expressiveness and artistry of the visual content the models create. To this end, the final stages of training used a dataset of high-quality images and videos selected by a large team of designers, artists, and art directors. The experts carefully chose materials with impeccable composition, style, and visual quality. Thanks to this, Kandinsky 5.0 creates not only detailed and accurate visual materials but also truly expressive, artistic content.

The Kandinsky 5.0 Image Lite and Video Pro models are already available on all GigaChat surfaces.

Sber opens up flagship AI models GigaChat and Kandinsky

On November 19, 2025, the president and chairman of the board of Sberbank, German Gref, announced the release of the bank's flagship artificial intelligence (AI) models into the public domain. These include: GigaChat 3 Ultra Preview, GigaChat Lightning, the next generation of GigaAM speech recognition models, and the Kandinsky 5.0 image and video generation models. Read more here.

Integration with Platform V Product 360

The Russian software developer Sber Tech has integrated the Kandinsky neural network into Platform V Product 360, its product information management system. Users of the solution can now create high-quality images for product catalogs directly in its interface using the built-in AI tool. Sber Tech CEO Maxim Tyatyushev announced this during the session "Game Ahead. Digitalization as a Competitive Asset" at the TsIPR-2025 conference in June 2025. Read more here.

Kandinsky 4.1 Image

On June 5, 2025, Sberbank presented the next version of its image generation model, Kandinsky 4.1 Image - it now creates even better and more detailed images while following users' text instructions more closely.

As reported, the MALVINA AI editor (Multimodal Artificial Language VIsion Neural Assistant) has become available to neuro-artists, allowing them to edit an image by giving text instructions. Remove unnecessary objects or text from a picture, change a person's hair color or age, restore and colorize an old photo, replace the background, turn summer into winter - MALVINA helps realize any creative idea of a neuro-artist, opening new horizons for co-creation between humans and generative AI. Moreover, unlike other photo-editing models, MALVINA tries to preserve the geometry of the original image: the changes affect only the relevant parts of the picture. Thus, even when major changes are made to a scene, the features of people and objects not affected by the edit are preserved. The launch of these models expands the capabilities of Sberbank's GigaChat generative system for creating and editing visual content.

Changing the Time of Day with MALVINA

This version of the Kandinsky image generation model is based on an updated architecture - it is now a diffusion transformer (DiT), which makes it possible to apply established practices for training large transformer models. The architecture scales efficiently, which in practice means improved overall quality and speed for the flagship model.

In addition to training on a large dataset of image-text pairs, the Kandinsky 4.1 Image model was further trained on manually selected images balanced across 9 meta-categories (people, technology, nature, and others). This high-quality data was selected by a team of more than 100 specialists - photographers, artists, and designers who not only have a professional art education but also passed tests for an in-depth understanding of the aesthetic and visual aspects of photography.

As a result of this fine-tuning, the aesthetics and correctness of the generated images improved significantly across all domains. The quality of generating textures and complex objects, such as various kinds of technology, also improved. Kandinsky 4.1 Image follows complex text instructions more precisely - for example, it understands "spatial" queries such as "right/left" and "above/below." If you ask the model to draw "a man in a white cap and a striped jacket sitting on a green chair to the right of a tall birch, in the style of Van Gogh," the neural network will take all the details into account.

MALVINA

MALVINA changed the background and added a basket and glasses

The model understands the styles of famous artists - Aivazovsky, Bosch, Cranach, Kandinsky, and others. In addition, the user can create images in arbitrary artistic styles, from impressionism to pop art, or generate images in the style of the famous animation studio Studio Ghibli, for example.

Kandinsky 4.1 Image handles the Russian cultural code better: the model convincingly generates matryoshkas, bogatyrs, and samovars, knows the heroes of Russian folk tales and films, and can portray various dishes of national cuisines. The model has also become even better at images in the spirit of Russian folk painting, such as Gzhel and Khokhloma - now users can experiment with these styles.

The integration of the GigaChat Telegram bot with the MALVINA AI editor lets any user edit any image with simple commands. In a couple of clicks you can change a picture's background ("make mountains instead of sky"), color ("make the dragon green"), or appearance ("add glasses," "make the hair red"). You can also remove and add objects ("replace the cherries with sweets") and correct defects ("remove the scratches").

The model does not simply work with the original image files - when changing them, it tries to preserve all the important visual characteristics (shapes, faces, background), down to the smallest details and textures of the original frame.

The neural network was trained on a large data array. At the preliminary training stage, the researchers processed more than 10 million examples, and for the further training stage (SFT phase), more than 1.5 million various images were used - both real photos with manual processing and synthetic data generated by special models.

File:Aquote1.png
The updated image editor in GigaChat is a simple and convenient intelligent assistant for realizing any creative idea. It works with pixel accuracy, preserving as much of the original detail as possible, while still allowing you to change the background, objects, and even the style of a photo. Users no longer need to spend hours in graphic editors - it is enough to master a few text commands. We specifically trained the updated Kandinsky model on diverse yet verified and annotated data so that the neural network can work with various scenes: from portraits to landscapes. Integration with GigaChat turns our language model into a universal tool for creativity and work that anyone can master.

Andrey Belevtsev, Senior Vice President, Head of the Technological Development Unit of Sberbank
File:Aquote2.png

2024

Kandinsky 3.1 availability to all users

The capabilities of the Kandinsky 3.1 neural network have become available to all users.

The updated version is further trained on a dataset of aesthetic images, which made it possible to improve the quality of picture generation. This was announced on April 22, 2024, by the First Deputy Chairman of the Board of Sberbank, Alexander Vedyakhin.

Template:Quote 'author=said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.

The addition of a query-enhancement (beautification) feature simplifies image creation. There is no longer any need to be a professional prompt engineer - this function composes a detailed prompt for the user: it is enough to write just a few words describing the desired image, and the rest is done by the GigaChat Pro language model built into the updated version of the neural network, which expands and enriches the prompt with details.

Also, thanks to the new training approach and high-quality data, the inpainting function, which allows editing individual parts of an image, has been significantly improved.

In addition, users now have access in the main Telegram bot to the fast Kandinsky 3.1 Flash model. Image generation time with this version of the model has decreased by more than 10 times compared to the base version.

Kandinsky 3.1 trained on an enlarged image dataset

Sber has improved its neural network that creates images from text descriptions in Russian and English. Sberbank announced this on April 4, 2024. The updated Kandinsky 3.1 is further trained on an enlarged image dataset, which made it possible to improve the quality of generation. A limited circle of users - artists, designers, and bloggers - was the first to get access to Kandinsky 3.1.

File:Aquote1.png
A year ago, Kandinsky 2.1 was released. During this time, we have constantly developed our neural network, which helps people create new images and gives absolutely everyone opportunities for creativity. Compared to the previous model, Kandinsky 3.1 has become even faster, more convenient and more realistic. Kandinsky 3.1 is a flexible, multifunctional and absolutely free tool that will turn any person into an artist and creator. Soon everyone will be able to test the new capabilities of the neural network. Like previous versions, the model will be free and available on different surfaces,
said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.
File:Aquote2.png

One of the key features of the version is faster picture generation: the time of a single generation was reduced by almost 10 times, and generation resolution can be increased up to 4K. It is also possible to improve a text query using the language model. Users again have access to functions for creating various image variations, mixing pictures and text, and creating stickers, as well as the ability to make local changes to a picture without altering the overall scene composition (ControlNet).

Technical details about the model and its training approach, along with generation examples, can be found in the article on Habr.

Kandinsky Video 1.1, a model for generating video from text descriptions, will also be released in the near future. The team managed to significantly improve generation quality by increasing the volume of the text-video training dataset and through architectural improvements to the model. The changes also doubled the video resolution compared to Kandinsky Video 1.0.

2023

More than 200 million generations from text requests

Sberbank on January 18, 2024 summed up the work of the Kandinsky generative model in 2023. According to the developers, the neural network has created more than 200 million generations for text requests, and its audience has exceeded 12 million unique users. The model took first place in terms of growth and became the second most popular among developers after Stable Diffusion according to the AI resource Hugging Face, which contains the best open source solutions.

Kandinsky understands requests on a wide range of topics in more than 100 languages; users can create photorealistic images in unlimited quantities in a variety of styles. The model can also change individual objects and entire areas of a picture, mix several drawings, extend an image, and create pictures in endless canvas mode (inpainting/outpainting). The neural network knows the domestic cultural code well: architectural attractions, objects, and elements of folk art.

In addition, Kandinsky users can create four-second animated videos and full-fledged videos up to eight seconds long. Thus, the generative model of Sberbank helps to implement almost any creative idea.

You can evaluate the capabilities of the neural network on the fusionbrain.ai platform, in the Telegram and VK bots, and on the rudalle.ru website. The model works in the GigaChat service, is available in the SberBank Online and Salute mobile applications, as well as on Sber smart devices via the voice command "Launch the artist." You can generate animation and videos on the fusionbrain.ai platform and in the Telegram bot - to do so, you need to submit a request for access.

The model was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

Introduction to the Pulse HR platform

The Pulse HR platform from Sberbank has integrated the GigaChat service and the Kandinsky neural network; for corporate clients, the functionality of these solutions will become available in the first half of 2024. Sberbank announced this on November 29, 2023. Read more here.

Kandinsky 3.0

On November 22, 2023, Sberbank presented the next version of its generative model for creativity - Kandinsky 3.0, which understands the user's text request better than previous versions. The neural network can now create even more photorealistic images, as well as generate full-fledged art paintings and artwork from sketches. The model handles queries on a wide range of topics.

According to the developers, Kandinsky 3.0 knows elements of the domestic cultural code better than previous versions. The quality of generating famous Russian and Soviet personalities and characters, architectural attractions, cultural objects, and elements of Russian folk art, such as Gzhel painting, has improved significantly. In addition, the updated model has an optimized image editing function and the ability to extend images in infinite canvas mode (inpainting and outpainting).

Kandinsky 3.0 creates high-resolution images - 1024 x 1024 pixels - and can synthesize pictures with a chosen aspect ratio. To train the neural network, the developers used an updated dataset of 1.5 billion text-image pairs that underwent multi-stage filtering procedures, which ultimately led to a noticeable increase in generation quality.

Users of the Kandinsky 3.0 neural network can also create videos from text descriptions in animation mode. From a single request, a four-second video is generated with the selected animation effect, at 24 frames per second and a resolution of 640 x 640 pixels. Synthesizing one second of video takes about 20 seconds on average. To expand the capabilities of the base model, different types of image animation were implemented, allowing objects to move, to be zoomed in and out, and static scenes to come alive in all possible ways. The animation modes are based on the function of redrawing an image according to a text description (image2image).
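The reported figures imply a simple budget for each animated clip. A quick back-of-the-envelope check, assuming the stated 4-second length, 24 fps, 640 x 640 resolution, and ~20 seconds of synthesis per second of video:

```python
# Back-of-the-envelope budget for one animated clip, using the figures
# reported above (4 s clip, 24 fps, 640x640, ~20 s synthesis per video second).

CLIP_SECONDS = 4
FPS = 24
RESOLUTION = (640, 640)
SYNTH_SECONDS_PER_VIDEO_SECOND = 20

total_frames = CLIP_SECONDS * FPS                            # frames to generate
synth_time = CLIP_SECONDS * SYNTH_SECONDS_PER_VIDEO_SECOND   # wall-clock estimate
pixels_per_frame = RESOLUTION[0] * RESOLUTION[1]

print(total_frames)       # 96 frames per clip
print(synth_time)         # about 80 seconds of synthesis per clip
print(pixels_per_frame)   # 409600 pixels per frame
```

So each 4-second clip amounts to 96 generated frames and roughly 80 seconds of synthesis time, matching the averages stated in the article.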

Kandinsky 3.0 understands queries in more than 100 languages, and users can create images in an unlimited number of styles. The model was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

You can evaluate the capabilities of the neural network on the fusionbrain.ai platform, in the Telegram and VK bots, and on the rudalle.ru website. The model works in the GigaChat service, is available in the SberBank Online and Salyut mobile applications, as well as on Sber smart devices via the voice command "Launch the artist." You can generate animated videos in the Telegram bot - to do so, you need to submit a request for access.

Ability to generate screensavers in SberBank Online

In the updated SberBank Online app on Android smartphones, you can realize your creative potential and generate login screensavers using Sberbank's Kandinsky neural network. Sberbank announced this on November 22, 2023. Read more here.

Presenting Kandinsky Video - a generative model for creating full-fledged videos by text

Sber presented the Kandinsky Video neural network - the first generative model in Russia for creating full-fledged videos based on text description. This was announced on November 22, 2023 to TAdviser by representatives of Sberbank.

The Kandinsky Video architecture consists of two key blocks: the first is responsible for creating the keyframes that form the plot structure of the video, and the second generates the interpolation frames that make motion in the final video smooth. Both blocks are based on an updated model for synthesizing images from text descriptions based on Kandinsky 3.0. Read more here.
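The two-block scheme can be sketched in plain Python. The generators below are stand-in stubs and the frame counts are illustrative assumptions, not the actual Kandinsky Video internals:

```python
# Illustrative sketch of a two-stage video pipeline: a keyframe generator
# lays out the plot, then an interpolation block fills the gaps for smooth
# motion. Frame counts and stub generators are assumptions for illustration.

def generate_keyframes(prompt, n_keyframes=5):
    # Stand-in for the text-conditioned keyframe model.
    return [f"keyframe[{i}]<{prompt}>" for i in range(n_keyframes)]

def interpolate(frame_a, frame_b, n_between=3):
    # Stand-in for the interpolation block: frames between two keyframes.
    return [f"interp({frame_a}->{frame_b})[{j}]" for j in range(n_between)]

def render_video(prompt, n_keyframes=5, n_between=3):
    keys = generate_keyframes(prompt, n_keyframes)
    video = []
    for a, b in zip(keys, keys[1:]):
        video.append(a)                       # plot-defining keyframe
        video.extend(interpolate(a, b, n_between))  # smooth the transition
    video.append(keys[-1])
    return video

frames = render_video("a cat on a windowsill")
print(len(frames))  # 5 keyframes + 4 gaps * 3 interpolated frames = 17
```

The split matters for cost: the expensive text-conditioned model runs only for keyframes, while the cheaper interpolation block supplies the remaining frames.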

Kandinsky 2.2 with the ability to create videos by text description in animation mode

Users of Sberbank's Kandinsky 2.2 neural network now have the ability to create videos from text descriptions in animation mode. One text description generates a 4-second video with the selected animation effect, at 24 frames per second and a resolution of 640x640 pixels. Synthesizing one second of video takes about 20 seconds on average. Sberbank announced this on October 12, 2023.

The generation of animated videos works in test mode and is available to the most active users of Kandinsky 2.2, who will receive an invitation in the near future. By the end of 2023, absolutely everyone will be able to assess the capabilities of the neural network.

To generate a video in animation mode, you describe in text what you want to see. The bot then offers a choice of 16 scene animation options, after which the neural network generates an animated video. Generation of composite scenes is also available: the user can enter several text descriptions (up to three), select animation mechanics for each, and the model will then create a "mini-film."
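The composite-scene workflow can be sketched as a simple validation-and-assembly step. The 3-scene and 16-option limits come from the text above; the function and effect names are illustrative assumptions:

```python
# Illustrative sketch of assembling a "mini-film" from up to three scenes,
# each pairing a text description with a chosen animation mechanic.
# The 3-scene and 16-option limits come from the article; names are assumed.

MAX_SCENES = 3
ANIMATION_OPTIONS = [f"effect_{i}" for i in range(16)]  # 16 offered mechanics

def build_mini_film(scenes):
    """scenes: list of (description, animation_effect) tuples."""
    if not 1 <= len(scenes) <= MAX_SCENES:
        raise ValueError(f"expected 1..{MAX_SCENES} scenes, got {len(scenes)}")
    film = []
    for description, effect in scenes:
        if effect not in ANIMATION_OPTIONS:
            raise ValueError(f"unknown animation effect: {effect}")
        film.append({"prompt": description, "effect": effect})
    return film

film = build_mini_film([
    ("a sunrise over the sea", "effect_0"),
    ("a sailboat leaving the bay", "effect_7"),
    ("dolphins jumping at dusk", "effect_15"),
])
print(len(film))  # 3 scenes assembled into one mini-film
```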

File:Aquote1.png
Since the release of the Kandinsky 2.2 model, users have already generated more than 50 million images. Now they have even more opportunities for creativity completely free of charge. The launch of the video function in animation mode is an important step in the development of our neural network and for the entire global industry of multimodal models of artificial intelligence. We will continue to improve Kandinsky further, and the quality will only improve in the next updates,
noted Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.
File:Aquote2.png

The basis for video synthesis is the image generation model based on Kandinsky 2.2 text descriptions. To expand its capabilities, different types of image animation were implemented, allowing objects to move, to be zoomed in and out, and static scenes to come alive in all possible ways. All animation modes are based on the image2image (redrawing an image according to a text description) and inpainting/outpainting (editing part of an image inside or outside its borders) functions, which were already implemented in the base model.

The neural network was developed and trained by Sber AI researchers together with scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

2 million unique users in 6 days

Kandinsky 2.1, Sberbank's free generative model, has become one of the fastest-growing artificial intelligence (AI) services in the world. Sberbank announced this on April 10, 2023. According to the developers, the Russian neural network took only four days to reach 1 million unique users. This is faster than the ChatGPT service from OpenAI, which took five days.

Since the release of Kandinsky 2.1, more than 10 million images have been generated, and the number of unique users has reached 2 million. Kandinsky 2.1 also entered the top 5 trending repositories worldwide on GitHub.

Kandinsky 2.1 can generate images from a natural-language text description in a few seconds. The model knows 101 languages and opens up new opportunities for creativity: it can mix several drawings, extend an image, and create pictures in endless canvas mode (inpainting/outpainting).

The neural network inherited the weights of the previous version, trained on one billion text-image pairs, and was additionally trained on 170 million high-resolution text-image pairs. It was then fine-tuned on a separately assembled dataset of two million high-quality images. This set includes pictures with descriptions in areas traditionally difficult for neural networks, such as text and human faces. The neural network was also improved with a newly trained autoencoder model, which is also used as a decoder for vector representations of images. This dramatically improved the generation of high-resolution images: faces, complex objects, and so on. As a result, the model contains 3.3 billion parameters instead of the two billion in Kandinsky 2.0.

1 million unique users

In just 4 days after the release of the updated version of Sberbank's Kandinsky 2.1 generative model, the neural network's audience reached 1 million unique users, who have already generated over 5 million images. The most popular requests are "cat," "love," "space," and "happiness." Sberbank reported this on April 7, 2023.

You can test the neural network in the Telegram bot, on the model's promo page, on the fusionbrain.ai platform, and on the ML Space platform in the DataHub hub of pre-trained models and datasets. You can also evaluate the capabilities of Kandinsky 2.1 in the Salute mobile application and on Sber smart devices using the "Launch the Artist" command.

Kandinsky 2.1 can generate images from a natural-language text description in a few seconds. The model knows 101 languages and can complete missing parts of an image, mix several drawings, and create a picture in endless canvas mode.

Kandinsky 2.1 with the ability to mix several drawings

On April 4, 2023, Sberbank introduced the Kandinsky 2.1 neural network, capable of creating high-quality images in just a few seconds from a natural-language text description. It can also mix multiple pictures, change them according to a text description, generate images similar to a given one, fill in missing parts of a picture, and form images in endless canvas mode. The model understands requests in 101 languages (including Russian and English) and can draw in various styles.

The neural network was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices.

The presented Kandinsky 2.1 model inherited the weights of the previous version, trained on 1 billion text-image pairs, and was additionally trained on 170 million high-resolution text-image pairs. It was then fine-tuned on a separately collected dataset of 2 million high-quality images. This set includes pictures with descriptions in areas traditionally difficult for neural networks, such as text and human faces.

The neural network was also improved with a newly trained autoencoder model, which is also used as a decoder for vector representations of images. This dramatically improved the generation of high-resolution images: faces, complex objects, and so on. As a result, the new model contains 3.3 billion parameters instead of the 2 billion in Kandinsky 2.0.

In addition, Kandinsky 2.1 uses not only an encoded text description but also a special image representation produced by the CLIP model. The neural network forms this representation of the picture from the text information and feeds it to the input of the main generative model.
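This two-step conditioning - text is first mapped to a CLIP-style image embedding, which then conditions the main generator alongside the text encoding - can be sketched with stub components. All functions and dimensions below are illustrative assumptions, not the real Kandinsky code:

```python
# Illustrative sketch of the two-step conditioning described above:
# a "prior" maps the text embedding to a CLIP-style image embedding, and the
# main generative model is conditioned on both the text and image embeddings.
# Dimensions and stub functions are assumptions, not the actual internals.

EMB_DIM = 8  # toy embedding size for illustration

def encode_text(prompt):
    # Stand-in for the text encoder: deterministic toy embedding, zero-padded.
    emb = [float(ord(c) % 10) for c in prompt[:EMB_DIM]]
    return emb + [0.0] * (EMB_DIM - len(emb))

def prior(text_emb):
    # Stand-in for the prior: text embedding -> image embedding.
    return [v * 0.5 + 1.0 for v in text_emb]

def decoder(text_emb, image_emb):
    # Stand-in for the main generative model conditioned on both embeddings.
    return {"conditioning_dim": len(text_emb) + len(image_emb)}

text_emb = encode_text("red cat")
image_emb = prior(text_emb)
picture = decoder(text_emb, image_emb)
print(picture["conditioning_dim"])  # 16: both embeddings feed the generator
```

The design point is that the generator never sees raw text alone: it receives a predicted image embedding as well, which tends to anchor the output closer to the visual concept the prompt describes.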

The model can visualize any content and can be used in various industries. For example, in the banking sector it can be used to create personalized marketing solutions, vivid images of products, attract and retain the attention of customers.

{{quote 'author=said Alexander Vedyakhin, First Deputy Chairman of the Board of Sberbank. | While training Kandinsky 2.1, we took user feedback into account and implemented a bold hypothesis, having studied the most advanced concepts. As a result, we have developed a powerful universal solution for a wide range of tasks at the level of the best global counterparts. It opens up great opportunities for both business and the general public. In fact, this is another important step towards AGI - strong artificial intelligence. I think everyone has a task for Kandinsky 2.1, which is why the improved model, like its previous version, is in the public domain: anyone can test it, for free.}}

You can assess the capabilities of the neural network on the model's promo page, via the "Launch the Artist" command on Sber smart devices and in the Salyut mobile application, and on the ML Space platform in the DataHub hub of pre-trained models and datasets. The model is also available on the Fusion Brain platform and in the Telegram bot.

2022

Kandinsky 2.0 - Russian diffusion model for generating images by text in different languages

On November 23, 2022, Sberbank introduced Kandinsky 2.0, a Russian multilingual diffusion model for generating images from text descriptions with 2 billion parameters. The neural network was developed and trained by Sber AI researchers with the partner support of scientists from the AIRI Institute of Artificial Intelligence on the combined dataset of Sber AI and SberDevices, comprising 1 billion text-image pairs. You can see how it draws using the "Launch the Artist" command on Sber smart devices and in the Salute mobile application.

In Kandinsky 2.0, the developers used the increasingly popular diffusion approach, since models with this architecture, unlike transformers, give good results in almost all tasks of generating multimedia content from text descriptions (synthesis of images, video, 3D, and audio).

The model handles requests in 101 languages equally quickly and efficiently. These include both widespread languages, such as Russian and English, and rarer ones, such as Mongolian. The system will understand the task even if a single request contains words in different languages.

Kandinsky 2.0 differs from its predecessor in a richer, deeper, and more realistic picture and in advanced capabilities. On the FusionBrain website, images can be generated in 20 different styles, including Renaissance, classicism, animation, New Year's, and even Khokhloma. The model also implements inpainting (replacing any part of an image or any object in it with content generated by the neural network) and outpainting (extending a finished image and generating the background around the picture).

In addition, Kandinsky 2.0 lets users see how linguistic constructions and concepts with the same meaning differ depending on language and cultural context. For example, if you request a "national dish" in Russian, the neural network most often draws cabbage soup, while in Japanese it produces miso soup and sushi.

File:Aquote1.png
Sberbank continues to develop solutions for automatically generating images from natural-language descriptions - so-called creative AI. Kandinsky 2.0, which replaced the first version of the model, is a breakthrough in this area. The model makes it possible to get a unique picture for a specific task in a few seconds and distribute it freely without a license, which is very important for business. Generative models are developing very quickly: back in 2018 even formulating such a task was hard to imagine, and in 2022 we have a working model that understands 101 languages and draws realistic images often indistinguishable from those created by people.
File:Aquote2.png

Presenting a model for generating images from text descriptions

On June 14, 2022, Sber presented Kandinsky, a model for generating images from text descriptions in Russian. It is an improved version of the ruDALL-E multimodal neural network, which generates pictures from Russian-language descriptions. It can be used to create any kind of images: illustrations, advertising materials, architectural and industrial design, and even digital art.

In November 2021, the ruDALL-E XL model was released, containing 1.3 billion parameters. Its parameters and code were publicly available, and an image generation service was developed. For six months, this service was used by 2 million unique users, who generated a total of 125 million images. Also in November, an exclusive ruDALL-E XXL model with 12 billion parameters was announced and published in the AI Services marketplace of the SberCloud ML Space platform in December.

In 2022, the Sber AI and SberDevices teams significantly improved the quality of this model by further training it on 179 million text-image examples using the SberCloud ML Space platform and the Christofari Neo supercomputer. The further-trained Kandinsky model can generate images with an arbitrary aspect ratio and can also use a special diffusion-based method to increase the resolution of generated images with a 1:1 aspect ratio (in addition to the standard approach using Real-ESRGAN). The model is now much better at creating realistic images and at faithfully rendering various textures, shadows, and reflections.

Image creation with the Kandinsky model takes place in three stages. First, one neural network (Kandinsky itself) generates a given number of images from the text description. Then a second (ruCLIP Large) selects the most successful pictures that best match the given description, and a third upscales them (both a diffusion model and the generative-adversarial Real-ESRGAN model are available). The result is a set of high-quality generated images. A distinctive advantage of the Kandinsky model compared to previous versions is the higher degree of detail in the created images.
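The three-stage pipeline amounts to generate, rank, then upscale. The scoring and upscaling functions below are illustrative stand-ins for Kandinsky, ruCLIP Large, and the upscalers, not their real interfaces:

```python
# Illustrative sketch of the three-stage pipeline described above:
# 1) generate candidate images, 2) rank them against the prompt (the role
# ruCLIP Large plays), 3) upscale the best ones. All stubs are assumptions.

def generate_candidates(prompt, n=8):
    # Stage 1 stand-in: n candidate images for the prompt.
    return [{"id": i, "prompt": prompt} for i in range(n)]

def clip_score(prompt, image):
    # Stage 2 stand-in: deterministic toy relevance score.
    return (image["id"] * 7) % 10

def upscale(image, method="diffusion"):
    # Stage 3 stand-in: either the diffusion upscaler or Real-ESRGAN.
    assert method in ("diffusion", "real-esrgan")
    return {**image, "upscaled_with": method}

def kandinsky_pipeline(prompt, n_candidates=8, top_k=3):
    candidates = generate_candidates(prompt, n_candidates)
    ranked = sorted(candidates, key=lambda im: clip_score(prompt, im),
                    reverse=True)
    return [upscale(im) for im in ranked[:top_k]]

results = kandinsky_pipeline("a lighthouse in a storm")
print(len(results))  # the 3 best-matching candidates, upscaled
```

Generating several candidates and keeping only the best-scoring ones is a cheap way to trade compute for quality: the ranking model filters out failed generations before the expensive upscaling step runs.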

The model is available in the Salute mobile application and on Sber smart devices via the request "Turn on the artist."

Template:Quote 'author=said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.