Sberbank Kandinsky Video Neural network for generating full-fledged video

Product

The name of the base system (platform):	Sberbank Kandinsky Neural network for generating images by description
Developers:	Sberbank
Date of the premiere of the system:	2023/11/22
Last Release Date:	2025/06/25
Technology:	Big Data

Content

2025
2024
- Kandinsky 4.0 Video
- Production of the first AI ballet
2023: Presentation of the first generative model in Russia for creating videos by text
Notes

2025

Launch in the Evolution Notebooks service

Cloud.ru has launched the open video generation model Kandinsky Video Lite in the public cloud. The model is available for free as part of the public preview of the Evolution Notebooks service. Cloud.ru announced this on October 16, 2025.

Open Developer Access

Businesses and developers have gained open access to Giga-Embeddings, an updated model for creating vector representations of text, and to the video generation model Kandinsky Video Lite. Both models are distributed under an open license that allows free use in commercial projects of any scale. This was announced on September 30, 2025 by Sberbank's Senior Vice President and Head of the Technological Development Unit.

"Sber's scientific team actively publishes a variety of generative artificial intelligence models. All models of the Kandinsky line and the GigaAM family of acoustic models for the Russian language, which work 'under the hood' of our GigaChat service, are publicly available. Providing businesses and developers with powerful tools such as Kandinsky Video Lite and Giga-Embeddings helps accelerate research and develop world-class innovative products and services. This demonstrates our desire to make a tangible contribution to the development of the international open source community. It is also an important stage in shaping standards in natural language processing (NLP) and in strengthening Russia's position as a technological leader on the world stage," said Andrey Belevtsev, Senior Vice President, Head of the Technological Development Unit of Sberbank.

Kandinsky Video Lite creates short videos of up to 10 seconds from a text request (prompt). The model contains only two billion parameters. Even so, in internal tests Kandinsky Video Lite surpasses much larger models such as Wan 2.1 14B, Wan 2.2 5B and the original Sora in overall quality (which combines scores for prompt adherence, visual quality and motion dynamics), and is comparable in visual quality to Wan 2.2 A14B, which is 13-14 times larger than Kandinsky Video Lite. Particular attention was paid to understanding the domestic cultural code when training Kandinsky Video Lite.

The models will find application among researchers, developers and members of creative professions. High-quality video creation is now available to everyone, regardless of technical resources or project budget. Giga-Embeddings, an updated model that converts text into vector representations, has also become available to developers and businesses as open source.

Giga-Embeddings powers RAG (Retrieval-Augmented Generation) systems, which improve the reliability and accuracy of artificial intelligence responses. The corporate sector thus receives a tool for improving document search, data analytics and automated user support based on up-to-date information. Developers using the model will be able to quickly build smart assistants and chatbots that process corporate data efficiently with a lower risk of false answers.
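As a rough illustration of the retrieval step in a RAG system, the sketch below ranks documents by cosine similarity between embedding vectors. The vectors are hand-written toy values, not outputs of Giga-Embeddings, and the names `cosine_similarity`, `retrieve`, `DOC_VECS` and `QUERY_VEC` are hypothetical; in a real system the vectors would come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
DOC_VECS = {
    "vacation_policy": [0.9, 0.1, 0.0],
    "expense_report": [0.1, 0.9, 0.1],
    "office_map": [0.0, 0.2, 0.9],
}
QUERY_VEC = [0.8, 0.2, 0.1]

TOP_DOCS = retrieve(QUERY_VEC, DOC_VECS)
print(TOP_DOCS)  # ['vacation_policy', 'expense_report']
```

The retrieved documents would then be passed to the generative model as context, which is how retrieval grounds the answer in up-to-date information.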

The models are already available for free use on leading platforms.

Kandinsky 4.1 Video Announcement

Sberbank is actively developing its Kandinsky neural network and will soon release an updated video generation version, Kandinsky 4.1 Video. This was announced on June 25, 2025 by Andrey Belevtsev, Senior Vice President, Head of the Technological Development Unit of Sberbank.

The Kandinsky 4.1 Video model generates a video sequence of up to 10 seconds in SD (720x576) or HD (1280x720) resolution from any text description or an arbitrary start frame. The model can produce high-quality videos with an arbitrary aspect ratio to suit any user or product need.

The model is based on an advanced diffusion transformer architecture. A key factor in the quality improvement was supervised fine-tuning (SFT) on carefully selected data prepared by more than 100 experts: designers, photographers and artists with specialized education. This training stage significantly increased the artistic expressiveness, composition and cinematography of the generated video.

The transition to a larger architecture significantly increased the demand for computing resources, so the developers paid special attention to optimization. Distillation and acceleration methods reduced video generation time by more than three times compared to the original version, while in a number of scenarios the generation quality was preserved or even improved.

2024

Kandinsky 4.0 Video

On December 12, 2024, Sberbank presented a beta version of the Kandinsky 4.0 Video neural network for creating realistic videos from a text description or a start frame. Ordinary users can use the neural network to create animated video greetings for loved ones, while designers, marketers and animators can use Kandinsky as an assistant for generating trailers and clips.

"In the year since the release of the first version of the Kandinsky Video model at AI Journey 2023, our team has significantly improved the quality and speed of generating full-fledged videos, opening unlimited horizons for creativity as well as for product applications of the model. Now every user of the updated version of Kandinsky Video can bring their ideas to life and express them in video format. We are always excited to see how our technology helps people realize their wildest creative ideas. At the same time, the moment is drawing closer when artificial intelligence will be able to solve many tasks at once, across a variety of data types and domains. Models such as Kandinsky Video contribute to global progress in this important direction, bringing modern technologies significantly closer to the synergistic level of processing, perceiving and creating information that humans have," said Andrey Belevtsev, Senior Vice President, Head of the Technological Development Unit of Sberbank.

The model now generates a video sequence of up to 12 seconds in HD resolution (1280x720) from any text description or an arbitrary start frame. It can create videos with different aspect ratios to suit any user or product need.

The model's most important distinguishing features are its improved visual quality: high contrast and sharp frames, the overall composition of the scene, and realistic movement of the generated objects. This quality was achieved through the collaboration of scientific and engineering teams, who worked together both on the architecture of the new model and on collecting and filtering the training data.

In addition to the main model, the Kandinsky team introduced a fast version, Kandinsky 4.0 Video Flash, which generates a video sequence of up to 12 seconds in 480p (720x480) resolution from any text description in just 15 seconds.

Kandinsky 4.0 Video is an ensemble of models, the main part of which is a diffusion transformer with 5 billion parameters. The Kandinsky team's engineers used advanced algorithms and optimization techniques for training large models, which made it possible to train a model of this size efficiently on very large video datasets. The model was developed and trained by Sber AI researchers with partner support from scientists at the AIRI Institute on a combined Sber dataset.

Representatives of the creative industries - artists, designers and filmmakers - will be the first to get access to the updated version of Kandinsky Video. The neural network will become available to a wide audience in the first quarter of 2025.

Production of the first AI ballet

In July 2024, the first ballet in Russia created using artificial intelligence (AI) technology premiered in Yuzhno-Sakhalinsk. The production, "Insight," tells the love story of a family of engineers who went to work on the construction project of the century, and has become a unique project at the intersection of art and modern technology.

According to Kommersant, Sberbank's AI technologies were used throughout the creation of the performance. The GigaChat neural network helped refine the script and choreography, Kandinsky generated sketches of the scenery and costumes, and SymFormer created original musical parts in the style of modern classical music.

The idea came from Honored Artist of Russia Kirill Ermolenko, who also directed the production. He noted that the decision to unlock the potential of AI technologies in creative work was made together with the team, and expressed confidence that the support of Sberbank and unique specialists would help create a new trend in art.

The production featured artists of the Mikhailovsky Opera and Ballet Theater from St. Petersburg and the Dialogue Dance Theater of the Sakhalin Philharmonic, who performed together on the same stage for the first time. The composer was Ruslan Sabirov, the choreographer was Ivan Zaytsev, and the production designer was Maria Semakova.

The AI ballet premiered as part of the AI track of the project and educational intensive "Archipelago-2024." The project is an important part of the technological transformation of the Sakhalin Region, launched by Sberbank and the region in 2023. The transformation plan is to concentrate AI technologies in the region, allocate sites for testing solutions and develop every factor that drives artificial intelligence, including infrastructure, regulation and personnel.

"The synergy of the creativity of people and neural networks will give viewers the opportunity to get real pleasure from music and dance," said Andrey Neznamov, head of the Center for Human-Centered AI of Sberbank.^[1]

2023: Presentation of the first generative model in Russia for creating videos by text

Sber presented the Kandinsky Video neural network, the first generative model in Russia for creating full-fledged videos from a text description. Representatives of Sberbank announced this to TAdviser on November 22, 2023. According to Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank, the model generates a video sequence of up to eight seconds at 30 frames per second.
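For a sense of scale, the stated duration and frame rate can be turned into a frame budget. The sketch below is a back-of-the-envelope calculation only: the 512 x 512 resolution is the figure this article gives for the model's output, while the 3 bytes per pixel (uncompressed 8-bit RGB) is an assumption made here, and the function names are hypothetical.

```python
def frame_count(duration_s, fps):
    """Number of frames in a clip of the given duration and frame rate."""
    return duration_s * fps


def raw_size_bytes(frames, width, height, bytes_per_pixel=3):
    """Uncompressed size; 3 bytes per pixel assumes 8-bit RGB (an assumption)."""
    return frames * width * height * bytes_per_pixel


FRAMES = frame_count(duration_s=8, fps=30)
RAW_BYTES = raw_size_bytes(FRAMES, width=512, height=512)

print(FRAMES)             # 240
print(RAW_BYTES / 2**20)  # 180.0 (MiB, before any video compression)
```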

The Kandinsky Video architecture consists of two key blocks: the first creates the keyframes that make up the plot structure of the video, and the second generates the interpolation frames that make movement in the final video smooth. Both blocks are built on Kandinsky 3.0, an updated model for synthesizing images from text descriptions.
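The two-block layout can be sketched as a toy pipeline. This is not Sber's implementation: frames are plain lists of numbers, `generate_keyframes` returns dummy data in place of the keyframe model, and `interpolate` uses simple linear blending as a stand-in for the learned interpolation model. All function names are hypothetical.

```python
def generate_keyframes(num_keyframes, frame_size=4):
    """Stand-in for the keyframe block: returns dummy frames (lists of floats)."""
    return [[float(i)] * frame_size for i in range(num_keyframes)]


def interpolate(frame_a, frame_b, steps):
    """Stand-in for the interpolation block: linear blend between two frames.

    Returns `steps` in-between frames, excluding both endpoints.
    """
    frames = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)
        frames.append([(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)])
    return frames


def build_video(keyframes, steps_between):
    """Insert interpolated frames between each pair of neighbouring keyframes."""
    if not keyframes:
        return []
    video = []
    for current, nxt in zip(keyframes, keyframes[1:]):
        video.append(current)
        video.extend(interpolate(current, nxt, steps_between))
    video.append(keyframes[-1])
    return video


KEYFRAMES = generate_keyframes(num_keyframes=3)
VIDEO = build_video(KEYFRAMES, steps_between=3)

print(len(VIDEO))   # 9: 3 keyframes + 2 gaps * 3 in-between frames
print(VIDEO[2][0])  # 0.5: midpoint between keyframe 0 and keyframe 1
```

Splitting the work this way lets the first stage fix the content of the scene at sparse points in time, while the second stage only has to fill the gaps between neighbouring keyframes.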

The generated video is a continuous scene in which both the object and the background move. This distinguishes videos synthesized by the Kandinsky Video model from animated videos, in which the dynamics come from simulating a camera fly-through of a relatively static scene. The neural network creates videos at a resolution of 512 x 512 pixels and with various aspect ratios. The model was trained on a dataset of more than 300 thousand text-video pairs. Video generation takes up to three minutes.

"We recently taught Kandinsky to create animated videos from a text description, and today we are introducing a model of a completely different level - the first model in Russia to generate full-fledged videos from text. This is an important contribution to the development of Russian generative neural networks. Users will have even more opportunities for creativity and for realizing creative ideas of any kind," said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.

He added that people will be able to create unique videos absolutely free of charge, and that the model itself will be available as open source.

Earlier, active users of Kandinsky 2.2 gained the ability, in test mode, to create animated videos. A single request produces a four-second video with the selected animation effect, at 24 frames per second and a resolution of 640 x 640 pixels. Users of the Kandinsky 3.0 neural network can also create videos from a text description in animation mode, including via the Telegram bot^[2].

The neural network was developed and trained by Sber AI researchers with partner support from scientists at the AIRI Institute of Artificial Intelligence on a combined dataset from Sber AI and SberDevices.

Notes

↑ [1] The premiere of the first AI ballet in Russia took place on Sakhalin
↑ [2] You can evaluate the capabilities of the Kandinsky Video neural network on the fusionbrain.ai platform and in the Telegram bot video_kandinsky_bot, where you can leave an access request

Source: https://tadviser.com/index.php/Product:Sberbank_Kandinsky_Video_Neural_network_for_generating_full-fledged_video
