Sberbank Kandinsky Video Neural network for generating full-fledged video

Product
The name of the base system (platform): Sberbank Kandinsky Neural network for generating images by description
Developers: Sberbank
Date of the premiere of the system: 2023/11/22
Last Release Date: 2024/12/12
Technology: Big Data

2024

Kandinsky 4.0 Video

On December 12, 2024, Sberbank presented a beta version of the Kandinsky 4.0 Video neural network, which creates realistic videos from a text description or a starting frame. Ordinary users can use it to make animated video greetings for loved ones, while designers, marketers and animators can use Kandinsky as an assistant for generating trailers and clips.

"In the year since the first version of the Kandinsky Video model was released at AI Journey 2023, our team has significantly improved the quality and speed of generating full-fledged videos, opening up unlimited horizons for creativity as well as for product applications of the model. Now every user of the updated version of Kandinsky Video can bring their ideas to life and express them in video format. We are always excited to see how our technology helps people realize their boldest creative ideas. At the same time, the moment is drawing closer when artificial intelligence will be able to solve many tasks at once, across a variety of data types and in different domains. Models such as Kandinsky Video contribute to global progress in this important direction, bringing modern technologies significantly closer to the synergistic level of processing, perceiving and creating information that humans have," said Andrey Belevtsev, Senior Vice President and Head of the Technological Development Unit of Sberbank.

The model now generates videos up to 12 seconds long in HD resolution (1280x720) from any text description or an arbitrary starting frame. It can produce videos with different aspect ratios to suit various user and product needs.

The model's most important distinguishing features are improved visual quality (high contrast and sharpness of frames), better overall scene composition, and more realistic movement of the generated objects. This quality was achieved through the collaboration of scientific and engineering teams, who worked together both on the architecture of the new model and on collecting and filtering the training data.

In addition to the main model, the Kandinsky team introduced a fast version, Kandinsky 4.0 Video Flash, which generates a video up to 12 seconds long in 480p resolution (720x480) from any text description in just 15 seconds.

Kandinsky 4.0 Video is an ensemble of models, the main part of which is a diffusion transformer with 5 billion parameters. The engineers of the Kandinsky team used advanced algorithms and methods for optimizing the training of large models, which made it possible to train a model of this size efficiently on huge video datasets. The model was developed and trained by Sber AI researchers, with partner support from scientists at the AIRI Institute, on the combined Sber dataset.
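To give a sense of scale for a 5-billion-parameter model, the following minimal sketch estimates how much memory the weights alone would occupy. The numeric precisions are assumptions for illustration: the source does not say which data type the model uses, and the figure covers only the main transformer, not the rest of the ensemble or any runtime buffers.

```python
# Bytes per parameter for common numeric formats (assumed, not from the source).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_gib(param_count: int, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given precision."""
    return param_count * BYTES_PER_PARAM[dtype] / 2**30


PARAMS = 5_000_000_000  # "5 billion parameters" as stated in the article

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weights_gib(PARAMS, dtype):.1f} GiB")
```

Under these assumptions, halving the precision halves the memory needed for the weights, which is one reason large models are often stored in 16-bit formats.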

Representatives of the creative industries (artists, designers and filmmakers) will be the first to get access to the updated version of Kandinsky Video. The neural network will become available to a wide audience in the first quarter of 2025.

Production of the first AI ballet

In July 2024, the premiere of the first ballet in Russia created using artificial intelligence (AI) technologies took place in Yuzhno-Sakhalinsk. The production "Insight," which tells the love story of a family of engineers who set off for the construction project of the century, is a unique project at the intersection of art and modern technology.

According to Kommersant, Sberbank's AI technologies were used comprehensively in creating the performance. The GigaChat neural network helped refine the script and choreography, Kandinsky generated sketches of scenery and costumes, and SymFormer created original musical parts in the style of modern classical music.

The author of the idea and director was Honored Artist of Russia Kirill Ermolenko. He noted that the decision to unlock the potential of AI technologies in creative work was made together with the team, and expressed confidence that the project would start a new trend in art thanks to the support of Sberbank and its unique specialists.

The production featured artists of the Mikhailovsky Opera and Ballet Theater from St. Petersburg and the Dialogue Dance Theater of the Sakhalin Philharmonic, who performed together on the same stage for the first time. The composer of the performance was Ruslan Sabirov, the choreographer was Ivan Zaytsev, and the production designer was Maria Semakova.

The premiere of the AI ballet took place as part of the AI track of the project-based educational intensive "Archipelago-2024." The project is an important part of the technological transformation of the Sakhalin Region, launched by Sberbank and the region in 2023. As part of this transformation, the plan is to concentrate AI technologies in the region, allocate sites for testing solutions, and address all the factors in the development of artificial intelligence, including infrastructure, regulation and personnel.

"The synergy of the creativity of people and neural networks will give viewers the opportunity to get real pleasure from music and dance," said Andrey Neznamov, head of the Center for Human-Centered AI of Sberbank.[1]

2023: Presentation of the first generative model in Russia for creating videos by text

Sber presented the Kandinsky Video neural network, the first generative model in Russia for creating full-fledged videos from a text description. Sberbank representatives told TAdviser about this on November 22, 2023. According to Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank, the model generates a video sequence lasting up to eight seconds at 30 frames per second.

The Kandinsky Video architecture consists of two key blocks: the first creates the keyframes that make up the plot structure of the video, and the second generates the interpolated frames that make movement in the final video smooth. Both blocks are built on Kandinsky 3.0, an updated model for synthesizing images from text descriptions.
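The two-stage design described above (one stage produces keyframes, another fills in the frames between them) can be illustrated with a toy pipeline. This is only a structural sketch under stated assumptions: the function names are hypothetical, frames are reduced to short lists of numbers, and simple linear blending stands in for the learned interpolation model, which in the real system is a neural network rather than a fixed formula.

```python
from typing import List

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


def generate_keyframes(prompt: str, count: int, pixels: int = 4) -> List[Frame]:
    """Hypothetical stage 1: return `count` keyframes for a prompt.

    A real model would run text-conditioned generation here; this stub
    derives deterministic toy values so the pipeline runs end to end.
    """
    seed = sum(ord(ch) for ch in prompt) % 10
    return [[float(seed + k)] * pixels for k in range(count)]


def interpolate(a: Frame, b: Frame, steps: int) -> List[Frame]:
    """Hypothetical stage 2: return `steps` frames strictly between a and b.

    Linear blending is an assumption used in place of a learned model.
    """
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        frames.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return frames


def generate_video(prompt: str, keyframe_count: int, steps_between: int) -> List[Frame]:
    """Interleave keyframes with interpolated frames into one sequence."""
    if keyframe_count < 1:
        raise ValueError("at least one keyframe is required")
    keyframes = generate_keyframes(prompt, keyframe_count)
    video: List[Frame] = []
    for current, following in zip(keyframes, keyframes[1:]):
        video.append(current)
        video.extend(interpolate(current, following, steps_between))
    video.append(keyframes[-1])
    return video


video = generate_video("cat", keyframe_count=3, steps_between=2)
print(len(video))  # K + (K - 1) * S frames in total
```

With K keyframes and S interpolated frames per gap, the sequence has K + (K - 1) * S frames, so the keyframe stage fixes the storyline while the interpolation stage controls how smooth the motion appears.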

The generated video is a continuous scene in which both the object and the background move. This distinguishes videos synthesized by the Kandinsky Video model from animated videos, in which motion is achieved by simulating a camera moving over a relatively static scene. The neural network creates videos with a resolution of 512 x 512 pixels and with different aspect ratios. The model was trained on a dataset of more than 300 thousand text-video pairs. Video generation takes up to three minutes.
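The figures quoted for the 2023 model (up to eight seconds, 30 frames per second, 512 x 512 pixels) imply a concrete amount of data per clip. The sketch below works through that arithmetic. The assumption of uncompressed 8-bit RGB frames is made only for illustration; the source does not describe how frames are stored or encoded.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames in a clip of the given duration and frame rate."""
    return round(duration_s * fps)


def raw_rgb_bytes(width: int, height: int, frames: int, channels: int = 3) -> int:
    """Size of the clip as uncompressed 8-bit frames (one byte per channel)."""
    return width * height * channels * frames


frames = frame_count(8, 30)               # maximum duration at 30 fps
size = raw_rgb_bytes(512, 512, frames)    # assumed uncompressed RGB

print(frames)           # 240
print(size / 2**20)     # 180.0 (MiB)
```

So a maximum-length clip corresponds to 240 generated frames, or about 180 MiB of raw pixel data before any video compression is applied.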

"We recently trained Kandinsky to create animated videos from a text description, and today we are introducing a model of a completely different level: the first model in Russia to generate full-fledged videos from text. This is an important contribution to the development of Russian generative neural networks. Users will have even more opportunities for creativity and for realizing their creative ideas of any kind," said Alexander Vedyakhin, First Deputy Chairman of the Management Board of Sberbank.

He added that people will be able to create unique videos completely free of charge, and that the model itself will be available as open source.

Earlier, active users of Kandinsky 2.2 gained the ability to create animated videos in test mode. A single request produces a four-second video with the selected animation effect, at 24 frames per second and a resolution of 640 x 640 pixels. Users of the Kandinsky 3.0 neural network can also create videos from a text description in animation mode. The Kandinsky Video neural network can also be tried out through a Telegram bot[2].

The neural network was developed and trained by Sber AI researchers, with partner support from scientists at the AIRI Institute of Artificial Intelligence, on the combined dataset of Sber AI and SberDevices.

Notes

  1. The premiere of the first AI ballet in Russia took place on Sakhalin
  2. You can evaluate the capabilities of the Kandinsky Video neural network on the fusionbrain.ai platform and in the Telegram bot video_kandinsky_bot, where you can leave an access request