| Developers: | VK (formerly Mail.ru Group) |
| System release date: | 2025/08/27 |
| Technology: | Big Data |
Main article: Big Data
2025: VK-LSVD Presentation
AI VK researchers have made the VK-LSVD (Large Short-Video Dataset) dataset publicly available. With its help, engineers and scientists will be able to develop and improve recommendation algorithms to make services and products more personalized. VK announced this on August 27, 2025.
The dataset includes 40 billion anonymized unique interactions between 10 million users and 20 million short videos over six months (January-June 2025), including aggregated likes, dislikes, shares, viewing duration and playback context.
All data is presented as numerical identifiers, which ensures complete confidentiality. An embedding (a numerical representation of the content) is provided for each video, and socio-demographic characteristics are provided for each user. This allows researchers to build models that rely on both behavioral data and content.
Short videos are a convenient format for recommendation algorithms. Unlike music, podcasts or long videos, they cannot be consumed in the background, so every video shown receives some reaction from the user. Even if the user does not leave a like, skipping a video or watching it to the end already counts as feedback.
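The idea that every view carries feedback can be illustrated with a minimal Python sketch. It is not VK's method: the function name, the field names and the 20% / 90% thresholds are assumptions chosen only for illustration.

```python
def implicit_feedback(watch_seconds, video_seconds, liked=False,
                      skip_ratio=0.2, complete_ratio=0.9):
    """Map one view to a coarse feedback label.

    All names and thresholds here are hypothetical; the article does not
    describe how VK converts viewing duration into a training signal.
    """
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    if liked:
        # An explicit reaction takes priority over the implicit signal.
        return "like"
    ratio = watch_seconds / video_seconds
    if ratio < skip_ratio:
        return "skip"        # the user swiped away almost immediately
    if ratio >= complete_ratio:
        return "completed"   # the user watched (nearly) to the end
    return "partial"
```

For example, `implicit_feedback(1, 20)` yields `"skip"`, while `implicit_feedback(19, 20)` yields `"completed"`. In practice such thresholds would be tuned on the data rather than fixed in advance.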
| As of August 2025, there are not many large open datasets on which models can be trained and evaluated. To build accurate recommendation algorithms, it is important to take into account not only explicit user reactions, but also additional signals: viewing duration, context, content. VK-LSVD is an important step towards forming a research environment in which hypotheses can be tested and accurate models can be built on real-world data. We plan to develop the dataset further, and very soon we will hold an open competition for engineers, said Dmitry Kondrashkin, AI director at VK. |
Instead of offering splits of fixed size, VK-LSVD lets researchers flexibly configure the sample for the tasks of a particular study. Engineers can set the required amount of data themselves and choose how it is selected: randomly or by popularity. This approach makes it possible to adapt the dataset to real problems and to the computing power a team has, and to use VK-LSVD both for academic projects and for large-scale industrial experiments.
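The two selection modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the dataset's actual tooling: the function `subsample_items` and the `(user_id, item_id)` pair layout are hypothetical, and only the standard library is used.

```python
import random
from collections import Counter


def subsample_items(interactions, n_items, strategy="random", seed=0):
    """Keep only the interactions that involve n_items selected videos.

    interactions: list of (user_id, item_id) pairs (hypothetical layout).
    strategy: "random" picks videos uniformly at random;
              "popular" picks the videos with the most interactions.
    """
    counts = Counter(item for _, item in interactions)
    if n_items > len(counts):
        raise ValueError("n_items exceeds the number of distinct videos")
    if strategy == "random":
        rng = random.Random(seed)  # fixed seed keeps the sample reproducible
        chosen = set(rng.sample(sorted(counts), n_items))
    elif strategy == "popular":
        chosen = {item for item, _ in counts.most_common(n_items)}
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [(user, item) for user, item in interactions if item in chosen]


# Toy log: video 10 has three interactions, 20 has two, 30 has one.
log = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (3, 30)]
popular = subsample_items(log, n_items=1, strategy="popular")
sampled = subsample_items(log, n_items=2, strategy="random", seed=42)
```

Here `popular` keeps only the three interactions with video 10, while `sampled` keeps the interactions of two randomly chosen videos. The same pattern could be applied to users instead of videos to shrink the data to the available compute.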

