
T-ECD (T-Tech E-commerce Cross-Domain Dataset)

Product
Developers: T-Technologies (formerly TKS Holding)
Release date: 2025/09/26
Industries: Internet Services, Trade
Technology: Big Data

Main articles: Big Data

2025: Publication of the T-ECD Dataset in Open Access

The Center for Artificial Intelligence of the T-Technologies group has released T-ECD (T-Tech E-commerce Cross-Domain Dataset), one of the world's largest datasets for recommender systems in e-commerce. T-ECD is built from the anonymized actions of 44 million unique users of the City: Shopping and Supermarkets services and of T-Bank's advertising platform. It covers 30 million products and more than 135 billion interactions, T-Technologies representatives reported on September 26, 2025.


According to the company, the dataset contains information on 44 million unique users, 30 million products and more than 135 billion interactions. The data spans 1 to 3.5 years, which makes it possible to analyze both short-term and long-term user preferences.

The distinctive features of T-ECD are its cross-domain nature and its versatility across different types of tasks. The benchmark consists of five interconnected and fully anonymized data sources: transaction purchase history, receipts, feedback, interactions with recommendations for consumer goods (FMCG) and non-food products, and the history of activations and use of special offers and cashback. Each source can be used as an independent dataset or linked to the others by user, product or store brand keys, which makes it possible to build complete behavior profiles and analyze complex personalization scenarios.

The dataset is suitable for most types of recommendation tasks: next-item recommendation, next-basket recommendation, session-based recommendation, general top-N recommendation and others.

The data spans 1 to 3.5 years, which makes it possible to analyze both short-term and long-term user preferences, how they change over time, and seasonality and trends. Data depth is extremely important for research: it allows correct time-based partitioning of the data for model training and can significantly improve recommendation quality when deep neural networks are used.
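The two ideas above, linking sources by a shared user key and partitioning the data by time, can be illustrated with a minimal sketch. The records, field names (`user_id`, `brand_id`, `day`) and helper functions below are hypothetical toy examples and do not reflect the actual T-ECD schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records; field names are illustrative, not the T-ECD schema.
purchases = [
    {"user_id": "u1", "brand_id": "b1", "day": date(2024, 1, 10)},
    {"user_id": "u1", "brand_id": "b2", "day": date(2024, 6, 2)},
    {"user_id": "u2", "brand_id": "b1", "day": date(2024, 7, 15)},
]
offers = [
    {"user_id": "u1", "brand_id": "b2", "day": date(2024, 5, 30)},
    {"user_id": "u2", "brand_id": "b3", "day": date(2024, 8, 1)},
]


def build_profiles(sources):
    """Group events from several domains under a shared user key."""
    profiles = defaultdict(list)
    for domain, events in sources.items():
        for event in events:
            profiles[event["user_id"]].append(
                (event["day"], domain, event["brand_id"])
            )
    for events in profiles.values():
        events.sort()  # chronological order within each user
    return dict(profiles)


def temporal_split(profiles, cutoff):
    """Put events before the cutoff in train and the rest in test."""
    train, test = {}, {}
    for user, events in profiles.items():
        train[user] = [e for e in events if e[0] < cutoff]
        test[user] = [e for e in events if e[0] >= cutoff]
    return train, test


profiles = build_profiles({"purchases": purchases, "offers": offers})
train, test = temporal_split(profiles, cutoff=date(2024, 6, 1))
```

Splitting on a single global cutoff date, rather than at random, keeps future interactions out of the training set. A long history is what makes such a split practical.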

The dataset contains information on user and product characteristics, as well as explicit and implicit user feedback. This makes it universal and opens it up to research across the full range of recommender system classes and types, from collaborative filtering to more complex context-aware and graph-based approaches that use deep learning. Academic datasets often include only meaningful user actions (clicks, purchases, likes and so on) and omit views, the so-called "impressions". For business, however, it is more important to know what the recommendation system actually showed users, because this reveals what users saw but did not react to. T-ECD not only records the views themselves but also specifies their source (search, catalog or recommendations), which makes it possible to assess the impact of recommendations on users or to simulate the exposure effect.
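As a rough illustration of why impressions with a source label matter, the sketch below computes a click-through rate per source and collects "seen but ignored" pairs that could serve as negative examples. The log, its field names (`source`, `clicked`) and both helper functions are hypothetical and are not taken from the T-ECD schema.

```python
from collections import Counter

# Hypothetical impression log; field names and values are illustrative only.
impressions = [
    {"user_id": "u1", "item_id": "i1", "source": "search", "clicked": True},
    {"user_id": "u1", "item_id": "i2", "source": "recommendations", "clicked": False},
    {"user_id": "u2", "item_id": "i2", "source": "recommendations", "clicked": True},
    {"user_id": "u2", "item_id": "i3", "source": "catalog", "clicked": False},
    {"user_id": "u3", "item_id": "i1", "source": "recommendations", "clicked": False},
]


def ctr_by_source(log):
    """Share of impressions that led to a click, per impression source."""
    shown = Counter()
    clicked = Counter()
    for row in log:
        shown[row["source"]] += 1
        if row["clicked"]:
            clicked[row["source"]] += 1
    return {source: clicked[source] / shown[source] for source in shown}


def seen_but_ignored(log):
    """(user, item) pairs that were shown but not clicked."""
    return [(r["user_id"], r["item_id"]) for r in log if not r["clicked"]]


rates = ctr_by_source(impressions)
negatives = seen_but_ignored(impressions)
```

Without impression data, the unclicked pairs cannot be told apart from items the user never saw. That is the gap the passage above describes.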

T-ECD addresses one of the community's main problems: most existing datasets for recommender systems are outdated and do not reflect current user behavior or how users interact with modern services and platforms.

With T-ECD, researchers and developers get a benchmark based on real user preferences and behavior patterns. This makes it possible to test various machine learning algorithms on data that is as close as possible to real production data, and increases confidence in experimental results.

"My team and I consider it important to contribute to open datasets and models that advance recommender systems. The T-ECD dataset can become one of the benchmarks and bring value to the ML community in optimizing personalization quality and the customer experience of real users,"
said Marina Ananyeva, Head of Recommendation Systems at T-Bank.

The T-ECD dataset is available on Hugging Face under the Apache 2.0 license, which permits free commercial use and modification.