Project

Magnit will create a new import-substituted data warehouse for Dixy amid the active growth of the store chain

Customers: Dixy

Moscow; Trade


Second product: Apache Spark
Third product: DSS Projects

Project date: 2025/09

As TAdviser has learned, a corporate data warehouse (DWH) 2.0 will be created for the Dixy store chain. This follows from a request for proposals published on a tender site in August 2025. The request was placed by the retailer Magnit, which owns Dixy.

An enterprise data warehouse is a centralized information system that stores, integrates, and consolidates data from multiple sources to support management decisions. It serves as a single, reliable source of data for business analysis and reporting.

According to the technical specification published for the tender, at the time of the request for proposals Dixy's existing data platform runs on-premises on Microsoft SQL Server, PowerShell and SQL Server Agent. The plan is to create a new "fault-tolerant, import-independent, scalable solution that allows you to solve problems related to loading data, its processing and use, as well as data quality control."

The new platform should run on infrastructure located either in the cloud, preferably from Yandex or Sberbank, or on-premises, using open source software and/or solutions included in the register of Russian software, the terms of reference specify.

The system should be implemented using a lakehouse architecture - a hybrid architecture that combines the capabilities of a data lake and a DWH (for example, Apache Iceberg + Apache Spark). The contractor is also expected to develop, commission and document frameworks for loading data into the DWH and for exporting it to auxiliary databases used to build cubes in SQL Server Analysis Services, if such databases exist.
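To make the Apache Iceberg + Apache Spark pairing more concrete, here is a minimal sketch of the Spark settings that register an Iceberg catalog. It is an illustration only and is not taken from the terms of reference: the catalog name `lake`, the warehouse path `/tmp/warehouse`, the application name and the helper functions are hypothetical, and a file-based (`hadoop`) catalog is assumed purely for simplicity.

```python
def iceberg_spark_conf(catalog, warehouse):
    """Return Spark settings that register an Iceberg catalog.

    `catalog` and `warehouse` are illustrative values chosen by the caller.
    """
    return {
        # Enables Iceberg-specific SQL extensions in Spark.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers a Spark catalog backed by Iceberg under the given name.
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        # Assumption: a simple file-based catalog; a production setup may differ.
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def build_session(conf):
    """Create a SparkSession from the settings above.

    Requires pyspark and the Iceberg Spark runtime JAR on the classpath.
    """
    from pyspark.sql import SparkSession  # lazy import: the sketch loads without pyspark

    builder = SparkSession.builder.appName("dwh-sketch")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = iceberg_spark_conf("lake", "/tmp/warehouse")
```

With such a session, tables in the `lake` catalog could be created and queried through ordinary Spark SQL, which is what lets one engine serve both data lake and warehouse workloads.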

A possible implementation of DWH 2.0 for Dixy, as given in the terms of reference. The green area highlights the blocks whose components will change during the migration from the current solution

When choosing technologies, the key requirements for the system must be taken into account: support for batch and streaming data loading, automatic scaling mechanisms for peak loads, and tools for managing and partitioning resources among different user groups.

The storage subsystem should be able to host various types of data (structured, semi-structured, unstructured), with metadata organized so that information can be cataloged and quickly found in large data sets. For structured data, it must have query optimization tools for joining large tables - more than 100,000,000 records - without measurable performance degradation.

According to the terms of reference, the current volume of raw data at Dixy is 120 TB, and it is constantly growing: the company expects the volume of raw data to reach up to 250 TB within three years. One of the main business drivers of data growth at Dixy is the opening of about 300-500 new outlets in 2025. At the time of the tender, the chain had about 2.4 thousand outlets.
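For scale, the figures above imply a rough annual growth rate. The sketch below assumes steady compound growth from 120 TB to the 250 TB upper bound over exactly three years. That assumption is ours, not the terms of reference, and the function name is illustrative.

```python
def implied_annual_growth(start_tb, end_tb, years):
    """Compound annual growth rate that takes start_tb to end_tb in `years` years."""
    return (end_tb / start_tb) ** (1 / years) - 1


# Figures from the terms of reference: 120 TB now, up to 250 TB in three years.
rate = implied_annual_growth(120, 250, 3)
print(f"Implied annual growth: {rate:.1%}")  # prints "Implied annual growth: 27.7%"
```

Under this assumption the raw data volume would grow by a little over a quarter each year. Because 250 TB is stated as an upper bound, the actual rate may be lower.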

Other drivers include requests from departments, primarily e-commerce, to load data into the DWH; the building of objects in a sandbox, in particular by marketing; and plans to store semi-structured data in the DWH.

Dixy, we recall, is one of the largest "at home" (convenience-format) grocery retail chains in Russia. As of the end of 2024, Dixy numbered 2,363 stores, according to Magnit's annual report. After acquiring the chain in 2021, Magnit continued to develop the brand. In 2024, 344 stores were redesigned and 155 new stores were opened, including one dark store, and their operating model was improved.