New data platform at X5 Group: how the company moved off its SAP data warehouse and ditched Tableau and SAS
Customers: X5 Group
Contractors: X5 Group
Product: ADB - Arenadata DB
Second product: ClickHouse - database management system (DBMS)
Project date: 2022/03
Speaking at the TAdviser SummIT conference on November 29, 2023, X5 Group data management director Tigran Sarkisov described how Russia's largest retailer quickly abandoned its SAP HANA-based data warehouse and a number of imported BI tools, including Tableau.
At the beginning of 2022, X5 effectively had two data warehouses: SAP BW on HANA and a target platform consisting of a Greenplum cluster (the Arenadata DB build), a Hadoop cluster (the data lake) and various BI analytics tools. X5 had previously said that at that time almost half of the BI analytical load still ran on the legacy SAP BW warehouse, which had been built up over many years and contained reports and data assemblies the business had grown used to. BW also handled an important task: preparing reports and closing the financial period.
The platform also included Data Governance tools: the IBM Cloud Pak for Data catalog, and Ataccama for data quality.
Tigran Sarkisov explained why there were two DWHs. Most companies start out with some kind of legacy system, such as Teradata, Exadata or others. Dismantling it is quite expensive and does not always bring much benefit. That is why X5 kept part of its storage in the cloud.
As of February 2022, the company's data management platform was as follows:
X5's installation was one of the most heavily loaded SAP BW systems in Europe. It ran in SAP HEC (HANA Enterprise Cloud), hosted in the German vendor's Russian data center. After the events of February 2022, however, X5 was warned that the data center would soon be shut down and its servers moved to a European data center. SAP gave X5 only three months to move its data out, so this part of the platform had to be dismantled quickly.
The company needed a technology comparable to HANA, one able to store data and handle heavy query loads from a large number of users. ClickHouse became that technology and was added to the architecture. The SAS-based ETL layer is now being migrated to dbt, and Tableau is no longer used: the company switched to Qlik instead. For the data catalog, X5 moved from IBM to OpenMetadata, which works quite well.
The project took about nine months. The data management platform currently looks like this:
The migration was successful, Tigran Sarkisov noted. At first there were concerns that, for example, Greenplum would not cope with the load or that there would not be enough time to train users, but these problems proved surmountable. ClickHouse and Greenplum now work together, sharing the workload.
X5 is now building its own private cloud, says Tigran Sarkisov. The project is currently at the proof-of-concept stage for S3 storage, and the company soon plans to move on to pilot projects migrating master data from Hadoop and Greenplum to S3. X5 expects to have a working use case as early as next year.
About TAdviser SummIT
The TAdviser SummIT conference, held on November 29, was one of the largest in the event's history: more than 1,400 delegates attended, more than 40 stands were set up in the exhibition area, and more than 150 speakers presented in the plenary session and eight thematic sessions. The conference was attended by Maksut Shadayev, Minister of Digital Development, Communications and Mass Media, who answered pointed questions from IT industry participants in an on-stage interview.