Customer: RTS tender
Product: Apache Hadoop
Second product: Apache Kafka
Project date: 2017/09
RTS tender, one of Russia's largest electronic procurement platforms, has built a Data Lake, a repository for storing data, the company told TAdviser in April 2018. It is based on open technologies such as Hadoop, Apache Spark, Kafka and Hive, among others.
The Data Lake is populated with data on RTS tender's clients, meaning the procurement customers and suppliers accredited on the platform. It also holds data taken from open sources, in particular from the Unified Information System (EIS) for public procurement. This includes information from tender documentation, client activity on the platform, the tenders clients take part in, and more. As of April 2018, the Data Lake held about 200 TB of data.
The company uses this data for business intelligence and OLAP, reporting and dashboards, including in its mobile application.
"We use the data primarily to optimize processes and to develop solutions that make it easier for users to work on the platform. For example, our relevant-offers service is built on analyzing behavioral user profiles. It collects and analyzes data on a specific user and ultimately uses machine learning algorithms to suggest tenders that suit that user. This is a personalized approach that builds user loyalty and lets us monetize new business solutions," Vladimir Grigorenko, Chief Digital Officer of RTS tender, told TAdviser.
Previously, RTS tender relied on classical relational DBMSs and a Data Warehouse for data analysis and processing. As of April 2018, new machine-learning-based services use data stored in the Data Warehouse, but the algorithms that process this data already run on cloud systems. Once enough data has been transferred to the Data Lake, the processing algorithms will move there as well. The company expects to complete the transition to the Data Lake within six months to a year.
Vladimir Grigorenko points to several advantages of the Data Lake methodology over the tools used previously. First, a Data Lake stores raw data. This makes it possible to keep all user data without distortion and without spending time converting it. Second, analysis and training algorithms can access exactly the data they need for a given task, without supplementing or reorganizing the entire data set, the RTS tender representative explains.
"The Data Lake methodology is implemented using open standards and open-source platforms. This makes it simple to implement without large financial costs," Grigorenko says, citing another advantage.
In addition, the ecosystem that has grown up around the Data Lake methodology offers many tools for effectively solving a range of business problems, the Chief Digital Officer of RTS tender added. Platforms such as Hadoop and Spark are already widely used by large companies across many industries, and the machine learning libraries developed for these platforms make it possible to solve complex problems quickly and effectively.
RTS tender's core IT infrastructure is built on Microsoft technologies. The company says it considered various options for building the data lake but chose open-source solutions as the most effective.