Customers: Novolipetsk Metallurgical Combine, NLMK
Contractors: Jet Infosystems
Product: ADH - Arenadata Hadoop
Based on: Apache Hadoop
Second product: Apache Kafka
Third product: Apache Hive
Project date: 2018/08 - 2019/08
Number of licenses: 20
2019: Data Lake creation
Project scale:
- 7840 man-hours
- 20 automated jobs
Software used:
The solution is built on Arenadata Hadoop, a Russian-developed distribution of the distributed storage platform. Apache Kafka, Apache NiFi and Apache Hive were used to collect, transfer, transform and accumulate data.
NLMK defined the strategy, and the IT company's specialists then developed and implemented the technical solution on the Arenadata Hadoop platform. The implemented Data Lake-class platform collects data and supplies information on production and technology processes to machine learning models.
Specialists configured regular data loads into the data lake from 70 sources (sensors, as well as MES and APCS systems), loaded historical data covering the last several years of the enterprise's operation, and developed environment maps of the technology and production processes of individual workshops. The capacity of the resulting data lake is 300 terabytes.
As part of the project, the contractor's team developed a model of a unified data mart for NLMK, implemented metadata management with Apache Atlas (tagging, search and so on), and configured a centralized role model integrated with the Active Directory directory service. This allows data scientists to get access to the data they need in the Data Lake as quickly as possible.
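The idea of combining metadata tags with a centralized role model can be illustrated with a minimal sketch. It is not the project's actual configuration: the group names, tag names and the `can_read` rule below are all hypothetical, and in a real deployment the group membership would come from the directory service and the tags from the metadata catalog.

```python
# Hypothetical mapping from directory groups to the metadata tags they may read.
# The group and tag names are illustrative assumptions, not taken from the project.
GROUP_TAGS = {
    "dl-data-scientists": {"production", "technology"},
    "dl-admins": {"production", "technology", "restricted"},
}


def allowed_tags(user_groups):
    """Collect every tag permitted by at least one of the user's groups."""
    tags = set()
    for group in user_groups:
        tags |= GROUP_TAGS.get(group, set())
    return tags


def can_read(user_groups, dataset_tags):
    """Allow access only if the dataset is tagged and every tag is permitted.

    Untagged datasets are denied by default (an assumption of this sketch).
    """
    dataset_tags = set(dataset_tags)
    return bool(dataset_tags) and dataset_tags <= allowed_tags(user_groups)
```

With this rule, a member of the hypothetical `dl-data-scientists` group can read a dataset tagged `production` but not one that also carries the `restricted` tag.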
To monitor the Data Lake, comprehensive monitoring of the state of the system's services is configured in Zabbix, and checks of data integrity and completeness are automated. Backup is available for especially important and vulnerable data, so that data inadvertently destroyed by a user can be recovered.
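An automated completeness and integrity check of a loaded batch could look roughly like the sketch below. The source does not describe how the checks are implemented, so this is only one possible approach: it assumes the source system reports an expected row count and a SHA-256 checksum for each batch, and `BatchManifest`, `digest_rows` and `check_batch` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BatchManifest:
    """Hypothetical description of a batch as reported by the source system."""
    source: str
    expected_rows: int
    expected_sha256: str


def digest_rows(rows):
    """Order-sensitive SHA-256 over the rows, one row per line."""
    h = hashlib.sha256()
    for row in rows:
        h.update(row.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def check_batch(manifest, loaded_rows):
    """Return a list of problems; an empty list means the batch passed."""
    problems = []
    # Completeness: the number of loaded rows must match the manifest.
    if len(loaded_rows) != manifest.expected_rows:
        problems.append(
            f"{manifest.source}: completeness: expected "
            f"{manifest.expected_rows} rows, got {len(loaded_rows)}"
        )
    # Integrity: the content checksum must match the manifest.
    if digest_rows(loaded_rows) != manifest.expected_sha256:
        problems.append(f"{manifest.source}: integrity: checksum mismatch")
    return problems
```

A scheduler could run such a check after each load and pass any non-empty result to the monitoring system as an alert.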