2018/02/05 13:27:42

TAdviser Interview: Andrej Arshavsky, NLMK - how data analysis saves hundreds of millions of rubles

Andrej Arshavsky, Director of Mathematical Modeling and Data Analysis at Novolipetsk Metallurgical Combine (NLMK), discusses building models for steelmaking production and the economic effect of these technologies for the business.

Andrej Arshavsky: To solve a problem, it is not enough to simply operate with numbers. (Photo: NLMK)

What falls within the responsibilities of NLMK's Director of Mathematical Modeling and Big Data Analysis?

Andrej Arshavsky: My team implements artificial intelligence (AI) technologies to solve production problems. We use machine learning methods, advanced analytics, and Big Data tools to optimize production and business processes. The ultimate goal is to earn or save money for the company.

And where is that possible?

Andrej Arshavsky: If we talk about the production process, it is about improving equipment performance, saving raw materials, product quality, and equipment service and repair. Looking more broadly, artificial intelligence can help solve optimization problems in warehouse inventory, sales, procurement, and HR. AI technologies can also be used to create new products: steels and finished steel products. Thanks to data analysis and modeling, we will be able to understand exactly how the production process or technology must be changed to obtain particular properties of the metal.

Have you already found new methods for producing new products?

Andrej Arshavsky: Not yet, but we are working on it.

What does your directorate look like? Do you have a laboratory where you run experiments?

Andrej Arshavsky: There is a team. Not a big one yet. It works out what needs to be done and brings in contractors depending on the task.

Which contractors?

Andrej Arshavsky: Last year we worked with three contractors: Yandex Data Factory, which unfortunately has ceased to exist, and the companies AlgoMost and CeleraOne. We are now negotiating cooperation with new companies, but it is too early to name specific names. The internal team consists of data scientists, analysts, and project managers. We solve our tasks in close cooperation with other NLMK divisions. First of all, we actively cooperate with the operational efficiency division, directly with the shops, with quality control, with IT, and with the process control systems (PCS) group. Representatives of these divisions take part in nearly every project, while we are responsible for developing the models and algorithms and for overall coordination of the work.

How does mathematical modeling work in relation to a production process, for example, steel production?

Andrej Arshavsky: Steel at NLMK Group is melted in oxygen converters or electric arc furnaces. In converters, for example, liquid iron with additives is blown with oxygen, which removes carbon and other impurities from the melt. Ferroalloys are then added to the liquid steel to change its chemical composition. Ferroalloys are expensive materials, and we are interested in optimizing their consumption.

To reduce ferroalloy consumption, we train a model on historical data about the inputs (the chemical composition of the hot metal, the dosing of other charge components, production conditions including temperature, and the volume of ferroalloys added) and the result of processing (the final chemical composition of the steel). The model learns from these data and becomes able to answer the question "what will happen if". Using it, we can select the optimal proportion of ferroalloys to obtain the specified chemical composition of the steel.
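The loop described here - learn "what will happen if" from history, then invert the model to pick a dose - can be sketched in a few lines. This is a toy illustration with a single ferroalloy input and a linear response; the variable names, the linear form, and all numbers are invented for illustration and are not NLMK's actual model.

```python
# Toy "what will happen if" model for ferroalloy dosing (illustrative only).

def fit_linear(doses, mn_contents):
    """Least-squares fit: final Mn % as a linear function of FeMn dose (kg/t)."""
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(mn_contents) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, mn_contents))
    var = sum((x - mean_x) ** 2 for x in doses)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented historical heats: ferromanganese dose (kg/t) -> final Mn content (%)
history = [(2.0, 0.41), (3.0, 0.58), (4.0, 0.79), (5.0, 0.98), (6.0, 1.21)]
doses, mn = zip(*history)
slope, intercept = fit_linear(doses, mn)

def predict(dose):
    """Answer 'what will happen if we add this much ferroalloy'."""
    return slope * dose + intercept

# Invert the model: smallest dosing error against the target Mn specification.
target_mn = 0.90
optimal_dose = min(
    (d / 10 for d in range(20, 61)),   # candidate doses 2.0 .. 6.0 kg/t
    key=lambda d: abs(predict(d) - target_mn),
)
print(f"dose {optimal_dose:.1f} kg/t -> predicted Mn {predict(optimal_dose):.2f}%")
```

A real system would use many more input features (hot metal chemistry, temperature, scrap mix) and a nonlinear model, but the train-then-invert structure is the same.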

How do you train the model?

Andrej Arshavsky: There is a range of mathematical methods, used depending on the task. The choice depends on the amount of data and on its quality. Experiments are run to select the parameters at which the model gives its most accurate forecasts. As a rule, solving a problem requires building a set of mathematical models, each of which works best in particular operating regimes.

As a result, depending on the initial conditions, all of the models or some subset of them are used, and a system is born that predicts "what will happen if". All of this is packaged into software that is integrated with other systems and can either control production directly or recommend certain actions to the operator.
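The "set of models, each for its own regime" idea can be illustrated with a toy dispatcher that routes a query to whichever sub-model covers the observed conditions. The regimes, the 1600 C threshold, and the coefficients below are invented for illustration.

```python
# Illustrative model ensemble: one sub-model per operating regime.

def model_low_temp(dose):
    # Hypothetically fitted on heats below 1600 C
    return 0.18 * dose + 0.02

def model_high_temp(dose):
    # Hypothetically fitted on heats at 1600 C and above
    return 0.21 * dose - 0.01

def predict(dose, temperature_c):
    """Dispatch to the sub-model that works best in the observed regime."""
    model = model_low_temp if temperature_c < 1600 else model_high_temp
    return model(dose)

# "What will happen if" we add 4 kg/t at two different temperatures:
print(predict(4.0, 1580))
print(predict(4.0, 1630))
```

In a production system the dispatcher itself could be learned from data, but the packaging is the same: one `predict` entry point hiding several regime-specific models.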

How does the integration process happen?

Andrej Arshavsky: Integration work is performed either by the contractor or by ourselves.

In 2016 the average level of data utilization in the steel industry did not exceed 5% - that assessment was given by Oleg Bagrin, CEO of NLMK. Has this indicator moved anywhere as of January 2018?

Andrej Arshavsky: For our current projects we certainly use more data now than a year ago. And we are preparing the infrastructure so that we can use the full volume of data.

100% of the data?

Andrej Arshavsky: The volume that is actually needed to solve the tasks.

And what is needed?

Andrej Arshavsky: We do not have a goal of utilizing data for its own sake. We start from practical production optimization problems. Applying artificial intelligence methods does not always require all the data, or large volumes of it. We prioritize our tasks in terms of potential economic effect, and only then look at the availability, quantity, and quality of data. To train a model well on historical data, it is best to have that history over a long time frame. The greater the depth, volume, and quality of the data, the higher the probability that the model will learn well.

What data will NLMK collect, and how does it plan to use them?

Andrej Arshavsky: If we classify the data, we can start from the hierarchy of IT systems. At the bottom level are sensors, which collect information at the finest granularity and pass it up to the controller level and then to the process control systems (PCS).

These data are collected in databases over a limited time frame. The next level is the management level, the so-called MES systems, which use work-order and operations data. The highest levels are ERP systems such as SAP, which hold data on products and on what is in the warehouses. There are many classes of data.

At the higher levels of this hierarchy, data are, as a rule, collected well and stored to a fairly long time "depth", and they are generally small in volume. The data at the bottom level - from measurement systems and sensors - can be classified as high-volume data, because sensors are capable of recording information at a high rate.

At these levels the data were not always collected carefully: storing them was too expensive, or there was practically no demand for them. We are now correcting this situation. We have deployed a large cluster whose task is to accumulate and store data from the sensors, and we are launching a number of projects that will allow data from the PCS not to be thrown away but to be sent to us instead.

What is this cluster?

Andrej Arshavsky: We have already built the cluster and named it the "Data Analysis and Modeling System". It is built on open tools based on Hadoop and went into operation in December 2017. It consists of 10 servers united into a single system that can not only store data but also process them in parallel. It is thus a combined storage and data processing system.

At the moment it can store 144 terabytes of data and process up to 3 terabytes of data in RAM. That will be enough for us initially. About a third of the volume is already filled. The system architecture allows it to scale linearly: by buying additional standard blocks, we can expand the system.
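The linear-scaling claim reduces to simple arithmetic. The per-node figures below are derived from the numbers quoted in the interview (144 TB across 10 servers), not from NLMK specifications; the 5-node expansion is a made-up example.

```python
# Back-of-the-envelope check of the quoted cluster figures (illustrative).

servers = 10
total_storage_tb = 144
ram_processing_tb = 3

storage_per_server_tb = total_storage_tb / servers   # capacity per node
filled_tb = total_storage_tb / 3                     # "about a third is filled"

# Linear scaling: adding 5 more identical nodes grows capacity by 50%.
expanded_tb = storage_per_server_tb * (servers + 5)
print(storage_per_server_tb, round(filled_tb), expanded_tb)
```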

Can you give specific examples of the use of analytical tools?

Andrej Arshavsky: One of our projects, related to controlling ferroalloy consumption, has now reached the stage of industrial testing at the Lipetsk site, where the third stage of tests is underway. It is too early to talk about its results, but the expected economic effect of this project is savings of about 200 million rubles annually. The system is intended for two converter shops at the Lipetsk site.

We are implementing a similar project in the Urals, at the site in Revda. Tests of part of that project have already been completed and show savings of 5 million rubles a year. At Revda the development targets the electric arc furnace, in Lipetsk the converter. In the first case steel is melted from scrap, in the second from hot metal. The system gives the operator precise instructions on how much ferroalloy needs to be added. Previously the operator acted based on his own experience and on log entries about similar heats. And the main problem is precisely that a person, unlike a machine, is not capable of aiming exactly.

How are your projects budgeted?

Andrej Arshavsky: Project implementation is approved by the investment committee. The cost of implementing one typical project is not large, especially in comparison with the expected income. We conduct production research; ideas are born from that research, the ideas turn into projects, and the economic effect of each is assessed. Implementation costs are, as a rule, much lower than the expected economic effect - on average, about 10 times less than the annual economic effect.

How many projects do you have in progress now?

Andrej Arshavsky: We are currently working on seven projects. In the near future their number will grow to ten. How many projects we will add in 2018 has not been decided yet.

What are those ten projects?

Andrej Arshavsky: Two projects on reducing ferroalloy consumption; a project related to product quality; a project on optimizing the operation of the combined heat and power plant (optimizing the use of natural gas); forecasting blast furnace equipment failures; optimizing the operation of the continuous galvanizing unit; we are starting a project in coke production (optimizing the company's coking capacities to produce coke of the optimal fraction size for blast furnace operation) and a project on reducing the cost of purchasing scrap. Also on the way are several projects on optimizing the operation of Mill 2000, intended to increase its performance by approximately 5%. Projects for predicting equipment damage are being studied. In general, the repair sphere is very extensive, and the potential for projects there is very large.

In projects that use modeling, it is difficult to calculate the effect precisely. It depends on the data, on the physics of the process - on many factors. So when we assess economic effect, we look at the difference between current performance and the theoretical maximum. For example, we know that a mill could theoretically run 20% faster. We take this difference as a basis and assume that mathematical modeling methods can reduce it by 5%. Anything less than 5% is simply within technical error. This approach lets us estimate the potential.
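The estimation logic above is simple enough to write out. The numbers follow the interview's example (a 20% theoretical headroom, a 5% closable share), but the interpretation of "reduce by 5%" as five points of current output is my assumption, and the output units are arbitrary.

```python
# Illustrative potential-effect estimate: gap to the theoretical maximum,
# of which a fixed conservative share is assumed closable by modeling.

current_output = 100.0    # arbitrary units per hour
theoretical_gain = 0.20   # the mill could theoretically run ~20% faster
closable_share = 0.05     # assumed: modeling closes 5 points of that gap

gap = current_output * theoretical_gain          # total headroom
expected_gain = current_output * closable_share  # conservative target
print(gap, expected_gain)
```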

How is the efficiency of a mathematical model measured?

Andrej Arshavsky: We estimate the economic effect of the project. Naturally, one could also estimate the accuracy of each individual model when several are used. But there are many nuances: each model has its own accuracy, and that accuracy differs across operating conditions. The result would be many numbers that are hard to reduce to a single figure, and there is no point in doing so. We simply estimate the total final economic effect.

Before NLMK you worked in the banking sector. Is there any difference between working with Big Data in banking and in the steel industry?

Andrej Arshavsky: A big one. Banking problems are intuitive for a data scientist, even one with no previous banking experience. Data in banks are, as a rule, collected carefully; they are available and there is a lot of them. Quality problems do arise, but there are fewer of them and they are easier to solve. In industry the situation is different on all these criteria. To solve a problem it is not enough to simply operate with numbers, as in banking or marketing. Here you have to understand the production process.

Otherwise it is impossible to determine the importance of projects or to interpret them correctly. There are also a number of nuances with the data. In banking, transactions and payment events are recorded; these are indicators that are unambiguously interpreted and stored. In production we deal with sensors, which do not always work precisely, and we face problems of data quality, volume, and availability. It often happens that data are available only over a short time horizon simply because their collection began recently, and that horizon is not enough to train a model.

What are you doing to improve data quality?

Andrej Arshavsky: Sometimes it is simply necessary to replace sensors or add new ones. In general, data quality is defined by several properties: completeness, precision, and reliability. Completeness covers both the depth of storage and the working range. For example, a sensor may work well under certain temperature conditions and not work at all under others. As a result we have only part of the data: the data are incomplete. There are sections where sensors were not installed at all, and if we installed one six months ago, the data collected over that time frame will not be enough to train a model.

A sensor can also simply collect data incorrectly. At first glance the data seem to be there - the readings change - but on closer examination it turns out that the sensor was recording garbage and its readings cannot be used. It also happens that the depth and completeness of the data are sufficient, but the precision is not. For example, a sensor may collect data with a measurement error of 0.2 on some scale, while to train a model and exercise precise control with it, the collection error must be no more than 0.01.
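A data-quality screen along these lines - checking completeness and comparing the sensor's error against what the model needs - might look as follows. The readings, the 90% completeness threshold, and the error figures (taken from the example above) are all illustrative.

```python
# Illustrative screening of a sensor series for the quality problems
# described above: dropouts (incompleteness) and insufficient precision.

readings = [1.02, None, None, 1.05, 1.04, None, 1.08]  # None = sensor dropout

completeness = sum(r is not None for r in readings) / len(readings)

sensor_error = 0.2      # measurement error of the installed sensor
required_error = 0.01   # error the model needs for precise control

usable = completeness >= 0.9 and sensor_error <= required_error
print(round(completeness, 2), usable)
```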

Where do you get such precise sensors?

Andrej Arshavsky: In principle, the sensor market is fairly mature. The question is rather how to organize the process of updating the sensors. Sometimes wires have to be run to hard-to-reach spots, or sensors have to be welded into pipes, and for that production has to be stopped. In principle, the issue can be resolved. But if, when starting a project, we find that there are not enough sensors or that they are unsuitable for our tasks, and we decide to replace them, we still have to wait up to a year and a half until the necessary amount of data accumulates. The question of data quality and availability needs to be solved in advance - even for projects whose emergence we do not yet anticipate.

Production specialists often say that they can see the beginning of a process and its end, but what happens inside is unknown. How do you build a mathematical model under such conditions?

Andrej Arshavsky: Here it is necessary to explain what types of models we are talking about. There are three kinds, and we mainly work with only one of them. Classical mathematical models, which have been in use for decades, are models built on a set of empirically derived formulas. Some process, for example blast furnace smelting, is calculated: researchers study what sub-processes occur within it, how they are interconnected, and what results they produce.

What happens inside is then calculated on the basis of these theoretical formulas. The formulas only approximate reality to a certain degree and do not work precisely in some ranges, and when a complex object is modeled, this set of chemical and physical formulas accumulates errors. We do not use this class of models.

We use models that learn from data. If we are talking about the blast furnace, and if we have data on what was loaded into the furnace at each point in time, what the composition of the charge was, what the melting temperature was, how much oxygen there was, and what factors influenced the furnace during a specific smelt, then we can train a model and obtain a set of coefficients. The model learns from detailed input and output data for the entire history of smelting. It does not much matter to the model what happens in the middle of the process being modeled.

The resulting model is a "bag" of fitted coefficients. Training consists precisely in selecting and adjusting the coefficients of the chosen model type, be it a neural network or a random forest. The model learns to imitate the real object. The historical data are divided into several parts: the model is trained on one part and validated on another. Then the model is given real input data and we see how accurately it predicts the output; that is how the model's accuracy is estimated. After that the model can be applied, i.e. used to answer the question "what will happen if". By searching over its inputs, the optimal production control regime can then be found. This class of models is based on machine learning methods.
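The train-then-validate procedure just described can be shown end to end. For brevity a one-coefficient model stands in for a real neural network or random forest; the "heats" and all numbers are invented for illustration.

```python
# Illustrative train/validate split: fit coefficients on one part of the
# history, estimate accuracy on heats the model never saw.

history = [(2.0, 0.40), (3.0, 0.61), (4.0, 0.80), (5.0, 1.01),
           (6.0, 1.19), (7.0, 1.41), (8.0, 1.60), (9.0, 1.81)]

train, test = history[:6], history[6:]   # hold out the last heats for checking

# "Training" = picking the coefficient that best maps input to output.
slope = sum(y for _, y in train) / sum(x for x, _ in train)

def predict(dose):
    return slope * dose

# Model accuracy: mean absolute error on the held-out heats.
mae = sum(abs(predict(x) - y) for x, y in test) / len(test)
print(round(slope, 3), round(mae, 3))
```

With a neural network or random forest, "slope" becomes thousands of coefficients and the fitting step becomes iterative, but the split-train-validate skeleton is identical.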

There is also a third class of models. It is very complex and requires supercomputers. It is widely applied in modeling nuclear reactors, aircraft wings, and engines - industries where live experiments are very expensive or impossible. This level is called step-by-step simulation. In it, the process is decomposed down to molecules and atoms, and the process is then simulated: at each step, the behavior of every atom of the studied object at the next point in time is modeled. Building such models requires very extensive preparation and large resources.

There are now scientific papers describing attempts to simulate the blast furnace in this way. If such a model appears, it will open boundless opportunities for optimizing furnace operation and saving on materials - provided, of course, that the model can accurately reconstruct the initial data. Controlling a blast furnace is a difficult process, and for now we are not touching this subject. Our team is focused on machine learning for the time being; so far this method allows us to achieve the necessary results. We are approaching the project of optimizing blast furnace operation, and we have pilots.

Is it simpler to work with converter furnaces?

Andrej Arshavsky: The converter furnace is more predictable, though there are exceptions there too. In general, it is impossible to say in advance whether a problem can be solved with a given approach. The behavior of a real object can differ greatly across its operating ranges. For example, within the observed range the approximating function may be smooth, but outside that range it may sharply change character, making the model's forecast inapplicable. Without knowing the exact physical characteristics of an object, it is impossible to say how it will behave in a given mode. This unpredictability is not a show-stopper; it creates opportunities to experiment with the object in different modes.

You have listed many projects that have been implemented, or whose implementation has begun, over the last six months. Which project has been the most interesting for you personally?

Andrej Arshavsky: An interesting and at the same time difficult project is the one on reducing the consumption of charge materials, ferroalloys, and electric power at the Revda site. Within this project we have to develop a model that will save not only ferroalloys but a number of other materials, and all of it has to work as a whole. Production at the site is multistage - melting in the electric arc furnace, then in the ladle furnace - and new materials are added at each stage. So global optimization has to be performed across the site, solving many problems simultaneously. We are working on it now. The project is interesting precisely because of its complexity.