Most business processes now rely on applications and IT infrastructure, and that foundation may have weak points. Monitoring helps to find and eliminate them, and thereby to prevent the loss of money and reputation. Together with Alexei Akopyan, head of the monitoring department at Jet Infosystems, we drew up a step-by-step plan for organizing this process properly.
Preparation
Without a monitoring system, finding the cause of a poorly functioning service is like wandering in a dark room. Before implementing one, you should:
Understand the need for monitoring systems
"A logistics company migrated from one platform to another, and as a result its portal for ordering services 'suffered.' Customers placed container shipment orders, but the orders failed. The search for the problem reached a dead end: several contractors were involved, and none of them took responsibility for what had happened. After the monitoring system was implemented, we developed a scenario of synthetic tests that emulate typical user actions on the portal when ordering a service, along with an informative dashboard for displaying the test results. We saw how each step of the scenario was processed and where it 'hung': sending data to the payment system regularly stalled. This let us narrow down the search. As a result, several significant flaws were revealed, and the customer, presenting objective evidence of the problem, demanded that the contractor fix the portal."
Alexey Hakobyan, Head of Monitoring Department, Jet Infosystems
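The synthetic scenario described in this case can be sketched roughly as follows. This is a minimal illustration, not the tooling used in the project: the step names and the stand-in functions (`open_portal`, `fill_order`, `send_to_payment`, `confirm_order`) are hypothetical, and a real test would drive a browser or call the portal over HTTP.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    error: str = ""


def run_scenario(steps: List[Tuple[str, Callable[[], None]]]) -> List[StepResult]:
    """Run the steps in order, timing each one; stop at the first failure."""
    results = []
    for name, action in steps:
        started = time.perf_counter()
        try:
            action()
            ok, error = True, ""
        except Exception as exc:  # a failed step is data for the dashboard, not a crash
            ok, error = False, str(exc)
        results.append(StepResult(name, ok, time.perf_counter() - started, error))
        if not ok:
            break
    return results


def first_problem(results: List[StepResult], slow_after: float = 2.0) -> Optional[StepResult]:
    """Return the first step that failed or took longer than slow_after seconds."""
    for result in results:
        if not result.ok or result.seconds > slow_after:
            return result
    return None


# Hypothetical stand-ins for real user actions on the portal.
def open_portal() -> None:
    pass


def fill_order() -> None:
    pass


def send_to_payment() -> None:
    raise TimeoutError("payment system did not answer")


def confirm_order() -> None:
    pass


steps = [
    ("open portal", open_portal),
    ("fill in order", fill_order),
    ("send data to payment system", send_to_payment),
    ("confirm order", confirm_order),
]
results = run_scenario(steps)
for r in results:
    status = "ok" if r.ok else f"FAILED ({r.error})"
    print(f"{r.name}: {status} in {r.seconds:.3f}s")
```

Recording a result per step, rather than a single pass/fail for the whole scenario, is what lets a dashboard show where the order process hangs.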
Analyze company processes
The system should be built around the business service; otherwise it only registers a "fire" in a limited area and shows neither the cause nor the consequences of the failure.
"The absence of a service approach is one of the most critical errors. Without it, each component of the IT infrastructure is considered on its own, in isolation from the business function it supports," explains Alexei Hakobyan. "It is more effective to build the monitoring system around the business service. For example, a company wants to understand how its document flow or financial application is working. Then, based on the structure of the service, the monitoring system 'covers' the components of the IT infrastructure that support it and tracks their availability and quality."
Develop an architecture
The optimal solution should take into account: the geographic distribution of the infrastructure; the number and types of monitored objects; the number of system users and their roles; the volume and collection frequency of incoming data, and how long it is stored.
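To see how volume, frequency, and retention interact, a back-of-the-envelope estimate can help. The sketch below is only an illustration: the figure of 16 bytes per stored sample is an assumption (real databases differ widely depending on compression and indexing), and the input numbers are invented.

```python
def estimate_storage_bytes(
    targets: int,
    metrics_per_target: int,
    interval_seconds: int,
    retention_days: int,
    bytes_per_sample: int = 16,  # assumption; depends heavily on the database
) -> int:
    """Rough raw storage needed to keep every sample for the retention period."""
    samples_per_metric_per_day = 86_400 / interval_seconds
    total_samples = targets * metrics_per_target * samples_per_metric_per_day * retention_days
    return round(total_samples * bytes_per_sample)


# Invented example: 500 monitored objects, 100 metrics each,
# one sample per minute, kept for 90 days.
size = estimate_storage_bytes(500, 100, 60, 90)
print(f"{size / 1e9:.2f} GB")
```

Halving the collection interval or doubling the retention period doubles the estimate, which is why these parameters belong in the architecture discussion rather than being tuned afterwards.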
"When creating a monitoring system, the golden rule of medicine applies: 'Do no harm.' Sometimes data is collected by specialized agents installed on the monitored objects. It is important to control the additional load that agents place on the monitored system, and their optimal configuration requires special attention," emphasizes Alexei Hakobyan. "Some monitoring solutions have built-in overload protection mechanisms: agents are switched off if they begin to consume more resources than a defined limit allows."
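The overload protection idea can be sketched as a small guard inside the agent. This is a hypothetical illustration, not the mechanism of any particular product: the class name, the CPU-only limit, and the rule of requiring several consecutive breaches before switching off are all assumptions made for the example.

```python
class OverloadGuard:
    """Disables collection when the agent's own CPU use stays above a limit."""

    def __init__(self, cpu_limit_percent: float, max_breaches: int = 3) -> None:
        self.cpu_limit_percent = cpu_limit_percent
        self.max_breaches = max_breaches  # assumption: tolerate short spikes
        self.breaches = 0
        self.enabled = True

    def observe(self, agent_cpu_percent: float) -> bool:
        """Record one reading; return True while collection may continue."""
        if not self.enabled:
            return False  # stays off until an operator re-enables the agent
        if agent_cpu_percent > self.cpu_limit_percent:
            self.breaches += 1
        else:
            self.breaches = 0
        if self.breaches >= self.max_breaches:
            self.enabled = False
        return self.enabled


guard = OverloadGuard(cpu_limit_percent=5.0, max_breaches=3)
readings = [2.0, 6.5, 7.0, 1.0, 8.0, 9.0, 9.5, 1.0]  # invented CPU readings, in percent
states = [guard.observe(value) for value in readings]
print(states)
```

Counting consecutive breaches instead of reacting to a single reading keeps the agent from switching itself off on a momentary spike.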
Prepare Physical Infrastructure
A mistake at this stage can mean that the lion's share of effort later goes into optimizing the monitoring system itself rather than configuring its functionality.
"In projects, we often deal with equipment that cannot provide monitoring data over standard protocols, for example outdated telephone exchanges or engineering equipment," says Alexei Hakobyan. "This requires custom work and sometimes the installation of additional hardware: physical converters or specialized controllers."
Prepare data collection systems
The monitoring system should have extensive data collection capabilities: specialized agents, a standard set of interaction protocols, and an open API. Particular attention should be paid to the type of database that stores the monitoring data. It can be a classical relational database, a time series database (in effect, vendors' response to the growing volume of metrics being processed), or a combination of the two.
"The most important thing is to abandon the practice of 'let's monitor everything, collect the maximum number of metrics, and then weed out the unnecessary,'" recommends Alexei Hakobyan. "It is better to think in advance about which monitoring information will be useful and which will only create noise."
Present the service structure
The service can be decomposed into components (front-end and back-end applications, databases, and so on) and represented graphically.
Set Triggers
It is advisable to set triggers on each element and see how the components affect one another. This helps you understand the relationships between service components and lets the system itself determine the importance of a particular event. For example, if a web server goes down but it has a redundant standby and the service as a whole is unaffected, there is no point in generating a high-priority incident.
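The web server example can be expressed as a small rule over the service structure. The sketch below is a simplified illustration under stated assumptions: the service is modeled as groups of interchangeable components, the service counts as affected only when every member of some group is down, and the host names and the two priority labels are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ComponentGroup:
    name: str
    members: List[str]  # interchangeable (redundant) components


def incident_priority(groups: List[ComponentGroup], failed: Set[str]) -> Optional[str]:
    """Return 'high' if some group has no working member, 'low' if failures
    are absorbed by redundancy, or None if nothing has failed."""
    priority = None
    for group in groups:
        down = [member for member in group.members if member in failed]
        if not down:
            continue
        if len(down) == len(group.members):
            return "high"  # the whole group is down, so the service is affected
        priority = "low"  # degraded, but a redundant member still serves requests
    return priority


# Invented service structure: front end, back end, database.
service = [
    ComponentGroup("web", ["web-1", "web-2"]),
    ComponentGroup("app", ["app-1"]),
    ComponentGroup("db", ["db-1", "db-2"]),
]

print(incident_priority(service, {"web-1"}))           # redundant web server failed
print(incident_priority(service, {"app-1"}))           # the only app server failed
print(incident_priority(service, {"web-1", "web-2"}))  # both web servers failed
```

The same component failure gets a different priority depending on what else is still running, which is the point of attaching triggers to a service model instead of to isolated hosts.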
Choosing a Solution
Companies are often tempted to meet the challenge of monitoring their IT infrastructure by implementing a specific IT tool. But this does not guarantee the prevention of emergencies. There are unfortunately no universal tools to identify present and potential problems.
The choice of solution is affected by:
Monitoring tasks and business specifics
Application Performance Monitoring (APM) solutions suit companies that interact with customers mainly through interactive channels (portals, mobile applications): for them it is important to find problems in multi-component applications at the transaction level and to see how the state of the IT infrastructure relates to application performance. An AIOps solution will be in demand if the company has already invested in monitoring systems and is now drowning in a stream of uncorrelated events. It needs a single aggregation point for all events, where machine learning algorithms help the system quickly identify the root causes of failures.
Business ownership structure
State-owned and state-affiliated companies, for which import substitution is a relevant issue, choose solutions built on open-source software platforms.
Monitoring system capabilities
Today, monitoring systems can be expected to offer:
A constant increase in the number of metrics processed.
A dynamic baseline: the system accumulates data on particular metrics and uses statistics to determine their normal behavior, taking seasonality into account.
Machine learning and artificial intelligence: monitoring systems "learn" to automatically build relationships between events, filter out unnecessary "noise," and determine possible causes; as a result, the time needed to find and localize a problem is significantly reduced.
Predictive analytics: for example, if processor utilization is growing steadily, the prediction function will show that in a week utilization will reach 100% and a problem will arise at that point. This proactive approach to monitoring prevents many incidents.
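Two of these capabilities, the dynamic baseline and the utilization forecast, can be illustrated with elementary statistics. The sketch below is deliberately naive and is not how any specific product works: seasonality is reduced to the hour of the day, "normal" means within `k` standard deviations of the hourly mean, the forecast is a straight least-squares line, and all sample values are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Optional, Tuple


def build_baseline(samples: Iterable[Tuple[int, float]]) -> Dict[int, Tuple[float, float]]:
    """Group (hour_of_day, value) samples by hour; return {hour: (mean, stdev)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(values), pstdev(values)) for hour, values in by_hour.items()}


def is_anomaly(baseline, hour: int, value: float, k: float = 3.0, min_dev: float = 1.0) -> bool:
    """True if value is more than k deviations from the mean for that hour."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    avg, dev = baseline[hour]
    return abs(value - avg) > k * max(dev, min_dev)


def days_until_limit(daily_values: List[float], limit: float = 100.0) -> Optional[float]:
    """Fit a least-squares line to daily values and return the number of days
    after the last sample at which it reaches the limit (None if not rising)."""
    n = len(daily_values)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = mean(daily_values)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_values))
    variance = sum((x - x_mean) ** 2 for x in range(n))
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    return max(0.0, (limit - intercept) / slope - (n - 1))


# Invented history: CPU utilization (%) at 03:00 and at 14:00 over four days.
history = [(3, 10), (3, 12), (3, 11), (3, 9), (14, 70), (14, 75), (14, 72), (14, 68)]
baseline = build_baseline(history)
print(is_anomaly(baseline, 14, 70))  # 70% in the afternoon
print(is_anomaly(baseline, 3, 70))   # 70% at night

# Invented daily averages growing by five points a day.
print(days_until_limit([45, 50, 55, 60, 65]))
```

A fixed threshold would treat 70% the same at any hour; the per-hour baseline is what lets the same value be normal in the afternoon and suspicious at night.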
Budget
However, its size should not be the decisive factor.
"It is better to consider several options, test them in pilot projects, and evaluate not the feature set but the value for the business," recommends Alexei Hakobyan. "In my practice, there have been cases when, after such a 'test drive,' customers initiated projects with larger budgets than originally expected."
Implementation
For successful implementation, it is important to:
Integrate monitoring system into company processes
The monitoring system should be synchronized with equipment inventory and incident management, and tied to the relevant regulations. Weaknesses in the IT infrastructure identified by the monitoring system should be recorded as incidents in the Service Desk and worked through.
Split the project into small stages
When the implementation of a complex project drags on, fatigue from waiting sets in before any result is visible. It is therefore better to aim for quick wins by defining stages that each deliver a completed result.
Configure Notifications
The monitoring system must report any problem, for example via Telegram, SMS, or e-mail.
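Delivering one alert over several channels can be sketched as a simple fan-out. This is an illustrative outline only: the channels here are in-memory stand-ins, not real Telegram, SMS, or e-mail integrations, and the function and variable names are hypothetical.

```python
from typing import Callable, Dict, List

Channel = Callable[[str], None]


def notify_all(message: str, channels: Dict[str, Channel]) -> List[str]:
    """Send the message through every channel; return the names that failed."""
    failed = []
    for name, send in channels.items():
        try:
            send(message)
        except Exception:
            failed.append(name)  # one broken channel must not block the others
    return failed


# In-memory stand-ins; real senders would call the messenger, SMS gateway, or mail server.
telegram_outbox: List[str] = []
email_outbox: List[str] = []


def send_sms(message: str) -> None:
    raise ConnectionError("SMS gateway unreachable")


channels = {
    "telegram": telegram_outbox.append,
    "sms": send_sms,
    "email": email_outbox.append,
}
failed = notify_all("Order portal: payment step is timing out", channels)
print("failed channels:", failed)
```

Isolating each channel means that an outage of one delivery path, which may itself be caused by the incident being reported, does not silence the alert.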
Connect Visualization
Business users are not interested in wading through massive reports; data should be presented informatively and attractively, in an intuitive interface.
Configure Web Interface
Thick clients lose out: they are limited in flexibility and functionality, and in general they look like an anachronism.
Constantly improve the system
Once the system is implemented, every change to the infrastructure must be consistently reflected in the monitoring rules and schemes. Only with this approach will the company be protected from losing money to IT infrastructure failures.
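One simple way to check that monitoring keeps up with the infrastructure is to compare the equipment inventory with the list of monitored objects on a regular basis. The sketch below assumes both lists are available as plain host names; the names themselves are invented.

```python
from typing import Dict, Iterable, List


def monitoring_drift(inventory: Iterable[str], monitored: Iterable[str]) -> Dict[str, List[str]]:
    """Compare what exists with what is monitored."""
    inventory_set, monitored_set = set(inventory), set(monitored)
    return {
        "unmonitored": sorted(inventory_set - monitored_set),  # exists, but nobody watches it
        "stale": sorted(monitored_set - inventory_set),        # watched, but no longer exists
    }


# Invented example: a new database node was added and an old web server was retired.
drift = monitoring_drift(
    inventory=["web-1", "web-2", "db-1", "db-2"],
    monitored=["web-1", "web-2", "web-3", "db-1"],
)
print(drift)
```

Both kinds of drift matter: unmonitored objects are blind spots, while stale ones generate noise and false alarms.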