2011/07/21 10:35:24

Thrifty ITSM

Thrifty ITSM is a set of measures and technical solutions intended to make ITSM more efficient by reducing the labor costs of Service Desk personnel. It applies the Lean Production methodology to IT service management. Lean Production is a management concept based on the continuous elimination of all types of waste. Under this concept, anything that consumes resources but does not add customer value is waste and should be eliminated. Customer value is the ability of a product or service to meet customer expectations. Implementing Thrifty ITSM reduces consumption of the most expensive resource: time.

Content

The proposed approach includes:

  1. Automatic creation of the Picture of the Incident (Red Button). This solution reduces the labor costs of classifying incidents.
  2. Consolidation of incident information with information on the health of the IT infrastructure and the performance of business applications. This solution reduces the labor costs of diagnosing incidents.
  3. Monitoring of application quality through the "eyes of users" (Fifth Level). This solution reduces the number of incidents and improves the quality of IT services.

Automatic creation of the Picture of the Incident (Red Button)

Automatic creation of the Picture of the Incident gives the Service Desk, without manual effort, enough information to classify incidents quickly.

Usually, to report an incident to the Service Desk, the user has to make a phone call, write an e-mail, or fill in a web form. In each of these cases the information received is, as a rule, insufficient to classify the incident. The first-line support operator who receives the report therefore usually has to ask the user a number of clarifying questions. The time needed to get adequate answers depends on many factors, in particular on the user's communication skills.

The labor costs of first-line support can be reduced significantly if it receives enough information to classify an incident immediately, without having to ask additional questions. An example of such information is the Picture of the Incident, which the Red Button solution creates automatically (see Figure 1).

Figure 1. Automatic transfer of the Picture of the Incident using the Red Button solution

Special software is installed on users' computers that lets them report failures in business applications (errors, slow response, and so on) to the IT Service with a simple key combination (pressing the Red Button). When the Red Button is pressed, the following information is transferred to the Service Desk automatically:

1. The incident as "seen" by the user:

  • A screenshot of the user's computer at the moment the Red Button was pressed.
  • The incident category (determined by the key combination pressed) and, optionally, its description. The user can add a short description of the incident before pressing the Red Button.

2. Information about the user and the user's environment:

  • The computer name and the user's account (including the domain).
  • The name of the division where the user works (imported from Active Directory).
  • Environment variables (environment strings), which can describe, for example, the user's geographic location, the category of personnel the user belongs to, and so on.

3. Information about the user's activity:

  • The name of the business application the user was working with when the Red Button was pressed.
  • The name of the business operation the user was performing when the Red Button was pressed. For Windows applications the business operation is determined from the text in the title bar of the active window; for Web applications, from the URL; for console applications, from the text on the screen. To identify business operations, the corresponding directory of business operations must be configured on the user's computer in advance.

4. The history of the user's actions during the 15 minutes before the Red Button was pressed, including the applications used and the business operations performed.
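The four groups of information listed above can be pictured as a single record. The following is only a minimal sketch, not the Red Button's actual data format: the class and field names, the contents of `OPERATION_DIRECTORY`, and the substring matching rule are all assumptions made for illustration. Only the 15-minute history window and the three identification sources (window title, URL, console text) come from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

HISTORY_WINDOW = timedelta(minutes=15)  # the text specifies 15 minutes of history

# Hypothetical directory of business operations: (application kind, marker, name).
# The marker is searched for in the window title, the URL, or the console text.
OPERATION_DIRECTORY = [
    ("windows", "New invoice", "Create invoice"),
    ("web", "/orders/search", "Search orders"),
    ("console", "PAYMENT ENTRY", "Enter payment"),
]


def identify_operation(app_kind: str, observed_text: str) -> Optional[str]:
    """Return the business operation whose marker occurs in the observed text."""
    for kind, marker, name in OPERATION_DIRECTORY:
        if kind == app_kind and marker.lower() in observed_text.lower():
            return name
    return None  # the operation is not in the directory


@dataclass
class Activity:
    at: datetime
    application: str
    operation: Optional[str]


@dataclass
class IncidentPicture:
    pressed_at: datetime
    screenshot: bytes                 # 1. the incident as the user sees it
    category: str
    description: str
    computer: str                     # 2. the user and the environment
    account: str
    division: str
    environment: Dict[str, str]
    application: str                  # 3. activity at the moment of the press
    operation: Optional[str]
    history: List[Activity] = field(default_factory=list)  # 4. last 15 minutes


def recent_history(log: List[Activity], pressed_at: datetime) -> List[Activity]:
    """Keep only the activity from the 15 minutes before the button press."""
    start = pressed_at - HISTORY_WINDOW
    return [a for a in log if start <= a.at <= pressed_at]
```

A real directory would probably need stricter matching than a case-insensitive substring, but the lookup shows why the directory has to exist on the user's computer before the button is pressed.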

The Picture of the Incident is automatically received by the Probe, registered in the Consolidated database, and displayed on the Engineering console. The Probe, the Consolidated database, and the Engineering console are components of the Information Aggregator (described in more detail below), which is part of the ProLAN SLA-ON product family.

The Connector to Service Desk (a Windows service) is also installed on the Information Aggregator computer. The Connector continuously scans the Consolidated database for information about new incidents. When such information appears, the Connector does the following:

  • Determines the recipient of the incident information (the corresponding support group). The recipient is determined from the business application and/or business operation the user was performing when the Red Button was pressed; this information is contained in the Picture of the Incident. Different business applications, and sometimes different business operations within one application, can correspond to different support groups.
  • Creates the Picture of the Incident in the format of the relevant Service Desk and automatically sends it to the recipient. HP Service Manager and BMC Remedy are currently supported.
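The routing rule in the first step can be sketched as a two-level lookup: an operation-specific route wins over an application-wide one. This is an illustration under assumptions, not the Connector's actual logic; the `ROUTES` table, the group names, and the `DEFAULT_GROUP` fallback are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fallback for applications that have no configured route.
DEFAULT_GROUP = "First-line support"

# Hypothetical routing table: (application, operation) -> support group.
# A key with operation None applies to the whole application.
ROUTES: Dict[Tuple[str, Optional[str]], str] = {
    ("ERP", None): "ERP support",
    ("ERP", "Create invoice"): "Finance support",
    ("CRM", None): "CRM support",
}


def support_group(application: str, operation: Optional[str]) -> str:
    """Pick the support group, preferring an operation-specific route."""
    specific = ROUTES.get((application, operation))
    if specific is not None:
        return specific
    return ROUTES.get((application, None), DEFAULT_GROUP)
```

For example, `support_group("ERP", "Create invoice")` selects the finance group, while any other ERP operation falls back to the application-wide group.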

Automatic creation of the Picture of the Incident significantly reduces the labor costs of classifying and routing incidents, and lowers the probability of classifying or routing them incorrectly. This makes ITSM more efficient. For more details, see Red Button.

Consolidation of incident information with information on the health of the IT infrastructure and the performance of business applications

Consolidating incident information with information on the health of the IT infrastructure and the performance of business applications lets the Service Desk quickly determine the root causes of failures in the IT infrastructure and business applications.

Clearly, the efficiency of ITSM depends substantially on how quickly the IT Service diagnoses incidents. The traditional ITSM organization scheme (see Figure 2) does not support fast diagnosis.

Figure 2. Traditional scheme of the ITSM organization

The reason is that incidents are not linked to the information that characterizes the operation of the IT infrastructure and business applications. Incidents are registered in the Service Desk, while metrics of IT infrastructure health and application performance are measured by other systems. To diagnose an incident in this setup, the following is usually required.

First, the time of the failure (incident) has to be established. Then, knowing the time, the metrics of IT infrastructure health and application performance at that moment are analyzed. Because incidents are usually reported by phone, the exact time can be hard to determine, so the metrics have to be analyzed over a wide time range. With so much data, the root cause cannot always be found this way. The only option left is then to try to reproduce the failure, which is usually very labor-intensive. According to Network Instruments, 59% of the 592 network professionals it surveyed spend at least 25 days a year reproducing failures, and 71% of respondents spend 25 or more days a year determining the causes of the failures they reproduced (in more detail...).

Figure 3. Scheme of Thrifty ITSM

Incident diagnosis time can be reduced significantly by following these rules:

  1. Register incidents using the Red Button.
  2. Store information on incidents, IT infrastructure health, and application performance in a single (consolidated) database.
  3. Analyze incidents, IT infrastructure health metrics, and application performance metrics against a single timeline (to see the metric values at the moment the Red Button was pressed).
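The third rule amounts to a time-window query: given the exact moment of the button press, pull every metric sample that falls close to it. The sketch below assumes the consolidated data is available as in-memory lists of `(timestamp, value)` pairs sorted by time. The function name, the data layout, and the five-minute default window are illustrative assumptions, not details of the Consolidated database.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Sample = Tuple[datetime, float]


def metrics_around(
    samples: Dict[str, List[Sample]],
    pressed_at: datetime,
    window: timedelta = timedelta(minutes=5),
) -> Dict[str, List[Sample]]:
    """Return, per metric, the samples within +/- window of the button press.

    Each series must be sorted by timestamp.
    """
    result: Dict[str, List[Sample]] = {}
    for name, series in samples.items():
        times = [t for t, _ in series]
        lo = bisect_left(times, pressed_at - window)   # first sample inside the window
        hi = bisect_right(times, pressed_at + window)  # one past the last sample inside
        result[name] = series[lo:hi]
    return result
```

Because the Red Button records the press time exactly, the window can stay narrow. With phone-reported incidents the same query would have to cover a much wider range, which is the problem described above.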


The ITSM organization scheme that supports these rules is shown in Figure 3. The key element of the proposed scheme is the Information Aggregator. It comprises the Probe, the Consolidated database, Artificial Intelligence, and the Engineering console. The Probe receives Pictures of Incidents, collects information on IT infrastructure health and business application performance, and writes everything it collects to the Consolidated database. The Artificial Intelligence processes the consolidated information according to configured rules and passes the result to the Service Desk.

Compared with the traditional scheme, the proposed scheme has two important advantages. First, the Service Desk receives processed rather than raw information, which makes it possible to "diagnose" incidents quickly. For example, the scale of the problem is visible at once (a single user, a group of users in a certain region, a group of users of a certain business application, and so on), as is what could have led to the incident (which user actions preceded it), and much more. The second advantage is the ability to quickly determine cause-and-effect relationships between incidents on the one hand and IT infrastructure health and business application performance on the other. This is done with the Engineering console, which displays all collected information against a single timeline. It is a very efficient method of root cause analysis. For more details, see the article "The Red Button – diagnostics of incidents in a new way".

Monitoring of application quality through the "eyes of users"

Monitoring application quality through the "eyes of users" detects application failures before users contact the Service Desk and thus substantially reduces the number of incidents and improves the quality of IT services.

Monitoring application quality through the "eyes of users" (hereafter Quality of Experience monitoring, QoE) means obtaining, in real time, information on how well real business applications work for real users. Its main objective is to detect failures in the IT infrastructure and business applications before users contact the Service Desk. In other words, the quality of IT services is judged independently of the number of Service Desk requests rather than by it. QoE monitoring reduces the number of Service Desk requests (that is, the labor costs of IT Service personnel) and increases the loyalty of IT service users.

An example of a QoE monitoring solution is Fifth Level from ProLAN. With this solution in place, the IT Service can automatically obtain the following information:

  1. Response time of business applications measured on the user side (E2E RT, End-to-End Response Time). This is the time from the moment the user, working in a business application, requests a certain function until the moment that function has been executed (for example, searching for an item in a catalog).
  2. The number of transactions executed by the user, broken down into transactions executed successfully, transactions forcibly terminated by some event (for example, an error or a timeout), and transactions that completed successfully but violated the standard sequence of actions.
  3. The number of errors during transaction execution, broken down into system and user errors. System errors are caused by failures in the IT infrastructure or by errors in application code. User errors result from incorrect user actions.
  4. APDEX (Application Performance Index) values. APDEX is the result of statistical processing of the measured E2E RT values and characterizes user satisfaction with business application performance.
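To make the last item concrete, here is a minimal sketch of the standard Apdex formula: response times up to a target T count as satisfied, those between T and 4T count as tolerating at half weight, and slower ones count as frustrated. The text does not say how Fifth Level computes its APDEX values, so treating it as the standard formula is an assumption, as are the function name and the choice of T.

```python
from typing import Iterable


def apdex(response_times: Iterable[float], target: float) -> float:
    """Standard Apdex score in the range 0..1 for response times in seconds.

    satisfied:  rt <= target
    tolerating: target < rt <= 4 * target (counted at half weight)
    frustrated: rt > 4 * target (counted as zero)
    """
    times = list(response_times)
    if not times:
        raise ValueError("at least one E2E RT measurement is required")
    satisfied = sum(1 for rt in times if rt <= target)
    tolerating = sum(1 for rt in times if target < rt <= 4 * target)
    return (satisfied + tolerating / 2) / len(times)
```

With a one-second target, the five measurements 0.5, 1.0, 2.0, 3.9 and 5.0 seconds give two satisfied, two tolerating and one frustrated sample, so the score is (2 + 2/2) / 5 = 0.6.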


QoE is measured using the Client Instrumentation method and a program (a Windows service) called EPM Agent. The program is installed on users' computers or, if users work in terminal mode, on the terminal server. The EPM Agent works by automatically tracking all processes, and the events related to them, that run on the user's computer.

Fifth Level has several advantages over similar solutions from Western companies. The first is cost-effectiveness: Fifth Level costs at least 10 times less than its Western counterparts. The second is support for Russian-made business applications, which Western products, as a rule, "do not understand". The third is transparent integration with network management systems, which very few Western products offer today. All of this is certainly important, but two more advantages deserve special mention: SNMP support in the EPM Agent and transparent integration of Fifth Level with the Red Button solution.

SNMP support in the EPM Agent

SNMP support means that the EPM Agent can transmit all the metrics it measures over the SNMP protocol (see Figure 4).

Figure 4. SNMP support in the EPM Agent

Because practically every network management system supports SNMP today, practically any of them can serve as a QoE management console.

Integration of the Fifth Level into the Red Button

Integrating Fifth Level with the Red Button makes it possible to send Pictures of the Incident to the Service Desk automatically at the initiative of the EPM Agent. In other words, the Red Button is "pressed" not by the user but by the EPM Agent (see Figure 5).

Figure 5. Automatic transfer of the Picture of the Incident at the initiative of the EPM Agent

Suppose a business application on the user's computer displays an error message. The EPM Agent installed on that computer automatically identifies the error and notifies the Red Button (the HelpMe application) of the event, passing along a detailed description. Having received the event, the Red Button takes a screenshot (if this is permitted for the event), reads the environment variables, determines the business operation being performed, and so on. It then transfers all the collected information to the Information Aggregator, from where it is passed automatically to the Service Desk.
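The flow just described can be sketched as a small handler. It is a rough illustration only: the real interface between the EPM Agent and the HelpMe application is not documented here, so `AgentEvent`, `SCREENSHOT_ALLOWED`, `press_red_button`, and the two callbacks are hypothetical names. Reading the environment variables and identifying the business operation are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AgentEvent:
    event_type: str    # e.g. "application_error"
    description: str   # detailed description supplied by the agent


# Hypothetical per-event policy: is a screenshot permitted for this event type?
SCREENSHOT_ALLOWED: Dict[str, bool] = {
    "application_error": True,
    "timeout": False,
}


def press_red_button(
    event: AgentEvent,
    take_screenshot: Callable[[], bytes],
    send_to_aggregator: Callable[[dict], None],
) -> dict:
    """Build a Picture of the Incident for an agent-initiated press and send it."""
    screenshot: Optional[bytes] = None
    if SCREENSHOT_ALLOWED.get(event.event_type, False):
        screenshot = take_screenshot()  # only if permitted for this event
    picture = {
        "initiator": "EPM Agent",       # not the user
        "event_type": event.event_type,
        "description": event.description,
        "screenshot": screenshot,
    }
    send_to_aggregator(picture)         # the aggregator passes it on to the Service Desk
    return picture
```

Passing the screenshot and transport functions as parameters keeps the sketch independent of any particular capture or messaging mechanism.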

Integrating Fifth Level with the Red Button takes QoE monitoring to a fundamentally new level. The novelty is as follows:

  1. The IT Service automatically obtains information about failures in business applications without involving users. For example, a user of an IT service who sees a critical error message can always be sure that the corresponding incident is already registered in the Service Desk and can "calmly go and have tea". At the same time, Service Desk staff do not need to ask the user any additional questions, since they see everything the user sees, and more.
  2. The IT Service automatically obtains not only information about the failure itself but also additional information that allows the failure to be classified, and can process it automatically. This information includes, in particular, what the user was doing at the moment of the failure, what the user did before it occurred, what software is installed on the user's computer, who the user is, and so on. This makes it possible to automate incident processing and to build an expert system that diagnoses critical events automatically.