Maintenance of main and backup power supply systems in data centers
Supplying critical loads with uninterrupted electric power is a task every data processing center (DPC) faces. Achieving it takes more than a well-designed and well-executed project: the power supply system also has to be maintained correctly and tested. Ideally, this is done without shutting down the critical loads or creating a risk of such a shutdown.
However, some data center administrators regard maintenance as unnecessary work and an extra expense. This attitude has become especially visible in the current economic climate, when every budget item is examined for a possible reduction or, better still, elimination. Nevertheless, periodic maintenance is necessary to preserve the design reliability level of the data center and keep the critical loads running. This, of course, requires enough redundancy for the data center to keep working during maintenance: the power supply chain should provide at least Tier 2 reliability, and in some cases Tier 3 or Tier 4.
The higher the redundancy of the power supply system, the lower the probability that critical equipment will have to be shut down during routine maintenance. Nevertheless, installing redundant equipment makes sense only if it is serviced properly. Inadequate maintenance procedures and human error have often caused power system outages, even in data centers with Tier 3 and Tier 4 reliability.
Assuming the redundancy is sufficient for maintenance, let us look at the main components of backup power supply systems and the best practices for maintaining them.
- Main input distribution cabinet
- Generators
- Generator paralleling switchgear
- Automatic transfer switch (ATS)
- Main distribution board
- UPS maintenance bypass panel
- Uninterruptible power supply (UPS)
- Batteries or other energy storage for the UPS
- Load distribution
- Planning, instructions, training and control
Main input distribution cabinet
The main input distribution cabinet is the first element of the data center's power supply system: this is where the utility supply lines enter the facility. Although this cabinet is usually not touched during normal operation, it is recommended to inspect it visually and take temperature measurements with a thermal imager or a non-contact infrared thermometer every three to six months, and at least once a year.
Backup power sources (diesel generator sets)
Data center operators recognize the need for regular testing and maintenance of backup power sources, which are usually diesel generator sets (DGS, also abbreviated DGU). In some data centers outside Russia, the system automatically runs a standard generator test program once a week. It is important that personnel are informed of and present at routine maintenance and testing of the backup power source. Practically any type of testing requires constant supervision. For example, it is not advisable to turn to other tasks right after starting the generator and beginning a transfer switch test. The operation of the backup power system has to be watched so that, if a problem occurs, the facility can be switched back to normal operating mode.
If watching a running generator for half an hour or an hour seems tedious, use the time to listen for unusual noises and to examine the diesel generator set for leaks of oil, fuel and other fluids. Record the voltage and current readings as well as the engine speed and output frequency. Check and record the oil pressure and engine temperature sensor readings, and measure the temperature at selected points on the generator with an infrared thermometer or a thermal imager. These records can serve as a baseline for later analysis, helping to identify problems and simplifying routine inspection of critical or suspect parts of the installation. Maintenance such as oil and filter changes is carried out after the generator set has run a specified number of hours and also at fixed calendar intervals. The frequency of maintenance inspections is specified by the equipment manufacturer. In addition, it is recommended to check diesel fuel quality at least once every six months.
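As an illustration of how recorded readings can serve as a baseline, the following minimal Python sketch compares the latest test run with a reference run and flags values that have drifted. The field names, the sample values and the 5% tolerance are assumptions made for this example, not manufacturer limits, and the comparison only makes sense if both runs were taken at a comparable load.

```python
from dataclasses import dataclass, fields


@dataclass
class GeneratorReading:
    """One set of readings taken during a generator test run (hypothetical fields)."""
    voltage_v: float
    current_a: float
    frequency_hz: float
    engine_rpm: float
    oil_pressure_kpa: float
    engine_temp_c: float


def deviations(baseline, current, tolerance=0.05):
    """Return {field: relative change} for readings that moved more than `tolerance`.

    The 5% default is an illustrative assumption, not a vendor limit.
    """
    flagged = {}
    for f in fields(GeneratorReading):
        base = getattr(baseline, f.name)
        now = getattr(current, f.name)
        change = (now - base) / base  # assumes non-zero baseline values
        if abs(change) > tolerance:
            flagged[f.name] = round(change, 3)
    return flagged


# Example values are invented for illustration only.
baseline = GeneratorReading(400.0, 250.0, 50.0, 1500.0, 400.0, 85.0)
latest = GeneratorReading(398.0, 252.0, 49.9, 1497.0, 340.0, 86.0)
print(deviations(baseline, latest))
```

In this invented example only the oil pressure has moved by more than the tolerance, which is the kind of change that would justify a closer inspection before the next scheduled service.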
Generator paralleling switchgear
Large data centers with several generators need paralleling switchgear to connect the diesel generator sets in parallel. This additional component makes the backup power system more complex, because the generator synchronization system demands specific knowledge of how such systems are designed and well-qualified contractors. It is very important that the synchronization system works correctly, and its regular testing and checks should coincide with generator maintenance. If the generators do not run synchronously, for example if their speeds differ or their phases are not aligned, the load cannot be connected to the paralleled generators. The data center may be unable to connect to the backup power system even when all the generators are running, simply because they are not synchronized.
Some elements of the synchronization system are, of course, installed on the generators themselves and should therefore be included in the generator's maintenance program. As a rule, the generator, the transfer switch and the paralleling switchgear should be serviced by the same supplier. Pay particular attention to the specific requirements of the synchronization system, for example the servicing of any additional paralleling switchgear, and perform regular visual inspections and temperature checks.
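The synchronization condition described above can be sketched as a simple check: a generator may be connected to the common bus only when its voltage, frequency and phase angle are all close to those of the bus. The function names and the limits used here (5% voltage difference, 0.2 Hz, 10 degrees) are assumptions chosen for illustration; real limits come from the switchgear and generator manufacturers.

```python
def phase_difference_deg(a_deg, b_deg):
    """Smallest signed difference between two phase angles, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def can_parallel(bus, gen, max_dv=0.05, max_df_hz=0.2, max_dphi_deg=10.0):
    """Return True if the generator is close enough to the bus to be connected.

    `bus` and `gen` are dicts with hypothetical keys voltage_v, frequency_hz, phase_deg.
    All three limits are illustrative assumptions.
    """
    dv = abs(gen["voltage_v"] - bus["voltage_v"]) / bus["voltage_v"]
    df = abs(gen["frequency_hz"] - bus["frequency_hz"])
    dphi = abs(phase_difference_deg(gen["phase_deg"], bus["phase_deg"]))
    return dv <= max_dv and df <= max_df_hz and dphi <= max_dphi_deg


bus = {"voltage_v": 400.0, "frequency_hz": 50.0, "phase_deg": 0.0}
in_sync = {"voltage_v": 402.0, "frequency_hz": 50.05, "phase_deg": 355.0}
too_fast = {"voltage_v": 400.0, "frequency_hz": 50.5, "phase_deg": 0.0}

print(can_parallel(bus, in_sync))   # within all three limits
print(can_parallel(bus, too_fast))  # frequency differs too much
```

The second generator is running and producing the right voltage, yet it fails the check, which mirrors the situation in which all generators work but the load still cannot be connected.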
Automatic transfer switch (ATS)
Unlike most types of switches, which as a rule stay in one position for their entire service life, automatic transfer switches (ATS) operate far more often, connecting, disconnecting and transferring the load. Their contacts therefore have to be monitored carefully and serviced or replaced in good time. Every time the ATS transfers the load, it in effect "eats away" at its contacts because of the arcing that occurs when high-voltage circuits are made and broken. In most cases the ATS has to be disassembled to inspect and replace the contacts.
The electromechanical mechanism also has to be inspected, checked for free movement and cleaned of dirt.
For complete maintenance, the automatic transfer switch (ATS) has to be de-energized. The ATS should therefore have an internal or external bypass that keeps the load powered during maintenance. Not every transfer switch has this capability; without it, power has to be disconnected to service the ATS. To make it possible to service the ATS without interrupting power to the load, a bypass should be included in the initial design requirements. Transfer switches should be inspected every three to six months and maintained once a year.
In addition, some data centers will have to run on generator power while the UPS or its batteries are bypassed, to guard against a utility outage during maintenance: with the UPS bypassed, nothing would carry the load during the time the generators take to start and the load takes to transfer to backup power.
In addition to the main equipment listed above, large sites with dual power systems (2N or S+S) may have one or more tie breakers (section breakers). These breakers allow power sources to be switched to the alternate power system and both systems to operate simultaneously during maintenance. As a rule this is done "on the fly" (both systems are energized and must be in phase) so that the critical load stays powered while the source is switched. Several tie breakers can be placed at different points of the electrical system, for example before and after the automatic transfer switch and even after the UPS, depending on the type of redundancy. This makes it possible to bypass or isolate individual sections of the electrical circuit while still feeding the racks from both power sources. However, to prevent a loss of power, it is extremely important that these breakers are operated only in a defined sequence and only by authorized, highly trained employees. As a rule, tie breakers are kept locked to prevent this problem.
Main distribution board
After the automatic transfer switch, power flows to the main distribution board. As a rule, this board feeds the UPS and the air conditioners as well as the lighting and other data center systems. Like the main input distribution cabinet, it is usually not opened during normal operation, and it should be inspected visually and checked for temperature at least once a year.
UPS maintenance bypass panel
Current on the input and output of the UPS passes through the maintenance bypass panel before it reaches the critical load, so it is extremely important to inspect the panel visually and check its temperature. In small data centers an external maintenance bypass panel is sometimes omitted to reduce the cost of purchasing and installing the UPS, or simply because someone decided that, since the UPS already has an internal bypass, an external bypass module is not needed.
Unfortunately, this is quite common in small data centers, and it can create serious problems when the UPS has to be de-energized or replaced. Such data centers also usually have only one UPS, so they have to disconnect the critical load from the main power supply whenever work on the UPS is required.
In many cases the maintenance bypass panel is selected for a specific UPS and is built and installed by the UPS manufacturer. Maintenance bypass panels can be fitted with Kirk Key interlocks and can exchange status signals with the UPS control panel to prevent incorrect operation. The UPS service contract usually also covers maintenance of the bypass panel. To avoid problems when the UPS has to be bypassed safely, written operating instructions for the maintenance bypass panel should be given to the chief engineer of the IT site.
Uninterruptible power supply (UPS)
The electrical connections of the internal systems are checked, inspected visually and measured for temperature. Qualified factory service technicians can also run diagnostics on the UPS components. For some procedures the UPS can be switched to internal bypass mode, while other tests or maintenance procedures require de-energizing the UPS and bypassing it through the maintenance bypass panel. In either case, the critical load is exposed to the risk of a utility power outage if there is no redundant UPS. As noted above, some data centers will need to run their backup generators while the UPS or the battery strings are being serviced, so that the critical load cannot lose power. Physical maintenance, such as cleaning the UPS fans and replacing or cleaning the air filters, is also performed. This is usually done every six months, and at least once a year.
Batteries or other energy storage for the UPS
For the UPS to carry the critical load from the moment utility power fails until backup power comes online, stored energy has to be available for immediate use. In most cases this energy is provided by a string of rechargeable batteries.
Battery strings need regular maintenance and inspection for corrosion, leaks and temperature differences between individual cells. The batteries are connected in series by cables, and each cable has to be checked for connection quality and absence of corrosion. A battery cabinet with a 480 V DC bus contains forty 12-volt batteries and therefore 80 terminal connections that require inspection. In addition to periodic measurement of battery voltage and internal resistance, load tests are also carried out.
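The numbers in the battery cabinet example follow from simple arithmetic, sketched below in Python. The reading that the figure of 80 corresponds to two terminal connections per battery is an assumption, and the function and key names are hypothetical.

```python
def string_inventory(bus_voltage_v, battery_voltage_v):
    """Count the batteries and connection points in one series string.

    Assumes identical batteries in series and two terminal connections per battery.
    """
    count, remainder = divmod(bus_voltage_v, battery_voltage_v)
    if remainder:
        raise ValueError("bus voltage is not a whole multiple of the battery voltage")
    return {
        "batteries": count,
        "terminal_connections": 2 * count,   # one on each battery post
        "inter_battery_links": count - 1,    # cables between neighbouring batteries
    }


print(string_inventory(480, 12))
```

For a 480 V bus and 12 V batteries this gives 40 batteries and 80 terminal connections, each of which is a potential point of corrosion or a loose contact.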
Remember that some data centers will have to run on backup power while the UPS and batteries are in maintenance bypass or under load test. The backup source is needed to cover the risk of a utility outage at a time when the UPS cannot supply the load.
Many large data centers have dedicated battery monitoring systems that can track each individual battery rather than only the string as a whole. This helps detect signs of deterioration in a single battery in good time. Keep in mind that one bad battery can compromise the integrity of the entire string. Data centers also use other types of energy storage, such as flywheels or so-called rotary UPS systems. Maintenance of rotary UPS systems is mostly mechanical; it mainly involves monitoring the bearings.
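The idea of watching individual batteries rather than the whole string can be sketched as follows: compare each battery's internal resistance with the median of the string and flag the ones that stand out. The battery labels, the sample readings and the 25% threshold are assumptions made for this example; actual alarm limits depend on the battery type and the monitoring system in use.

```python
from statistics import median


def weak_batteries(resistance_mohm, threshold=1.25):
    """Return labels of batteries whose internal resistance exceeds
    `threshold` times the string median (illustrative limit, not a standard)."""
    reference = median(resistance_mohm.values())
    return sorted(
        label for label, value in resistance_mohm.items()
        if value > threshold * reference
    )


# Invented readings in milliohms for a short string.
readings = {"B01": 4.0, "B02": 4.1, "B03": 3.9, "B04": 6.0, "B05": 4.0}
print(weak_batteries(readings))
```

Using the median as the reference keeps one degraded battery from distorting the value it is compared against.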
Of all the elements of the power supply system, batteries are most in need of maintenance, testing and timely replacement. Depending on the battery type (VRLA, flooded or NiCad), testing should be performed every three to six months. When no budget is allocated for it, this work is often postponed or ignored. It should be noted that, statistically, battery failure is the most common cause of downtime after human error.
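The inspection intervals mentioned in the sections above can be collected into a simple schedule and checked against the date each task was last done. The sketch below is only an illustration: the task names are hypothetical, and where the article gives a range of three to six months the longer interval is used as an assumed upper bound.

```python
import calendar
from datetime import date

# Upper-bound intervals in months, taken from the ranges given in the text.
INTERVAL_MONTHS = {
    "main input distribution cabinet inspection": 6,
    "diesel fuel quality check": 6,
    "transfer switch inspection": 6,
    "transfer switch maintenance": 12,
    "main distribution board inspection": 12,
    "UPS physical maintenance": 6,
    "battery testing": 6,
}


def add_months(d, months):
    """Return the date `months` later, clamping the day to the end of the month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def overdue(last_done, today):
    """Return the tasks whose next due date has already passed."""
    return sorted(
        task for task, done in last_done.items()
        if add_months(done, INTERVAL_MONTHS[task]) < today
    )


last_done = {
    "battery testing": date(2024, 1, 10),
    "transfer switch maintenance": date(2024, 1, 10),
}
print(overdue(last_done, date(2024, 9, 1)))
```

A list like this does not replace the manufacturer's maintenance plan, but it makes postponed work visible instead of silently forgotten.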
Load test
A load test is usually carried out when the data center is first commissioned. As a rule, it covers all the critical sections of the electrical circuit described above. However, once the site is in operation, it is difficult to run a load test without disconnecting power unless the facility is Tier 3 or Tier 4. Opinions differ on whether ongoing load tests are needed. Some specialists insist on regular load testing. Some large data centers have their own load banks, which can be connected to key points of the electrical system prepared in advance.
Operators of other data centers consider a load test optional and, under normal conditions, an additional risk of a power outage; in their view it should be run only if some equipment behaves suspiciously or has recently been replaced. This applies above all to small Tier 1 and Tier 2 sites, where load banks have to be rented and temporarily connected to the electrical panels. In these cases, of course, the critical load must either have an additional power source and switches for rerouting power without shutting the load down, or be disconnected for the duration of the load test.
One of the more debated questions is the load testing of battery strings, either directly or with a load bank connected to the UPS, because each full discharge of a battery reduces its cycle life and capacity. Even after a successful load test, a single cell may fail the next day, and if utility power is then lost, the critical load will go down. The only way to reduce this risk is to use several battery strings.
Planning, instructions, training and control
Needless to say, this article gives only a high-level overview of maintaining backup power systems in data centers. Actual maintenance procedures are defined by the recommendations and requirements of each manufacturer and should be carried out only by qualified, trained service personnel. In addition, key data center personnel, such as shift supervisors, should oversee the routine maintenance performed by third-party service companies and in-house technicians and make sure the relevant rules, instructions and procedures are followed. The staff should be familiar with, and even able to perform, some basic and emergency procedures, such as manual operation of the equipment, starting the generator in manual mode, and switching the UPS maintenance bypass in and out.
Personnel should have detailed written instructions for these procedures, which should be reviewed and updated as needed. Equipment suppliers or service personnel should provide staff training and run refresher courses every six months or once a year. The ability of in-house staff to take over and work in manual mode, for example to engage an emergency bypass, can prevent data center downtime.
In addition, detailed written instructions for equipment maintenance, together with supervision by in-house staff to ensure they are followed, can prevent a complete shutdown of the data center. These instructions are especially needed when maintenance is performed by new service personnel who are not yet fully familiar with the equipment and systems installed in the data center. Emergency action plans should be posted in visible, easily accessible places. They should contain labeled diagrams of the equipment switches, with a description of the sequence of operations and of their use in an emergency. A short instruction for emergency procedures, no more than one or two pages, should also be posted on or near the UPS; it can include information on manual operation of the backup and uninterruptible power systems.
The quality and frequency of maintenance depend on the size of the data center and of its technical department. Organizations with a large data center housed in a dedicated building often have far more experienced staff than companies whose data center is small, up to 100 square meters. The general culture and training level of technicians vary widely. Also, because most maintenance procedures are covered by service contracts with equipment manufacturers or with one or more service organizations or subcontractors, someone in data center management has to know the work schedule: what work is being done, by whom, and who supervises it.
Each data center may have different types of equipment and different maintenance requirements, but every site needs preventive measures that do not affect the operation of the IT equipment. Some administrators try to avoid full failover tests and large-scale maintenance of critical systems because they could potentially "go wrong". This merely trades a known risk on the day of scheduled maintenance for an unknown risk on the other 364 days.
By avoiding maintenance, IT personnel expose the data center to the risk of downtime from faults that were not detected in time, that are not noticeable during normal operation and that show up only when the main power source fails. Proper preparation, planning, supervision and instructions for maintenance procedures, along with support from senior management, are essential to keep a routine planned activity from turning into unplanned downtime.
- Source: http://dcnt.ru