
HSE: HPC TaskMaster System for monitoring the effectiveness of tasks on a supercomputer

Developers: Higher School of Economics (HSE)
Last Release Date: 2023/09/12
Technology: Office applications

2023: Inclusion of HPC TaskMaster in the register of Russian software

HPC TaskMaster, a system for monitoring the efficiency of tasks on a supercomputer developed at the Higher School of Economics, has been included in the register of Russian software. This required a large amount of preparatory work. Having proved the effectiveness of the system, Vyshka (HSE) is ready to distribute it to other universities and companies. The Higher School of Economics announced this on September 12, 2023.

Since the advent of supercomputers, ensuring that they are used efficiently has been one of the most important tasks. Large computing clusters typically get monitoring systems of their own. The supercomputer "cHARISMa," which employees, teachers and students of the Higher School of Economics use for scientific research, is no exception.

The domestic HPC TaskMaster software system was developed and implemented in 2022. It helps supercomputer users perform calculations more efficiently: it provides informative reports on completed tasks, flags errors, and gives recommendations for improving efficiency.

Our system can itself recommend to supercomputer users what needs to be improved in their computing tasks and, in case of serious errors, can even stop one user's incorrectly running calculations so that another user's efficient tasks can start. This approach prevents equipment downtime and increases the number of studies that can run simultaneously,
said Pavel Kostenetsky, head of the department of supercomputer modeling at the Higher School of Economics.

In addition, the system makes it possible to use the resources of the supercomputer as a whole more efficiently, saving expensive machine time. According to statistics, in the first half of 2023 the system increased the effective performance of the supercomputer by 20.5%.

To be included in the register, a program must meet many legislative and technical requirements. The specialists therefore had to carry out a fairly extensive body of work, revising the technologies used and documenting the developed system in detail. For example, they had to abandon Ubuntu Linux as the base operating system due to license restrictions. The security of the code was improved, a set of test data was prepared to demonstrate the system to the expert, and options were developed for deploying the system both in a Docker container and as a virtual machine image.

Having proven the effectiveness of the system in our country, we are ready to distribute it to other universities and companies. In this way, we want to share our experience and increase the efficiency of interested scientific and production teams. The inclusion of HPC TaskMaster in the register of Russian software is a significant step in this direction. Adapting the software and its description to the requirements of the register is a laborious process, but it shows that the product meets all the requirements of Russian legislation and is ready for use outside Vyshka,
said Dmitry Bondar, Senior Director for Digital Transformation at the Higher School of Economics.