Project

MKB implemented a data catalog

Customers: Moscow Credit Bank (MKB)

Moscow; Financial Services, Investments and Auditing

Product: OpenMetadata
Second product: Kubernetes
Third product: PostgreSQL DBMS

Project date: 2023/04 - 2023/09

Content

2023: Implementation of the OpenMetadata data catalog

MKB (Moscow Credit Bank) has implemented the open source data catalog OpenMetadata. Its task is to streamline work with data, make that work faster, and improve the quality of the data itself, MKB representatives said on November 16, 2023. According to the bank's experts, the solution saves analysts about a third of their working time.

For the bank, data of many different kinds is the basis for decision-making, including managerial decisions, for building recommendation systems and, of course, for scoring. If this data is not of sufficient quality (the main criteria being correctness, relevance and completeness), decisions can be incorrect. They can also be late if finding the necessary data takes too long (up to 80% of analysts' working time). Both problems grow more serious as the business develops: the volume of data grows, so finding the right information becomes harder.

On top of this, information about the data (metadata) was stored in the bank in a scattered way: in Confluence, Jira and various spreadsheets. Only the analysts who work with particular data can find the necessary information quickly. The departure of such an IT specialist can therefore lead to a partial loss of expertise in a given data segment.

How will the catalog help?

According to MKB, introducing a data catalog will help increase trust in the data, simplify the search for the data that is needed, and get rid of so-called orphaned data, for whose quality and condition no one is responsible.

Another important factor is data security. There is no one hundred percent guarantee against data leaks and losses, but the risks must be minimized. To do that, the data must be ranked by criticality, and the bank must know where and how it is stored, by what means it is protected, which employees have access to it, with what level of privilege, and so on. In addition, if an incident does occur, a catalog makes it clear which data blocks were compromised, which makes it easier to deal with the consequences.
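The two ideas above (orphaned data and incident scoping) can be illustrated with a minimal Python sketch. It does not use the OpenMetadata API. The `CatalogEntry` structure, the 1 to 3 criticality scale, and all asset and employee names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """One data asset as a catalog might describe it (hypothetical model)."""
    name: str                     # e.g. "dwh.public.loans"
    system: str                   # where the data is stored
    criticality: int              # hypothetical scale: 1 = low, 3 = high
    owner: Optional[str] = None   # None means nobody is responsible (orphaned)
    access: Dict[str, str] = field(default_factory=dict)  # employee -> privilege


def orphaned(entries: List[CatalogEntry]) -> List[str]:
    """Names of assets that have no responsible owner."""
    return [e.name for e in entries if e.owner is None]


def affected_by_incident(entries: List[CatalogEntry], system: str) -> List[CatalogEntry]:
    """Assets stored in the compromised system, most critical first."""
    hit = [e for e in entries if e.system == system]
    return sorted(hit, key=lambda e: e.criticality, reverse=True)


# Hypothetical sample data.
entries = [
    CatalogEntry("dwh.public.loans", "dwh", 3, "analyst_a",
                 {"analyst_a": "write", "analyst_b": "read"}),
    CatalogEntry("dwh.public.branches", "dwh", 1, "analyst_b"),
    CatalogEntry("crm.public.leads", "crm", 2),  # no owner recorded
]

print("Orphaned:", orphaned(entries))
for entry in affected_by_incident(entries, "dwh"):
    print(entry.name, "criticality", entry.criticality, "access", entry.access)
```

With this information recorded per asset, both questions ("who is responsible?" and "what was exposed?") become simple lookups rather than investigations.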

How to choose a solution

Under current conditions, when solutions from large foreign software vendors are not available, a bank can create a data catalog in two ways: develop it in-house or use a ready-made open source solution. In-house development is very costly in terms of time and money. With a ready-made open source solution, the implementation itself runs into difficulties integrating with the bank's existing information systems, because the documentation is often not detailed enough and experienced implementers with the necessary expertise may simply not be available on the market.

As a result, the OpenMetadata data cataloging system was selected. Any data catalog must be able to connect to source systems and read their metadata. This is data about data: the tables, their structure, where they reside, the name of the database, the name of the schema, the name of the table. Ideally, it also includes comments. OpenMetadata can do this.
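To make "reading metadata from a source system" concrete, here is a minimal Python sketch. It is not how OpenMetadata itself is implemented. It uses only the standard `sqlite3` module against an in-memory database so that it runs anywhere, and the `TableMeta`, `ColumnMeta` and `harvest_metadata` names, as well as the sample tables, are hypothetical. SQLite has no schema level or column comments; a source such as PostgreSQL would expose those as well.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnMeta:
    name: str
    data_type: str
    nullable: bool


@dataclass
class TableMeta:
    database: str
    table: str
    columns: List[ColumnMeta] = field(default_factory=list)


def harvest_metadata(conn: sqlite3.Connection, database: str = "main") -> List[TableMeta]:
    """Read table and column metadata from a SQLite connection."""
    tables = []
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table_name,) in rows:
        meta = TableMeta(database=database, table=table_name)
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, col, col_type, notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            meta.columns.append(ColumnMeta(col, col_type, not notnull))
        tables.append(meta)
    return tables


# Hypothetical source system with two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER NOT NULL, full_name TEXT)")
conn.execute(
    "CREATE TABLE loans (id INTEGER NOT NULL, customer_id INTEGER NOT NULL, amount REAL)"
)

catalog = harvest_metadata(conn)
for t in catalog:
    cols = ", ".join(f"{c.name}:{c.data_type}" for c in t.columns)
    print(f"{t.database}.{t.table} ({cols})")
```

The point of the sketch is that the catalog stores only the description of the tables, never the rows themselves.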

The system can receive metadata not only from databases but also from streaming platforms such as Apache Kafka, from Apache Airflow, and from BI systems.

Implementation and difficulties

The implementation took two months and consisted of a test deployment followed by a full-scale production deployment. Initially, for security reasons, it was decided to deploy OpenMetadata on a "test site" in the development environment. Kubernetes was used as the platform for running the OpenMetadata catalog, and PostgreSQL as the DBMS for storing metadata. This was a distinctive feature of this implementation, because the catalog's basic documentation was written for MySQL. In addition to OpenMetadata itself, MKB also deployed the systems the catalog needs for normal operation: Apache Airflow (for reading metadata from sources) and the Elasticsearch search engine.

The main difficulty is embedding the data catalog in the bank's processes, since this requires restructuring data management and development processes. The work involves heterogeneous systems. For example, the statement of work is written in Confluence, but once the analyst has implemented it, the resulting table is also described in the data catalog.

Each data domain must be given time to find, tag, and describe its data in the catalog. Instilling a culture of data management and forming the necessary habits is a key task.

Results and effectiveness

The catalog has been deployed and is in full operation at MKB. Eight of the bank's main information systems are connected to it: the corporate data warehouse, the CFT-Bank system, CRM, and others. The process of describing and tagging the data (filling the data catalog with information) continues.

According to MKB, compared with the previous way of organizing work with data, the catalog has already demonstrated its advantages: analysts save 32% of their working time (measured on work with data blocks that are already described in the catalog).

What's next

"MKB has big plans for the data catalog. The plan is to integrate the data catalog with a data quality (DQ) control tool. And of course, instilling a culture of data management and forming the necessary habits of working with data among bank specialists remains relevant," MKB representatives said.