Developers: | DataCatalog |
Launch date: | 2022 |
Last release date: | 2024/05/16 |
Industries: | Information Technology |
Technology: | Big Data |
Main article: Big Data
Arenadata Catalog is a tool for organizing data management as part of Data Governance practices.
2024
Development of a connector for integration with the Picodata DBMS
The DataCatalog team (part of the Arenadata Group) has completed development of a connector that makes Arenadata Catalog (ADC) compatible with the Picodata DBMS. Arenadata announced this on November 26, 2024.
Compatibility with "MDM Harmony"
On August 28, 2024, DataCatalog (part of the Arenadata Group) and Navicon announced the completion of compatibility testing between Arenadata Catalog (ADC) and the reference and master data management system "MDM Harmony". Integrating the two solutions will allow Russian business customers to use these products together when building complex IT systems for data management.
The integration of Arenadata Catalog with MDM Harmony was verified in joint tests on a specially deployed test bench. Arenadata Catalog users can now rely on the cleanliness, relevance and consistency of the company's master data, which supports fast and accurate analytics and the decisions based on it.
"Metadata management and master data management are traditionally closely intertwined, together providing an integrated approach to data organization and quality. MDM Harmony focuses on providing uniform, accurate and up-to-date data such as customer, product and supplier information. Integration with the Arenadata Catalog metadata management system makes it possible to track and manage data at the metadata level: its origin, structure and relationships. This helps identify and resolve inconsistencies, duplicate records and other issues, which ultimately improves overall data quality and consistency. With a unified, integrated approach to data management, executives and analysts receive more complete and up-to-date information for decision-making. This supports effective planning, strategic analysis and a prompt response to changes in the business environment," said Ivan Novosyolov, CEO of DataCatalog.
"Arenadata Catalog is popular among large Russian business customers, and demand for it keeps growing. The compatibility of our solutions will open up new prospects for market participants and make it possible to solve problems related to managing large data flows more effectively," commented Maria Averina, Director of Strategic Development at Navicon.
Apache Impala Compatibility
On May 16, 2024, Arenadata announced that DataCatalog (part of the Arenadata Group) had tested a connector that provides compatibility between the Arenadata Catalog (ADC) product and the Apache Impala service, which is part of the Arenadata Hadoop (ADH) enterprise distribution.
According to the company, the connector makes it possible to import Impala object descriptions into the catalog, profile data, and configure custom data quality checks in Impala. This is not the first module to integrate with the Hadoop ecosystem: a connector for the Hive service was released earlier.
The Hadoop ecosystem is a de facto standard in business scenarios involving the storage, processing and analysis of large amounts of data of arbitrary types. Steady demand for systems of this class is driven by digitalization, the growth of unstructured data, and the growing number of related projects.
Responding to customers' need for high-performance analysis of big data stored on Arenadata Hadoop, Arenadata added Apache Impala, a distributed SQL query engine, in a subsequent update. Impala is designed for massively parallel processing (MPP) of very large volumes of data.
Impala was designed as a faster and more efficient SQL query engine than traditional SQL-on-Hadoop components (Hive, Spark SQL). Support for the service improved product performance in a number of business scenarios, including so-called data sandboxes for ad hoc analysis by data analysts.
A number of Arenadata customers took advantage of the opportunity to speed up SQL processing and data analysis by using Impala instead of Hive in their data lakes. However, the lack of support for this service in Arenadata Catalog kept some of them from moving production workloads to Impala. Rapid development and delivery of the metadata connector ensured continuous metadata tracking across systems and removed this obstacle.
Metadata from integrated systems forms the basis of a data catalog. Integrating Impala object metadata gives Arenadata Catalog users an up-to-date and complete view of the service's objects: they can include them in the lineage graph, explore their links to objects in other source systems, and link them to the organization's business entities. The Arenadata Catalog administrator can supplement the automatically collected Impala metadata with an extended description and custom attributes. Like all other objects in Arenadata Catalog, Impala objects can have an owner and be classified by level of business criticality.
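The lineage graph can be pictured as a directed graph in which each object points to the objects it reads from. The example below is a minimal sketch under that assumption. It is not Arenadata Catalog's internal model or API, and the object names in `UPSTREAM` and the function `upstream_sources` are hypothetical. It performs a breadth-first walk to list every direct and indirect source of a given object.

```python
from collections import deque

# Hypothetical lineage edges: each key is an object, each value lists
# the objects it reads from (its direct upstream sources).
UPSTREAM: dict[str, list[str]] = {
    "bi.report_sales": ["impala.mart.sales_view"],
    "impala.mart.sales_view": ["impala.dwh.orders", "impala.dwh.customers"],
    "impala.dwh.orders": ["hive.raw.orders"],
    "impala.dwh.customers": ["greenplum.crm.clients"],
}


def upstream_sources(obj: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every object that feeds `obj`, directly or transitively,
    in breadth-first order (nearest sources first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(edges.get(obj, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # skip shared sources and stop on cycles
        seen.add(node)
        order.append(node)
        queue.extend(edges.get(node, []))
    return order


if __name__ == "__main__":
    for source in upstream_sources("bi.report_sales", UPSTREAM):
        print(source)
```

With the sample edges, the walk from `bi.report_sales` lists the Impala view first, then the two Impala tables, and finally the Hive and Greenplum sources behind them.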
"The data storage technology landscape of Russian enterprises is complex and fragmented. In the past, products from foreign vendors were widely used to build enterprise data warehouses (EDW); as of May 2024, solutions based on open source are being developed and deployed. In the long term, domestically produced software will take the leading position. This is why Arenadata Catalog regularly expands its list of connectors to popular data sources and platforms of every type, developing them in-house," noted Ivan Novosyolov, CEO of DataCatalog.
Custom data quality checks and automatic collection of data profiling metrics can be configured for Impala data integrated into the catalog. For example, you can check a database table for duplicate values or check that a column contains no null values. The results of the checks are compiled into a final data quality report. For Apache Impala, the catalog can also build visual Data Lineage between tables and views, including column-level lineage. Starting from an analytical report, you can now trace how data was transformed across systems: which attributes of which tables in which database supplied the information, where those attributes in turn received it from, and which other information systems are involved.
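The two example checks (duplicate values and missing values in a column) and the final report can be sketched generically as follows. This is a hypothetical illustration in plain Python over in-memory rows, not Arenadata Catalog's check engine. In practice such checks would run as queries against Impala. The function names and the report layout are assumptions.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def check_no_duplicates(rows: list[Row], key: str) -> dict[str, Any]:
    """Fail if any value in column `key` occurs more than once."""
    counts = Counter(row.get(key) for row in rows)
    dupes = [value for value, n in counts.items() if n > 1]
    return {"check": f"no_duplicates({key})", "passed": not dupes, "details": dupes}


def check_not_null(rows: list[Row], column: str) -> dict[str, Any]:
    """Fail if `column` is missing or None in any row; report row indexes."""
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"not_null({column})", "passed": not bad, "details": bad}


def quality_report(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize individual check results into a final report."""
    passed = sum(r["passed"] for r in results)
    return {
        "total": len(results),
        "passed": passed,
        "failed": len(results) - passed,
        "results": results,
    }


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer_id": 10},
        {"order_id": 2, "customer_id": None},
        {"order_id": 2, "customer_id": 12},
    ]
    report = quality_report([
        check_no_duplicates(orders, "order_id"),
        check_not_null(orders, "customer_id"),
    ])
    print(report["total"], report["passed"], report["failed"])  # prints "2 0 2"
```

Keeping each check as an independent function that returns a uniform result makes it straightforward to add further checks to the same report.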
Arenadata Hadoop (ADH) is an Apache Hadoop-based enterprise distribution for storing and processing semi-structured and unstructured data.
Use cases:
- Storage and processing of large volumes of semi-structured and unstructured data of any type (document and content management systems, event storage and logging, sensor data, product catalogs, backups of other DBMSs).
- Distributed information processing.
- Building data lakes and data factories (a single hub for all company data, rapid deployment and teardown of sandboxes for pilot projects and testing statistical hypotheses, working with analytical tools in a single environment).
- Machine learning and artificial intelligence.
- Data source for an enterprise data warehouse (EDW).
- Import substitution of Western systems.
Arenadata Hadoop has received a certificate of state registration of the computer program. The product is included in the unified register of Russian programs for electronic computers and databases.
2023
Arenadata Catalog 0.3 Release with Enhanced Glossary Capabilities
DataCatalog announced on June 20, 2023 the release of Arenadata Catalog 0.3, the next version of its data management tool. Arenadata Catalog is intended for organizations that want to implement Data Governance practices. It lets them manage the company's information assets and maintain the corporate business glossary in a single interface. The most significant improvements in this version concern the Glossary module: users can now extend the list of term types, maintain an attribute register and run full-text searches.
In Arenadata Catalog 0.3, the developers significantly expanded the Glossary and added predefined out-of-the-box term types: "business term," "entity," "data attribute," "calculated data attribute" and "indicator." Each term type has its own set of attributes that users can extend. The "entity," "data attribute," "calculated data attribute" and "indicator" types support a special "entity-attribute" relationship type.
Users can also add their own term types, and a dedicated constructor helps manage the set of attributes, their order, and whether they are mandatory.
Attributes are fully managed in the attribute register: users can set validation rules, filling instructions, the number of allowed values, and a default value. Maintaining such a register allows attributes to be reused across different term types.
Importing data into the Glossary speeds up putting the software into commercial operation: the import can both create new terms and update existing ones.
This version of Arenadata Catalog implements full-text search in the Glossary, which finds both data catalog objects and objects of the Glossary itself. Users can also subscribe to Glossary objects such as "terms," "subject areas," and "glossaries" and receive notifications about changes to them.
The developers have added an interface for managing user tasks. Administrators can monitor due dates, find tasks without an assignee, and delegate tasks to other users if the person responsible for approval is unavailable.
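The following hypothetical sketch shows what a register entry might hold: a validation pattern, filling instructions, single- or multi-valued cardinality, and a default value. The class `AttributeDef`, its fields, and the sample attributes are invented for illustration and do not reflect the product's actual data model. Because definitions are plain values, the same entry can be reused by several term types.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical attribute-register entry; field names are illustrative."""
    name: str
    pattern: Optional[str] = None   # validation rule as a regular expression
    instructions: str = ""          # filling hint shown to the user
    multi_valued: bool = False      # single value vs. list of values
    default: Optional[str] = None   # used when no value is entered
    required: bool = False

    def resolve(self, value=None):
        """Apply the default, then validate; return the value or raise ValueError."""
        if value is None or value == "" or value == []:
            value = self.default
        if value is None:
            if self.required:
                raise ValueError(f"{self.name}: value is required")
            return [] if self.multi_valued else None
        values = value if isinstance(value, list) else [value]
        if not self.multi_valued and len(values) > 1:
            raise ValueError(f"{self.name}: only one value allowed")
        for v in values:
            if self.pattern and not re.fullmatch(self.pattern, v):
                raise ValueError(f"{self.name}: {v!r} does not match {self.pattern}")
        return values if self.multi_valued else values[0]


# One register entry reused by two term types (hypothetical names).
OWNER_CODE = AttributeDef(
    "owner_code",
    pattern=r"[A-Z]{3}-\d{2}",
    instructions="Department code, e.g. FIN-07",
    default="DGO-01",
)
TERM_TYPES = {
    "business term": [OWNER_CODE],
    "indicator": [OWNER_CODE, AttributeDef("unit", required=True)],
}
```

For example, `OWNER_CODE.resolve(None)` falls back to the default `"DGO-01"`, while `OWNER_CODE.resolve("fin7")` raises `ValueError` because the value fails validation.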
Other changes in Arenadata Catalog 0.3:
- the Greenplum adapter was revised;
- a connector for Luxms BI was added, with automatic Data Lineage down to the column level of the source table;
- publication of terms is blocked if the users responsible for approval cannot be determined;
- tasks can be given a due-date status: "At risk," "Overdue," "Normal";
- the algorithm for generating user task names was updated, so task names now contain "Task Type," "Event Type," and "Term Name";
- terms can be linked to one another with a specified link type.
"This is a long-awaited release both for our customers already implementing Arenadata Catalog and for companies running pilot projects. The main functionality of Arenadata Catalog 0.3 focuses on building a comprehensive and flexible conceptual data model that lets business and IT create a single Glossary for communicating about and describing data. We see demand for this functionality among customers, as well as the need for us to flexibly support the various ways companies implement data management processes," commented Ivan Novosyolov, CEO of DataCatalog.
"We often heard from customers that they wanted to customize the Glossary to their specific requirements. Moreover, from the very beginning of Arenadata Catalog, we noticed how limited the tools on the market were when it came to configuring Glossary objects and their sets of attributes, and most open-source tools lack these capabilities entirely. So we decided to make this functionality one of the main features of Arenadata Catalog and worked for a long time to make it as versatile and convenient as possible. Users can now create attributes of various types, from standard 'string' and 'number' to more specialized ones such as 'calculation formula' and 'Boolean'," noted Rasil Saifullin, product owner of Arenadata Catalog at DataCatalog.
For each attribute, you can also specify allowed values, hints and filling instructions. This makes it possible to meet almost any requirement for building the Glossary while accounting for the specifics of each industry. Extensive validation options reduce errors and improve the accuracy of information management, which increases business users' trust in the tool and how often they use it.
Arenadata Catalog Capabilities
As of March 2023, Arenadata Catalog makes it possible to:
- integrate metadata from different data processing and analysis systems;
- search for data and collaborate on metadata;
- maintain an enterprise business glossary integrated with the data catalog.
Arenadata Catalog is based on open source technologies, is fully adapted for use in Russian commercial and government organizations, and is included in the Unified Register of Russian Software.
2022
In 2022, Arenadata, a provider of a big data management platform, and the Luxms Group, a provider of BI and ETL systems (Luxms BI and Luxms Data Boring), joined forces to help Russian companies and organizations use data efficiently.
Their joint venture, DataCatalog, develops Arenadata Catalog, a product that supports Data Governance processes.
The company's strategy centers on creating an open source software product for the largest Russian companies implementing Data Governance approaches, featuring:
- support for metadata integration, including Russian and open-source software
- architecture based on open metadata exchange standards
- focus on user experience and usability
- automatic detection of data subject to regulation in Russia (TIN, addresses, etc.)
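Automatic detection of regulated identifiers such as TINs is often done by combining a pattern match with a checksum. The example below is a sketch under that assumption and is not Arenadata Catalog's detection logic. It scans text for stand-alone runs of 10 or 12 digits and keeps only those whose check digits are valid under the standard Russian TIN (INN) checksum. The function names are hypothetical.

```python
import re

# Checksum weights for Russian TINs (INN): 10 digits for organizations,
# 12 digits (two check digits) for individuals.
_W10 = (2, 4, 10, 3, 5, 9, 4, 6, 8)
_W11 = (7, 2, 4, 10, 3, 5, 9, 4, 6, 8)
_W12 = (3, 7, 2, 4, 10, 3, 5, 9, 4, 6, 8)

# Stand-alone runs of exactly 12 or 10 ASCII digits.
_CANDIDATE = re.compile(r"(?<![0-9])(?:[0-9]{12}|[0-9]{10})(?![0-9])")


def _check_digit(digits: list[int], weights: tuple[int, ...]) -> int:
    # zip stops at the shorter sequence, so only the leading digits are weighted
    return sum(d * w for d, w in zip(digits, weights)) % 11 % 10


def is_valid_inn(value: str) -> bool:
    """Validate the check digit(s) of a 10- or 12-digit INN."""
    if not (value.isascii() and value.isdigit()):
        return False
    d = [int(c) for c in value]
    if len(d) == 10:
        return _check_digit(d, _W10) == d[9]
    if len(d) == 12:
        return _check_digit(d, _W11) == d[10] and _check_digit(d, _W12) == d[11]
    return False


def find_inns(text: str) -> list[str]:
    """Return digit runs in `text` that look like INNs and pass the checksum."""
    return [m.group() for m in _CANDIDATE.finditer(text) if is_valid_inn(m.group())]
```

The checksum step filters out most random 10- or 12-digit numbers, such as order IDs, that a pattern match alone would flag.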