Developers: UniData
Last Release Date: 2022/11/17
Technology: Data Quality, MDM - Master Data Management
The Unidata data management platform is the flagship product of Unidata, built on a modern stack of free and open-source software (FOSS). The transition to a product platform concept, which supports extending the basic logic and using adaptive modules, has made it one of the most powerful and effective data management solutions on the market.
The unified product roadmap follows current industry trends in data governance, data quality assurance (DQ) and reference data management (NSI), as well as new technologies and methods for processing large volumes of data. Software development and testing rely on modern approaches that are continuously refined.
In general, Unidata is a multifunctional platform for building enterprise data management systems, providing:
- Centralized data collection (inventory and accounting of resources)
- Standardization of information (normalization and enrichment)
- Accounting for current and historical information (record version control, data validity periods)
- Data quality maintenance and statistics
The range of solutions covers a wide spectrum of needs: counterparty management, personnel data, materials and logistics, product nomenclature, and customer databases.
The range of application domains is equally broad: public administration, transport, industry, finance, energy, retail and wholesale trade in goods and services, oil and gas transportation, and pharmaceuticals.
History of development
2022: Compatible with Libercat and Astra Linux
Unidata announced on November 17, 2022 that it had completed compatibility testing of the Unidata platform and products based on it, in particular Unidata MDM, with the Axiom JDK Java Development Kit (developed by BellSoft), the Libercat application server (also developed by BellSoft) and the Astra Linux CE 2.12.44 ("Orel") operating system.
Axiom JDK is a Java development and runtime environment with Russian technical support for commercial and state-owned companies. Libercat is a Java EE application server. Astra Linux is an operating system based on the Linux kernel, promoted in Russia as an alternative to Microsoft Windows and foreign Linux distributions.
Unidata is a data management platform built on a modern free and open-source software (FOSS) technology stack. It is a multifunctional platform for building enterprise data management systems, providing centralized data collection (inventory and accounting of resources), standardization of information (normalization and enrichment), accounting for current and historical information (record version control, data validity periods), and data quality and statistics. The product offers a wide range of information management capabilities for obtaining up-to-date, reliable data.
The main functions of the Unidata platform include centralized data collection, duplicate search and consolidation, data analysis and statistics, data standardization and quality assurance, uploading data to third-party information systems, data management based on internal enterprise regulations, and managing data access rights. The transition to a product platform concept, which supports extending the basic logic and using adaptive modules, has produced powerful and efficient data management solutions. The platform is included in the Register of Russian programs for electronic computers and databases of the Ministry of Telecom and Mass Communications of Russia (Certificate No. 2016610744).
This series of joint tests was conceived by Unidata together with its technology partners to support customers' import substitution efforts, in line with the company's overall strategy. Obtaining the certificate was a logical consequence of Unidata's course toward partnership with Russian IT companies.
The tests demonstrated full compatibility of the companies' product lines with no loss of performance. Completing them reduces customers' risk of "patchwork" import substitution and allows a complete move to an import-substituted stack for enterprise master data management.
2020
Unidata's inclusion in the Gartner Magic Quadrant for the third year in a row
In early December 2021, the international analytical agency Gartner published the Magic Quadrant for Master Data Management Solutions, a study reviewing the world's leading master data management solutions. UniData entered the list of the best players in the data management market for the third year in a row.
Extending ETL Functionality for Data Retrieval Tasks from Multiple Sources
On May 18, 2020, Unidata announced that its platform had received a significant expansion of ETL functionality for solving problems of obtaining data from various sources. This is the classic chain: data loading, validation, mapping (matching fields to the target model), aggregation, and unloading to the target system.
"This approach allows you to create a single ecosystem for working with our customers' data. We have documented our developments in this area as a separate module. The functionality itself was presented long ago, but in this extension we not only brought it into the platform, but also enriched it with analysis and many other capabilities. As a result, we have designed a seamless system for working with data both at the acquisition stage and at the processing stage. All this allows our customers to work in the platform as a unified data management system and to use the same tools at every stage. The greatest effect is achieved by the synergy of combining ETL with DG and MDM."
According to the developer, the following ETL functionality was added to the platform: impact analysis (analysis of the consequences of data changes), version control of models, and an extensive data validation library.
The developers do not intend to stop there: as of May 2020, work had already begun on a streaming ETL mode with Data Quality.
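Below is a minimal sketch of the classic chain described above (load, validate, map, aggregate, unload). All function names, file layouts and field names are illustrative assumptions, not the platform's actual API.

```python
# Illustrative ETL chain: load -> validate -> map -> aggregate -> unload.
# Names and file layouts are hypothetical, for illustration only.
import csv
from collections import defaultdict

def load(path):
    """Load raw records from a CSV source file."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def validate(records):
    """Keep only records that pass basic completeness and type checks."""
    def ok(r):
        amount = r.get("amount", "")
        return bool(r.get("id")) and amount.replace(".", "", 1).isdigit()
    return [r for r in records if ok(r)]

def map_fields(records):
    """Map source fields onto the target model."""
    return [{"key": r["id"], "value": float(r["amount"])} for r in records]

def aggregate(records):
    """Aggregate values per key before unloading."""
    totals = defaultdict(float)
    for r in records:
        totals[r["key"]] += r["value"]
    return totals

def unload(totals, target):
    """Write aggregated results to the target system (here, a file)."""
    with open(target, "w", encoding="utf-8") as f:
        for key, value in sorted(totals.items()):
            f.write(f"{key};{value}\n")

if __name__ == "__main__":
    unload(aggregate(map_fields(validate(load("source.csv")))), "target.csv")
```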
Unidata Platform Editions
As of January 2020, there are several editions of the Unidata platform, including a secure FSTEC-certified edition and an edition for processing large volumes of data (Unidata High Performance Edition). There is also a Data Governance (DG) product based on the platform: a set of tools for connecting and tracking all data-related user actions across every level of an enterprise of any scale, from the annual reports of company management to a single column in a regional division's table.
In addition, as part of UniData's participation, together with Russian universities, in forming a community of data management specialists, an open edition of the platform was released: Unidata Community Edition (CE).
2019
Compatibility with Alt 8 SP OS
The IVK Group of Companies, a Russian manufacturer of computer equipment, system software and information protection tools, announced on November 29, 2019 that it was expanding the ecosystem of Russian application software compatible with the Alt 8 SP operating system. Test results confirmed full compatibility with UniData.
Alt Server and Alt Workstation Compatibility
On November 15, 2019, Unidata announced that it had tested the Unidata data management platform in the Alt family of operating systems together with BASEALT.
Full compatibility and correct operation were confirmed for the server side of the Unidata platform on the Alt Server 8 operating system, and for the Unidata web client on the Alt Workstation 8 and Alt Workstation K 8 operating systems, which allows recommending the platform for use with the Alt family of operating systems.
Unidata. Generation 5
Unidata introduced the fifth generation of its eponymous data management platform in June 2019. Keeping its main course toward greater product maturity and the use of new technologies, the developer presented the main innovation: a qualitative, systemic performance increase on large data volumes. This release lays the foundation for the Unidata product line in Data Governance. The most important new feature is Data Lineage, a capability for exploring and documenting the flows of data into the storage. Another key update is a global reworking of classifiers (Classifiers 2.0): in particular, quality rules can now be created on attributes of classifier nodes, by analogy with data records. Collaboration capabilities were also radically reworked, with drafts and a change history introduced.
"We see absolutely clearly, and are consciously shaping, the new niche of Information Governance, which combines Enterprise Data Management with the traditional, well-known niche of collaboration tools, - explains Sergei Kuznetsov, General Director of Unidata. - I will note that we have brought the platform to a fundamentally new technical level. Over the coming year, our clients can expect new platform releases that develop this theme and adapt new directions to the realities of the domestic software market."
Significant changes also affected many other parts of the platform: the migration mechanism was substantially improved with backward compatibility in mind, and classifiers gained node copying and transfer, change checking, and node encoding methods. In addition, user interface usability was improved and the licensing service was updated.
2018: Updated to version 4.7
Unidata, a Russian software developer, announced in February 2018 that it was upgrading the Unidata platform to version 4.7.
According to the developers, the update includes a number of substantial changes. Data import functionality has been expanded: data can now be loaded not only as a full archive but also partially; for example, only links can be loaded. The import task supports two modes of operation, one optimal for large data volumes (from 3-5 million records) and the other for average volumes.
The algorithms for working with data have also changed. The core of the platform, the data model, is now edited in draft mode, which allows making many preliminary changes safely and then publishing them. When comparing records, information about the previous attribute value is available, making it easier to resolve record conflicts. Duplicate records can now be identified and merged automatically, and merged records can be split again as needed.
Integration capabilities allow customizing the user interface to the client's needs and applying extension points to search operations and links. Integration with external access control systems has also been expanded, allowing non-standard authorization methods such as single sign-on.
For easier administration, platform configuration settings are now available inside the application. Many other fixes were aimed at simplifying and streamlining user interaction with the interface.
2017
Upgrade to 4.7 High Performance Edition
Unidata, a Russian software developer, announced on December 15, 2017 the release of a special edition of its platform: Unidata 4.7 HPE (High Performance Edition).
This edition is designed for building data management systems for large corporations and enterprises with high requirements both for data volume (from 100 million records and up) and for the speed of working with it. This version of the platform includes additional modules for monitoring the performance of individual components and the solution as a whole, and the speed of data processing in batch and online modes. It adds modules that inform the administrator about deviations of current metrics from target values, specialized batch-processing modules designed for large data volumes, and detailed instructions for building and maintaining high-load master data management systems.
"When development of the UniData platform began, we carried out a large amount of research into the then-current landscape of MDM solutions and, most importantly, into the business requirements expected over 2-, 5- and 10-year horizons, - explained chief platform architect Alexey Tsyryulnikov. - It became clear that existing approaches were exhausting themselves and that the future lay with hybrid techniques: combinations of proven modern technologies such as NoSQL, search indexes, horizontally scaled stateless architecture, document-oriented and graph databases, in-memory computing, and traditional relational DBMSs. We needed to offset the shortcomings of each technology with the advantages of the others. This approach was used extensively in developing the platform and fully justified itself, as was clearly demonstrated in the summer of 2017 on release 4.5, when the platform was successfully tested on 1 billion records."
"Release 4.5 clearly demonstrated the right approach to supporting high-load systems, - said Sergei Kuznetsov, General Director of Unidata. - But, as you know, reaching record levels is not a sufficient condition or proof of suitability for industrial use. So we set ourselves the task of creating a special edition of our platform for building high-load systems, with a number of advanced, deeper settings that are not needed on smaller volumes of up to hundreds of millions of records."
To this end, the company's specialists analyzed and separated the typical tasks that arise when operating the platform with up to 100 million records and with more than 100 million, and identified the typical scenarios that arise from large volumes of input data. The result was the special edition together with a methodology for implementing and operating the platform on large data volumes. In parallel with the release of the edition, the company's Training Center launched the course "The Unidata Platform in High-Load Projects."
Streaming and Online Data Quality Management Modes
Unidata, the maker of the eponymous data management platform, announced on August 24, 2017 the implementation in the platform of uniform filtering, cleaning and normalization of data coming from different sources, checking data against specified criteria, and enriching data from internal and external sources.
In particular, it became possible to process records received via SOAP and REST requests from third-party information systems, apply data quality rules, and send the processed records back. The records themselves are not saved in the platform. This lets third-party enterprise information systems verify data in real time, check tens of millions of records in scheduled batch mode, and track the history of record quality over time.
There are two processing modes: streaming and online. In streaming mode, a large number of records is processed in a single data validation request; the result contains the corrected records and a list of errors, if any, and information about the errors found is stored in the platform database so it can later be retrieved by record identifier. In online mode, a single record is checked with a synchronous response; the result contains the corrected record and, if any, a list of errors.
"We are sure this approach will be in wide demand on the market, - said Sergei Kuznetsov, General Director of Unidata. - The essence is as follows: in streaming mode, a large number of records is processed in one pass. The Unidata platform does not load all incoming data into its storage; it processes the data 'on the fly.' After processing, the Unidata store contains only statistics on which records failed which quality checks. The platform stores only the primary keys of the processed data, not the entire records."
Upgrade to version 4.5
In early August 2017, the Unidata platform was updated to version 4.5. At the same time, the innovations affected almost all sections of the system, the Unidata company said.
Version 4.5 stands out for its data operator interface, significantly redesigned for efficiency: the color scheme and interface elements were changed to reduce operator fatigue, and the new navigation panel offers a configurable compact presentation mode. Search and record screens were simplified, with auxiliary functions and operations moved to context menus. Contexts and section states are now stored, letting the operator navigate between the main tools without losing input. New capabilities were added for viewing source records and system information about a record.
In addition, the data operator's search capabilities were expanded with fuzzy record search and search across system attributes and quality rule errors, and the Main Screen tool gained data quality statistics broken down by criticality and error category. Statistics for past periods can now be viewed.
Record quality assurance tools can now be used externally: in the external quality assurance service mode, the platform applies quality rules to records without saving them. The following APIs are provided (see the sketch after this list):
- Online check: a synchronous response to a request to validate a single record;
- Stream check: processes a large number of records in a single validation request. In this mode, the platform stores the check results and allows retrieving them later for each record separately.
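For illustration, here is a hedged sketch of how a client might call such online and stream check services over REST. The endpoint paths, payload shapes and field names are assumptions made for this example, not the platform's documented API.

```python
# Hypothetical client for the two external quality-assurance modes described
# above. Endpoint paths and payload shapes are illustrative assumptions.
import json
from urllib import request

BASE_URL = "http://unidata.example.local/api"  # placeholder host

def post_json(path, payload):
    """POST a JSON payload and decode the JSON response."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Online check: synchronous validation of a single record.
result = post_json("/dq/online-check", {"record": {"inn": "7801234567"}})
print(result)  # expected: the corrected record plus a list of errors, if any

# Stream check: one request validates many records; per the description
# above, results are stored and can be fetched later by record identifier.
batch = post_json("/dq/stream-check", {"records": [{"inn": "7801234567"}]})
print(batch)
```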
In addition, with the upgrade to version 4.5, the platform's ability to import data from databases has been expanded.
A billion records
Unidata, the developer of the data management platform of the same name, announced in May 2017 that, as a result of work to increase the system's efficiency, the platform had reached a scale of one billion records. Unidata thereby became (according to the company itself) the first company in the world able to work with a data array of this size in the field of reference data.
Development put the principle of "a billion for a million" at the forefront: a billion records on servers costing only a million rubles.
"This result was our response to the ever-growing needs of our customers, - commented Sergei Kuznetsov, General Director of Unidata. - That is why we decided that one billion records is the line to strive for. This success was preceded by long, systematic and careful work. Having included this bar among our main goals, we moved toward it step by step. I would like to thank our developers, whose serious work on scaling the platform lets us offer our customers a previously unimaginable billion records."
Upgrade to version 4.4
On April 3, 2017, the Unidata platform received another major update - version 4.4. The innovations affected all sections of the system.
For the data operator, the mechanisms for finding and processing records were expanded and simplified. The updates allow running complex queries with flexible search conditions and deleting any number of records with a new batch operation.
For the data administrator, the tool for loading and unloading the data structure (metamodel) was simplified. It exports the entire data model (registers, directories, classifiers, user settings, data quality rules, etc.) and supports partial or complete import of the metamodel. Interactive prompts help the data administrator maintain relationships in the data model. The new version also expands the list of quality functions and handles duplicate entries in directories.
Administration functions allow managing additional properties of user groups and verifying data access restrictions, with prompts shown to the system administrator about the consistency of the granted rights.
User settings were enhanced, allowing new buttons to be added, the appearance of attributes in the record card to be changed, and records to be searched and modified according to the specified settings.
Platform architecture
(Architecture diagrams of the platform for 2021 and 2017; images not reproduced.)
Platform modules
- Module - an external component that is dynamically connected to the platform, expanding the standard functionality
- Package - a thematic set of modules designed to comprehensively solve a specific technological or business problem
- Contractors
- 1C: Enterprise
- Data Dictionary
- EGRUL/SPARK/Kartoteka
- BPM Adapters
- Integrator
- STATE
- Social data
- GEO
Standard cycle for developing an organization's readiness for centralized data management
Main Platform Functions
- Centralized data collection
- Find and merge duplicates
- Data Analysis and Statistics
- Normalization, enrichment and data validation
- Sending data to third-party information systems
- Data management based on internal company regulations
The proprietary methodology for implementing the Unidata platform is based on the international DMBOK standard, fully adapted to the realities of the Russian market, and complemented by a set of industry modules.
Centralized data collection
Direct acquisition of data from external and internal information systems of the enterprise through the use of an extensive library of ready-made adapters, as well as specialized software interfaces that allow:
- Receive data as structured files (CSV, Excel, XML, JSON).
- Exchange data with relational databases.
- Exchange data with NoSQL databases, including distributed sources such as Hive, HBase, and others.
Data can also be acquired indirectly through standard adapters for most modern ETL-class tools and through APIs embedded in the enterprise integration bus. Arbitrary data transport can be used as well.
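As a rough illustration of what a unified intake of structured files might look like, the sketch below normalizes CSV, JSON and XML inputs into one list of dictionary records. The file layouts and the dispatch-by-extension scheme are assumptions for this example; the platform itself handles this through its adapter library.

```python
# Illustrative loader that reduces several structured formats (CSV, JSON,
# XML) to one uniform list of dict records. Layouts are assumptions.
import csv
import json
import os
import xml.etree.ElementTree as ET

def load_csv(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def load_json(path):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data if isinstance(data, list) else [data]

def load_xml(path, record_tag="record"):
    # Assumes a flat <record><field>value</field>...</record> layout.
    root = ET.parse(path).getroot()
    return [{child.tag: child.text for child in rec}
            for rec in root.iter(record_tag)]

LOADERS = {".csv": load_csv, ".json": load_json, ".xml": load_xml}

def load_any(path):
    """Dispatch on file extension to produce a uniform record list."""
    ext = os.path.splitext(path)[1].lower()
    return LOADERS[ext](path)
```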
Normalization, enrichment and data validation
- Data cleanup, noise removal, reduction to a single view, multiple classification.
- Enrichment of data from external sources such as SPARK, Federal Tax Service (FTS) databases and others.
- Consolidation of data of different origins.
- Checking data for completeness, integrity and consistency.
- Clarification of relationships, including via fuzzy algorithms; data segmentation.
- Publication of data quality reports tied to sources.
- Creation of tasks for manual validation, reconciliation and refinement of data.
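A minimal sketch of what a normalization step and a completeness/consistency check might look like follows; the field names and rules (including the INN length check) are illustrative assumptions rather than the platform's built-in rule library.

```python
# Toy normalization and quality checks; fields and rules are illustrative.
def normalize(record):
    """Trim noise and reduce free-form fields to a single canonical view."""
    out = dict(record)
    out["name"] = " ".join(record.get("name", "").split()).title()
    out["phone"] = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    return out

def check_quality(record, required=("name", "inn")):
    """Return a list of quality errors: completeness and consistency."""
    errors = [f"missing {field}" for field in required if not record.get(field)]
    inn = record.get("inn", "")
    if inn and not (inn.isdigit() and len(inn) in (10, 12)):
        errors.append("inn must be 10 or 12 digits")
    return errors

rec = normalize({"name": "  ivanov   ivan ", "phone": "+7 (812) 000-00-00",
                 "inn": "7801234567"})
print(rec)                 # normalized record
print(check_quality(rec))  # [] -> the record passes both checks
```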
Find and merge duplicates
- Set up duplicate search rules based on combinations of exact and fuzzy search.
- Use specialized, domain-specific comparison and search algorithms.
- Set up rules for automatic processing of duplicates.
- Create manual tasks to refine potentially duplicate data.
- Set up rules for merging record data in automatic and manual modes.
- Automatically and manually create links between similar records.
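The sketch below illustrates one way a duplicate-search rule can combine exact and fuzzy matching, in the spirit of the list above. The fields, thresholds and the pairwise scan are assumptions for illustration; a production system would use search indexes and blocking instead.

```python
# Toy duplicate detection combining an exact rule with a fuzzy one.
from difflib import SequenceMatcher

def is_duplicate(a, b, threshold=0.85):
    """Exact match on a key attribute, or fuzzy match on the name."""
    if a.get("inn") and a.get("inn") == b.get("inn"):
        return True  # exact rule: identical tax number
    ratio = SequenceMatcher(None, a.get("name", "").lower(),
                            b.get("name", "").lower()).ratio()
    return ratio >= threshold  # fuzzy rule: sufficiently similar names

def find_duplicates(records):
    """Naive pairwise scan over all record pairs."""
    pairs = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if is_duplicate(a, b):
                pairs.append((a, b))
    return pairs

print(find_duplicates([
    {"name": "OOO Romashka", "inn": "7801234567"},
    {"name": "OOO Romashka Ltd", "inn": ""},
]))
```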
Data management based on internal company regulations
- Set up change approval processes at the action and object level in standard BPMN notation.
- Manage the status of records based on role and corporate hierarchy.
- Use an electronic signature to approve changes.
- Receive automatic notifications about new, active and overdue tasks, including subordinates' tasks.
- Generate regular performance reports for different categories of users with the ability to distribute by data type.
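As a toy illustration of role-based status management for record changes, the sketch below models a tiny approval chain as a transition table. The statuses, roles and transitions are invented for this example; in the platform itself, such processes are modeled in BPMN.

```python
# Toy role-based approval chain; statuses and roles are invented.
TRANSITIONS = {
    ("draft", "editor"): "pending_review",
    ("pending_review", "steward"): "approved",
}

def advance(status, role):
    """Advance a record change along the chain if the role permits it."""
    try:
        return TRANSITIONS[(status, role)]
    except KeyError:
        raise PermissionError(f"role {role!r} cannot advance status {status!r}")

print(advance("draft", "editor"))            # -> pending_review
print(advance("pending_review", "steward"))  # -> approved
```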
Data Analysis and Statistics
- Full-text data search.
- Search for attributes by their values, including facet and classification search.
- Composite search across relationships between objects.
- Analysis of data origin (lineage) and change history.
- Upload data to Excel for manual analysis.
- Generate PDF and XLS reports that can be printed.
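The following sketch illustrates the idea behind full-text and facet search over records. It is a linear in-memory scan with invented sample data; real deployments use a search index for this.

```python
# Toy full-text and facet search over in-memory records.
from collections import Counter

RECORDS = [
    {"name": "Ivanov Ivan", "region": "SPb", "type": "person"},
    {"name": "Ivanova Anna", "region": "MSK", "type": "person"},
    {"name": "Ivanov LLC", "region": "SPb", "type": "company"},
]

def full_text(records, query):
    """Return records where any attribute value contains the query."""
    q = query.lower()
    return [r for r in records if any(q in str(v).lower() for v in r.values())]

def facets(records, field):
    """Count matching records per facet value, e.g. per region."""
    return Counter(r[field] for r in records if field in r)

hits = full_text(RECORDS, "ivanov")  # matches all three sample records
print(facets(hits, "region"))        # Counter({'SPb': 2, 'MSK': 1})
```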
Sending data to third-party information systems
- Automatically send data to source and consumer systems in synchronous and asynchronous modes.
- Set up notification of receipt of specific data, data from a specific source, or the fact of performing a specific operation on data.
- Send data in batches on a schedule, upon an event, or in real time.
- Integration with corporate "data transport."
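To make the distinction between the two delivery modes concrete, here is a minimal sketch: synchronous delivery as a plain blocking call, asynchronous delivery as a queue drained by a worker thread. The queue stands in for the enterprise integration bus; all names are illustrative.

```python
# Toy synchronous vs. asynchronous delivery to consumer systems.
import queue
import threading

def deliver(record, system):
    """Stand-in for actually pushing a record to a consumer system."""
    print(f"delivered record {record['id']} to {system}")

# Synchronous mode: the caller waits for delivery to complete.
deliver({"id": 1}, "consumer-A")

# Asynchronous mode: records are queued and sent by a background worker.
outbox = queue.Queue()

def worker():
    while True:
        record, system = outbox.get()
        deliver(record, system)
        outbox.task_done()

threading.Thread(target=worker, daemon=True).start()
outbox.put(({"id": 2}, "consumer-B"))
outbox.join()  # block until the queue has been drained
```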