
AI: from data to knowledge

For a modern digital enterprise, data is the foundation for developing effective management decisions, both operational and strategic. On the path to decision-making, however, "raw" source data turns into corporate knowledge. Today we see several areas of such transformation, both at the level of accumulating the company's digital experience and at the level of the data itself. The article is included in the TAdviser review "Artificial Intelligence Technologies."

Gartner analysts distinguish a separate group of IT solutions - Digital Experience Platforms (DXP) - for which a Magic Quadrant is released annually.

Gartner Magic Quadrant 2022 for DXP Solutions


In their analysis of DXP platforms, Gartner analysts primarily focus on the integration of client data from various sources. Among the leaders of this market segment is Adobe, which, according to Gartner, has a mature DXP solution that includes content management, analytics and personalization functions. Salesforce became one of the leaders by making a content management system, CRM and marketing automation functionality part of its ecosystem. Gartner analysts believe that the Salesforce DXP platform, together with CRM and CDP (Customer Data Platform), provides the most complete ecosystem of digital experience services.

Another interesting member of the Gartner DXP Magic Quadrant is Liferay. Its product, Liferay DXP, is distinguished by high integration capability (an extensive set of APIs and connectors, ready-made functions supporting B2B and B2E use cases), as well as the availability of an open-source version, which allows you to supplement the source code with your own developments and create closed commercial products. For example, the Russian company L2U is moving along this path: it has developed InKnowledge, a corporate knowledge base based on Liferay, and the L2U omnichannel platform.

Corporate KMS

The term "knowledge management systems" (CPS) began to be used back in the mid-1990s in connection with the tasks that arose when processing large amounts of information in large corporations. It is associated with the support of the processes of creating, distributing, processing and using knowledge within the enterprise.

In 2020, the global knowledge management software market, according to Verified Market Research, amounted to $22.45 billion; it is expected to reach $58.81 billion by 2028, growing at 12.5% annually over 2021-2028.

The task of a KMS is to accumulate not disparate data but structured and formalized knowledge, that is, rules, patterns and principles that allow solving real production problems. This makes deep corporate knowledge available to employees and reusable across the entire corporation. Knowledge is classified and categorized according to a specific architecture and systematic approaches to knowledge management.

• InKnowledge Knowledge Management System.

According to the developers, InKnowledge is a system that structures knowledge and aims to improve user experience. It is tailored for storing and organizing content: articles, news, documents, scripts, reusable fragments (for example, company details) and other types of content whose structure the user defines independently. All uploaded files are stored in the internal InKnowledge media library under a unique identifier, which allows, for example, replacing an image or document in one click everywhere it is used.
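
The one-click replacement mechanism is easy to picture in code. Below is a minimal hypothetical sketch (not InKnowledge's actual API): articles embed a file's unique identifier rather than the file itself, so swapping the payload behind the identifier updates every place it is used.

```python
# Hypothetical sketch of an ID-based media library (not the real InKnowledge API):
# articles reference files by identifier, so replacing the file behind an ID
# updates every article that embeds it.
import uuid

class MediaLibrary:
    def __init__(self):
        self._files = {}  # file_id -> file payload (path, bytes, URL...)

    def upload(self, payload) -> str:
        file_id = str(uuid.uuid4())   # unique identifier for the asset
        self._files[file_id] = payload
        return file_id

    def replace(self, file_id: str, new_payload) -> None:
        # One operation: every article embedding file_id now shows new_payload.
        self._files[file_id] = new_payload

    def resolve(self, file_id: str):
        return self._files[file_id]

library = MediaLibrary()
logo_id = library.upload("logo_v1.png")
article_a = f"... image:{logo_id} ..."   # articles store the ID, not the file
article_b = f"... header:{logo_id} ..."
library.replace(logo_id, "logo_v2.png")  # both articles now render logo_v2.png
print(library.resolve(logo_id))
```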

Knowledge placed in the system is distributed across various thematic areas, which can either intersect or be strictly isolated. In addition, the system is designed for integration with other systems, bots and virtual assistants, acting as a single information provider for them.

"The knowledge base helps companies reduce the time spent searching for relevant and reliable information and provide access to it for customers, partners, and the company's own branches and representative offices. In addition, using the knowledge base, you can set up a self-service portal for customers and close technical support lines without the participation of operators and managers," the company says.

• CraftTalk KMS Knowledge Base.

"In the CraftTalk platform, the knowledge base block has always been one of the key elements for organizing the joint work of call center operators and artificial intelligence. It is what enables rapid, high-quality training of the bot and effective answers," says Mikhail Sbitinkov, CTO and co-founder of CraftTalk.

CraftTalk KMS is a solution that stores, distributes and manages all collected corporate information. Enterprises can use it as a verified omnichannel source of knowledge on all necessary topics: the company's internal procedures, HR information, a knowledge base for IT and business processes, information about projects, quick access to information through chat, news mailings, a working tool for the corporate call center, and an AI-based text assistant. The system's simple, convenient interface is built on the wiki principle.

The CraftTalk KMS knowledge management system is built in the No-Code paradigm for rapid implementation and effective employee onboarding: an intuitive graphical script editor for operators and bots, and visual block diagrams.

"Data and information are consolidated and structured, which allows each team member to quickly find the necessary information, quickly immerse new employees in the desired topic, and easily exchange information with employees inside and outside the company, if the information security policy permits," the company says.

CraftTalk's cloud technology allows you to use the corporate knowledge base right in the messenger chat: there is no need to switch to other systems, since the knowledge base integration displays response options directly in the chat.

The solution integrates easily with a company's existing knowledge bases and other enterprise systems. According to Denis Petukhov, CEO of CraftTalk, CraftTalk KMS is a solution for full import substitution of such popular foreign products as KMS Lighthouse, Confluence (Atlassian) and Vision.

• Minerva Knowledge/Naumen KMS knowledge management system.

  • Minerva Knowledge is MinervaSoft's knowledge management solution for medium and large businesses with a large number of line personnel. The system is a single source of information for the entire company - personnel management, client service, commercial services, marketing and PR, legal services, workflow and business process management, office employees, operations departments - helping synchronize content across external and internal sources, notifying about changes and providing quick access to information through accurate search.

The Minerva widget is embedded in corporate IT systems to understand what employees are doing and recommend the right knowledge.

In the spring of 2021, NAUMEN acquired a stake in MinervaSoft, and the product has since been developed under the Naumen KMS brand. The Naumen KMS solution allows large and medium-sized companies to create knowledge bases for contact center operators and front-office employees. For example, when integrated with a chat bot and a company website, the system avoids situations where customers receive different answers in different service channels.

The product helps to create a universal source of information for all employees, in which information about the company's products, services and business processes is synchronized and updated in a timely manner.

Naumen says that in specific implementations it has been possible to reduce request processing time by 10% within three months of launching the system.

• Bitrix24 Knowledge Base.

With this product, Bitrix24 creates a single enterprise knowledge repository that is easily updated and edited by all company employees. In fact, it is a multimedia space of corporate knowledge created for the systematic storage of regulations, articles, checklists, documentation and other data. The necessary information is found in a matter of seconds thanks to the system's smart search.

You can also create a separate knowledge base for each project within a working group, with rules, algorithms and other important information: all project participants will have access to it by default and will be able to collectively create, discuss and edit articles.

Any employee can create and edit the knowledge base. Templates are used for this: ready-made blocks in which you can change texts, images and videos on the fly.

• Rostelecom KMS.

The knowledge management system is a digital product of the Rostelecom Contact Center, designed for use in contact centers and customer support services. With its help, a single information space is created for the telecom company's customer support service, containing all the information materials used in customer service.

Architecture of the Rostelecom KMS

Source: Description of the Knowledge Management System, JSC MC NTT

Identifying implicit relationships in data

"Pattern detection is, one might say, a standard task for machine learning technologies. In general, the success of its solution depends on the amount and quality of available data, the competence of specialists and available resources. On the whole, the commercial market copes with such standard tasks successfully," says Alexander Khledenev, director of digital solutions at VS Lab.

"With this approach, models are trained on 'normal' data or events and, automatically or by comparison with models of 'abnormal' past behavior, allow a deviation to be detected quickly or even proactively," explains Alexander Khledenev.

"The nature of monitoring tasks makes it necessary to detect anomalies in near real time," says Alexander Khledenev. "Therefore, they are implemented using streaming data analysis and in-memory computing platforms, which makes implementation expensive."

True, with the spread of Edge Computing (edge, or peripheral, computing equipment) and decentralized (federated) AI systems, one should expect such solutions to become cheaper and more widely used, the expert believes.
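
To make the approach concrete, here is a minimal sketch of anomaly detection on "normal" data using the open-source scikit-learn library; the telemetry values and parameters are invented for illustration.

```python
# Minimal anomaly-detection sketch: fit on "normal" telemetry, flag deviations.
# Data and parameters are illustrative, not from any production system.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))  # "normal" behavior
model = IsolationForest(contamination=0.01, random_state=42).fit(normal)

new_events = np.vstack([rng.normal(size=(5, 3)),         # ordinary events
                        np.array([[8.0, 8.0, 8.0]])])    # an obvious outlier
labels = model.predict(new_events)                       # +1 = normal, -1 = anomaly
print(labels)  # the last event should be flagged as -1
```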

"Graph algorithms and graph analysis are best suited for detecting direct and indirect, non-obvious relationships between various entities and objects of research," says Alexander Khledenev. "Because of this, graph analysis has become widespread in the cybersecurity market and as a tool for intelligence services and law enforcement agencies (for example, the well-known American system Palantir), and is also used for analyzing social networks."

Often, such products on the market are grouped under the term OSINT (Open Source Intelligence), and the framework of the same name is used for research (investigations). The OSINT toolkit allows you to quickly and clearly present all incidents, which in turn allows employees, together with AI, to process them quickly and efficiently, for example, promptly stopping incidents related to fraud and money laundering.

The commercialization of such solutions is still at an early stage, says Alexander Khledenev. However, they are gaining popularity in various areas of activity beyond cybersecurity. For example, they are used to build knowledge trees for competitive intelligence, supplier analysis, and detection of technology trends for early investment. There are also more "mundane" scenarios: analyzing the shopping basket in retail or optimizing routes for transport companies. Another popular application is analyzing a borrower to set a credit limit and prepare cross-sell offers from the bank.
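
A toy example of the kind of non-obvious relationship such graph tools surface: two companies with no direct link turn out to be connected through a shared director and phone number. The entities below are invented, and networkx stands in for an industrial graph engine.

```python
# Toy graph-analysis sketch: find a non-obvious chain linking two entities.
# Nodes and edges are invented for illustration.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("Company A", "Director X"),    # directorship record
    ("Director X", "Phone +7..."),  # shared contact details
    ("Phone +7...", "Company B"),   # registered with the same phone
])

# No direct edge between the companies, but a path search exposes the chain:
path = nx.shortest_path(g, "Company A", "Company B")
print(" -> ".join(path))
# Company A -> Director X -> Phone +7... -> Company B
```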

According to Alisa Selivanova, head of the Credit Anti-Fraud Solutions and Technologies unit at Sberbank, the company has chosen a mechanism of large graph models to create a "portrait" of the client. According to Selivanova, such a scoring model has been in use for several years, and during this time it has not degraded.

Graph analytics

"Relationships will form the value of data," Gartner is confident. "By 2023, graph technologies will facilitate rapid contextualization for decision-making in 30% of companies around the world, where they will be used to explore relations between organizations, people and transactions."

That Graph Analytics technologies are highly relevant today is evidenced by various market research. Thus, analysts at Markets and Markets in their global forecast until 2024 predict annual growth of this market segment of 34.0%: from $584 million in 2019 to $2,522 million by 2024. Analysts emphasize that the main drivers of market growth are the growing demand for low-latency query analysis and the ability to identify relationships between data in real time.
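
As a quick arithmetic check, the forecast is internally consistent: compounding $584 million at 34% a year over the five years from 2019 to 2024 lands almost exactly on the predicted $2,522 million.

```python
# Quick consistency check of the Markets and Markets forecast.
start, cagr, years = 584, 0.34, 5          # $M, annual growth, 2019 -> 2024
print(round(start * (1 + cagr) ** years))  # ~2523, matching the ~$2,522M figure
```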


According to the report Graph Analytics Market Size, Share, Potential Growth, Competitive Analysis 2022-2027 prepared by Market Research Future analysts, the Graph Analytics market will grow at a CAGR of 31.6% and reach $2,885.2 million by the end of the forecast period.

The Graph Analytics market is segmented by analytics application. The route optimization segment is expected to be the fastest growing in this market due to the growing need to determine the best route in logistics, transportation services, retail and e-commerce. Other growing application segments of Graph Analytics include customer analytics, risk and compliance management, recommendation engines, fraud detection, transaction management, and asset management.

The point is that graph data analytics can uncover connections that are not easy to identify with traditional analytical tools. For example, when the world is striving to respond quickly and correctly to the changing conditions of a pandemic, graph technologies can help connect spatial data from residents' smartphones and identify people who have been in contact with persons whose coronavirus tests came back positive.

Combined with machine learning algorithms, these technologies can be used to analyze thousands of data sources and documents in order to help physicians and health care administrators quickly find new possible treatments or factors that contribute to negative outcomes in some patients.

For example, at Sberbank, AI makes decisions on issuing loans to individuals in 100% of cases by studying the borrower's digital traces. According to German Gref, the amount of information contained in them allows one to say that a digital twin of a person is being formed in the bank. These twins gain about 500 MB in weight every day, as constantly running smart monitoring programs bring in new data about customers and borrowers.

In order for analytical scoring processes to run at a rate close to real time, real-time data marts are usually created. They are built on special high-performance DBMSs focused on very fast data processing, for example the open-source DBMS Apache Cassandra. At the same time, graph neural networks (GNN) are actively used for the tasks of building complex connections between many entities.

In fact, graph neural networks are a way to apply classical models of neural networks to data in the form of graphs.

Source: Mail.ru Group Blog on Habr, May 2021

A specific feature of a GNN is that one of its layers is a regular fully connected layer, with the only difference being that its weights are applied not to all input data, but only to the graph neighbors of a particular vertex, in addition to the vertex's own representation from the previous layer. These properties have made GNNs popular for building recommendation systems: for example, they are used to model user-product interactions in order to select personalized product offers and show them to specific users in real time by ranking the results.
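
The "fully connected layer restricted to graph neighbors" can be shown in a few lines of numpy: a GCN-style layer multiplies the vertex features by a normalized adjacency matrix with self-loops (so each vertex keeps its own representation) before applying the usual trainable weights. This is a schematic sketch, not any production recommender.

```python
# Schematic GCN-style layer in numpy: each vertex aggregates only its graph
# neighbors (plus itself, via self-loops) before the dense weights apply.
import numpy as np

A = np.array([[0, 1, 0],                  # adjacency of a 3-vertex graph
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_hat = A + np.eye(3)                     # self-loops keep each vertex's own state
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization

H = np.random.rand(3, 4)                  # vertex features (3 vertices, 4 features)
W = np.random.rand(4, 2)                  # trainable weights of the layer

H_next = np.maximum(0, A_norm @ H @ W)    # ReLU(A_norm @ H @ W): one GNN layer
print(H_next.shape)                       # (3, 2)
```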

Ontologies - a mechanism for creating semantic models of objects, situations, processes

The term "ontology" came from philosophy, where it implies the doctrine of everything that exists in general philosophical categories: being, substance, cause, action, phenomenon. In knowledge engineering, ontology refers to a detailed description of some problem area that is used to formally describe it. It can be said that an ontology is a specification of some domain that includes a dictionary of terms of that domain and a variety of logical links (element-class, part-whole) that describe how the terms relate to each other. It is important that, unlike the philosophical prototype, computer ontologies make it possible to present concepts of the domain in such a way that they become suitable for machine processing.

"The ontological approach to KMS design allows you to create systems in which the knowledge accumulated within the organization becomes available to most users. The main advantages of this approach: the ontology presents the user with a holistic, systemic view of a certain subject area (PrO); knowledge about the PrO is presented uniformly, which simplifies its perception; constructing the ontology allows you to restore the missing logical connections of the PrO," wrote Anatoly Gladun and Yulia Rogushina in their 2006 article "Ontologies in Corporate Systems" in the journal Corporate Systems.

The authors also proposed an illustration of corporate KMS knowledge in the form of corporate memory, which records information from various sources and makes it available to specialists for solving production problems.

Source: Ontologies in Corporate Systems, Part I, A. Ya. Gladun, Yu. V. Rogushina, journal Corporate Systems (No. 1, 2006)

In 2020, Gartner analysts in a study of new technologies Hype Cycle for Emerging Technologies predicted for ontologies a plateau of productivity within 2-5 years.

Today, according to the professional community, ontologies are a key method of solving the problem of "semanticizing" web content. In this regard, the problems of so-called ontological engineering (methods and tools for developing and evolving ontologies), as well as the availability of existing ontologies, acquire particular importance.

A prominent representative of the semantic paradigm is the Semantic Web, based on large ontological resources and the Web Ontology Language (OWL) specially developed for them. The OWL language is developed by the W3C consortium and has become a de facto standard not only for the Semantic Web, but also for other technological areas, for example, information search in large arrays of unstructured data and natural language processing.
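
As a minimal illustration of what an OWL specification looks like in practice, the sketch below uses the open-source rdflib library for Python to declare two classes and an element-class (subclass) relation in an invented example.org namespace.

```python
# Minimal OWL ontology sketch with rdflib: two classes and a subclass link.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/enterprise#")  # illustrative namespace
g = Graph()
g.bind("ex", EX)

g.add((EX.Document, RDF.type, OWL.Class))           # term: Document
g.add((EX.Contract, RDF.type, OWL.Class))           # term: Contract
g.add((EX.Contract, RDFS.subClassOf, EX.Document))  # element-class relation

print(g.serialize(format="turtle"))  # machine-readable specification
```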

Among the critical aspects of Semantic Web, it is worth mentioning two:

  • Multilingual content. The Semantic Web is essentially designed to support effective access to information regardless of the language in which it was originally presented. Experts associate the solution of this problem to a large extent with solving the problems of ontological engineering and the automatic processing of natural-language texts.
  • Another significant problem of the Semantic Web is its stability. This direction requires serious efforts in the field of standardization, which should ensure the creation of reliable technologies for forming knowledge spaces.

It is the problem of standardization that is now one of the key positions in Semantic Web issues, since the possibility of creating open and/or cooperative semantic systems based on previously created ontologies directly depends on its solution.

In a scheme that experts call Tim Berners-Lee's "puff pie" (after the ideologist of the Semantic Web concept), recommendations for the lower levels - the XML, namespace and RDF formats - have been developed and implemented through the efforts of research teams and the international W3C consortium (WWW Consortium). They currently exist at the level of de facto standards. It can be stated that the results of work in this area have already passed from the research stage to the stage of practical use, including in commercial systems.

"Puff
Pie" by Tim Berners-Lee

Thus, at the level of RDF schemas, the W3C RDFS (RDF Schema) standards are proposed and supported, which allow you to specify dictionaries of the terms used, as well as develop appropriate specifications for existing and new applications.
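
Such a vocabulary can then be queried uniformly. The self-contained sketch below repeats the tiny rdflib ontology from the OWL example above and runs a SPARQL query that lists every declared class and its parent (all names are illustrative).

```python
# SPARQL query over an RDFS/OWL vocabulary with rdflib (illustrative names).
from rdflib import Graph, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/enterprise#")
g = Graph()
g.add((EX.Document, RDF.type, OWL.Class))
g.add((EX.Contract, RDF.type, OWL.Class))
g.add((EX.Contract, RDFS.subClassOf, EX.Document))

# List every declared class and, where present, its parent class.
rows = g.query("""
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?cls ?parent WHERE {
        ?cls a owl:Class .
        OPTIONAL { ?cls rdfs:subClassOf ?parent }
    }
""")
for cls, parent in rows:
    print(cls, "->", parent)
```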

But at the ontological level (Ontology) of the "puff pie," the situation is somewhat different. A rather powerful groundwork has in fact been created in this direction within research on knowledge representation, in particular general approaches such as frames and semantic networks. At the same time, work to standardize the means of representing knowledge at the ontological level is still far from complete, and the creation of appropriate ontological engineering tools is currently one of the "hot spots" of this area.

The main areas of research and development here are the following:

  • Create more powerful ontology specification tools that provide logical inference over knowledge and knowledge integrity testing.
  • Create means to maintain the integrity of ontological specifications as they evolve - both the specifications of the models themselves and the standards.
  • Create tools for specifying cross-references between dictionaries and for converting specifications.

Why is this important for corporate information systems? In fact, ontologies are a metalanguage that is readily embodied in software. Therefore, this tool has become widespread: from the creation of semantic "models of the world" to adequate translation of natural-language texts between different languages.

"Ontologies perform an integration function, providing a common semantic basis in decision-making and data mining processes, as well as a single platform for combining a variety of information systems," says Anton Ermakov, head of Comindware. "In other words, ontological models become intermediaries between business users and the information system."

Graph databases are used to implement the ontological model. It is worth noting that, despite the relatively young age of these databases (the first graph database using the directed-graph model appeared in 2007), various types of them are on the market today. In particular, HyperGraphDB uses a multigraph model, ArangoDB and OrientDB are positioned as multi-model DBMSs, and GraphX, a distributed framework for working with graphs in the Hadoop ecosystem, uses the Spark computing engine.

All of them provide support for key characteristics of ontological models: decentralized data structure, support for distributed structures, data connectivity through the semantic layer of the IT architecture, storage of both structured and unstructured information.

Thus, graph databases are used at Facebook for flexible, operational management of the social network, and at Amazon for its recommendation service. Sberbank is experimenting with a prototype business solution based on ultra-large graphs, hoping to use them to solve problems involving billions of connections in an interactive mode: from searching for affiliated persons and organizations to product recommendations.

Ontologies: General and Specialized

The key issue in constructing a subject-area ontology is the task of classifying concepts. For a narrow area, it is largely solved at the initial stage of its creation. This is not the case with top-level ontologies, which contain the most general concepts of the whole real world that do not belong to a strictly limited domain.

Ontology hierarchy

Source: Ontologies in Corporate Systems, A. Ya. Gladun, Yu. V. Rogushina, Corporate Systems, No. 1, 2006

Nevertheless, to date, about one and a half dozen top-level ontologies have already been created and are used in practice. For example, the SUMO ontology developed by the IEEE aims to integrate existing ontologies into a single framework that would have the status of a universal standard. It categorizes concepts on the basis of the philosophical conceptual apparatus: the top vertex is the concept of "Entity," entities are divided into physical and abstract, etc.

The BFO ontology was created as a top level for domain ontologies in the field of science. Today, the main goal of the BFO is to support the development of subject area ontologies in order to promote coordination of the work of various groups of specialists, achieve consistency and prevent redundancy of ontologies.

Is-a type hierarchy in BFO 2020

Source: GOST standards database, allgosts.ru

The BFO ontology is the basis of the Russian GOST R 59798-2021 "Top-level ontology (TLO). Part 2. Basic formal ontology (BFO)," approved by Order No. 1300-st of the Federal Agency for Technical Regulation and Metrology of October 25, 2021. It was developed taking into account the main regulatory provisions of the international standard ISO/IEC FDIS 21838-2:2021 "Information technology - Top-level ontologies (TLO) - Part 2: Basic Formal Ontology (BFO)" (NEQ).

The Russian standard describes basic formal ontology as a resource for supporting the exchange of information between heterogeneous information systems.

"The variety of subject areas for which these resources have been created proves the versatility of the proposed model of linguistic ontology," say development leads Natalya Lukashevich and Boris Dobrov in the article "Design of linguistic ontologies for information systems in broad subject areas" (Ontology of Design, vol. 5, no. 1, 2015). "That is, through such a model, you can describe the basic properties and relationships of the concepts present in any subject area."

The IPPI RAS is developing a general-purpose ontology designed for semantic analysis of text in any natural language. In essence, the ETAP system solves the problem of extracting meaning from natural-language text. Thanks to semantic analysis, additional information is extracted that can be used for translation. Overall, the ETAP-3 linguistic processor is a computer system with a large amount of knowledge about natural language in general, and about Russian and English in particular. As a result, the machine translation system is able to translate texts from Russian into English and from English into Russian based on the "Meaning - Text" mechanism.

Text-Meaning Transformation for Natural Language Sentence

Source: IPPI RAS

ABBYY Compreno technology works on the basis of ontological models. The subject ontomodel makes it possible to identify entities (names of organizations, the subject of a contract and its parties, contract amounts, etc.) for building filters and in-depth analytics. Custom entities are also connected, along with background semantic enrichment.

Source: presentation "Prospects of ABBYY Compreno on the Russian market: business scenarios, advantages, solution effectiveness," Maxim Mikhailov, Senior Vice President, ABBYY, 2015

For example, Rosenergoatom implemented an integrated approach to improving the quality of design documentation for complex technical facilities (such as nuclear power plants): an ABBYY Compreno ontology was created describing the conceptual apparatus of the subject area (a hierarchy of terms and concepts, synonymous constructions, semantic connections, typical characteristics and value ranges, based on available UML descriptions and expert knowledge). This approach made it possible to organize a single semantic representation for the various descriptions used for complex technical objects: the object information model designed in the form of drawings, diagrams, 3D objects, etc., project documentation, and natural-language documentation.

DataFabric has created the universal enterprise ontology platform DataFabric KGL, which unifies access to all enterprise data using an open-source data virtualization platform. DataFabric KGL (or the enterprise's Logical DWH) is implemented on the basis of a graph (semantic) data model in subject-area terminology. It allows federated access to heterogeneous data from different sources without the need for preliminary collection, aggregation and storage, the company says: data is accessed in domain terminology through an abstraction layer in the form of a business glossary, without any operations at the physical storage layer.

The core of the DataFabric KGL platform is the enterprise knowledge base, which is an ontological domain model. The knowledge graph includes all terms, entities, concepts, and definitions that are involved in enterprise business processes. All entities are related to each other, and all business entities from the knowledge graph are projected onto the relational structure of each of the data sources.

The possibilities of analyzing data on knowledge graphs make it possible to establish connections between entities, search for signs of affiliation between objects, find relationships between events and objects, and draw logical conclusions based on the "world model" implemented in the system. In particular, the system is able to independently generate new data and establish new connections between data based on the "picture of the world" modeled through the ontological model of the subject area.
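
Such "logical conclusions based on the world model" boil down to rule-based inference over the stored triples. A toy forward-chaining sketch (all facts invented) derives a new affiliation link from a transitivity rule:

```python
# Toy forward-chaining inference: derive new "affiliated_with" triples from
# a transitivity rule over a knowledge graph. All facts are invented.
facts = {
    ("Firm A", "affiliated_with", "Firm B"),
    ("Firm B", "affiliated_with", "Firm C"),
}

changed = True
while changed:                       # apply the rule until no new facts appear
    changed = False
    for (a, p1, b) in list(facts):
        for (b2, p2, c) in list(facts):
            if p1 == p2 == "affiliated_with" and b == b2 and a != c:
                new = (a, "affiliated_with", c)
                if new not in facts:
                    facts.add(new)   # inferred: Firm A affiliated with Firm C
                    changed = True

print(sorted(facts))
```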

Trinidata uses ontologies to create digital enterprise twins within its ArchiGraf.MDM system. Here, a digital twin means a computer model of an object's internal processes and its interaction with the environment.

"The ontological model helps to cope with a data structure of any complexity and to change this structure in the course of working with the data," explains Sergey Gorshkov, head of Trinidata. "The ontology allows you to work with thousands of types of entities and the various relationships between them, and to do so conveniently, intuitively and flexibly, including changing the structure while the system is running."

Source: Presentation "Data Collection and Modeling for Creating Digital Twins," Sergey Gorshkov, Director of Trinidata

The Archigraf.MDM platform supports a number of key features:

  • Storing the ontological model of all corporate information.
  • Support for reference data (NSI) and master data used by all other enterprise applications.
  • Storage of any transactional and operational data in DBMS clusters under platform control, with access through a single interface (data virtualization).
  • Real-time access through a single API to the data of any third-party data warehouses, including legacy DBMSs, web services of modern enterprise applications, etc. (logical data mart).

The platform's capabilities enable it to operate with the Open Platform Communications (OPC) format in industrial plants, where OPC software technologies provide a single interface for managing various devices and exchanging data.

Source: Presentation "Data Collection and Modeling for Creating Digital Twins," Sergey Gorshkov, Director of Trinidata

Ontologies for Business Process Modeling and Enterprise Application Management

The concept of ontology and ontological analysis is included in procedures for modeling business processes, because describing a business process is, in fact, structuring data and knowledge. An example is the Process Specification Language (PSL) ontology, designed for automated exchange of information about processes among various production applications: production planning, electronic document management, project management, etc. It has been approved as the international standard ISO 18629.

Manufacturing Service Description Language (MSDL) is an ontology language for describing production itself, covering five levels of abstraction: suppliers, shops, machine tools, components and processes.

The Russian BPMS system Comindware Business Application Platform works on the basis of a graph database and an ontological data model. The company emphasizes that it is the graph database that provides joint storage not only of the data itself, but also of the relationships between data, that is, semantic attributes. Thus, the graph database naturally integrates the ontological business model with the data. Moreover, indirect link processing is easily implemented in a graph database, which, for example, is generally impossible in relational DBMSs.

"The formalism of ontological models provides a truly flexible and intuitive mechanism for describing business processes of any complexity and, critically, their changes," emphasizes Anton Ermakov of Comindware.

In addition, ontologies serve as a basis for implementing low-code process management tools aimed at company employees.

Semantic layer of the corporate data warehouse

The ontological model is able to maintain data integrity and ensure proper data quality as the system's external and internal operating conditions change. For these reasons, Sberbank, for example, is creating a single integrated data model in the formalism of semantic description: in this way, business units gain access not to "raw" but to high-quality data, and flexible data integration is implemented across the large organization. Within the company, this single integrated data model is called the "Single Semantic Layer" (ESS).

The ESS contains metadata that enables efficient management of data processing in the corporate data warehouse (CDW), including loading data from high-load banking systems at a rate of several tens of terabytes per day. The task of this layer is to separate the consumer from "raw" data and provide access to integrated, consistent and high-quality data, which eliminates duplication of data-integration work by different divisions of the bank and makes working with data more understandable and transparent for the user.

Source: Sberbank, 2018
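
The essence of such a semantic layer can be sketched as a business glossary that maps domain terms onto the physical schema, so consumers query by term and never touch "raw" tables; all names and mappings below are invented, not Sberbank's actual model.

```python
# Sketch of a semantic layer: a business glossary maps domain terms to the
# physical schema, so analysts query by term, never by raw table/column.
# All names and mappings are invented for illustration.
GLOSSARY = {
    "client_monthly_income": ("raw_db.tbl_cli_fin_007", "inc_m_rub"),
    "loan_balance":          ("raw_db.tbl_loans_v3",    "bal_cur"),
}

def semantic_query(term: str) -> str:
    table, column = GLOSSARY[term]          # resolve the business term
    return f"SELECT {column} FROM {table}"  # generated against raw storage

print(semantic_query("client_monthly_income"))
# SELECT inc_m_rub FROM raw_db.tbl_cli_fin_007
```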

A semantic layer over the data warehouse is also implemented in the Loginom analytical platform from the Russian company BaseGroup Labs. The platform provides, in particular, advanced data integration, including access to heterogeneous sources (office applications, 1C:Enterprise, DBMSs, ERP and CRM systems, files, web services), consolidation of data into a warehouse, and a convenient semantic layer for extracting information using familiar business terms.

A few years ago, Avicomp Services, a developer of semantic processors of the OntosMiner family, became part of Rostec (the United Instrument Manufacturing Corporation). The technological architecture is based on the concept of using subject ontologies to control the processing of natural-language texts.

OntosMiner solutions are focused on supporting W3C ontology standards (primarily OWL), processing-result specifications (XML, OWL, N3) and international TREC-level standards in the text processing stages. The company emphasizes a focus on system architectures that provide flexible integration of heterogeneous components through the use of information exchange standards between them.

The ontological approach to designing OntosMiner family systems led to the creation of the company's own ontological engineering toolkit. This gives representatives of Avicomp Services grounds to argue that the OntosMiner system is able to analyze the semantic structure of any type of data. In the future, systems based on its principles could be created for analyzing images and sound, as well as for controlling home devices ("smart home"), the company says.

The United Instrument Manufacturing Corporation intends to use the OntosMiner linguistic processor in projects involving complex analytical and monitoring systems for a wide range of customers. In addition, these technologies are considered promising for projects in the field of DBMSs and the integration of multi-format information storage.

In general, ontological models of subject areas are a promising direction for the development of systems related to natural language analysis. Such models, capable of capturing the ambiguity of words, play a consolidating role in forming a single space of descriptions of complex technical objects and systems. For example, a project of this kind based on ABBYY technologies was carried out at Rosenergoatom.

Complex technical objects, nuclear power plants in particular, are described in various ways. The most significant are:

  • information model of the object, designed in the engineering system in the form of drawings, diagrams, 3D objects, plans, etc.;
  • design documentation containing the description of the object and its parts in natural language;
  • engineering information model (describes parameters and connections between parts of the object).

The volume of data can be judged by figures such as these: the design documentation of one Russian NPP project comprises 272,814 files in 23,982 folders. The required file storage is 522 GB.

To ensure the internal consistency of the characteristics of the organizational structures, systems and components of the NPP fixed in the design documentation, an ontology was created describing the conceptual apparatus of the subject area. (The existing ontology used in NPP information models and in the CAD systems used to develop them was taken as a basis.) With the help of ABBYY InfoExtractor, all documentation was transformed into a comprehensive system of objects, entities, values and relationships, which made it possible to identify possible internal contradictions within the array of identified characteristics.

Thus, ontologies, with the help of which one can represent a consistent system of concepts of the modeled field of knowledge, are gaining increasing popularity among developers. However, in practice, the creation of application systems most often requires the integrated use of various methods of knowledge representation. And this is another "hot" area of research, pursued today by various scientific teams.

Possible integration of different types and models of knowledge representation
