
MISiS: Semantic Quick Search Technology for Specialized Databases

Product
Developers: NITU MISiS (National Research University of Technology)
System launch date: 2021/12/22
Technology: Data Mining

Main article: Data Mining (intelligent data analysis)

2021: Presentation of a fast semantic search mechanism for specialized databases

Russian scientists have developed a mechanism for fast semantic search in specialized databases. The research, which segments text documents to optimize users' searches for the information they need and speed them up by 20%, was carried out by a group of scientists at NITU MISiS under an 18 million ruble grant from the Russian Science Foundation. NITU MISiS announced this to TAdviser on December 22, 2021.

The scientists tackled the problem of correctly retrieving large documents that are close in meaning. Large, complex documents, especially those indexed by specialized search engines, usually cover several topics at once, which makes automatic search very difficult. The researchers proposed using a segmentation method.

The MISiS technology can be used to improve the quality of information retrieval and data analysis in specialized search engines built for scientific and industrial organizations, covering reports, patents, and scientific publications.

"Document segmentation is the division of a text into passages that each deal with a single subject, which can be useful in various natural language processing tasks. Such tasks include, for example, analyzing large documents or searching by document content. From the point of view of applied machine learning, segmenting long texts is justified, since vectorization methods usually work better on short texts. This is logical: the larger the text, the more distinct meanings it contains and the harder it is to aggregate all of them into a single general vector representation," said Nikita Nikitinsky, a researcher at the Center for Big Data Research at NITU MISiS.

The Center's team of specialists proposed the following solution to this problem: split the document into several segments, each of which covers a single topic. Such thematically homogeneous pieces of text are easier for a computer algorithm to search.
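To illustrate why single-topic segments can be easier to search, here is a minimal Python sketch. It is not the MISiS implementation: it assumes a toy bag-of-words vectorizer and a made-up two-topic document that has already been split into segments, and the names `bow`, `whole_document_score`, and `best_segment_score` are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector: a Counter of lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def whole_document_score(query, segments):
    """Score the document as one vector built from all segments joined together."""
    return cosine(bow(query), bow(" ".join(segments)))


def best_segment_score(query, segments):
    """Score the document by its best-matching segment."""
    return max(cosine(bow(query), bow(s)) for s in segments)


# Assumption: a made-up document with two unrelated topics, already segmented.
segments = [
    "steel alloy corrosion resistance steel coating",
    "patent filing deadline patent office fee",
]
query = "steel corrosion coating"

whole = whole_document_score(query, segments)
best = best_segment_score(query, segments)
print(f"whole document: {whole:.3f}, best segment: {best:.3f}")
```

In this toy case the off-topic segment dilutes the whole-document vector, so the query matches the relevant segment more strongly than it matches the document as a whole.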

"As part of the study, we used a method based on the additive regularization of topic models (ARTM) approach and the TopicTiling algorithm. As a result of the experiments, we were able to improve the accuracy of a highly specialized search for scientific publications from 55% to almost 82%," Nikitinsky added.
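The general idea of topic-based segmentation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the researchers' code. The per-sentence topic distributions are hand-written stand-ins for what a trained topic model such as ARTM would produce. The boundary rule (valley depth above the mean plus half a standard deviation) is an assumed cut-off, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two dense topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def depth_scores(sims):
    """For each gap, measure how deep the similarity valley is
    relative to the nearest peaks on its left and right."""
    depths = []
    for i, s in enumerate(sims):
        left, j = s, i
        while j > 0 and sims[j - 1] >= left:  # climb left while similarity rises
            j -= 1
            left = sims[j]
        right, j = s, i
        while j < len(sims) - 1 and sims[j + 1] >= right:  # climb right
            j += 1
            right = sims[j]
        depths.append(((left - s) + (right - s)) / 2)
    return depths


def segment(topic_vectors):
    """Group sentence indices into segments, cutting at deep similarity valleys."""
    if len(topic_vectors) < 2:
        return [list(range(len(topic_vectors)))]
    sims = [cosine(a, b) for a, b in zip(topic_vectors, topic_vectors[1:])]
    depths = depth_scores(sims)
    threshold = mean(depths) + pstdev(depths) / 2  # assumed cut-off rule
    boundaries = [i + 1 for i, d in enumerate(depths) if d > threshold]
    segments, start = [], 0
    for b in boundaries + [len(topic_vectors)]:
        segments.append(list(range(start, b)))
        start = b
    return segments


# Assumption: made-up topic distributions for six sentences over three topics.
sentence_topics = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.90, 0.05],
    [0.10, 0.70, 0.20],
]
result = segment(sentence_topics)
print(result)
```

With these made-up vectors, the only deep valley in similarity lies between the third and fourth sentences, so the sketch yields two segments of three sentences each. A production system would derive the vectors from a trained topic model and would tune the threshold on real data.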

According to the developers, the technology has already been deployed in a Russian project to create a register of mandatory requirements. They estimate that the proposed method increases the speed and efficiency of users' searches for the information they need by up to 15-20%, which is critical for scientific and industrial organizations.

As of December 2021, similar problems are being addressed by researchers and engineers at other large organizations, including the University of Mannheim, the French research center Eurecom, and Google Research, who studied the scientific team's publications on this topic as part of their own research.