"Smart" document flow: how to implement machine learning in an EDMS, and is it necessary?
Intelligent algorithms (IA) are already able to work with documents in electronic document management systems (EDMS), but are companies ready for them? The future in which machine learning (ML) algorithms can replace secretaries or registrars has arrived: this is confirmed by the still few, but real, trials of ML in EDMS in Russia.
Whenever a new technology emerges, the same questions arise: what is it for, who needs it and, last, how should it be implemented? A universal transition to electronic document management was discussed 10 or more years ago, and it is still described as an unfinished process. Paper documents are still present in organizations. Digital transformation continues (or, according to some, is only beginning). How many years will it take to bring AI into records management, and for which tasks will it really be effective?
Digital Design is one of the few companies that have made real progress in this direction and shown practical results. The company has three pilot projects to its credit. They demonstrated broad opportunities for applying intelligent algorithms (IA) to records management tasks and were implemented on the "Priority" document management system (SDU "Priority"), which is built on the Docsvision platform. Interestingly, government organizations were among the first test sites for studying whether intelligent algorithms can be applied in EDMS.
First, the public sector is the main customer for EDMS/ECM systems. State institutions process an enormous volume of documents within regulated deadlines, and the effectiveness and responsiveness of public authorities depend in many respects on the quality and speed of document exchange. The functions of an EDMS in state agencies are not limited to internal management: external document flow, that is, communication with citizens and organizations in the course of providing public services, accounts for a large share. With the development of e-government, the number of requests processed can reach several thousand a day.
Second, processes in government institutions are standardized, as are the documents, so intelligent algorithms will be more effective there than in organizations with a complex and unique structure.
Machine learning can speed up document processing, prepare all the data a person needs to make a decision, and prevent human errors, all without an ultra-powerful computer. The day when people can no longer tell who is answering their request, an algorithm or a secretary, will come soon... For now, though, machine learning is busy with other tasks.
Applied problems of machine learning
"Smart" technologies are designed, first of all, to relieve people of routine operations that do not require any decisions. With machine learning algorithms, a document can travel the whole way from registration to filing with minimal human intervention. At first, however, the machine still needs to learn to extract patterns from a data set (Big Data); in this case, the data are the results of people processing specific documents.
Machine learning today comprises a whole set of algorithms; some are quite universal and can be used for different tasks. To understand where machine learning algorithms fit into document flow, let us look at a block of text processing tasks whose solutions formed the basis of new functional modules of the "Priority" document management system based on Docsvision.
Document clustering
One of the first-priority problems (also the simplest, yet important) in analyzing data obtained from an EDMS is building a cluster model of the data. Cluster analysis splits the database into clusters, that is, groups of similar elements, and has a wide range of applications. Given the volumes of documents that workflow systems process, the ability to split documents into clusters is useful before any machine learning algorithms are applied. Clustering simplifies tasks such as searching for duplicates and for close or similar documents, and it makes it possible to build an algorithm that predicts document attributes more accurately. The most obvious practical application of clustering results is automatic classification (or tagging) of new documents.
Images No. 1 and No. 2 show the results of clustering a database of real documents, carried out by Digital Design specialists during a pilot project in a state institution. By detecting similarities in the text, the system assigns each document to one of the clusters and thus arranges objects into fairly homogeneous groups.
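To make the idea of grouping documents by textual similarity more concrete, here is a minimal sketch in plain Python. It is not the method used in the pilot project, which the article does not describe. It assumes TF-IDF word weights, cosine similarity, and a simple greedy "leader" clustering with a hypothetical similarity threshold of 0.3; the sample documents are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms found in every document.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy leader clustering: a document joins the most similar cluster
    leader if the similarity reaches the threshold, else it starts a new cluster."""
    vectors = tfidf_vectors(docs)
    leaders = []  # index of the first document of each cluster
    labels = []
    for i, vec in enumerate(vectors):
        best_cluster, best_sim = None, threshold
        for cluster_id, leader in enumerate(leaders):
            sim = cosine(vec, vectors[leader])
            if sim >= best_sim:
                best_cluster, best_sim = cluster_id, sim
        if best_cluster is None:
            leaders.append(i)
            best_cluster = len(leaders) - 1
        labels.append(best_cluster)
    return labels


# Invented sample documents: two about roads, two about construction permits.
docs = [
    "Request to repair the road surface on Lenin street",
    "Complaint about the road surface and potholes on Lenin street",
    "Application for a construction permit for a residential building",
    "Construction permit application for a new residential building",
]
labels = cluster_documents(docs)
print(labels)
```

The same similarity function could also support a search for duplicates or near-duplicates, by flagging pairs whose similarity is close to 1. A production system would more likely rely on an established clustering algorithm and language-specific text preprocessing.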
Prediction of attributes of documents
Every electronic document carries a set of attributes (author, department, document type, assignee, etc.) that must be filled in for further processing and for later document retrieval and report generation. In fact, the processing of a document depends entirely on its attributes: for example, documents received from a certain correspondent on a specific subject (the categories mentioned just above) must be processed by a specific department according to specific rules. Today this procedure is performed 100% manually for each document. But since this information is structured, an ML algorithm can easily be trained on the same rules. Having "swallowed" a sound database in which documents are structured according to the organization's rules, ML algorithms will be ready to predict attributes and processing routes for new documents on their own, to estimate the number of days needed to complete a task, and to determine the assignee. For the algorithms to learn to do this with high accuracy, a very large database of structured and not-so-structured data is needed.
During tests of machine learning algorithms, Digital Design specialists managed to reach 95% accuracy in determining, from a document's contents, the department responsible for processing it.
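As an illustration of how a responsible department might be predicted from a document's text, the sketch below trains a small multinomial naive Bayes classifier in plain Python. The article does not say which algorithm was used in the tests, so the choice of naive Bayes, the class name `DepartmentClassifier`, the department labels, and the training texts are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


class DepartmentClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # department -> word frequencies
        self.doc_counts = Counter()              # department -> number of documents
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    def fit(self, texts, departments):
        """Count words per department from already processed documents."""
        for text, dept in zip(texts, departments):
            tokens = self.tokenize(text)
            self.word_counts[dept].update(tokens)
            self.doc_counts[dept] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        """Return the department with the highest posterior log-probability."""
        tokens = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_dept, best_score = None, -math.inf
        for dept, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[dept].values())
            # log P(dept) + sum over tokens of log P(token | dept)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                count = self.word_counts[dept][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_dept, best_score = dept, score
        return best_dept


# Invented training data: texts of past documents and the departments that handled them.
training_texts = [
    "Request to repair potholes on the road near the school",
    "Complaint about broken street lighting and road signs",
    "Application for a subsidy to pay utility bills",
    "Question about pension payments and social benefits",
]
training_departments = ["roads", "roads", "social", "social"]

model = DepartmentClassifier().fit(training_texts, training_departments)
predicted = model.predict("Please repair the road signs on our street")
print(predicted)
```

The same pattern extends to other attributes (document type, assignee, processing route) by training one classifier per attribute on the historical database. Accuracy on real data depends heavily on how consistently past documents were labeled.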
Other relevant tasks
Automatic abstracting
Manual summarization (producing a short, informative digest of a document's full text) is complex, routine work with high labor costs, so here too it makes sense to use tools for automatic summary generation. The first publications on methods of automatic text summarization appeared in 1958. Since then a large number of methods have been developed, and the quality of results has improved.
The main objectives of automatic text summarization in EDMS are extracting the main information in a document and eliminating duplication.
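Both objectives can be illustrated with a simple extractive approach: score each sentence by how frequent its words are in the whole document, keep the top-scoring sentences, and skip verbatim repeats. The sketch below is an assumption-laden example in that spirit, not the summarization method used in the "Priority" modules; the stop-word list, the function name `summarize`, and the sample text are invented.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "are",
             "be", "by", "with", "that", "this", "it", "as", "at", "will", "was"}


def content_words(text):
    """Lowercased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Pick the highest-scoring distinct sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    seen, chosen = set(), []
    for i in ranked:
        key = sentences[i].lower()
        if key in seen:  # skip verbatim duplicates
            continue
        seen.add(key)
        chosen.append(i)
        if len(chosen) == max_sentences:
            break
    return " ".join(sentences[i] for i in sorted(chosen))


text = ("The contract covers delivery of office equipment. "
        "Delivery of office equipment must be completed within thirty days. "
        "The contract covers delivery of office equipment. "
        "Please contact the manager with questions.")
summary = summarize(text)
print(summary)
```

Extractive selection only reuses sentences that already exist in the document. Generating a rephrased summary is a separate and harder task that this sketch does not attempt.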
Detection of anomalies in agreements
This task comes down to finding atypical parts in the text of an agreement: errors and new or missing clauses. For a person this is a fairly long and not so simple process, whereas machine learning algorithms cope with the task in a matter of seconds.
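One simple way to picture this task is to compare an agreement clause by clause against a reference template. The sketch below uses the standard `difflib` module for fuzzy string matching rather than a trained model, so it is only a rough stand-in for the ML approach the article refers to. The function name `find_anomalies`, the 0.6 similarity cutoff, and the sample clauses are assumptions.

```python
import difflib


def find_anomalies(template_clauses, agreement_clauses, cutoff=0.6):
    """Compare an agreement with a reference template clause by clause.

    Returns clauses missing from the agreement, clauses with no counterpart
    in the template, and pairs of (template clause, altered clause).
    """
    report = {"missing": [], "new": [], "changed": []}
    matched = set()
    for clause in agreement_clauses:
        candidates = difflib.get_close_matches(clause, template_clauses, n=1, cutoff=cutoff)
        if not candidates:
            report["new"].append(clause)  # nothing similar in the template
            continue
        matched.add(candidates[0])
        if candidates[0] != clause:
            report["changed"].append((candidates[0], clause))
    report["missing"] = [c for c in template_clauses if c not in matched]
    return report


# Invented template and agreement for illustration.
template = [
    "The supplier shall deliver the goods within 30 days.",
    "Payment is due within 10 days of delivery.",
    "Either party may terminate the agreement with 30 days notice.",
]
agreement = [
    "The supplier shall deliver the goods within 90 days.",
    "Payment is due within 10 days of delivery.",
    "The buyer waives all warranty claims.",
]
report = find_anomalies(template, agreement)
for kind, items in report.items():
    print(kind, items)
```

Character-level matching catches altered numbers and missing clauses. It would not notice a clause that is reworded to mean something different, which is where models trained on many real agreements become useful.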
Where can it be applied and how does it work?
Before these ML algorithms earned the right to become part of the document flow process and independent functional units in the system, Digital Design specialists conducted research on a database containing more than 1,000,000 real documents. The main result of this research is confirmation that ML can be applied in EDMS for various tasks.
Of course, the main objective is to make the use and analysis of the accumulated knowledge base more efficient. To describe in more detail what benefit algorithms can bring to loosely structured document work, where the human factor carries a high risk, ML:
- Will help to cope with the growing flow of incoming documents and requests
  - Based on the text of an incoming file, the system autocompletes the required data in the document card, links the file with other similar documents or correspondence, and suggests the addressee of the message, drawing on data about how similar matters were handled. It determines the processing deadline on the same principle.
- Will help to increase the personal productivity of employees
  - Algorithms select a suitable assignee for an order based not only on the task profile but also on the employee's workload, which makes it possible to distribute the load among assignees.
- Will simplify work with organizational and administrative documentation (OAD) and regulations
  - Within a few seconds the system automatically determines the approval route for a draft based on its contents, finds the related regulations and OAD, and drafts the resolution for the document. The result is faster approval and fewer errors when documents are issued.
- Will raise the level of security for restricted-access data
  - Intelligent monitoring of the database of documents intended for official use will protect against unauthorized access and warn about suspicious user activity.
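The assignee selection mentioned in the list, which weighs both the task profile and the current workload, can be sketched as a simple scoring rule. Everything below is hypothetical: the `Employee` fields, the `load_weight` penalty of 0.1 per open task, and the sample staff are invented, and a real system would learn such a trade-off from historical data rather than hard-code it.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    skills: set       # topics the employee usually handles (hypothetical field)
    open_tasks: int   # current workload (hypothetical field)


def pick_assignee(task_topics, employees, load_weight=0.1):
    """Return the employee with the best trade-off between profile match and load."""
    def score(emp):
        # Share of the task's topics that the employee covers, from 0 to 1.
        overlap = len(task_topics & emp.skills) / len(task_topics) if task_topics else 0.0
        # Each open task lowers the score by load_weight.
        return overlap - load_weight * emp.open_tasks
    return max(employees, key=score)


employees = [
    Employee("Ivanova", {"roads", "transport"}, open_tasks=9),
    Employee("Petrov", {"roads", "housing"}, open_tasks=2),
    Employee("Sidorov", {"pensions"}, open_tasks=0),
]
assignee = pick_assignee({"roads"}, employees)
print(assignee.name)
```

Here two employees match the task profile equally well, and the one with fewer open tasks is chosen, which is the load-balancing effect described in the list.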
Features of implementation
Machine learning, or more precisely the mathematical magic hidden beneath it, is in fact more than 50 years old. But only today has the growing power of ordinary work computers made it possible to apply the algorithms to routine tasks. Previously, solving a problem with ML would have required a supercomputer; now a laptop suffices. Over this time data scientists made great progress in developing algorithms and technologies for semantic text analysis, which made it possible to solve problems with fairly high accuracy. Then electronic document management appeared, and experts began to talk about universal adoption of EDMS. As a result, companies have accumulated enough electronic documents to find general patterns in the data and interesting dependencies and, at last, to apply machine learning algorithms.
It would seem that all the conditions are in place to remove people from document processing, and that a fully independent EDMS organism working in tandem with machine learning algorithms is about to come to life, but it is too early to think so. First, decision making still remains with people: however many algorithms are connected to document processing, they will not be able to make an important management decision, or a decision to cut the budget, for example. Second, as Digital Design's research showed, algorithms have to be applied to structured databases; only then can they perform tasks with high accuracy. It is therefore worth starting now to design such a data model with the help of machine learning and data analysis algorithms. At the initial stage of training the algorithms, human participation is indispensable. How to do this correctly and where to start is something the specialists of Digital Design, who already have practical experience of implementing machine learning algorithms in EDMS, can advise on.