Types of stream document input systems
We all like to imagine that the paperless era has finally arrived. The number of folders in employees' cabinets really has decreased. Government bodies have databases that support public services, a portal, and SIEI and RSMEV, which let us interact effectively with other departments. We have EDMS to help us manage documents. We have electronic archives which, once the wheat is separated from the chaff (forgive us: documents from data), let us attach documentation to data from 1C and ERP, and so on. But both the paperless kingdom and the kingdom of information systems are quietly served by the input section or the digitization center. It is here that data is separated from documents, and here that the enormous potential for solving specific functional tasks is laid: quickly compiling selections of documents, processing citizens' appeals, building analytics from documents and data, and so on.
In this article we will consider three types of stream document input systems.
We should say at once that a stream input system is "omnivorous". Current and archival documents; structured, semi-structured and unstructured ones; paper and electronic documents of different types, formats and statuses can all be entered into information systems, and current data can be extracted from all of them. The appropriate equipment and software are selected for each document type.
Input section
An input section is organized when a division needs to digitize a particular set of documents and there is no need for a geographically distributed structure of scanning sections. An input section may therefore have several scanners, but of one type (document, planetary, etc.). Examples are an input section for technical documentation or one for accounting documentation. The section can be equipped with whatever software is necessary. Input sections can thus exist in different divisions, and each division loads its documents (PBD, drawings, etc.) into a single accounting system.
Shared service centers (OTsO), in which input sections are organized, deserve separate mention. In large companies and holdings with a complex, geographically distributed structure, a single division is created and the business processes of a particular area are transferred to it for the whole organization. For example, an accounting OTsO concentrates the functions of financial and tax accounting. Instead of ten accounting departments, one in each branch, a single OTsO is created, and the processing of financial information for all enterprises of the holding is consolidated in it, which optimizes headcount and reduces operating costs. An input section is created in the OTsO, where all financial documents are digitized and loaded into a single IT system (in effect, this is the same input section for a division, except that the division is the OTsO).
In large organizations, especially geographically distributed ones, getting documents from remote divisions and branches to the main office on time is a particularly acute problem. It can be solved by courier delivery of paper documents to a single center, where they are scanned, indexed and loaded into corporate information systems. The main drawback of this approach is that information systems cannot be filled with documents promptly, because of objective limits on the minimum possible delivery time (from one to several days). Other significant drawbacks include considerable transport costs and the risk of losing documents in transit. Therefore, if the organization is not a large holding, there is no point in creating an OTsO. The other approach is a digitization center, which consists of different types of equipment under centralized management.
Center of digitization
A digitization center is a hardware and software complex that includes different types of scanning equipment and software.
To organize distributed document input, a universal digitization center is created at the organization's main office. It is designed to optimize the production cycle of stream input and document processing at geographically remote sites and to manage that cycle. A scanning and document processing section is created in each remote division. After digitization, electronic copies of documents arrive in real time at the main office for the further processing stages: recognition, indexing, verification, and loading into the database and corporate information systems.
Organizing distributed scanning and the transfer of electronic images from divisions makes the flow of incoming information visible and transparent, and makes it possible to keep account of the document images received and the volume of work done.
Depending on the document flow, distributed scanning of paper documents is performed at the level of departments and services or in a specially created section. The divisions only scan documents and transfer them to the main office. Whether additional on-site operations such as sorting and indexing are worthwhile is determined by surveying the document flows and analyzing the business processes of the organization as a whole and of its geographically remote divisions.
The digitization center performs sorting of documents by type, scanning, full or partial recognition, indexing by preset fields, verification of the recognized information, and loading into the database and the existing AIS (in particular, EDMS).
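The chain of stages listed above can be pictured as a simple pipeline. The following is only a minimal sketch: the `ScannedDocument` record, the stage functions and their placeholder rules are all hypothetical and do not describe any particular product. Scanning is assumed to have happened already, so each document arrives as an image identifier.

```python
from dataclasses import dataclass, field


@dataclass
class ScannedDocument:
    """Hypothetical record that travels through the digitization stages."""
    image_id: str
    doc_type: str = "unknown"
    text: str = ""
    index: dict = field(default_factory=dict)
    verified: bool = False
    history: list = field(default_factory=list)


def sort_by_type(doc):
    # Placeholder rule; a real center sorts by operator decision or classifier.
    doc.doc_type = "invoice" if doc.image_id.startswith("inv") else "other"
    return doc


def recognize(doc):
    # Placeholder for full or partial recognition; the text is simply faked.
    doc.text = f"recognized text of {doc.image_id}"
    return doc


def index_fields(doc):
    # Fill only the preset index fields (here a single illustrative one).
    doc.index = {"doc_type": doc.doc_type}
    return doc


def verify(doc):
    # A document passes verification when every index field is non-empty.
    doc.verified = all(doc.index.values())
    return doc


STAGES = [sort_by_type, recognize, index_fields, verify]


def run_pipeline(doc, target_system):
    """Run all stages in order, then load verified documents into the target."""
    for stage in STAGES:
        doc = stage(doc)
        doc.history.append(stage.__name__)
    if doc.verified:
        target_system.append(doc)  # stands in for the database / EDMS load
    return doc


edms = []  # a plain list stands in for the target information system
result = run_pipeline(ScannedDocument("inv-0001"), edms)
print(result.doc_type, result.verified, len(edms))
```

Keeping the stages in an ordered list makes it easy to move a stage, for example sorting or indexing, from the main office to a remote section without touching the others.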
In practice, digitization sections can have several types of scanners for different documents. Heavily loaded sections install high-speed scanners of the Elarskamaks type for unbound documents, while planetary equipment is used to digitize bound documents. Small sections may have ordinary office scanners as well as large-format scanners.
Image quality is improved during scanning. The image is cleaned ("garbage cleaning", "noise removal"), distortions are eliminated, the page is aligned and oriented, the sheet borders are adjusted, and other image preprocessing functions are applied. The process can be fully automatic or involve operators.
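Two of these preprocessing steps can be illustrated on a tiny black-and-white image. This is a minimal sketch in plain Python, assuming that the image is a list of rows with 1 for a black pixel and 0 for white, that "noise" means isolated black pixels, and that adjusting the sheet borders means trimming empty margins. Production scanning software uses far more elaborate filters.

```python
def despeckle(image):
    """Clear black pixels (1) that have no black pixel among their 8 neighbors."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            has_neighbor = any(
                image[rr][cc] == 1
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbor:
                cleaned[r][c] = 0
    return cleaned


def crop_to_content(image):
    """Trim empty margins so the result is the bounding box of black pixels."""
    filled_rows = [r for r, row in enumerate(image) if any(row)]
    filled_cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not filled_rows:
        return []
    top, bottom = filled_rows[0], filled_rows[-1]
    left, right = filled_cols[0], filled_cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


page = [
    [1, 0, 0, 0, 0, 0],  # a lone speck in the corner
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],  # the actual content
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
clean_page = crop_to_content(despeckle(page))
print(clean_page)  # [[1, 1], [1, 1]]
```

The order matters: if the margins were trimmed first, the speck in the corner would count as content and nothing would be cropped.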
When pattern recognition "chokes"
Pattern recognition products can "choke" when, all other things being equal in a digitization center, they have to recognize, for example, more than thirty different document types.
For example, the property management authority of a region needs 17 million documents of 30 types, created over a long period, to solve a range of tasks for different divisions. The authority would have to predict some optimal set of index fields, which is extremely difficult: a document archive can be made universal only through excessive indexing. So at a certain stage the customer limits itself to creating electronic images of the 17 million documents, because indexing costs are too high.
Another case: the urban planning authority of a large Russian region had to fill an information system with data from documents in order to solve tasks, including judicial ones related to identifying unauthorized construction. The array is more than 1500 case files, each containing tens of documents. They have to be scanned and indexed by more than forty fields: document type, case number, document approval date, name of the object, construction category, construction type, number and date of registration in the urban planning cadastre, territory, customer, designer and so on.
An alternative to the traditional approach to building an electronic information resource from the documents of the urban planning and the land and property complexes is the automated document selection system (ASPD). An unstructured digital array in the form of scanned document images is loaded into the system, which performs draft recognition without verification. ASPD then determines each document's type using preset classifiers and a set of regular expressions. For each document type, the set of details needed to include documents in operational selections is programmed; index data is extracted from the documents on the basis of the same recognized text and linked to the documents. Document names are not formalized, so the system recognizes close words, for example "agreement" and "contract". The system interacts through a programming interface (API) with any source containing documents: EDMS, ECM, or the file system. The information processing technology makes it possible to identify word forms and to detect different entities in the text: names of organizations, street names, house numbers, etc. This is done using dictionaries and/or rules of word usage; for example, a capitalized word that follows an abbreviation for a joint-stock company is very likely the name of an organization.
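The idea of classifying by regular expressions and then extracting type-specific index fields from the same recognized text can be sketched as follows. This is an illustration under stated assumptions, not the system's actual rule set: the type names, the patterns, the English abbreviation "JSC" and the sample text are all hypothetical.

```python
import re

# Hypothetical type rules: each document type is detected by regular expressions.
# The alternation covers close words such as "agreement" and "contract".
TYPE_RULES = {
    "lease_agreement": [r"\blease\s+(agreement|contract)\b"],
    "building_permit": [r"\bbuilding\s+permit\b", r"\bconstruction\s+permit\b"],
}

# Hypothetical per-type index fields, each extracted by its own pattern.
FIELD_RULES = {
    "lease_agreement": {
        "number": r"\bNo\.\s*([\w/-]+)",
        "date": r"\b(\d{2}\.\d{2}\.\d{4})\b",
        # Rule from the text: a capitalized word after the abbreviation of a
        # joint-stock company is very likely an organization name.
        "organization": r"\bJSC\s+([A-Z][\w-]*)",
    },
    "building_permit": {
        "date": r"\b(\d{2}\.\d{2}\.\d{4})\b",
    },
}


def classify(text):
    """Return the first document type whose rules match the draft OCR text."""
    for doc_type, patterns in TYPE_RULES.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            return doc_type
    return "unclassified"


def extract_index(text, doc_type):
    """Extract the index fields configured for this document type."""
    fields = {}
    for name, pattern in FIELD_RULES.get(doc_type, {}).items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1)
    return fields


ocr_text = "Lease contract No. 15/A dated 03.04.2015 between the city and JSC Vostok"
doc_type = classify(ocr_text)
card = extract_index(ocr_text, doc_type)
print(doc_type, card)
```

Because the fields are defined per type, a new document type can be added by extending the two rule tables, without deciding in advance on one universal set of index fields for the whole archive.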
If the paper prototype of the array was a set of folders (nomenclature cases), then the unit of processing in ASPD is the case. When it determines the type of a page, the system places a logical "tab" carrying the parameters of that document and the index fields extracted for it; in effect, this is the card of the document that the "tab" points to. As a result of processing, the unstructured array becomes a formalized electronic information resource. When a request is entered in the universal search field, the system itself recognizes the format and content of the request and associates them with an address or the name of a legal entity, a cadastral or conditional number, the date of a lease agreement, etc. Besides the search string, the workplace interface of a property management specialist has only three buttons: "Documents on Lease", "Documents on an Object" and "Tenant". Specialists can quickly build selections of documents by the name of a legal entity, a fragment of an object's address, a conditional or cadastral number, or a lease agreement number. Whenever an administrative division needs to add new documents to the system, they are likewise processed automatically after digitization and included in the processes that support the departments' work.
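How a single search field might recognize the format of a request can be sketched with a few pattern checks. All patterns here are assumptions made for illustration: the cadastral number is assumed to follow the colon-separated form district:area:quarter:parcel, dates are assumed to be written as DD.MM.YYYY, and the entity and address markers are hypothetical English stand-ins.

```python
import re
from datetime import datetime

# Assumed cadastral number format: district:area:quarter:parcel.
CADASTRAL_RE = re.compile(r"^\d{2}:\d{2}:\d{6,7}:\d+$")
DATE_RE = re.compile(r"^\d{2}\.\d{2}\.\d{4}$")
# Hypothetical markers that suggest a legal entity or an address.
ENTITY_RE = re.compile(r"\b(JSC|LLC|PJSC)\b")
ADDRESS_RE = re.compile(r"\b(street|avenue|lane|square)\b", re.IGNORECASE)


def recognize_query(query):
    """Guess which index field a free-form search string refers to."""
    q = query.strip()
    if CADASTRAL_RE.match(q):
        return ("cadastral_number", q)
    if DATE_RE.match(q):
        try:
            datetime.strptime(q, "%d.%m.%Y")  # reject impossible dates
            return ("agreement_date", q)
        except ValueError:
            pass  # looks like a date but is not one; fall through
    if ENTITY_RE.search(q):
        return ("legal_entity", q)
    if ADDRESS_RE.search(q):
        return ("address", q)
    return ("full_text", q)  # nothing recognized: search the text itself


for sample in ["77:01:0001001:1234", "03.04.2015", "JSC Vostok", "12 Lenin street"]:
    print(recognize_query(sample))
```

The checks run from the strictest format to the loosest, so a string that fits no known format still produces a usable full-text search instead of an error.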
The workplace interface of an employee of the urban planning authority likewise consists of a search string and three buttons: "Permit Documentation", "Project Documentation" and "Regulatory Documentation". Selection is performed, respectively, by cadastral number or by postal or construction address; by territory, period and designer; and by period, territory and the name of the body that adopted the regulation.
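The correspondence between the three buttons and their selection criteria can be written down as a small lookup table. The sketch below is hypothetical throughout: the keys `permits`, `project` and `regulations` stand for the three buttons, the field names and sample records are invented, and matching is by exact equality for brevity, whereas a real system would presumably match address fragments and date ranges.

```python
# Hypothetical mapping of the three buttons to the index fields they search by.
BUTTON_CRITERIA = {
    "permits": ("cadastral_number", "address"),
    "project": ("territory", "period", "designer"),
    "regulations": ("period", "territory", "issuing_body"),
}


def select(documents, button, **criteria):
    """Return documents of the button's category that match all given criteria."""
    allowed = BUTTON_CRITERIA[button]
    unknown = set(criteria) - set(allowed)
    if unknown:
        raise ValueError(f"{button} does not search by: {sorted(unknown)}")
    return [
        doc for doc in documents
        if doc["category"] == button
        and all(doc.get(key) == value for key, value in criteria.items())
    ]


# Invented sample cards, as they might look after indexing.
docs = [
    {"category": "permits", "cadastral_number": "77:01:0001001:1234"},
    {"category": "project", "territory": "North", "period": 2014, "designer": "Bureau A"},
    {"category": "project", "territory": "North", "period": 2015, "designer": "Bureau B"},
]

found = select(docs, "project", territory="North", period=2015)
print([doc["designer"] for doc in found])  # ['Bureau B']
```

Rejecting criteria that a button does not support keeps each button's selection narrow, which matches the idea of an interface with only a search string and three buttons.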
Besides the formalized document types, the array also contains accompanying items: receipts for registered mail, court notices, etc. Creating a global reference book that also covers these document types is impractical. The placed "tabs" give direct access to a document, and the viewer lets the user leaf through not only that document but also those next to it.
Thus, the automated document selection system makes it possible to solve not only standard tasks but also any new ones, which makes it a very practical alternative to traditional stream input.