Base system (platform): | Amazon Web Services (AWS) |
Developers: | Amazon |
Release date: | May 2019 |
Technology: | SaaS: software as a service, EDMS: stream recognition systems |
2019: Announcement
At the end of May 2019, Amazon launched Textract, a cloud service for document recognition that can automatically extract text, tables, and other data from pages. Various formats are supported, including JPEG, PNG, and PDF.
Textract belongs to the class of optical character recognition (OCR) software, as does, for example, ABBYY FineReader. Unlike many OCR solutions, Textract not only extracts text from documents but also recognizes their layout and content. For example, it identifies tables and forms in documents, including receipts, tax returns, and consignment notes, and it also supports image formats. After recognition, the software structures the data on its own.
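The structured output described above can be illustrated with a minimal Python sketch. It assumes a response shaped like the block list that the Textract API returns (a `Blocks` array whose entries carry `BlockType`, `Text`, `RowIndex`, `ColumnIndex`, and `CHILD` relationships). The sample response is hand-written for illustration, not real service output, and the helper names `child_ids`, `extract_lines`, and `extract_tables` are hypothetical.

```python
from collections import defaultdict

# Hand-written sample that imitates the assumed response shape; not real output.
SAMPLE_RESPONSE = {
    "Blocks": [
        {"Id": "l1", "BlockType": "LINE", "Text": "Invoice 42"},
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
        {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Price"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Pen"},
        {"Id": "w4", "BlockType": "WORD", "Text": "1.50"},
    ]
}


def child_ids(block):
    """Collect the ids of all CHILD relationships of a block."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel.get("Type") == "CHILD":
            ids.extend(rel.get("Ids", []))
    return ids


def extract_lines(response):
    """Return the text of every LINE block, in the order given."""
    return [b["Text"] for b in response.get("Blocks", [])
            if b.get("BlockType") == "LINE"]


def extract_tables(response):
    """Rebuild each TABLE block as a list of rows of cell strings."""
    blocks = {b["Id"]: b for b in response.get("Blocks", [])}
    tables = []
    for block in blocks.values():
        if block.get("BlockType") != "TABLE":
            continue
        cells = defaultdict(dict)  # row index -> {column index: text}
        for cell_id in child_ids(block):
            cell = blocks[cell_id]
            if cell.get("BlockType") != "CELL":
                continue
            words = [blocks[w]["Text"] for w in child_ids(cell)
                     if blocks[w].get("BlockType") == "WORD"]
            cells[cell["RowIndex"]][cell["ColumnIndex"]] = " ".join(words)
        n_cols = max((max(row) for row in cells.values()), default=0)
        tables.append([[cells[r].get(c, "") for c in range(1, n_cols + 1)]
                       for r in sorted(cells)])
    return tables


if __name__ == "__main__":
    print(extract_lines(SAMPLE_RESPONSE))
    print(extract_tables(SAMPLE_RESPONSE))
```

In a real application the dictionary would come from the service rather than from a literal. The sketch only shows how table cells can be reassembled into rows once the block relationships are available.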
Amazon claims that Textract can identify passport data, dates of birth, and addresses and then interpret them correctly regardless of where they appear on the page. According to the company, if a template changes, the system will not return an incorrect result.
According to the developers, high recognition accuracy was achieved by using machine learning (ML) to process millions of documents. As a result, the system learned to correctly identify text and objects in "practically any" type of document.
Developers do not need to be machine learning specialists to connect Textract to their applications, says Swami Sivasubramanian, vice president of Amazon Machine Learning. They can extract text and data using Amazon's database and analytics services and set up integration with other ML services.
Textract is intended for the automatic recognition of large volumes of documents. Pricing starts at $1.50 per 1,000 pages processed.[1]
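As a rough illustration of the starting price, the following Python sketch estimates the cost of a batch at $1.50 per 1,000 pages. It assumes a flat rate with no volume tiers or feature surcharges, which the article does not confirm, and the function name `estimate_cost` is hypothetical.

```python
def estimate_cost(pages, price_per_1000=1.5):
    """Estimate the cost in USD, assuming a flat rate per 1,000 pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * price_per_1000 / 1000


if __name__ == "__main__":
    # Example: a batch of 250,000 pages at the assumed flat starting rate.
    print(estimate_cost(250_000))
```

Actual charges may differ, since the quoted figure is only the price at which the service starts.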