Origins of the problem
Data integration is an integral aspect of the long-term development of an enterprise's information infrastructure.
The historical roots of the problem are closely intertwined with the evolution of approaches to business automation. Non-automated record keeping did not raise the question of data reuse in any broad form: to use data created in the course of the enterprise's activity and recorded on paper or another non-electronic medium in a different area of that activity, the data had to be duplicated in the required form.
The first business automation projects, technologically tied to mainframes, automated specific functional tasks with no provision for their later expansion and integration into enterprise-wide processes. Moreover, solutions of that era relied, where necessary, on repeated entry of the same data, both because of habits inherited from non-automated workflows and because, for a long time, the monetary cost of re-entry was incomparably lower than the cost of storing the data in machine memory. At that stage the value of real business data, nowadays sometimes estimated as equal to (or exceeding) the value of the algorithms that analyze it, was also not yet widely recognized.
As information systems based first on minicomputers and later on PCs appeared, both the range of enterprises able to afford such systems and the range of tasks solvable by such AIS (automated information systems) expanded. However, the overwhelming dominance of developers' logic over business logic, and the prevailing focus on automating isolated functional tasks, meant that such AIS became islands of so-called "patchwork" automation, built without any deliberate system-level approach to automating the business. At the same time, the need to store and back up the data of a specific AIS was already taken into account, and some systems were implemented with multi-user access in mind and on a client-server architecture. The need for data exchange between the different AIS of an enterprise, however, was hardly considered at all and was still mostly met by repeated data entry, with rare exceptions in the form of one-off ad hoc solutions.
As the automated areas multiplied, the shortcomings of "patchwork" automation began to show in full: the lack of a unified approach to organizing AIS and to choosing platforms, tools, and data models led to growing duplication of the same data across different AIS within a single enterprise. A typical example is a user forced to repeatedly enter the same or very similar data into several functionally adjacent systems. Integrating the systems at the programming level, meanwhile, was often hindered by the absence of an application programming interface (API). Beyond the growing labor cost of repeated entry and the rising inconsistency of data and number of errors across systems, fragmented data storage deprived the enterprise of a unified picture of its activity.
With the advent of the BI concept and of analytical systems, including OLAP, the need for special data preparation for such systems became explicit. It is driven both by the fragmentation of the data sources available for analysis and by the special requirements that analysis imposes on data structure, formulated by Edgar Codd in his 12 OLAP rules, refined by Nigel Pendse in the FASMI test, and elsewhere.
Approaches to data integration
Today data integration can be divided, by the direction in which data move, into three types: consolidation, federation, and data propagation.
Consolidation
Consolidation is the collection of data from several sources (usually transactional systems of record) into a single target store. Consolidated data are most often used for analysis or reporting, as, for example, with data warehousing for BI. The specifics of gathering heterogeneous information from several sources give data consolidation a number of characteristic features, in particular a delay in refreshing the data in the target store relative to the source systems. This delay arises both from the need to reconcile the update cycles of the different source systems and from the need to translate data from their various formats into the format of the target store, which in many real applications is a non-trivial task. For the classical purposes of BI applications, a small refresh delay in the target store was not a problem, since analytics and forecasting operated over wider time intervals than the transactional systems. However, as requirements emerged to align business intelligence with operational management, the speed of data delivery has become ever more important, imposing new requirements on consolidation-based technologies and forcing the search for alternative approaches.
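As a rough illustration of how these delays compose, here is a minimal sketch in Python; the source names and cycle lengths are invented. If a source system itself refreshes its data every C hours and the consolidation job runs every P hours, a record in the target store can be up to roughly C + P hours old just before the next load.

    # Hypothetical refresh cycles (hours) of three source systems and the
    # period of the consolidation job; all numbers are invented for the example.
    source_cycles = {"billing": 1, "crm": 24, "logistics": 6}
    consolidation_period = 24  # e.g. a nightly batch window

    # Worst case: a record is almost one full source cycle old at extraction
    # time, and almost one full consolidation period then passes before the
    # next load picks up anything newer.
    worst_case_staleness_hours = {
        source: cycle + consolidation_period
        for source, cycle in source_cycles.items()
    }
    print(worst_case_staleness_hours)
    # {'billing': 25, 'crm': 48, 'logistics': 30}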
The most commonly used consolidation technology is ETL (Extract, Transform, Load), which extracts data from external sources, transforms them according to the requirements of the business model, and loads the transformed data into the target system. Modern ETL systems, moreover, understand transformation to mean not only technical format conversion but also the standardization of heterogeneous data against the applicable regulations, ensuring that the participating systems use consistent coding schemes, classifiers, and reference books.
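To make the ETL flow concrete, below is a minimal sketch in Python, using in-memory SQLite databases to stand in for two hypothetical source systems (an order system and a CRM) and a target warehouse; all table and column names, and the status classifier, are invented for illustration. The transform step normalizes each source's status codes against a single shared classifier, the kind of standardization described above.

    import sqlite3

    # Hypothetical shared classifier: maps each source system's status codes
    # to one common vocabulary (the standardization part of the transform).
    STATUS_MAP = {
        "orders": {"A": "active", "C": "closed"},
        "crm":    {"1": "active", "0": "closed"},
    }

    def extract(conn, query):
        """Extract: pull raw rows from one source system."""
        return conn.execute(query).fetchall()

    def transform(rows, source):
        """Transform: convert rows to the warehouse format and normalize
        status codes against the shared classifier."""
        return [(source, row_id, name.strip().title(), STATUS_MAP[source][status])
                for row_id, name, status in rows]

    def load(warehouse, rows):
        """Load: write the unified rows into the target store."""
        warehouse.executemany(
            "INSERT INTO customers_dw (source, source_id, name, status) "
            "VALUES (?, ?, ?, ?)", rows)
        warehouse.commit()

    # Demo wiring: in-memory databases stand in for the real sources.
    orders = sqlite3.connect(":memory:")
    orders.execute("CREATE TABLE customers (id, name, status)")
    orders.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                       [(1, "acme corp", "A"), (2, "globex", "C")])

    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE clients (client_id, client_name, state)")
    crm.executemany("INSERT INTO clients VALUES (?, ?, ?)", [(7, "initech", "1")])

    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE customers_dw (source, source_id, name, status)")

    for source_name, conn, query in [
        ("orders", orders, "SELECT id, name, status FROM customers"),
        ("crm", crm, "SELECT client_id, client_name, state FROM clients"),
    ]:
        load(warehouse, transform(extract(conn, query), source_name))

    print(warehouse.execute("SELECT * FROM customers_dw").fetchall())

In a production pipeline the same three roles are played by dedicated connectors, a transformation engine, and bulk loaders, but the division of responsibilities is the same.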