2010/05/11 22:39:33

ETL (Extract Transform Load)

ETL (from English Extract, Transform, Load, that is, extraction, transformation, loading) is one of the basic processes in managing data warehouses, and also the name of the class of tools that automate this process. In the narrow sense, ETL belongs to data consolidation technologies; however, modern solutions on the market support, in addition to consolidation, data federation and data exchange.

The ETL process

ETL includes:

  • extraction of data from external sources;
  • transformation of the data according to the requirements of the business model;
  • loading of the transformed data into the target system (for example, a data warehouse).
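The three stages listed above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the CSV content, the `orders` table, and the "business rules" (amounts stored in cents, upper-case currency codes) are all assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical export from an external source system (assumed for illustration).
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,7.00,eur
"""


def extract(text):
    """Extract: read raw rows from the CSV export as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply the assumed business rules (integer cents, upper-case currency)."""
    return [
        (int(r["order_id"]), round(float(r["amount"]) * 100), r["currency"].upper())
        for r in rows
    ]


def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount_cents INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order."""
    load(conn, transform(extract(text)))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
    run_etl(SOURCE_CSV, connection)
    print(connection.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping the stages as separate functions makes it possible to replace one of them, for example a different source format, without touching the others.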

Despite its apparent simplicity, each of these stages is rather difficult in practice. First, the external sources can be different information systems whose data storage formats and extraction procedures may differ significantly. Second, the ETL process is not limited to the technical conversion of formats: data from heterogeneous sources must also be unified in terms of business rules and consistent data coding systems, classifiers, and reference books. Third, the process must also take into account the company's business processes, including how the separate information systems that act as data sources operate, how often their data is updated, and so on.
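The second point, unifying coding systems across sources, can be illustrated with a small sketch. The source names `crm` and `billing`, the record layout, and the reference book itself are hypothetical; the example only assumes that one system stores countries as two-letter codes and another as ISO 3166-1 numeric codes.

```python
# Hypothetical reference book: local codes of each source system -> canonical code.
COUNTRY_REFERENCE = {
    "crm": {"RU": "RU", "DE": "DE"},        # already uses two-letter codes
    "billing": {"643": "RU", "276": "DE"},  # uses numeric codes
}


def unify_country(source, record, reference=COUNTRY_REFERENCE):
    """Return a copy of the record with 'country' converted to the canonical coding.

    Raises ValueError when the code is missing from the reference book, so that
    unmapped values are noticed instead of being loaded silently.
    """
    mapping = reference[source]
    code = record["country"]
    if code not in mapping:
        raise ValueError(f"unknown country code {code!r} from source {source!r}")
    return {**record, "country": mapping[code]}


if __name__ == "__main__":
    print(unify_country("crm", {"customer_id": 1, "country": "DE"}))
    print(unify_country("billing", {"customer_id": 2, "country": "643"}))
```

Failing on an unknown code is a design choice for this sketch; a real process might instead route such records to a separate queue for manual review.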

Therefore, as settled requirements for the ETL process take shape in business practice, some experts and vendors speak of a trend toward the concept of "enterprise application integration" (Enterprise Application Integration) or the "enterprise service bus" (Enterprise Service Bus). These complement the ETL process with real-time data exchange, data profiling, data quality control, metadata management, and some other tasks, while treating them in terms of the logic of business processes.
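As a rough idea of what data profiling means in its simplest form, the sketch below summarizes one column of a data set: how many values are missing, how many distinct values occur, and which value is most frequent. The function name `profile_column` and the sample rows are assumptions for illustration; dedicated profiling tools collect far more statistics.

```python
from collections import Counter


def profile_column(rows, column):
    """Collect basic statistics for one column of a list of dictionaries."""
    values = [row.get(column) for row in rows]
    # Treat both None and the empty string as missing values.
    present = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "most_common": Counter(present).most_common(1),
    }


if __name__ == "__main__":
    sample = [
        {"city": "Moscow"},
        {"city": ""},
        {"city": "Moscow"},
        {"city": "Kazan"},
    ]
    print(profile_column(sample, "city"))
```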

Implementation of ETL

In principle, any specific ETL process can be automated directly in most modern programming languages. In addition, most vendors of BI solution components provide data transfer between their own products. However, when the task is not a simple one-off conversion of a small amount of data between two systems but the construction of a continuous process for integrating data from several heterogeneous sources, it makes sense to consider specialized tools that facilitate the automation of standard operations and support the main formats in use and the most widespread information systems. The scalability, performance, and extensibility of such tools should also be taken into account.

Main players and solutions

IBM InfoSphere DataStage

Talend Open Studio

Pentaho Data Integration

Informatica

See Also

DWH