OLAP systems
OLAP (from engl. OnLine Analytical Processing — online analytical processing of data, also: analytical data processing in real time, interactive analytical data processing) — the approach to analytical data processing which is based on their multidimensional hierarchical view, being a part of wider information technology field — a business intelligence (BI).
You look the directory of the OLAP solutions and projects in the section of OLAP at TAdviser.
Content |
From the point of view of the user, OLAP systems represents means of flexible viewing information in different cuts, automatic obtaining aggregated data, accomplishment of analytical transactions of convolution, detailing, comparison in time. All this does OLAP systems by the solution with obvious advantages in the field of data preparation to all types of the business reporting assuming data view in different cuts and different hierarchy levels — for example, sales reports, different forms of budgets and so on. Pluses of similar representation and in other forms of data analysis, including are obvious to forecasting.
Requirements to OLAP systems. FASMI
The key requirement imposed to OLAP systems — the speed allowing to use them in the course of interactive work of the analyst with information. In this sense of an OLAP system are opposed, first, to traditional RDBMS, selections of which with the requests, standard for analysts, using grouping and aggregation of data are usually expensive on waiting time and loading of a RDBMS therefore interactive work with them at a little considerable amounts of data is complex. Secondly, OLAP systems is opposed also to normal ploskofaylovy data view, for example, in the form of often used traditional spreadsheets, submission of multidimensional data in which difficult and not intuitively, and operations on change of a cut — the points of view on data — also require time expenditure and complicates interactive work with data.
The term OLAP offered by Edgar Codd for differentiation of such systems with OLTP systems (from engl. OnLine Transaction Processing — transaction processing in real time), some experts consider too wide. Therefore Nigel Pends (Nigel Pendse) suggested to use for the description of this concept and instead of 12 rules OLAP offered by Codd the so-called FASMI test (from engl. Fast Analysis of Shared Multidimensional Information — the fast analysis of the available multidimensional information), more precisely haraketerizuyushchy requirements to these systems.
Fast (fast) in reflects the requirement to the speed of reaction of a system mentioned above. On Pendsa, intervals from the moment of initiation of a request before obtaining result it has to be measured seconds. Importance of this requirement increases when using such systems as the instrument of operational data view for the analyst as the progressive tense of waiting can harmful influence a chain of reasonings of the analyst.
Analysis (analysis) assumes fitness of a system to use in relevant for a task and the user to business logic with preserving to the available "normal" user of ease of operating by data without use of low-level special tools.
Shared (availability, general availability) describes the obvious requirement to a possibility of simultaneous multi-user access to information with the integrated system of differentiation of access rights up to the level of a specific cell of data.
Multidimensional (multidimensionality) is the key requirement of the concept. It is supposed that a system should provide a full support of multidimensional hierarchical view as "most logical way of the analysis of business and organizations". Let's note that the multidimensionality indicates model of conceptual data view, i.e. how the user should represent data structure at formulation of requests, but not on in what structures data physically are stored.
The multidimensionality within OLAP assumes conceptual data view in the form of a multidimensional structure of data — a hyper cube (OLAP cube) in as which edges measurements (dimension), and data (facts — the facts act; measures are measures, indicators) are located on intersection of axes of measurements.
At the same time measurement usually represents the flat or hierarchical list. For example, measurement Partners can include the list of partners of the company, measurement Time there is a list of branches with geographical grouping (the region of the world, the country, the region, the city, branch). If as a measure sales volume is determined, then on a cut on measurements Partners and Time we will have the table with data on scope change sale on partners in time as which row headings and columns our measurements — Time and Partners will act, and in cells on intersection of lines and columns data on sales volume during a specific time frame for the specific partner will be located values of a measure, i.e.
Information (information) — all this relevant to the user's purposes data, at the same time availability of "excess" data negatively affects the requirement to the speed of reaction of a system.
Features of architecture. Classification of OLAP systems
Several factors have an impact on architecture of the specific OLAP systems. Among them — interaction with data sources, features of the organization of data storage in the OLAP system and approach to data processing in it.
Data sources
OLAP systems is seldom used as means of direct storage and data modification (except for some simple and small-scale budgeting systems, accounting and the analysis of sales, etc.) as the majority of the data used in OLAP for the analysis are generated in other information systems (ERP CRM HRM etc.).
At the same time, on the one hand, requirements to data, specific to OLAP systems, usually mean data storage in the special structures optimized under standard tasks of OLAP, from other storna, direct extraction of data from the existing systems in the course of the analysis would lead to significant falling of their performance.
Therefore, the important requirement is providing makimalno a flexible linking of import export between the existing systems acting as data source and an OLAP system and also an OLAP system and external applications of data analysis and the reporting.
At the same time such sheaf should meet obvious requirements of support of import export from several data sources, implementation of procedures of cleaning and transformation of data, standardization of the used qualifiers and reference books. Besides, need of accounting of different turnover cycles of data for the existing information systems and unification of the required detail level of data is added to these requirements. The complexity and versatility of this problem led to emergence of the concept of data warehouses, and, in a confined sense, to selection of a separate class of utilities of converting and data translation — ETL (Extract Transform Load).
Models of storage of active data
We stated above that OLAP assumes multidimensional hierarchical data view, and, in a sense, is opposed to systems which are based on a RDBMS.
It, however, does not mean that all OLAP systems uses multidimensional model for storage of active, "working" data of a system. As the model of storage of active data has an impact on all requirements dictated to FASMI test, its importance is emphasized what on this sign is traditionally selected by the OLAP subtypes — multidimensional (MOLAP), relational (ROLAP) and hybrid (HOLAP).
At the same time, some experts, led by above-mentioned Nigel Pends, specify that the classification which is based on one criterion is insufficiently complete. Especially as the vast majority of the existing OLAP systems will treat hybrid type. Therefore we will stop on models of storage of active data in more detail, having mentioned what of them correspond what of the traditional OLAP subtypes.
Storage of active data in a multidimensional DB
In this case data of OLAP are stored in the multidimensional DBMS using the constructions optimized for data of this kind. Usually multidimensional DBMS support also all standard for OLAP transaction, including aggregation on required hierarchy levels and so on.
In a sense it is possible to call this type of data storage classical for OLAP. All steps on preliminary data preparation, however, are fully necessary for it. Usually data of multidimensional DBMS are stored on a disk, however, in certain cases, for data processing acceleration such systems allow to store data in RAM. Storage in a DB of in advance calculated modular values and other settlement values also is sometimes applied to the same purposes.
The multidimensional DBMS which are completely supporting multi-user access with the competing transactions of a read and write are rather rare, the normal mode for such DBMS is single-user with access to record at multi-user on reading, or multi-user only on reading.
Among the conditional shortcomings characteristic of some implementations of multidimensional DBMS and OLAP systems which are based on them it is possible to note their exposure to growth of amounts, unpredictable from the user point of view, of the place taken by a DB. This effect is caused by desire as much as possible to reduce a system response time, dictating to keep in advance calculated values of modular indicators and other values in a DB that causes the nonlinear growth of amount of information which is stored in a DB with adding in it new values of data or measurements.
Extent of manifestation of this problem and also the related problems of effective storage of the rarefied cubes of data, is defined by quality of the applied approaches and algorithms of specific implementations of OLAP systems.
Storage of active data in a relational DB
Data of OLAP also can be stored in a traditional RDBMS. In most cases this approach is used in attempt of "painless" integration of OLAP with the existing accounting systems, or data warehouses which are based on a RDBMS. At the same time, this approach demands from a RDBMS for ensuring effective implementation of requirements of FASMI test (in particular, providing the minimum response time of a system) some additional opportunities. Usually data of OLAP are stored in the denormalized type, and a part of in advance calculated units and values is stored in special tables. At storage in the normalized type the efficiency of a RDBMS as a method of storage of active data decreases.
The problem of the choice of effective approaches and algorithms of storage of the precalculated data is also relevant for OLAP systems which is based on a RDBMS therefore producers of such systems usually focus attention on advantages of the applied approaches.
In general it is considered that OLAP systems which is based on a RDBMS more slowly than systems which are based on multidimensional DBMS including at the expense of less effective for tasks of OLAP of structures of data storage, however in practice it depends on features of a specific system.
Among data storage advantages in a RDBMS usually call big scalability of such systems.
Storage of active data in "flat" files
This approach assumes storage of portions of data in normal files. Usually it is used as addition to one of two basic approaches for the purpose of acceleration of work due to caching of up-to-date data on a disk or in RAM of the client PC.
The hybrid approach to data storage
Most of producers of the OLAP systems advancing the complete solutions which are often including in addition to actually DBMS OLAP system, the ETL tools (Extract Transform Load) and reportings now use the hybrid approach to the organization of storage of active data of a system, distributing them in one way or another between a RDBMS and specialized storage and also between disk structures and caching in RAM.
As the efficiency of such solution depends on the specific approaches and algorithms applied by the producer to determination of what data and where to store, to hasty draw conclusions about initially bigger efficiency of such solutions as class without assessment of specific features of the considered system.
OLAP (engl. on-line analytical processing) – set of methods of dynamic processing of multidimensional requests in analytical databases. Such data sources usually have quite large volume, and in the means applied to their processing one of the most important requirements is high speed. In relational DB information is stored in separate tables which are well normalized. But advanced multitabular queries in them are executed quite slowly. Considerably the best indicators on processing speed in OLAP systems are reached due to feature of structure of data storage. All information is accurately organized, and two types of data warehouses are applied: measurements (contain the reference books separated on categories, for example, of a sales point, clients, employees, services, etc.) and the facts (characterize interaction of elements of different measurements, for example, on March 3, 2010 the seller of A rendered service to the client of B in shop B for the amount of of monetary units). Measures are applied to calculation of results in an analytical cube. Measures represent sets of the facts aggregated on the corresponding selected measurements and their elements. Thanks to these features smaller time, than in relational sources is spent for advanced queries with multidimensional data much.
One of the main vendors of OLAP systems is the corporation Microsoft. Let's consider implementation of the principles OLAP on practical examples of creation of an analytical cube in applications Microsoft SQL Server Business Intelligence Development Studio (BIDS) and Microsoft Office PerformancePoint Server Planning Business Modeler (PPS) and we will get acquainted with opportunities of visual representation of multidimensional data in a schedule type, charts and tables.
For example, in BIDS it is necessary to create an OLAP cube by data about insurance company, her workers, partners (clients) and sales points. Let's assume the assumption that the company provides one type of service therefore measurement of services is not required.
At first we will define measurements. From activity of the company the following entities (category of data) are connected:
- Sales points
- Employees
- Partners
Also measurements Time and the Scenario which are obligatory for any cube are created.
Further one table is necessary for storage of the facts (fact table).
Information in tables can be entered manually, but data loading using the master of import from different sources is most widespread.
In the following drawing the sequence of process of creation and filling of dimension tables and the facts manually is provided:
Fig. 1. The dimension table and the facts in an analytical DB. Sequence of creation
After creation of multidimensional data source in BIDS there is an opportunity to browse its representation (Data Source View). In our example the scheme given in the drawing below will turn out.
Fig. 2. Representation of data source (Data Source View) in Business Intellingence Development Studio (BIDS)
As we see, the fact table is connected with dimension tables by means of unambiguous compliance of identifying fields (PartnerID, EmployeeID, etc.).
Further deployment of a cube is made. Besides, if necessary hierarchies, attributes of measurements are in addition configured, the calculated measures are created.
Let's look at result. On a tab of the observer of a cube, dragging measures and measurements in fields of results, lines, columns and filters, we can gain an impression of the interesting data (for example, the concluded bargains on insurance agreements concluded by a certain worker in 2005):
Fig. 3. Viewing analytical cube
Main players and solutions
OLAP systems is a part of the vast majority of solutions for a business intelligence, the "corporate" editorial offices DBMS of prime vendors (IBM, Microsoft, Oracle). To some extent OLAP technologies are used in an essential part of modern ERP systems. Preference is given to the OLAP tools offered by BARS GROUP in public sector of the Russian Federation.
External links
To get acquainted with examples of data visualization on the basis of a cube of BIDS and also to learn about opportunities of creation of multidimensional models in Microsoft Office PerformancePoint Server it is possible here
Sources
[1]Codd E.F., Codd S.B., Salley C.T. "Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate". Codd & Date, Inc, 1993. Retrieved on 2008-12-11.
[2]Nigel Pendse. "What is OLAP? An analysis of what the often misused OLAP term is supposed to mean. Retrieved on 2008-12-11.
[3]Nigel Pendse. "OLAP architectures". Retrieved on 2008-12-15.