Customers: Tinkoff Bank
Contractors: Glowbyte Consulting
Product: EMC Greenplum Data Computing Appliance
Based on: EMC Greenplum Database Edition
Second product: Splunk Enterprise (machine data processing subsystem)
Project dates: 2011/12 - 2012/05
Business Intelligence in Tinkoff Bank
On June 30, 2015, Sergey Sotnichenko, head of the data warehouse and reporting department at Tinkoff Bank, gave a talk at the TAdviser Big Data and BI DAY conference. In his presentation he described the implementation and evolution of business intelligence solutions at the bank. The BI project at Tinkoff Bank is eight years old, and 500 people use BI systems for reporting. Tinkoff implemented a traditional DWH (data warehouse) and BI with an integrated approach to their deployment; Big Data came into use at the company as part of the DWH strategy.
Analytical BI
- Complex business metrics
- Complex integration
- High data freshness is not critical
Operational BI
- Individual business processes
- Simple integration, typically a single system
- High data freshness is required
Operational BI – where to get the data
- Build the report directly on the business system: high load on the business system
- Build the report on DWH data: data latency of a day and insufficient flexibility
- Build the report on Operational Intelligence (Splunk): not all data is available and not all logic is implemented
- Build the report on replicas: high flexibility, acceptable latency (1-3 hours), and no load on the source database
GoldenGate replication
Real-time data warehousing mode
- Minute-level data latency in the ODS
- A change log of source tables in the ODS
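The change log of source tables can be pictured as a stream of journal records, one per committed change. A minimal sketch of such a record is shown below; the field names are illustrative, not Tinkoff's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical shape of one change-log (journal) record in the ODS.
@dataclass
class ChangeRecord:
    source_table: str      # table in the source (Oracle) system
    operation: str         # 'I' (insert), 'U' (update), 'D' (delete)
    commit_ts: datetime    # commit timestamp taken from the source log
    primary_key: tuple     # key of the affected row
    payload: dict          # column values after the change

rec = ChangeRecord(
    source_table="ACCOUNTS",
    operation="U",
    commit_ts=datetime(2012, 3, 1, 12, 0, 5),
    primary_key=("42",),
    payload={"BALANCE": "1050.00"},
)
print(rec.operation, rec.source_table)
```

Keeping the operation type and commit timestamp with every record is what makes both replication and point-in-time reconstruction possible downstream.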
O2G replication
The main goal: move tens of millions of changes per hour from Oracle to Greenplum without placing a high load on either Oracle or Greenplum.
- In-house development
- An efficient replication engine from Oracle to Greenplum
- Acceptable latency (~1-3 hours) in Greenplum
- A platform for analytical queries
- Batch unloading; only changed data is extracted
- Greenplum's internal bulk-load mechanism
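The "only the changed data" batch step can be sketched as a watermark scan over the change log: each run picks up records committed after the previous watermark, which keeps the load on the source low and the latency within hours. This is an illustrative sketch, not the actual O2G engine.

```python
from datetime import datetime, timedelta

def extract_batch(change_log, watermark):
    """Return changes committed after `watermark`, plus the new watermark."""
    batch = [rec for rec in change_log if rec["commit_ts"] > watermark]
    new_watermark = max((r["commit_ts"] for r in batch), default=watermark)
    return batch, new_watermark

t0 = datetime(2012, 1, 1)
log = [
    {"id": 1, "commit_ts": t0 + timedelta(hours=1), "op": "I"},
    {"id": 2, "commit_ts": t0 + timedelta(hours=2), "op": "U"},
]
# Only the change committed after the watermark is picked up.
batch, wm = extract_batch(log, t0 + timedelta(hours=1))
print(len(batch), wm)
```

In practice each batch would then be handed to Greenplum's bulk-load mechanism rather than applied row by row.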
O2G: the facts
- Data latency of no more than 1-3 hours
- Low load on Greenplum (~10% of maximum resources)
- Efficient data storage in Greenplum through columnar compression
- The ability to unload into several Greenplum instances
- Other consumers of ODS data (DWH, service bus)
- All changes to all tables are stored for all time
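Because every change to every table is kept for all time, the state of a row at any moment can be rebuilt by replaying its journal up to that point. A hypothetical sketch of such a replay:

```python
def state_as_of(changes, ts):
    """Replay insert/update/delete records up to `ts`; return the row or None."""
    row = None
    for ch in sorted(changes, key=lambda c: c["commit_ts"]):
        if ch["commit_ts"] > ts:
            break
        if ch["op"] in ("I", "U"):
            row = ch["payload"]      # inserts/updates set the row image
        elif ch["op"] == "D":
            row = None               # deletes remove it
    return row

history = [
    {"commit_ts": 1, "op": "I", "payload": {"balance": 100}},
    {"commit_ts": 2, "op": "U", "payload": {"balance": 250}},
    {"commit_ts": 3, "op": "D", "payload": None},
]
print(state_as_of(history, 2))   # {'balance': 250}
print(state_as_of(history, 3))   # None
```

Columnar compression makes keeping the full history affordable, since journal columns compress well.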
Big Data – Velocity, Volume
- 14 TB – the total volume of source tables in the ODS
- 3 TB – the size of the ODS replicas in Greenplum
- 500 million transactions per day
- 75 billion transactions over half a year
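A quick consistency check of the quoted volumes: 75 billion transactions over roughly half a year works out to an average close to the quoted 500 million per day.

```python
# Back-of-the-envelope check on the stated throughput figures.
per_half_year = 75_000_000_000
days = 182                       # ~ half a year
avg_per_day = per_half_year / days
print(round(avg_per_day / 1e6))  # average in millions/day
```

So the 500 million/day figure reads as a peak rate, with the long-run average a little above 400 million.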
Glowbyte Consulting as integrator
In spring 2012 EMC announced the completion of Tinkoff Credit Systems Bank's implementation of the EMC Greenplum platform for storing and analyzing very large volumes of data. Glowbyte Consulting acted as the consulting integrator on the project.
Objectives
Plans to grow the customer base, together with increased requirements for the speed of processing accumulated information, made it necessary to apply specialized Big Data tools and adapt the bank's analytical infrastructure to real-time operation.
Implementation
The first stage of the chosen strategy was the migration of the bank's corporate data warehouse to the distributed analytical EMC Greenplum platform. The main criteria driving the bank's choice were: the highest speed of data loading and processing, scalability of the solution, support for polymorphic data storage, availability of compression technology, and deep integration with the SAS Institute analytical products already used by the bank.
The main project started in December 2011 and was completed in six months using an iterative development methodology.
During the project, the bank's existing infrastructure, a conventional DBMS running on heavy SPARC-architecture servers, was replaced with EMC Greenplum, built on a shared-nothing massively parallel processing (MPP) architecture. In this architecture, designed specifically for business intelligence and analytical processing, each node functions as a self-sufficient DBMS that owns and manages its own portion of the overall data. The system automatically distributes data and parallelizes query workloads across all available hardware, using MapReduce principles.
As a result, the time to solve analytical tasks was reduced at least tenfold, and for some tasks by more than 100 times. Using commodity-architecture servers as system nodes delivered cost-effective, practically unlimited linear scalability of computing power.
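The MPP idea described above can be sketched in miniature: rows are spread across segments by hashing a distribution key, each segment aggregates only its own slice in parallel, and the partial results are combined. This toy sketch is illustrative; Greenplum uses its own hash function and query executor.

```python
# Toy model of shared-nothing MPP: distribute by hash, aggregate per segment.
N_SEGMENTS = 4

def segment_for(key):
    # Illustrative stand-in for Greenplum's distribution hash.
    return hash(key) % N_SEGMENTS

rows = [("acct-%d" % i, i * 10.0) for i in range(100)]

# "Map": each segment holds and scans only its own slice of the data.
segments = {s: [] for s in range(N_SEGMENTS)}
for key, amount in rows:
    segments[segment_for(key)].append(amount)

# "Reduce": combine the per-segment partial sums into the final answer.
partial_sums = [sum(part) for part in segments.values()]
total = sum(partial_sums)
print(total)   # same result as summing all rows centrally
```

The point of the pattern is that each segment's work is independent, so adding nodes scales throughput roughly linearly, matching the linear scalability claimed for the deployment.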
Beyond deploying the new storage infrastructure, the complexity of the project lay in the need to integrate the new approaches to loading and extracting data with the SAS Institute analytical systems used by the bank, while preserving the integrity and operation of the bank's existing business processes. To accomplish this, a project team was assembled from consultants of Glowbyte Consulting, which took on the role of system integrator; engineers of World IT Systems, who were responsible for setting up and operating the working environments; and specialists from the bank's IT department. Design supervision and audit of project solutions were performed by representatives of EMC/Greenplum and SAS Institute. The project curators were Luke Lonergan, CTO and founder of EMC Greenplum, and Mark Torr, Director of SAS Global Technology Practice.
Customization
During the project, more than 350 loading and data-transfer processes underwent comprehensive reengineering; a library of ELT transformations optimized for Greenplum and SAS was developed; infrastructure was created giving the bank's business analysts direct access to the detailed data layer for data mining research without drawing on internal IT resources; and backup and disaster recovery procedures were developed.
As a result, the bank launched a platform ready to load data into the warehouse and refresh analytical data marts in real time, a priority for a company that treats knowledge of its clients as a competitive advantage. Going forward, the bank plans to develop not only Greenplum, adapted for massively parallel computing, but also Hadoop, intended for processing unstructured data, and the Chorus platform, which enables collaborative work with corporate data gathered from different sources.
Vyacheslav Tsyganov, vice president and CIO of Tinkoff Credit Systems Bank, noted: "The value of the completed project for the bank, despite its clearly technological focus, lies in developing the bank's culture of making decisions based on data analysis. The ability to turn accumulated data into knowledge has long been a hallmark of the bank's competitiveness, and data is a strategic asset and potential for future growth. In the near future clients will demand banks that better understand their behavior and habits and match them as closely as possible. We are convinced that the successfully launched EMC Greenplum analytical data platform, thanks to its unique scalability and performance, will increase not only the speed of decision making but also the value and relevance of our knowledge of our clients."
"We are glad that one of Russia's most technologically advanced banks chose the EMC Greenplum solution. The bank's specialists can now expand the data warehouse without compromising performance, including connecting to new Big Data sources such as social networks. The required storage resource is allocated automatically and released once a specific task is completed. In effect, after implementing the EMC storage platform, analytical processing of large volumes of data, structured or unstructured, has become an ordinary service of the bank's corporate information system. The bank has taken a major step toward an online data warehouse, creating infrastructure whose operation does not depend on data types, data volume, or even the rate of data growth," said Cobi Lif, Sales Director for Europe, the Middle East and Africa at Greenplum, a division of EMC.