2018/01/22 15:01:59

How to combat a dangerous disease of the digital era: compulsive data hoarding?

The ability to analyze large volumes of data, popularly called Big Data, is perceived as an unambiguous benefit. But is that really the case? What can unrestrained data accumulation lead to? Most likely to what Russian psychologists, speaking of individuals, call pathological hoarding, syllogomania or, figuratively, "Plyushkin's syndrome". In English, the compulsive urge to collect everything is called hoarding (from "hoard", a stockpile). Classifications of mental illness list hoarding as a mental disorder. In the digital era, traditional material hoarding has been joined by digital hoarding, which can affect both individuals and entire enterprises and organizations.

Data is easier to hoard than physical things, and the clutter is less visible to others. Still, the state of one's own PC or phone is a personal matter. What is worse, in the Big Data era many enterprises, if not most, suffer from digital hoarding. They are guided by the idea that anything that might be useful someday can be stored indiscriminately, since the cost of storage keeps falling. Meanwhile, the damage from storing redundant data is measured in enormous sums. It stems not only from the obvious costs of energy, maintenance and occupied space but, more importantly, from the difficulty of analyzing excessively large volumes of data.

The general advice for avoiding the undesirable consequences of digital hoarding is to follow a few simple recommendations:

  • Get rid of the superfluous - if data has not been used for several years, it is most likely not needed.
  • Do not create isolated data dumps - what in English is called a silo.
  • Do not accumulate raw or defective data (incomplete or containing errors).
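
The three recommendations above can be pictured as a simple triage pass over a data catalog. The following is a minimal Python sketch, not a prescribed method: the `Record` fields, the three-year threshold, the use of a missing owner as a stand-in for a silo, and the function name `triage` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical threshold: "several years" is taken to be three here.
STALE_AFTER = timedelta(days=3 * 365)


@dataclass
class Record:
    """Illustrative catalog entry; the field names are assumptions."""
    name: str
    last_used: date
    owner: Optional[str]  # None models an isolated dump nobody is responsible for
    complete: bool        # False models incomplete or erroneous data


def triage(records: list[Record], today: date) -> dict[str, list[str]]:
    """Sort record names into 'keep' and the three problem categories."""
    result: dict[str, list[str]] = {
        "stale": [], "siloed": [], "defective": [], "keep": [],
    }
    for r in records:
        if today - r.last_used > STALE_AFTER:
            result["stale"].append(r.name)
        elif r.owner is None:
            result["siloed"].append(r.name)
        elif not r.complete:
            result["defective"].append(r.name)
        else:
            result["keep"].append(r.name)
    return result


# Made-up sample catalog, purely for demonstration.
catalog = [
    Record("sales_2012.csv", date(2012, 12, 31), "finance", True),
    Record("tmp_export.bin", date(2017, 11, 1), None, True),
    Record("crm_dump.json", date(2017, 12, 15), "marketing", False),
    Record("orders.parquet", date(2018, 1, 10), "sales", True),
]
report = triage(catalog, today=date(2018, 1, 22))
print(report)
```

In practice the criteria would come from the organization's own retention and data governance policies rather than from fixed constants like these.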

Structured data, including annotated or labeled data - fixed placement, predetermined content, strict formats. Semi-structured data - variable placement, predetermined content, variable formats, tabular data. Unstructured data - arbitrary placement, variable content, multipage documents.

The problem of digital hoarding is tied to the rapid growth of data volumes. Over the ten years from 2010 to 2020, the volume of stored data is expected to grow 50-fold, and more than 90% of it will be corporate data.
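
For a sense of scale, a 50-fold increase over ten years corresponds to compound growth of roughly 48% per year. The short calculation below is only an arithmetic illustration; it assumes a constant annual growth rate, which the forecast itself does not claim.

```python
# Assumption: growth is compounded evenly over the ten years.
total_growth = 50.0  # 50-fold increase, from the text
years = 10           # 2010 to 2020

annual_factor = total_growth ** (1 / years)
annual_percent = (annual_factor - 1) * 100

print(f"annual factor: {annual_factor:.3f}")
print(f"annual growth: {annual_percent:.1f}%")
```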

Earlier, when DBMSs held a monopoly on data storage, the problem of digital hoarding could not arise by definition. Structured data, which is usually stored in relational databases, is ordered by nature. In the 21st century, unstructured data has become the main source of digital hoarding.

The idea of creating a computer database in its modern sense was proposed in the late 1950s at SDC, which is still regarded as the first software company. The company was government-funded; it developed the software for the control complex of SAGE (Semi-Automatic Ground Environment), an airspace-monitoring project well known in computing history, whose acronym also reads as the English word "sage", a wise man.

Since then, DBMSs have come a long way, not without oddities, and the creation of relational database theory holds particular importance along that way. One oddity is that for many years the ideological leaders of the movement were purist mathematicians, for whom the mathematical rigor of relational databases, the CAP theorem and similar matters were more important than the real nature of data and the needs of users. While the use of computers was limited to transactional systems, there were isolated attempts to break the RDBMS monopoly, such as post-relational DBMSs, but they never became widespread. At the beginning of the second decade of the 21st century, with the advent of clouds and Big Data, the situation changed: alternative DBMSs such as NoSQL and NewSQL came to the fore.

A natural question arises: what should replace the DBMS under the new conditions? Big Data platforms (BDP, Big Data Platform) may be the answer. This type of platform should not be confused with the similar-sounding customer data platforms (CDP, Customer Data Platform) used in marketing, which build databases for CRM (Customer Relationship Management).

From a structural point of view (leaving the mathematics aside), a DBMS is considerably simpler than a BDP: it is a homogeneous tabular store built on relational principles, with data accessed through SQL. The Big Data environment is heterogeneous; its approximate structure is shown in the figure below.

Big Data environment

BDPs give the enterprise integration of all these functions and a complete view of its data, much as DBMSs once did, but with considerably greater capabilities for adapting to changes in the business, scaling and working in a cloud environment.

A BDP is understood as a type of solution that integrates applications and tools for processing large volumes of data. A BDP usually consists of data warehouses, databases, servers, data management tools and analytics tools, BI in particular. Analytical software (Big Data Analytics Software) is an organic component of a BDP. The advantage of the platform approach lies in reducing system complexity. The system can be deployed on the customer's own infrastructure (on-premise) or in a cloud.

BDPs are designed to solve the following tasks:

  • Turning data into a full-fledged corporate resource.
  • Data collection and storage (Data Ingestion), data management (Data Management), ETL (Extract, Transform, Load) and data warehouse (Data Warehouse) support.
  • Support for the Hadoop software framework for working with Big Data using open source and cloud computing.
  • Stream processing (Stream Computing), a high-performance solution for receiving data from different sources in real time, preprocessing it and merging it into a single stream.
  • Analytics with machine learning.
  • Content management.
  • Integration of data from all possible sources.
  • Organizational management of corporate data (Data Governance).
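
The stream-processing item in the list above (receive data from several sources, preprocess it, merge it into one stream) can be illustrated with a small sketch that uses only the Python standard library. It does not describe how any particular BDP implements streaming: the `Event` format, the source names and the `preprocess` step are assumptions, and each source is assumed to deliver events already ordered by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: int  # seconds since some epoch (illustrative)
    source: str
    payload: str


def preprocess(events: Iterable[Event]) -> Iterator[Event]:
    """Hypothetical cleaning step: drop empty payloads, normalize the rest."""
    for e in events:
        text = e.payload.strip().lower()
        if text:
            yield e._replace(payload=text)


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge time-ordered streams into one time-ordered stream."""
    cleaned = (preprocess(s) for s in streams)
    return heapq.merge(*cleaned, key=lambda e: e.timestamp)


# Made-up sample input from two imaginary sources.
sensors = [Event(1, "sensor", " TEMP=21 "), Event(4, "sensor", "temp=22")]
clicks = [Event(2, "web", "Click:/home"), Event(3, "web", "   ")]

merged = list(merge_streams(sensors, clicks))
for event in merged:
    print(event)
```

Because `heapq.merge` consumes its inputs lazily, the same function would work with generators that yield events as they arrive, as long as each individual source stays ordered by time.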

The BDP market is developing: as of 2017, more than 100 different platforms were offered to consumers. An overview of the 50 most popular can be found in "Top Big Data Platforms and analytical software"[1].

The Converged Data Platform from MapR is considered a classic BDP. It is called converged because it brings together all the components of a BDP, whereas many products in this class are specialized for particular applications.

MapR Converged Data Platform integrates Hadoop, Spark and Apache Drill with real-time databases, global event streams and scalable next-generation data warehouses (DWH). At the same time, MapR provides enterprise-grade security and, through integration, reduces operating costs and hardware investment. Using BDPs makes it possible to derive real benefit from Big Data and minimizes the problem of digital hoarding.

Notes