
Big Data

The category of Big Data covers information that can no longer be processed by traditional methods, including structured data, media, and random objects. Some experts believe that working with such data has driven the replacement of traditional monolithic systems by new massively parallel solutions.


What is Big Data?

The simplest definition

As the name suggests, the term "Big Data" refers simply to the management and analysis of large volumes of data. According to the McKinsey Institute report "Big data: The next frontier for innovation, competition and productivity", the term applies to data sets whose size exceeds the ability of typical databases (DBs) to capture, store, manage, and analyze information. And the world's data repositories certainly continue to grow. The IDC "Digital Universe Study" report, published in mid-2011 and sponsored by EMC, predicted that the total volume of data created and replicated worldwide in 2011 would reach about 1.8 zettabytes (1.8 trillion gigabytes), roughly 9 times more than was created in 2006.

IDC; Gartner's Nexus of Forces
Traditional databases vs. Big Data stores

A more complex definition

Nevertheless, "Big Data" involves more than just the analysis of huge volumes of information. The problem is not that organizations create huge amounts of data, but that most of it arrives in formats that fit poorly into the traditional structured database model: web logs, videos, text documents, machine code, or, for example, geospatial data. All this is kept in many different stores, sometimes even outside the organization. As a result, a corporation may have access to a huge volume of data yet lack the tools needed to establish relationships within it and draw meaningful conclusions from it. Add the fact that data is now updated more and more frequently, and you get a situation in which traditional methods of analysis cannot keep up with huge volumes of constantly refreshed data, which in turn opens the road to Big Data technologies.

The best definition

In essence, the concept of Big Data means working with information of huge volume and diverse structure, very frequently updated and spread across different sources, with the aim of increasing efficiency, creating new products, and improving competitiveness. The consulting firm Forrester puts it briefly: "Big data brings together techniques and technologies that extract meaning from data at the extreme limit of practicality."

How big is the difference between business intelligence and Big Data?

Craig Baty, Executive General Manager of Marketing and CTO of Fujitsu Australia, has pointed out that business analysis is a descriptive process of examining the results a business achieved over a certain period, whereas the processing speed of Big Data makes the analysis predictive, capable of offering the business recommendations for the future. Big Data technologies also allow more data types to be analyzed than business intelligence tools do, which makes it possible to look beyond structured stores.

Matt Slocum of O'Reilly Radar believes that although Big Data and business intelligence have the same goal (finding answers to a question), they differ in three respects.

  • Big Data is designed to handle larger volumes of information than business intelligence, which of course matches the traditional definition of Big Data.
  • Big Data is designed to handle data that arrives and changes faster, which means deep exploration and interactivity. In some cases, results are generated faster than a web page loads.
  • Big Data is designed to handle unstructured data whose uses we are only beginning to explore now that we can collect and store it, and we need algorithms and interactive capabilities to make it easier to find the trends hidden in these arrays.

According to the white paper "Oracle Information Architecture: An Architect's Guide to Big Data" published by Oracle, when working with Big Data we approach information differently than when performing business analysis.

Working with Big Data is unlike the normal business intelligence process, where the simple addition of known values produces a result: for example, adding up data on paid invoices yields annual sales. With Big Data, the result emerges in the course of cleaning the data through successive modeling: first a hypothesis is stated, then a statistical, visual, or semantic model is built, the hypothesis is tested against it, and the next one is put forward. This process requires the researcher either to interpret visual patterns, or to compose interactive queries based on domain knowledge, or to develop adaptive "machine learning" algorithms capable of producing the required result. And the lifetime of such an algorithm can be quite short.
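The hypothesis-model-check loop described above can be sketched in a few lines of Python; the candidate models and data points here are purely illustrative:

```python
def iterative_modeling(data, hypotheses):
    """Sketch of successive modeling: propose a hypothesis (here, a candidate
    model function), measure how well it fits, keep the best, move on.
    `hypotheses` maps a name to a prediction function - all illustrative."""
    best_name, best_err = None, float("inf")
    for name, model in hypotheses.items():
        err = sum((model(x) - y) ** 2 for x, y in data)  # squared error
        if err < best_err:
            best_name, best_err = name, err
    return best_name

data = [(1, 2.1), (2, 4.0), (3, 6.2)]
print(iterative_modeling(data, {"linear": lambda x: 2 * x,
                                "constant": lambda x: 4.0}))  # "linear"
```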

Big Data ≠ Data Science

Big Data is:

  • Technologies for storing large volumes of structured and unstructured data
  • Technologies for processing such data
  • Data quality management
  • Technologies for delivering data to the consumer

Data Science is:

Techniques for analyzing Big Data

There are many different techniques for analyzing data arrays, built on tools borrowed from statistics and computer science (for example, machine learning). The list below does not claim to be complete, but it reflects the approaches most in demand across industries. It should also be understood that researchers continue to create new techniques and improve existing ones. Moreover, some of the techniques listed are by no means applicable only to Big Data and can be used successfully on smaller arrays (for example, A/B testing or regression analysis). Naturally, the larger and more diverse the array being analyzed, the more accurate and relevant the results obtained.

A/B testing. A technique in which a control sample is compared in turn with others. It makes it possible to identify the optimal combination of indicators for achieving, for example, the best consumer response to a marketing offer. Big Data makes it possible to run a huge number of iterations and thus obtain a statistically reliable result.
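A minimal sketch of such a test, using a standard two-proportion z-test (the conversion counts below are invented for illustration):

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from control A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(round(z, 2), p < 0.05)  # 2.83 True
```

With large samples (exactly what Big Data provides), even small differences in conversion become statistically detectable.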

Association rule learning. A set of techniques for identifying relationships, i.e. association rules, between variables in large data sets. Used in data mining.
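For illustration, support and confidence, the two basic metrics behind association rules, can be computed over a toy set of market baskets (all data invented):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent.
    Each transaction is a set of items."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = both / n                       # how common the pair is overall
    confidence = both / a if a else 0.0      # how often antecedent implies consequent
    return support, confidence

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"milk", "butter"}, {"bread", "milk", "butter"}]
s, c = rule_metrics(baskets, {"bread"}, {"milk"})
print(s, c)  # bread+milk in 2 of 4 baskets; bread in 3
```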

Classification. A set of techniques that make it possible to predict consumer behavior in a specific market segment (purchase decisions, churn, consumption volume, and so on). Used in data mining.

Cluster analysis. A statistical technique for classifying objects into groups by discovering common characteristics that are not known in advance. Used in data mining.
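A minimal k-means sketch, one common clustering algorithm, illustrates grouping points by structure discovered from the data itself (the points are invented):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to the nearest centroid,
    then move each centroid to the mean of its cluster."""
    rnd = random.Random(seed)
    centroids = rnd.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            clusters[i].append(p)
        centroids = [tuple(sum(dim) / len(dim) for dim in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts, 2)))  # two centroids, one near each cluster
```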

Crowdsourcing. A technique for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques that makes it possible, for example, to analyze comments by social network users and compare them with sales results in real time.

Data mining. A set of techniques that makes it possible to identify the categories of consumers most receptive to the product or service being promoted, reveal the characteristics of the most successful employees, and predict consumers' behavioral patterns.

Ensemble learning. A method that engages a whole set of predictive models, thereby improving the quality of the forecasts made.
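A toy majority-vote ensemble shows the principle; the three rule-of-thumb "models" for a churn question are invented for illustration:

```python
from collections import Counter

def majority_vote(models, x):
    """Combine several weak predictors: each model votes,
    and the most common answer wins."""
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical rules of thumb for "will this customer churn?"
models = [lambda x: x["support_calls"] > 3,
          lambda x: x["tenure_months"] < 6,
          lambda x: x["monthly_spend"] < 20]
customer = {"support_calls": 5, "tenure_months": 24, "monthly_spend": 10}
print(majority_vote(models, customer))  # True (2 of 3 models vote yes)
```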

Genetic algorithms. In this technique, candidate solutions are represented as "chromosomes" that can combine and mutate. As in natural evolution, the fittest individual survives.
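A toy genetic algorithm on the classic OneMax problem (fitness = number of 1 bits) illustrates the chromosome, crossover, and mutation vocabulary; all parameters are arbitrary:

```python
import random

def onemax_ga(length=20, pop_size=30, generations=60, seed=1):
    """Toy genetic algorithm: chromosomes are bit lists, fitness is the
    number of 1s; the fittest individuals are crossed over and mutated."""
    rnd = random.Random(seed)
    pop = [[rnd.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)            # survival of the fittest
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rnd.sample(parents, 2)
            cut = rnd.randrange(1, length)         # one-point crossover
            child = a[:cut] + b[cut:]
            if rnd.random() < 0.1:                 # occasional mutation
                i = rnd.randrange(length)
                child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=sum)

best = onemax_ga()
print(sum(best))  # far above the random-start average of ~10
```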

Machine learning. A field of computer science (historically known under the name "artificial intelligence") that aims to create self-learning algorithms based on the analysis of empirical data.

Natural language processing (NLP). A set of techniques, borrowed from computer science and linguistics, for recognizing human natural language.

Network analysis. A set of techniques for analyzing links between nodes in networks. Applied to social networks, it makes it possible to analyze the relationships between individual users, companies, communities, etc.

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more metrics. It assists in making strategic decisions, for example on the composition of the product line brought to market, on investment analysis, and so on.

Pattern recognition. A set of techniques with self-learning elements for predicting consumers' behavioral patterns.

Predictive modeling. A set of techniques for building a mathematical model of a predefined likely scenario of events. An example is analyzing a CRM system's database for conditions that might push subscribers to switch provider.

Regression. A set of statistical techniques for discovering patterns between changes in a dependent variable and one or more independent variables. Often applied to forecasting and prediction. Used in data mining.
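A minimal ordinary-least-squares fit for one independent variable shows the mechanics (the numbers are invented):

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a*x + b (one independent variable)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var                 # slope
    b = my - a * mx               # intercept
    return a, b

# e.g. monthly ad spend vs. sales (illustrative values on a perfect line)
a, b = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 2.0 1.0
```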

Sentiment analysis. Techniques for assessing consumer sentiment are based on technologies for recognizing human natural language. They make it possible to isolate, from the general information flow, messages connected with a subject of interest (for example, a consumer product), and then to assess the polarity of the judgment (positive or negative), its degree of emotionality, and so on.
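A deliberately crude lexicon-based sketch illustrates the polarity-scoring idea (real sentiment systems use far richer NLP models; the word lists here are invented):

```python
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "poor"}

def polarity(text):
    """Crude sentiment score: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(polarity("great phone and I love it"))      # 2
print(polarity("awful battery and poor screen"))  # -2
```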

Signal processing. A set of techniques, borrowed from radio engineering, aimed at recognizing a signal against background noise and analyzing it further.

Spatial analysis. A set of techniques, partly borrowed from statistics, for analyzing spatial data: terrain topology, geographic coordinates, object geometry. Geographic information systems (GIS) often serve as the source of Big Data in this case.

Statistics. The science of collecting, organizing, and interpreting data, including the design of questionnaires and experiments. Statistical techniques are often applied to form judgments about the relationships between particular events.

Supervised learning. A set of techniques based on machine learning that make it possible to identify functional relationships in the data arrays being analyzed.

Simulation. Modeling the behavior of complex systems, often used for forecasting, prediction, and exploring different scenarios in planning.
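A classic toy simulation, Monte Carlo estimation of pi, shows the principle of estimating a quantity by modeling random behavior:

```python
import random

def estimate_pi(samples=100_000, seed=7):
    """Monte Carlo simulation: the share of random points in the unit square
    that land inside the quarter circle approximates pi/4."""
    rnd = random.Random(seed)
    inside = sum(rnd.random() ** 2 + rnd.random() ** 2 <= 1.0
                 for _ in range(samples))
    return 4 * inside / samples

print(estimate_pi())  # close to 3.14
```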

Time series analysis. A set of methods, borrowed from statistics and digital signal processing, for analyzing sequences of data points ordered in time. Obvious applications include tracking the securities market or patient incidence rates.
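One of the simplest time-series tools, a moving average, smooths a noisy sequence (the weekly incidence counts below are invented):

```python
def moving_average(series, window):
    """Simple smoothing: the mean of each `window`-length slice,
    a basic building block of time-series analysis."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# e.g. weekly patient-incidence counts with one noisy spike
print(moving_average([10, 12, 30, 11, 13], 3))
```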

Unsupervised learning. A set of techniques based on machine learning that make it possible to reveal hidden functional relationships in the data arrays being analyzed. It shares features with cluster analysis.

Visualization. Methods for graphically presenting the results of Big Data analysis in the form of charts or animated images, to simplify interpretation and make the results easier to understand.

Main article: Data visualization

A visual presentation of the results of Big Data analysis is fundamental to their interpretation. It is no secret that human perception is limited, and scientists continue to research ways of improving modern methods of presenting data in the form of images, charts, or animation.

Analytical tools

As of 2011, some of the approaches listed in the previous subsection, or a certain combination of them, make it possible to build practical analytical engines for working with Big Data. Among free or relatively inexpensive open systems for Big Data analysis one can recommend:[1]

Of special interest in this list is Apache Hadoop, open-source software that over the last five years has been proven as a data analyzer by the majority of activity trackers[2]. As soon as Yahoo opened the Hadoop code to the open-source community, a whole new line of Hadoop-based products promptly appeared in the IT industry. Practically all modern Big Data analysis tools provide integration with Hadoop. Their developers include both startups and well-known global companies.
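Hadoop's core programming model is MapReduce; a single-process Python sketch of the classic word-count example conveys the idea (this illustrates the model only, not Hadoop's actual API):

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """map: emit a (word, 1) pair for every word in a document."""
    return [(w.lower(), 1) for w in doc.split()]

def reduce_phase(pairs):
    """shuffle + reduce: group pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big plans", "data lakes and data warehouses"]
result = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(result["data"], result["big"])  # 3 2
```

In a real cluster, the map and reduce phases run in parallel across many nodes, with the framework handling the shuffle between them.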

Markets for Big Data management solutions

Big Data Platforms (BDP) as a means of combating digital hoarding

The ability to analyze large volumes of data, popularly called Big Data, is perceived as an unambiguous benefit. But is that really the case? What can unrestrained data accumulation lead to? Most likely to what domestic psychologists, with reference to people, call pathological hoarding, syllogomania, or, figuratively, "Plyushkin's syndrome". In English, the compulsive urge to collect everything is called hoarding (from the English "hoard", a stockpile). In the classification of mental illnesses, hoarding is ranked as a mental disorder. In the digital era, digital hoarding has been added to traditional material hoarding; both individuals and entire enterprises and organizations can suffer from it[2] (in more detail).

The global and Russian markets

Big Data Landscape: the principal vendors

Interest in tools for collecting, processing, managing, and analyzing Big Data has been shown by nearly all leading IT companies, which is quite natural. First, they encounter this phenomenon directly in their own business; second, Big Data opens up excellent opportunities for capturing new market niches and attracting new customers.

Many startups have appeared in the market that build their business on processing huge data arrays. Some of them use ready-made cloud infrastructure provided by large players such as Amazon.

The theory and practice of Big Data in the industries

Main article: The theory and practice of Big Data in the industries

How to use analytical data to develop quality IT services

Main article: Using analytical data to develop IT services

Development History


TmaxSoft forecast: the next "wave" of Big Data will require DBMS modernization

According to an IDC report, with the growth in the volumes of data generated by Internet-connected devices, sensors, and other technologies, Big Data-related revenue will increase from $130 billion in 2016 to more than $203 billion by 2020.[3] However, companies that lack the IT infrastructure needed to adapt to the Big Data revolution will not be able to benefit from this growth, experts at TmaxSoft believe.

Enterprises know that the huge volumes of data they have accumulated contain important information about their business and customers. If a company can apply this information successfully, it will have a powerful advantage over its competitors and will be able to offer better products and services than theirs. However, many organizations still cannot use Big Data effectively because their legacy IT infrastructure cannot provide the necessary storage capacity, data exchange processes, utilities, and applications needed to process and analyze large arrays of unstructured data and extract valuable information from them, TmaxSoft noted.

In addition, the increased processor power needed to analyze ever-growing volumes of data can require considerable investment in an organization's outdated IT infrastructure, as well as additional maintenance resources that could otherwise be used to develop new applications and services.

According to Andrey Reva, managing director of TmaxSoft Russia, these factors will mean that organizations which continue to use legacy infrastructure will later be forced to pay far more for the transition to current technologies, or will fail to gain any benefit at all from the Big Data revolution.

"The phenomenon of Big Data has forced many enterprises to recognize the need to collect, analyze, and store structured and unstructured data. But implementing these processes requires an action plan and the right process-optimization tools. And many companies really are unable to gain a notable effect from Big Data because they use legacy DBMSs that lack the necessary functionality and scalability, so the Big Data revolution does not help their business in any way," Andrey Reva explained his forecast.

In the view of the TmaxSoft representative, enterprises need a strategy that takes into account, among other things, the data sources for extraction, the data lifecycle, compatibility between different relational DBMSs, and storage scalability.


EMC forecast: Big Data and real-time analytics will converge

In 2016 we will see a new chapter in the history of "Big Data" analytics as a two-tier processing model develops. The first tier will be "traditional" Big Data analytics, in which large volumes of data are analyzed not in real time. The new second tier will make it possible to analyze rather large volumes of data in real time, mainly through in-memory analytics technologies. In this new phase of Big Data development, technologies such as DSSD, Apache Spark, and GemFire will be as important as Hadoop. The second tier will offer both new and familiar ways of using "data lakes": "analytics on the fly" to influence events as they occur. This opens up business opportunities on a scale nobody has seen before.

But for in-memory analytics to become a reality, two things must happen. First, the supporting technologies must develop far enough to provide sufficient amounts of memory for truly large-scale data sets. It is also necessary to think about how to move data efficiently between large object stores and the systems performing in-memory analysis. These two elements operate in fundamentally different modes, and IT groups will need to create special conditions so that data can move back and forth at the necessary speed, transparently to users. Work is already under way: new object stores are appearing, special rack-mounted flash arrays, and special technologies that can unite them into a single system. Open-source initiatives will play an important role in answering this challenge.

Second, large-scale in-memory computing environments require data stability and dynamism. The problem is that by making data persistent in memory, we also make any of its defects persistent. As a result, in 2016 we will see the emergence of storage systems for environments that process data in memory. They will provide deduplication, snapshots, tiered storage, caching, and replication, as well as the ability to determine the last point at which the data was correct and the system worked properly. These functions will be extremely important in the transition to real-time analytics, as safer in-memory data processing technologies become commercially available in 2016.


Gartner excludes Big Data from its list of popular trends

On October 6, 2015 it became known that data on Big Data had been excluded from Gartner's report "Hype Cycle for Emerging Technologies 2015". Researchers explained this by the blurring of the term: the technologies that make up the concept of "Big Data" have become an everyday reality of business[4].

Gartner's Hype Cycle for Emerging Technologies report stirred the industry by omitting the technology of collecting and processing large data volumes. The company's analysts explained the decision by the fact that the concept of "Big Data" covers a large number of technologies actively used in enterprises; they partly belong to other popular fields and trends and have become everyday working tools.

Gartner's "Hype Cycle for Emerging Technologies 2015" chart

"Initially the concept of Big Data was defined through the three Vs: volume, velocity, variety. The term denoted a group of technologies for storing, processing, and analyzing data of large volume, variable structure, and a high update rate. But reality has shown that business projects extract value by the same principles as before. And the technological solutions described did not in themselves create any new value; they only accelerated the processing of large amounts of data. Expectations were very high, and the list of Big Data technologies grew intensively. Obviously, as a result, the boundaries of the concept were blurred to the limit," said Svyatoslav Shtumpf, chief expert in the product marketing group at Peter-Service.

Dmitry Shepelyavy, deputy CEO of SAP CIS, believes the subject of Big Data has not disappeared but has been transformed into a set of different scenarios:

"Examples here include condition-based maintenance, precision farming, fraud-prevention systems, and systems in medicine that allow patients to be diagnosed and treated at a qualitatively new level. Also real-time logistics and transportation planning, and advanced business intelligence supporting and maintaining companies' basic functions. One of the main trends now is the Internet of Things, which allows machines to connect to each other (machine-to-machine). The installed electronic sensors produce millions of transactions per second, and a reliable solution capable of transforming, saving, and working with them in real time is needed."

In May 2015 Andrew White, research vice president at Gartner, reflected in his blog:

"The Internet of Things (IoT) will overshadow Big Data as too narrowly focused a technology. Big Data may generate a few more effective solutions and tools, but it is the Internet of Things that will become the platform of the future, which in the long term will increase our productivity."

Similar ideas, based on Gartner's 2014 report, had been published earlier by Forbes contributor Gil Press.

According to Dmitry Shepelyavy, an era has arrived in which it is important not simply to be able to accumulate information, but to extract business value from it. The industries that work directly with the consumer came to this conclusion first: telecommunications, banking, retail. Now interaction processes are reaching a new level, making it possible to establish relationships between different devices using augmented-reality tools and opening up new opportunities for companies to optimize business processes.

"The concept of Big Data has lost its interest for real business; on the Gartner chart its place has been taken by other technologies that sound more precise and comprehensible to business," Svyatoslav Shtumpf emphasized.

This is, first of all, machine learning: a means of finding rules and relationships in very large volumes of information. Such technologies make it possible not only to test hypotheses but to search for previously unknown influencing factors. There is also the segment of solutions for data storage and parallel access (NoSQL databases), for preprocessing information streams (marshalling), and for visualization and independent analysis (advanced analytics with self-service delivery). In addition, according to the expert, intelligent data analysis tools (business intelligence and data mining), reaching a new technological level, retain their value.

In the understanding of Yandex, according to a statement by the company's press service, Big Data has neither disappeared nor been transformed. To process large volumes of data, the company uses the same technologies and algorithms it applies in Internet search, in the Yandex.Traffic service, in machine translation, in its recommendation platform, and in advertising. The algorithms rest on the company's ability to accumulate, store, and process large volumes of data and make them useful to business. The fields of application of Yandex Data Factory are not limited; the main thing is that there is data to analyze. The company's focus as of October 6, 2015:

More data does not mean better

Big Data and price discrimination

Below are selected excerpts from an article by Morgan Kennedy, published on February 6, 2015 on the InsidePrivacy website and devoted to the problem of protecting personal privacy[5][6].

On February 5, 2015 the White House published a report discussing how companies use "Big Data" to set different prices for different buyers, a practice known as "price discrimination" or "differentiated (personalized) pricing". The report describes the benefits of Big Data for both sellers and buyers, and its authors conclude that many of the problematic issues arising from Big Data and differentiated pricing can be resolved within the existing anti-discrimination legislation and consumer protection laws.

The report notes that at this time there is only anecdotal evidence of how companies use Big Data in the context of individualized marketing and differentiated pricing. This evidence shows that sellers use pricing methods that fall into three categories:

  • studying the demand curve;
  • steering and differentiated pricing based on demographic data; and
  • targeted behavioral marketing (behavioral targeting) and individualized pricing.

Studying the demand curve: To clarify demand and study consumer behavior, marketers often conduct experiments in which customers are randomly assigned one of two possible price categories. "Technically, these experiments are a form of differentiated pricing, since they result in different prices for different customers, even if they are 'non-discriminatory' in the sense that all customers have the same probability of 'landing' on the higher price."
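Such a randomized price experiment can be sketched as follows; the customer IDs and prices are invented, and the point is only that each customer has the same odds of receiving either price:

```python
import random

def assign_prices(customer_ids, prices, seed=42):
    """Randomized pricing experiment: each customer is independently
    assigned one of the candidate prices - the 'non-discriminatory'
    design described above, where everyone has the same odds."""
    rnd = random.Random(seed)
    return {cid: rnd.choice(prices) for cid in customer_ids}

groups = assign_prices(range(1000), prices=(9.99, 11.99))
share_high = sum(p == 11.99 for p in groups.values()) / len(groups)
print(round(share_high, 2))  # roughly 0.5 by design
```

Comparing purchase rates between the two price groups then traces out two points on the demand curve.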

Steering: This is the practice of presenting products to consumers based on their membership of a certain demographic group. For example, a computer company's website may offer the same laptop to different types of buyers at different prices, set on the basis of the information they have provided about themselves (for example, depending on whether the user represents a government body, an academic or commercial institution, or is a private individual) or of their geographic location (for example, determined from the computer's IP address).

Targeted behavioral marketing and individualized pricing: In these cases, buyers' personal data is used for targeted advertising and individualized pricing of certain products. For example, online advertisers use data on user activity, collected by advertising networks and through third-party cookies, to target their promotional materials. On the one hand, this approach gives consumers the chance to receive advertising for goods and services of interest to them; however, it may worry those consumers who do not want certain types of their personal data (such as information about visits to websites related to medical and financial matters) collected without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of individualized pricing in the online environment. The report suggests this may be because the relevant methods are still being developed, or because companies are in no hurry to use individualized pricing (or prefer to keep quiet about it), possibly fearing a negative reaction from consumers.

The report's authors believe that "for the individual consumer, the use of Big Data is undoubtedly associated with both potential returns and risks". While recognizing that problems of transparency and discrimination arise when Big Data is used, the report argues that the existing anti-discrimination and consumer protection laws are sufficient to resolve them. However, the report also emphasizes the need for "ongoing scrutiny" where companies use confidential information in an opaque manner or in ways not covered by the existing regulatory framework.

This report continues the White House's efforts to study the use of "Big Data" and discriminatory pricing on the Internet and their consequences for American consumers. It was previously reported[7] that the White House working group on Big Data published its report on this question in May 2014. The Federal Trade Commission (FTC) also considered these issues during its September 2014 seminar on discrimination in connection with the use of Big Data[8].


Gartner dispels myths about Big Data

In an analytical note in the fall of 2014, Gartner listed a number of myths about Big Data common among IT leaders, along with their rebuttals.

  • Everyone is implementing Big Data processing systems faster than we are

Interest in Big Data technologies is at a record high: 73% of the organizations surveyed by Gartner analysts this year are already investing in related projects or planning to. But most such initiatives are still at the very earliest stages, and only 13% of respondents have already implemented such solutions. The hardest part is determining how to extract revenue from Big Data and deciding where to begin. Many organizations get stuck at the pilot stage because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about small errors in it

Some IT leaders believe that small flaws in the data do not affect the overall results of analyzing huge volumes. When there is a lot of data, each individual error indeed affects the result less, analysts note, but the errors themselves also become more numerous. Besides, most of the analyzed data is external, of unknown structure or origin, so the probability of error grows. Thus, in the world of Big Data, quality is actually much more important.

  • Big Data technologies will eliminate the need for data integration

Big Data promises the ability to process data in its original format, with the schema formed automatically on read. It is believed that this will allow information from the same sources to be analyzed using several data models. Many believe this will also enable end users to interpret any data set as they see fit. In reality, most users often need the traditional approach with a ready-made schema, where the data is formatted appropriately and there are agreements on the level of information integrity and on how it should relate to the usage scenario.

  • There is no point in using a data warehouse for complex analytics

Many administrators of information management systems believe there is no point in spending time building a data warehouse, given that complex analytical systems use new data types. In fact, many complex analytics systems use information from a data warehouse. In other cases, new data types need additional preparation for analysis in Big Data processing systems; decisions have to be made about the suitability of the data, the principles of aggregation, and the required level of quality, and such preparation can take place outside the warehouse.

  • Data warehouses will be replaced by data lakes

In reality, vendors mislead customers by positioning data lakes as a replacement for warehouses or as crucial elements of analytical infrastructure. The underlying data-lake technologies lack the maturity and the breadth of functionality inherent in warehouses, so, Gartner believes, managers responsible for data should wait until data lakes reach the same level of development.

Accenture: 92% of those who implemented Big Data systems are happy with the result

According to Accenture research (fall 2014), 60% of companies have already successfully completed at least one project connected with Big Data. The vast majority (92%) of representatives of these companies were satisfied with the result, and 89% said that Big Data had become an extremely important part of transforming their business. Among the other respondents, 36% had not considered implementing the technology, and 4% had not yet finished their projects.

More than 1,000 company executives from 19 countries took part in the Accenture research. It was based on a survey of 1,135 respondents worldwide conducted by the Economist Intelligence Unit[9].

Among the main advantages of Big Data respondents called:

  • "search of new sources of income" (56%),
  • "improvement of experience of clients" (51%),
  • "new products and services" (50%) and
  • "inflow of new clients and preserving of loyalty old" (47%).

When implementing new technologies, many companies ran into traditional problems. For 51% the stumbling block was security, for 47% the budget, for 41% a shortage of the necessary personnel, and for 35% difficulties integrating with existing systems. Nearly all of the companies surveyed (about 91%) plan to solve the staffing shortage soon and hire Big Data specialists.

Companies are optimistic about the future of Big Data technologies: 89% believe these technologies will change business as profoundly as the Internet did, and 79% of respondents noted that companies not engaged with Big Data will lose their competitive advantage.

However, respondents disagreed about what exactly should count as Big Data. 65% of respondents believe it means "large data files", 60% are sure it is "advanced analytics and analysis", and 50% that it is "data visualization tools".

Madrid spends 14.7 million euros on Big Data management

In July 2014 it became known that Madrid would use big data technologies to manage city infrastructure. The project costs 14.7 million euros, and the implemented solutions will be based on technologies for analyzing and managing Big Data. With their help the city administration will manage the work of each service provider and pay accordingly, depending on the level of service.

This concerns the administration's contractors who monitor the condition of streets, lighting, irrigation, and green spaces, clean the territory, and remove and recycle waste. In the course of the project, 300 key performance indicators for city services have been developed for specially selected inspectors, on the basis of which 1.5 thousand different checks and measurements will be performed daily. In addition, the city will begin using an innovative technology platform called Madrid iNTeligente (MiNT) - Smarter Madrid.

Read more: Why does Madrid need analytics and Big Data?


Experts: The peak of the Big Data fashion

At this time, virtually every vendor in the data-management market is developing technologies for Big Data management. This new technology trend is actively discussed by the professional community: developers, industry analysts, and potential consumers of such solutions alike.

As the DataSift company found, as of January 2013 the wave of discussion around "Big Data" had exceeded every imaginable scale. After analyzing the number of mentions of Big Data on social networks, DataSift calculated that in 2012 the term was used about 2 billion times in posts created by about 1 million different authors worldwide. That is equivalent to 260 posts per hour, and at its peak the figure reached 3,070 mentions per hour.

Discussions of Big Data on social networks are very lively. Moreover, as the pie charts above show, the peak of discussion is still building: while in the first quarter of 2012 there were more than 504 thousand mentions of the term, by the fourth quarter there were already more than 800 thousand. The principal themes of discussion around Big Data are myths and reality, usage experience, the human factor, return on investment, and new technologies. Among vendors, Apache, 10gen, IBM, HP, and Teradata were mentioned most often.

Gartner: Every second CIO is ready to spend on Big Data

After several years of experiments with Big Data technologies and the first implementations, adoption of such solutions will increase considerably in 2013, Gartner predicts in "Gartner Survey Finds 42 Percent of IT Leaders Have Invested in Big Data or Plan to Do So"[10]. The researchers polled IT leaders around the world and found that 42% of respondents had already invested in Big Data technologies or planned to make such investments within the next year (data as of March 2013).

Companies are forced to spend on Big Data processing technologies because the information landscape is changing rapidly and demands new approaches to information processing. Many companies have already realized that large masses of data are critically important, and that working with them yields benefits unavailable through traditional information sources and processing methods. In addition, the constant churning of the "Big Data" topic in the media fuels interest in the corresponding technologies.

Frank Buytendijk, a vice president at Gartner, even urged companies to temper their ardor, since some are worried about falling behind competitors in mastering Big Data.

"An opportunity should not worry, for implementation of the ideas based on technologies of "Big Data" are actually boundless", - he said.

According to Gartner forecasts, by 2015, 20% of Global 1000 companies will have a strategic focus on "information infrastructure".

Anticipating the new opportunities that "Big Data" processing technologies will bring, many organizations are already organizing the collection and storage of various kinds of information.

For educational and government organizations, as well as industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data: e-mail messages, multimedia, and other similar content. According to Gartner, the data race will be won by those who learn to handle the widest variety of information sources.

Cisco poll: Big Data will help increase IT budgets

In the spring 2013 study titled the Cisco Connected World Technology Report, conducted in 18 countries by the independent analytical company InsightExpress, 1,800 college students and an equal number of young professionals aged 18 to 30 were polled. The survey was conducted to find out how ready IT departments are to implement Big Data projects and to form a picture of the associated problems, technology shortcomings, and the strategic value of such projects.

Most companies collect, record, and analyze data. Nevertheless, the report says, many companies face a range of complex business and IT problems in connection with Big Data. For example, 60 percent of respondents acknowledge that Big Data solutions can improve decision-making and increase competitiveness, but only 28 percent said they are already getting real strategic advantages from the accumulated information.

More than half of the IT executives polled believe that Big Data projects will help increase IT budgets in their organizations, since greater demands will be placed on technology, personnel, and professional skills. More than half of respondents expect such projects to increase IT budgets in their companies as early as 2012, and 57 percent are sure that Big Data will increase their budgets within the next three years.

81 percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. Thus, the spread of cloud computing may affect the speed of adoption of Big Data solutions and the value of those solutions to business.

Companies collect and use data of the most varied types, both structured and unstructured. Here are the sources from which the survey participants (Cisco Connected World Technology Report) obtain their data:

  • 74 percent collect current data;
  • 55 percent collect historical data;
  • 48 percent take data from monitors and sensors;
  • 40 percent use real-time data and then erase it. Real-time data is used most often in India (62 percent), the USA (60 percent), and Argentina (58 percent);
  • 32 percent of respondents collect unstructured data, such as video. China leads in this area: unstructured data is collected there by 56 percent of respondents.

Nearly half (48 percent) of IT executives predict that the load on their networks will double within the next two years. (This view is especially common in China, where it is held by 68 percent of respondents, and in Germany, 60 percent.) 23 percent of respondents expect network load to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for explosive growth in network traffic volumes.

27 percent of respondents admitted that they need better IT policies and information security measures.

21 percent need broader transmission bandwidth.

Big Data opens up new opportunities for IT departments to build value and form close relationships with business units, allowing them to raise revenue and strengthen the company's financial position. Big Data projects make IT departments a strategic partner of the business units.

According to 73 percent of respondents, the IT department will become the main engine of Big Data strategy implementation. At the same time, respondents believe, other departments will also be involved in implementing this strategy. Above all, this concerns the departments of finance (named by 24 percent of respondents), research (20 percent), operations (20 percent), and engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Managing Big Data will require millions of new jobs

Worldwide IT spending will reach $3.7 trillion by 2013, which is 3.8% more than spending on information technology in 2012 (the year-end forecast is $3.6 trillion). The Big Data segment will develop at a much higher rate, says the Gartner report "Gartner Says Big Data Creates Big Jobs"[11].

By 2015, 4.4 million jobs in the field of information technology will be created to service Big Data, of which 1.9 million will be in the USA. Moreover, each such job will lead to the creation of three additional jobs outside the IT field, so that in the USA alone over the next four years, 6 million people will be working to support the information economy.

According to Gartner experts, the main problem is that the industry lacks the talent for this: both the private and the public educational systems, in the USA for example, are incapable of supplying the industry with enough qualified personnel. As a result, only one in three of the new IT jobs mentioned will be filled.

Analysts believe that companies in great need of such employees should take on the role of cultivating qualified IT personnel themselves, since those employees will be their ticket into the new information economy of the future.


The first skepticism concerning Big Data

Analysts at Ovum and Gartner suggest that for Big Data, the fashionable topic of 2012, a time of liberation from illusions may be coming.

The term "Big Data", at this time as a rule, designate constantly growing information volume, arriving in a foreground mode from social media, from networks of sensors and other sources and also the growing range of the tools used for data processing and identification on their basis of important business trends.

"Because of hype (or despite it) concerning the idea of Big Data producers in 2012 with huge hope looked at this trend" — Tony Bayer, the analyst of Ovum noted.

Baer reported that the DataSift company carried out a retrospective analysis of Big Data mentions on Twitter for 2012. By limiting the search to producers, the analysts wanted to focus on the market's perception of the idea rather than on the wide community of users. The analysts identified 2.2 million tweets from more than 981 thousand authors.

The data differed from country to country. Although it is commonly believed that the USA leads in the number of deployed platforms for working with Big Data, users from Japan, Germany, and France were often more active in the discussions.

The idea of Big Data drew so much attention that even the business press, and not just specialized publications, wrote about it widely.

Positive reviews of Big Data from producers outnumbered negative ones three to one, although in November, in connection with the purchase of the Autonomy company by HP, a surge of negativity was observed.

Much harsher times await the concept of Big Data, though having passed through them, this ideology will reach maturity.

"For supporters of Big Data there comes parting time with illusions" — Svetlana Sikular, the analyst of Gartner explained. She referred to the obligatory stage entering a classical curve of a cycle of popularity (Hype Cycle) which is used in Gartner.

Even among the clients who have achieved the greatest progress with Hadoop, many are "losing their illusions".

"They do not feel at all that they are ahead of others, and believe that success drops out another while they have not the best times. These organizations have amazing ideas, and now they are disappointed because of difficulties in development of reliable solutions" — told Sikular.

However, a source of optimism for supporters of Big Data at this time can be the fact that the next phases on the hype cycle curve, including the final stages, have very promising titles: the "slope of enlightenment" and the "plateau of productivity".

Slow storage systems constrain the development of Big Data

While the performance of modern computing systems has grown by many orders of magnitude over several decades and bears no comparison to the first personal computers of the early 1980s, the situation with storage systems is much worse. Certainly, available capacities have increased many times over (though they are still in short supply), and the cost of storing information, per bit, has fallen sharply (though complete systems are still too expensive); however, the speed of retrieving and searching for the necessary information leaves much to be desired.

If we leave aside flash drives, which are still too expensive and not entirely reliable or durable, storage technology has not advanced very far. We still have to deal with hard drives, whose platter rotation speed even in the most expensive models is limited to 15 thousand rpm. And since Big Data involves a considerable number of drives, a substantial (if not overwhelming) share of it sits on drives with a spindle speed of 7.2 thousand rpm. Rather prosaic and sad.
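The spindle-speed limit mentioned above translates directly into latency: on a random read, the head waits on average half a platter revolution. A back-of-envelope sketch (the IOPS bound here ignores seek time, so real figures are even lower):

```python
# Average rotational latency of a hard drive is half a platter revolution,
# so seek-heavy random I/O is bounded by spindle speed.
def avg_rotational_latency_ms(rpm: int) -> float:
    revolution_ms = 60_000 / rpm      # one full revolution, in milliseconds
    return revolution_ms / 2          # on average the head waits half a turn

for rpm in (7_200, 15_000):
    latency = avg_rotational_latency_ms(rpm)
    # rough upper bound on random reads per second for a single spindle
    iops_bound = 1_000 / latency
    print(f"{rpm} rpm: ~{latency:.2f} ms latency, <= ~{iops_bound:.0f} random IOPS")
```

At 7.2 thousand rpm this gives roughly 4.2 ms of rotational latency per random access, versus about 2 ms at 15 thousand rpm, which is why spindle speed matters so much for Big Data workloads scattered across many disks.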

The problem outlined here lies on the surface and is well known to companies' CIOs. However, it is not the only one[12]:

  • Technology lag.

Big Data can turn into a big headache for government agencies, or open up great opportunities, if only they manage to use it. The authors of a study with the telling name The Big Data Gap came to such conclusions in the second quarter of 2012 (from the English gap, "discrepancy": in this context, between the theoretical benefits and the real situation). According to a survey of 151 CIOs, the volumes of data stored in public institutions will grow by 1 petabyte (1024 terabytes) over the next two years. At the same time, it is becoming harder to extract benefit from the constantly growing information flows: the shortage of available storage space takes its toll, access to the necessary data is difficult, and there is a lack of computing power and skilled staff.

The technologies and applications at IT managers' disposal lag significantly behind the demands of the real tasks whose solution could extract additional value from Big Data. 60% of civilian-agency CIOs and 42% of defense-agency CIOs are so far only studying the phenomenon of Big Data and searching for possible points of application in their work. The main outcome, according to the CIOs of federal agencies, should be increased efficiency, as 59% of respondents believe; in second place is faster and more accurate decision-making (51%), and in third, the ability to build forecasts (30%).

Be that as it may, the flows of processed data continue to grow. 87% of the CIOs polled pointed to an increase in stored information volumes over the past two years, and 96% of respondents expect this trend to continue over the next two years (with an average growth of 64%). The organizations participating in the poll need three years on average to learn to take full advantage of what Big Data promises. So far, only 40% of agencies make strategic decisions based on the accumulated data, and only 28% interact with other organizations to analyze distributed data.

  • Low quality of data.

It is always harder to put a big house in order than a tiny apartment. A perfect analogy can be drawn with Big Data, where it is very important to adhere to the formula "garbage in, gold out". Unfortunately, modern data-management tools are not yet effective enough for this and quite often lead to the opposite situation ("gold in, garbage out").

  • Metadata: forewarned is forearmed.

A query that copes well with finding hundreds of rows out of a million may fail on a table of a hundred billion rows. If the data changes frequently, it is extremely important to keep a log and conduct audits. Following these simple rules yields information about the volume of data and the rate and frequency of its change, which is important for choosing a storage technique and for further work with the data.
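The logging-and-audit rule above can be illustrated with a minimal append-only journal; the key names and record shape here are hypothetical:

```python
# A minimal sketch of an audit log: every change to a record is appended
# with a timestamp and never overwritten, so both the current state and
# the full history of changes stay recoverable.
import time

audit_log = []  # append-only journal of (timestamp, key, old_value, new_value)
table = {}      # current state

def update(key, new_value):
    old_value = table.get(key)
    audit_log.append((time.time(), key, old_value, new_value))
    table[key] = new_value

update("sensor_42", 17.1)
update("sensor_42", 18.4)

print(table["sensor_42"])   # the latest value
print(len(audit_log))       # the full change history is preserved
```

A side benefit of such a journal is exactly the metadata the text mentions: counting entries per key over a time window gives the rate and frequency of change, which informs the choice of storage technique.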

  • Tell me who your friend is, and I will tell you who you are.

Correctly interpreting the trends hidden in Big Data arrays, and the relationships between them, requires well-trained staff in the most literal sense. To some extent, filters and structure recognizers can replace such people, but so far the quality of the results they produce leaves much to be desired.

  • Visualization.

The section of the article with the same name vividly illustrates the complexity and ambiguity of the approaches used to visualize Big Data. At the same time, presenting results in a form accessible to perception is sometimes of crucial importance.

  • Time is money.

Viewing data in real time implies the need for constant recalculation, which is not always acceptable. One has to compromise: resort to a retrospective method of analytics, for example one based on cubes, and accept partially outdated results.
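The cube-based compromise can be illustrated with a toy pre-aggregation over invented sales events: totals are computed per (region, day) cell on a schedule rather than recomputed from raw events on every query, so lookups are instant but may be slightly stale:

```python
# A toy "cube": pre-aggregated totals per dimension combination,
# refreshed periodically instead of recomputed per query.
from collections import defaultdict

raw_events = [
    ("eu", "2012-10-01", 120.0),
    ("eu", "2012-10-01", 80.0),
    ("us", "2012-10-01", 200.0),
    ("us", "2012-10-02", 50.0),
]

def build_cube(events):
    cube = defaultdict(float)
    for region, day, amount in events:
        cube[(region, day)] += amount   # one cell per (region, day)
    return cube

cube = build_cube(raw_events)           # rebuilt on a schedule, not per query
print(cube[("eu", "2012-10-01")])       # instant lookup: 200.0
```

Any event arriving after the last rebuild is simply invisible until the next one, which is precisely the "partially outdated results" trade-off described above.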

  • Firing a cannon at sparrows.

It is never possible to know in advance over what period of time Big Data will be of particular value and relevance, while its collection, storage, analysis, and backup all require considerable resources. The storage policy must be refined and, of course, not forgotten in practice.

Oracle: The solution to the Big Data problem lies in upgrading data centers

The results of a study by the Oracle corporation suggest that many companies are likely to be caught unawares by the "Big Data" boom.

"Fight against "Big Data", it seems, will become the biggest IT task for the companies in the next two years, – Luigi Freguia, the senior vice president for the hardware of Oracle in EMEA region considers. – By the end of this period they or will cope with it, or considerably will lag behind in business and will be far both from threats, and from opportunities of "Big Data".

The task of "mastering" of Big Data is unique, recognize in Oracle. Upgrade of the corporate data processing centers (DPC) should become the main reply of the companies to calls of big data.

To assess companies' readiness for changes in their data centers, Oracle, together with the analytical company Quocirca, spent nearly two years collecting data for the Oracle Next Generation Data Centre Index (Oracle NGD Index) study. The index assesses companies' progress in the considered use of data centers to improve IT infrastructure performance and optimize business processes.

The study consisted of two phases (cycles), and analysts noticed significant changes in all key indicators already on the threshold of the second stage. The average score on the Oracle NGD Index among survey participants from Europe and the Middle East was 5.58. The maximum score of 10.0 reflects the most thoroughly considered strategy for using data centers.

The average score (5.58) was higher than in the first cycle of the study, carried out in February 2011 (5.22). This means that companies, in response to the "Big Data" boom, are increasing investment in data center development strategies. All the countries, industries, and industry segments covered by the study raised their Oracle NGD Index in the second cycle compared with the first.

Scandinavia and the D/CH region (Germany and Switzerland) hold the leading positions in sustainable development, with a Sustainability Index of 6.57. They are followed in this rating by the Benelux countries (5.76) and then Great Britain, with an indicator of 5.4, which is already below average.

Russia, which was included in the list of countries/regions only in the second cycle of the study and did not participate in the first, has considerable potential for growth (an indicator of 4.62), analysts note.

According to the study, Russian organizations consider supporting business growth an important reason for investing in data centers. More than 60% of companies see a need for such investments today or in the near future, assuming that organizations will soon find it incredibly difficult to compete if they have not made the corresponding investments.

Worldwide, the share of respondents with their own corporate data centers fell from 60% in the first cycle of the study to 44% in the second, while the use of external data centers correspondingly rose by 16 points, to 56%.

Only 8% of respondents said they do not need new data center capacity in the near future, while 38% of respondents see a need for new capacity within the next two years. Only 6.4% of respondents reported that their organization has no sustainability plan connected with data center use. The share of data center heads who review copies of electricity bills grew from 43.2% to 52.2% over the whole period of the study.

Investments in Big Data startups

In the second decade of October 2012, three American startups at once received investment to develop applications and services for working with Big Data. These companies demonstrate by example the undying and indeed growing interest of venture capital in this segment of the IT business, as well as the need for new infrastructure for working with data, TechCrunch wrote on October 21, 2012.

Investors' interest in Big Data is explained by Gartner's positive forecast for the segment's development through 2016. According to the research, Big Data solutions will account for about 232 billion dollars within companies' IT spending.

At the same time, many companies and startups in the Big Data segment are beginning to depart from the working model of the industry's pioneers (Google, Amazon), for whom solutions for working with Big Data were only one part of their data processing centers. Such solutions have now been transformed into a separate branch of the IT market.

Big Data now encompasses both infrastructure offerings and applications, both boxed and cloud-based; it is a working tool not only for large corporations but also for medium-sized and at times even small businesses.

This market movement forces vendors to look at Big Data differently and change their approach to working with it, and it also changes the profile of the customers, who are now no longer only telecommunications or financial corporations.

India prepares for a Big Data boom

The Indian IT market is gradually beginning to slow its pace of development, and the industry has to look for new ways to maintain its usual growth dynamics, or for ways to avoid collapsing after other industries during periods of global economic crisis. Developers of software and applications are beginning to offer new ways of using the latest technologies. Thus, some Indian companies analyze consumer activity on the basis of large volumes of unstructured data (Big Data) and then offer the research results to large stores and retail chains. Reuters reported this on October 8, 2012.

Under close study come footage from surveillance cameras, purchase reports, Internet queries, and reports on purchases completed through this or that web resource.

"These data can let us know about tendency of the visitor to this or that purchase and consequently this information furnishes the clue to the conclusion of the profitable transaction for all parties", - Reutes quotes the CEO of the Bangalore company Mu Sigma Dkhiraya Radzharama (Dhiraj Rajaram), one of the largest organizations which is engaged in the analysis of Big Data.

Dhiraj Rajaram noted that the bulk of such analysis is carried out in the USA, but now, as the rapid development of the Indian IT market has begun to weaken, companies are paying ever closer attention to this promising segment.

At the same time, Indian companies working with Big Data most often use cloud computing to store and process the data and the results of their work.

The volume of data produced worldwide in 2011 is estimated, according to Dhiraj Rajaram, at about 1.8 zettabytes (1.8 billion terabytes), which is equivalent to 200 billion full-length high-definition films.

In addition to the analysis of queries and the processing of images from surveillance cameras, Dhiraj Rajaram sees huge scope for work in the sheer amount of information about users and buyers that appears on social networks. In his opinion, this relatively new segment of the IT market could soon become the driver of the entire industry.

India's National Association of Software and Services Companies (Nasscom) predicts sixfold growth in the segment of solutions for working with Big Data, to 1.2 billion dollars.

At the same time, Nasscom believes, worldwide Big Data spending will grow more than twofold: from 8.25 billion dollars now to 25 billion dollars over the next several years.


The Big Data fashion is in full bloom

In 2011 it was generally believed that modern software tools could not operate on large volumes of data within reasonable time limits. The range of volumes so designated is quite arbitrary and tends to shift upward, as computing hardware is continuously improved and becomes ever more affordable. In particular, Gartner in June 2011 considered "Big Data" in three dimensions at once: growth in volume, growth in the rate of data exchange, and increase in the variety of information[13].

At this time, the main feature of the approaches used within the Big Data concept is considered to be the ability to process an information array in its entirety in order to obtain more reliable analysis results. Previously, one had to rely on so-called representative sampling, that is, on a subset of the information, and errors with such an approach were naturally much higher. In addition, that approach required spending a certain amount of resources on preparing the data for analysis and bringing it into the required format.
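The difference between whole-array processing and representative sampling can be shown on synthetic data: the full pass gives the exact answer, while the sample estimate carries sampling error:

```python
# Synthetic illustration of full-array processing vs. a representative sample:
# the sample is far cheaper to process but its estimate is noisy.
import random

random.seed(7)
population = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]

full_mean = sum(population) / len(population)   # process the whole array
sample = random.sample(population, 1_000)       # representative subset
sample_mean = sum(sample) / len(sample)         # cheaper, but approximate

print(f"full:   {full_mean:.2f}")
print(f"sample: {sample_mean:.2f} (error {abs(sample_mean - full_mean):.2f})")
```

The sampling error shrinks only with the square root of the sample size, which is why processing the entire array, once it became technically feasible, promised noticeably more reliable results.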

According to media reports of the period, "it is hard to find an industry for which the Big Data perspective would be irrelevant". The ability to operate on large volumes of information, to analyze the relationships within them, and to make well-considered decisions, on the one hand, carries the potential for companies across verticals to increase profitability and efficiency. On the other hand, it is an excellent opportunity for additional earnings by vendors' partners: integrators and consultants.

To emphasize the benefits of developing and implementing tools for working with Big Data, the McKinsey company offered the statistics given below. They are tied mainly to the US market, but they are easy to extrapolate to other economically developed regions.

  • The potential size of the US healthcare market is $300 billion a year. Part of this huge amount goes to implementing modern IT, and it is clear that Big Data will not stand aside.

  • Using "Big Data" analysis tools in retail chains can potentially increase profitability by 60%.

  • In the USA alone, effective processing of "Big Data" requires 140-190 thousand analysts and over 1.5 million managers to manage information arrays.

  • American companies in 15 of 17 sectors of the economy hold larger volumes of data than the US Library of Congress.

Why data became big

In 2011, proponents of the Big Data concept pointed out that the sources of Big Data in the modern world are extremely varied. They can include:

  • continuously arriving data from measuring devices,
  • events from radio-frequency identifiers,
  • streams of messages from social networks,
  • meteorological data,
  • Earth remote-sensing data,
  • data streams on the location of subscribers of cellular networks,
  • audio and video recording devices.

In fact, the mass spread of the technologies listed above, together with fundamentally new models of using all kinds of devices and Internet services, served as the starting point for the penetration of Big Data into nearly every field of human activity: first of all, research, the commercial sector, and public administration.

File:1 BigData1.jpg

Growth of data volumes (left) against the background of the displacement of analog storage media (right). Source: Hilbert and López, "The world's technological capacity to store, communicate, and compute information", Science, 2011

Several indicative facts of this period:

  • In 2010 corporations of the world saved up 7 exabytes of data, on our home PCs and notebooks 6 exabytes of information are stored.
  • All music of the world can be placed on a disk worth 600 dollars.
  • In 2010 in networks of mobile operators 5 billion phones were serviced.
  • Every month in Facebook network 30 billion new sources of information are uploaded publicly.
  • Annually volumes of the stored information grow by 40% while global costs for IT grow for only 5%.
  • As of April, 2011 in Library of Congress of the USA it was stored the 235th terabyte of data.

File:2 BigData.png

Growth of the computing power of computer equipment (left) against the background of the transformation of the paradigm of working with data (right). Source: Hilbert and López, 'The world's technological capacity to store, communicate, and compute information', Science, 2011.

For example, the sensors installed on an aircraft engine generate about 10 TB of data in half an hour. Similar flows are characteristic of drilling rigs and petrochemical complexes. The Twitter short-message service alone, despite the 140-character message limit, generates a flow of 8 TB per day. If all such data are accumulated for further processing, their total volume will be measured in tens and hundreds of petabytes. Additional difficulties stem from the variability of the data: their structure and composition are subject to constant change as new services are launched, advanced sensors are installed, or new marketing campaigns are deployed.
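The figures above imply striking back-of-envelope totals. A minimal sketch of the arithmetic, using only the rates cited in the text (the continuous-operation assumption and binary petabyte prefix are simplifications for illustration):

```python
TB = 1
PB = 1024 * TB  # binary prefix, chosen here only for simplicity

# Aircraft-engine sensors: ~10 TB per half hour of operation (figure from the text)
engine_tb_per_hour = 10 / 0.5        # 20 TB per hour

# Twitter: ~8 TB per day despite the 140-character limit (figure from the text)
twitter_tb_per_year = 8 * 365        # 2920 TB per year

print(engine_tb_per_hour)            # 20.0
print(twitter_tb_per_year / PB)      # roughly 2.85 PB per year
```

Even a single such source, accumulated for a year, lands in the petabyte range, which is consistent with the "tens and hundreds of petabytes" estimate above.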

Recommendations for chief information officers

The unprecedented variety of data resulting from a huge number of transactions and interactions provides a solid fundamental base for business: refining forecasts, assessing the prospects of products and whole business lines, better cost control, performance evaluation, and so on. On the other hand, Big Data poses difficult tasks for any IT department, experts wrote in 2011. Not only are these tasks fundamentally new in character; solving them must also respect the capital and operating cost constraints imposed by the budget.

A chief information officer who intends to derive benefit from Big Data, both structured and unstructured, should be guided by the following technical considerations[14]:

  • Divide and conquer.

Data movement and data integration are both necessary, but both raise capital and operating expenses for extract, transform and load (ETL) tools. Standard relational environments such as Oracle, and analytical data warehouses such as Teradata, should therefore not be neglected.
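The ETL pattern mentioned above can be sketched in a few lines. This is a minimal illustration, not tied to any specific product: the CSV source, table name, and columns are all hypothetical, and SQLite stands in for the relational environment:

```python
import csv
import io
import sqlite3

# Hypothetical raw feed; in practice this would come from files or an API
raw_csv = "user_id,amount\n1,10.5\n2, 7.25\n2,3.0\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (user_id INTEGER, amount REAL)")

# Extract + transform: parse each record and normalize the values
rows = [(int(r["user_id"]), float(r["amount"].strip()))
        for r in csv.DictReader(io.StringIO(raw_csv))]

# Load into the relational store
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 20.75
```

Each of the three stages (extract, transform, load) is a separate cost center, which is why the text warns that both data movement and integration drive up expenses.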

  • Compression and deduplication.

Both technologies have advanced considerably; for example, multilevel compression can reduce the volume of 'raw' data by tens of times. However, it is always worth remembering which portion of the compressed data may need to be restored, and the decision to apply compression should be made from each specific situation.

  • Not all data are alike.

Depending on the situation, business intelligence requests vary over a wide range. Often a simple SQL query is enough to obtain the necessary information, but there are also deep analytical requests that require full-fledged business intelligence tools with dashboards and visualization. To avoid a sharp rise in operating expenses, the set of required proprietary technologies should be carefully balanced against open source software such as Apache Hadoop.
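The "simple SQL query" end of that range often looks like a single aggregate. A minimal sketch with an illustrative schema (the table, regions, and figures are invented for the example; SQLite stands in for the database):

```python
import sqlite3

# Hypothetical BI question: "what is the revenue per region?"
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (region TEXT, revenue REAL);
INSERT INTO orders VALUES ('EU', 100.0), ('EU', 50.0), ('US', 75.0);
""")

# One GROUP BY aggregate answers the question; no BI suite required
result = dict(conn.execute(
    "SELECT region, SUM(revenue) FROM orders GROUP BY region"))
print(result)  # {'EU': 150.0, 'US': 75.0}
```

Deep analytical requests, by contrast, combine many such aggregations with drill-down, visualization, and ad hoc exploration, which is where dedicated BI tooling earns its cost.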

  • Scaling and manageability.

Organizations have to deal with heterogeneous databases and analytical environments, so the ability to scale both horizontally and vertically is of fundamental importance. In fact, ease of horizontal scaling became one of the main reasons for Hadoop's rapid spread, especially given the possibility of processing information in parallel on clusters of ordinary servers (which does not demand highly specialized skills from employees) and the resulting savings in IT investment.
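Hadoop's horizontal scaling rests on the MapReduce model: each node independently maps over its own partition of the data, and the partial results are then merged in a reduce step. A toy single-process sketch of the idea (here the "nodes" are just list elements processed independently; the texts are invented):

```python
from collections import Counter
from functools import reduce

# Each partition could live on a different cluster node;
# adding nodes means adding partitions, which is horizontal scaling
partitions = [
    "big data big clusters",
    "data flows and data lakes",
]

# Map phase: every node counts words in its own partition, independently
def map_partition(text):
    return Counter(text.split())

mapped = [map_partition(p) for p in partitions]

# Reduce phase: partial counts from all nodes are merged into the result
totals = reduce(lambda a, b: a + b, mapped, Counter())
print(totals["data"])  # 3
```

Because the map phase touches each partition in isolation, doubling the number of ordinary servers roughly doubles throughput, which is the economic argument made above.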

Growing demand for big data administrators

46% of IT directors polled at the end of 2011 by the recruitment agency Robert Half named database administration the most in-demand specialty. Network administration was named by 41% of respondents, Windows systems administration by 36%, desktop application support by 33%, and business analysis and reporting tools by 28%.

Processing large volumes of data is becoming a serious problem for many companies, which increases demand for database management specialists, Robert Half concludes. In addition to the growth of unstructured data (for example, social network messages), demand is rising because of preparations for new regulatory requirements in Europe, including the Solvency II solvency standards for insurance companies and the Basel III capital and liquidity standards for the banking sector.

Robert Half analysts also predict a shortage of specialists in mobile and cloud computing. Their conclusion is based on the fact that 38% of the polled chief information officers named mobile technologies as a principal direction of investment, and 35% named virtualization.

2008: Emergence of the term "Big Data"

The term "Big Data" itself came into common use only at the end of the 2000s. It is among the few names with a quite reliable date of birth: September 3, 2008, when a special issue of the oldest British scientific journal, Nature, was published, devoted to the question of how the opening opportunities for working with large volumes of data could affect the future of science. The special issue summed up the previous discussions about the role of data in science in general and in electronic science (e-science) in particular[15].

Several reasons for the new wave of interest in Big Data can be identified. Information volumes were growing exponentially, and the lion's share of them consisted of unstructured data. In other words, the correct interpretation of information flows was becoming ever more relevant and, at the same time, more difficult. The IT market reacted without delay: large players acquired the most successful specialized companies and began developing tools for working with Big Data, and the number of related startups exceeded all expectations.

Along with the growth of computing power and the development of storage technologies, the ability to analyze Big Data gradually became available to small and medium businesses, ceasing to be the sole prerogative of large companies and research centers. The development of the cloud computing model contributed to this in no small measure.

At that time it was expected that, with the further penetration of IT into business and everyday life, the information flows subject to processing would continue to grow. If at the end of the 2000s Big Data meant petabytes, in the future it would be necessary to operate with exabytes, and so on. It was also predicted that for the foreseeable future, tools for working with such huge arrays of information would remain excessively complex and expensive.

The 1970s: the era of mainframes and the emergence of the Big Data concept

The concept of Big Data itself arose in the era of mainframes and the related scientific computing[16]. As is well known, knowledge-intensive computation has always been complex and is usually inseparable from the need to process large volumes of information.

See Also