
Big Data

The Big Data category includes information that can no longer be processed in traditional ways, including structured data, media, and random objects. Some experts believe that, to work with such data, new massively parallel solutions have replaced traditional monolithic systems.


What is Big Data?

Simplest definition

From the name alone, it can be assumed that the term "big data" refers simply to the management and analysis of large amounts of data. According to the McKinsey Global Institute report "Big data: The next frontier for innovation, competition and productivity," the term refers to data sets whose size exceeds the ability of typical databases to record, store, manage and analyze information. And the world's data repositories, of course, continue to grow. A mid-2011 report of the IDC Digital Universe Study, sponsored by EMC, predicted that the total global volume of data created and replicated in 2011 would be about 1.8 zettabytes (1.8 trillion gigabytes), roughly 9 times more than was created in 2006.

Image: The Nexus of Forces (sources: IDC, Gartner)
Image: Traditional databases vs. Big Data databases

More complex definition

Nevertheless, big data involves more than just analyzing huge amounts of information. The problem is not that organizations generate huge amounts of data, but that most of it comes in formats that do not match the traditional structured database format: web logs, videos, text documents, machine code or, for example, geospatial data. All of this is stored in many different repositories, sometimes even outside the organization. As a result, corporations may have access to a vast amount of their data while lacking the tools needed to establish relationships between these data and draw meaningful conclusions from them. Add to this the fact that data is now updated more and more often, and you get a situation in which traditional methods of information analysis cannot keep up with the huge volumes of constantly updated data, which ultimately opens the way for big data technologies.

Best Definition

In fact, Big Data refers to information of great volume and variety that is updated very frequently and resides in different sources, used to increase efficiency, create new products and increase competitiveness. The consulting company Forrester puts it briefly: "Big data combines techniques and technologies that extract meaning from data at the extreme limit of practicality."

How big is the difference between business intelligence and big data?

Craig Bati, Chief Marketing Officer and Chief Technology Officer at Fujitsu Australia, pointed out that business analysis is a descriptive process of analyzing the results achieved by a business over a period of time, whereas the speed of big data processing makes the analysis predictive, capable of offering business recommendations for the future. Big data technologies also make it possible to analyze more types of data than business intelligence tools, allowing a focus on more than just structured data stores.

Matt Slocum of O'Reilly Radar believes that although big data and business analytics have the same goal (finding answers to a question), they differ from each other in three aspects.

  • Big data is designed to handle more information than business analytics, and this, of course, corresponds to the traditional definition of big data.
  • Big data is designed to handle information that changes more rapidly, which means deep exploration and interactivity. In some cases, results are generated faster than a web page loads.
  • Big data is designed to handle unstructured data whose uses we are only beginning to explore once we have learned to collect and store it, and we need algorithms and dialogue-based tools to facilitate the search for trends contained within these arrays.

According to the Oracle white paper "Oracle Information Architecture: An Architect's Guide to Big Data," when working with big data we approach information differently than when conducting business analysis.

Working with big data is not like the usual business intelligence process, where simply adding up known values produces a result: for example, summing data on paid invoices yields the sales volume for the year. When working with big data, the result is obtained in the process of refining the data through sequential modeling: first a hypothesis is put forward, then a statistical, visual or semantic model is built, the validity of the hypothesis is checked against it, and then the next hypothesis is put forward. This process requires the researcher either to interpret visual results, to compose interactive knowledge-based queries, or to develop adaptive "machine learning" algorithms capable of obtaining the desired result. Moreover, the lifetime of such an algorithm can be quite short.
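
To make this sequential-modeling loop concrete, here is a minimal sketch in Python, assuming scikit-learn and a synthetic dataset; the candidate "hypotheses" and the way they are scored are illustrative, not taken from the source.

```python
# A minimal sketch of the hypothesis-driven modeling loop described above.
# Assumes scikit-learn; the dataset and candidate models are hypothetical.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each "hypothesis" is a candidate model of how the data might be structured.
hypotheses = {
    "linear relationship": LogisticRegression(max_iter=1000),
    "rule-based relationship": DecisionTreeClassifier(max_depth=5),
}

for name, model in hypotheses.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    # The hypothesis is kept or discarded depending on how well it explains the data.
    print(f"{name}: mean accuracy {score:.3f}")
```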

Big Data ≠ Data Science



Big Data is:

  • ETL/ELT
  • Storage technologies for large volumes of structured and unstructured data
  • Technologies for processing such data
  • Data Quality Management
  • Technologies for providing data to the consumer

Data Science is:

Big Data Analysis Techniques

There are many different methods for analyzing data arrays, based on tools borrowed from statistics and computer science (for example, machine learning). The list below does not claim to be complete, but it reflects the most popular approaches across various industries. It should be understood that researchers continue to work on creating new methods and improving existing ones. In addition, some of the methods listed below do not necessarily apply exclusively to big data and can be used successfully for smaller arrays (for example, A/B testing, regression analysis). Of course, the more voluminous and diversified the array being analyzed, the more accurate and relevant the resulting findings.

A/B testing. A technique in which a reference sample is compared in turn with others. This makes it possible to identify the optimal combination of indicators to achieve, for example, the best consumer response to a marketing offer. Big data make it possible to carry out a huge number of iterations and thus obtain a statistically reliable result.
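
As an illustration of the statistical side of A/B testing, the following sketch compares the conversion rates of a reference variant and a challenger with a chi-squared test; the counts and the SciPy-based approach are assumptions for the example, not part of the original text.

```python
# A minimal A/B test sketch: compare conversion rates of a reference
# variant (A) and a challenger (B) with a chi-squared test (SciPy assumed).
from scipy.stats import chi2_contingency

# Hypothetical counts: [conversions, non-conversions] per variant.
observed = [[120, 880],   # variant A
            [150, 850]]   # variant B

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"p-value = {p_value:.4f}")
if p_value < 0.05:
    print("The difference between variants is statistically significant.")
else:
    print("No statistically significant difference detected.")
```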

Association rule learning. A set of techniques for identifying relationships, i.e. association rules, between variables in large data arrays. Used in data mining.
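
A toy illustration of what an association rule looks like in practice: the transactions, items and the single candidate rule below are hypothetical, and support and confidence are computed directly rather than with a dedicated mining library.

```python
# A toy illustration of association rule mining: computing support and
# confidence for one candidate rule over hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
]

antecedent, consequent = {"bread"}, {"milk"}
n = len(transactions)
support_a = sum(antecedent <= t for t in transactions) / n
support_rule = sum((antecedent | consequent) <= t for t in transactions) / n
confidence = support_rule / support_a  # P(consequent | antecedent)

print(f"support = {support_rule:.2f}, confidence = {confidence:.2f}")
```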

Classification. A set of techniques that predict consumer behavior in a specific market segment (buying decisions, outflow, consumption volume, etc.). Used in data mining.
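
A minimal classification sketch, assuming scikit-learn and a synthetic dataset standing in for, say, churn records; the choice of a gradient-boosting model is purely illustrative.

```python
# A minimal classification sketch (scikit-learn assumed): predicting a
# binary outcome such as customer churn from hypothetical features.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```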

Cluster analysis. Statistical method of classifying objects into groups by identifying previously unknown common characteristics. Used in data mining.
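
For example, a k-means sketch (scikit-learn assumed) that groups objects into clusters without any predefined labels; the synthetic blobs stand in for real customer or sensor records.

```python
# A minimal cluster analysis sketch (scikit-learn assumed): grouping
# objects without knowing their categories in advance.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("objects per cluster:", [int((labels == k).sum()) for k in range(3)])
```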

Crowdsourcing. Methodology for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques that allows you to analyze comments from users of social networks and compare with sales results in real time.

Data mining. A set of techniques that allows you to identify the categories of consumers most susceptible to a promoted product or service, identify the features of the most successful workers, and predict the behavioral model of consumers.

Ensemble learning. In this method, many predictive models are combined, thereby improving the quality of the forecasts made.

Genetic algorithms. In this technique, possible solutions are represented as "chromosomes" that can be combined and mutated. As in the process of natural evolution, the fittest individuals survive.
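
A toy genetic algorithm in pure Python: bit-string "chromosomes" are selected, recombined and mutated, and the fittest survive. The fitness function (count of ones) and all parameters are illustrative.

```python
# A toy genetic algorithm: bit-string "chromosomes" are mutated and
# recombined, and the fittest survive (here fitness = number of ones).
import random

def fitness(chrom):
    return sum(chrom)

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]

for generation in range(50):
    # Selection: keep the fitter half of the population.
    population.sort(key=fitness, reverse=True)
    survivors = population[:15]
    # Crossover and mutation produce the next generation.
    children = []
    while len(children) < 15:
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, 19)
        child = a[:cut] + b[cut:]
        if random.random() < 0.1:
            i = random.randrange(20)
            child[i] ^= 1          # flip one bit
        children.append(child)
    population = survivors + children

print("best fitness:", fitness(max(population, key=fitness)))
```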

Machine learning. A direction in computer science (historically it was assigned the name "artificial intelligence"), which aims to create self-learning algorithms based on the analysis of empirical data.

Natural language processing (NLP). A set of techniques borrowed from computer science and linguistics for recognizing a person's natural language.

Network analysis. A set of methods for analyzing connections between nodes in networks. With regard to social networks, it is possible to analyze the relationships between individual users, companies, communities, etc.
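
A small sketch of network analysis using the networkx library (an assumption for the example): a hypothetical social graph is built and users are ranked by degree centrality.

```python
# A minimal network analysis sketch (networkx assumed): ranking users of a
# hypothetical social graph by degree centrality.
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("dave", "erin"),
])

centrality = nx.degree_centrality(g)
for user, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.2f}")
```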

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more indicators. It helps in making strategic decisions, for example, the composition of the product line being brought to the market, conducting investment analysis, etc.

Pattern recognition. A set of techniques with self-learning elements used to predict consumer behavior patterns.

Predictive modeling. A set of techniques that allow you to create in advance a mathematical model of a given likely scenario. For example, analyzing the CRM system database for conditions that might prompt subscribers to switch providers.

Regression. A set of statistical methods to identify a pattern between a change in a dependent variable and one or more independent ones. Often used for forecasting and predictions. Used in data mining.
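
A minimal regression sketch with NumPy: a linear dependence between one independent variable and a dependent one is recovered from noisy synthetic observations; the "advertising spend vs. sales" interpretation is purely illustrative.

```python
# A minimal regression sketch: fitting a linear dependence between an
# independent variable and a dependent one (NumPy assumed).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)                 # e.g., advertising spend
y = 3.0 * x + 5.0 + rng.normal(0, 2, 200)   # e.g., observed sales with noise

slope, intercept = np.polyfit(x, y, 1)
print(f"estimated model: y = {slope:.2f} * x + {intercept:.2f}")
```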

Sentiment analysis. Methods for assessing consumer sentiment based on natural language processing technologies. They make it possible to isolate, from the general information flow, messages related to an object of interest (for example, a consumer product), and then to evaluate the polarity of the judgment (positive or negative), the degree of emotionality, and so on.
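
A deliberately simplified, lexicon-based polarity scorer; real sentiment analysis relies on full natural language processing pipelines, so the word lists and messages here are only illustrative.

```python
# A toy lexicon-based sentiment scorer illustrating polarity estimation;
# production systems use full NLP pipelines instead of word lists.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def polarity(message: str) -> int:
    words = message.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

for msg in ["I love this product, great battery", "terrible support, bad firmware"]:
    print(msg, "->", "positive" if polarity(msg) > 0 else "negative")
```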

Signal processing. A set of techniques borrowed from radio engineering, which aims to recognize the signal against the background of noise and its further analysis.
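
A small sketch of the signal-versus-noise idea using NumPy's FFT: a 50 Hz tone buried in noise is recovered as the dominant spectral peak; the sampling rate and noise level are arbitrary.

```python
# A minimal signal processing sketch (NumPy assumed): recovering the
# dominant frequency of a signal buried in noise via the FFT.
import numpy as np

rng = np.random.default_rng(0)
fs = 1000                                   # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t)         # 50 Hz tone
noisy = signal + rng.normal(0, 1.5, t.size)

spectrum = np.abs(np.fft.rfft(noisy))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
print("dominant frequency:", freqs[spectrum.argmax()], "Hz")
```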

Spatial analysis. A set of methods for analyzing spatial data, partly borrowed from statistics - terrain topology, geographical coordinates, geometry of objects. The source of big data in this case is often geographic information systems (GIS).

Statistics. The science of data collection, organization and interpretation, including the development of questionnaires and conducting experiments. Statistical methods are often used to assess the relationship between events.

Supervised learning. A set of machine-learning-based techniques that identify functional relationships in the analyzed data arrays.

Simulation. Modeling of the behavior of complex systems, often used to forecast and to study various scenarios during planning.
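
A minimal Monte Carlo simulation sketch: the three-stage process, its duration distributions and the deadline are hypothetical, and the probability of finishing on time is estimated by repeated sampling.

```python
# A minimal Monte Carlo simulation sketch: estimating the probability that
# a hypothetical multi-stage process finishes within a deadline.
import random

def simulate_once():
    # Three stages with uncertain durations (hypothetical distributions).
    return sum(random.gauss(mu, sigma) for mu, sigma in [(5, 1), (3, 0.5), (7, 2)])

runs = 100_000
on_time = sum(simulate_once() <= 17 for _ in range(runs))
print("P(total duration <= 17):", on_time / runs)
```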

Time series analysis. A set of methods borrowed from statistics and digital signal processing for analyzing repeated data sequences over time. One of the obvious applications is tracking the securities market or patient morbidity.
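
A small time series sketch assuming pandas: a noisy synthetic daily series is smoothed with a 7-day rolling mean to expose the trend; the dates and values are illustrative.

```python
# A minimal time series sketch (pandas assumed): smoothing a noisy daily
# series with a rolling mean to expose the underlying trend.
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=120, freq="D")
values = np.linspace(100, 130, 120) + np.random.default_rng(0).normal(0, 5, 120)
series = pd.Series(values, index=idx)

trend = series.rolling(window=7).mean()
print(trend.tail())
```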

Unsupervised learning. A set of machine-learning-based techniques that identify hidden functional relationships in the analyzed data arrays. Shares features with cluster analysis.

Visualization. Techniques for graphically presenting big data analysis results as diagrams or animated images to simplify interpretation and facilitate understanding of the results.

Main article - Data Visualization.

Visualizing the results of big data analysis is critical to their interpretation. It is no secret that human perception is limited, and scientists continue to conduct research in the field of improving modern methods of presenting data in the form of images, diagrams or animations.
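
As a simple example of turning an analysis result into a picture, the sketch below (matplotlib assumed) draws a bar chart of the survey percentages cited in the Accenture study later in this article; any real visualization would use its own data.

```python
# A minimal visualization sketch (matplotlib assumed): presenting the
# result of an aggregation as a bar chart for easier interpretation.
import matplotlib.pyplot as plt

categories = ["new revenue", "customer experience", "new products", "loyalty"]
share = [56, 51, 50, 47]   # percentages from the Accenture survey cited below

plt.bar(categories, share)
plt.ylabel("share of respondents, %")
plt.title("Reported benefits of big data")
plt.tight_layout()
plt.savefig("big_data_benefits.png")
```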

Analytical Tools

As of 2011, some of the approaches listed in the previous subsection, or some combination of them, make it possible to implement analytical engines for working with big data. Among free or relatively inexpensive open Big Data analysis systems, the following can be recommended:[1]

Of particular interest in this list is Apache Hadoop, open source software that over the past five years has been proven as a data analyzer by most stock trackers[2]. As soon as Yahoo opened the Hadoop code to the open source community, a whole line of Hadoop-based products promptly appeared in the IT industry. Almost all of today's big data analysis tools provide integration with Hadoop. Their developers include both startups and well-known global companies.

Big Data Management Markets

Big Data Platform (BDP) as a tool to combat digital hoarding

The ability to analyze big data, colloquially called Big Data, is perceived as an unambiguously good thing. But is that really so? What can rampant data accumulation lead to? Most likely, to what domestic psychologists, with regard to a person, call pathological hoarding, syllogomania or, figuratively, "Plyushkin syndrome." In English, the vicious passion to collect everything is called hoarding (from the English hoard, "stockpile"). According to the classification of mental illnesses, hoarding is classified as a mental disorder. In the digital age, digital hoarding is added to traditional physical hoarding, and it can afflict both individuals and entire enterprises and organizations (read more).

World and Russian Market

Big data Landscape - Major Vendors

Almost all leading IT companies showed interest in big data collection, processing, management and analysis tools, which is quite natural. Firstly, they face this phenomenon directly in their own business, and secondly, big data opens up excellent opportunities to master new market niches and attract new customers.



Many startups have appeared on the market that do business in processing huge amounts of data. Some of them use a ready-made cloud infrastructure provided by large players like Amazon.

Big Data Theory and Practice in Industries

Main article - Big Data Theory and Practice in Industries.

How to use analytics to develop quality IT services

White Paper - Using Analytics to Develop IT Services

History of development

2017

TmaxSoft forecast: the next Big Data wave will require a DBMS upgrade

According to an IDC report, due to the growth of data generated by Internet-connected devices, sensors and other technologies, big data revenues will increase from $130 billion in 2016 to more than $203 billion by 2020.[3] However, companies that lack the IT infrastructure necessary to adapt to the big data revolution will not be able to benefit from this growth, experts at TmaxSoft believe.

Enterprises know that their vast amounts of data contain important information about their business and customers. If a company can successfully apply this information, it will have a significant advantage over competitors and will be able to offer better products and services. However, many organizations still cannot use big data effectively because their legacy IT infrastructure is unable to provide the necessary storage capacity, data exchange processes, utilities and applications needed to process and analyze large amounts of unstructured data and extract valuable information from them, TmaxSoft noted.

In addition, the increased processing power required to analyze ever-increasing data volumes may require significant investments in the organization's legacy IT infrastructure, as well as additional support resources that could be used to develop new applications and services.

According to Andrei Reva, executive director of TmaxSoft Russia, these factors will lead to a situation in which organizations that continue to use legacy infrastructure will either have to pay much more in the future to switch to current technologies, or will not be able to get any effect from the big data revolution.

"The big data phenomenon has made many enterprises aware of the need to collect, analyze and store structured and unstructured data. However, implementing these processes requires an action plan and the right process optimization tools. And in reality, many companies are unable to get a tangible effect from big data because they use legacy DBMSs that lack the functionality and scalability, and as a result the big data revolution does not help their business in any way," Andrei Reva explained his forecast.

According to the representative of TmaxSoft, enterprises need a strategy that takes into account, among other things, the sources of data for extraction, the life cycle of data, the compatibility of different relational databases and the scalability of storage.

2016

EMC Forecast: Real-Time BigData and Analytics Will Come Together

In 2016, we will see a new chapter in the history of big data analytics with the development of a two-tier processing model. The first tier will be "traditional" BigData analytics, where large amounts of data are not analyzed in real time. The new second tier will provide the ability to analyze relatively large amounts of data in real time, mainly through in-memory analytics technologies. In this new phase of BigData development, technologies such as DSSD, Apache Spark and GemFire will be as important as Hadoop. The second tier will offer both new and familiar ways of using "data lakes" for analytics on the fly, in order to influence events as they occur. This opens up new business opportunities on a scale no one has seen before.

But for in-memory analytics to become a reality, two things need to happen. First, supporting technologies must be developed to provide enough memory to accommodate truly large data sets. It is also necessary to think about how to efficiently move data between large object stores and the systems that perform in-memory analysis. After all, these two elements work in fundamentally different modes, and IT groups will need to create special conditions so that data can move back and forth at the right speed and transparently for users. Work is already underway on new object stores, special rack-mounted flash arrays, and technologies that can combine them into a single system. Open source initiatives will play an important role in answering this challenge.

Second, large-scale in-memory computing environments require both persistence and dynamism of data. The problem is that by making data persistent in memory, we also make any of its defects persistent. As a result, in 2016 we will see the emergence of storage systems for in-memory environments. They will provide deduplication, snapshots, tiered storage, caching, replication services, and the ability to determine the last state in which the data was correct and the system worked correctly. These capabilities will be critical as we move to real-time analytics, when more secure in-memory technologies become commercial in 2016.

2015

Gartner eliminated Big Data from popular trends

On October 6, 2015, it became known that big data had been excluded from Gartner's "Hype Cycle for Emerging Technologies 2015" report. The researchers attributed this to the blurring of the term: the technologies included in the concept of "big data" have become the daily reality of business[4].

The Gartner report "Hype Cycle for Emerging Technologies" stirred the industry by omitting technologies for collecting and processing large amounts of data. Analysts explained their decision by the fact that the concept of "big data" includes a large number of technologies that are actively used in enterprises; they partially belong to other popular areas and trends and have become an everyday working tool.

Gartner Chart "Hype Cycle for Emerging Technologies 2015"

"Initially, the concept of" big data "was deciphered through the definition of three" V ": volume, velocity, variety. By this term was meant a group of technologies for storing, processing and analyzing large-scale data, with a variable structure and a high update rate. But reality has shown that the gain in business projects is carried out according to the same principles as before. And the described technological solutions themselves did not create any new value, only accelerating the processing of a large amount of data. Expectations were very high, and the list of big data technologies grew intensely. Obviously, as a result of this, the boundaries of the concept have blurred to the limit, "said Svyatoslav Stumpf, chief expert of the Peter-Service product marketing group.

Dmitry Shepelyavy, deputy CEO of SAP CIS, believes that the subject of big data has not disappeared but has been transformed into a set of different scenarios:

"Examples here can be state repairs, precision farming, anti-fraud systems, systems in medicine that allow patients to be diagnosed and treated at a qualitatively new level. As well as real-time logistics and transportation planning, improved business analytics to support and support the core functions of companies. One of the main trends now is the Internet of Things, which allows you to connect machines to each other (machine-to-machine). The electronic sensors you install produce millions of transactions per second, and you need a robust solution that can transform, save, and work with them in real time. "

In May 2015, Andrew White, vice president of research at Gartner, reflected on his blog:

"[[Internet of Things (IoT)|Internet of Things, IoT)]] will overshadow big data as too focused technology. It may generate several more effective solutions and tools, but the Internet of Things will be the platform of the future that will increase our productivity in the long term. "

Similar ideas were expressed earlier, following Gartner's 2014 report, by Forbes columnist Gil Press.

According to Dmitry Shepelyavy, an era has come when it is important not just to be able to accumulate information, but to derive business benefits from it. The first to come to this conclusion were industries that work directly with the consumer: telecommunications, banking and retail. Interaction processes are now moving to a new level, making it possible to connect different devices using augmented reality tools and opening up new opportunities to optimize companies' business processes.

"The concept of" big data "has lost interest in real business, in the Gartner diagram, other technologies with a clearer and more understandable sound to the business took its place," said Svyatoslav Stumpf.

This is, first of all, machine learning, a means of finding rules and connections in very large amounts of information. Such technologies make it possible not only to test hypotheses, but also to look for previously unknown influences. Also in this group are storage and parallel access (NoSQL databases), marshalling, and advanced analytics with self-service delivery. In addition, according to the expert, data mining tools (Business Intelligence and Data Mining), which are reaching a new technological level, remain important.

In Yandex's understanding, according to the company's press service, big data has not disappeared or been transformed into anything else. To process large amounts of data, the company uses the same technologies and algorithms that it uses in Internet search, the Yandex.Traffic service, machine translation, the recommendation platform and advertising. The algorithms are based on the company's ability to accumulate, store and process large volumes of data and make them useful to the business. The applications of Yandex Data Factory are not limited; the main thing is that there is data to analyze. The company's focus as of October 6, 2015:

More data does not mean better

Big data and customer price discrimination

Below are selected excerpts from an article by Morgan Kennedy published on February 6, 2015 on the InsidePrivacy website on the issue of protecting privacy[5][6].

On February 5, 2015, the White House published a report discussing how companies use big data to set different prices for different buyers - a practice known as "price discrimination" or "personalized pricing." The report describes the benefits of "big data" for both sellers and buyers, and its authors conclude that many of the problematic issues that arose from the emergence of big data and differentiated pricing can be resolved within the framework of existing anti-discrimination and consumer protection laws.

The report notes that at this time there are only isolated facts showing how companies use big data in the context of individualized marketing and differentiated pricing. This information shows that sellers use pricing methods, which can be divided into three categories:

  • study of the demand curve;
  • Steering and differentiated pricing based on demographic data; and
  • targeted behavioral marketing (behavioral targeting) and individualized pricing.

Study of the demand curve: marketers often conduct experiments in this area to clarify demand and study consumer behavior, during which customers are randomly assigned one of two possible price categories. "Technically, these experiments are a form of differentiated pricing, since they result in different prices for customers, even if they are 'non-discriminatory' in the sense that all customers have the same probability of 'getting' the higher price."

Steering: This is the practice of presenting products to consumers based on their belonging to a certain demographic group. So, the website of a computer company can offer the same laptop to different types of customers at different prices, set on the basis of the information they reported about themselves (for example, depending on whether the user is a representative of government bodies, scientific or commercial institutions, or a private person) or their geographical location (for example, determined by the IP address of the computer).

Targeted behavioral marketing and individualized pricing: In these cases, customers' personal data is used for targeted advertising and individualized pricing of certain products. For example, online advertisers use data collected by advertising networks and through third-party cookies about user activity on the Internet in order to target their advertising materials. This approach, on the one hand, makes it possible for consumers to receive advertising of goods and services of interest to them. However, it may cause concern for those consumers who do not want certain types of their personal data (such as information about visiting sites related to medical and financial issues) to be collected without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of individualized pricing in an online environment. The report suggests that this may be due to the fact that appropriate methods are still being developed, or the fact that companies are in no hurry to use individual pricing (or prefer to keep quiet about it) - perhaps for fear of a negative reaction from consumers.

The authors of the report believe that "for the individual consumer, the use of big data is undoubtedly associated with both potential returns and risks." While acknowledging that issues of transparency and discrimination arise in the use of big data, the report argues that existing anti-discrimination and consumer protection laws are sufficient to address them. However, the report also emphasizes the need for "ongoing control" when companies use confidential information in an opaque manner or in ways that are not covered by the existing regulatory framework.

This report continues the White House's efforts to study the use of big data and discriminatory pricing on the Internet, and the corresponding consequences for American consumers. It was previously reported[7] that the White House big data working group published its report on this issue in May 2014. The Federal Trade Commission (FTC) also addressed these issues during its workshop held in September 2014[8].

2014

Gartner dispels Big Data myths

A fall 2014 Gartner research note lists a number of myths about Big Data that are common among IT managers, and refutes them.

  • Everyone is implementing Big Data systems faster than us

Interest in Big Data technologies is at a record high: 73% of organizations surveyed by Gartner analysts this year are already investing in relevant projects or plan to do so. But most such initiatives are still at very early stages, and only 13% of respondents have already implemented such solutions. The hardest part is determining how to extract revenue from Big Data and deciding where to start. Many organizations get stuck at the pilot stage because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about minor errors in them

Some IT managers believe that small data flaws do not affect the overall results of analyzing huge volumes. When there is a lot of data, each individual error indeed affects the result less, analysts note, but the errors themselves become more numerous. In addition, most of the analyzed data are external, of unknown structure or origin, so the probability of errors increases. Thus, in the world of Big Data, quality is actually much more important.

  • Big Data technologies will eliminate the need for data integration

Big Data promises the ability to process data in its original format, with the schema formed automatically as the data is read. It is believed that this will allow information from the same sources to be analyzed using several data models. Many also believe that this will enable end users to interpret any data set at their own discretion. In reality, most users often need the traditional approach with a ready-made schema, where the data is formatted appropriately and there are agreements on the level of integrity of the information and how it should relate to the use case.

  • Data warehouses make no sense to use for complex analytics

Many information management system administrators believe there is no point in spending time creating a data warehouse, given that complex analytical systems use new types of data. In fact, many complex analytics systems use information from a data warehouse. In other cases, new data types need to be additionally prepared for analysis in Big Data systems; decisions have to be made about the suitability of the data, the principles of aggregation and the required level of quality, and such preparation can take place outside the warehouse.

  • Data warehouses will be replaced by data lakes

In reality, vendors mislead customers by positioning data lakes as a replacement for data warehouses or as critical elements of the analytical infrastructure. The underlying technologies of data lakes lack the maturity and breadth of functionality inherent in warehouses. Therefore, managers responsible for data management should wait until the lakes reach the same level of development, Gartner believes.

Accenture: 92% of those who implemented big data systems are satisfied with the result

According to an Accenture study (fall 2014), 60% of companies have already successfully completed at least one big data project. The vast majority (92%) of representatives of these companies were quite satisfied with the result, and 89% said that big data has become an extremely important part of their business transformation. Among the remaining respondents, 36% had not considered introducing this technology, and 4% had not yet completed their projects.

The Accenture study involved more than 1,000 company executives from 19 countries. The study was based on data from the Economist Intelligence Unit survey among 1,135 respondents worldwide[9].

Among the main advantages of big data, respondents named:

  • "search for new sources of income" (56%),
  • "Improving customer experience" (51%),
  • "new products and services" (50%) and
  • "the influx of new customers and the preservation of the loyalty of the old" (47%).

When introducing the new technologies, many companies faced traditional problems. For 51%, the stumbling block was security, for 47% the budget, for 41% the lack of necessary personnel, and for 35% difficulties in integrating with the existing system. Almost all surveyed companies (about 91%) plan to solve the personnel shortage soon and hire big data specialists.

Companies are optimistic about the future of big data technology. 89% believe that they will change the business as much as the Internet. 79% of respondents noted that companies that do not deal with big data will lose their competitive advantage.

However, respondents disagreed on what exactly should be considered big data. 65% of respondents believe that these are "large data files," 60% that it is "advanced analytics and analysis," and 50% that it is "data visualization tools."

Madrid spends 14.7 million euros on big data management

In July 2014, it became known that Madrid would use big data technologies to manage urban infrastructure. The project cost is 14.7 million euros, and the implemented solutions will be based on technologies for analyzing and managing big data. With their help, the city administration will manage work with each service provider and pay accordingly depending on the level of service.

This concerns the administration's contractors who monitor the state of the streets, lighting, irrigation and green spaces, clean up the territory, and remove and process garbage. As part of the project, 300 key performance indicators for city services have been developed; on their basis, specially assigned inspectors will carry out 1.5 thousand different checks and measurements daily. In addition, the city will begin using an innovative technology platform called Madrid iNTeligente (MiNT) - Smarter Madrid.

Read more: Why does Madrid need analytics and big data?

2013

Experts: Big Data Fashion Peak

Without exception, vendors in the data management market at this time are developing technologies for Big Data management. This new technology trend is also being actively discussed by the professional community, both developers and industry analysts and potential consumers of such solutions.

As Datashift found out, as of January 2013, the wave of discussions around the "big data" exceeded all conceivable sizes. After analyzing the number of Big Data mentions on social networks, Datashift estimated that in 2012 this term was used about 2 billion times in posts created by about 1 million different authors around the world. This is equivalent to 260 posts per hour, with a peak of 3070 mentions per hour.

Discussions of Big Data on the network are very active. Moreover, the discussion keeps intensifying: while in the first quarter of 2012 there were more than 504 thousand mentions of the term, in the fourth quarter there were already more than 800 thousand. The main topics of discussion in relation to big data are myths and reality, experience of use, the human factor, return on investment, and new technologies. Among vendors, Apache, 10gen, IBM, HP and Teradata were mentioned most often.

Gartner: Every second CIO is ready to spend on Big data

After several years of experimenting with Big Data technologies and the first implementations in 2013, the adoption of such solutions will increase significantly, Gartner predicts[10]. Researchers surveyed IT leaders around the world and found that 42% of respondents have already invested in Big Data technologies or plan to make such investments within the coming year (data as of March 2013).

Companies have to spend on big data technologies, as the information landscape is changing rapidly, requiring new approaches to information processing. Many companies have already realized that large amounts of data are critically important, and working with them makes it possible to achieve benefits unavailable with traditional sources of information and ways of processing it. In addition, the constant discussion of the topic of "big data" in the media fuels interest in the relevant technologies.

Frank Buytendijk, vice president of Gartner, even urged companies to moderate their fervor, as some show concern that they are lagging behind competitors in adopting Big Data.

"You
should not worry, the opportunities for implementing ideas based on big data technologies are virtually limitless," he said.

According to Gartner forecasts, by 2015, 20% of Global 1000 companies will take a strategic focus on the "information infrastructure."

In anticipation of the new capabilities that big data technologies will bring with them, many organizations are already organizing the process of collecting and storing various types of information.

For educational and government organizations, as well as industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data, which includes e-mail messages, multimedia and other similar content. According to Gartner, it is those who learn to deal with a wide variety of information sources who will win the data race.

Cisco Survey: Big Data Helps Boost IT Budgets

In a spring 2013 study called the Cisco Connected World Technology Report, conducted in 18 countries by the independent analytics firm InsightExpress, 1,800 college students and an equal number of young professionals aged 18 to 30 were surveyed. The survey was conducted to determine the level of readiness of IT departments to implement Big Data projects and to get an idea of the associated problems, technological shortcomings and strategic value of such projects.

Most companies collect, record and analyze data. However, according to the report, many companies face a number of complex business and information technology problems in connection with Big Data. For example, 60 percent of respondents admit that Big Data solutions can improve decision-making and increase competitiveness, but only 28 percent said that they are already getting real strategic benefits from the accumulated information.

More than half of IT managers surveyed believe that Big Data projects will help increase IT budgets in their organizations, as there will be increased demands on technology, personnel and professional skills. At the same time, more than half of respondents expect that such projects will increase IT budgets in their companies in 2012. 57 percent are confident that Big Data will increase their budgets over the next three years.

81 percent of respondents said that all (or at least some) Big Data projects would require cloud computing. Thus, the proliferation of cloud technologies can affect the speed of Big Data solutions and the business value of these solutions.

Companies collect and use many different types of data, both structured and unstructured. Here are the data sources cited in the Cisco Connected World Technology Report:

  • 74 percent collect current data;
  • 55 per cent collect historical data;
  • 48 percent take data from monitors and sensors;
  • 40 percent use the data in real time and then erase it. The most common use of real-time data is in India (62 per cent), the United States (60 per cent) and Argentina (58 per cent);
  • 32 percent of respondents collect unstructured data, for example, video. China is the leader in this area: 56 percent of respondents there collect unstructured data.

Nearly half (48 percent) of IT managers predict a doubling of their network load over the next two years. (This is especially true in China, where 68 percent of respondents hold this view, and Germany - 60 percent). 23 percent of respondents expect the network load to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for an explosion-like increase in network traffic.

27 percent of respondents recognized that they needed better IT policies and information security measures.

21 percent need bandwidth expansion.

Big Data opens up new opportunities for IT departments to build value and form close relationships with business units, allowing them to increase revenues and strengthen the financial position of the company. Big Data projects make IT a strategic partner of business units.

According to 73 percent of respondents, it is the IT department that will become the main locomotive for implementing the Big Data strategy. At the same time, according to respondents, other departments will also be involved in the implementation of this strategy. First of all, this applies to the departments of finance (24 percent of respondents named it), research (20 percent), operational (20 percent), engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Big Data Management Needs Millions of New Jobs

Global IT spending will reach $3.7 trillion in 2013, which is 3.8% more than spending on information technology in 2012 (the year-end forecast is $3.6 trillion). The big data segment will develop at a much faster pace, according to a Gartner report[11].

By 2015, 4.4 million information technology jobs will be created to serve big data, of which 1.9 million jobs will be in the United States. Moreover, each such workplace will entail the creation of three additional jobs outside the IT sphere, so that in the United States alone in the next four years, 6 million people will work to maintain the information economy.

According to Gartner experts, the main problem is that the industry does not have enough talent for this: both private and public educational systems, in the United States for example, are unable to supply the industry with enough qualified personnel. As a result, only one in three of the new IT jobs mentioned will be filled.

Analysts believe that the role of nurturing skilled IT personnel should be taken on directly by the companies that are in dire need of them, since such employees will be their ticket into the new information economy of the future.

2012

The first skepticism about Big Data

Analysts at Ovum and Gartner suggest that for big data, the fashionable topic of 2012, a time of liberation from illusions may be coming.

The term "Big Data," at this time, usually refers to the ever-growing amount of information coming online from social media, from sensor networks and other sources, as well as the growing range of tools used to process data and identify important business trends based on it.

"Due to the hype (or despite it) regarding the idea of ​ ​ big data, manufacturers in 2012 looked at this trend with great hope," said Tony Bayer, an analyst at Ovum.

Bayer said that DataSift conducted a retrospective analysis of big data mentions on Twitter for 2012. By limiting the search to manufacturers, the analysts wanted to focus on how the idea was perceived by the market rather than by the wide community of users. The analysts identified 2.2 million tweets from more than 981 thousand authors.

These data differed from country to country. Although it is generally accepted that the US leads the way in the number of installed platforms for working with big data, users from Japan, Germany and France were often more active in discussions.

The idea of Big Data attracted so much attention that even the business press, and not only specialized publications, wrote about it widely.

The number of positive reviews of big data from manufacturers was three times the number of negative, although in November, due to HP's purchase of Autonomy, there was a surge in negativity.

The concept of big data awaits much harsher times, although, having passed them, this ideology will reach maturity.

"For big data supporters, the time is coming to part with illusions," explained Svetlana Sikular, an analyst at Gartner. She referred to the mandatory stage, which is part of the classic popularity cycle curve (Hype Cycle), which is used in Gartner.

2011

Notes