2017/08/28 17:21:13

Why the Data Scientist Is Sexier Than the BI Analyst

The growing popularity of data science[1] (DS) raises two obvious questions. First, what is the qualitative difference between this recently created scientific field and business intelligence (BI), which has existed for several decades and is actively used in industry? Second, and perhaps more important from a practical point of view, how do the functions of the two related specialists, the data scientist and the BI analyst, differ? In this article, prepared especially for TAdviser, journalist Leonid Chernyak answers these questions.

Searching the web for "data science vs. business intelligence" and "data scientist vs. BI analyst" turns up a great variety of answers. Not satisfied with them, we will try to add our own, built on the DIKW pyramid model, which links data, information, knowledge, and wisdom (deep knowledge, or truth).

Differences between Data science and BI

Data science and BI differ in the end result that their characteristic methods of working with data produce. For BI the resulting product is information; for data science it is knowledge. What they have in common is that both information and knowledge are extracted from data with human participation and are intended to be passed on to another person.

From BI to data science: Reporting, Modeling, Decision Making, Understanding

In both cases, therefore, specialists play the crucial role. Without their intelligence and skill such a transformation is, in general, impossible. In a limited number of applications, information (but, we stress, by no means knowledge) can be obtained by artificial intelligence techniques.

The second distinguishing factor is the completeness of the data used. All data are ultimately a reflection of the surrounding world, but these reflections can differ in how completely they represent it. BI uses only structured numerical data, which give a very limited picture of the world, whereas data science can use any data sufficient to reflect the world with whatever completeness is required.

Data scientist prototype

W. Edwards Deming (1900-1993), the "father" of the Japanese economic miracle and an expert in management and statistics, was one of the first to adapt statistical techniques to the evaluation of production efficiency, which is why he is often called the prototype of the data scientist.

The best-known aphorism of Edwards Deming: "In God we trust; all others must bring data"

Deming left behind a large scientific legacy and a surprising number of aphorisms, some of them devoted to data. His view of data is conveyed, for example, by the following thoughts:

  • "Without data you're just another person with an opinion"
  • "The most important things cannot be measured"
  • "The most important figures that one needs for management are unknown or unknowable"

Evolution of BI and emergence of Data science

Business people understand data not as bits and bytes written to storage media but as ready-to-use numerical indicators held in data warehouses. That is how these followers of Edwards Deming have interpreted data for many years. Quantitative analysts, also known as quants, work with the same kind of statistical data.

There are many definitions of BI, among them:

  • BI is neither a product nor a system. Rather, it is an architecture, or a set of interconnected tools, decision-support applications, and databases, that gives the business community simple access to business data.

  • The scope of BI decision-support applications extends to various activities connected with forecasting, business process analysis, and the preparation of balance sheets.

Their essence is the same: BI systems are intended to translate data from a machine-readable form into a representation that lets a person extract the maximum of useful information.
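As a rough illustration of that translation, here is a minimal sketch in Python. The records, field names, and figures are invented for the example and do not come from any real BI product; the point is only that structured rows go in and a summary a person can read comes out.

```python
from collections import defaultdict

# Hypothetical structured records, as they might sit in a data warehouse.
SALES = [
    {"month": "2017-06", "region": "North", "revenue": 1200.0},
    {"month": "2017-06", "region": "South", "revenue": 800.0},
    {"month": "2017-07", "region": "North", "revenue": 1500.0},
    {"month": "2017-07", "region": "South", "revenue": 700.0},
]


def monthly_report(records):
    """Sum revenue per month and return (month, total) pairs sorted by month."""
    totals = defaultdict(float)
    for row in records:
        totals[row["month"]] += row["revenue"]
    return sorted(totals.items())


def format_report(rows):
    """Render the totals as plain text that a person can read at a glance."""
    lines = ["Month     Revenue"]
    for month, total in rows:
        lines.append(f"{month}  {total:>8.2f}")
    return "\n".join(lines)


print(format_report(monthly_report(SALES)))
```

The result is information about the past in a report form; nothing in it explains why revenue moved or what to do next.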

In the course of its evolution, BI has passed through three generations:

  • Business Intelligence 1.0 required competent specialists using complex tools and working on large machines in client-server mode. Reports were usually generated on a monthly basis.

  • Business Intelligence 2.0 opened the work up to applied specialists (data explorers). Turnaround improved to a week or even a day, and limited forecasting capabilities appeared.

  • Business Intelligence 3.0 continued the process of democratization. Now the user could be any employee, from accounting staff to C-level executives (CEO, CFO …). The speed of report preparation approached real time.

In the late 1990s, when the problem of explosive data growth (the data deluge) arose, the variety of stored data increased, and technologies for aggregation, analysis, and reporting across diverse sources improved considerably, a direction called New Business Intelligence (NBI) emerged. Its creators aimed to integrate knowledge management (KM) and BI.

In the early 2000s the future of BI looked as shown in the figure below.

Expected convergence of KM and BI

But this expected development changed with the advent of what is called Big Data, which is characterized not only by huge volumes but also, essentially, by greater variety. It was then that what we call data science appeared, as a response to the emergence of Big Data.

Under this common umbrella are gathered the various processes used to extract, collect, and process large volumes of the most diverse data. Let us stress that data science is not "the science of data", as the Russian-language Wikipedia has it. Data are not the subject of this science, so it is quite wrong to treat data science as a synonym of datalogy, the term proposed by Peter Naur. At the same time, it is fair to speak of data science as a science in the sense that it is a complex of scientific methods for extracting knowledge from data.

Into Russian, therefore, data science should perhaps be translated as "the science of working with data" or "scientific methods of working with data". Accordingly, the task of those engaged in data science is to extract knowledge using the methods gathered under the general name data mining, combined with statistics and other methods of data analysis, in order to understand what the data contain.

In terms of tooling, this complex is considerably broader and, from a scientific point of view, deeper than the BI toolset. It includes various statistical packages, SQL, Hadoop, the languages R, Python, and Perl, and others.

Structure of the technologies for working with data used in data science: databases, statistics, visualization, other disciplines, information science, machine learning

As a note to the illustration, information science is an independent discipline, an exact science that studies the analysis, collection, classification, manipulation, storage, retrieval, movement, dissemination, and protection of information. It should not be confused with computer science (informatics) or with mathematical information theory.

A special place belongs to the technologies that support data science: data engineering, otherwise known as data wrangling or data munging. These terms refer to the preparation of raw data for subsequent analytics, that is, the conversion of raw data stored in arbitrary formats into the formats that analytical applications require.
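What such preparation can look like is sketched below in Python, using only the standard library. The two raw inputs, their field names, and the date and number formats are assumptions made up for this example; real sources are messier and need far more validation.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical raw inputs: the same kind of record arriving in two formats.
RAW_CSV = 'date,amount\n28.08.2017,"1 200,50"\n29.08.2017,300\n'
RAW_JSON = '[{"date": "2017-08-30", "amount": "45.10"}]'


def parse_amount(text):
    """Accept '1 200,50' or '45.10' and return a float."""
    return float(text.replace(" ", "").replace(",", "."))


def parse_date(text):
    """Try the two date formats assumed here and return an ISO 8601 string."""
    for fmt in ("%d.%m.%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def wrangle(raw_csv, raw_json):
    """Merge both sources into one list of uniformly typed records."""
    rows = list(csv.DictReader(io.StringIO(raw_csv))) + json.loads(raw_json)
    return [
        {"date": parse_date(row["date"]), "amount": parse_amount(row["amount"])}
        for row in rows
    ]


CLEAN = wrangle(RAW_CSV, RAW_JSON)
for record in CLEAN:
    print(record)
```

Only after this step do the records share one date format and one numeric type, which is what analytical applications expect.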

Despite the novelty of the technologies behind data engineering, their economic value is already highly appreciated and keeps growing. Because the price of storage and processing steadily falls while the cost of analysts' work steadily rises, there is an urgent need to optimize the work of data scientists.

Why the Data Scientist Is Sexier Than the BI Analyst

The general public took a huge interest in the newly emerged specialty in 2012, especially after it was called "The Sexiest Job of the 21st Century", i.e., the most attractive job of this century. The phrase comes from a Harvard Business Review article by Thomas Davenport, one of the most recognized experts in the field[1].

And here I thought that I had the most attractive job of the 21st century!

Demand for data scientists became so incredible that it inevitably brings to mind the marvelous Georgian film "Blue Mountains", in which the foolish official Vazha Zazayevich demands "a couple of good surveyors" for himself without understanding what the profession is about. Data scientists are now in short supply, and their work is paid one and a half to two times more than that of a good old business intelligence analyst.

Nature abhors a vacuum: hundreds, if not thousands, of universities immediately began training data scientists. Who does not want to become The Sexiest? Experienced professionals, however, doubt that a student can be "taught" to be a data scientist in a few years, since this kind of work requires a whole complex of knowledge and skills.

A few courses in R and/or Python and the study of particular tools are far from enough for a "true data expert" (perfect data scientist) to become fully qualified. The qualification requirements for a data scientist are shown in detailed versions of the popular three-circle Venn diagram.

Looking at the figure, it is not difficult to see that no educational institution can instill the whole required complex of knowledge in a student. It can be acquired only through years of purposeful independent work.

The expanded Venn diagram as applied to the qualifications of a data scientist

Armed with all this knowledge and these skills, and using a set of general-purpose and specialized tools together with varied, mostly unstructured data sources, the data scientist must give top management (the C-level) meaningful answers about what is happening now and what can be expected in the future. The data scientist takes part in decision making as an equal.

Recently the C-level list has been extended with the positions of Chief Data Officer (CDO), Chief Analytics Officer (CAO), and Chief Data Science Officer (CDSO). Against this background the task of the BI specialist is more modest and more traditional: using primarily historical structured data from data warehouses and well-known corporate analysis tools, to produce reports on what has happened up to the present moment. The BI specialist supplies information to decision makers.

The main difference between the two specialties is that the data scientist must understand what needs to be done and how, whereas the BI expert is able to provide an objective picture of the past up to the present situation.

To define the differences between these two kinds of activity more precisely, let us return to the DIKW model. From this point of view, the essence of a data scientist's work is transforming raw data into knowledge, using various analytical methods combined with personal expertise in a particular domain. The business intelligence expert only transforms data into information accessible to decision makers, in the form of reports and infographics.

These are two fundamentally different specialties. They differ in the technologies used, in the depth of immersion in the subject domain, and, most importantly, in how they convey the results of their work to the client.

With methods of information transfer things are more or less clear: various types of reports are typically used, including text, infographics, and various techniques of modern interactive visualization.

With knowledge transfer the situation is more difficult. As of 2017 one can speak of two possible techniques. One is built on proposing and discussing hypotheses (hypothesis-driven thinking). Reliance on hypotheses, as in the scientific method, justifies the word science in the name data science.

Practically all existing scientific knowledge has historically developed in the same way. Initial hypotheses are put forward; in the course of discussion most of them turn out to be wrong; alternative hypotheses then appear; and eventually objective knowledge crystallizes.

The data scientist works in the same way. The job consists not in working with data as such but in putting forward business hypotheses and selecting the most reliable one. Using the available data, the data scientist must arrive at a well-founded conclusion.
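One simple way to check a business hypothesis against data can be sketched as a permutation test. This is only an illustration: the hypothesis ("the new page layout raises the average order value"), the order values, and the function name are invented for the example, and a permutation test is just one of many possible statistical techniques.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for mean(group_b) > mean(group_a).

    The labels are shuffled many times; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    size_a = len(group_a)
    size_b = len(group_b)
    observed = sum(group_b) / size_b - sum(group_a) / size_a
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[size_a:]) / size_b - sum(pooled[:size_a]) / size_a
        if diff >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical order values under the old and the new page layout.
OLD_LAYOUT = [20.1, 22.4, 19.8, 21.0, 20.5, 23.1, 19.9, 21.7]
NEW_LAYOUT = [24.3, 25.1, 23.8, 26.0, 24.9, 22.7, 25.5, 24.0]

p = permutation_p_value(OLD_LAYOUT, NEW_LAYOUT)
print(f"one-sided p-value: {p:.4f}")
```

A small p-value suggests that the observed difference is unlikely to be an accident of how the orders happened to be split, which supports the hypothesis without proving it. The conclusion still depends on the analyst's judgment and knowledge of the domain.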

To convey knowledge to the customer, a data scientist can also use a second technique, called data storytelling. A story turns out to be the most effective means of transferring knowledge and of moving from knowledge to action.

Principal components of data storytelling: Narrative, Explain, Data, Change, Enlighten, Engage

A story, as the figure shows, combines the principal components of data storytelling. By integrating narrative with data, one can explain how the outside world is reflected in the data, what is happening, and which ideas and judgments are of the greatest value. For an idea to be properly appreciated, it must be placed in a full context and commented on accordingly.

Adding visualization to data improves knowledge transfer: people can see what is inaccessible to them in verbal or tabular form. Combining narrative with graphics produces engagement, roughly the same sense of presence as in the cinema.

The first steps toward this kind of knowledge transfer were taken in popularizing public talks at the TED conference (Technology, Entertainment, Design), which has been held annually since 1990. People come there to talk about serious matters in an accessible form. According to researchers at Stanford, knowledge presented in narrative form is remembered better. A special indicator, memorability, has been introduced. The content of a story is remembered by up to 63% of the audience, whereas the statistics cited are usually remembered by fewer than 5%.

It is no accident that in 2009 Google's chief economist, Hal Varian, said:

The ability to take data, to understand it, to process it, to visualize it, and to communicate it to others is crucial. These skills will be among the most important in the coming decades.

To sum up, BI and data science can be pictured as two poles on a common axis of technologies for working with data. At one pole information is extracted from data; at the other, knowledge. As so often in life, the border between them is blurred.
