2010/04/19 17:57:27

Data mining (intelligent data analysis)

The process of identifying hidden, useful facts and relationships in large data sets. The term literally translates as "data extraction." Over time, the management of many companies runs into the same problem: the more customer data they accumulate, the harder it becomes to analyze consumption and steer the business in the right direction. Intelligent data analysis is a powerful business intelligence tool. A catalog of data mining systems and projects is available on TAdviser.

Categories

Data obtained with data mining tools describes new relationships between properties and predicts the values of some features based on others. The tasks that data mining solves include:

  • Classification - assigning objects to predefined classes.
  • Association - identifying associative chains. This method was first used to analyze the market basket of a typical consumer.
  • Clustering - grouping events and observations into clusters based on the properties that describe the grouped events themselves.
  • Prediction - forecasting, from the available data, how events may develop, whether for better or for worse.
  • Analysis of changes - identifying typical situations and patterns. This includes establishing regularities between events over time, as well as detecting dependencies and causal relationships.
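
As an illustration of the association task, the sketch below counts how often pairs of goods appear together in a market basket and derives simple "if A then B" rules. The transactions, the thresholds, and the function name pair_rules are all invented for this example; real projects usually rely on a full algorithm such as Apriori rather than this pairwise shortcut.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (invented for illustration only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    support    = share of transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


RULES = pair_rules(TRANSACTIONS)
for lhs, rhs, support, confidence in RULES:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Support filters out pairs that are too rare to matter, and confidence measures how reliably the first item implies the second.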

Tasks

Intelligent data analysis can be applied to any business problem that involves diverse information that changes over time, including:

  • Increasing the profitability of a business or enterprise
  • Analyzing customer wants and needs
  • Identifying profitable customers and acquiring new ones
  • Retaining customers and increasing loyalty
  • Increasing return on investment (ROI) and lowering promotion costs for goods and services
  • Selling additional goods and services to existing customers
  • Detecting fraud, misuse, and abuse
  • Assessing credit risk
  • Increasing the throughput of retail outlets and optimizing the distribution of goods to increase sales
  • Monitoring overall business performance

Data mining in the banking sector

Data mining produces results that serve as the basis for various business decisions. To make those decisions better founded and to increase the company's profit, a wide range of information is collected and analyzed. First of all, this is confidential customer data that any company accumulates in the course of its activity (a person's age and marital status, preferences for certain products, frequency of purchases, participation in promotions, and other parameters). By processing historical information about similar customers, the company can assess risks and predict key indicators for potential customers about whom no data is available. Moreover, the influencing factors and the resulting indicators can be linked in both obvious and hidden ways.
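
One simple way to predict an indicator from "similar customers" is a nearest-neighbor vote, sketched below. The source does not say which method banks actually use, so this is only an assumption for illustration; the customer records, the two features, and the name predict_knn are hypothetical.

```python
import math

# Hypothetical historical customers:
# (age, purchases per month) -> whether the customer repaid on time.
HISTORY = [
    ((25, 2), False),
    ((31, 4), True),
    ((42, 6), True),
    ((38, 5), True),
    ((23, 1), False),
    ((29, 2), False),
]


def predict_knn(history, candidate, k=3):
    """Majority vote among the k historical customers closest to candidate.

    Distances are plain Euclidean on unscaled features, which is good
    enough for a toy example; real data would need feature scaling.
    """
    nearest = sorted(history, key=lambda row: math.dist(row[0], candidate))[:k]
    votes = sum(1 for _, repaid in nearest if repaid)
    return votes * 2 > k


PREDICTION_OLDER = predict_knn(HISTORY, (40, 5))
PREDICTION_YOUNGER = predict_knn(HISTORY, (24, 1))
print(PREDICTION_OLDER, PREDICTION_YOUNGER)
```

The new customer has no repayment record of their own; the forecast is borrowed entirely from the three most similar historical records.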

Today, almost all adults have plastic cards tied to a bank account. Many have two such cards: one for salary payments and a second for borrowed funds with a set credit limit. Everyone knows what their maximum limit is, but not everyone understands how it is calculated[1].

In most cases, the maximum loan amount depends on the client's overall credit history. A client can count on a higher credit limit if the conditions set by the bank are met. To do this, the client must:

  • Provide a certificate of income;
  • Have a separate salary account at the bank;
  • Repay borrowed funds regularly and on time.

When issuing a first credit card, many banks rely on two tools: official confirmation of income and credit history. Without an income certificate and a credit history, the borrower can usually count only on a standard minimum credit limit, because in that case the bank cannot predict likely problems. Some banks additionally use data on marital status, length of employment, vehicle ownership, and so on to assess the customer's solvency. Still, these indicators of a borrower's stability do not allow bankers to raise the credit limit without risk. Banks therefore turn to big data analytics as a way of collecting and analyzing information that helps them identify potentially unreliable borrowers.

The concept of credit scoring for bank customers was developed in the middle of the last century by the US software company Fair Isaac Company. A few years ago, experts from the same company proposed a method for assessing treatment adherence, which estimates the probability that a given patient will follow the prescribed course of medication. This line of work is still under development but is already producing good results.

For example, it was found that the probability of following medical prescriptions is higher if the patient has a car and a family and rarely changes his place of residence. Such data allow medical staff to identify, with high probability, the patients who will heed the doctor's recommendations and take the prescribed tests. Naturally, taking medication systematically has no causal relationship with owning a car, but a high correlation in historical data makes it possible to produce accurate forecasts. Likewise, analyzing information with the likelihood of illness or death in mind helps calculate the cost of a patient's insurance (or decide whether to raise a borrower's credit limit).
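
The idea of forecasting from correlation rather than causation can be sketched with a plain Pearson coefficient between two yes/no attributes. The eight patient records below are invented for illustration and are not taken from any study; the helper names pearson and adherence_rate are likewise hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def adherence_rate(owns_car, adhered, car_value):
    """Share of patients who adhered, among those with the given car flag."""
    group = [a for c, a in zip(owns_car, adhered) if c == car_value]
    return sum(group) / len(group)


# Hypothetical records: 1 = owns a car / followed the prescription, 0 = not.
OWNS_CAR = [1, 1, 1, 1, 0, 0, 0, 0]
ADHERED = [1, 1, 1, 0, 0, 0, 1, 0]

R = pearson(OWNS_CAR, ADHERED)
RATE_WITH_CAR = adherence_rate(OWNS_CAR, ADHERED, 1)
RATE_WITHOUT_CAR = adherence_rate(OWNS_CAR, ADHERED, 0)
print(f"r={R:.2f}, with car={RATE_WITH_CAR:.2f}, without={RATE_WITHOUT_CAR:.2f}")
```

In this toy data, car owners adhere far more often than non-owners, so car ownership is useful as a predictor even though it explains nothing about why patients take their medication.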

Open data, such as user accounts on large social networks, is of great importance for data mining projects. This is fully justified: last year, for example, Facebook had more than 850 million active users (roughly a tenth of the world's population), who had formed more than 100 billion connections. As a result, analyzing information from the largest social networking sites makes it possible to obtain almost any data.

To assess the reliability of potential borrowers, Fair Isaac Company uses fifteen variables from Facebook. One unnamed startup predicts the likelihood that a borrower will repay a loan based on how his friends behave in various situations. This analysis is based on up-to-date data and is carried out online, so that a bank specialist can use the results when deciding whether to raise the credit limit.

Twitter is also of great interest as a source of data. Gnip and DataSift, which work with Twitter, have access to information about 100 million people who send about 250 million tweets (short messages, often unrelated to one another). Although anyone can read tweets, only specialized companies are able to systematize this information and sell it in aggregate. They can collect and analyze data comprehensively, summarize consumer feedback on goods or services, and realistically assess the effectiveness of a particular advertising campaign. But there is another side to the coin: confidentiality. Almost all information that citizens hand over to commercial or non-profit organizations is protected by laws, regulations, and contractual obligations. That is why it is very difficult, and most often impossible, to obtain additional data about a particular person legally.

At the same time, information on social networks has no such protection. By analyzing a person's profile with certain algorithms, one can obtain a forecast that the person will fail to repay debts, lose their health, or even commit a crime. The most likely outcome in that case is a denial of a loan or of health insurance. In effect, the person receives a negative assessment for actions they have not committed, which violates the presumption of innocence. This raises a question: which is more important, focusing on the client or protecting oneself from possible risk?

There is no unambiguous answer to this question. What can be said for certain is that the era of big data calls for effective legal norms that make it possible to process and protect information on fully legal grounds. A similar situation arose in the past when, with the spread of printing presses, laws had to be adopted to limit the freedom of the press (before the mass appearance of newspapers and magazines, the problem simply did not exist).

Differences between Process Mining and Data Mining

  • Data mining is primarily used to find hierarchical dependencies in large amounts of data: for example, which customer categories buy which product categories through which channels, and how often.
  • Its input is tables with heterogeneous data from different domains.
  • It uses multidimensional representations (cubes) with the ability to change the level of detail (different levels of aggregation) of the information.

  • Process mining focuses not on the semantic relationships in the data but on representing the data as processes.
  • Its input is transaction data for accounting objects, typically tasks, orders, requisitions, work orders, and so on. Examples of transactional data are event logs, audit trails, and events and object states (whether a change of object status or a change of business unit).
  • It uses data sampling methods to build a process model from the most representative scenarios in the process. Process mining looks for more than just links between data: its task is to determine the links between process steps, deviations from the normal process, the factors behind those deviations, process efficiency, process scenarios, and bottlenecks in the process.
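
To make the process mining side more concrete, the sketch below builds a "directly-follows" graph from an event log: for every case it counts which step immediately follows which. This is one common first step in process discovery, not necessarily what any particular product does; the event log, its field layout, and the name directly_follows are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, timestamp).
EVENT_LOG = [
    ("order-1", "create", 1), ("order-1", "approve", 2), ("order-1", "ship", 3),
    ("order-2", "create", 1), ("order-2", "approve", 3), ("order-2", "rework", 4),
    ("order-2", "approve", 6), ("order-2", "ship", 7),
    ("order-3", "create", 2), ("order-3", "approve", 4), ("order-3", "ship", 5),
]


def directly_follows(event_log):
    """Count how often one activity is immediately followed by another.

    Events are grouped into one trace per case and ordered by timestamp;
    every adjacent pair of activities in a trace becomes an edge.
    """
    traces = defaultdict(list)
    for case_id, activity, _ in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    edges = Counter()
    for activities in traces.values():
        edges.update(zip(activities, activities[1:]))
    return edges


DFG = directly_follows(EVENT_LOG)
for (source, target), count in sorted(DFG.items()):
    print(f"{source} -> {target}: {count}")
```

Frequent edges outline the normal process, while rare ones, such as the rework loop in this toy log, point to deviations worth investigating.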

Players and Solutions

A catalog of data mining systems and projects is available on TAdviser.

Notes