CST: Cluster analysis technology for speech data arrays

Product
Developers: MDG Innovations
System release date: 2016
Technology: Call centers, Speech technologies

MDG Innovations, together with the Ministry of Education of the Russian Federation, has developed a cluster analysis technology: automatic structuring and interpretation of large arrays of speech data. The technology is built on the basic principles of working with Big Data and implements its algorithms with some of the most successful modern machine learning methods.

The technology is intended for large contact centers and support services that collect a large volume of "client-operator" telephone recordings every day and often need to learn the structure, composition, and contents of a new database that is unfamiliar to the analyst.

The resulting information can be used to identify the most frequent reasons why subscribers contact the contact center, to detect relationships between these requests, to determine the size of each cluster of such requests, to move the handling of some request types to automated service (IVR), and so on.

"The advantages of the technology we have developed are the ability to adapt the algorithm automatically to a new data domain (the system is trained on the target sample without involving an expert analyst, which makes it more economical and more productive) and specially developed data pre-processing algorithms that select the most informative semantic centers (so-called 'patterns') of client-operator dialogs and exclude uninformative ('garbage') dialog fragments from consideration, which considerably increases reliability and effectiveness," comments Kirill Lyovin, R&D director of the CST Group.
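The pre-processing that removes uninformative dialog fragments is not described further. As a minimal sketch, assuming a dialog has already been transcribed and split into utterance fragments, one simple approach drops fragments made up mostly of greetings, fillers, and stop words. The stop-word list, the function names, and the `min_content_ratio` threshold below are hypothetical and only illustrate the idea; this is not MDG Innovations' algorithm.

```python
import re

# Hypothetical filler/stop-word list; a real system would use a
# language-specific list tuned on the target call-center data.
STOP_WORDS = {
    "hello", "hi", "yes", "no", "ok", "okay", "uh", "um", "well",
    "so", "the", "a", "an", "and", "is", "it", "i", "you", "to",
    "please", "thanks", "thank", "wait", "one", "moment", "bye",
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_ratio(fragment):
    """Share of tokens in a fragment that are not stop or filler words."""
    tokens = tokenize(fragment)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in STOP_WORDS]
    return len(content) / len(tokens)


def filter_fragments(fragments, min_content_ratio=0.5):
    """Keep only fragments whose content-word share reaches the threshold."""
    return [f for f in fragments if content_ratio(f) >= min_content_ratio]


if __name__ == "__main__":
    dialog = [
        "Hello, hi.",
        "I cannot connect to the internet since yesterday.",
        "Uh, wait one moment please.",
        "Your router firmware needs an update.",
        "Okay, thanks, bye.",
    ]
    for fragment in filter_fragments(dialog):
        print(fragment)
```

A fixed word list is the crudest option; a data-driven score (for example, inverse document frequency computed on the target sample) would fit the "no expert analyst" goal better.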

The cluster analysis technology is part of the universal methodology offered by MDG Innovations, which is based on the following successive principles of information extraction (information retrieval) from unstructured arrays of speech data and their intelligent analysis (data mining):

  • Cluster analysis ("clustering") of speech data: splitting an array of unstructured data into clusters united by a common criterion (topic). The resulting clusters form a labeled hierarchical (tree-like) structure, which addresses real problems faced by contact center analysts, because this view fully reflects the connectivity, nesting, and relative volume of the data in different clusters (topics). The algorithm is based on unsupervised machine learning, using the k-means and LDA algorithms at each step of the hierarchical clustering.
  • Search and filtering of "statistical outliers" or "anomalies", i.e. recordings of conversations that are atypical for the sample by some criterion (for example, a personal conversation with relatives or acquaintances among recordings of business calls). The algorithm is based on unsupervised machine learning, using the one-class SVM method.
  • Identification of the most significant words and phrases, followed by compiling text summaries that capture the informative content of the speech.
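The text names k-means and LDA applied at each step of a hierarchical clustering but does not say how they are combined. The sketch below shows only the k-means part, assuming each recording is already transcribed and turned into a bag-of-words vector: the whole collection is split with k-means, and each resulting cluster is split again until it is too small or a depth limit is reached. The function names, the deterministic farthest-point initialization, and the `min_size`/`max_depth` parameters are hypothetical choices for illustration; the LDA topic step is omitted.

```python
from collections import Counter
import math


def vectorize(docs):
    """Turn each transcript into an L2-normalized bag-of-words vector."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hierarchical_kmeans(ids, vectors, k=2, min_size=2, max_depth=3, depth=0):
    """Recursively split recordings into a tree of clusters.

    Each node stores its size, i.e. the relative volume of that topic.
    """
    node = {"size": len(ids), "ids": ids, "children": []}
    if depth >= max_depth or len(ids) <= max(min_size, k):
        return node
    labels = kmeans([vectors[i] for i in ids], k)
    groups = [[i for i, lab in zip(ids, labels) if lab == j] for j in range(k)]
    groups = [g for g in groups if g]
    if len(groups) < 2:
        return node  # k-means could not separate this cluster any further
    node["children"] = [
        hierarchical_kmeans(g, vectors, k, min_size, max_depth, depth + 1)
        for g in groups
    ]
    return node


def print_tree(node, indent=0):
    """Print the cluster tree with the size of every cluster."""
    print("  " * indent + f"{node['size']} recordings: {node['ids']}")
    for child in node["children"]:
        print_tree(child, indent + 1)


if __name__ == "__main__":
    docs = [
        "internet not working router",
        "router internet slow",
        "internet connection router down",
        "bill payment card",
        "payment bill overdue",
        "card payment failed bill",
    ]
    print_tree(hierarchical_kmeans(list(range(len(docs))), vectorize(docs)))
```

Splitting top-down with a small `k` produces exactly the nested view the bullet describes: the root splits into broad topics (here, connection problems versus billing), and each node's `size` gives that topic's share of the calls.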
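The anomaly search above is described as a one-class SVM. To stay dependency-free, the sketch below deliberately swaps in a much simpler unsupervised detector: each recording is scored by one minus its cosine similarity to all the other recordings combined, so a conversation that shares almost no vocabulary with the rest of the sample scores close to 1. The function names and the `threshold` value are hypothetical. Where scikit-learn is available, `sklearn.svm.OneClassSVM` would be the closer match to the method named in the text.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Word counts of a transcript."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(count * b[w] for w, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def anomaly_scores(docs):
    """1 - cosine similarity between each recording and all the others combined."""
    bags = [bag_of_words(d) for d in docs]
    total = sum(bags, Counter())
    # total - bag leaves the word counts of every *other* recording.
    return [1.0 - cosine(bag, total - bag) for bag in bags]


def find_anomalies(docs, threshold=0.9):
    """Indices of recordings whose score reaches the (hypothetical) threshold."""
    return [i for i, s in enumerate(anomaly_scores(docs)) if s >= threshold]


if __name__ == "__main__":
    calls = [
        "internet router not working",
        "router internet connection slow",
        "internet down router reboot",
        "internet router password reset",
        "grandma birthday dinner sunday",
    ]
    print(find_anomalies(calls))
```

Unlike a one-class SVM, this score has no learned boundary and no `nu`-style control over the expected share of outliers; it only ranks recordings by how far they sit from the bulk of the sample.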
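The text does not say how the most significant words are identified. A common baseline is TF-IDF: a word scores high when it is frequent in one document but rare across the collection, so words that appear in every call (greetings, politeness formulas) score zero. The sketch assumes the transcripts of each cluster have been concatenated into one document, so the top words can also serve as a rough cluster label; the function name and the example data are hypothetical, and the summary-generation step is not shown.

```python
from collections import Counter
import math


def top_keywords(docs, target, n=3):
    """Return the n highest TF-IDF words of docs[target] within the collection."""
    tokenized = [d.lower().split() for d in docs]
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for toks in tokenized for w in set(toks))
    n_docs = len(docs)
    tf = Counter(tokenized[target])
    total = sum(tf.values())
    scores = {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n]


if __name__ == "__main__":
    clusters = [
        "thank you the internet is slow internet router",
        "thank you the bill is wrong bill payment",
        "thank you the card is blocked card stolen",
    ]
    for i in range(len(clusters)):
        print(i, top_keywords(clusters, i))
```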