
Data processing in deep neural networks: achievements and challenges of the current moment

Of course, today's AI mechanisms cannot "think" and make decisions at the human level. Yet along this path AI can achieve impressive results by doing what a person cannot, for example, processing huge volumes of data in almost real time. It is these capabilities that became the basis for the powerful development of machine learning in its currently most popular form: deep neural networks (DNN). The article is included in the TAdviser review "Artificial Intelligence Technologies".

"A DNN finds the correct mathematical transformation that turns input data into output data, regardless of whether the correlation is linear or nonlinear. The network moves through the layers, calculating the probability of each output. For example, a DNN trained to recognize dog breeds will go through a given image and calculate the probability that the dog in the image belongs to a certain breed," explains Andrei Ostroukh in his monograph "Intelligent Systems" (Krasnoyarsk, Scientific Innovation Center, 2020).

From the data processing point of view, the following happens: the stream of information to be processed (recognized) enters the input layer, passes through the inner layers, and the results of processing are output through the output layer of artificial neurons. In the inner layers, connections are established between the input and output signals of the neural network. The variability of these connections is ensured by the difference in the sensitivity thresholds of the input and output layers, which are set and corrected during training of the network.

Source: habr.com/ru/post/456186/
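
To make the forward pass described above concrete, here is a minimal sketch, assuming plain NumPy, of how an input vector flows through two hidden layers with a nonlinear activation to produce output probabilities. The layer sizes, random weights and softmax output are illustrative assumptions, not a description of any particular production network.

```python
import numpy as np

def relu(x):
    # Nonlinear activation applied inside the hidden (inner) layers
    return np.maximum(0.0, x)

def softmax(x):
    # Turn raw output-layer scores into probabilities of each output
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Illustrative sizes: 8 input features, two hidden layers, 3 output classes
W1, b1 = rng.normal(size=(16, 8)),  np.zeros(16)
W2, b2 = rng.normal(size=(16, 16)), np.zeros(16)
W3, b3 = rng.normal(size=(3, 16)),  np.zeros(3)

x = rng.normal(size=8)          # input layer: the signal to be recognized
h1 = relu(W1 @ x + b1)          # first inner layer
h2 = relu(W2 @ h1 + b2)         # second inner layer
probs = softmax(W3 @ h2 + b3)   # output layer: probability of each class

print(probs)
```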
"The extra layers allow features to be composed from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network."

The composition of specific nonlinear layers depends on the problem being solved. Both hidden layers of the neural network and layers of complex logical transformations are used for this.

Deep learning today is therefore a set of machine learning algorithms for modeling high-level abstractions using numerous nonlinear transformations. It is with these transformations that researchers experiment, choosing the functions best suited to specific tasks.

What else can be "squeezed" out of DNNs and their data?

Automation of DNN training

"A fairly representative training sample is prepared in advance: a set of pairs of input and output signals. The input data of the training sample are then fed into the network one by one to obtain the network's outputs, which are compared with the outputs of the training sample. If they match, the network is considered trained and no adjustment of connections within the network is made. Otherwise, these connections are corrected and the learning process is repeated until the required accuracy of matching the network output to the training sample output is achieved."

The main feature of such artificial neural networks is that they rely on examples (precedents), i.e. samples of acceptable execution of the target function. When designing them, there is no need to formalize the process of solving the problem. It is enough to prepare a sufficiently representative sample of training examples and train the system on it.

Source: Gafarov F.M., Galimyanov A.F. Artificial Neural Networks and Their Applications: A Study Guide. Kazan: Kazan University Press, 2018. 121 p.
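
The compare-and-correct procedure described in the quote above is, in essence, a supervised training loop. Below is a minimal sketch, assuming NumPy and a toy one-layer regression model; the synthetic data, learning rate and stopping threshold are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training sample: pairs of input and desired output signals
X = rng.normal(size=(200, 4))
true_w = np.array([0.5, -1.0, 2.0, 0.3])
y = X @ true_w + 0.01 * rng.normal(size=200)

w = np.zeros(4)           # connections to be corrected during training
lr = 0.1                  # step size for corrections
required_accuracy = 1e-3  # stop when outputs match the sample closely enough

for epoch in range(1000):
    pred = X @ w                      # network output for the training inputs
    error = pred - y                  # compare with the training sample outputs
    mse = float(np.mean(error ** 2))
    if mse < required_accuracy:       # outputs match: consider the network trained
        break
    w -= lr * (X.T @ error) / len(y)  # otherwise correct the connections and repeat

print(epoch, mse, w)
```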
"The development of technologies for training neural networks on new tasks is a separate and extremely important problem, and it is obvious that without such technologies there is no artificial intelligence," emphasizes Dmitry Nikolaev, Ph.D., technical director of Smart Engines.

"If we are able to synthesize data quickly, then when we receive a new task we can synthesize data in advance, even if we have not seen a single such example in real life," says Dmitry Nikolaev. "And if our modeling system produces sufficiently accurate, realistic data, we will train AI to solve problems even before they arise."
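
A minimal sketch of what such advance data synthesis might look like, assuming a toy document-like image generator in NumPy; the geometric encoding, noise and brightness distortions below are illustrative assumptions and not the Smart Engines pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

def synthesize_example(height=32, width=96):
    """Generate one synthetic labeled image before any real task data exist."""
    label = rng.integers(0, 10)            # e.g. a digit we want to recognize later
    img = np.zeros((height, width))
    # Crude vertical bar whose position encodes the label (stand-in for rendering a glyph)
    col = 4 + label * 8
    img[8:24, col:col + 4] = 1.0
    # Distortions expected in real captures: noise and a brightness shift
    img += 0.1 * rng.normal(size=img.shape)
    img += rng.uniform(-0.2, 0.2)
    return img, label

# Synthesize a training set up front, before the "real" task data arrive
dataset = [synthesize_example() for _ in range(1000)]
print(len(dataset), dataset[0][0].shape, dataset[0][1])
```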

Image synthesis

Experts call one of the most important AI breakthroughs of 2021 the creation by OpenAI of the DALL·E neural network, which generates images from a text description in natural language.

DALL·E is a version of GPT-3 (the third generation of the natural language processing algorithm) trained to generate images from text descriptions on a dataset of text-image pairs. OpenAI has created several variants, from 125 million to 175 billion parameters. The results of the new program impressed the public with how creative it is in producing new images.

The result of DALL·E's work for the prompt "snail harp"

Source: habr.com/ru/post/536236/

Despite the word "Open," the GPT-3 model is closed-source software, access to which must be purchased. But the SberDevices team, relying on OpenAI's article about the development and on the GPT-2 model code, by the fall of 2021 had developed a Russian-language analogue of DALL·E called ruGPT-3 (ruDALL-E Kandinsky) with, indeed, open source code. It uses the power of the Christofari supercomputer and comes in five variants, from 125 million to 13 billion parameters (the top-level model is not open).

"The era of the 'great unification' of language models, computer vision and generative networks is coming. What we see now is already astonishing in its results, not to mention how much such approaches can change the process of generating content," exclaims Mikhail Konstantinov (@Dirac) on habr.com/ru/.

"Figuratively speaking, today's AI is a child in 'short pants' who is only learning to take its first steps. The child is certainly smart and promising, but it is still completely premature to entrust it with serious 'adult' affairs."

"For example, Alice and other voice assistants are called artificial intelligence. But everyone who has communicated with them knows: ask a more or less non-standard question and you get a surprisingly illogical answer. When we talk with a smart speaker for fun, such answers do not affect our lives in any way, except to cheer us up. But if a doctor relies on incorrect AI advice, the consequences for the patient can be catastrophic."

"The faces that AI paints after studying photos of many people look as if they are alive. 'As if' is the key phrase here. In fact, by some sixth sense we immediately recognize that this is a model of a face, not an image of the face of a real person. An eerily believable imitation that has nothing to do with reality. Portraits, sculptures and photographs probably convey the image of a real person imperfectly. But there is humanity in those imperfections."

Researchers will no doubt keep figuring out how to give the generated entities even more naturalness. Even without this, however, modern neural networks have accumulated a sufficient load of problems that require prompt solutions.

Problem areas of modern DNNs

In the ML/DL community, a strong belief has taken hold that the larger the model, the better the results it produces. A striking example is scoring models, with which banks assess the solvency of potential borrowers. For example, Mobile Scoring, which offers scoring services, uses 3.5 thousand borrower parameters, according to its CEO Vitaly Shchipkov.

"The main thing that lets AI achieve excellent results in scoring is the amount of disparate data collected and analyzed by different components of the systems," says Ivan Barchuk, Director of the Data Collection, Storage and Analysis Department of VS Lab. "The more data, the more sophisticated combinations of it AI can build, the more patterns and connections it can identify, and the more of the necessary information about the client it can learn."

We are talking about information on the client's employer, taken both from commercial databases (SPARK, Contour Focus, the Unified State Register of Legal Entities) and from the media and social networks. From the databases comes information about the state of the company, its activity in public procurement, and publicly known data on the company's turnover and headcount. From the media and social networks come rumors about the deteriorating or improving situation of the company and information about scandals related to the director, owner or founders. The bank's Data Lake also receives information from social networks about the composition of a person's friends, which of them are already bank clients, confirmation of family ties, the groups a person belongs to, etc.

"OSINT systems are offered on the market, for example IQPLATFORM, which provide this kind of information in labeled form. Corporate AI algorithms only have to pick it up from the OSINT system and feed it into processing," notes Ivan Barchuk.

Another source is information about customers from the banking ecosystem: grocery delivery, taxi orders, online purchases, hobbies, family composition; all of this data is used for analysis. Data from exchanges that sell cookies can also serve as a source, since they collect all of a bank client's interests, search queries and much more. With this data, AI can, for example, find out that a borrower who has decided to take out a loan is a regular client of an online casino.
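
A minimal sketch of how such disparate sources might be joined into a single feature table for a scoring model, assuming pandas and entirely made-up column names and sources; it illustrates the aggregation step, not any bank's actual pipeline.

```python
import pandas as pd

# Hypothetical extracts from different sources, keyed by client id
registry = pd.DataFrame({"client_id": [1, 2], "employer_revenue": [1.2e8, 3.5e7]})
media = pd.DataFrame({"client_id": [1, 2], "negative_mentions": [0, 4]})
ecosystem = pd.DataFrame({"client_id": [1, 2], "taxi_orders_per_month": [12, 2],
                          "online_casino_visits": [0, 30]})

# Join everything into one feature table the scoring model can consume
features = (registry
            .merge(media, on="client_id", how="left")
            .merge(ecosystem, on="client_id", how="left")
            .fillna(0))

print(features)
```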

• Big data challenges. Vladimir Kozlov, an expert in risk assessment for the financial sector, points out that the volume of openly available data on people makes it possible to form almost any sample of clients by any property or non-property attribute. From the information security point of view, people literally turn out to be "undressed." Perhaps it is worth raising the question of closing off this information, the expert asks. In addition, large amounts of data on people bring a lot of headaches.

For example, the development and testing of machine learning models is usually carried out on real data, for which data science analysts need access to as much data as possible, and this set cannot be clearly defined in advance. This means that customers need to create specialized industrial environments with a special information security regime to reduce the risk of data breaches.

In addition, a large amount of data is simply expensive in terms of the cost of IT infrastructure for storing and processing it. That is why Lydia Khramova, lead data scientist at QIWI, laments data scientists' passion for huge feature sets: "It is better to use 50-70 carefully selected features than to control the risks of degradation of five thousand."
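
A minimal sketch of such feature pruning, assuming scikit-learn and synthetic data; the choice of a univariate filter and of 50 retained features is an illustrative assumption, not QIWI's actual methodology.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic scoring-like data: thousands of candidate features, few of them informative
X, y = make_classification(n_samples=2000, n_features=5000,
                           n_informative=40, random_state=0)

# Keep only the 50 features most strongly associated with the target
selector = SelectKBest(score_func=f_classif, k=50)
X_small = selector.fit_transform(X, y)

print(X.shape, "->", X_small.shape)  # (2000, 5000) -> (2000, 50)
```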

• Data quality issues. Perhaps the key problem of ML models today is the decline in model quality, which can sometimes show up literally a few days after a model is put into operation. Various factors contribute to model degradation. For example, minor changes in the structure and statistical properties of the data, which are hard to notice or whose impact is hard to assess within the traditional approach to analytics, can significantly affect the quality of machine learning models.
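
A minimal sketch of catching such shifts in the statistical properties of incoming data, assuming SciPy and a simple two-sample Kolmogorov-Smirnov check per feature; the alert threshold and window sizes are illustrative assumptions, not a complete monitoring system.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

# Reference window: data the model was trained on
train_window = rng.normal(loc=0.0, scale=1.0, size=(5000, 3))
# Live window: the same features a few days later, one of them has quietly drifted
live_window = np.column_stack([
    rng.normal(0.0, 1.0, 5000),
    rng.normal(0.3, 1.0, 5000),   # small shift in the mean of feature 1
    rng.normal(0.0, 1.0, 5000),
])

for i in range(train_window.shape[1]):
    stat, p_value = ks_2samp(train_window[:, i], live_window[:, i])
    drifted = p_value < 0.01      # illustrative alert threshold
    print(f"feature {i}: KS={stat:.3f}, p={p_value:.3g}, drift={drifted}")
```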

Therefore, the model parameters must be constantly "tuned," and this is done by trial and error. Yury Sirota, Senior Vice President and Chief Analytics Officer of URALSIB Bank, notes that it is important to focus not on mathematical metrics, even though the work is done with mathematical models, but on business metrics. For example, an excessive emphasis on short-term patterns in the data can damage the accuracy of the model in the future. Therefore, decision-making models that directly affect the company's financial results should be updated more often and more quickly. Today, however, such work is akin to art.

"In a word, AI already helps in elementary situations with a simple and clear scenario, but where you need to look at the situation from different angles, it is still powerless. Therefore, there are few cases of really useful application. Moreover, there are examples of large-scale failures," says Valery Andreev.

"Artificial intelligence, as a rule, makes decisions in large systems. A small error that creeps into the algorithm can lead to very big consequences. In our practice, we lost a lot of money on this: because the machine made a small mistake on large volumes, we lost billions of rubles."

ML Model Lifecycle

"Organizations should consider using model operationalization (ModelOps) to operate AI solutions. ModelOps reduces the time it takes to move AI models from pilot project to production with a principled approach that helps deliver a high degree of success. The ModelOps approach also offers a system for governing and managing the life cycle of all AI models (graph, linguistic, rule-based systems and others) and decision models."

We are talking about the pitfalls and routine associated with managing the life cycle of ML models. First, the model must be updated in a timely manner. One has to constantly look for new data that can be useful for the operation of the models and keep them relevant, given that the parameters of the situation under study can change over time. At the same time, the standard approach of IT specialists, automatic retraining on a schedule, does not work here, since a model's loss of its predictive properties is practically impossible to formalize. This is an area akin to art, where the personal talent of data scientists and business analysts plays a large role.
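
A minimal sketch of the alternative to blind retraining on a schedule: retrain only when a monitored quality metric falls below an agreed level. The model, metric and threshold below are illustrative assumptions in scikit-learn, not a recommendation for any specific system.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, y_train, X_live, y_live = X[:2000], y[:2000], X[2000:], y[2000:]

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

QUALITY_FLOOR = 0.85  # agreed minimum AUC before a retrain is triggered

def check_and_maybe_retrain(model, X_new, y_new):
    """Retrain only when the live metric sags, instead of on a fixed schedule."""
    auc = roc_auc_score(y_new, model.predict_proba(X_new)[:, 1])
    if auc < QUALITY_FLOOR:
        print(f"AUC {auc:.3f} below floor, retraining on fresh data")
        return LogisticRegression(max_iter=1000).fit(X_new, y_new)
    print(f"AUC {auc:.3f} still acceptable, keeping the current model")
    return model

model = check_and_maybe_retrain(model, X_live, y_live)
```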

In addition, the organization must clearly formulate quality criteria for ML models not only in terms of mathematical methods, but also from a business perspective. Traditionally, Vladimir Kozlov notes, scoring, for example, is treated above all as a struggle with the source data. This is because the data is usually handled by the IT department, while the business uses the results of the models. Linking model quality to business metrics is an area the corporate sector has yet to learn to work with.

How to cope with model risks

To reduce model risks, VTB has built an internal control system of three lines of defense. On the first line, the quality of the models is handled by the model developers themselves. On the second, an independent model risk validation and management division independently checks the developed AI models. On the third line, a separate internal audit unit monitors compliance with internal standards by employees of the first two lines.

Pavel Nikolaev, Managing Director of the Integrated Risk Department of Otkritie Bank, compares the set of different ML systems in a bank to a patchwork quilt: different models created with various tools, and different groups of data scientists working at the same time. This state of affairs creates a specific risk: teams can interfere with each other's data, breaking necessary connections.

The solution was the implementation of Neoflex's IRIS enterprise-scale ML modeling platform. According to Lina Chudnova, head of the Fast Data and DevOps business area at Neoflex, the idea of Continuous Integration/Continuous Delivery/Continuous Training of models was implemented on the basis of a common model repository and a single platform for all business units of the bank. This approach allows all credit pipelines to be integrated with the platform while ensuring the right level of decentralization of work with models: each data scientist team has at its disposal a piece of the common space for working with intelligent models, that is, its own environment with dynamic expansion of resources for training models.

Research company Mediascope, together with Neoflex, has put a Data Science platform for developing and implementing machine learning models into commercial operation. According to the company, Mediascope has received a scalable and manageable space for developing ML models, which allows internal teams of data scientists to be connected quickly and the results of their work to be evaluated. The company will also be able to attract external ML teams quickly and with minimal labor costs. A centralized catalog of ready-made pipelines is available to all specialists, which makes it possible to reuse ready-made components.

At the same time, the platform architecture provides an automated process for developing and implementing models and transferring them to the production environment, and also offers tools for visualizing experiment metrics. The platform is based on the open-source Kubeflow, which provides centralized tools for developing ML models and pipelines and for managing artifacts. Argo Workflows, a mature workflow orchestrator on Kubernetes that is part of Kubeflow, is also used; it facilitates the process of putting the developed models to use.
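
For context, here is a minimal sketch of what a model pipeline on Kubeflow might look like, assuming the Kubeflow Pipelines v2 Python SDK (kfp); the component names, base image and toy logic are illustrative assumptions, not the actual Neoflex or Mediascope pipelines.

```python
from kfp import dsl, compiler

@dsl.component(base_image="python:3.10")
def train(samples: int) -> float:
    # Stand-in for a real training step; returns a quality metric
    import random
    random.seed(samples)
    return round(random.uniform(0.7, 0.95), 3)

@dsl.component(base_image="python:3.10")
def validate(auc: float) -> bool:
    # Independent check before the model is promoted to production
    return auc >= 0.8

@dsl.pipeline(name="demo-scoring-pipeline")
def scoring_pipeline(samples: int = 1000):
    train_task = train(samples=samples)
    validate(auc=train_task.output)

# Compile to a YAML definition that Kubeflow/Argo can orchestrate
compiler.Compiler().compile(scoring_pipeline, "scoring_pipeline.yaml")
```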

At VTB, the correctness of AI models is tracked by a special automated model monitoring system: internal accounting of developed models is kept in the model management system, and the rules and the process of interaction between departments are governed by an approved model life cycle standard.

The high level of computing resource consumption by ML-based solutions inevitably leads to the idea of cloud services. Thus, Neoflex's solution for fast processing of ML models can be deployed equally easily both on a bank's local infrastructure and in the cloud: AWS, Yandex Cloud, Mail Cloud Solutions.

Sberbank and SberCloud have opened their supercomputer cloud to all developers: the ML Space cloud platform, based on the computing power of the Christofari supercomputer with more than a thousand GPUs, is intended for developers of AI services of any scale. The ML Space platform covers the full cycle of developing application solutions based on machine learning and the collaboration of data teams in creating and deploying machine learning models. The ML Space architecture is made up of integrated service modules, each of which is designed to solve certain problems: storage, analysis, access to and life cycle control of data, datasets, models, Docker containers and more.

Decentralized machine learning

The growing size of machine learning models creates a new problem area: only a few very large technology companies are able to create and support ML on large models.

In order to make machine learning more accessible and democratic, AI researchers at Microsoft have published the open source code of a decentralized and collaborative AI project on the blockchain. Thanks to transparent accounting and organic cooperation at all stages of the machine learning life cycle, Microsoft says, it is possible to greatly simplify control over new versions of a model, including correlating specific changes with particular performance. The solution is called Decentralized & Collaborative AI on the Blockchain (DCAI).

DCAI is a framework for hosting and training machine learning models on blockchain infrastructure. The current version of DCAI is based on Ethereum and uses smart contracts to implement the training mechanisms of machine learning models. From a functional point of view, DCAI structures the process of adding data and training a machine learning model around three main components (a conceptual sketch follows below):

  • Incentive mechanism. This component is meant to encourage the contribution of high-quality data.
  • DataHandler. The component stores data and metadata on the blockchain.
  • Model. The component contains a specific machine learning model that is updated according to predetermined training algorithms.

How DCAI structures the process of adding data and training a machine learning model

Source: Microsoft, Habr
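
A purely conceptual sketch of how the three components above fit together, written as plain Python classes for readability; it is not DCAI's actual Solidity code, and the names and simplistic logic are illustrative assumptions.

```python
class IncentiveMechanism:
    """Encourages contribution of high-quality data, e.g. via deposits and refunds."""
    def accept(self, contributor: str, deposit: float) -> bool:
        return deposit > 0  # toy rule: any positive stake is accepted

class DataHandler:
    """Stores contributed data and metadata (on-chain in the real system)."""
    def __init__(self):
        self.records = []
    def add(self, contributor: str, features, label):
        self.records.append((contributor, features, label))

class Model:
    """Holds the shared model updated by a predetermined training algorithm."""
    def __init__(self):
        self.weight = 0.0
    def update(self, features, label):
        # Toy online update: nudge the weight toward the new example
        self.weight += 0.01 * (label - self.weight * features)
    def predict(self, features):
        return self.weight * features

# Adding data = passing the incentive check, recording it, and updating the model
incentive, data, model = IncentiveMechanism(), DataHandler(), Model()
if incentive.accept("alice", deposit=1.0):
    data.add("alice", features=2.0, label=1.0)
    model.update(features=2.0, label=1.0)
print(model.predict(2.0))
```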

DCAI is not the only relevant initiative in the field of decentralized AI/ML. Several more similar initiatives are being implemented in the world:

  • SingularityNet. The platform's developer is a company known for creating software for the popular Sophia robot. Built on the Ethereum blockchain, SingularityNet provides a model in which various network members are motivated to offer or use AI services.
  • Ocean Protocol. It offers a decentralized network of data providers and consumers on which AI applications can be implemented and used. Ocean implements many traditional infrastructure elements of AI applications, such as storage, compute and algorithms, through a tokenized service layer using the main components of decentralized AI programs.
  • Erasure. Created by the innovative hedge fund Numerai, Erasure offers a decentralized protocol for creating and running predictive models. Erasure's goal is to provide a decentralized platform where data specialists can upload forecasts based on available data, stake cryptocurrency tokens on them and receive rewards based on the accuracy of the forecasts.
  • OpenMined. One of the most active projects in the decentralized AI market. This is an ecosystem of tools and frameworks for implementing decentralized AI applications. OpenMined managed to form a very active developer community and ensure coordinated integration with mainstream machine learning technologies.
