Kryptonite: A combined technique for re-recognizing people

Product
Developers: Kryptonite NPC
System premiere date: 2022/08/15
Technology: Information Security - Biometric Identification

2022: Announcement of a combined methodology for re-recognition of people

On August 15, 2022, the company Kryptonite announced that Nikita Gabdullin, an employee of its Department of Advanced Studies, had proposed an original approach to the problem of person re-identification (re-id) that combines analytical methods with deep learning (DL). The approach improves recognition of people the model has not seen before, that is, it helps the model generalize to other data.

Kryptonite has proposed a combined method for re-recognizing people. Photo: speechpro.ru.

According to the company, person re-identification is an AI task in which a neural network determines whether a given image of a person matches one of the people recognized earlier. To do this, a set of parameters is created that should characterize a particular person as accurately as possible. Unlike facial recognition, re-identification considers the person as a whole (full-length); the face is only one of the elements and plays a secondary role. Moreover, the method works even when the person's face is not visible at all. In this sense, re-identification and facial recognition complement each other.
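The matching decision described above can be illustrated with a minimal sketch. It assumes each person is already represented by a numeric parameter vector and that a match is declared when cosine similarity exceeds a threshold. The function names, the vectors, and the threshold value of 0.9 are illustrative assumptions, not details from the company's announcement.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two parameter vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reidentify(descriptor: List[float],
               known: Dict[str, List[float]],
               threshold: float = 0.9) -> Optional[str]:
    """Return the id of the best-matching known person, or None if nobody is similar enough."""
    best_id, best_score = None, threshold
    for person_id, reference in known.items():
        score = cosine_similarity(descriptor, reference)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id


# Hypothetical parameter vectors of two people seen earlier.
known_people = {
    "visitor_1": [1.0, 0.0, 0.2],
    "visitor_2": [0.0, 1.0, 0.5],
}

seen_before = reidentify([0.9, 0.1, 0.2], known_people)  # close to visitor_1
newcomer = reidentify([0.0, 0.0, 1.0], known_people)     # close to nobody
```

Returning None for a descriptor that is not close to anyone reflects the case of a person who has not been recognized earlier.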

In machine vision and video analytics systems, the input data for re-identification are scenes with people captured from different angles, at different recording quality, and under changing lighting conditions. Changes in hairstyle, headwear, other items of clothing, and pose make image analysis harder still.

As of August 2022, machine learning, and deep learning in particular, achieves the highest accuracy on this problem. However, DL models only deliver proper results when the object to be re-identified was already present in the training dataset. To be just as effective on new objects, they have to be trained further. This takes a lot of time and computing resources, and is sometimes not feasible at all, because an up-to-date additional dataset may not exist. In the context of re-identification, this means that a DL model will be less effective at recognizing people who were not in the database at the time of training, a situation that is commonplace in modern video surveillance systems.

Nikita Gabdullin proposed a model that pairs a DL parser with an analytical method for calculating the similarity of image elements. It combines DL-based parsing of the human image, fully analytical feature extraction (that is, automatic creation of attribute vectors without operator involvement), and a ranking scheme that determines how similar the people in the images are.
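A ranking scheme of this general kind can be sketched as follows. This is an illustration under stated assumptions, not the published method: each body part is assumed to be described by a normalized histogram, parts are compared by histogram intersection, and the per-part scores are averaged. The part names, the data layout, and the similarity measure are all hypothetical choices.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: part name -> normalized histogram (values sum to 1).
Features = Dict[str, List[float]]


def histogram_intersection(a: List[float], b: List[float]) -> float:
    """Similarity of two normalized histograms, in the range [0, 1]."""
    return sum(min(x, y) for x, y in zip(a, b))


def person_similarity(query: Features, candidate: Features) -> float:
    """Average part-wise similarity over the parts present in both descriptions."""
    shared = set(query) & set(candidate)
    if not shared:
        return 0.0
    total = sum(histogram_intersection(query[p], candidate[p]) for p in shared)
    return total / len(shared)


def rank_gallery(query: Features,
                 gallery: Dict[str, Features]) -> List[Tuple[str, float]]:
    """Return (id, score) pairs sorted from most to least similar to the query."""
    scored = [(pid, person_similarity(query, feats)) for pid, feats in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


query = {"torso": [0.8, 0.1, 0.1], "legs": [0.1, 0.1, 0.8]}
gallery = {
    "person_a": {"torso": [0.7, 0.2, 0.1], "legs": [0.2, 0.1, 0.7]},
    "person_b": {"torso": [0.1, 0.8, 0.1], "legs": [0.1, 0.8, 0.1]},
}
ranking = rank_gallery(query, gallery)  # person_a is ranked first
```

Because every score is a per-part overlap, an operator can inspect which parts contributed to a match, which is the kind of human-understandable measure the analytical side is meant to provide.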

Here, parsing an image of a person means dividing it into smaller semantic parts that contain separate anatomical regions. For a full-length image of a person, for example, these are body parts (head, torso, arms, legs) as well as clothing items (hats, shoes). The parser, although a DL model, is trained on data that is not directly related to re-identification tasks. Experiments showed that it generalizes adequately on its own when used as a component of the combined method.
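To make the idea of parsing concrete, the sketch below assumes that a parser has already produced a label for every pixel and shows only the next step: grouping pixels by semantic part. The parser itself is not implemented here. The label names, the tiny 3x2 "image", and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def split_by_part(image: List[List[Pixel]],
                  mask: List[List[str]]) -> Dict[str, List[Pixel]]:
    """Group pixels by the part label the (assumed) parser assigned to them."""
    if len(image) != len(mask):
        raise ValueError("image and mask must have the same number of rows")
    parts: Dict[str, List[Pixel]] = defaultdict(list)
    for image_row, mask_row in zip(image, mask):
        if len(image_row) != len(mask_row):
            raise ValueError("image and mask rows must have the same length")
        for pixel, label in zip(image_row, mask_row):
            if label != "background":  # background pixels are discarded
                parts[label].append(pixel)
    return dict(parts)


# Toy 3x2 image and a matching per-pixel label mask.
image = [
    [(90, 60, 40), (90, 60, 40)],
    [(200, 20, 30), (0, 0, 0)],
    [(20, 20, 120), (20, 20, 120)],
]
mask = [
    ["head", "head"],
    ["torso", "background"],
    ["legs", "legs"],
]
regions = split_by_part(image, mask)
```

Each resulting region can then be described independently, so a hidden face or an occluded limb removes only one part from the comparison rather than invalidating the whole description.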

The proposed model uses analytically calculated color and texture features that are tied to human-understandable similarity measures. The study shows that this combination largely addresses the drawbacks of both existing analytical methods and "pure" DL methods. Describing people's characteristics this way makes it possible to search by "verbal portrait": a set of attributes can easily be compiled, and the model will then find, for example, "all blondes in a red sweatshirt." A "pure" DL model cannot do this, because the parameters it operates on are abstract and do not correspond to properties of the object in the real world. The operator cannot "explain" to the model what "red sweatshirt" means. Moreover, the vast majority of DL models require an image as input, while the proposed method also accepts voice and text queries that the operator converts into parameter vectors.
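A search by "verbal portrait" can be illustrated with a deliberately simple sketch. It assumes that each person is stored as a set of pixel lists per body part and that a color word can be mapped to a hue range. The color names, the hue and saturation boundaries, and the function names are illustrative assumptions, and yellow hair stands in for "blond". The actual features in the published work are not reproduced here.

```python
import colorsys
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def color_name(pixel: Pixel) -> str:
    """Map an RGB pixel to a coarse color word using hypothetical HSV boundaries."""
    r, g, b = (channel / 255.0 for channel in pixel)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    if value < 0.2:
        return "black"
    if saturation < 0.2:
        return "white" if value > 0.8 else "gray"
    degrees = hue * 360.0
    if degrees < 30 or degrees >= 330:
        return "red"
    if degrees < 90:
        return "yellow"
    if degrees < 180:
        return "green"
    if degrees < 270:
        return "blue"
    return "purple"


def dominant_color(pixels: List[Pixel]) -> str:
    """Most frequent color word among the pixels of one part."""
    return Counter(color_name(p) for p in pixels).most_common(1)[0][0]


def matches_portrait(parts: Dict[str, List[Pixel]], portrait: Dict[str, str]) -> bool:
    """True if every requested part is present and has the requested dominant color."""
    return all(
        part in parts and parts[part] and dominant_color(parts[part]) == color
        for part, color in portrait.items()
    )


people = {
    "p1": {"torso": [(200, 20, 30), (210, 30, 40)], "hair": [(230, 200, 90)]},
    "p2": {"torso": [(20, 40, 200)], "hair": [(30, 20, 10)]},
}
portrait = {"torso": "red", "hair": "yellow"}  # "a blonde in a red sweatshirt"
found = [pid for pid, parts in people.items() if matches_portrait(parts, portrait)]
```

The query here is built from words rather than from an example image, which is the property the text contrasts with "pure" DL models.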

To test the effectiveness of the proposed method, it was evaluated on the Market1501 dataset (photos of 1,501 pedestrians taken in front of a supermarket near Tsinghua University with five high-resolution cameras and one low-resolution camera) and the CUHK03 dataset (photos of 1,467 different students, each captured by at least two of the six cameras installed at the Chinese University of Hong Kong). The tested model achieved accuracy comparable to that of classic DL models.

Most significantly, the proposed method achieves an accuracy of about 60-90% when working with data from several datasets (that is, it demonstrates high cross-domain accuracy) without retraining or any additional tuning. This is considerably higher than that of "pure" DL models, which showed an accuracy of 30-50% under the same experimental conditions.
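The article does not say which accuracy metric is meant. As an assumption, the sketch below uses rank-1 accuracy, a common choice in re-identification evaluation: the share of queries for which the top-ranked gallery entry has the correct identity. The query names, identity labels, and rankings are invented for illustration and are not results from the study.

```python
from typing import Dict, List


def rank1_accuracy(rankings: Dict[str, List[str]], true_ids: Dict[str, str]) -> float:
    """Fraction of queries whose top-ranked gallery identity is the correct one."""
    if not rankings:
        raise ValueError("at least one query is required")
    hits = sum(
        1 for query, ranked in rankings.items()
        if ranked and ranked[0] == true_ids[query]
    )
    return hits / len(rankings)


# Hypothetical output of a re-id model: gallery identities, best match first.
rankings = {
    "q1": ["id7", "id3"],
    "q2": ["id3", "id7"],
    "q3": ["id5", "id1"],  # wrong at rank 1: the true identity is id1
    "q4": ["id2", "id9"],
}
true_ids = {"q1": "id7", "q2": "id3", "q3": "id1", "q4": "id2"}

accuracy = rank1_accuracy(rankings, true_ids)  # 3 of 4 queries are correct at rank 1
```

In a cross-domain test, the rankings would come from a model applied to a dataset it was not trained or tuned on, and the same function would be used unchanged.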

Re-identification plays a large role in ensuring security, especially in crowded places (train stations, airports, shopping centers, educational institutions). This is a universal technology that helps both find lost children and track suspicious subjects.

The developed method can be used to re-identify people in any photo or video recording. It can make existing access control and management systems more "intelligent," and in video surveillance systems it can work both in real time and on demand, analyzing previously captured footage.

From a technical point of view, the proposed method is simpler and less demanding of hardware than "pure" DL models. In a number of applications it can be built directly into cameras, putting the concept of "edge computing" into practice.

Nikita Gabdullin's research paper was published on arXiv, the electronic preprint archive hosted by Cornell University.