Developers: | HFLabs, formerly HumanFactorLabs |
Date of the premiere of the system: | 2023/05/02 |
Technology: | Information Security - Information Leakage Prevention |
White Paper: DLP - Data Loss/Leak Prevention
2024: Reidentification Risk Assessment Model Testing
On June 19, 2024, HFLabs and the Big Data Association presented the results of testing a re-identification risk assessment model using the data depersonalization product "Camouflage." "Camouflage" is a smart masking solution that reduces the risk of personal data leaks during IT system testing while maintaining the quality of the tests.
As part of the testing, the Big Data Association's methodological specialists simulated cyber attacks aimed at extracting personal information from depersonalized data sets prepared with "Camouflage" for the cases "Assessment of Bank Customer Outflow (Churn Rate)" and "Marketing Attribution on Independent Data Sets." Based on the results of the attacks, re-identification risks were calculated and recommendations were prepared for adapting the masking parameters.
When processing depersonalized data, we assess the likelihood of a successful attack that could violate privacy. A successful experiment in determining the re-identification risk level of business-case data processed with "Camouflage" allowed us to solve the problem of maximizing the product's utility function while minimizing data risks. During the experiment, we reduced the aggregate risks of masked data by 97.5% while maintaining a high utility score of 71%. These results highlight the effectiveness of our depersonalization methods and their ability to protect data privacy without compromising analytical value," said Alexey Neiman, Executive Director of the Big Data Association. |
The re-identification risk assessment model makes it possible to calculate the probability of extracting personal information from a depersonalized data set. Based on this assessment, informed decisions can be made about the protection measures and data-processing methods used. Testing confirmed that the risk model performs as intended; it has also been supplemented with attack-simulation approaches that deepen the understanding of re-identification risks by taking the risks of singling out and linkage into account.
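One common way to illustrate the singling-out risk such a model estimates (a simplified sketch, not the Association's actual methodology) is through equivalence classes of quasi-identifiers: the fewer records share the same combination of quasi-identifier values, the higher the probability of re-identifying a specific person.

```python
from collections import Counter

def singling_out_risk(records, quasi_identifiers):
    """Estimate per-record re-identification risk as 1/k, where k is the size
    of the record's equivalence class over the chosen quasi-identifiers."""
    key = lambda r: tuple(r[q] for q in quasi_identifiers)
    class_sizes = Counter(key(r) for r in records)
    return [1 / class_sizes[key(r)] for r in records]

dataset = [
    {"gender": "F", "age_group": "30-39", "region": "77"},
    {"gender": "F", "age_group": "30-39", "region": "77"},
    {"gender": "M", "age_group": "18-29", "region": "50"},
]
risks = singling_out_risk(dataset, ["gender", "age_group", "region"])
print(risks)  # the unique record carries the highest risk: [0.5, 0.5, 1.0]
```

A masking tool lowers these risks by making equivalence classes larger, i.e. by making records less distinguishable from one another.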
When creating "Camouflage," we put preserving the data context at the forefront. Smart masking takes into account gender, age group, the binding of address and phone number to the region, and more. This approach makes it possible to get the most out of depersonalized data when testing highly loaded IT systems. At the same time, the safe use of such data remains a key issue for business. We are grateful to the ABD for the joint work: thanks to the risk model, we were able to refine the product and reduce the risks of working with depersonalized data to a minimum. The development of the risk model and its validation is a big step towards bringing depersonalized data out of the gray zone. I am sure this ABD project will make it possible to move towards legalizing its use in the business environment," said Nikita Nazarov, Technical Director of HFLabs. |
The risk model can become the basis for finding a balance between data safety and utility, so that the resulting data can be trusted and used while re-identification of specific subjects remains impossible. At the same time, it is advisable to assess de-anonymization risks in each specific case where the methods are applied, including contextual risk (based on the conditions under which the depersonalized data set will be processed).
Thanks to the fruitful collaboration of HFLabs and the Big Data Association, "Camouflage" preserves the quality and context of data during depersonalization, making it as similar as possible to the original while significantly reducing re-identification risks. Implementing a risk-based approach to data depersonalization has proven important, providing more accurate risk management and maximum data usefulness.
2023: Solution Presentation
On May 2, 2023, HFLabs introduced a product for smart masking (depersonalization) of personal data. The solution reduces the risk of leakage when testing IT systems while maintaining the quality of tests. A pilot project using "Camouflage" was successfully completed at one of the banks.
The HFLabs product is available as a boxed solution and as a SaaS service. It depersonalizes various types of data: full names, dates of birth, addresses, phone numbers, emails, TIN, SNILS, bank cards and accounts, vehicle passports (PTS), and driver's licenses. Other data types can be masked by selecting simple mutations from the predefined rules.
Using smart-replacement logic, "Camouflage" preserves the quality and context of data during depersonalization and makes it as similar as possible to real data. Masking does not lose socio-demographic characteristics, geographic distribution, family ties, or even the format-logical control of documents. Thanks to this approach, data depersonalized with "Camouflage" can be used to correctly build analytical models.
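Format-logical control means a masked document number must still pass the same validity checks as a real one. As an illustration, here is the publicly documented checksum for a 10-digit Russian TIN (INN); this shows the kind of check a replacement value has to satisfy, not anything specific to "Camouflage":

```python
def valid_inn10(inn: str) -> bool:
    """Format-logical check for a 10-digit Russian TIN (INN): the control
    digit equals the weighted sum of the first 9 digits, mod 11, mod 10."""
    if len(inn) != 10 or not inn.isdigit():
        return False
    weights = (2, 4, 10, 3, 5, 9, 4, 6, 8)
    control = sum(w * int(d) for w, d in zip(weights, inn)) % 11 % 10
    return control == int(inn[9])

print(valid_inn10("7707083893"))  # True: a well-known valid corporate TIN
```

A masker that preserves format-logical control must generate replacement numbers for which such checks still succeed.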
For example, "Camouflage" replaces a full name taking into account its popularity and the client's gender. With smart masking, phone numbers do not lose their binding to the operator or region, addresses remain valid within the region or city, and people living at the same address receive a different but real address.
To preserve socio-demographic characteristics, birth dates are shifted within a small interval (for example, from 1991 to 1992). Age boundaries significant for marketing can be set as hard limits: for example, a person under 18 will not become an adult. "Camouflage" also preserves the properties of documents such as passports, TIN, and SNILS, taking their format, checksums, and validity into account.
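The date-shifting rule described above can be sketched as follows. This is a hypothetical illustration: the function name, retry strategy, and parameters are assumptions, since "Camouflage"'s actual implementation is not public. The shift stays within a small interval, and a hard age boundary (such as 18) is never crossed.

```python
import random
from datetime import date, timedelta

def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 mapped into a non-leap year
        return d.replace(year=d.year + years, day=28)

def mask_birth_date(birth: date, today: date,
                    max_shift_days: int = 365, adult_age: int = 18) -> date:
    """Shift a birth date by a random number of days within +/-max_shift_days
    without letting the person cross the adult-age boundary."""
    is_adult = add_years(birth, adult_age) <= today
    for _ in range(1000):  # retry until the shift respects the boundary
        candidate = birth + timedelta(
            days=random.randint(-max_shift_days, max_shift_days))
        if (add_years(candidate, adult_age) <= today) == is_adult:
            return candidate
    return birth  # no valid shift found: keep the original date
```

For example, an adult born on 1991-06-15 stays an adult after masking, and a 15-year-old never becomes one.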
The business has a request to mask all available test environments in order to secure operations, reduce the risk of leaks, and make it easier to grant access to both employees and contractors. At the same time, it is important that the data resemble real data and that each client is masked the same way in all data sources," explained Olga Serdobintseva, product owner of "Camouflage" at HFLabs. |
The HFLabs product depersonalizes data on all of a company's test stands using a single algorithm within one masking iteration. Replacements are selected at random, stored in encrypted form, and deleted once the depersonalization of all stands is complete. This ensures consistency across all masked databases and rules out restoring the original values.
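The consistency property described above, where the same original value receives the same replacement on every stand within one masking iteration, can be sketched with a hypothetical replacement map (class and method names are illustrative; in the product the map is stored encrypted and deleted after the run):

```python
import random

class ReplacementMap:
    """Assigns every original value a random replacement from a pool and
    reuses it for repeated values, so all masked databases stay consistent.
    In a real run the map would be stored encrypted and destroyed afterwards."""

    def __init__(self, pool):
        self._pool = list(pool)
        random.shuffle(self._pool)  # replacements are selected at random
        self._map = {}

    def mask(self, value):
        # First time a value is seen, draw a fresh replacement;
        # afterwards, always return the same one.
        if value not in self._map:
            self._map[value] = self._pool.pop()
        return self._map[value]

names = ReplacementMap(["Ivanov", "Petrov", "Sidorov", "Smirnov"])
db1 = [names.mask(n) for n in ["Orlov", "Kuznetsov", "Orlov"]]
db2 = [names.mask(n) for n in ["Orlov"]]
# db1[0], db1[2] and db2[0] are all the same replacement for "Orlov"
```

Deleting the map at the end of the iteration is what makes the masking irreversible: without it, there is no record linking replacements back to originals.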