
AIRI: Method to Improve Reliability of Data Query Generation

Product
Developer: Institute of Artificial Intelligence (AIRI)
System release date: 2025/07/02
Technology: Speech technology

Main articles:

2025: Introducing a Method to Improve the Reliability of Data Query Generation

In Russia, a method has been developed to improve the reliability of generating data queries. The Institute of Artificial Intelligence (AIRI) announced this on July 2, 2025.

In tests, the technique identified up to 90% of the errors made when generating SQL queries, significantly reducing the risk of incorrect results.

Researchers at the AIRI Institute have improved how language models generate SQL queries, building a system that both produces more accurate database queries with large language models and estimates its own confidence in the correctness of each result.

The development is directly related to a fundamental problem of machine learning: generalization, the ability of a model to work correctly with new data it has not seen before. Modern language models such as ChatGPT or GigaChat achieve strong generalization through the sheer volume of data and parameters they are trained on; a 175-billion-parameter model, for example, is trained on roughly 15 TB of text. Even so, they run into the limits of generalization, especially in specialized domains where accuracy is critical.

In the study, the scientists tested the model on generating SQL queries for hospital staff against an internal database. This lets doctors, among other things, quickly check department occupancy and track the processing of lab tests without manually digging through internal databases.

The main difficulty lay in the wording of the requests: many of them concerned diagnoses and diseases. Large language models, for all their versatility, often lose accuracy on highly specialized topics, which leads to errors in SQL generation. The specialists trained the model to the point where 60-70% of its generations were correct, meaning that roughly 30% of generated queries still contained errors.

To make such a model trustworthy for non-technical specialists, two techniques were developed: an external classifier and a calibrated confidence score. The external classifier acts as an automatic filter, deciding whether to show a generated query to the user based on the model's confidence score.

To estimate the models' confidence in their own outputs, the researchers applied entropy-based methods that analyze the probability distributions of the generated tokens. These methods require little computation, which makes them easy to integrate into real systems. For SQL generation, the entropy score indicates how much a generated query can be trusted, reducing the risk of errors in critical processes such as medical data analysis or business intelligence.

Calibration of the confidence score measures how well the model's stated confidence matches its actual quality at that confidence level. By combining calibration methods with the external classifier, the system detected 90% of the errors among the roughly 30% of incorrect language model generations.
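The entropy-based confidence score and the filtering step can be sketched as follows. This is an illustrative reconstruction, not AIRI's published code: the function names and the threshold value are assumptions, and a real external classifier would be a trained model rather than a single cutoff.

```python
import math

def sequence_entropy(token_probs):
    """Mean per-token entropy (in nats) of the probability distributions
    the model assigned at each generation step; lower means more confident."""
    entropies = []
    for dist in token_probs:  # dist: probabilities over candidate tokens
        h = -sum(p * math.log(p) for p in dist if p > 0.0)
        entropies.append(h)
    return sum(entropies) / len(entropies)

def external_filter(token_probs, threshold=0.5):
    """Hypothetical filter reduced to a threshold: accept the generated
    SQL query only if the mean entropy is below the cutoff."""
    return sequence_entropy(token_probs) < threshold

# A peaked distribution (confident model) passes the filter;
# a near-uniform one (hesitant model) is rejected.
confident = [[0.95, 0.03, 0.02], [0.90, 0.05, 0.05]]
hesitant = [[0.25, 0.25, 0.25, 0.25], [0.30, 0.30, 0.20, 0.20]]
print(external_filter(confident), external_filter(hesitant))
```

The key property the article describes, that models "hesitate" on unanswerable questions, shows up here as flatter token distributions and therefore higher entropy, which the filter screens out.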

The study tested several language model architectures, including encoder-decoder architectures, in which the input text is first encoded into an intermediate state and then decoded into the final response. This architecture showed the best calibration of the raw entropy-based confidence estimates.
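One standard way to quantify how well confidence scores are calibrated is the expected calibration error (ECE), sketched below; the article does not state which calibration metric AIRI used, so this is an illustrative assumption.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average gap between predicted confidence and empirical accuracy,
    weighted by the fraction of samples falling in each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between mean confidence and accuracy inside this bin
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A model whose 80%-confidence answers are right 80% of the time contributes zero to the ECE; a model that is confidently wrong contributes a large gap, which is exactly the failure mode the external classifier is meant to catch.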

"We found that AI models often make mistakes in SQL generation, and make them confidently, yet when a question has no answer, they hesitate. The external classifier used as a filter therefore identifies precisely such 'unanswerable' questions more reliably. Using this property of the model and the developed methodology, we were able to detect 90% of the errors, which directly increases the overall reliability of the system," said Elena Tutubalina, head of the Applied NLP research group at the AIRI Institute and senior researcher at ISP RAS.

"The combination of calibration methods and external classifiers dramatically increases the reliability of language-based AI for generating code for specialized tasks. This is critical for areas where the cost of an error is high and where the use of AI tools must remain fully under the specialist's control," said Oleg Somov, researcher in the Applied NLP group at the AIRI Institute.