| Developers: | Yandex |
| System release date: | 2017/07/18 |
| Technology: | Application Development Tools |
CatBoost is a machine learning method.
2025: Inclusion in the list of the world's most popular machine learning algorithms, alongside technologies from Google, Microsoft and Intel
The CatBoost algorithm developed by Yandex has become one of the most popular machine learning tools in fundamental and applied science worldwide. This was reported in December 2025 by the American publication MarkTechPost, whose authors analyzed 5,000 scientific publications in the journal Nature for 2025.
According to the Yandex press service, CatBoost is the only Russian technology featured in this global review. The algorithm belongs to a narrow group of five non-American developments that have become a global scientific standard, along with the French Scikit-learn, the German U-Net, the Canadian GAN and RNN, and the British AlphaFold. According to the report, Yandex's algorithm is used in every thirtieth research paper, competing with solutions from corporations such as Google, Microsoft, Intel and Amazon.
China led in citations of the algorithm in Nature, accounting for 32% of the publications. The high demand for CatBoost in China is attributed to the active development of research based on machine learning, in particular ensemble methods. In such methods, several models, CatBoost among them, are combined to solve a single problem.
In the United States, the Yandex algorithm is used alongside local counterparts in leading scientific centers, including Harvard and Stanford universities. According to the report, 13% of American publications mention CatBoost, which is comparable to the shares of its direct competitors: XGBoost (15%), the classic Gradient Boosting Model (12%) and LightGBM (10%).
Originally created for search tasks, CatBoost is used today in many Yandex services, such as Weather, Direct, Music and Market. According to the company's press service, the algorithm specializes in analyzing tabular data and revealing complex patterns in it. Its key advantage is the ability to work with categorical features without prior transformation.[1]
2017: Technology Development
On July 18, 2017, Yandex announced the creation of the CatBoost machine learning method. It is designed to train models on heterogeneous data, such as a user's location, operation history and device type. The CatBoost machine learning library has been published openly and can be used by anyone.
CatBoost is positioned as the successor to the Matrixnet machine learning method, which is used in almost all Yandex services. Like Matrixnet, CatBoost uses the gradient boosting mechanism, which is well suited to working with heterogeneous data.
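The boosting idea mentioned above can be illustrated with a minimal sketch. This is generic gradient boosting on a single numeric feature with squared-error loss and one-split trees ("stumps"), not the CatBoost or Matrixnet implementation; the function names, the toy data and the `learning_rate` value are all assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then add stumps fitted to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy data (hypothetical): a noisy step function.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = fit_boosting(xs, ys)
```

Each round fits a small model to the errors left by the previous rounds, so the ensemble improves step by step rather than being trained all at once.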
CatBoost builds models on both numerical and non-numerical data, such as cloud types or building types. Previously, such data had to be converted into numbers, which could distort its meaning and affect the accuracy of the model. Now it can be used in its original form. This helps CatBoost achieve higher training quality. It can be applied in various fields - from banking to manufacturing.
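The article does not say how non-numerical values are handled inside the library. One commonly described approach is an "ordered" target statistic, in which each category value is replaced by a smoothed average of the targets seen so far for that category. The sketch below is a simplified illustration of that idea under stated assumptions: the function name, the `prior` value and the cloud-type data are hypothetical, and the code is not the library's implementation.

```python
def ordered_target_statistics(categories, targets, prior=0.5):
    """Encode each category using only the targets of earlier rows.

    Using earlier rows only means a row's own target never leaks
    into its own encoded value.
    """
    target_sums = {}
    counts = {}
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = target_sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean: an unseen category falls back to the prior.
        encoded.append((seen_sum + prior) / (seen_count + 1))
        target_sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


# Hypothetical data: cloud type per observation and whether it rained (1/0).
cloud_types = ["cumulus", "stratus", "cumulus", "cumulus", "stratus"]
rained = [1, 0, 1, 0, 0]
encoded = ordered_target_statistics(cloud_types, rained)
```

The user still passes the raw category values; a conversion of this kind happens during training rather than as a manual preprocessing step.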
| Yandex has been engaged in machine learning for many years, and CatBoost was created by the best specialists in this area. By making the CatBoost library publicly available, we want to contribute to the development of machine learning. I must say that CatBoost is a Russian machine learning method that has become available in open source. We hope that the community of specialists will appreciate it and help make it even better. Mikhail Bilenko, Head of Machine Intelligence and Research at Yandex |
The method has been tested on Yandex services. As part of the experiment, it was used to improve search results, rank the Yandex.Zen recommendation feed and calculate weather forecasts with the Meteum technology. In the future, CatBoost will be rolled out to other services. It is also used by the Yandex Data Factory team in their solutions for industry, in particular to optimize raw material consumption and predict defects. The European Organization for Nuclear Research (CERN) has adopted CatBoost: it uses the product to combine data obtained from different parts of the LHCb detector.
To work with CatBoost, just install it on a computer. The library supports the Linux, Windows and macOS operating systems and is available in the Python and R programming languages.
CatBoost is available for download on GitHub.
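As a rough illustration of using the Python package, the sketch below trains a small classifier on mixed data. It assumes the `catboost` package is installed (for example with `pip install catboost`). The feature columns (city, device type, number of operations), the labels and the parameter values are made up for illustration, and the helper name `train_and_predict` is hypothetical.

```python
def train_and_predict():
    """Train a tiny CatBoost classifier on mixed data and predict two new rows."""
    # Imported inside the function so this file loads even without catboost.
    from catboost import CatBoostClassifier

    # Columns: city (categorical), device type (categorical), operations (numeric).
    train_rows = [
        ["Moscow", "mobile", 12],
        ["Moscow", "desktop", 3],
        ["Kazan", "mobile", 15],
        ["Kazan", "desktop", 2],
        ["Sochi", "mobile", 11],
        ["Sochi", "desktop", 4],
        ["Moscow", "mobile", 14],
        ["Kazan", "desktop", 1],
    ]
    train_labels = [1, 0, 1, 0, 1, 0, 1, 0]  # made-up binary target

    model = CatBoostClassifier(
        iterations=20,
        depth=2,
        verbose=False,
        allow_writing_files=False,  # do not create a catboost_info directory
    )
    # cat_features lists the column indices holding non-numerical values;
    # the strings are passed as they are, with no manual encoding.
    model.fit(train_rows, train_labels, cat_features=[0, 1])

    new_rows = [["Moscow", "mobile", 13], ["Sochi", "desktop", 2]]
    return model.predict(new_rows).ravel().tolist()
```

Calling `train_and_predict()` returns one predicted label per new row. With only eight training rows the predictions themselves carry no meaning; the point is that the city and device columns are supplied as plain strings.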
