Developers: Yandex
Last Release Date: 2023/10/05
Branches: Internet services
Service audience
2023: EU antitrust action against the Google monopoly leads to an increase in the popularity of Yandex in the EU
In mid-October 2023, it became known that antitrust proceedings against Google in the European Union led to an increase in the popularity of Yandex in the region. The Russian service entered the top five most popular search engines on Android devices in Germany, Italy, the Czech Republic, Denmark and some other European countries.
Under pressure from EU regulators, Google, part of the Alphabet holding, was forced to give owners of Android-based smartphones and tablets the ability to choose competing search engines to use by default. This measure applies to 23 European states. Through the selection screen, users can specify one of the five most common search engines.
According to StatCounter estimates, Yandex's share of the European mobile search market in September 2023 was 2.75%. By comparison, Google controls 95.35% of the corresponding segment. Bing accounts for another 0.53%, DuckDuckGo for approximately 0.49%, and Yahoo for about 0.43%. Thus, Google continues to dominate the mobile search market in the EU, despite officials' efforts against the corporation's monopoly.
At the same time, in the September update of the sanctions list, the European Union accused Yandex of "being responsible for promoting state media and certain content in search results, as well as for downgrading and removing some materials, in particular those related to the current geopolitical situation." The Russian search engine categorically rejects these claims, stating that it does not adjust the algorithms "for any political side." Nevertheless, European regulators are "concerned that Yandex is available in the EU."[1]
2014
According to TNS, in May 2014, search results in Yandex were viewed on average by 20.4 million people per day.
Inverted index
The search engine is organized like the index at the back of a book, where each word is followed by a list of the pages on which it appears. This is one of the basic data structures for search, although not the only one. Such an index (called an inverted index) contains the word ID and a list of the documents in which the word occurs. It also records word positions, that is, the places where the word appears in each document. Knowing the word position is very important for search: it is one thing when a word occurs in passing in one of the paragraphs, and quite another when it appears in the title of the document.
An index entry is a word together with all of its word positions in all documents where it occurs. Each word position is encoded in 64 bits, which hold the document ID (as of June 2013, the Yandex search index contained more than 5 billion documents in Russian alone, and twice as many in other languages), the zone in which the word was found, the sentence number, the word number within the sentence and several service bits.
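As a minimal sketch, a 64-bit word position of this kind could be packed as shown below. The field widths (36 bits for the document ID, 4 for the zone, 12 for the sentence number, 8 for the word number and 4 service bits) are assumptions chosen only so that they add up to 64 and leave room for more than 15 billion documents; the source does not give the actual layout.

```python
# Hypothetical field widths in bits; they sum to 64.
# The real layout used by Yandex is not given in the text.
FIELDS = (("doc_id", 36), ("zone", 4), ("sentence", 12), ("word", 8), ("service", 4))


def pack_position(**values: int) -> int:
    """Pack the named fields into one 64-bit integer, first field in the highest bits."""
    packed = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


def unpack_position(packed: int) -> dict:
    """Recover the individual fields from a packed 64-bit word position."""
    result = {}
    for name, width in reversed(FIELDS):
        result[name] = packed & ((1 << width) - 1)
        packed >>= width
    return result


position = pack_position(doc_id=5_000_000_001, zone=1, sentence=3, word=7, service=0)
fields = unpack_position(position)
```

Keeping the document ID in the highest bits means that sorting packed positions numerically also sorts them by document, which is convenient when posting lists are compared.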
The index stores not the words themselves but their lemmas, that is, their base grammatical forms. For example, if the document contains the line "My uncle of the most honest rules," the word "honest," which is inflected in the Russian original, is recorded in its dictionary form, and the code of the grammatical form is stored next to it. Thus, basic linguistic analysis is carried out at indexing time rather than at search time. As a result, for a single-word query the search result is already written in the index.
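The structure described above can be illustrated with a toy positional inverted index. This is only a sketch: the `LEMMAS` table stands in for real morphological analysis, and the zone names and document contents are made up for the example.

```python
from collections import defaultdict

# Toy lemma dictionary: a stand-in for real morphological analysis (assumption).
LEMMAS = {"rules": "rule", "uncles": "uncle"}


def lemmatize(word: str) -> str:
    """Return the base form of a word, falling back to the lowercased word."""
    word = word.lower()
    return LEMMAS.get(word, word)


def build_index(docs: dict[int, dict[str, str]]) -> dict[str, list[tuple[int, str, int]]]:
    """Map each lemma to a list of (doc_id, zone, word_number) postings."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for zone, text in docs[doc_id].items():
            for word_number, word in enumerate(text.split()):
                index[lemmatize(word)].append((doc_id, zone, word_number))
    return dict(index)


docs = {
    1: {"title": "Uncle", "body": "my uncle of the most honest rules"},
    2: {"title": "Rules", "body": "house rules"},
}
index = build_index(docs)
# A single-word query is answered by one lookup, e.g. index["rule"].
```

Because the zone is stored with every posting, a ranking step can tell a match in the title from a match somewhere in the body.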
Pruning
Suppose the query contains two or more words, for example "uncle" and "rule." In the index they are stored as separate entries that are not related to each other in any way, so how does the search engine find the documents that contain both? Does it really have to go through everything?
Firstly, not all documents need to be examined. Imagine a list of all indexed documents by ID: No. 1, 2, 3, and so on. This list is usually sorted by the degree of "utility" of the pages. This is called pruning, from the English word for trimming tree branches. Now, if the word "rule" appears on its own in document No. 100, and together with the word "uncle" only in No. 1000, then documents 1 to 999 can simply be skipped.
Secondly, finding the intersections is not that difficult. We compare the index entries for "uncle" and "rule" and find the document numbers they have in common. As a rule, only documents close to the beginning of the pruning-sorted list make it to the first page of results, so it is not necessary to compute all intersections to the end.
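The two ideas above, skipping documents that cannot match and stopping once enough results have been found, can be sketched as a merge of two sorted posting lists. The function name, the `limit` parameter and the sample document numbers are illustrative assumptions; the sketch also assumes that a lower document number means a more "useful" page.

```python
def intersect_top(postings_a: list[int], postings_b: list[int], limit: int) -> list[int]:
    """Return the first `limit` document IDs that occur in both sorted lists."""
    result = []
    i = j = 0
    while i < len(postings_a) and j < len(postings_b) and len(result) < limit:
        a, b = postings_a[i], postings_b[j]
        if a == b:
            result.append(a)  # document contains both words
            i += 1
            j += 1
        elif a < b:
            i += 1  # document a lacks the second word, skip it
        else:
            j += 1  # document b lacks the first word, skip it
    return result


rule_docs = [100, 250, 1000, 1500, 4000]
uncle_docs = [1000, 1200, 1500, 4000]
first_page = intersect_top(rule_docs, uncle_docs, limit=2)
```

With `limit=2` the loop stops after document 1500 and never looks at document 4000, which mirrors the point that intersections do not have to be computed to the end.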
Ranking factors
After these first intersections are found, you need to rank them, or sort them, that is, arrange them in such an order that the more relevant ones are higher than the less relevant ones for a particular query. The quality of the search actually depends on how the ranking is performed. When we form a response to a query, a variety of factors are used to rank the results. As of June 2013, Yandex has about eight hundred such factors, and not all of them are taken from the inverted index.
The ranking factors include, of course, not only the number of occurrences of the query words in the document, but also:
- site traffic,
- page visits,
- links to the document,
- user preferences for specific queries.
User Intent Index
To make use of this, Yandex maintains another index, built on the likelihood of clicks through to sites for popular user intents (from the English word intent, "intention"), that is, depending on what the user wants to do.
For example, for some requests, the user wants to see encyclopedic information more likely, for others - multimedia content, for others - to make a purchase.
Such a list of classes of popular intents is not compiled manually. Yandex creates this list based on an analysis of user behavior logs. As of June 2013, Yandex receives about 200 million requests per day, for each of which the user clicks - again, on average - on two sites. Logs with all requests in the company are stored for some time in a large cluster. As of June 2013, there is something like eight petabytes of data.
With this data, a variety of user behaviors can be examined. For example, the fact that a site is often clicked for a given query is not very indicative in itself: it may only mean that the snippet text (which the user sees in the search results) and the title are well written. What matters more is how the user goes on to interact with the site and the search results; from this one can judge whether he found what he was looking for.
This itself is a non-trivial task, and, moreover, the assessment of the success of its solution is always very subjective. Yandex has some models that try to understand on the basis of the user's transition graph whether he has found what he needs.
Information about transitions in Yandex is received primarily from browsers.
"Crypt": What Yandex knows about the user
Information about the user in the Yandex database includes both the raw set of queries, visited sites and the like, and derived descriptions such as "a man aged 20 to 30, unmarried, likes cats."
As for social demography, the Crypt technology was developed for it. It is based on the same "Matrixnet" machine learning. As a training sample, searches of a million people from the Yandex social network My Circle were used, where the sex and age of a person are known and these data are likely to correspond to reality.
"Crypt" predicts a person's gender and age with good accuracy, which is important for advertising, where it is used. Surprisingly, however, all this social demography does not help ranking much. Query history turns out to be much more important here: it says many times more about what a person wants right now than gender and age do.
Travel provides very revealing examples. If, for example, a person has been searching for "Turkey" and "Tunisia" for some time, then the machine understands that the query "Madagascar" refers to the island, not the film. This is because the words from the query are mapped to certain categories in the Matrix tree. From this point of view, the Yandex user looks like a sparse space made up of query words and the categories of the sites he visited.
Is it possible to promote particular sites in search results
In June 2013, Yandex said that requests (for example, from the Ministry of Culture) to promote resources that someone considers "right" in search results cannot be fulfilled at all.
"We have machine learning; like a mirror, it reflects exactly what users want to find. We are machine learning fans, and we never interfere in the search "manually" at all," said Andrei Styskin, head of the Yandex ranking department, in June 2013[2].
New ranking factors have a very limited shelf life. For example, there are search optimizers, people who try to deceive the search engine and push a certain URL up in the results. Yandex has a whole department to combat this phenomenon. Take the vaunted PageRank algorithm, which analyzes the graph of links between pages. Once optimizers realized how it worked, the web simply became clogged with links, and by 2013 PageRank made almost no sense.
2019
Five companies accused Yandex of restricting access to their sites
Several companies, among them the online cinema ivi.ru, accused Yandex in June 2019 of restricting access to their services and violating competition law, Vedomosti reported, citing a note prepared by the companies. Besides ivi, the companies include Avito, CIAN, Profi.ru and 2GIS[3].
According to the companies, resources affiliated with Yandex receive priority in the search engine's results. They claim that this is due to "sorcerers," that is, interactive answers to queries that appear in search results immediately after advertising and before the organic results. At the same time, only services associated with Yandex have access to the "sorcerers."
What Yandex answered:
- Yandex has never lowered the natural positions of other companies in search and has never prevented other companies from advertising in Yandex.Direct. The "religion" of Search is user metrics, and the "religion" of Direct is equal access to contextual advertising.
- "Sorcerers" (blocks with information from services related to the company) of "Yandex" are not fixed in the top position, they appear in the output results only when they increase the quality of the response to the user's request. And this is a big difference from the antitrust proceedings in Europe with the Google Shopping service.
- We have always been open to partnerships. In particular, we have already once made an attempt to provide a more complex and structured snippet format in search for partner companies (the 2014 Islands program), but then the companies were not ready for integration with Search. At the same time, a special structured format similar to the "sorcerer" exists now in Yandex.Direct.
- Yandex is now working on the next version - a new special interactive format in Direct, which will provide extended functionality to partner companies. However, user metrics still remain above all else.
- In response, we expect reciprocal steps from the companies (including all authors of the letter). We hope that links to Yandex services will be able to appear, for example, in the listings of Avito (Auto.ru and Yandex.Real Estate), on 2GIS (Navigator and Directory), on ivi (Kinopoisk), on Kassir.ru, etc.
- We would like to join the authors of the letter and expand the list of its addressees to other companies. For example, we should all have equal access to Google, Mail and Rambler search results.
- We would like to support the authors of the letter and prohibit access to the resources of large advertising networks. Unlike Yandex.Direct (which admits any competitors to its platform), some of the largest advertising platforms (including some of the authors of the letter) simply prohibit the placement of competitors' ads on their own sites.
- The letter addresses important issues, but they need to be discussed in the context of the general principles of relationships in the industry. We are always open to such a discussion.
Search Enhancement Process
2023: Adding a Video Banner
Yandex's media inventory has been expanded with a premium format. The video banner is shown to the right of the search results and can attract the attention of up to 85 million Search users. According to the first tests run by advertisers, this format helped to increase brand awareness and conversions. The developer announced this on October 5, 2023.
The video banner consists of a video up to 15 seconds, a logo, a picture, a text block and a button for going to the site or making a purchase. The video is played automatically, and the sound is turned on at the initiative of the user.
"One of our key areas is the expansion of the media inventory line, taking into account market trends. The demand for video ads continues to grow, so we have added a Video Banner to the search. This format will attract more user attention to the advertised product, increase brand metrics, and, as shown by the first placements of our partners, ensure the growth of conversion indicators," said Anna Sorokina, head of the Yandex media project development department.
The video banner works on a model with a fixed cost per thousand impressions (fixCPM). You can evaluate the effectiveness of the video banner using the usual video advertising metrics, Brand Lift and post-campaign reports.
2022
Launch on go.mail.ru and mail.ru pages
A Yandex search bar began operating on the go.mail.ru and mail.ru pages; previously, the VK holding used its own search engine technologies. This became known on December 13, 2022. Read more here.
Launch of Y2 with voice-over translation in search video, with search for objects, with the appearance of children's accounts and large topics
On November 28, 2022, Yandex announced the launch of an updated version of the search - Y2. The most important thing in the update: searching for video with voice-over translation, searching for objects, the appearance of children's accounts and large topics. The main page (ya.ru) of the mobile Yandex application is now focused on search.
Homepage and Application
On the Yandex main page and in the application, all attention is directed to the search bar. You can ask a question in any way - by text, by voice or with a picture. To use visual search, just point the smart camera at an object or upload an image from the phone gallery.
Search for objects, not sites
Y2 will help you choose a doctor, online course or product. Now the search not only indexes web pages, but also finds information about objects on them, accumulates it and shows it in search results. A person does not need to study different sites himself to choose a Python course or, for example, find a pediatrician. Everything important can be found in the search results.
The information will be presented in a structured form. For example, for the query "web development courses," Yandex will show a list of programs with the price, duration of training and feedback from students. All that remains is to select the appropriate course and go to the website of the training center. Yandex searches across more than 22 thousand educational courses, 730 million goods and 740 thousand doctors of various specialties.
Big topics
Y2 helps with tasks that cannot be dealt with in five minutes, such as organizing a wedding or buying a home or a car. Yandex has learned to detect, from individual queries, a long-term interest in a particular topic and to structure the information a person has collected so that it can be returned to in one click. A block appears on ya.ru and in the application, where it is easy to find the history of your queries on the topic, saved articles and hints about what else to pay attention to.
Yandex search highlights such big topics as education, career, finance, family, apartment, renovation, and so on. To recognize interest in a particular topic, Yandex uses a complex classifier based on the generative neural network YaLM.
Video with translation
Y2 has filters for searching for videos with automatic voice-over translation into Russian. For example, to cook onion soup, you can watch a French video recipe with voice-over translation. Translation of videos from English, German, French, Italian and Spanish is already available. It works in the Yandex application and in the Browser.
Yandex is able to translate video quickly, even if it is streaming. Now you can watch international conferences live or, for example, interviews with foreigners. Moreover, it will be a multi-voice machine translation based on the most complex technologies.
Children's account
The older the child, the more questions he asks the search engine. So that children can explore the world in a safe environment, children's accounts have appeared in Y2. They protect the child from age-inappropriate content, for example, adult sites or obscene language. You can link a children's account to your own. It will work not only in search, but also on other Yandex services: in particular, in Music and on Kinopoisk.
Search for programmers
Y2 gives better answers to questions from developers and ML specialists. To do this, the CS YATI transformer neural network was introduced into the search; it was trained on programming queries and on assessments by developers. This neural network takes into account one and a half times more parameters than the YATI launched two years earlier. In the future, Yandex will use the updated transformer to improve responses to other highly specialized queries.
Improve Search with CS YATI Neural Network
Yandex on September 21, 2022 announced an improvement in search using the CS YATI neural network - an updated model trained on documents for IT specialists and assessments by programming experts. Search results for developers and ML specialists have become better, and navigation on requests is convenient.
The updated model takes into account one and a half times more information from the page than its previous version, YATI. The updated transformer neural network analyzed many search queries and the sites shown for programming-related queries. This helps it to better assess the quality of a document and its relevance to the query. Having processed terabytes of programming documents and the search history of experts, CS YATI also learned to predict the clicks of qualified programmers in order to return the most relevant response.
Yandex has significantly improved the enriched Stack Overflow response. Right in the search results, without going to the site, the user will see additional information: the question itself, the best answer to it and other comments that can be useful to programmers. Yandex also improved the display of snippets for GitHub and NPM, adding useful information there.
"It is known that the lion's share of programming queries are in English. CS YATI was trained mainly on English-language sources. We didn't just improve search for programmers: in the process, we also improved search on English sources," noted Alexey Gusakov, head of the Department of Machine Intelligence and Research.
Start Page ya.ru
Yandex on August 23, 2022 announced that it would abandon the news aggregator and develop ya.ru as its main page. After the closure of the transaction for the sale of media assets - Zen and News - ya.ru will become the main entry point to Search, Mail and other services of the company, including the user's personal account Yandex ID. The company will also release an updated Yandex application with Alice for Android.
The core of the updated homepage and application will be Search. On ya.ru, users will have access to universal search (to search anywhere), search verticals (to find, for example, pictures or goods) and Alice (to get answers quickly).
The existing Yandex application for Android will change its name to Yandex Start. It will function as a browser, and users will be able to select the start page in the settings. The Yandex application for iOS will continue to work as before, but without Zen and News.
Yandex's former media-oriented main page will become the dzen.ru portal after the deal closes and will come under the control of VK. VK will also receive the rights to the technologies and trademarks of Zen and News.
On August 22, the companies signed a binding agreement under which VK buys Zen and News, and Yandex buys the food delivery service Delivery Club. The deal is scheduled to close in the coming months; it requires approval from the Federal Antimonopoly Service[4].
Ability to track and compare prices for items in a search
Yandex announced on August 5, 2022 that the search can now monitor the prices of goods.
Choosing a convenient moment for a planned purchase has become easier. In Yandex's product search, you can now see how the price of an item has changed. The product card shows the minimum price at which the desired item has been sold recently. A price chart helps you decide whether to make the purchase now or whether it makes sense to wait.
In the product card, you can subscribe to price reduction notifications. When a TV or, for example, a coffee machine that the user is interested in becomes cheaper, a notification will be sent.
If the user participates in store loyalty programs, the search will indicate which cashback or how many bonus points will be awarded for the purchase.
At the same time, it also became easier to decide where to buy specifically - thanks to store ratings and reviews from other customers.
With the update, convenient and useful filters appeared in the search. You can only search for the products of your favorite brand or set certain characteristics of the model, and the "You looked" section will help not to miss interesting finds.
"Product search helps you decide on a model, find a really good price, and assess the reliability of the seller. Purchases, especially large ones, take time: you have to study the options and compare offers in different stores. We aim to save the user's time; we also want to simplify the task, remove the routine from it and leave only the pleasure of buying," commented Sergey Lyadzhin, head of ecom in Yandex search.
Yandex launched a search for goods in early 2022. It allows you to compare prices in stores and marketplaces. Personal assistant Alice can also participate in this - suffice it to say "Alice, where is cheaper" and indicate the desired thing. Recently, Alice learned to select electronics and household appliances for users. She asks simple questions, and then offers the most suitable options.
Ability for all advertisers to advertise under the Search bar
Yandex on June 7, 2022 announced that all advertisers can now place ads in cards right under the search bar. Previously, this tool was only available for advertising goods from the categories of electronics, household appliances, household goods and repairs, as well as clothing.
The Product Gallery is one of the most noticeable places for advertising on Yandex. This tool consists of several product cards from different stores, which are located immediately below the search bar. They contain a photo of the product, the price, the domain of the store's site, as well as a discount or promotion, if any. Advertisers who had already tried the Product Gallery received 19% more purchase conversions and reduced advertising costs by a third.
"In the flow of information and with limited time, the user wants to receive a quick response to his request immediately. Thanks to visualization and an advantageous position, the Product Gallery makes it possible to show the user at once what he wants to see, and lets the business reduce the time before an order is placed," said Viktor Gryaznov, head of the department for interaction with key partners at Yandex.
There is a separate auction for placing ads in the Product Gallery. Statistics on placement in the Product Gallery are available in the Report Wizard.
Yandex removed the sites of Instagram, Facebook and publications blocked in Russia from the search
On April 18, 2022, Yandex announced the removal of official Instagram and Facebook sites from search results (banned in Russia; belong to Meta, which is recognized as extremist in the Russian Federation). VK did the same.
According to TASS, citing the Yandex press service, when the names of the social networks are entered in the Yandex search bar, a message now appears: "Some links are missing in the search results due to the requirements of the legislation of the Russian Federation." In addition, instead of the official site, Yandex search returns a link to the application on Google Play and an article about the social network on Wikipedia, and offers to download an Android application file from third-party services.
"According to the legislation of the Russian Federation, search engines are obliged to exclude links to sites and their "mirrors" as soon as Roskomnadzor enters them into the [prohibited] register," the company explained, adding that synchronization with the registry occurs automatically.
Also, the sites of "Meduza," "Mediazona" and "Present Time" (recognized in Russia as foreign media agents) were removed from the search.
The press service of VK told TASS that the company acts in accordance with the law and the instructions of the regulator.
Earlier in 2022, Roskomnadzor reported that the Russian media should not display the logos of the Meta organization and its social networks Facebook and Instagram. Social networks Facebook and Instagram are banned in Russia by court order for extremism. At the same time, the court decision banning the activities of Meta (Facebook and Instagram) in Russia does not apply to the WhatsApp messenger.[5]
2021
Yandex has changed the rules for working with video hosting to combat piracy
From the beginning of 2022, Yandex will exclude from search results video partners who have not signed anti-piracy agreements. This became known on December 2, 2021.
Yandex search indexes material posted in the public domain, but does not have the ability to check its legality. Video hosting sites hosting content can do this. After signing the agreement, the partner will be responsible for its content and undertakes to take the necessary measures to prevent the distribution of pirated materials.
As of December 2021, Yandex.Video shows about 30 video players of other sites in search results. At the end of 2020, the largest third-party video players in terms of views were YouTube, Rutube, Odnoklassniki, VKontakte, Mail.ru.
VK Video does not intend to conclude an agreement with Yandex and is working to open access to video content "without the help of search engines." The company said that after the launch of the combined VK Video platform over the past week, a multiple decrease in video views from Yandex was recorded.
Yandex had planned for this policy to take effect in November 2021, but to accommodate partners who did not have time to sign the agreement, it moved the deadline to the beginning of 2022[6][7].
Russian authorities forcibly made Yandex a default search engine on all gadgets
For the third time in the last two months, the Russian authorities have expanded the list of pre-installed programs for devices sold in Russia. They also approved Yandex as the main search engine selected by default. This became known on September 28, 2021. Read more here.
Cardinal update of the search engine
On June 10, 2021, Yandex announced a radical update (it was called Y1) of its search engine. According to the developers, they have implemented more than 2,100 improvements. The five most noticeable of them are listed by the company itself:
Search for video fragments
From June 10, 2021, for a query such as "how to cook tuna steak," Yandex will show a video recipe and will also offer to start the video right at the point where the essential part is explained. To find the desired fragment, the search compares the meaning of the query with the content of the video: both the picture and the audio track.
Quick answers
There are more quick answers in Yandex search, and they have become more diverse. Users can now ask "how to tame a horse in Minecraft" or "hidden iOS features" and get an answer right in the search results. In this update, Yandex used the generative neural networks YaLM for the first time; they can compose texts in Russian and help provide answers in search and in the voice assistant "Alice."
Smart camera
The application has an updated "smart" camera that can recognize objects, tell you how much they cost and where they can be bought, translate from foreign languages and automatically improve scans of documents. The company noted that the camera has become five times better at recognizing objects in the frame in real time.
Reviews of organizations
People often choose cafes, shops and other organizations based on reviews. In order for people to make a decision faster, Yandex began to analyze reviews, summarize them and show a visual rating scale in the search results.
Caller ID and spam blocking
Users of the Yandex application on iOS and Android can turn on automatic caller ID to get rid of unwanted calls. The company said that in July 2021 the service would learn not only to identify such calls, but also to block or mute them automatically.
2020: AppGallery Availability
On April 8, 2020, it became known that Yandex's mobile applications had been fully integrated into the AppGallery app store. Read more here.
2019: Instant, accurate search and help from people
Yandex launched at the end of December 2019 an update to a search engine called Vega. This was reported to CNews by the press service of the company. Compared to the old version, more than 1.5 thousand improvements have been added to Vega over the past year, Yandex notes[8].
In particular, the search engine now provides more accurate and faster responses to queries, and the search algorithm is trained taking into account signals from assessment experts. There is also the possibility of a hyperlocal search in a specific microdistrict, quarter or even house.
The head of the Yandex search portal, Andrei Styskin, noted that the update includes a new system for storing web documents, technology for pre-loading search results and other solutions. According to him, the share of Yandex search on all platforms currently reaches 57.9%, and the share on Android for the year showed an increase of 4.8 percentage points and amounted to 54.7%.
The search database is now compiled using neural networks that sort web documents by "semantic clusters," combining documents that are similar in meaning. Focusing on the meaning of the search query, the system searches for answers not in the entire database, but in suitable clusters. This helps save time and computing resources. The resulting surplus of resources made it possible to double the volume of the base. Thanks to this, even those pages that users access once or twice a year are now falling into search results.
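The idea of looking only inside suitable clusters can be sketched as a two-stage lookup: first pick the clusters closest in meaning to the query, then rank only the documents inside them. Everything concrete here is an assumption made for illustration: the 2-D "meaning" vectors, the cluster names and the use of cosine similarity are not taken from the source, which does not describe how Yandex represents meaning.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical clusters with toy 2-D vectors standing in for learned embeddings.
clusters = {
    "cooking": {
        "centroid": (1.0, 0.0),
        "docs": {"soup recipe": (0.9, 0.1), "steak guide": (0.8, 0.3)},
    },
    "rivers": {
        "centroid": (0.0, 1.0),
        "docs": {"river hydrology": (0.1, 0.9)},
    },
}


def search(query_vec, clusters, top_clusters=1):
    """Rank documents from the `top_clusters` clusters nearest to the query."""
    nearest = sorted(
        clusters,
        key=lambda name: cosine(query_vec, clusters[name]["centroid"]),
        reverse=True,
    )[:top_clusters]
    candidates = {}
    for name in nearest:
        candidates.update(clusters[name]["docs"])
    return sorted(candidates, key=lambda doc: cosine(query_vec, candidates[doc]), reverse=True)


results = search((0.95, 0.05), clusters)
```

Only the documents of the selected cluster are scored, which is where the saving in time and computing resources mentioned above would come from in such a scheme.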
Pre-rendering technology was built into mobile search - pre-loading search results. The technology is trying to predict the full text of the search query at the stage when the user typed only the first words. Based on the results of this forecast, the search engine generates search results in advance and shows it immediately after pressing the "Find" button. This approach saves time, especially if the user's Internet is slow.
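A minimal sketch of this pre-rendering flow is given below. The `Prerenderer` class, the popularity table used to guess the full query and the `fetch_results` callback are all hypothetical; the source does not say how Yandex predicts the query or where the prepared results are kept.

```python
class Prerenderer:
    """Guess the full query from a prefix and prepare its results in advance."""

    def __init__(self, popular_queries, fetch_results):
        self.popular_queries = popular_queries  # query text -> how often it was asked
        self.fetch_results = fetch_results      # function that actually runs the search
        self.cache = {}                         # predicted query -> prepared results

    def predict(self, prefix):
        """Return the most popular known query starting with `prefix`, or None."""
        matches = [q for q in self.popular_queries if q.startswith(prefix)]
        return max(matches, key=self.popular_queries.get) if matches else None

    def on_keystroke(self, prefix):
        """Called while the user is typing: prepare results for the predicted query."""
        query = self.predict(prefix)
        if query is not None and query not in self.cache:
            self.cache[query] = self.fetch_results(query)

    def on_submit(self, query):
        """Called when the user presses "Find": reuse prepared results if the guess was right."""
        if query in self.cache:
            return self.cache[query]
        return self.fetch_results(query)


calls = []


def fake_fetch(query):
    calls.append(query)
    return [f"result for {query}"]


prerenderer = Prerenderer({"how to cook tuna steak": 50, "how to cook rice": 30}, fake_fetch)
prerenderer.on_keystroke("how to cook")
answer = prerenderer.on_submit("how to cook tuna steak")
```

When the prediction is correct, the search itself has already run by the time the button is pressed, so the user waits only for the results to be displayed; a wrong guess simply falls back to an ordinary search.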
Hints shown under the search bar in the results can answer the user's question without a visit to any site. Yandex reports that over the past year the number of cases in which the hint alone gave the user enough information grew by 20%.
Yandex has also improved Turbo pages, a technology designed for site owners. These are special versions of web pages that are loaded when users go to a site from the search engine on mobile devices. The point of the technology is that Turbo pages load quickly: currently 15 times faster than the usual mobile version of the site. In 75% of cases, the necessary information is shown to the user in less than a second.
In addition, Vega has launched a new ranking algorithm that relies on people: expert assessors. Each assessor is a carefully selected specialist in a particular field. For example, a hydrologist by profession knows what information is best shown to a user for the query "formation of two-story rivers." Drawing on this expertise, the assessor rates how well the search results cover the topic of the query. For training the new algorithm, expert ratings are a more important signal than any other.
Yandex also announced the launch of the Yandex.Q service, where users can put questions to scientists, professionals and people who simply know a topic well, and get answers from them. The service combined the capabilities of TheQuestion and Yandex.Connoisseurs. Yandex shows the experts' answers in search results for relevant queries. The company assures that the language of the answers will be clear to the average user.
Vega also supports hyperlocal search, which takes into account the microdistrict in which the user is located. For the same purpose, the District and Services services were updated. "District" is a social network for neighbors, and it now has chats for residents of the same building. In a chat, you can ask whether anyone has seen a runaway cat or discuss a broken elevator. "Services" is a service for finding specialists, and it can now display offers on a map. Through this service, you can find a tutor for a child or call a plumber who can come quickly.
2017: Yandex search learned to match the meaning of a query and a web page
On August 23, Yandex launched a new version of search. It is based on the Korolev search algorithm, which uses a neural network to compare the meaning of a query with the meaning of a web page. Thanks to this, search understands what exactly the user needs and answers difficult questions even more accurately. The updated search also makes wider use of search statistics and takes into account the ratings given by Yandex.Toloka users.
According to the company, Yandex took its first step toward searching by meaning a year earlier, when it introduced the Palekh algorithm. Palekh is based on a neural network that compares the meaning of the query and the title of the web page in real time. At that point Yandex taught the neural network to transform search queries and web page titles into groups of numbers called semantic vectors.
The Korolev search algorithm compares the semantic vectors of search queries with those of entire web pages, not just their titles. This allows search to reach a new level of understanding of meaning, the company emphasized. Because this is a difficult computational task, Yandex determines the essence of each page in advance, at the indexing stage. Thanks to this, the number of pages that search compares in meaning with the query has grown from 150 documents to 200 thousand. Another important feature of Korolev is that, in addition to comparing the meaning of the query and the page, it also takes into account the meaning of other queries that bring people to that page.
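The split between indexing time and query time can be sketched as follows. This is only an illustration under strong assumptions: the three-dimensional word vectors are invented by hand, a page vector is the plain average of its word vectors, and closeness is measured with cosine similarity. In the real system the vectors are produced by a neural network whose details the source does not give.

```python
import math

# Hypothetical hand-made word vectors; a real system would learn them.
WORD_VECTORS = {
    "painting": [1.0, 0.0, 0.0],
    "picture": [0.9, 0.1, 0.0],
    "sky": [0.6, 0.0, 0.4],
    "swirls": [0.7, 0.0, 0.3],
    "gogh": [1.0, 0.0, 0.1],
    "starry": [0.7, 0.0, 0.4],
    "night": [0.5, 0.0, 0.5],
    "cat": [0.0, 1.0, 0.0],
    "manul": [0.0, 0.9, 0.1],
    "mongolia": [0.0, 0.6, 0.4],
}

def embed(text):
    """Average the vectors of the known words; unknown words are ignored."""
    vecs = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Indexing stage: page vectors are computed once, in advance.
DOCUMENTS = {
    "vangogh.example": "starry night painting gogh",
    "manul.example": "manul cat mongolia",
}
INDEX = {url: embed(text) for url, text in DOCUMENTS.items()}

def rank(query, index=INDEX):
    """Query time: embed only the query and compare it with stored page vectors."""
    q = embed(query)
    return sorted(index, key=lambda url: cosine(q, index[url]), reverse=True)
```

Here the query "the picture where the sky swirls" shares no words with the first page, yet it ranks first because the vectors are close. Since page vectors are stored in the index, the cost per query is one embedding plus cheap vector comparisons.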
For a neural network to assess the semantic closeness of a query and a document, it needs a huge number of examples. Such examples, according to Yandex representatives, come from anonymized search statistics: which sites people go to for a given query and how much time they spend there. If a person went to a page and stayed on it for a while, the page is most likely close in meaning to the query. Using the search statistics of millions of people, Yandex learns to understand semantic connections. For example, it will understand that the query [the picture where the sky swirls] is about a Van Gogh painting, and the query [the lazy cat from Mongolia] is about the manul, the company explained.
"Search is a very complex system. Thousands of engineers are working to make it understand a person and help solve his problems. In Korolev, we combined machine intelligence and the efforts of millions of people. Our users improve search with us by asking questions and helping to train our algorithms," said Andrey Styskin, head of Yandex Search.
Training the search engine also requires assessments of answer quality, and the more complex the system, the more assessments are needed. Previously, Yandex assessed search quality with the help of its own experts, the assessors. Now it also takes into account ratings from users of Yandex.Toloka, a service where anyone can complete tasks and be paid for them.
Assessors
Besides the current ranking formula and data on user preferences, Yandex has special people, assessors, whose task is to assess relevance. They help measure the absolute quality of search at a given moment and how it would change if a new amendment were introduced into the ranking formula.
There are at least two reasons to use manual assessor ratings. First, people's behavior is deceptive: someone may be searching for, say, a history essay, yet find it more interesting to click through to porn sites. Second, site authors deceive: they can create the appearance that a site has certain content when in fact it does not. A snippet, the fragment of the site that the search engine shows in the results, does not always make it clear whether the site is suitable. The user went to the site and spent some time there, but whether he found what he needed, Yandex does not know and can only guess.
Another important problem in assessing quality is rare queries for which there are no statistics, the so-called long tail. There are a great many of them: as of June 2013, about 30-40 percent of all queries were ones that no one had ever asked before. Therefore, without live assessors, it is impossible to understand how well search works.
Assessors evaluate neither whole results pages nor individual URLs. They evaluate query-URL pairs, and the query carries information about the user's location, which is taken into account in the rating. A site that is relevant to the query "sushi restaurant" in Yekaterinburg will, roughly speaking, be irrelevant in Novosibirsk, and vice versa.
To measure search quality, developers pass a random sample of queries to assessors, who evaluate query-URL pairs and give them one of the following ratings:
- "vital,"
- "important,"
- "relevant" or
- "irrelevant."
Each of the ratings corresponds to a certain probability that a person will find what he needs on this site.
"Vital" means, for example, the VKontakte site in response to the query "VKontakte," or the Wikipedia article about the proboscidean mammal in response to "elephant wikipedia article." A vital URL is one that has no reasonable alternatives, when it is completely clear where the user wants to go. By contrast, there can be several useful URLs for one query: for "weather," these are Gismeteo, Yandex.Weather and several other sites, each of which receives the same rating.
"When evaluating sites, in no case do we give preference to our own services," said Andrei Styskin, head of the Yandex ranking department, in June 2013[2].
Pfound metric
Given a ranked results page on which every URL has been rated by assessors, the developers evaluate search quality using a special metric called pfound. It calculates the probability that a person found what he was looking for on the results page by summing such probabilities over the URLs, with each of the four assessor ratings assigned its own probability of usefulness. The summation takes into account that the probability of a given line being useful must be multiplied by the probability that it is read at all: the user may have already found what he needs in a previous line, or he may simply get tired and stop reading the list. The resulting probability summation formula allows developers to evaluate the quality of search, both their own and that of competitors.
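The summation just described can be written down as a short sketch. The source gives neither the probabilities assigned to the four ratings nor the chance that the user gives up, so the numbers below are purely illustrative assumptions, as are the names `P_USEFUL`, `P_QUIT` and `pfound`.

```python
# Illustrative values only: the source does not state the real probabilities.
P_USEFUL = {"vital": 0.6, "important": 0.4, "relevant": 0.15, "irrelevant": 0.0}
P_QUIT = 0.15  # assumed chance that the user tires and stops after any line

def pfound(ratings, p_useful=P_USEFUL, p_quit=P_QUIT):
    """Probability that the user finds what he needs on a ranked results page.

    ratings: assessor ratings of the results, from the top line down.
    """
    total = 0.0
    p_look = 1.0  # the first line is assumed to always be read
    for rating in ratings:
        p_rel = p_useful[rating]
        # Usefulness of this line, weighted by the chance it is read at all.
        total += p_look * p_rel
        # The next line is read only if this one did not help
        # and the user did not give up.
        p_look *= (1.0 - p_rel) * (1.0 - p_quit)
    return total
```

With these numbers, a page whose top result is vital scores higher than a page with the same vital result in second place, so the metric rewards putting the best URL first.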
"User happiness" metric
The pfound metric is tied to a particular query. But a person thinks in terms of tasks, not queries. There are ways to measure whether a person found what he was looking for regardless of the specific query.
In Yandex slang, this metric is called "user happiness." It works like this: a person is given a task, say, to find the heroes of the Battle of Kulikovo. He can enter any queries, reformulate them, read new information and reformulate the queries again. At some point, he finds what he needs and records the answer. Yandex, for its part, tries to minimize the time the person spends on this.
All the experiments the developers have conducted suggest that the happiness metric correlates very well with the pfound metric. The user, of course, behaves in more complex ways than the pfound model implies, but there is so much data that all this complexity averages out.
Matrixnet Machine Learning System
Next, the Matrixnet machine learning system comes into play. It looks for non-obvious dependencies between various page factors and how relevant assessors consider the page to be for a given query.
The following analogy explains how it works. Suppose you need to teach a robot to distinguish tasty apples from tasteless ones. The robot cannot taste and cannot cope with this task by itself, but a person can be asked to sort a test set of apples into a tasty pile and a tasteless pile. Given these piles, the robot can link the taste of apples to certain outward qualities, for example, skin color or size. Matrixnet performs just such an operation for query-URL pairs: it looks for non-obvious properties of a page that reliably affect its relevance to a specific query.
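The apple analogy can be turned into a toy example. The sketch below is not Matrixnet and does not claim to reproduce its algorithm. It learns a single threshold rule (a "decision stump") from a handful of hand-labeled apples, and the feature names and values are invented for illustration.

```python
# Hypothetical apples sorted by a person into tasty (True) and tasteless (False).
APPLES = [
    ({"redness": 0.9, "size": 7.0}, True),
    ({"redness": 0.8, "size": 5.5}, True),
    ({"redness": 0.7, "size": 8.0}, True),
    ({"redness": 0.3, "size": 7.5}, False),
    ({"redness": 0.2, "size": 5.0}, False),
    ({"redness": 0.4, "size": 6.0}, False),
]

def train_stump(samples):
    """Find the rule 'feature >= threshold means tasty' with the fewest errors.

    Simplification: only this one direction of the rule is tried.
    """
    best = None
    for feature in samples[0][0]:
        for features, _ in samples:
            threshold = features[feature]
            # Count apples that the candidate rule would label incorrectly.
            errors = sum((f[feature] >= threshold) != label for f, label in samples)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]

def predict(rule, features):
    """Apply a learned (feature, threshold) rule to a new apple."""
    feature, threshold = rule
    return features[feature] >= threshold

rule = train_stump(APPLES)
```

On this data the learned rule uses redness rather than size, because redness separates the two piles without errors. In the same spirit, a ranking model learns which page factors go together with high assessor ratings.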
Such machine learning first began to be used in search engines back in 2000, but Matrixnet has certain important advantages over its analogues. For example, it is much more resistant to so-called overfitting. This is the Achilles heel of many machine learning systems: on small samples they find all sorts of meaningless dependencies, for example, between relevance and font color.
So, on the one hand, there is the pfound metric for evaluating search quality, and on the other hand, a machine learning system that tries to maximize this metric. The more rated queries are fed to Matrixnet, the better search will work.
User Testing and Search Personalization
Once the machine learning system has found a non-obvious dependency that improves relevance as judged by assessors, the change to the general ranking formula is rolled out to a portion of users and their reaction is observed. This is done using a technique that, as of June 2013, had been developed recently: FML (friendly machine learning). Simply put, it works as follows: two rankings are taken, one produced by the old formula (S) and one by the new formula (N), and they are mixed by alternating picks, much as teams are chosen in courtyard football. Two versions of the "mixture" are obtained: S1, N1, S2, N2,... and N1, S1, N2, S2,... where S1 is the first URL according to the old formula, N1 is the first URL according to the new formula, S2 is the second URL according to the old formula, and so on. Which "mixture" is shown to a given user is determined at random. In effect, users then vote for one ranking system or the other without knowing it. The developers, of course, run a statistical analysis to see whether the improvement is significant.
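The mixing scheme can be sketched as follows. The alternation and the coin flip follow the description above. Two details are assumptions of this sketch: a URL that appears in both rankings is shown once and credited to whichever formula placed it first in the mixture, and a click counts as one vote for the formula that supplied the URL. The function names are hypothetical.

```python
import random

def interleave(old, new, rng=random):
    """Alternate results of two rankings; a coin flip decides which goes first.

    Returns a list of (url, source) pairs, where source is "old" or "new".
    """
    first, second = ("old", old), ("new", new)
    if rng.random() < 0.5:
        first, second = second, first
    mixed, seen = [], set()
    for i in range(max(len(old), len(new))):
        for source, ranking in (first, second):
            # Assumption: a URL present in both rankings is shown only once.
            if i < len(ranking) and ranking[i] not in seen:
                seen.add(ranking[i])
                mixed.append((ranking[i], source))
    return mixed

def count_votes(mixed, clicked_urls):
    """Credit each clicked URL to the formula that contributed it."""
    votes = {"old": 0, "new": 0}
    for url, source in mixed:
        if url in clicked_urls:
            votes[source] += 1
    return votes
```

Summing such votes over many users gives the raw counts on which the statistical significance check mentioned in the text would be run.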
The pfound metric cannot be used to evaluate personalized search. This is where the method of mixing different ranking results helps: if new personalization-related factors are added to the formula, their effectiveness can be checked in this way.
"Let's say you like to listen to music on one site, and I on another. In personalized results, when you enter the title of a song, you get it on the site you love, and I get it on the one I love. The results differ, but in both cases the formula includes the previous history of page visits. Search results with and without history can be mixed to see which one users like best. Experience shows that users usually like the personalized version very much. This is especially evident in search tasks where a person wants to perform an action on a familiar site, for example, buy something, download something or play an online game. If a person is used to watching movies on a certain site, he readily finds it in the top ten. When the search engine starts to raise such results for him personally, he responds to this very well and quickly finds what he needs. And another person prefers a different hosting site and finds that one," said Andrei Styskin, head of the Yandex ranking department, in June 2013[2].
In 2013, Yandex launched so-called instant personalization technology, in which the query history affects ranking within a single session. How does the width of the time window relate to relevance?
"We don't know this for sure, but we estimate that 30 percent of the gain from personalization comes from taking into account the "long" search history, and 70 percent comes from the short history, within one day."
Making Improvements to the Search Formula
The improvements obtained in this way are then added to the formula. About a hundred such amendments are made per year, a few every two weeks.
For example, by June 2013, Yandex had learned to estimate, for "watch online" queries, the likelihood that the user actually watched something on the page. For video hosting sites, it can determine what percentage of a video the user watched before closing the tab. Clearly, if the video was not watched, it did not really meet expectations.
2010: Release of Yandex.Obninsk search program for processing geo-independent queries
In September 2010, the Obninsk search program, created to process geo-independent queries, left beta testing, Yandex reported in its official blog. The program improves ranking for geo-independent queries, which account for up to 70% of all queries.
The first users of the new ranking were Yandex users in Ukraine and Belarus. The Russian version required additional optimization to achieve maximum performance.
How Search Quality Changes
The quality of search, in the everyday sense of the phrase, is constantly growing both at Yandex and worldwide. But this growth is smooth, without sharp spikes, because quality depends primarily on whether the information the user is looking for exists on the Internet. The Internet is growing, there is more information, and quality grows along with it. Over the past five years (2008-2013), even without taking into account improvements in search technology, the likelihood that the answer to a user's question can be found on the Internet has increased significantly. The quality of the search algorithms themselves is also growing, faster for some search engines and a little slower for others.
Search models, of course, have become much more complex, and what used to seem out of the ordinary is now done by part-time interns. Nevertheless, somehow, miraculously, the growth rate can be sustained. We are constantly introducing new factors and at the same time improving the machine learning system. The combination of the two gives constant growth: since 2011, for example, the quality of search has grown almost linearly. The year 2009 was an exception, with a noticeable jump associated with the introduction of Matrixnet.
"But the main thing is that the world is changing, and people's needs are changing a lot. A good ranking by the complex formula of 2013 would be a poor one for a user from 2005. It is enough to compare how queries like "phone application" have changed in recent years. Users change, they need different things, and therefore the ranking will be different," said Andrei Styskin, head of the Yandex ranking department, in June 2013[2].
Interesting facts
- Try entering "the color of a fainting frog" in Yandex, and you will see what this color really looks like. A wheel with a whole range of other colors will appear as well. There is even the color "electric," by the way. It is close to pale blue.
See also
- Search for Mail.ru
- Google Search
- Search engine
- Baidu
- Panguso.com (Chinese Internet search engine)
- Satellite Search Portal
- Internet search in Russia
- Internet Search (Global Market)
Notes
- ↑ EU’s Google Feud Aids Russian Rival Blamed for Kremlin Lies
- ↑ 2.0 2.1 2.2 2.3 "We are machine learning fans" in
- ↑ Cian.ru and other companies united against Yandex.
- ↑ Yandex's new main page will be ya.ru.
- ↑ Yandex and VK removed official Instagram and Facebook sites from search results
- ↑ Yandex changed rules for working with video hosting to combat piracy, https://www.securitylab.ru/news/527125.php
- ↑ Yandex has extensively updated its search: Now it is instant, accurate and people help it