2019: Google moves to make REP an official standard
In early July 2019, Google open-sourced the C++ library it uses to parse robots.txt files, which contain Robots Exclusion Protocol (REP) rules restricting how search engine robots index websites. With this initiative the company hopes to make REP an official standard for the web.
Robots build the index, the database of the search engine from which the results shown by Google, Yandex and others are drawn. Robots crawl websites and add pages to the index or remove them from it. This indexing can be managed: for example, a special robots.txt file can allow or forbid robots to crawl particular pages.
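For illustration, such rules are expressed as simple directives in a robots.txt file placed at the site root. The following sketch uses hypothetical paths and is not taken from the article:

```
# Hypothetical robots.txt at https://example.com/robots.txt
User-agent: *           # rules below apply to all crawlers
Disallow: /private/     # forbid crawling of this directory
Allow: /private/help    # but permit this specific path
```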
REP had been a de facto standard for 25 years, which left developers and site owners free to interpret the protocol as they saw fit. Moreover, the protocol was never updated to reflect modern realities. By releasing its robots.txt parser under the Apache License 2.0 and submitting the REP specification to the Internet Engineering Task Force (IETF), Google aims to reduce the differences between implementations.
Together with Martijn Koster, the author of the original protocol, as well as prominent webmasters and developers of other search engines, Google submitted recommendations on the use of the Robots Exclusion Protocol. They do not change the basic principles described in the 25-year-old document, but fill in its gaps to account for the features of the modern World Wide Web.
The official documentation will allow webmasters to fill out robots.txt correctly and hide parts of their content from search robots.[1]
Along with the library, Google released the code of a testing tool for validating robots.txt rules. The published code is the same code used in Google's production systems that process robots.txt.
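A minimal sketch of how the open-sourced parser can be used is shown below. It assumes the googlebot::RobotsMatcher class and its OneAgentAllowedByRobots method exposed through the library's robots.h header, as published in the github.com/google/robotstxt repository; verify names and signatures against the current source. The robots.txt body, user agent and URL are hypothetical examples.

```cpp
// Sketch only: assumes the robots.h header from Google's open-sourced
// robotstxt parser and its googlebot::RobotsMatcher class.
#include <iostream>
#include <string>

#include "robots.h"  // from the google/robotstxt repository

int main() {
  // Hypothetical robots.txt body that blocks one directory for all agents.
  const std::string robots_txt =
      "User-agent: *\n"
      "Disallow: /private/\n";

  const std::string user_agent = "FooBot";                     // hypothetical crawler name
  const std::string url = "https://example.com/private/page";  // URL to test

  googlebot::RobotsMatcher matcher;
  const bool allowed =
      matcher.OneAgentAllowedByRobots(robots_txt, user_agent, url);

  std::cout << (allowed ? "allowed" : "disallowed") << std::endl;  // expected: disallowed
  return 0;
}
```

The same check underlies the released testing tool: given a robots.txt body, a user agent and a URL, it reports whether the URL may be crawled under the parsed rules.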
