2010/04/29 13:26:39

Search algorithms

  • Direct search: a sequential scan of all the data;
  • Inverted index: a list of the document's words (the index file), kept in alphabetical order, with the position and other parameters of each occurrence of a word in the document.
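The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the collection `DOCS` and the function names are hypothetical, and a real index file stores far more than a set of document ids.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> text.
DOCS = {
    1: "the priest went to the market",
    2: "the market sells goods",
    3: "a priest and his worker",
}

def direct_search(word, docs):
    """Direct search: scan every document in full on each query."""
    return sorted(doc_id for doc_id, text in docs.items() if word in text.split())

def build_inverted_index(docs):
    """Map each word to the set of documents that contain it (built once)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index

INDEX = build_inverted_index(DOCS)

def index_search(word, index):
    """Inverted index: answer a query with a single dictionary lookup."""
    return sorted(index.get(word, set()))
```

Both functions return the same document ids, but `direct_search` rereads the whole collection on every query, while `index_search` pays that cost once, when the index is built.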

Content

Inverted index

As you have probably guessed, search engines use the inverted index algorithm, because direct search is far more resource-intensive. Restoring a document from the inverted index is lossy (cases, hyphens, commas and so on are lost). Therefore the direct index of the document is also stored, so that a snippet (the fragment of the found document's text shown by the search engine) can be displayed.
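A minimal Python sketch of why both representations are kept, under simplifying assumptions: `normalize` stands in for whatever reduction an indexer applies (here, lowercasing and dropping punctuation), and `make_snippet` is a hypothetical helper that cuts a fragment from the stored original text.

```python
import re

def normalize(text):
    """Reduce text to the lowercase words an index might keep."""
    return re.findall(r"\w+", text.lower())

def make_snippet(original, word, width=20):
    """Cut a fragment of the stored original text around the first match."""
    match = re.search(re.escape(word), original, flags=re.IGNORECASE)
    if match is None:
        return ""
    start = max(0, match.start() - width)
    end = min(len(original), match.end() + width)
    return original[start:end]

ORIGINAL = "Once upon a time there was a priest, Tolokonny forehead."

# Rebuilding the text from normalized words loses capitals and punctuation.
restored = " ".join(normalize(ORIGINAL))
```

Here `restored` no longer matches `ORIGINAL`, whereas `make_snippet(ORIGINAL, "priest")` returns a fragment with the original punctuation intact, because it reads from the stored text rather than from the index.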

Example

Document

Once upon a time there was a priest,
Tolokonny forehead.
The priest on a market went
Look which-what to goods.


Inverted index of the document

market (3, 4); was (1, 2); lived (1, 1); what (1, 1); which (4, 2); forehead (2, 1); on (3, 3); priest (1, 3), (3, 2)

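The parameters shown are the most primitive ones and serve only as an example: (line, position in the line). The words appear in the alphabetical order of the Russian original, whose wording the positions refer to. In practice, the parameters also store the case of each word and the passage it belongs to.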
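An index with such (line, position) parameters could be built roughly as follows. This is a sketch, not the method of any particular search engine: `build_positional_index` and the two-line `DOCUMENT` are hypothetical, and both numbers are counted from 1.

```python
from collections import defaultdict
import re

def build_positional_index(text):
    """Return {word: [(line, position), ...]}, both counted from 1."""
    index = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        words = re.findall(r"\w+", line.lower())
        for pos, word in enumerate(words, start=1):
            index[word].append((line_no, pos))
    # Sort the words alphabetically, as in an index file.
    return dict(sorted(index.items()))

# Hypothetical two-line document (not the one from the example above).
DOCUMENT = "The priest went to the market\nThe market was busy"
POSITIONS = build_positional_index(DOCUMENT)
```

A word that occurs several times, such as "market" here, gets one (line, position) pair per occurrence.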

Mathematical model

Search uses three types of mathematical models:

  • Boolean (logical): if the word is present, the document is found; if it is absent, the document is not found;
  • Vector (used by all search engines): document weight = TF * IDF

TF is the frequency of the word in the document; IDF is the rarity of the word in the collection.

  • Probabilistic: the results are selected manually (by assessors), who independently determine the relevance of pages.
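The first two models can be illustrated with a short Python sketch. The text only defines TF as frequency and IDF as rarity, so the exact formulas below are assumptions: TF as the share of the document's words, and IDF as log(N / number of documents containing the word), a common textbook form. The collection and function names are hypothetical, and the probabilistic model is not covered.

```python
import math

# Hypothetical toy collection, already split into words.
COLLECTION = [
    ["priest", "went", "to", "market"],
    ["market", "sells", "goods"],
    ["priest", "hired", "worker", "priest"],
]

def boolean_match(word, doc):
    """Boolean model: the document is either found or not."""
    return word in doc

def tf(word, doc):
    """Share of the document's words that are `word` (assumed form)."""
    return doc.count(word) / len(doc)

def idf(word, collection):
    """Rarity of `word`: log(N / documents containing it) (assumed form)."""
    containing = sum(1 for doc in collection if word in doc)
    return math.log(len(collection) / containing) if containing else 0.0

def weight(word, doc, collection):
    """Vector model weight of `word` in `doc`: TF * IDF."""
    return tf(word, doc) * idf(word, collection)
```

With these definitions, "priest" weighs more in the third document, where it makes up half of the words, than in the first, where it makes up a quarter. A word found in every document would get a weight of zero.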

Links

SEO: Theory. Part 1: Algorithms