Inverted index
As you have probably guessed, search engines use the inverted index algorithm, because direct search is far more resource-intensive. Restoring a document from an inverted index is lossy (case forms, hyphens, commas, and so on are lost). For that reason the forward index of the document is stored as well; it is used to display the snippet (the fragment of the found document's text shown in the search results).
Example
Document
Once upon a time there was a priest,
Tolokonny forehead.
The priest went on a market
To look at which-what goods.
Inverted index of the document
market (3.4) was (1.2) lived (1.1) what (1.1) which (4.2) forehead (2.1) on (3.3) priest (1.3) (3.2)
The parameters shown here are the most primitive ones and serve only as an example: (line, position in line). In a real index the parameters also store the case forms of words and the passage each word belongs to.
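A minimal sketch of the idea, assuming the same primitive (line, position in line) parameters. The sample text, the tokenizer (a simple `\w+` regular expression with lowercasing), and the names `build_inverted_index`, `forward_index`, and `snippet` are illustrative assumptions, not the way any particular search engine does it.

```python
import re
from collections import defaultdict


def build_inverted_index(lines):
    """Map each lowercased word to a list of (line, position) pairs, both 1-based."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # Assumption: a word is any run of letters, digits, or underscores.
        words = re.findall(r"\w+", line.lower())
        for pos, word in enumerate(words, start=1):
            index[word].append((line_no, pos))
    return dict(index)


# Forward index: the original lines, kept untouched so that a snippet
# can be shown with its punctuation and capitalization intact.
forward_index = [
    "The priest went to the market",
    "to look at some goods",
]

inverted = build_inverted_index(forward_index)


def snippet(word):
    """Return the original lines that contain `word`, in document order."""
    postings = inverted.get(word.lower(), [])
    line_numbers = sorted({line_no for line_no, _ in postings})
    return [forward_index[n - 1] for n in line_numbers]


print(inverted["priest"])  # postings for one word
print(snippet("market"))   # the snippet comes from the forward index
```

The lookup goes through the inverted index, while the text shown to the user comes from the forward index, because punctuation and capitalization cannot be rebuilt from the postings alone.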
Mathematical model
Search uses three types of mathematical models:
- Boolean (logical): if the word is present, the document is found; if it is absent, the document is not found;
- Vector (used by all search engines): document weight = TF * IDF
where TF is the frequency of the word in the document and IDF is the rarity of the word in the collection
- Probabilistic: search results are selected manually (by assessors), that is, the relevance of pages is determined independently
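The Boolean and vector models above can be sketched in a few lines. The text only says that TF is a frequency and IDF is a rarity, so the exact formulas here are an assumption: one common variant takes TF as the word count divided by the document length and IDF as the logarithm of the collection size divided by the number of documents containing the word. The function names and the tiny collection are hypothetical.

```python
import math


def boolean_match(word, doc):
    """Boolean model: the document is found if and only if it contains the word."""
    return word in doc


def tf(word, doc):
    """Term frequency: share of the document's words equal to `word`."""
    return doc.count(word) / len(doc)


def idf(word, collection):
    """Inverse document frequency: the rarer the word, the larger the value."""
    containing = sum(1 for doc in collection if word in doc)
    if containing == 0:
        return 0.0  # assumption: a word absent from the collection gets zero weight
    return math.log(len(collection) / containing)


def tf_idf(word, doc, collection):
    """Vector model weight of `word` in `doc`: TF * IDF."""
    return tf(word, doc) * idf(word, collection)


# Hypothetical collection: each document is a list of already tokenized words.
docs = [
    ["the", "priest", "went", "to", "the", "market"],
    ["the", "priest", "looked", "at", "the", "goods"],
    ["the", "market", "was", "full"],
]

for word in ("the", "priest", "went"):
    print(word, tf_idf(word, docs[0], docs))
```

In this collection "the" occurs in every document, so its IDF is zero and it adds nothing to the weight, while "went" occurs in a single document and outweighs "priest", which occurs in two.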