Tutorial: Robot Technology and Applications

Determining Relevance

Search tools like the fish-search retrieve information from the entire contents of (text) documents. This allows the most accurate judgement of whether a document is relevant to the user's query. The fish-search uses a simple algorithm to evaluate the relevance of documents:

The user can choose whether all of the given words must occur or only some of them (Boolean AND versus OR).
If some (or, if chosen, all) of the words occur in a document, the document is considered relevant.
If more than one of the given words occurs, the document is more relevant.
If the words occur often relative to the size of the document, the document is more relevant.
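The four rules above can be sketched in Python as follows. This is a minimal illustration, not the fish-search's actual code: the names `tokenize` and `relevance`, the lowercase word tokenizer, measuring document size in words, and the way the signals are combined (number of distinct query words found, plus their share of all words as a tie-breaker) are all assumptions.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a simplifying assumption, not the fish-search tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance(text: str, words: list[str], require_all: bool = False) -> float:
    """Hypothetical scorer: 0.0 means irrelevant, higher means more relevant."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    query = list(dict.fromkeys(w.lower() for w in words))  # drop duplicate query words
    counts = {w: tokens.count(w) for w in query}
    present = [w for w in query if counts[w] > 0]
    # Boolean filter: AND needs every word, OR needs at least one.
    if not present or (require_all and len(present) < len(query)):
        return 0.0
    # Rule: more distinct query words found -> more relevant.
    distinct = len(present)
    # Rule: frequent occurrence relative to document size (in words) -> more relevant.
    density = sum(counts[w] for w in present) / len(tokens)
    # density is at most 1, so it only breaks ties between equal `distinct` counts.
    return distinct + density
```

For example, with `doc = "robots crawl the web; robots follow links"`, the query `["robots", "links"]` scores higher than `["robots", "spiders"]`, and the latter scores 0.0 when `require_all=True`.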

Information retrieval research shows that one should also take into account the "average" number of times a given word occurs in documents across the whole collection (i.e. the whole WWW): a word that is common everywhere says little about whether a particular document is relevant. However, the whole collection is not available to the fish-search, so these statistics are not available to it either.
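To illustrate the kind of collection-wide statistic meant here, the following sketch computes tf-idf weights, a standard information retrieval weighting; it is not something the fish-search does. The log(N/df) form of inverse document frequency is one common variant among several, and the names `idf` and `tf_idf` are hypothetical. Note that `idf` needs every document in the collection, which is exactly what a robot exploring the Web does not have.

```python
import math
from collections import Counter

def idf(collection: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency, log(N / df), over a tokenized collection."""
    n_docs = len(collection)
    # Document frequency: in how many documents each word appears at least once.
    df = Counter(word for doc in collection for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}

def tf_idf(doc: list[str], weights: dict[str, float], word: str) -> float:
    # Relative frequency of the word in this document, scaled by its rarity in the collection.
    if not doc:
        return 0.0
    return doc.count(word) / len(doc) * weights.get(word, 0.0)
```

A word such as "the" that occurs in every document gets weight log(1) = 0 and contributes nothing, while a word that occurs in few documents gets a large weight.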

If a regular expression is given instead of a set of words, the fish-search simply counts the number of matches relative to the size of the document. The fish-search uses the agrep syntax and library, which can tolerate errors in the given pattern, so it may find matches even where the document contains typos.
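A rough Python sketch of both ideas follows, under stated assumptions. `regex_density` counts regular expression matches per character of text (measuring size in characters is an assumption). Python's `re` module has no error-tolerant matching, so `approx_match_ends` substitutes a classic dynamic programming method for approximate string matching (Sellers' algorithm) on a plain word with at most `k` edits; agrep itself uses a different, bit-parallel algorithm and also handles patterns. Both function names are hypothetical.

```python
import re

def regex_density(text: str, pattern: str) -> float:
    """Non-overlapping regex matches per character of text (hypothetical helper)."""
    if not text:
        return 0.0
    matches = sum(1 for _ in re.finditer(pattern, text))
    return matches / len(text)

def approx_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Indices in text where a substring matching pattern with <= k edits ends.

    Sellers' dynamic programming algorithm: a stand-in for agrep's error
    tolerance, not agrep's own bit-parallel method. Plain strings only.
    """
    m = len(pattern)
    # prev[i]: fewest edits turning pattern[:i] into a substring ending at the previous text position
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        cur = [0]  # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in the text
                           cur[i - 1] + 1))     # pattern character missing from the text
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```

With `k=0` the word "robot" is not found in "the robbot crawls", but with `k=1` the typo is tolerated. Adjacent end positions in the result may belong to the same approximate occurrence, so the list length can overstate the number of distinct matches.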