Department of Mathematics and Computing Science | World Wide Web Organization | Paul De Bra

Determining Relevance

Search tools like the fish-search retrieve information from the entire contents of (text) documents. This allows the best possible judgement of whether a document is relevant to the user's query. The fish-search uses a simple algorithm to evaluate the relevance of documents.

Information retrieval research shows that one should also take into account the "average" number of times a given word occurs in the document or the whole collection (i.e. the whole WWW). However, the whole collection is not available to the fish-search, so neither are these numbers.
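To illustrate why such collection-wide numbers matter, the sketch below applies one common weighting scheme from the information retrieval literature, inverse document frequency. This is not what the fish-search does. The function names and the three-document toy collection are hypothetical. The point is only that the weight of a word depends on counts over all documents, which a client-side tool cannot obtain for the whole WWW.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def idf_weights(collection):
    """Return log(N / df) per word, where df is the number of documents
    containing the word and N is the number of documents.

    Computing df requires access to the entire collection.
    """
    n_docs = len(collection)
    doc_freq = {}
    for text in collection:
        # Count each word at most once per document.
        for word in set(tokenize(text)):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

# Hypothetical toy collection standing in for "the whole WWW".
collection = [
    "the fish swims in the web",
    "the spider crawls the web",
    "the fish eats",
]
weights = idf_weights(collection)
```

In this toy collection a word that occurs in every document ("the") gets weight 0, while a word that occurs in only one document ("spider") gets the highest weight.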



If a regular expression is given instead of a set of words, the fish-search simply counts the number of matches, relative to the size of the document. The fish-search offers the agrep syntax and library, which can tolerate errors in the given pattern. This may also let it find matches where the document itself contains typos.
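The match-counting idea can be sketched as follows. This is a minimal illustration, not the fish-search code: the function name `regex_relevance` is hypothetical, and measuring document size in whitespace-separated words is an assumption, since the text does not say which unit is used. The sketch uses Python's standard `re` module, which performs exact matching only, so the error tolerance of agrep is not reproduced here.

```python
import re

def regex_relevance(pattern, text):
    """Count non-overlapping regex matches and divide by document size.

    Assumption: size is the number of whitespace-separated words.
    """
    size = len(text.split())
    if size == 0:
        return 0.0  # An empty document cannot be relevant.
    matches = sum(1 for _ in re.finditer(pattern, text, flags=re.IGNORECASE))
    return matches / size

doc = "Fish search follows links; each fish looks for relevant documents."
score = regex_relevance(r"fish", doc)  # 2 matches in a 10-word document
```

Dividing by the size keeps a long document from outranking a short one merely because it has more room for matches.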