


Words may not have the same importance in documents and in queries.
(See also Determining Relevance in General)
- In an index-database word occurrences may have an associated weight,
depending on how often the words appear in the document.
- The weight of a word should also depend on how often the word appears
in the entire database, but many index-databases do not take this into account.
- In a query the user may consider some words more important than others.
These weights may be different from the ones in the database.
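
As a rough illustration of the first two points, the sketch below weights each word by its count in the document multiplied by the logarithm of (number of documents / number of documents containing the word). This is a common textbook scheme, assumed here for illustration only; it is not claimed to be what any particular search tool uses. The function name term_weights and the sample documents are made up.

```python
import math
from collections import Counter

def term_weights(documents):
    """Return one {word: weight} dict per document (hypothetical helper).

    weight = (occurrences in the document)
             * log(number of documents / documents containing the word)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Database-wide statistic: in how many documents does each word appear?
    doc_freq = Counter(word for words in tokenized for word in set(words))
    weights = []
    for words in tokenized:
        counts = Counter(words)  # per-document statistic
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

# Made-up sample database of three tiny documents.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the index stores word weights",
]
weights = term_weights(docs)
```

In this sample, "the" appears in every document, so its weight is zero no matter how often it occurs in one document, while "mat", which appears in only one document, gets a higher weight than "cat", which appears in two.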
None of the popular search tools provide an explicit way to control the
weight of search terms. (WebCrawler allows a trick by repeating search terms.)
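
Why repeating a term might act as a weight can be sketched as follows, assuming a search tool that simply counts how often each word occurs in the query and multiplies that count by the word's count in the document. This scoring rule, the function names and the sample documents are assumptions made for illustration; it is not a description of how WebCrawler actually works.

```python
from collections import Counter

def query_weights(query):
    """Weight each query word by how often the user typed it (assumed rule)."""
    return Counter(query.lower().split())

def score(query, document):
    """Sum over query words of (query weight * occurrences in the document)."""
    doc_counts = Counter(document.lower().split())
    return sum(w * doc_counts[word] for word, w in query_weights(query).items())

# Made-up documents: doc_a is mostly about cars, doc_b mostly about jaguars.
doc_a = "used car prices and car loans for every car buyer including jaguar"
doc_b = "the jaguar is a cat and the jaguar hunts near a car park"

plain = "jaguar car"
boosted = "jaguar jaguar jaguar car"  # "jaguar" repeated to raise its weight
```

With the plain query, doc_a scores higher because "car" occurs in it three times. With "jaguar" repeated, doc_b overtakes it, which is the effect a user would hope for from the repetition trick.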
The figure below shows that words that occur very frequently or
very rarely should not receive a large weight. Words that occur often
enough to match a reasonable number of documents, but not so often
that they match thousands of documents, should receive a larger weight.
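
One way to express such a weighting in code is a function that peaks at a moderate number of matching documents and falls off on both sides. The sketch below is only an illustration of that shape: the function band_weight, the bell curve on a logarithmic scale, and the values of target and spread are all assumptions, not taken from the figure.

```python
import math

def band_weight(doc_freq, target=50, spread=1.5):
    """Weight that peaks when a word matches about `target` documents.

    `target` and `spread` are made-up tuning values for this sketch.
    """
    if doc_freq <= 0:
        return 0.0
    # Distance from the target on a logarithmic scale, so that matching
    # 10 times fewer or 10 times more documents is penalized equally.
    distance = math.log(doc_freq / target)
    return math.exp(-(distance ** 2) / (2 * spread ** 2))

rare = band_weight(1)         # word found in a single document
useful = band_weight(50)      # word found in a moderate number of documents
common = band_weight(50000)   # word found in tens of thousands of documents
```

Here the moderately frequent word gets the full weight of 1.0, while both the very rare and the very common word get a much smaller weight.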