


Finding exactly the information a user wants is a two-step process:
- An information server offers tools for selecting information.
This selection may be as primitive as offering listings of document names,
and as complicated as handling natural language questions.
In any case, these tools make it possible to retrieve less than the entire
contents of the server.
- From the retrieved information, the user has to select which documents
are relevant and should be kept, and which are irrelevant and can be
discarded. A filtering process can automate this, at least partially.

The biggest problem with information retrieval on the Internet is limited
network bandwidth, which makes it important to be very selective in
step one, thus leaving less work for step two.
Retrieving as little data from the Internet as possible is crucial for
finding the information the user wants.
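
The two steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory `SERVER_DOCUMENTS` list stands in for a remote information server, and `server_select` and `client_filter` are hypothetical names, not part of any real system. The point is only that a selective first step reduces both the data transferred and the work left for the second step.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    text: str


# Hypothetical in-memory stand-in for a remote information server.
SERVER_DOCUMENTS = [
    Document("gopher-intro", "An introduction to Gopher menus and servers"),
    Document("www-intro", "An introduction to the World Wide Web and hypertext"),
    Document("recipes", "Recipes for bread and soup"),
]


def server_select(query: str) -> list[Document]:
    """Step one: the server returns only documents containing the query term."""
    term = query.lower()
    return [d for d in SERVER_DOCUMENTS if term in d.text.lower()]


def client_filter(docs: list[Document], required: str) -> list[Document]:
    """Step two: the client keeps only the documents it judges relevant."""
    term = required.lower()
    return [d for d in docs if term in d.text.lower()]


# Only the documents selected in step one cross the (simulated) network.
retrieved = server_select("introduction")
# Local filtering then discards what the user does not consider relevant.
kept = client_filter(retrieved, "hypertext")
# Rough measure of how much data step one had to transfer.
transferred_bytes = sum(len(d.text.encode()) for d in retrieved)
```

A real server would offer richer selection tools than a substring match, but the division of labour stays the same: the more precise the query in step one, the smaller `retrieved` is and the less the filter in step two has to discard.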