In the WWW step 1 of the information retrieval process (finding the documents) is the most difficult. In order to create an "index-database" a list of pointers to documents needs to be built. Since the WWW uses hypertext, the pointers to documents are "hidden" inside the documents themselves. Almost all search-facilities for the WWW use the following approach:
The problem of deciding when to stop, or when to not retrieve a document, is inherently undecidable. When retrieving documents from a WWW server it is impossible to detect whether this is a "static" document, or whether it is generated by a program, and therefore may change even over short periods of time. Often, such dynamic documents contain a link to a document with the same name (but possibly with a different contents). There is an inherent danger of going into an infinite loop retrieving (different) documents, all with the same name. This happens for instance with a document that shows the current time and that has an 'update' button.