Department of Mathematics and Computing Science
World Wide Web Organization
Paul De Bra

Robots Retrieving the entire World Wide Web

Even assuming a robot can find all documents on the Web, the speed of the network may prevent it from retrieving them all in a reasonable amount of time:


There is no standard way to know when a document becomes obsolete, other than to try to download it again and check whether it still exists. A conditional GET (an HTTP GET request carrying an If-Modified-Since header) can be used to avoid the actual download if the document has not been modified. Even the largest search tools, like Lycos and Alta Vista, need a few weeks between visits to the same documents.
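
As an illustration of the conditional GET mentioned above, here is a minimal sketch using only the Python standard library. The function names (build_conditional_request, check_document) and the three result labels are assumptions made for this example, not part of any particular robot. The sketch assumes the robot has stored the time of its previous visit as a Unix timestamp.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_visit_epoch):
    """Build a GET request that asks for the body only if the document
    changed after last_visit_epoch (hypothetical helper for this sketch)."""
    request = urllib.request.Request(url)
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    request.add_header("If-Modified-Since",
                       formatdate(last_visit_epoch, usegmt=True))
    return request


def check_document(url, last_visit_epoch, timeout=10.0):
    """Return 'modified', 'unchanged', or 'gone' for a previously seen URL."""
    request = build_conditional_request(url, last_visit_epoch)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()  # the server sent a fresh copy
            return "modified"
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: no body was transferred
            return "unchanged"
        if err.code in (404, 410):  # document no longer exists
            return "gone"
        raise
```

Note that the robot still has to contact the server once per document; the conditional GET only saves the transfer of unchanged content, so revisiting the whole Web remains slow.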

There is no standard way to know when and where new documents are created, other than to try to download the whole Web over and over and look for new links. This means that the robots of major search tools take a long time to discover new documents.
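
The "look for new links" step can be sketched as follows, again with the Python standard library only. The names LinkExtractor and find_new_links are hypothetical, and the sketch assumes the robot keeps a set of URLs it already knows about; it is one possible way to detect new documents, not a description of how any specific search tool works.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute targets of all <a href="..."> links in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the "#fragment" part,
                # since fragments point inside the same document.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.add(absolute)


def find_new_links(page_url, html_text, known_urls):
    """Return the links on a page that are not yet in known_urls."""
    extractor = LinkExtractor(page_url)
    extractor.feed(html_text)
    return extractor.links - set(known_urls)
```

A new document is only found this way once some already-known page links to it and the robot downloads that page again, which is why discovery takes so long.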