

Robots take URLs from the list of known (and not yet retrieved) URLs,
and add newly found URLs to that list.
The order in which this is done determines the navigation strategy of the
robot.
- When URLs are added to and taken from the same end of the list,
the robot navigates depth-first.
This generates the best distribution of retrieved documents over the Web.
Going depth-first makes it important to avoid infinite recursive loops.
- When URLs are added to the other end of the list, the robot navigates
breadth-first.
When the initial list is a large list of Web servers (such as the
official list of Web servers), a breadth-first search gives very good
initial results, but fails to penetrate servers deeply:
documents that are several links away from the root of a server
are less likely to be found than documents near the root.
The official list of Web servers can be found at
http://www.w3.org/hypertext/DataSources/WWW/Servers.html.
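The two strategies above differ only in which end of the list the robot
takes URLs from. A minimal sketch of this idea, using a hypothetical
in-memory link graph in place of real Web retrieval (all URLs and link
data below are illustrative, not real):

```python
from collections import deque

# Hypothetical link graph standing in for the Web: each URL maps to
# the URLs found in that document.
LINKS = {
    "http://a/":    ["http://a/1", "http://b/"],
    "http://a/1":   ["http://a/1/x"],
    "http://a/1/x": [],
    "http://b/":    ["http://b/1"],
    "http://b/1":   [],
}

def crawl(start, depth_first):
    """Return URLs in retrieval order.

    Newly found URLs are always appended to the right end of the list.
    Taking from the same (right) end gives depth-first navigation;
    taking from the other (left) end gives breadth-first navigation.
    """
    todo = deque([start])
    seen = {start}                 # guards against infinite recursive loops
    order = []
    while todo:
        url = todo.pop() if depth_first else todo.popleft()
        order.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:   # only add URLs not already known
                seen.add(link)
                todo.append(link)
    return order

print(crawl("http://a/", depth_first=True))
print(crawl("http://a/", depth_first=False))
```

Running both variants from the same start URL shows the contrast: the
depth-first run follows a chain of links away from the start before
returning, while the breadth-first run visits all documents one link
away before any document two links away.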