Department of Mathematics and Computing Science
World Wide Web Organization
Paul De Bra

Robots: Retrieving Documents

A robot needs to be able to send HTTP requests and receive and analyze HTTP responses. There are three common libraries which make this easy:



The disadvantage of single-threaded libraries like libwww-perl and libwww 2.17 is that a request to an unreachable or very slow server blocks the whole robot: no other document can be retrieved until that request completes or fails. Their main advantage is that they are easy to use.
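A minimal sketch of a blocking, single-threaded fetch may help illustrate the trade-off. It uses Python's standard `urllib.request` as a stand-in, not either of the libraries named above, and the function name `fetch` and its 10-second default timeout are assumptions made for this example. The timeout limits each blocking socket operation, so it bounds the wait for an unreachable server but not the total transfer time from a very slow one.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Fetch one document, blocking the caller until it finishes or fails.

    Returns (status, body) for any HTTP response, or None when the
    server cannot be reached or the URL cannot be handled.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, err.read()
    except OSError:
        # Covers URLError, connection failures, and socket timeouts.
        return None
```

A robot built this way calls `fetch` once per URL in a simple loop, which is easy to write, but every slow server delays all the documents queued behind it.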

The disadvantage of multi-threaded libraries is that it is more difficult to keep track of the outstanding requests and to avoid overloading a server by sending it several requests at the same time. Their main advantages are that they achieve higher overall throughput and that they continue to receive data on the other connections even when one connection is very slow.
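One possible way to handle both concerns is sketched below, assuming Python's standard thread pool rather than any library discussed here. The class name `PoliteFetcher`, the method `fetch_all`, and the parameters `max_workers` and `per_host` are hypothetical names chosen for this example. A semaphore per host caps the number of simultaneous requests to that host, while requests to different hosts proceed in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


class PoliteFetcher:
    """Fetch URLs in parallel, limiting simultaneous requests per host."""

    def __init__(self, fetch, max_workers=8, per_host=1):
        self._fetch = fetch            # callable taking a URL, returning a result
        self._max_workers = max_workers
        self._per_host = per_host
        self._lock = threading.Lock()  # protects the semaphore table
        self._host_slots = {}          # host -> semaphore

    def _slot(self, host):
        # Create the semaphore for a host on first use.
        with self._lock:
            if host not in self._host_slots:
                self._host_slots[host] = threading.Semaphore(self._per_host)
            return self._host_slots[host]

    def _task(self, url):
        host = urlsplit(url).netloc
        # Wait here if the host already has per_host requests in flight.
        with self._slot(host):
            return url, self._fetch(url)

    def fetch_all(self, urls):
        """Return a dict mapping each URL to the result of fetching it."""
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            return dict(pool.map(self._task, urls))
```

The sketch keeps the bookkeeping small at a cost: a worker thread that is waiting for a busy host still occupies a slot in the pool, so a long run of URLs for one host can reduce parallelism. A fuller design would queue requests per host instead.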