

A robot needs to be able to send HTTP requests and to receive and analyze
HTTP responses. There are three common libraries that make this easy:
- libwww-perl is a Perl-4 HTTP library written by Roy Fielding and
available from the Univ. of California at Irvine, at
http://www.ics.uci.edu/WebSoft/libwww-perl/.
Many robots are written in Perl and use this library.
The library is single-threaded and supports the
robot exclusion protocol.
- libwww 2.17 is the HTTP library used by NCSA Mosaic for X.
It was created at CERN and is written in C. It is still available from
W3C at
ftp://ftp.w3.org/pub/libwww/.
It is single-threaded: a function is called to retrieve a document,
and a character (char *) pointer to the document's contents is returned.
- libwww 4.0 is the latest HTTP library offered by the W3C.
It is an object-oriented, event-based, multithreaded library written in C.
library. Requests are registered and queued, and events are generated upon
completion of requests. A single robot process can retrieve multiple
documents in parallel using this library.
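The register-and-queue model of libwww 4.0 can be sketched without its C
API. The following is a minimal illustration in Python, not libwww code:
requests are submitted to a pool, and a callback "event" fires as each one
completes. The URLs and delays are made up, and the fetch is simulated
rather than performing real HTTP.

```python
# Sketch of the event-based model (not the libwww 4.0 C API):
# requests are registered and queued, and an event (a callback) is
# generated upon completion of each request.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url, delay):
    """Stand-in for an HTTP retrieval; the delay simulates the server."""
    time.sleep(delay)
    return (url, "<contents of %s>" % url)

completed = []

def on_complete(future):
    # The "event" generated upon completion of a request.
    completed.append(future.result())

# Register and queue several requests; completion events arrive as each
# request finishes, in whatever order the servers happen to answer.
with ThreadPoolExecutor(max_workers=4) as pool:
    for url, delay in [("http://a.example/", 0.02),
                       ("http://b.example/", 0.01),
                       ("http://c.example/", 0.03)]:
        pool.submit(fetch, url, delay).add_done_callback(on_complete)

# After the pool shuts down, all three completion events have fired.
```

Because completion order depends on the servers, a robot built this way
must keep its own record of which requests are still outstanding.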
The disadvantage of single-threaded libraries like libwww-perl
and libwww 2.17 is that they block for a long time when requesting a
document from an unreachable or very slow server. Their main
advantage is that they are easy to use.
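The blocking behavior is easy to see in a sketch. The loop below stands in
for a single-threaded robot; the fetch and the delays are simulated (no
real library or network is used), with one deliberately slow server:

```python
# Sketch of single-threaded retrieval: each request blocks until its
# server answers, so the total time is the sum of all the delays.
import time

def blocking_fetch(url, delay):
    time.sleep(delay)                  # stand-in for waiting on the server
    return "<contents of %s>" % url

urls = [("http://fast.example/", 0.01),
        ("http://slow.example/", 0.20),   # one very slow server
        ("http://fast2.example/", 0.01)]

start = time.monotonic()
docs = [blocking_fetch(u, d) for u, d in urls]
elapsed = time.monotonic() - start
# elapsed is at least 0.22 s: the slow server holds up every request
# queued behind it, even though the other servers answer quickly.
```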
The disadvantage of multi-threaded libraries is that it is more
difficult to administer the outstanding requests and to avoid overloading
servers by issuing several simultaneous requests to the same server.
Their main advantages are that they achieve better overall throughput
and that they continue to receive data even when one of the connections
is very slow.
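The per-server bookkeeping this requires can be sketched briefly. The code
below is an illustrative scheduler, not part of any of the libraries above:
pending URLs are queued by host, and at most one request per server is
handed out at a time (the host names are invented for the example).

```python
# Sketch of the administration a multi-threaded robot needs: queue
# pending URLs by host so that no server receives more than one
# outstanding request at a time.
from urllib.parse import urlsplit
from collections import deque, defaultdict

pending = defaultdict(deque)   # host -> queue of URLs not yet requested
in_flight = set()              # hosts with an outstanding request

def enqueue(url):
    pending[urlsplit(url).netloc].append(url)

def next_batch():
    """Pick at most one queued URL per idle host to request next."""
    batch = []
    for host, queue in pending.items():
        if queue and host not in in_flight:
            batch.append(queue.popleft())
            in_flight.add(host)
    return batch

def request_done(url):
    # Completion event: the host may now be given its next URL.
    in_flight.discard(urlsplit(url).netloc)

for u in ["http://a.example/1", "http://a.example/2", "http://b.example/1"]:
    enqueue(u)

first = next_batch()   # one URL each for a.example and b.example;
                       # a.example/2 waits until a.example/1 completes
```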