Tutorial: Robot Technology and Applications

The Robot Exclusion Protocol

Most robots check a server's /robots.txt file before attempting to retrieve documents from that server. Apart from comments, which start with #, the file contains two kinds of fields:

User-agent: This field is followed by the name of a robot, or by an asterisk (*), which means all robots.
Disallow: This field is followed by a path prefix that the robots named in the preceding User-agent line should avoid.

Below is an example:

# First disallow all robots from accessing /tmp and /infinite
User-agent: *
Disallow: /tmp/
Disallow: /infinite/

# Now allow the 'smartrobot' access to /infinite anyway
User-agent: smartrobot
Disallow: /tmp/

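Note that a robot obeys the record whose User-agent line names it, and falls back to the * record only when no other record matches. The smartrobot therefore follows the second record alone: it may access /infinite/ but is still kept out of /tmp/. Under the robots.txt specification the order of the two records does not change this, although a robot that does not follow the specification closely may behave differently.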
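
The example can be checked with a short Python sketch. It relies on the standard library's urllib.robotparser, which picks the record by robot name and falls back to the * record; other robots may resolve overlapping records differently. The robot name otherrobot, the helper build_parser, and the document paths are made up for illustration and are not part of the example file.

```python
from urllib.robotparser import RobotFileParser

# The example file from the text, as a string.
ROBOTS_TXT = """\
# First disallow all robots from accessing /tmp and /infinite
User-agent: *
Disallow: /tmp/
Disallow: /infinite/

# Now allow the 'smartrobot' access to /infinite anyway
User-agent: smartrobot
Disallow: /tmp/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# 'otherrobot' is a hypothetical robot that no record names,
# so it falls under the "*" record.
for robot in ("smartrobot", "otherrobot"):
    for path in ("/infinite/page.html", "/tmp/scratch.html", "/index.html"):
        verdict = "allowed" if parser.can_fetch(robot, path) else "disallowed"
        print(f"{robot:10} {path:20} {verdict}")
```

In this sketch only smartrobot is allowed to fetch documents under /infinite/, and both robots are kept out of /tmp/. With this particular parser, swapping the two records gives the same answers, because the record is chosen by robot name and not by position in the file.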