

- Robots may monitor several netnews groups, scanning messages for
  embedded URLs (a sketch of such URL extraction follows this list).
- Robots may scan the directories of FTP servers for HTML files
  (which may contain links to WWW documents).
- Robots may manipulate URLs to generate new URLs of documents that
  may or may not exist. If the document http://host/dir/subdir/file.html
  exists, then http://host/dir/subdir/, http://host/dir/, and
  http://host/ should also exist (see the second sketch after this list).
- Robots may manipulate URLs to retrieve directory listings.
  If the document http://host/dir/subdir/file.html exists, then
  http://host/dir/subdir/., http://host/dir/. and
  http://host/. may generate directory listings containing
  further HTML files.
- Robots may try a limited number of coordinates in clickable images.
- Robots may try to fill out forms that contain only one text field.
- Robots may retrieve the same URL more than once and check whether
  the returned document is always the same (see the last sketch after
  this list).
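As a rough sketch of the first heuristic, the following Python
fragment scans a block of text (such as a netnews article body) for
embedded URLs. The regular expression is a deliberate simplification,
and extract_urls is an illustrative name, not part of any existing
robot:

    import re

    # Simplified pattern; real URLs permit more characters and schemes.
    URL_PATTERN = re.compile(r'https?://[^\s<>"\']+')

    def extract_urls(text):
        """Return all URL-like strings found in a block of text."""
        return URL_PATTERN.findall(text)

    body = "The index lives at http://host/dir/subdir/file.html now."
    print(extract_urls(body))  # ['http://host/dir/subdir/file.html']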
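For the URL-manipulation heuristic, here is a minimal sketch (again in
Python, using only the standard urllib.parse module; ancestor_urls is
a hypothetical helper) that derives the parent URLs of a known
document. The directory-listing trick from the next item is the same
manipulation with a "." appended to each result:

    from urllib.parse import urlsplit, urlunsplit

    def ancestor_urls(url):
        """Yield every parent URL of a document, deepest first."""
        parts = urlsplit(url)
        segments = parts.path.split('/')[1:-1]  # drop leading '' and the file name
        for i in range(len(segments), -1, -1):
            path = '/' + '/'.join(segments[:i])
            if not path.endswith('/'):
                path += '/'
            yield urlunsplit((parts.scheme, parts.netloc, path, '', ''))

    print(list(ancestor_urls('http://host/dir/subdir/file.html')))
    # ['http://host/dir/subdir/', 'http://host/dir/', 'http://host/']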
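Finally, a sketch of the repeated-retrieval check from the last item:
fetch the same URL several times and compare digests of the responses
(is_stable is an illustrative name; a changing body suggests
dynamically generated content):

    import hashlib
    import urllib.request

    def is_stable(url, tries=3):
        """Fetch a URL several times; report whether the body is
        byte-for-byte identical on every retrieval."""
        digests = set()
        for _ in range(tries):
            with urllib.request.urlopen(url) as response:
                digests.add(hashlib.sha256(response.read()).hexdigest())
        return len(digests) == 1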
Robots that try to be too smart about finding documents may run into
trouble. Not only may there be infinite loops (as in the time
example), but they may also find the same server more than once,
under a different name. (Checking the actual IP address would solve
this; a sketch follows.)
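A minimal sketch of that IP check, assuming Python's standard socket
module (the host names are placeholders): resolve each host name
before visiting it, and skip hosts whose address has already been
seen.

    import socket

    seen_addresses = set()

    def already_visited(host):
        """Resolve a host name; report whether its IP address was seen
        before, so aliases of one server are crawled only once."""
        address = socket.gethostbyname(host)
        if address in seen_addresses:
            return True
        seen_addresses.add(address)
        return False

    for host in ('example.com', 'www.example.com'):  # placeholder names
        print(host, 'already visited:', already_visited(host))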
Also, robots may stumble upon documents protected by user/password
combinations. Guessing the password is not acceptable behavior.