Tutorial: Robot Technology and Applications

The Fish-Search Navigation Algorithm

Fish-Search maintains a list of (URLs of) documents that are still to be examined. New URLs can be inserted into this list at three positions: B, M and E (Begin, Middle and End). The search also has two parameters: depth and width.
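
The text does not say exactly how the three positions are realised. The following is a minimal sketch in Python, assuming the list is kept as three segments: inserting at B puts URLs at the very front, inserting at M appends them to a middle segment, inserting at E appends them to the very end, and the next URL to examine is always the first one in the whole list. The class name FishList and its methods are hypothetical.

```python
from collections import deque


class FishList:
    """Hypothetical model of the Fish-Search URL list with positions B, M, E.

    Assumption: the list is stored as three segments (front, middle, tail).
    """

    def __init__(self):
        self.begin = deque()   # front segment (position B)
        self.middle = deque()  # middle segment (position M)
        self.end = deque()     # tail segment (position E)

    def add(self, urls, position):
        """Insert a batch of URLs at position "B", "M" or "E", keeping batch order."""
        urls = list(urls)
        if position == "B":
            # extendleft reverses its input, so reverse first to keep the batch order
            self.begin.extendleft(reversed(urls))
        elif position == "M":
            self.middle.extend(urls)
        elif position == "E":
            self.end.extend(urls)
        else:
            raise ValueError(f"unknown position: {position!r}")

    def pop_next(self):
        """Remove and return the first URL of the whole list."""
        for segment in (self.begin, self.middle, self.end):
            if segment:
                return segment.popleft()
        raise IndexError("pop from empty FishList")

    def __len__(self):
        return len(self.begin) + len(self.middle) + len(self.end)

    def as_list(self):
        """The whole list in examination order."""
        return list(self.begin) + list(self.middle) + list(self.end)
```

For example, adding ["s"] at M, then ["x", "y"] at E, then ["a", "b"] at B gives the order a, b, s, x, y.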

Initially, the list contains a single element: the URL of the starting document. The search repeatedly takes the first URL from the list, retrieves the corresponding document and checks it for relevance.
Each URL has an associated depth-value. When a document is relevant, its embedded URLs get the depth-value depth (the full value of the parameter). When a document is not relevant, its embedded URLs get the depth-value of that document's URL, minus 1.
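
The depth rule fits in a few lines. In this sketch child_depth is a hypothetical helper: parent_depth is the depth-value of the retrieved document's URL and depth is the search parameter.

```python
def child_depth(parent_depth: int, relevant: bool, depth: int) -> int:
    """Depth-value given to the URLs embedded in a retrieved document (sketch)."""
    if relevant:
        return depth            # relevant document: children get the full depth
    return parent_depth - 1     # non-relevant document: one less than the parent
```

With depth = 3, a chain of non-relevant documents gives depth-values 3, 2, 1, 0, while any relevant document along the way resets its children to 3.
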
Embedded URLs in documents are added to the list as follows:
- When the URL of a document has depth-value 0, all of its embedded URLs are added at position E.
- From a non-relevant document, width embedded URLs are selected and added at position M.
- From a relevant document, 1.5 x width embedded URLs are selected and added at position B.
- In both of the last two cases, the remaining embedded URLs of the document are added at position E.
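
The insertion rules can be sketched as a function that returns where each embedded URL should go. The text does not say how the width (or 1.5 x width) URLs are selected, so this sketch simply takes the first ones in document order and rounds 1.5 x width down; place_links is a hypothetical name, and treating a negative depth-value like 0 is an assumption.

```python
def place_links(links, url_depth, relevant, width):
    """Return (position, urls) pairs for the embedded links of one document."""
    links = list(links)
    # Depth-value 0 (or below, an assumption): everything goes to the end.
    if url_depth <= 0:
        return [("E", links)]
    # Assumption: "selected" means the first links in document order.
    count = int(1.5 * width) if relevant else width
    position = "B" if relevant else "M"
    return [(position, links[:count]), ("E", links[count:])]
```

With width = 2, a relevant document passes three links to B and the rest to E, while a non-relevant one passes two links to M.
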
While documents are being retrieved, the average transfer rate of each server is monitored. URLs of documents on servers from which retrieval is very slow get depth-value 0, so the URLs embedded in those documents are only added at position E.
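
A sketch of the transfer-rate monitoring, assuming that a server's average rate is its total bytes divided by its total retrieval time and that "very slow" means below a fixed threshold min_rate; the text specifies neither, and TransferMonitor is a hypothetical name.

```python
from collections import defaultdict


class TransferMonitor:
    """Tracks the average transfer rate per server (hypothetical helper).

    min_rate (bytes per second) is an assumed threshold for "very slow".
    """

    def __init__(self, min_rate: float):
        self.min_rate = min_rate
        self.bytes_total = defaultdict(int)
        self.seconds_total = defaultdict(float)

    def record(self, server: str, nbytes: int, elapsed: float) -> None:
        """Add one finished retrieval to the server's totals."""
        self.bytes_total[server] += nbytes
        self.seconds_total[server] += elapsed

    def average_rate(self, server: str) -> float:
        """Bytes per second so far; an unseen server counts as infinitely fast."""
        secs = self.seconds_total.get(server, 0.0)
        if secs <= 0:
            return float("inf")
        return self.bytes_total.get(server, 0) / secs

    def adjust_depth(self, server: str, depth_value: int) -> int:
        """Depth-value 0 for URLs on very slow servers, unchanged otherwise."""
        if self.average_rate(server) < self.min_rate:
            return 0
        return depth_value
```
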
The algorithm stops when a specified amount of time has passed or when the list is empty.
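
Putting the steps together, here is a self-contained sketch of the whole loop on a tiny in-memory "web" (the WEB dictionary, the keyword relevance test and all names are hypothetical). Beyond what the text states, it assumes the three-segment list, skips URLs it has already visited, clamps depth-values at 0 and leaves out server monitoring; it is not the original Fish-Search code.

```python
import time
from collections import deque

# Hypothetical toy "web": URL -> (document text, embedded URLs).
WEB = {
    "start": ("fish and food", ["a", "b", "c"]),
    "a": ("nothing here", ["d"]),
    "b": ("more fish", ["e", "f"]),
    "c": ("unrelated", []),
    "d": ("still unrelated", ["g"]),
    "e": ("fish again", []),
    "f": ("no match", []),
    "g": ("fish at last", []),
}


def fish_search(start, is_relevant, depth=2, width=1, time_limit=1.0):
    """Sketch of the Fish-Search loop; returns relevant URLs in the order found."""
    # Assumption: the list is three segments; entries are (url, depth-value).
    begin, middle, end = deque([(start, depth)]), deque(), deque()
    visited, found = set(), []
    deadline = time.monotonic() + time_limit
    # Stop when the time is up or the list is empty.
    while (begin or middle or end) and time.monotonic() < deadline:
        # Take the first URL of the whole list.
        queue = begin if begin else (middle if middle else end)
        url, url_depth = queue.popleft()
        if url in visited or url not in WEB:  # assumption: no revisits
            continue
        visited.add(url)
        text, links = WEB[url]
        relevant = is_relevant(text)
        if relevant:
            found.append(url)
        # Depth-value for the embedded URLs (clamping at 0 is an assumption).
        child = depth if relevant else max(url_depth - 1, 0)
        if url_depth == 0:
            end.extend((link, child) for link in links)
            continue
        count = int(1.5 * width) if relevant else width
        selected, rest = links[:count], links[count:]
        if relevant:
            # Prepend at B while keeping document order.
            begin.extendleft((link, child) for link in reversed(selected))
        else:
            middle.extend((link, child) for link in selected)
        end.extend((link, child) for link in rest)
    return found
```

With depth=2, width=1 and a relevance test that looks for the word "fish", the search returns ["start", "g", "b", "e"]: g, reached through the non-relevant chain start -> a -> d via position M, is examined before b, which was added at position E.
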

The larger width for relevant documents represents the idea that fish produce more offspring after finding food. Giving the offspring of relevant documents the full depth again represents that these fish also produce healthier offspring. The URLs added at the end of the list represent fish that have died; they are only used when the algorithm runs out of other URLs.

The original Fish-Search implementation retrieves only one document at a time, and is therefore prone to blocking when a slow server or unreliable connection is encountered. A new implementation uses a number of agents in parallel to avoid this problem and to improve the overall performance.
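
The text does not describe how the parallel agents are organised. As a rough illustration only, the sketch below uses Python's standard ThreadPoolExecutor as a pool of "agents" and simulates retrieval with time.sleep; DELAYS, fetch and fetch_in_parallel are hypothetical names. Because results are collected in completion order, a slow server delays only its own document instead of blocking the whole search.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical simulated servers: URL -> seconds needed to "retrieve" it.
DELAYS = {"fast1": 0.01, "fast2": 0.02, "slow": 0.3, "fast3": 0.01}


def fetch(url):
    """Stand-in for a real retrieval (assumption: simulated with sleep)."""
    time.sleep(DELAYS[url])
    return url, f"contents of {url}"


def fetch_in_parallel(urls, agents=4):
    """Retrieve several URLs with a pool of worker threads ("agents").

    Returns the URLs in the order their retrieval finished.
    """
    finished = []
    with ThreadPoolExecutor(max_workers=agents) as pool:
        futures = [pool.submit(fetch, url) for url in urls]
        for future in as_completed(futures):
            url, _contents = future.result()
            finished.append(url)
    return finished
```

Calling fetch_in_parallel(list(DELAYS)) finishes the three fast URLs first and the slow one last, rather than waiting on the slow server in sequence.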