

The Fish-Search maintains a list of (URLs of) documents still to be
examined. The list has 3 positions: B, M, E (which stand for
Begin, Middle and End). The search also has two parameters:
depth and width.
- Initially the list of documents to be examined contains
a single element.
That document is retrieved and checked for relevance.
- Each URL has an associated depth-value. When a document is relevant,
the embedded URLs get the value depth.
When a document is not relevant the embedded URLs get the depth-value of
the URL of that document, minus 1.
- Embedded URLs in documents are added to the list as follows:
- When a URL has depth 0, all embedded links are added at
position E.
- From a non-relevant document, width embedded URLs are selected
and added at position M.
- From a relevant document, 1.5 x width embedded URLs
are added at position B.
The remaining URLs from the documents are added at position E.
- While retrieving documents the average transfer rate for documents
from their server is monitored.
The depth-value for URLs of documents on servers from which retrieval
is very slow is set to 0.
- The algorithm stops when a specified amount of time has passed
or when the list is empty.
The increased width for relevant documents represent that fish
produce more offspring after finding food. The increased depth represents
that those fish also produce healthier offspring. The URLs added to the
end of the list represent fish that have died. These URLs are only used
when the algorithm runs out of other URLs.
The original Fish-Search implementation retrieves only one document
at a time, and is therefore prone to blocking when a slow server or
unreliable connection is encountered. A new implementation uses a number
of agents in parallel to avoid this problem and to improve the overall
performance.