Tutorial: Robot Technology and Applications

Data Structures for Index Databases

The most typical data structure for complete index databases is obtained as follows:

The text is divided into unique semi-infinite strings, or sistrings. Each sistring has a starting position in the text, and continues to the right until it is unique.
The sistrings are typically stored in (the leaves of) a tree, the suffix tree. (Common parts are stored only once.)
Each sistring can be associated with a location within a document where the sistring occurs. Subtrees below a certain node represent all occurrences of the substring represented by that node.
Suffix trees (and similar suffix arrays) have a size of the same order of magnitude as the input documents.

Below is an example suffix tree for the following words: