Bioscape Indexing

Bioscape places the following requirements on text indexing solutions:

Efficient storage of position information

Because the default tokenisation policy produces many tokens and discards none, position information must be stored compactly in the index.
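
One common way to keep position lists small is to store the gaps between successive positions rather than the positions themselves, and to pack each gap into as few bytes as it needs. The sketch below illustrates that general technique; it is not taken from Bioscape, and the function names `encode_positions` and `decode_positions` are hypothetical.

```python
def encode_positions(positions):
    """Delta-encode ascending positions and pack each gap as a variable-length byte sequence."""
    out = bytearray()
    previous = 0
    for position in positions:
        gap = position - previous
        if gap < 0:
            raise ValueError("positions must be in ascending order")
        previous = position
        # Emit 7 bits per byte, low-order group first; a set high bit means "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_positions(data):
    """Reverse encode_positions, rebuilding absolute positions from the stored gaps."""
    positions = []
    current = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            positions.append(current)
            gap = 0
            shift = 0
    return positions
```

With this scheme the five positions 3, 17, 130, 131 and 20000 occupy 7 bytes, compared with 20 bytes as fixed 32-bit integers. The saving depends on how closely spaced the positions are.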

Efficient access to index storage

For large datasets, the solution must handle the correspondingly large index files without reading them into RAM in their entirety.
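
One way to meet this requirement is to memory-map the index file, so that the operating system pages in only the regions that are actually touched. The following is a minimal sketch under assumed conditions: the fixed-size record layout (an 8-byte offset and a 4-byte length) and the names `RECORD`, `write_records` and `read_record` are illustrative and do not describe Bioscape's actual file format.

```python
import mmap
import struct

# Hypothetical record layout: 8-byte unsigned offset, 4-byte unsigned length.
RECORD = struct.Struct("<QI")


def write_records(path, records):
    """Write (offset, length) pairs as fixed-size records."""
    with open(path, "wb") as f:
        for offset, length in records:
            f.write(RECORD.pack(offset, length))


def read_record(path, index):
    """Read a single record through a memory map instead of loading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            start = index * RECORD.size
            if index < 0 or start + RECORD.size > len(view):
                raise IndexError("record index out of range")
            # Only the pages covering this record need to be brought into memory.
            return RECORD.unpack_from(view, start)
```

Because records have a fixed size, the location of record `index` can be computed directly, so a lookup costs the same regardless of the size of the file.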

Convenient access to document identifiers

Many indexing solutions assign arbitrary identifiers to indexed documents, so the genuine identifiers must be kept in separate field storage. This can make access to the required information less efficient: although it need not be a problem for small volumes of results, retrieving large volumes of results becomes more time-consuming because field information must be accessed for each result document.
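
The alternative implied by this requirement is to key postings on the genuine document identifiers, so that a search returns them without any per-document field lookup. The toy in-memory index below is only a sketch of that idea; the class `IdentifierIndex` and the identifiers used with it are hypothetical, and it makes no claim about how Bioscape stores its postings.

```python
from collections import defaultdict


class IdentifierIndex:
    """Toy index whose postings are keyed directly on genuine document identifiers."""

    def __init__(self):
        # term -> {document identifier -> list of token positions}
        self._postings = defaultdict(dict)

    def add_document(self, identifier, tokens):
        """Record the position of every token; no tokens are discarded."""
        for position, token in enumerate(tokens):
            self._postings[token].setdefault(identifier, []).append(position)

    def search(self, term):
        """Return (identifier, positions) pairs, sorted by identifier."""
        return sorted(self._postings.get(term, {}).items())
```

Each result already carries the identifier the caller needs, so the cost of producing a large result set does not include a second lookup for every matching document.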
