Text Indexing Resources

From irefindex
Revision as of 15:52, 19 October 2009 by PaulBoddie

A considerable number of text indexing solutions exist. This document discusses some of the more widely-known open source solutions.

  • ht://Dig - a search engine solution for individual Web sites
  • Hyper Estraier - a solution used reasonably widely by other software systems and applications
  • Lucene - arguably the most popular text indexing solution in current use, originally implemented in Java, with bindings for, and ports to, other languages
  • Managing Gigabytes: Compressing and Indexing Documents and Images - the software accompanying the book of that name
  • PostgreSQL full text search - incorporates the previously separate tsearch2 functionality into recent versions of PostgreSQL (from 8.3 upwards)
  • Sphinx - used by numerous large-scale public Web sites and services in a traditional document search role
  • SWISH++ - an indexing and searching engine typically used for documents on Web sites
  • Whoosh - a pure Python search engine, apparently attracting interest from various Python-based projects that are reluctant to use Lucene, Xapian and other technologies
  • Wumpus - an information retrieval system being used to investigate desktop search solutions, amongst other things
  • Xapian - a reasonably popular solution implemented in C++ with bindings for various languages, with a heritage dating back to 1984 and earlier
  • Zettair - implemented in C and previously known as Lucy (possibly the Lucene derivative of that name); it has "indexed the 426GB TREC terabyte track collection"

Some links to comparisons: