GIN Indexes
Introduction
GIN stands for Generalized Inverted Index. It is
an index structure that stores a set of (key, posting list) pairs, where
a posting list is the set of documents in which the key occurs.
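As an illustration of the (key, posting list) structure, the following is a minimal Python sketch, not PostgreSQL code. The function name build_inverted_index and the sample documents are hypothetical and exist only for this example.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each key to the sorted list of document ids that contain it.

    `documents` is a dict of {doc_id: iterable of keys}. This is a toy
    in-memory model; a real GIN index stores its keys in an on-disk tree.
    """
    postings = defaultdict(set)
    for doc_id, keys in documents.items():
        for key in keys:
            postings[key].add(doc_id)
    # Each posting list is kept sorted so that lists can be merged cheaply.
    return {key: sorted(ids) for key, ids in postings.items()}


# Hypothetical sample data: three documents and the keys found in each.
documents = {
    1: ["cat", "dog"],
    2: ["dog", "fish"],
    3: ["cat", "fish", "dog"],
}
index = build_inverted_index(documents)
```

Looking up a key such as "dog" in index then yields its posting list directly, without scanning every document.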
It is generalized in the sense that a GIN index
does not need to be aware of the operation that it accelerates.
Instead, it uses custom strategies defined for particular data types.
One advantage of GIN is that it allows the development
of custom data types with the appropriate access methods, by
an expert in the domain of the data type, rather than a database expert.
This is much the same advantage as using GiST.
The GIN
implementation in PostgreSQL is primarily
maintained by Teodor Sigaev and Oleg Bartunov, and there is more
information on their
website.
Extensibility
The GIN interface has a high level of abstraction,
requiring the access method implementer to only implement the semantics of
the data type being accessed. The GIN layer itself
takes care of concurrency, logging and searching the tree structure.
All it takes to get a GIN access method working
is to implement four user-defined methods, which define the behavior of
keys in the tree. In short, GIN combines extensibility
with generality, code reuse, and a clean interface.
Implementation
There are four methods that an index operator class for
GIN must provide:
compare
extract value
extract query
consistent
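To show how the four methods divide the work, here is a toy Python model for array values. It is only a sketch under stated assumptions: the real methods are C functions with different signatures, and the ToyGin class, the strategy names "contains" and "overlaps", and the sample rows are all hypothetical.

```python
from functools import cmp_to_key


def compare(a, b):
    """Order two keys: negative, zero, or positive result."""
    return (a > b) - (a < b)


def extract_value(value):
    """Return the keys to index for one stored value (its distinct elements)."""
    return sorted(set(value), key=cmp_to_key(compare))


def extract_query(query):
    """Return the keys to look up for one query value."""
    return sorted(set(query), key=cmp_to_key(compare))


def consistent(check, strategy):
    """Decide whether a row matches, given which query keys it contains.

    `check[i]` is True when the row contains the i-th query key.
    """
    if strategy == "contains":   # row must hold every query key
        return all(check)
    if strategy == "overlaps":   # row must hold at least one query key
        return any(check)
    raise ValueError("unknown strategy: %r" % (strategy,))


class ToyGin:
    """Generic layer: it never inspects keys itself, it only calls the methods."""

    def __init__(self):
        # key -> set of row ids; a real index would order keys with compare().
        self.postings = {}

    def insert(self, row_id, value):
        for key in extract_value(value):
            self.postings.setdefault(key, set()).add(row_id)

    def search(self, query, strategy):
        keys = extract_query(query)
        candidates = set()
        for key in keys:
            candidates |= self.postings.get(key, set())
        return sorted(
            row_id
            for row_id in candidates
            if consistent(
                [row_id in self.postings.get(k, set()) for k in keys],
                strategy,
            )
        )


# Hypothetical usage with three integer-array rows.
gin = ToyGin()
gin.insert(1, [1, 2, 3])
gin.insert(2, [2, 3, 4])
gin.insert(3, [5])
contains_result = gin.search([2, 3], "contains")
overlaps_result = gin.search([1, 5], "overlaps")
```

The point of the split is that ToyGin holds no knowledge of arrays. Supplying different versions of the four functions would adapt the same generic layer to another data type.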
Examples
The PostgreSQL source distribution includes
GIN classes for one-dimensional arrays of all internal
types. The following
contrib modules also contain GIN
operator classes:
intarray
Enhanced support for int4[]
tsearch2
Support for inverted text indexing. This is much faster for very
large, mostly-static sets of documents.