author: Peter Eisentraut <peter_e@gmx.net> 2010-11-23 22:27:50 +0200
committer: Peter Eisentraut <peter_e@gmx.net> 2010-11-23 22:34:55 +0200
commit: fc946c39aeacdff7df60c83fca6582985e8546c8
tree: 866145f64c09c0673a4aa3d3a2f5647f0b7afc45 /src/backend/access/gin
parent: 44475e782f4674d257b9e5c1a3930218a4b4deea
Remove useless whitespace at end of lines
Diffstat (limited to 'src/backend/access/gin')
-rw-r--r-- src/backend/access/gin/README 32
1 files changed, 16 insertions, 16 deletions
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
index 69d5a319413..0f634f83d17 100644
--- a/src/backend/access/gin/README
+++ b/src/backend/access/gin/README
@@ -9,27 +9,27 @@ Gin stands for Generalized Inverted Index and should be considered as a genie,
not a drink.
Generalized means that the index does not know which operation it accelerates.
-It instead works with custom strategies, defined for specific data types (read
-"Index Method Strategies" in the PostgreSQL documentation). In that sense, Gin
+It instead works with custom strategies, defined for specific data types (read
+"Index Method Strategies" in the PostgreSQL documentation). In that sense, Gin
is similar to GiST and differs from btree indices, which have predefined,
comparison-based operations.
-An inverted index is an index structure storing a set of (key, posting list)
-pairs, where 'posting list' is a set of documents in which the key occurs.
-(A text document would usually contain many keys.) The primary goal of
+An inverted index is an index structure storing a set of (key, posting list)
+pairs, where 'posting list' is a set of documents in which the key occurs.
+(A text document would usually contain many keys.) The primary goal of
Gin indices is support for highly scalable, full-text search in PostgreSQL.
Gin consists of a B-tree index constructed over entries (ET, entries tree),
where each entry is an element of the indexed value (element of array, lexeme
-for tsvector) and where each tuple in a leaf page is either a pointer to a
-B-tree over item pointers (PT, posting tree), or a list of item pointers
+for tsvector) and where each tuple in a leaf page is either a pointer to a
+B-tree over item pointers (PT, posting tree), or a list of item pointers
(PL, posting list) if the tuple is small enough.
Note: There is no delete operation for ET. The reason for this is that in
our experience, the set of distinct words in a large corpus changes very
rarely. This greatly simplifies the code and concurrency algorithms.
-Gin comes with built-in support for one-dimensional arrays (eg. integer[],
+Gin comes with built-in support for one-dimensional arrays (eg. integer[],
text[]), but no support for NULL elements. The following operations are
available:
@@ -59,25 +59,25 @@ Gin Fuzzy Limit
There are often situations when a full-text search returns a very large set of
results. Since reading tuples from the disk and sorting them could take a
-lot of time, this is unacceptable for production. (Note that the search
+lot of time, this is unacceptable for production. (Note that the search
itself is very fast.)
-Such queries usually contain very frequent lexemes, so the results are not
-very helpful. To facilitate execution of such queries Gin has a configurable
-soft upper limit on the size of the returned set, determined by the
-'gin_fuzzy_search_limit' GUC variable. This is set to 0 by default (no
+Such queries usually contain very frequent lexemes, so the results are not
+very helpful. To facilitate execution of such queries Gin has a configurable
+soft upper limit on the size of the returned set, determined by the
+'gin_fuzzy_search_limit' GUC variable. This is set to 0 by default (no
limit).
If a non-zero search limit is set, then the returned set is a subset of the
whole result set, chosen at random.
"Soft" means that the actual number of returned results could slightly differ
-from the specified limit, depending on the query and the quality of the
+from the specified limit, depending on the query and the quality of the
system's random number generator.
From experience, a value of 'gin_fuzzy_search_limit' in the thousands
(eg. 5000-20000) works well. This means that 'gin_fuzzy_search_limit' will
-have no effect for queries returning a result set with less tuples than this
+have no effect for queries returning a result set with less tuples than this
number.
Limitations
@@ -115,5 +115,5 @@ Distant future:
Authors
-------
-All work was done by Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov
+All work was done by Teodor Sigaev (teodor@sigaev.ru) and Oleg Bartunov
(oleg@sai.msu.su).