aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--contrib/citext/citext.c23
-rw-r--r--doc/src/sgml/citext.sgml50
2 files changed, 52 insertions, 21 deletions
diff --git a/contrib/citext/citext.c b/contrib/citext/citext.c
index 25e4dd3999b..31b952b3f74 100644
--- a/contrib/citext/citext.c
+++ b/contrib/citext/citext.c
@@ -4,6 +4,7 @@
#include "postgres.h"
#include "access/hash.h"
+#include "catalog/pg_collation.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/formatting.h"
@@ -48,8 +49,16 @@ citextcmp(text *left, text *right, Oid collid)
*rcstr;
int32 result;
- lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left), collid);
- rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right), collid);
+ /*
+ * We must do our str_tolower calls with DEFAULT_COLLATION_OID, not the
+ * input collation as you might expect. This is so that the behavior of
+ * citext's equality and hashing functions is not collation-dependent. We
+ * should change this once the core infrastructure is able to cope with
+ * collation-dependent equality and hashing functions.
+ */
+
+ lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left), DEFAULT_COLLATION_OID);
+ rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right), DEFAULT_COLLATION_OID);
result = varstr_cmp(lcstr, strlen(lcstr),
rcstr, strlen(rcstr),
@@ -93,7 +102,7 @@ citext_hash(PG_FUNCTION_ARGS)
char *str;
Datum result;
- str = str_tolower(VARDATA_ANY(txt), VARSIZE_ANY_EXHDR(txt), PG_GET_COLLATION());
+ str = str_tolower(VARDATA_ANY(txt), VARSIZE_ANY_EXHDR(txt), DEFAULT_COLLATION_OID);
result = hash_any((unsigned char *) str, strlen(str));
pfree(str);
@@ -122,8 +131,8 @@ citext_eq(PG_FUNCTION_ARGS)
/* We can't compare lengths in advance of downcasing ... */
- lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left), PG_GET_COLLATION());
- rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right), PG_GET_COLLATION());
+ lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left), DEFAULT_COLLATION_OID);
+ rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right), DEFAULT_COLLATION_OID);
/*
* Since we only care about equality or not-equality, we can avoid all the
@@ -152,8 +161,8 @@ citext_ne(PG_FUNCTION_ARGS)
/* We can't compare lengths in advance of downcasing ... */
- lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left), PG_GET_COLLATION());
- rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right), PG_GET_COLLATION());
+ lcstr = str_tolower(VARDATA_ANY(left), VARSIZE_ANY_EXHDR(left), DEFAULT_COLLATION_OID);
+ rcstr = str_tolower(VARDATA_ANY(right), VARSIZE_ANY_EXHDR(right), DEFAULT_COLLATION_OID);
/*
* Since we only care about equality or not-equality, we can avoid all the
diff --git a/doc/src/sgml/citext.sgml b/doc/src/sgml/citext.sgml
index 8cbde88a3e7..0c6855fea62 100644
--- a/doc/src/sgml/citext.sgml
+++ b/doc/src/sgml/citext.sgml
@@ -58,9 +58,9 @@ SELECT * FROM tab WHERE lower(col) = LOWER(?);
The <type>citext</> data type allows you to eliminate calls
to <function>lower</> in SQL queries, and allows a primary key to
be case-insensitive. <type>citext</> is locale-aware, just
- like <type>text</>, which means that the comparison of upper case and
+ like <type>text</>, which means that the matching of upper case and
lower case characters is dependent on the rules of
- the <literal>LC_CTYPE</> locale setting. Again, this behavior is
+ the database's <literal>LC_CTYPE</> setting. Again, this behavior is
identical to the use of <function>lower</> in queries. But because it's
done transparently by the data type, you don't have to remember to do
anything special in your queries.
@@ -97,17 +97,25 @@ SELECT * FROM users WHERE nick = 'Larry';
<sect2>
<title>String Comparison Behavior</title>
+
+ <para>
+ <type>citext</> performs comparisons by converting each string to lower
+ case (as though <function>lower</> were called) and then comparing the
+ results normally. Thus, for example, two strings are considered equal
+ if <function>lower</> would produce identical results for them.
+ </para>
+
<para>
In order to emulate a case-insensitive collation as closely as possible,
- there are <type>citext</>-specific versions of a number of the comparison
+ there are <type>citext</>-specific versions of a number of string-processing
operators and functions. So, for example, the regular expression
operators <literal>~</> and <literal>~*</> exhibit the same behavior when
- applied to <type>citext</>: they both compare case-insensitively.
+ applied to <type>citext</>: they both match case-insensitively.
The same is true
for <literal>!~</> and <literal>!~*</>, as well as for the
<literal>LIKE</> operators <literal>~~</> and <literal>~~*</>, and
<literal>!~~</> and <literal>!~~*</>. If you'd like to match
- case-sensitively, you can always cast to <type>text</> before comparing.
+ case-sensitively, you can cast the operator's arguments to <type>text</>.
</para>
<para>
@@ -168,10 +176,10 @@ SELECT * FROM users WHERE nick = 'Larry';
<itemizedlist>
<listitem>
<para>
- <type>citext</>'s behavior depends on
+ <type>citext</>'s case-folding behavior depends on
the <literal>LC_CTYPE</> setting of your database. How it compares
- values is therefore determined when
- <application>initdb</> is run to create the cluster. It is not truly
+ values is therefore determined when the database is created.
+ It is not truly
case-insensitive in the terms defined by the Unicode standard.
Effectively, what this means is that, as long as you're happy with your
collation, you should be happy with <type>citext</>'s comparisons. But
@@ -183,6 +191,20 @@ SELECT * FROM users WHERE nick = 'Larry';
<listitem>
<para>
+ As of <productname>PostgreSQL</> 9.1, you can attach a
+ <literal>COLLATE</> specification to <type>citext</> columns or data
+ values. Currently, <type>citext</> operators will honor a non-default
+ <literal>COLLATE</> specification while comparing case-folded strings,
+ but the initial folding to lower case is always done according to the
+ database's <literal>LC_CTYPE</> setting (that is, as though
+ <literal>COLLATE "default"</> were given). This may be changed in a
+ future release so that both steps follow the input <literal>COLLATE</>
+ specification.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
<type>citext</> is not as efficient as <type>text</> because the
operator functions and the B-tree comparison functions must make copies
of the data and convert it to lower case for comparisons. It is,
@@ -198,11 +220,11 @@ SELECT * FROM users WHERE nick = 'Larry';
contexts. The standard answer is to use the <type>text</> type and
manually use the <function>lower</> function when you need to compare
case-insensitively; this works all right if case-insensitive comparison
- is needed only infrequently. If you need case-insensitive most of
- the time and case-sensitive infrequently, consider storing the data
+ is needed only infrequently. If you need case-insensitive behavior most
+ of the time and case-sensitive infrequently, consider storing the data
as <type>citext</> and explicitly casting the column to <type>text</>
- when you want case-sensitive comparison. In either situation, you
- will need two indexes if you want both types of searches to be fast.
+ when you want case-sensitive comparison. In either situation, you will
+ need two indexes if you want both types of searches to be fast.
</para>
</listitem>
@@ -210,8 +232,8 @@ SELECT * FROM users WHERE nick = 'Larry';
<para>
The schema containing the <type>citext</> operators must be
in the current <varname>search_path</> (typically <literal>public</>);
- if it is not, a normal case-sensitive <type>text</> comparison
- is performed.
+ if it is not, the normal case-sensitive <type>text</> operators
+ will be invoked instead.
</para>
</listitem>
</itemizedlist>