Diffstat (limited to 'doc/src/sgml/charset.sgml')
-rw-r--r-- doc/src/sgml/charset.sgml | 61
1 files changed, 56 insertions, 5 deletions
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index a6143ef8a74..555d1b4ac63 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -847,11 +847,13 @@ CREATE COLLATION german (provider = libc, locale = 'de_DE');
<para>
Note that while this system allows creating collations that <quote>ignore
- case</quote> or <quote>ignore accents</quote> or similar (using
- the <literal>ks</literal> key), PostgreSQL does not at the moment allow
- such collations to act in a truly case- or accent-insensitive manner. Any
- strings that compare equal according to the collation but are not
- byte-wise equal will be sorted according to their byte values.
+ case</quote> or <quote>ignore accents</quote> or similar (using the
+ <literal>ks</literal> key), in order for such collations to act in a
+ truly case- or accent-insensitive manner, they also need to be declared as not
+ <firstterm>deterministic</firstterm> in <command>CREATE COLLATION</command>;
+ see <xref linkend="collation-nondeterministic"/>.
+ Otherwise, any strings that compare equal according to the collation but
+ are not byte-wise equal will be sorted according to their byte values.
</para>
<note>
@@ -883,6 +885,55 @@ CREATE COLLATION french FROM "fr-x-icu";
</para>
</sect4>
</sect3>
+
+ <sect3 id="collation-nondeterministic">
+ <title>Nondeterministic Collations</title>
+
+ <para>
+ A collation is either <firstterm>deterministic</firstterm> or
+ <firstterm>nondeterministic</firstterm>. A deterministic collation uses
+ deterministic comparisons, which means that it considers strings to be
+ equal only if they consist of the same byte sequence. Nondeterministic
+ comparison may determine strings to be equal even if they consist of
+ different bytes. Typical situations include case-insensitive comparison,
+ accent-insensitive comparison, as well as comparison of strings in
+ different Unicode normal forms. It is up to the collation provider to
+ actually implement such insensitive comparisons; the deterministic flag
+ only determines whether ties are to be broken using bytewise comparison.
+ See also <ulink url="https://unicode.org/reports/tr10">Unicode Technical
+ Standard 10</ulink> for more information on the terminology.
+ </para>
+
+ <para>
+ To create a nondeterministic collation, specify the property
+ <literal>deterministic = false</literal> to <command>CREATE
+ COLLATION</command>, for example:
+<programlisting>
+CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
+</programlisting>
+ This example would use the standard Unicode collation in a
+ nondeterministic way. In particular, this would allow strings in
+ different normal forms to be compared correctly. More interesting
+ examples make use of the ICU customization facilities explained above.
+ For example:
+<programlisting>
+CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
+CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
+</programlisting>
+ </para>
+
+ <para>
+ All standard and predefined collations are deterministic; all
+ user-defined collations are deterministic by default. While
+ nondeterministic collations give a more <quote>correct</quote> behavior,
+ especially when considering the full power of Unicode and its many
+ special cases, they also have some drawbacks. Foremost, their use leads
+ to a performance penalty. Also, certain operations are not possible with
+ nondeterministic collations, such as pattern matching operations.
+ Therefore, they should be used only in cases where they are specifically
+ wanted.
+ </para>
+ </sect3>
</sect2>
</sect1>
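
The behavior described in the new section can be tried with a short session such as the following. This is only a sketch, not part of the patch: it assumes a PostgreSQL server built with ICU support and including this change, it reuses the collation definitions from the patch's own examples, and the escaped strings are illustrative values chosen here.

```sql
-- Collations taken from the examples in the patch above.
CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

-- With deterministic = false, strings that differ only in case
-- compare equal under the level-2 collation.
SELECT 'abc' = 'ABC' COLLATE case_insensitive;

-- The same comparison under the database default (deterministic)
-- collation falls back to bytewise equality and is false.
SELECT 'abc' = 'ABC';

-- Precomposed U+00E9 versus 'e' followed by combining U+0301:
-- two normal forms of the same character, which the patch text
-- says a nondeterministic collation can treat as equal.
SELECT U&'\00E9' = U&'e\0301' COLLATE ndcoll;
```

Attaching the collation with an explicit COLLATE clause keeps the nondeterministic comparison limited to the expressions that need it, which fits the patch's advice to use such collations only where they are specifically wanted.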