Fix documentation of regular expression character-entry escapes.

The docs claimed that \uhhhh would be interpreted as a Unicode value regardless of the database encoding, but it's never been implemented that way: \uhhhh and \xhhhh actually mean exactly the same thing, namely the character that pg_mb2wchar translates to 0xhhhh. Moreover we were falsely dismissive of the usefulness of Unicode code points above FFFF. Fix that.

It's been like this for ages, so back-patch to all supported branches.
author: Tom Lane <tgl@sss.pgh.pa.us> 2015-09-16 14:50:12 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2015-09-16 14:50:52 -0400
commit: dad7ea7e4dcc9ebdfb5480aa91cd1424135214e9 (patch)
tree: d80ec9551dd6d55c4d86e82f9dc3bb3a18423489
parent: 06a1ada7935acce6c6b9f5569ca3da9260a50784 (diff)
1 files changed, 17 insertions, 4 deletions
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 44a8a814501..d7103d6f78b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4417,7 +4417,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
        <entry> <literal>\e</> </entry>
        <entry> the character whose collating-sequence name
        is <literal>ESC</>,
-       or failing that, the character with octal value 033 </entry>
+       or failing that, the character with octal value <literal>033</> </entry>
        </row>
 
        <row>
@@ -4443,15 +4443,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
        <row>
        <entry> <literal>\u</><replaceable>wxyz</> </entry>
        <entry> (where <replaceable>wxyz</> is exactly four hexadecimal digits)
-       the UTF16 (Unicode, 16-bit) character <literal>U+</><replaceable>wxyz</>
-       in the local byte ordering </entry>
+       the character whose hexadecimal value is
+       <literal>0x</><replaceable>wxyz</>
+       </entry>
        </row>
 
        <row>
        <entry> <literal>\U</><replaceable>stuvwxyz</> </entry>
        <entry> (where <replaceable>stuvwxyz</> is exactly eight hexadecimal
        digits)
-       reserved for a hypothetical Unicode extension to 32 bits
+       the character whose hexadecimal value is
+       <literal>0x</><replaceable>stuvwxyz</>
        </entry>
        </row>
 
@@ -4501,6 +4503,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
    </para>
 
    <para>
+    Numeric character-entry escapes specifying values outside the ASCII range
+    (0-127) have meanings dependent on the database encoding.  When the
+    encoding is UTF-8, escape values are equivalent to Unicode code points,
+    for example <literal>\u1234</> means the character <literal>U+1234</>.
+    For other multibyte encodings, character-entry escapes usually just
+    specify the concatenation of the byte values for the character.  If the
+    escape value does not correspond to any legal character in the database
+    encoding, no error will be raised, but it will never match any data.
+   </para>
+
+   <para>
     The character-entry escapes are always taken as ordinary characters.
     For example, <literal>\135</> is <literal>]</> in ASCII, but
     <literal>\135</> does not terminate a bracket expression.
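
The encoding-dependent rule added by this patch can be illustrated with a small model. The following Python sketch is not PostgreSQL code and does not call pg_mb2wchar; it only mimics the documented behaviour under two assumptions: in UTF-8 the escape value is the Unicode code point, and in other multibyte encodings it is the concatenation of the character's byte values (the docs say "usually", so exceptions may exist). The helper names escape_value and regex_escape are hypothetical, and Python codec names such as "euc_jp" stand in for PostgreSQL encoding names.

```python
def escape_value(ch: str, db_encoding: str) -> int:
    """Model the numeric value a character-entry escape must carry to match ch.

    Assumption: UTF-8 uses the Unicode code point; any other encoding uses
    the big-endian concatenation of the encoded bytes.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    normalized = db_encoding.lower().replace("-", "").replace("_", "")
    if normalized == "utf8":
        return ord(ch)
    return int.from_bytes(ch.encode(db_encoding), "big")


def regex_escape(value: int) -> str:
    """Format a value as \\uwxyz (four hex digits) or \\Ustuvwxyz (eight)."""
    if value <= 0xFFFF:
        return "\\u%04X" % value
    return "\\U%08X" % value


if __name__ == "__main__":
    # The same character needs different escapes in different encodings.
    print(regex_escape(escape_value("\u3042", "utf-8")))
    print(regex_escape(escape_value("\u3042", "euc_jp")))
    # A code point above FFFF needs the eight-digit \U form.
    print(regex_escape(escape_value("\U0001F600", "utf-8")))
```

Under this model, a pattern written with \u3042 for a UTF-8 database would have to be rewritten before it could match the same character in an EUC_JP database, which is the portability hazard the new paragraph warns about.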