author: Tom Lane <tgl@sss.pgh.pa.us> 2015-01-30 14:44:49 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us> 2015-01-30 14:44:49 -0500
commit: 4cbf390d5f0339f9846d4492b15f98b41bfffe12
tree: f1303e08280dc7d5bb94bd2bcba6b28eefef5817 /doc/src
parent: 70da7aeba1f845d2f0bb864031e0bea21b384ca7
Fix jsonb Unicode escape processing, and in consequence disallow \u0000.
We've been trying to support \u0000 in JSON values since commit 78ed8e03c67d7333, and have introduced increasingly worse hacks to try to make it work, such as commit 0ad1a816320a2b53. However, it fundamentally can't work in the way envisioned, because the stored representation looks the same as for \\u0000 which is not the same thing at all. It's also entirely bogus to output \u0000 when de-escaped output is called for.

The right way to do this would be to store an actual 0x00 byte, and then throw error only if asked to produce de-escaped textual output. However, getting to that point seems likely to take considerable work and may well never be practical in the 9.4.x series.

To preserve our options for better behavior while getting rid of the nasty side-effects of 0ad1a816320a2b53, revert that commit in toto and instead throw error if \u0000 is used in a context where it needs to be de-escaped. (These are the same contexts where non-ASCII Unicode escapes throw error if the database encoding isn't UTF8, so this behavior is by no means without precedent.)

In passing, make both the \u0000 case and the non-ASCII Unicode case report ERRCODE_UNTRANSLATABLE_CHARACTER / "unsupported Unicode escape sequence" rather than claiming there's something wrong with the input syntax.

Back-patch to 9.4, where we have to do something because 0ad1a816320a2b53 broke things for many cases having nothing to do with \u0000. 9.3 also has bogus behavior, but only for that specific escape value, so given the lack of field complaints it seems better to leave 9.3 alone.
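
The ambiguity described in the commit message can be illustrated with a toy de-escaper. This is a rough sketch in Python, not PostgreSQL's actual parser: the function `deescape` and its `keep_nul_escaped` flag are hypothetical names invented for the illustration, surrogate pairs and database encodings are ignored, and only the error text is borrowed from the message above.

```python
import re

# Toy model only: NOT PostgreSQL code. It de-escapes the body of a JSON
# string literal (without the surrounding quotes).
_ESCAPE = re.compile(r'\\(u[0-9a-fA-F]{4}|.)')
_SIMPLE = {'"': '"', '\\': '\\', '/': '/',
           'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}


def deescape(body, keep_nul_escaped):
    """Return the de-escaped form of a JSON string body.

    keep_nul_escaped=True mimics the old idea of leaving \\u0000 as literal
    text; keep_nul_escaped=False mimics rejecting it with an error.
    """
    def repl(match):
        tok = match.group(1)
        if len(tok) == 5:                      # a \uXXXX escape
            code_point = int(tok[1:], 16)
            if code_point == 0:
                if keep_nul_escaped:
                    return '\\u0000'           # stored as six literal characters
                raise ValueError("unsupported Unicode escape sequence")
            return chr(code_point)
        if tok not in _SIMPLE:
            raise ValueError("invalid escape sequence")
        return _SIMPLE[tok]

    return _ESCAPE.sub(repl, body)


# With the "keep it escaped" approach, two different JSON inputs collide:
nul_escape = deescape(r'\u0000', keep_nul_escaped=True)
literal_text = deescape(r'\\u0000', keep_nul_escaped=True)
print(nul_escape == literal_text)
```

In this model the JSON string bodies `\u0000` (a NUL character) and `\\u0000` (a backslash followed by the text `u0000`) end up with the same stored form when the escape is kept as text, so the original meaning cannot be recovered. Passing `keep_nul_escaped=False` raises an error for the first input and leaves the second unambiguous.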
Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/json.sgml | 19
-rw-r--r-- doc/src/sgml/release-9.4.sgml | 16
2 files changed, 11 insertions, 24 deletions
diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 8feb2fbf0ad..6282ab88539 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -69,12 +69,14 @@
regardless of the database encoding, and are checked only for syntactic
correctness (that is, that four hex digits follow <literal>\u</>).
However, the input function for <type>jsonb</> is stricter: it disallows
- Unicode escapes for non-ASCII characters (those
- above <literal>U+007F</>) unless the database encoding is UTF8. It also
- insists that any use of Unicode surrogate pairs to designate characters
- outside the Unicode Basic Multilingual Plane be correct. Valid Unicode
- escapes, except for <literal>\u0000</>, are then converted to the
- equivalent ASCII or UTF8 character for storage.
+ Unicode escapes for non-ASCII characters (those above <literal>U+007F</>)
+ unless the database encoding is UTF8. The <type>jsonb</> type also
+ rejects <literal>\u0000</> (because that cannot be represented in
+ <productname>PostgreSQL</productname>'s <type>text</> type), and it insists
+ that any use of Unicode surrogate pairs to designate characters outside
+ the Unicode Basic Multilingual Plane be correct. Valid Unicode escapes
+ are converted to the equivalent ASCII or UTF8 character for storage;
+ this includes folding surrogate pairs into a single character.
</para>
<note>
@@ -101,7 +103,7 @@
constitutes valid <type>jsonb</type> data that do not apply to
the <type>json</type> type, nor to JSON in the abstract, corresponding
to limits on what can be represented by the underlying data type.
- Specifically, <type>jsonb</> will reject numbers that are outside the
+ Notably, <type>jsonb</> will reject numbers that are outside the
range of the <productname>PostgreSQL</productname> <type>numeric</> data
type, while <type>json</> will not. Such implementation-defined
restrictions are permitted by <acronym>RFC</> 7159. However, in
@@ -134,7 +136,8 @@
<row>
<entry><type>string</></entry>
<entry><type>text</></entry>
- <entry>See notes above concerning encoding restrictions</entry>
+ <entry><literal>\u0000</> is disallowed, as are non-ASCII Unicode
+ escapes if database encoding is not UTF8</entry>
</row>
<row>
<entry><type>number</></entry>
diff --git a/doc/src/sgml/release-9.4.sgml b/doc/src/sgml/release-9.4.sgml
index 961e4617978..11bbf3bf36c 100644
--- a/doc/src/sgml/release-9.4.sgml
+++ b/doc/src/sgml/release-9.4.sgml
@@ -103,22 +103,6 @@
<listitem>
<para>
- Unicode escapes in <link linkend="datatype-json"><type>JSON</type></link>
- text values are no longer rendered with the backslash escaped
- (Andrew Dunstan)
- </para>
-
- <para>
- Previously, all backslashes in text values being formed into JSON
- were escaped. Now a backslash followed by <literal>u</> and four
- hexadecimal digits is not escaped, as this is a legal sequence in a
- JSON string value, and escaping the backslash led to some perverse
- results.
- </para>
- </listitem>
-
- <listitem>
- <para>
When converting values of type <type>date</>, <type>timestamp</>
or <type>timestamptz</>
to <link linkend="datatype-json"><type>JSON</type></link>, render the