author | Andrew Dunstan <andrew@dunslane.net> | 2013-06-08 10:20:54 -0400 |
---|---|---|
committer | Andrew Dunstan <andrew@dunslane.net> | 2013-06-08 10:20:54 -0400 |
commit | d7cb64aeb5a7e40f4ea75e60bba1d606ca06df7f (patch) | |
tree | 9f218035148c4294cf7b5489731758da88bf37c3 | |
parent | 8af3f277b4f941404ae43251e23d6561f2250ebb (diff) | |
Don't downcase non-ascii identifier chars in multi-byte encodings.

Long-standing code has called tolower() on identifier character bytes
with the high bit set. This is clearly an error and produces junk output
when the encoding is multi-byte. This patch therefore restricts this
activity to cases where there is a character with the high bit set AND
the encoding is single-byte.

There have been numerous gripes about this, most recently from Martin
Schäfer.

Backpatch to all live releases.
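
To illustrate why the old behavior produced junk, here is a minimal sketch of the downcasing loop, not the actual PostgreSQL code. It assumes a Latin-1 style locale, modeled by the hypothetical helpers `latin1_isupper()` and `latin1_tolower()` so that the result does not depend on which locales are installed. The `enc_max_length` parameter stands in for `pg_database_encoding_max_length()`, and the `patched` flag is an illustrative switch between the old and new conditions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for isupper() under a Latin-1 locale:
 * bytes 0xC0..0xDE are uppercase letters, except 0xD7 (multiplication sign). */
static bool latin1_isupper(unsigned char ch)
{
    return ch >= 0xC0 && ch <= 0xDE && ch != 0xD7;
}

/* Hypothetical stand-in for tolower() under a Latin-1 locale:
 * each uppercase letter maps to the byte 0x20 above it. */
static unsigned char latin1_tolower(unsigned char ch)
{
    return latin1_isupper(ch) ? (unsigned char) (ch + 0x20) : ch;
}

/*
 * Simplified downcasing loop (assumption: no truncation, no palloc).
 * result must have room for len + 1 bytes.
 * patched == false models the old condition, true models the new one.
 */
static void downcase_sketch(const unsigned char *ident, size_t len,
                            int enc_max_length, bool patched,
                            unsigned char *result)
{
    bool enc_is_single_byte = (enc_max_length == 1);
    size_t i;

    for (i = 0; i < len; i++)
    {
        unsigned char ch = ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ch = (unsigned char) (ch + ('a' - 'A'));
        else if ((!patched || enc_is_single_byte) &&
                 (ch & 0x80) != 0 && latin1_isupper(ch))
            ch = latin1_tolower(ch);
        result[i] = ch;
    }
    result[len] = '\0';
}
```

Under these assumptions, the UTF-8 identifier "TÄB" (bytes 0x54 0xC3 0x84 0x42) comes out of the old condition as 0x74 0xE3 0x84 0x62: the lead byte 0xC3 is treated as a Latin-1 uppercase letter and rewritten to 0xE3, which is no longer the encoding of any form of "Ä". With the new condition and a multi-byte encoding, only the ASCII letters are downcased and the two bytes of "Ä" pass through unchanged.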
-rw-r--r-- | src/backend/parser/scansup.c | 8 |
1 files changed, 5 insertions, 3 deletions
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index c6ac203f86c..1586da6d94b 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -130,8 +130,10 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
 	char *result;
 	int i;
+	bool enc_is_single_byte;
 
 	result = palloc(len + 1);
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
@@ -139,8 +141,8 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	 * locale-aware translation. However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
 	 * 'I'). Our current compromise is to use tolower() for characters with
-	 * the high bit set, and use an ASCII-only downcasing for 7-bit
-	 * characters.
+	 * the high bit set, as long as they aren't part of a multi-byte character,
+	 * and use an ASCII-only downcasing for 7-bit characters.
 	 */
 	for (i = 0; i < len; i++)
 	{
@@ -148,7 +150,7 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
-		else if (IS_HIGHBIT_SET(ch) && isupper(ch))
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
 			ch = tolower(ch);
 		result[i] = (char) ch;
 	}