Don't downcase non-ascii identifier chars in multi-byte encodings.

Long-standing code has called tolower() on identifier bytes with the high bit set. In a multi-byte encoding such a byte is only one part of a character, so this is clearly an error and produces junk output. This patch therefore restricts the tolower() call to bytes with the high bit set when the database encoding is single-byte. There have been numerous gripes about this, most recently from Martin Schäfer. Backpatch to all live releases.
author: Andrew Dunstan <andrew@dunslane.net> 2013-06-08 10:21:06 -0400
committer: Andrew Dunstan <andrew@dunslane.net> 2013-06-08 10:21:06 -0400
commit: a56c92f938f81df6b9d59b5bb7edc44008f0e06c (patch)
tree: 82321539ace33b81bd9683592e1d2d452d82c515
parent: cd4fe9514f31cb56a471b1f8b2380f4ff5fc2f91 (diff)
download: postgresql-a56c92f938f81df6b9d59b5bb7edc44008f0e06c.tar.gz
postgresql-a56c92f938f81df6b9d59b5bb7edc44008f0e06c.zip
1 files changed, 5 insertions, 3 deletions
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 6101457a109..1c39fd3a3fd 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -130,8 +130,10 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
 	char	   *result;
 	int			i;
+	bool        enc_is_single_byte;
 
 	result = palloc(len + 1);
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
@@ -139,8 +141,8 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	 * locale-aware translation.  However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
 	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, and use an ASCII-only downcasing for 7-bit
-	 * characters.
+	 * the high bit set, as long as they aren't part of a multi-byte character,
+	 * and use an ASCII-only downcasing for 7-bit characters.
 	 */
 	for (i = 0; i < len; i++)
 	{
@@ -148,7 +150,7 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
-		else if (IS_HIGHBIT_SET(ch) && isupper(ch))
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
 			ch = tolower(ch);
 		result[i] = (char) ch;
 	}
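
For readers who want to try the patched loop outside the backend, here is a minimal standalone sketch. It is not the committed code: `downcase_sketch` and `HIGHBIT_SET` are hypothetical names, the caller supplies the output buffer instead of `palloc`, and the `enc_is_single_byte` flag is passed in rather than derived from `pg_database_encoding_max_length()`. Truncation and the `warn` argument are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the backend's IS_HIGHBIT_SET test (assumed to check bit 0x80). */
#define HIGHBIT_SET(ch) ((unsigned char) (ch) & 0x80)

/*
 * Hypothetical helper mirroring the patched loop.  "out" must have room
 * for len + 1 bytes.  enc_is_single_byte stands in for the result of
 * pg_database_encoding_max_length() == 1.
 */
static void
downcase_sketch(const char *ident, size_t len, bool enc_is_single_byte,
				char *out)
{
	size_t		i;

	assert(ident != NULL && out != NULL);

	for (i = 0; i < len; i++)
	{
		unsigned char ch = (unsigned char) ident[i];

		if (ch >= 'A' && ch <= 'Z')
			ch += 'a' - 'A';	/* ASCII-only downcasing, locale-independent */
		else if (enc_is_single_byte && HIGHBIT_SET(ch) && isupper(ch))
			ch = (unsigned char) tolower(ch);	/* locale-aware, single-byte only */
		/* otherwise: high-bit bytes of a multi-byte character pass through */
		out[i] = (char) ch;
	}
	out[len] = '\0';
}
```

With `enc_is_single_byte` set to false, the two bytes 0xC3 0x84 (UTF-8 for Ä) come back unchanged, whereas the pre-patch condition could hand each byte to `tolower()` separately under a single-byte locale. What the single-byte branch does to a given byte still depends on the process locale.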