From ea1db8ae70e5f4ceaae34dc9c06a07d59aaa022e Mon Sep 17 00:00:00 2001
From: Jeff Davis
Date: Tue, 4 Apr 2023 10:28:08 -0700
Subject: Canonicalize ICU locale names to language tags.

Convert to BCP47 language tags before storing in the catalog, except
during binary upgrade or when the locale comes from an existing
collation or template database.

The resulting language tags can vary slightly between ICU
versions. For instance, "@colBackwards=yes" is converted to
"und-u-kb-true" in older versions of ICU, and to the simpler (but
equivalent) "und-u-kb" in newer versions.

The process of canonicalizing to a language tag also understands more
input locale string formats than ucol_open(). For instance,
"fr_CA.UTF-8" is misinterpreted by ucol_open() and the region is
ignored; effectively treating it the same as the locale "fr" and
opening the wrong collator. Canonicalization properly interprets the
language and region, resulting in the language tag "fr-CA", which can
then be understood by ucol_open().

This commit fixes a problem in prior versions due to ucol_open()
misinterpreting locale strings as described above. For instance,
creating an ICU collation with locale "fr_CA.UTF-8" would store that
string directly in the catalog, which would later be passed to (and
misinterpreted by) ucol_open(). After this commit, the locale string
will be canonicalized to language tag "fr-CA" in the catalog, which
will be properly understood by ucol_open(). Because this fix affects
the resulting collator, we cannot change the locale string stored in
the catalog for existing databases or collations; otherwise we'd risk
corrupting indexes. Therefore, only canonicalize locales for
newly-created (not upgraded) collations/databases. For similar
reasons, do not backport.

Discussion: https://postgr.es/m/8c7af6820aed94dc7bc259d2aa7f9663518e6137.camel@j-davis.com
Reviewed-by: Peter Eisentraut
---
 src/backend/commands/dbcommands.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

(limited to 'src/backend/commands/dbcommands.c')

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 24bcc5adfe8..2e242eeff24 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1058,6 +1058,26 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
 
+		/*
+		 * During binary upgrade, or when the locale came from the template
+		 * database, preserve locale string. Otherwise, canonicalize to a
+		 * language tag.
+		 */
+		if (!IsBinaryUpgrade && dbiculocale != src_iculocale)
+		{
+			char *langtag = icu_language_tag(dbiculocale,
+											 icu_validation_level);
+
+			if (langtag && strcmp(dbiculocale, langtag) != 0)
+			{
+				ereport(NOTICE,
+						(errmsg("using standard form \"%s\" for locale \"%s\"",
+								langtag, dbiculocale)));
+
+				dbiculocale = langtag;
+			}
+		}
+
 		icu_validate_locale(dbiculocale);
 	}
 	else
-- 
cgit v1.2.3
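
The decision made by the hunk above can be sketched outside the backend. This is an illustration only, not PostgreSQL code. `toy_language_tag` is a hypothetical stand-in for `icu_language_tag()` that merely cuts the string at the first `.` and turns `_` into `-`; real canonicalization is done by ICU and handles far more input forms (for example "@colBackwards=yes"). `choose_locale` and its `is_binary_upgrade` parameter are likewise assumed names. As in the patch, "came from the template" is detected by comparing pointers rather than string contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for icu_language_tag(). Handles only POSIX-style
 * names such as "fr_CA.UTF-8": drops everything from the first '.' and
 * replaces '_' with '-'. Returns a malloc'd string, or NULL on failure.
 */
char *
toy_language_tag(const char *locale)
{
	size_t		len = strcspn(locale, ".");
	char	   *tag = malloc(len + 1);

	if (tag == NULL)
		return NULL;
	for (size_t i = 0; i < len; i++)
		tag[i] = (locale[i] == '_') ? '-' : locale[i];
	tag[len] = '\0';
	return tag;
}

/*
 * Mirrors the patch's rule: keep the string untouched during binary
 * upgrade or when it is the template's own string (same pointer);
 * otherwise return the canonical form if it differs from the input.
 */
const char *
choose_locale(const char *requested, const char *template_locale,
			  bool is_binary_upgrade)
{
	if (!is_binary_upgrade && requested != template_locale)
	{
		char	   *tag = toy_language_tag(requested);

		if (tag != NULL && strcmp(requested, tag) != 0)
		{
			fprintf(stderr,
					"NOTICE: using standard form \"%s\" for locale \"%s\"\n",
					tag, requested);
			return tag;			/* ownership passes to the caller */
		}
		free(tag);				/* already canonical, or allocation failed */
	}
	return requested;
}
```

Under these assumptions, `choose_locale("fr_CA.UTF-8", tmpl, false)` yields "fr-CA" and emits the notice, while passing the template's own pointer, or setting `is_binary_upgrade`, returns the input pointer unchanged.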