author | Peter Eisentraut <peter@eisentraut.org> | 2024-11-27 08:18:35 +0100 |
---|---|---|
committer | Peter Eisentraut <peter@eisentraut.org> | 2024-11-27 08:19:42 +0100 |
commit | 85b7efa1cdd63c2fe2b70b725b8285743ee5787f | |
tree | 812b8d1f7a41163284043e4c53f5949daec7f37c /src/backend/utils/adt/like.c | |
parent | 8fcd80258bcf43dab93d877a5de0ce3f4d2bd471 | |
Support LIKE with nondeterministic collations

This allows for example using LIKE with case-insensitive collations.
There was previously no internal implementation of this, so it was met
with a not-supported error. This adds the internal implementation and
removes the error. The implementation follows the specification of
the SQL standard for this.

Unlike with deterministic collations, the LIKE matching cannot go
character by character but has to go substring by substring. For
example, if we are matching against LIKE 'foo%bar', we can't start by
looking for an 'f', then an 'o', but instead we have to find
something that matches 'foo'. This is because the collation could
consider substrings of different lengths to be equal. This is all
internal to MatchText() in like_match.c.
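
To illustrate why the matching has to go substring by substring, here is a minimal sketch in C. It is not the code in like_match.c. The names toy_equal() and toy_like() are hypothetical, the toy "collation" simply ignores hyphens (standing in for a nondeterministic collation that can treat strings of different lengths as equal), and only the % wildcard is handled: no _ and no escape characters.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy stand-in for a nondeterministic collation (an assumption for this
 * sketch): two strings are "equal" if they are identical once every '-'
 * has been dropped, so "f-oo" equals "foo".
 */
static bool
toy_equal(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t		i = 0;
	size_t		j = 0;

	for (;;)
	{
		/* skip ignorable characters on both sides */
		while (i < alen && a[i] == '-')
			i++;
		while (j < blen && b[j] == '-')
			j++;
		if (i == alen || j == blen)
			return i == alen && j == blen;
		if (a[i] != b[j])
			return false;
		i++;
		j++;
	}
}

/*
 * Match text t against pattern p, where only '%' is special.  Each literal
 * segment of the pattern is compared as a whole against candidate
 * substrings of the text, never one character at a time.
 */
static bool
toy_like(const char *t, size_t tlen, const char *p, size_t plen)
{
	size_t		seglen = 0;

	/* length of the literal segment before the next '%' */
	while (seglen < plen && p[seglen] != '%')
		seglen++;

	/* last segment: it must equal all of the remaining text */
	if (seglen == plen)
		return toy_equal(t, tlen, p, plen);

	/* try every prefix of the text that is equal to the segment ... */
	for (size_t n = 0; n <= tlen; n++)
	{
		if (!toy_equal(t, n, p, seglen))
			continue;
		/* ... then let '%' absorb zero or more further characters */
		for (size_t skip = n; skip <= tlen; skip++)
		{
			if (toy_like(t + skip, tlen - skip,
						 p + seglen + 1, plen - seglen - 1))
				return true;
		}
	}
	return false;
}
```

With these definitions, toy_like("f-oo---b-ar", 11, "foo%bar", 7) returns true: the prefix "f-oo" is equal to "foo" under the toy rule even though the lengths differ. A character-by-character comparison would already give up at the second byte, where '-' is compared with 'o'. The sketch favors clarity over speed and retries many candidate substrings.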

The changes in GenericMatchText() in like.c just pass through the
locale information to MatchText(), which was previously not needed.
This exactly matches Generic_Text_IC_like() below.

ILIKE is not affected. (It's unclear whether ILIKE makes sense under
nondeterministic collations.)

This also updates match_pattern_prefix() in like_support.c to support
optimizing the case of an exact pattern with nondeterministic
collations. This was already alluded to in the previous code.
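
As a rough illustration of what an "exact pattern" is: a pattern with no unescaped wildcard can only match strings that are equal to it, which is presumably what makes this case safe to optimize even under a nondeterministic collation. The helper below is a hypothetical sketch, not the code in like_support.c. The name pattern_is_exact() is invented for this example, and the sketch assumes the default backslash escape character.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the LIKE pattern contains no unescaped '%' or '_'.
 * Assumption for this sketch: backslash is the escape character, and a
 * trailing backslash is simply ignored rather than reported as an error.
 */
static bool
pattern_is_exact(const char *p, size_t plen)
{
	for (size_t i = 0; i < plen; i++)
	{
		if (p[i] == '\\')
		{
			/* the escaped character is a literal; skip over it */
			i++;
			continue;
		}
		if (p[i] == '%' || p[i] == '_')
			return false;
	}
	return true;
}
```

For example, the pattern foo is exact, while foo% and a_b are not. The four-byte pattern 50\% is exact as well, because the % is escaped.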

(includes documentation examples from Daniel Vérité and test cases
from Paul A Jungwirth)

Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/700d2e86-bf75-4607-9cf2-f5b7802f6e88@eisentraut.org
Diffstat (limited to 'src/backend/utils/adt/like.c')
-rw-r--r-- | src/backend/utils/adt/like.c | 26 |
1 files changed, 16 insertions, 10 deletions
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 0152723b2a6..7b3d1b5be71 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -147,22 +147,28 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 static inline int
 GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation)
 {
-	if (collation)
-	{
-		pg_locale_t locale = pg_newlocale_from_collation(collation);
+	pg_locale_t locale;
 
-		if (!locale->deterministic)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("nondeterministic collations are not supported for LIKE")));
+	if (!OidIsValid(collation))
+	{
+		/*
+		 * This typically means that the parser could not resolve a conflict
+		 * of implicit collations, so report it that way.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_INDETERMINATE_COLLATION),
+				 errmsg("could not determine which collation to use for LIKE"),
+				 errhint("Use the COLLATE clause to set the collation explicitly.")));
 	}
 
+	locale = pg_newlocale_from_collation(collation);
+
 	if (pg_database_encoding_max_length() == 1)
-		return SB_MatchText(s, slen, p, plen, 0);
+		return SB_MatchText(s, slen, p, plen, locale);
 	else if (GetDatabaseEncoding() == PG_UTF8)
-		return UTF8_MatchText(s, slen, p, plen, 0);
+		return UTF8_MatchText(s, slen, p, plen, locale);
 	else
-		return MB_MatchText(s, slen, p, plen, 0);
+		return MB_MatchText(s, slen, p, plen, locale);
 }
 
 static inline int