| Commit message (Collapse) | Author | Age |
|
|
|
| |
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows for example using LIKE with case-insensitive collations.
There was previously no internal implementation of this, so it was met
with a not-supported error. This adds the internal implementation and
removes the error. The implementation follows the specification of
the SQL standard for this.
Unlike with deterministic collations, the LIKE matching cannot go
character by character but has to go substring by substring. For
example, if we are matching against LIKE 'foo%bar', we can't start by
looking for an 'f', then an 'o', but instead with have to find
something that matches 'foo'. This is because the collation could
consider substrings of different lengths to be equal. This is all
internal to MatchText() in like_match.c.
The changes in GenericMatchText() in like.c just pass through the
locale information to MatchText(), which was previously not needed.
This matches exactly Generic_Text_IC_like() below.
ILIKE is not affected. (It's unclear whether ILIKE makes sense under
nondeterministic collations.)
This also updates match_pattern_prefix() in like_support.c to support
optimizing the case of an exact pattern with nondeterministic
collations. This was already alluded to in the previous code.
(includes documentation examples from Daniel Vérité and test cases
from Paul A Jungwirth)
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/700d2e86-bf75-4607-9cf2-f5b7802f6e88@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
A missed check for the builtin collation provider could result in
falling through to call isalpha().
This does not appear to have practical consequences because it only
happens for characters in the ASCII range. Regardless, the builtin
provider should not be calling libc functions, so backpatch.
Discussion: https://postgr.es/m/1bd5a0a5192f82c22ee7527e825b18ab0028b2c7.camel@j-davis.com
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead always fetch the locale and look at the ctype_is_c field.
hba.c relies on regexes working for the C locale without needing
catalog access, which worked before due to a special case for
C_COLLATION_OID in lc_ctype_is_c(). Move the special case to
pg_set_regex_collation() now that lc_ctype_is_c() is gone.
Author: Andreas Karlsson
Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
|
|
|
|
|
|
|
| |
Instead just look up the collation and check collate_is_c field.
Author: Andreas Karlsson
Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
|
|
|
|
|
|
|
|
| |
Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
Backpatch-through: 12
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
locale_t is defined by POSIX.1-2008 and SUSv4, and available on all
targeted systems. For Windows, win32_port.h redirects to a partial
implementation called _locale_t. We can now remove a lot of
compile-time tests for HAVE_LOCALE_T, and associated comments and dead
code branches that were needed for older computers.
Since configure + MinGW builds didn't detect locale_t but now we assume
that all systems have it, further inconsistencies among the 3 Windows build
systems were revealed. With this commit, we no longer define
HAVE_WCSTOMBS_L and HAVE_MBSTOWCS_L on any Windows build system, but
we have logic to deal with that so that replacements are available where
appropriate.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tristan Partin <tristan@neon.tech>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGLg7_T2GKwZFAkEf0V7vbnur-NfCjZPKZb%3DZfAXSV1ORw%40mail.gmail.com
|
|
|
|
| |
Backpatch-through: 11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SplitToVariants() in the ispell code, lseg_inside_poly() in geo_ops.c,
and regex_selectivity_sub() in selectivity estimation could recurse
until stack overflow; fix by adding check_stack_depth() calls.
So could next() in the regex compiler, but that case is better fixed by
converting its tail recursion to a loop. (We probably get better code
that way too, since next() can now be inlined into its sole caller.)
There remains a reachable stack overrun in the Turkish stemmer, but
we'll need some advice from the Snowball people about how to fix that.
Per report from Egor Chindyaskin and Alexander Lakhin. These mistakes
are old, so back-patch to all supported branches.
Richard Guo and Tom Lane
Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, callers of pg_newlocale_from_collation() did not call it
if the collation was DEFAULT_COLLATION_OID and instead proceeded with
a pg_locale_t of 0. Instead, now we call it anyway and have it return
0 if the default collation was passed. It already did this, so we
just have to adjust the callers. This simplifies all the call sites
and also makes future enhancements easier.
After discussion and testing, the previous comment in pg_locale.c
about avoiding this for performance reasons may have been mistaken
since it was testing a very different patch version way back when.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/ed3baa81-7fac-7788-cc12-41e3f7917e34@enterprisedb.com
|
|
|
|
| |
Backpatch-through: 10
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fills in some gaps in planner support for starts_with() and
the equivalent ^@ operator:
* A condition such as "textcol ^@ constant" can now use a regular
btree index, not only an SP-GiST index, so long as the index's
collation is C. (This works just like "textcol LIKE 'foo%'".)
* "starts_with(textcol, constant)" can be optimized the same as
"textcol ^@ constant".
* Fixed-prefix LIKE and regex patterns are now more like starts_with()
in another way: if you apply one to an SPGiST-indexed column, you'll
get an index condition using ^@ rather than two index conditions with
>= and <.
Per a complaint from Shay Rojansky. Patch by me; thanks to
Nathan Bossart for review.
Discussion: https://postgr.es/m/232599.1633800229@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Given a regex pattern with a very long fixed prefix (approaching 500
characters), the result of pow(FIXED_CHAR_SEL, fixed_prefix_len) can
underflow to zero. Typically the preceding selectivity calculation
would have underflowed as well, so that we compute 0/0 and get NaN.
In released branches this leads to an assertion failure later on.
That doesn't happen in HEAD, for reasons I've not explored yet,
but it's surely still a bug.
To fix, just skip the division when the pow() result is zero, so
that we'll (most likely) return a zero selectivity estimate. In
the edge cases where "sel" didn't yet underflow, perhaps this
isn't desirable, but I'm not sure that the case is worth spending
a lot of effort on. The results of regex_selectivity_sub() are
barely worth the electrons they're written on anyway :-(
Per report from Alexander Lakhin. Back-patch to all supported versions.
Discussion: https://postgr.es/m/6de0a0c3-ada9-cd0c-3e4e-2fa9964b41e3@gmail.com
|
|
|
|
| |
Backpatch-through: 9.5
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ineq_histogram_selectivity() can be invoked in situations where the
ordering we care about is not that of the column's histogram. We could
be considering some other collation, or even more drastically, the
query operator might not agree at all with what was used to construct
the histogram. (We'll get here for anything using scalarineqsel-based
estimators, so that's quite likely to happen for extension operators.)
Up to now we just ignored this issue and assumed we were dealing with
an operator/collation whose sort order exactly matches the histogram,
possibly resulting in junk estimates if the binary search gets confused.
It's past time to improve that, since the use of nondefault collations
is increasing. What we can do is verify that the given operator and
collation match what's recorded in pg_statistic, and use the existing
code only if so. When they don't match, instead execute the operator
against each histogram entry, and take the fraction of successes as our
selectivity estimate. This gives an estimate that is probably good to
about 1/histogram_size, with no assumptions about ordering. (The quality
of the estimate is likely to degrade near the ends of the value range,
since the two orderings probably don't agree on what is an extremal value;
but this is surely going to be more reliable than what we did before.)
At some point we might further improve matters by storing more than one
histogram calculated according to different orderings. But this code
would still be good fallback logic when no matches exist, so that is
not an argument for not doing this.
While here, also improve get_variable_range() to deal more honestly
with non-default collations.
This isn't back-patchable, because it requires adding another argument
to ineq_histogram_selectivity, and because it might have significant
impact on the estimation results for extension operators relying on
scalarineqsel --- mostly for the better, one hopes, but in any case
destabilizing plan choices in back branches is best avoided.
Per investigation of a report from James Lucas.
Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 5e0928005 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering. This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement. However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation. We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.
The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice. (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)
The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches. Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used. That does not match the prior behavior, but it should avoid risk
of failure in most cases. (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)
Per report from James Lucas. Back-patch to v12 where the problem was
introduced.
Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
|
|
|
|
| |
Backpatch-through: update all files in master, backpatch legal files through 9.4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make patternsel_common() select the comparison operators to use with
hardwired logic that matches pattern_prefix()'s new logic, eliminating
its dependencies on particular index opfamilies.
This shouldn't change any behavior, as it's just replacing runtime
operator lookups with the same values hard-wired. But it makes these
closely-related functions look more alike, and saving some runtime
syscache lookups is worth something.
Actually, it's not quite true that this is zero behavioral change:
when estimating for a column of type "name", the comparison constant
will be kept as "text" not coerced to "name". But that's more correct
anyway, and it allows additional simplification of the coercion logic,
again syncing this more closely with pattern_prefix().
Per consideration of a report from Manuel Rigger.
Discussion: https://postgr.es/m/CA+u7OA7nnGYy8rY0vdTe811NuA+Frr9nbcBO9u2Z+JxqNaud+g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Historically, the planner's LIKE/regex index optimizations were only
carried out for specific index opfamilies. That's never been a great
idea from the standpoint of extensibility, but it didn't matter so
much as long as we had no practical way to extend such behaviors anyway.
With the addition of planner support functions, and in view of ongoing
work to support additional table and index AMs, it seems like a good
time to relax this.
Hence, recast the decisions in match_pattern_prefix() so that rather
than decide which operators to generate by looking at what the index
opfamily contains, we decide which operators to generate a-priori
and then see if the opfamily supports them. This is much more
defensible from a semantic standpoint anyway, since we know the
semantics of the chosen operators precisely, and we only need to
assume that the opfamily correctly implements operators it claims
to support.
The existing "pattern" opfamilies put a crimp in this approach, since
we need to select the pattern operators if we want those to work.
So we still have to special-case those opfamilies. But that seems
all right, since in view of the addition of collations, the pattern
opfamilies seem like a legacy hack that nobody will be building on.
The only immediate effect of this change, so far as the core code is
concerned, is that anchored LIKE/regex patterns can be mapped onto
BRIN index searches, and exact-match patterns can be mapped onto hash
indexes, not only btree and spgist indexes as before. That's not a
terribly exciting result, but it does fix an omission mentioned in
the ancient comments here.
Note: no catversion bump, even though this touches pg_operator.dat,
because it's only adding OID macros not changing the contents of
postgres.bki.
Per consideration of a report from Manuel Rigger.
Discussion: https://postgr.es/m/CA+u7OA7nnGYy8rY0vdTe811NuA+Frr9nbcBO9u2Z+JxqNaud+g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The planner's optimization code for LIKE and regex operators could
error out with a complaint like "no = operator for opfamily NNN"
if someone created a binary-compatible index (for example, a
bpchar_ops index on a text column) on the LIKE's left argument.
This is a consequence of careless refactoring in commit 74dfe58a5.
The old code in match_special_index_operator only accepted specific
combinations of the pattern operator and the index opclass, thereby
indirectly guaranteeing that the opclass would have a comparison
operator with the same LHS input type as the pattern operator.
While moving the logic out to a planner support function, I simplified
that test in a way that no longer guarantees that. Really though we'd
like an altogether weaker dependency on the opclass, so rather than
put back exactly the old code, just allow lookup failure. I have in
mind now to rewrite this logic completely, but this is the minimum
change needed to fix the bug in v12.
Per report from Manuel Rigger. Back-patch to v12 where the mistake
came in.
Discussion: https://postgr.es/m/CA+u7OA7nnGYy8rY0vdTe811NuA+Frr9nbcBO9u2Z+JxqNaud+g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As coded, the ICU-collation path in pattern_char_isalpha() failed
to consider regular ASCII letters to be case-varying. This led to
like_fixed_prefix treating too much of an ILIKE pattern as being a
fixed prefix, so that indexscans derived from an ILIKE clause might
miss entries that they should find.
Per bug #15892 from James Inform. This is an oversight in the original
ICU patch (commit eccfef81e), so back-patch to v10 where that came in.
Discussion: https://postgr.es/m/15892-e5d2bea3e8a04a1b@postgresql.org
|
|
|
|
|
|
|
|
|
| |
Switch to 2.1 version of pg_bsd_indent. This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.
Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
|
|
|
|
|
|
|
|
| |
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
The same code is used to handle both text and bytea, but bytea is not
collation-aware, so we shouldn't call get_collation_isdeterministic()
in that case, since that will error out with an invalid collation.
Reported-by: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/CAM2%2B6%3DWaf3qJ1%3DyVTUH8_yG-SC0xcBMY%2BSFLhvKKNnWNXSUDBw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a flag "deterministic" to collations. If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal. That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.
This functionality is only supported with the ICU provider. At least
glibc doesn't appear to have any locales that work in a
nondeterministic way, so it's not worth supporting this for the libc
provider.
The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
This patch makes changes in three areas:
- CREATE COLLATION DDL changes and system catalog changes to support
this new flag.
- Many executor nodes and auxiliary code are extended to track
collations. Previously, this code would just throw away collation
information, because the eventually-called user-defined functions
didn't use it since they only cared about equality, which didn't
need collation information.
- String data type functions that do equality comparisons and hashing
are changed to take the (non-)deterministic flag into account. For
comparison, this just means skipping various shortcuts and tie
breakers that use byte-wise comparison. For hashing, we first need
to convert the input string to a canonical "sort key" using the ICU
analogue of strxfrm().
Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While at it, refactor patternsel() a bit so that it can be used from
the LIKE/regex planner support functions as well. This makes the
planner able to deal equally well with either operator or function
syntax for these operations. I'm not excited about that as a feature
in itself, but it provides a nice model for extensions to follow if
they want such behavior for their operations.
This change localizes the use of pattern_fixed_prefix() and
make_greater_string() so that they no longer need be exported.
(We might get pushback from extensions about that, perhaps,
in which case I'd be inclined to re-export them in a new header
file like_support.h.)
This reduces the bulk of selfuncs.c a fair amount, removing ~1370
lines or about one-sixth of that file; it's still too big, but this
is progress.
Discussion: https://postgr.es/m/24537.1550093915@sss.pgh.pa.us
|
|
For a long time, indxpath.c has had the ability to extract derived (lossy)
index conditions from certain operators such as LIKE. For just as long,
it's been obvious that we really ought to make that capability available
to extensions. This commit finally accomplishes that, by adding another
API for planner support functions that lets them create derived index
conditions for their functions. As proof of concept, the hardwired
"special index operator" code formerly present in indxpath.c is pushed
out to planner support functions attached to LIKE and other relevant
operators.
A weak spot in this design is that an extension needs to know OIDs for
the operators, datatypes, and opfamilies involved in the transformation
it wants to make. The core-code prototypes use hard-wired OID references
but extensions don't have that option for their own operators etc. It's
usually possible to look up the required info, but that may be slow and
inconvenient. However, improving that situation is a separate task.
I want to do some additional refactorization around selfuncs.c, but
that also seems like a separate task.
Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
|