author | Tom Lane <tgl@sss.pgh.pa.us> | 2021-05-03 11:42:31 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2021-05-03 11:42:31 -0400 |
commit | f68970e33f4dc48094c24c78c452ad730ae9ae12 (patch) | |
tree | 192ae33fa21801492e9d1833f8a151d2f24d6ea7 /src/backend/regex/regc_cvec.c | |
parent | b94409a02f6122d77b5154e481c0819fed6b4c95 (diff) | |
download | postgresql-f68970e33f4dc48094c24c78c452ad730ae9ae12.tar.gz postgresql-f68970e33f4dc48094c24c78c452ad730ae9ae12.zip |
Fix performance issue in new regex match-all detection code.

Commit 824bf7190 introduced a new search of the NFAs generated by
regex compilation. I failed to think hard about the performance
characteristics of that search, with the predictable outcome
that it's bad: weird regexes can trigger exponential search time.
Worse, there's no check-for-interrupt in that code, so you can't
even cancel the query if this happens.

Fix by introducing memoization of the search results, so that any one
NFA state need be examined in detail just once. This potentially uses
a lot of memory, but we can bound the memory usage by putting a limit
on the number of states for which we'll try to prove match-all-ness.
That is sane because we already have a limit (DUPINF) on the maximum
finite string length that a matchall regex can match; and patterns
that involve much more than DUPINF states would probably exceed that
limit anyway.
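To illustrate the memoization idea, here is a minimal standalone sketch in C. It is not the code from this commit: the Graph and Memo types, the MAX_STATES cap, and the function names are all hypothetical, and the toy graph has unlabeled arcs. It records, for each state, the set of path lengths to the final state as a bitmask, so each state is examined in detail only once even when the number of distinct paths is exponential.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_STATES 64   /* hypothetical cap on states we will analyze */
#define MAX_ARCS   4    /* hypothetical cap on out-arcs per state */
#define MAX_LEN    31   /* longest path length tracked (bit i = length i) */

typedef struct {
    int nstates;
    int final;                      /* index of the final state */
    int narcs[MAX_STATES];
    int to[MAX_STATES][MAX_ARCS];   /* out-arc targets */
} Graph;

typedef struct {
    bool     done[MAX_STATES];      /* result for this state is memoized */
    bool     active[MAX_STATES];    /* state is on the current search path */
    uint32_t lens[MAX_STATES];      /* memoized bitmask of path lengths */
    long     visits;                /* states examined in detail */
} Memo;

/* Compute the set of path lengths from s to the final state.
 * Returns false if a loop is found or a length would exceed MAX_LEN. */
static bool path_lengths(const Graph *g, int s, Memo *m, uint32_t *out)
{
    assert(s >= 0 && s < g->nstates);
    if (m->done[s]) {               /* memo hit: no re-examination */
        *out = m->lens[s];
        return true;
    }
    if (m->active[s])
        return false;               /* loop: lengths are unbounded */
    m->visits++;

    uint32_t result = 0;
    if (s == g->final) {
        result = 1u;                /* only the empty path, length 0 */
    } else {
        m->active[s] = true;
        for (int i = 0; i < g->narcs[s]; i++) {
            uint32_t sub;
            if (!path_lengths(g, g->to[s][i], m, &sub))
                return false;
            if (sub & (1u << MAX_LEN))
                return false;       /* one more arc would overflow */
            result |= sub << 1;     /* each arc adds one to the length */
        }
        m->active[s] = false;
    }
    m->done[s] = true;
    m->lens[s] = result;
    *out = result;
    return true;
}

/* Entry point: refuse outright if the graph is over the state cap,
 * which bounds the memo table's memory. */
bool analyze(const Graph *g, int start, uint32_t *lens, long *visits)
{
    Memo m;
    if (g->nstates > MAX_STATES)
        return false;
    memset(&m, 0, sizeof m);
    bool ok = path_lengths(g, start, &m, lens);
    *visits = m.visits;
    return ok;
}

/* Build a chain of k "diamonds": 2^k distinct paths, all of length 2k. */
void build_diamonds(Graph *g, int k)
{
    memset(g, 0, sizeof *g);
    for (int i = 0; i < k; i++) {
        int a = 3 * i, b = a + 1, c = a + 2, next = a + 3;
        g->to[a][0] = b; g->to[a][1] = c; g->narcs[a] = 2;
        g->to[b][0] = next; g->narcs[b] = 1;
        g->to[c][0] = next; g->narcs[c] = 1;
    }
    g->nstates = 3 * k + 1;
    g->final = 3 * k;
}
```

With build_diamonds(&g, 10) the graph has 1024 distinct paths through 31 states, yet the memoized search examines each of the 31 states once. Giving up when nstates exceeds the cap trades completeness for a hard bound on memory.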

Also, rearrange the logic so that we check the basic
is-the-graph-all-RAINBOW-arcs property before we start the recursive
search to determine path lengths. This will ensure that we fall out
quickly whenever the NFA couldn't possibly be matchall.
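The ordering described above can be sketched as follows. This is an illustration under assumptions, not the committed code: the Arc and ArcType definitions are hypothetical and reduce arc kinds to just two, and expensive_length_search is a stand-in that only counts how often it is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified arc model: an arc is either RAINBOW
 * (matches any character) or something more specific. */
typedef enum { ARC_PLAIN, ARC_RAINBOW } ArcType;

typedef struct {
    int     from;
    int     to;
    ArcType type;
} Arc;

/* Counts invocations of the expensive step, for demonstration only. */
static int expensive_calls = 0;

/* Cheap linear scan: true only if every arc is RAINBOW. */
static bool all_arcs_rainbow(const Arc *arcs, size_t narcs)
{
    assert(arcs != NULL || narcs == 0);
    for (size_t i = 0; i < narcs; i++) {
        if (arcs[i].type != ARC_RAINBOW)
            return false;           /* bail out at the first offender */
    }
    return true;
}

/* Stand-in for the recursive path-length search. */
static bool expensive_length_search(const Arc *arcs, size_t narcs)
{
    (void) arcs;
    (void) narcs;
    expensive_calls++;
    return true;
}

/* Run the cheap necessary condition first, so graphs that cannot be
 * matchall never reach the expensive search. */
bool could_be_matchall(const Arc *arcs, size_t narcs)
{
    if (!all_arcs_rainbow(arcs, narcs))
        return false;
    return expensive_length_search(arcs, narcs);
}
```

The scan is linear in the number of arcs, so a graph with even one non-RAINBOW arc is rejected before any recursion starts.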

Also stick in a check-for-interrupt, just in case these measures
don't completely eliminate the risk of slowness.
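As a generic illustration of a check-for-interrupt, the sketch below polls a cancellation flag on every iteration of a long loop. The flag, the SearchResult type, and long_search are hypothetical names invented for this example; they are not the mechanism the regex code actually uses.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical flag; a signal handler or another part of the program
 * would set it to request cancellation. */
static volatile sig_atomic_t cancel_requested = 0;

typedef enum { SEARCH_DONE, SEARCH_CANCELLED } SearchResult;

/* Perform up to nsteps units of work, polling the flag before each one.
 * *steps_done reports how many units completed. */
SearchResult long_search(long nsteps, long *steps_done)
{
    assert(steps_done != NULL);
    for (long i = 0; i < nsteps; i++) {
        if (cancel_requested) {
            *steps_done = i;
            return SEARCH_CANCELLED;   /* give up promptly */
        }
        /* ... one unit of search work would go here ... */
    }
    *steps_done = nsteps;
    return SEARCH_DONE;
}
```

Polling once per unit of work keeps the worst-case delay before a cancel takes effect to the cost of a single unit.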

Discussion: https://postgr.es/m/3483895.1619898362@sss.pgh.pa.us