author Tom Lane <tgl@sss.pgh.pa.us> 2015-10-02 13:45:39 -0400
committer Tom Lane <tgl@sss.pgh.pa.us> 2015-10-02 13:45:39 -0400
commit bb704a781ada30b34b377937e8e39c2dae532cec
tree 3981eb7892e717805edfd6c8181e197a5e1cea5a /src/backend/regex/regc_nfa.c
parent c56b2aa6efd1c0d090a9747a8220bf5110e9f9fd
Add some more query-cancel checks to regular expression matching.
Commit 9662143f0c35d64d7042fbeaf879df8f0b54be32 added infrastructure to allow regular-expression operations to be terminated early in the event of SIGINT etc. However, fuzz testing by Greg Stark disclosed that there are still cases where regex compilation could run for a long time without noticing a cancel request. Specifically, the fixempties() phase never adds new states, only new arcs, so it doesn't hit the cancel check I'd put in newstate(). Add one to newarc() as well to cover that.

Some experimentation of my own found that regex execution could also run for a long time despite a pending cancel. We'd put a high-level cancel check into cdissect(), but there was none inside the core text-matching routines longest() and shortest(). Ordinarily those inner loops are very very fast ... but in the presence of lookahead constraints, not so much. As a compromise, stick a cancel check into the stateset cache-miss function, which is enough to guarantee a cancel check at least once per lookahead constraint test.

Making this work required more attention to error handling throughout the regex executor. Henry Spencer had apparently originally intended longest() and shortest() to be incapable of incurring errors while running, so neither they nor their subroutines had well-defined error reporting behaviors. However, that was already broken by the lookahead constraint feature, since lacon() can surely suffer an out-of-memory failure --- which, in the code as it stood, might never be reported to the user at all, but just silently be treated as a non-match of the lookahead constraint. Normalize all that by inserting explicit error tests as needed. I took the opportunity to add some more comments to the code, too.

Back-patch to all supported branches, like the previous patch.
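The executor-side compromise described above (poll for cancel only on the slow cache-miss path, and never let an error masquerade as "no match") can be illustrated with a small, self-contained sketch. This is not PostgreSQL's code: `struct vars`, `scan()`, `miss()`, `init_vars()`, the toy four-state DFA, the `MERR_*` codes and the `polls_left` counter that simulates a cancel arriving are all hypothetical, chosen only to show the shape of the idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error codes for this sketch; not PostgreSQL's values. */
enum match_err { MERR_NONE = 0, MERR_CANCEL, MERR_NOMEM };

#define NSTATES 4  /* toy DFA: state n = n consecutive 'a's seen; 3 = matched */

struct vars {
    enum match_err err;     /* set on failure; callers must check it */
    int polls_left;         /* simulated cancel: fires once this reaches 0 */
    int cache[NSTATES][2];  /* cached transitions; -1 = not computed yet */
    int misses;             /* number of slow-path computations */
};

/* Hypothetical stand-in for a cancel hook such as CANCEL_REQUESTED(). */
static bool cancel_requested(struct vars *v) {
    return v->polls_left-- <= 0;
}

static void init_vars(struct vars *v, int polls) {
    v->err = MERR_NONE;
    v->polls_left = polls;
    v->misses = 0;
    for (int s = 0; s < NSTATES; s++)
        v->cache[s][0] = v->cache[s][1] = -1;
}

/* Slow path, playing the role of a stateset cache miss: the cancel poll
 * lives here, so cached transitions pay nothing for it.
 * Returns -1 with v->err set on failure. */
static int miss(struct vars *v, int s, int k) {
    if (cancel_requested(v)) {
        v->err = MERR_CANCEL;
        return -1;
    }
    v->misses++;
    int next = (s == NSTATES - 1) ? s : (k ? s + 1 : 0);
    v->cache[s][k] = next;
    return next;
}

/* Returns 1 on match, 0 on no match, -1 on error (reason in v->err).
 * An error is propagated, never folded into "no match". */
static int scan(struct vars *v, const char *str) {
    int s = 0;
    for (const char *p = str; *p != '\0'; p++) {
        int k = (*p == 'a');
        int next = v->cache[s][k];
        if (next < 0) {
            next = miss(v, s, k);
            if (next < 0)
                return -1;
        }
        s = next;
    }
    return s == NSTATES - 1;
}
```

On a warm cache, a second `scan()` of the same input takes no misses and therefore no cancel polls, which mirrors the trade-off the commit message describes: the hot loop stays fast, while every trip to the slow path is guaranteed to notice a pending cancel.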
Diffstat (limited to 'src/backend/regex/regc_nfa.c')
-rw-r--r-- src/backend/regex/regc_nfa.c 13
1 files changed, 12 insertions, 1 deletions
diff --git a/src/backend/regex/regc_nfa.c b/src/backend/regex/regc_nfa.c
index 27998d688a8..e474e48c28c 100644
--- a/src/backend/regex/regc_nfa.c
+++ b/src/backend/regex/regc_nfa.c
@@ -180,7 +180,7 @@ newstate(struct nfa * nfa)
 	/*
 	 * This is a handy place to check for operation cancel during regex
 	 * compilation, since no code path will go very long without making a new
-	 * state.
+	 * state or arc.
 	 */
 	if (CANCEL_REQUESTED(nfa->v->re))
 	{
@@ -333,6 +333,17 @@ newarc(struct nfa * nfa,
 
 	assert(from != NULL && to != NULL);
 
+	/*
+	 * This is a handy place to check for operation cancel during regex
+	 * compilation, since no code path will go very long without making a new
+	 * state or arc.
+	 */
+	if (CANCEL_REQUESTED(nfa->v->re))
+	{
+		NERR(REG_CANCEL);
+		return;
+	}
+
 	/* check for duplicates */
 	for (a = from->outs; a != NULL; a = a->outchain)
 		if (a->to == to && a->co == co && a->type == t)
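Why the newarc() check matters can be seen in a toy model of an NFA builder in which one pass, like fixempties(), only ever adds arcs. The sketch below is hypothetical throughout: `struct builder`, `new_state()`, `new_arc()`, `connect_all()`, `free_arcs()`, the `BERR_*` codes and the `polls_left` counter that simulates a cancel request are illustrations of the pattern, not PostgreSQL's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch; not PostgreSQL's values. */
enum build_err { BERR_NONE = 0, BERR_CANCEL, BERR_NOMEM };

struct arc {
    int from, to;
    struct arc *next;
};

struct builder {
    enum build_err err;  /* first error recorded; callers check it */
    int polls_left;      /* simulated cancel: fires once this reaches 0 */
    int nstates;
    int narcs;
    struct arc *arcs;
};

static bool cancel_requested(struct builder *b) {
    return b->polls_left-- <= 0;
}

/* Same role as newstate(): poll for cancel before allocating. */
static int new_state(struct builder *b) {
    if (cancel_requested(b)) {
        b->err = BERR_CANCEL;
        return -1;
    }
    return b->nstates++;
}

/* Same role as newarc() after the patch: poll here too, because some
 * passes create many arcs without ever creating a state. */
static void new_arc(struct builder *b, int from, int to) {
    if (cancel_requested(b)) {
        b->err = BERR_CANCEL;
        return;
    }
    struct arc *a = malloc(sizeof *a);
    if (a == NULL) {
        b->err = BERR_NOMEM;
        return;
    }
    a->from = from;
    a->to = to;
    a->next = b->arcs;
    b->arcs = a;
    b->narcs++;
}

/* An arcs-only pass (in the spirit of fixempties()): connects every state
 * to every state, stopping as soon as an error has been recorded. */
static void connect_all(struct builder *b) {
    for (int i = 0; i < b->nstates && b->err == BERR_NONE; i++)
        for (int j = 0; j < b->nstates && b->err == BERR_NONE; j++)
            new_arc(b, i, j);
}

static void free_arcs(struct builder *b) {
    while (b->arcs != NULL) {
        struct arc *next = b->arcs->next;
        free(b->arcs);
        b->arcs = next;
    }
    b->narcs = 0;
}
```

With `polls_left = 7`, four `new_state()` calls consume four polls, and `connect_all()` then adds three arcs before the next poll reports the cancel. If the check lived only in `new_state()`, the same pass would run on to all 16 arcs without ever noticing the request.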