aboutsummaryrefslogtreecommitdiff
path: root/src/backend/regex/rege_dfa.c
Commit message (Collapse)AuthorAge
* Add recursion depth protections to regular expression matching.Tom Lane2015-10-02
| | | | | | | | | | | | | | | Some of the functions in regex compilation and execution recurse, and therefore could in principle be driven to stack overflow. The Tcl crew has seen this happen in practice in duptraverse(), though their fix was to put in a hard-wired limit on the number of recursive levels, which is not too appetizing --- fortunately, we have enough infrastructure to check the actually available stack. Greg Stark has also seen it in other places while fuzz testing on a machine with limited stack space. Let's put guards in to prevent crashes in all these places. Since the regex code would leak memory if we simply threw elog(ERROR), we have to introduce an API that checks for stack depth without throwing such an error. Fortunately that's not difficult.
* Add some more query-cancel checks to regular expression matching.Tom Lane2015-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9662143f0c35d64d7042fbeaf879df8f0b54be32 added infrastructure to allow regular-expression operations to be terminated early in the event of SIGINT etc. However, fuzz testing by Greg Stark disclosed that there are still cases where regex compilation could run for a long time without noticing a cancel request. Specifically, the fixempties() phase never adds new states, only new arcs, so it doesn't hit the cancel check I'd put in newstate(). Add one to newarc() as well to cover that. Some experimentation of my own found that regex execution could also run for a long time despite a pending cancel. We'd put a high-level cancel check into cdissect(), but there was none inside the core text-matching routines longest() and shortest(). Ordinarily those inner loops are very very fast ... but in the presence of lookahead constraints, not so much. As a compromise, stick a cancel check into the stateset cache-miss function, which is enough to guarantee a cancel check at least once per lookahead constraint test. Making this work required more attention to error handling throughout the regex executor. Henry Spencer had apparently originally intended longest() and shortest() to be incapable of incurring errors while running, so neither they nor their subroutines had well-defined error reporting behaviors. However, that was already broken by the lookahead constraint feature, since lacon() can surely suffer an out-of-memory failure --- which, in the code as it stood, might never be reported to the user at all, but just silently be treated as a non-match of the lookahead constraint. Normalize all that by inserting explicit error tests as needed. I took the opportunity to add some more comments to the code, too. Back-patch to all supported branches, like the previous patch.
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Simplify and document regex library's compact-NFA representation.Tom Lane2012-07-07
| | | | | | | | | | | | | | | | | | The previous coding abused the first element of a cNFA state's arcs list to hold a per-state flag bit, which was confusing, undocumented, and not even particularly efficient. Get rid of that in favor of a separate "stflags" vector. Since there's only one bit in use, I chose to allocate a char per state; we could possibly replace this with a bitmap at some point, but that would make accesses a little slower. It's already about 8X smaller than before, so let's not get overly tense. Also document the representation better than it was before, which is to say not at all. This patch is a byproduct of investigations towards extracting a "fixed prefix" string from the compact-NFA representation of regex patterns. Might need to back-patch it if we decide to back-patch that fix, but for now it's just code cleanup so I'll just put it in HEAD.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Sync regex code with Tcl 8.5.11.Tom Lane2012-02-17
| | | | | | | | | Sync our regex code with upstream changes since last time we did this, which was Tcl 8.5.0 (see commit df1e965e12cdd48c11057ee6e15346ee2b8b02f5). There are no functional changes here; the main point is just to lay down a commit-log marker that somebody has looked at this recently, and to do what we can to keep the two codebases comparable.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Adjust some regex debugging printouts to not give wrong-format-widthTom Lane2007-10-06
| | | | | warnings on a 64-bit machine. Noted while chasing a recent regex bug report.
* Wording cleanup for error messages. Also change can't -> cannot.Bruce Momjian2007-02-01
| | | | | | | | | | | | | | Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.Tom Lane2005-09-24
|
* $Header: -> $PostgreSQL Changes ...PostgreSQL Daemon2003-11-29
|
* Another pgindent run with updated typedefs.Bruce Momjian2003-08-08
|
* pgindent run.Bruce Momjian2003-08-04
|
* Replace regular expression package with Henry Spencer's latest versionTom Lane2003-02-05
(extracted from Tcl 8.4.1 release, as Henry still hasn't got round to making it a separate library). This solves a performance problem for multibyte, as well as upgrading our regexp support to match recent Tcl and nearly match recent Perl.