aboutsummaryrefslogtreecommitdiff
path: root/src/include/regex/regguts.h
Commit message (Collapse)AuthorAge
* Back-patch fix for extraction of fixed prefixes from regular expressions.Tom Lane2012-07-10
| | | | | | Back-patch of commits 628cbb50ba80c83917b07a7609ddec12cda172d0 and c6aae3042be5249e672b731ebeb21875b5343010. This has been broken since 7.3, so back-patch to all supported branches.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Avoid repeated creation/freeing of per-subre DFAs during regex search.Tom Lane2012-02-24
| | | | | | | | | | | | In nested sub-regex trees, lower-level nodes created DFAs and then destroyed them again before exiting, which is a bit dumb considering that the recursive search is likely to call those nodes again later. Instead cache each created DFA until the end of pg_regexec(). This is basically a space for time tradeoff, in that it might increase the maximum memory usage. However, in most regex patterns there are not all that many subre nodes, so not that many DFAs --- and in any case, the peak usage occurs when reaching the bottom recursion level, and except for alternation cases that's going to be the same anyway.
* Remove useless "retry memory" logic within regex engine.Tom Lane2012-02-24
| | | | | | | | | | | | | | | | | | Apparently some primordial version of Spencer's engine needed cdissect() and child functions to be able to continue matching from a previous position when re-called. That is dead code, though, since trivial inspection shows that cdissect can never be entered without having previously done zapmem which resets the relevant retry counter. I have also verified experimentally that no case in the Tcl regression tests reaches cdissect with a nonzero retry value. Accordingly, remove that logic. This doesn't really save any noticeable number of cycles in itself, but it is one step towards making dissect() and cdissect() equivalent, which will allow removing hundreds of lines of near-duplicated code. Since struct subre's "retry" field is no longer particularly related to any kind of retry, rename it to "id". As of this commit it's only used for identifying a subre node in debug printouts, so you might think we should get rid of the field entirely; but I have a plan for another use.
* Fix the general case of quantified regex back-references.Tom Lane2012-02-24
| | | | | | | | | | | | | | Cases where a back-reference is part of a larger subexpression that is quantified have never worked in Spencer's regex engine, because he used a compile-time transformation that neglected the need to check the back-reference match in iterations before the last one. (That was okay for capturing parens, and we still do it if the regex has *only* capturing parens ... but it's not okay for backrefs.) To make this work properly, we have to add an "iteration" node type to the regex engine's vocabulary of sub-regex nodes. Since this is a moderately large change with a fair risk of introducing new bugs of its own, apply to HEAD only, even though it's a fix for a longstanding bug.
* Create the beginnings of internals documentation for the regex code.Tom Lane2012-02-19
| | | | | | | | | | Create src/backend/regex/README to hold an implementation overview of the regex package, and fill it in with some preliminary notes about the code's DFA/NFA processing and colormap management. Much more to do there of course. Also, improve some code comments around the colormap and cvec code. No functional changes except to add one missing assert.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Sync our regex code with upstream changes since last time we did this, whichTom Lane2008-02-14
| | | | | | | | | | | | | was Tcl 8.4.8. The main changes are to remove the never-fully-implemented code for multi-character collating elements, and to const-ify some stuff a bit more fully. In combination with the recent security patch, this commit brings us into line with Tcl 8.5.0. Note that I didn't make any effort to duplicate a lot of cosmetic changes that they made to bring their copy into line with their own style guidelines, such as adding braces around single-line IF bodies. Most of those we either had done already (such as ANSI-fication of function headers) or there is no point because pgindent would undo the change anyway.
* Fix assorted security-grade bugs in the regex engine. All of these problemsTom Lane2008-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | are shared with Tcl, since it's their code to begin with, and the patches have been copied from Tcl 8.5.0. Problems: CVE-2007-4769: Inadequate check on the range of backref numbers allows crash due to out-of-bounds read. CVE-2007-4772: Infinite loop in regex optimizer for pattern '($|^)*'. CVE-2007-6067: Very slow optimizer cleanup for regex with a large NFA representation, as well as crash if we encounter an out-of-memory condition during NFA construction. Part of the response to CVE-2007-6067 is to put a limit on the number of states in the NFA representation of a regex. This seems needed even though the within-the-code problems have been corrected, since otherwise the code could try to use very large amounts of memory for a suitably-crafted regex, leading to potential DOS by driving the system into swap, activating a kernel OOM killer, etc. Although there are certainly plenty of ways to drive the system into effective DOS with poorly-written SQL queries, these problems seem worth treating as security issues because many applications might accept regex search patterns from untrustworthy sources. Thanks to Will Drewry of Google for reporting these problems. Patches by Will Drewry and Tom Lane. Security: CVE-2007-4769, CVE-2007-4772, CVE-2007-6067
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Add parentheses to macros when args are used in computations. WithoutBruce Momjian2005-05-25
| | | | them, the executation behavior could be unexpected.
* make sure the $Id tags are converted to $PostgreSQL as well ...PostgreSQL Daemon2003-11-29
|
* pgindent run.Bruce Momjian2003-08-04
|
* Replace regular expression package with Henry Spencer's latest versionTom Lane2003-02-05
(extracted from Tcl 8.4.1 release, as Henry still hasn't got round to making it a separate library). This solves a performance problem for multibyte, as well as upgrading our regexp support to match recent Tcl and nearly match recent Perl.