author | Tom Lane <tgl@sss.pgh.pa.us> | 2012-07-10 14:54:37 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2012-07-10 14:54:37 -0400 |
commit | 628cbb50ba80c83917b07a7609ddec12cda172d0 (patch) | |
tree | 7008492921c90e6de7c431633e33624a597a8416 /src/backend/regex/regc_color.c | |
parent | 00dac6000d422033c3e8d191f01ee0e6525794c2 (diff) | |
Re-implement extraction of fixed prefixes from regular expressions.

To generate btree-indexable conditions from regex WHERE conditions (such as
WHERE indexed_col ~ '^foo'), we need to be able to identify any fixed
prefix that a regex might have; that is, find any string that must be a
prefix of all strings satisfying the regex. We used to do that with
entirely ad-hoc code that looked at the source text of the regex. It
didn't know very much about regex syntax, which mostly meant that it would
fail to identify some optimizable cases; but Viktor Rosenfeld reported that
it would produce actively wrong answers for quantified parenthesized
subexpressions, such as '^(foo)?bar'. Rather than trying to extend the
ad-hoc code to cover this, let's get rid of it altogether in favor of
identifying prefixes by examining the compiled form of a regex.

To do this, I've added a new entry point "pg_regprefix" to the regex library;
hopefully it is defined in a sufficiently general fashion that it can remain
in the library when/if that code gets split out as a standalone project.

Since this bug has been there for a very long time, this fix needs to get
back-patched. However it depends on some other recent commits (particularly
the addition of wchar-to-database-encoding conversion), so I'll commit this
separately and then go to work on back-porting the necessary fixes.
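
The '^(foo)?bar' case mentioned in the message can be checked outside PostgreSQL. The sketch below is only an illustration: it uses the POSIX `regcomp`/`regexec` interface rather than PostgreSQL's own regex library, and the helper names `matches` and `common_prefix_len` are hypothetical. It shows that both "bar" and "foobar" satisfy the pattern although they share no leading characters, so any non-empty fixed prefix derived for this pattern would be wrong.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if text matches the POSIX extended regex, else 0. */
int
matches(const char *pattern, const char *text)
{
	regex_t		re;
	int			ok;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return 0;				/* treat a compile failure as "no match" */
	ok = (regexec(&re, text, 0, NULL, 0) == 0);
	regfree(&re);
	return ok;
}

/* Hypothetical helper: length of the longest common prefix of a and b. */
size_t
common_prefix_len(const char *a, const char *b)
{
	size_t		n = 0;

	while (a[n] != '\0' && a[n] == b[n])
		n++;
	return n;
}

/* Both strings match, yet they have no characters in common at the start. */
void
check_optional_group(void)
{
	const char *pattern = "^(foo)?bar";

	assert(matches(pattern, "bar"));
	assert(matches(pattern, "foobar"));
	assert(common_prefix_len("bar", "foobar") == 0);
}
```

Calling `check_optional_group()` from a test program exercises the case; a pattern such as '^foobar' would, by contrast, constrain every match to start with "foobar".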
Diffstat (limited to 'src/backend/regex/regc_color.c')
-rw-r--r-- | src/backend/regex/regc_color.c | 11 |
1 files changed, 10 insertions, 1 deletions
diff --git a/src/backend/regex/regc_color.c b/src/backend/regex/regc_color.c
index 2aeb861d976..1c60566fbf5 100644
--- a/src/backend/regex/regc_color.c
+++ b/src/backend/regex/regc_color.c
@@ -66,8 +66,9 @@ initcm(struct vars * v,
 	cd = cm->cd;				/* cm->cd[WHITE] */
 	cd->sub = NOSUB;
 	cd->arcs = NULL;
-	cd->flags = 0;
+	cd->firstchr = CHR_MIN;
 	cd->nchrs = CHR_MAX - CHR_MIN + 1;
+	cd->flags = 0;
 
 	/* upper levels of tree */
 	for (t = &cm->tree[0], j = NBYTS - 1; j > 0; t = nextt, j--)
@@ -272,6 +273,7 @@ newcolor(struct colormap * cm)
 	cd->nchrs = 0;
 	cd->sub = NOSUB;
 	cd->arcs = NULL;
+	cd->firstchr = CHR_MIN;		/* in case never set otherwise */
 	cd->flags = 0;
 	cd->block = NULL;
 
@@ -371,6 +373,8 @@ subcolor(struct colormap * cm, chr c)
 	if (co == sco)				/* already in an open subcolor */
 		return co;				/* rest is redundant */
 	cm->cd[co].nchrs--;
+	if (cm->cd[sco].nchrs == 0)
+		cm->cd[sco].firstchr = c;
 	cm->cd[sco].nchrs++;
 	setcolor(cm, c, sco);
 	return sco;
@@ -438,6 +442,11 @@ subrange(struct vars * v,
 
 /*
  * subblock - allocate new subcolors for one tree block of chrs, fill in arcs
+ *
+ * Note: subcolors that are created during execution of this function
+ * will not be given a useful value of firstchr; it'll be left as CHR_MIN.
+ * For the current usage of firstchr in pg_regprefix, this does not matter
+ * because such subcolors won't occur in the common prefix of a regex.
  */
 static void
 subblock(struct vars * v,
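
The bookkeeping added to subcolor() can be modelled in isolation. The following is a toy sketch, not the PostgreSQL data structures: the `toy_*` names, the flat 256-entry map, and the fixed color limit are all assumptions made for illustration. It records the first chr placed into an empty color, as the patch does. It also shows one way a consumer might recover the sole member of a single-chr color; how pg_regprefix actually consumes firstchr is not part of this diff, so `toy_sole_member` is hypothetical.

```c
#include <assert.h>

#define TOY_NCHRS		256		/* toy alphabet: one byte per chr */
#define TOY_MAXCOLORS	8		/* arbitrary small limit for the sketch */
#define TOY_CHR_MIN		0

struct toy_colordesc
{
	int			nchrs;			/* number of chrs currently in this color */
	unsigned char firstchr;		/* first chr added while the color was empty */
};

struct toy_colormap
{
	int			color_of[TOY_NCHRS];	/* chr -> color number */
	struct toy_colordesc cd[TOY_MAXCOLORS];
	int			ncolors;
};

/* Start with every chr in color 0, like the WHITE color set up by initcm(). */
void
toy_init(struct toy_colormap *cm)
{
	int			i;

	for (i = 0; i < TOY_NCHRS; i++)
		cm->color_of[i] = 0;
	cm->cd[0].nchrs = TOY_NCHRS;
	cm->cd[0].firstchr = TOY_CHR_MIN;
	cm->ncolors = 1;
}

/* Allocate an empty color; returns -1 if the toy table is full. */
int
toy_newcolor(struct toy_colormap *cm)
{
	int			co;

	if (cm->ncolors >= TOY_MAXCOLORS)
		return -1;
	co = cm->ncolors++;
	cm->cd[co].nchrs = 0;
	cm->cd[co].firstchr = TOY_CHR_MIN;	/* in case never set otherwise */
	return co;
}

/* Move chr c into color co, recording it as firstchr if co was empty. */
void
toy_setcolor(struct toy_colormap *cm, unsigned char c, int co)
{
	int			old = cm->color_of[c];

	if (old == co)
		return;
	cm->cd[old].nchrs--;
	if (cm->cd[co].nchrs == 0)
		cm->cd[co].firstchr = c;
	cm->cd[co].nchrs++;
	cm->color_of[c] = co;
}

/*
 * Hypothetical consumer: if color co holds exactly one chr and firstchr
 * still maps to co, store that chr in *out and return 1; otherwise return 0.
 */
int
toy_sole_member(const struct toy_colormap *cm, int co, unsigned char *out)
{
	unsigned char c;

	if (cm->cd[co].nchrs != 1)
		return 0;
	c = cm->cd[co].firstchr;
	if (cm->color_of[c] != co)
		return 0;				/* firstchr has since moved to another color */
	*out = c;
	return 1;
}
```

The membership check in `toy_sole_member` is there because firstchr is only a hint in this model: a chr recorded as firstchr can later be moved to another color, leaving a stale value behind.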