After this change, the PostgreSQL lexers no longer accept numeric literals
with trailing non-digits, such as 123abc, which previously would be scanned
as two tokens: 123 and abc. That behavior was undocumented and surprising,
and it might also interfere with extended numeric literal syntax being
contemplated for the future.

Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
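
As a rough illustration of the new behavior (a standalone sketch, not the actual flex rules), a literal is now rejected outright when an identifier character immediately follows the digits:

    #include <ctype.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch: "123abc" used to lex as two tokens (123, abc); with the
     * new rules an identifier character right after the digits is an
     * error ("trailing junk after numeric literal"). */
    static bool
    has_trailing_junk(const char *s)
    {
        char       *end;

        (void) strtol(s, &end, 10);
        return (end != s && (isalpha((unsigned char) *end) || *end == '_'));
    }

    int
    main(void)
    {
        printf("123abc -> %s\n", has_trailing_junk("123abc") ? "error" : "ok");
        printf("123    -> %s\n", has_trailing_junk("123") ? "error" : "ok");
        return 0;
    }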
Backpatch-through: 10
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
Several places were performing a tight loop to determine the first power
of 2 that is greater than, or greater than or equal to, the required memory.
Instead of using a loop for that, we can use pg_nextpower2_32 or
pg_nextpower2_64. When we need a power of 2 equal to or greater than a
given amount, we just pass the amount to the nextpower2 function. When we
need a power of 2 strictly greater than the amount, we pass amount + 1.

Additionally, in tsearch there were a couple of locations that were
performing a while loop when a simple "if" would have done. In both of
these locations only one item is being added, so the loop could only ever
have iterated once. Changing the loop into an if statement makes the code
very slightly more efficient, as the condition is checked once rather than
twice.

There are quite a few remaining locations that increase the size of the
buffer in the following form:

    while (reqsize >= buflen)
    {
        buflen *= 2;
        buf = repalloc(buf, buflen);
    }

These are not touched in this commit. repalloc will error out for sizes
larger than MaxAllocSize. Changing these to use pg_nextpower2_32 would
remove the chance of that error being raised, but it's unclear from the
code whether the sizes could ever become that large, so err on the side of
caution.

Discussion: https://postgr.es/m/CAApHDvp=tns7RL4PH0ZR0M+M-YFLquK7218x=0B_zO+DbOma+w@mail.gmail.com
Reviewed-by: Zhihong Yu
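
As an illustration of the conversion, here is a self-contained sketch; nextpower2() stands in for pg_nextpower2_32 (declared in src/include/port/pg_bitutils.h) and uses a GCC/Clang builtin:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for pg_nextpower2_32: smallest power of 2 >= num. */
    static uint32_t
    nextpower2(uint32_t num)
    {
        if (num <= 1)
            return 1;
        return (uint32_t) 1 << (32 - __builtin_clz(num - 1));
    }

    int
    main(void)
    {
        uint32_t    reqsize = 1000;
        uint32_t    buflen = 256;

        /* Before: find the power of 2 with a tight loop. */
        while (buflen < reqsize)
            buflen *= 2;

        /* After: compute it directly; both print 1024. */
        printf("%u %u\n", buflen, nextpower2(reqsize));
        return 0;
    }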
Backpatch-through: 9.5
Up to now, if you tried to omit "AS" before a column label in a SELECT
list, it would only work if the column label was an IDENT, that is, not
any known keyword. This is rather unfriendly considering that we have
so many keywords and are constantly growing more. In the wake of commit
1ed6b8956 it's possible to improve matters quite a bit.

We'd originally tried to make this work by having some of the existing
keyword categories be allowed without AS, but that didn't work too well,
because each category contains a few special cases that don't work
without AS. Instead, invent an entirely orthogonal keyword property,
"can be bare column label", and mark that way all keywords for which
doing so produces no shift/reduce errors.

It turns out that of our 450 current keywords, all but 39 can be made
bare column labels, improving the situation by over 90%. This number
might move around a little depending on future grammar work, but it's
a pretty nice improvement.

Mark Dilger, based on work by myself and Robert Haas;
review by John Naylor

Discussion: https://postgr.es/m/38ca86db-42ab-9b48-2902-337a0d6b8311@2ndquadrant.com
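
A sketch of the orthogonal property (field and constant names here are hypothetical; in the real patch each keyword-list entry carries a bare-label marker):

    #include <stdbool.h>
    #include <stdio.h>

    /* Bare-label-ness is tracked independently of the keyword's
     * reservedness category. */
    enum kwcategory { UNRESERVED, COL_NAME, TYPE_FUNC_NAME, RESERVED };

    struct kwentry
    {
        const char *name;
        enum kwcategory category;
        bool        bare_label;     /* usable in a SELECT list without AS? */
    };

    static const struct kwentry kwlist[] = {
        {"abort", UNRESERVED, true},
        {"between", COL_NAME, true},
        {"from", RESERVED, false},  /* one of the 39 exceptions */
    };

    int
    main(void)
    {
        for (int i = 0; i < 3; i++)
            printf("%-8s bare label: %s\n", kwlist[i].name,
                   kwlist[i].bare_label ? "yes" : "no");
        return 0;
    }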
This reverts the parts of commit 17a28b03645e27d73bf69a95d7569b61e58f06eb
that changed ereport's auxiliary functions from returning dummy integer
values to returning void. It turns out that a minority of compilers
complain (not entirely unreasonably) about constructs such as

    (condition) ? errdetail(...) : 0

if errdetail() returns void rather than int. We could update those
call sites to say "(void) 0" perhaps, but the expectation for this
patch set was that ereport callers would not have to change anything.
And this aspect of the patch set was already the most invasive and
least compelling part of it, so let's just drop it.

Per buildfarm.

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
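
For concreteness, a minimal standalone example of why the void-returning variant breaks (errdetail_void/errdetail_int are hypothetical stand-ins):

    #include <stdio.h>

    static void errdetail_void(void) { }
    static int  errdetail_int(void)  { return 0; }

    int
    main(void)
    {
        int         condition = 1;

        errdetail_void();           /* fine as a plain statement */

        /* Rejected by some compilers if errdetail() returns void,
         * since the two arms of ?: have incompatible types:
         *     condition ? errdetail_void() : 0;
         * Fine once it returns a dummy int again: */
        printf("%d\n", condition ? errdetail_int() : 0);
        return 0;
    }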
Change all the auxiliary error-reporting routines to return void,
now that we no longer need to pretend they are passing something
useful to errfinish(). While this probably doesn't save anything
significant at the machine-code level, it allows detection of some
additional types of mistakes.

Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO)
to errfinish not errstart. This shaves a few cycles off the case where
errstart decides we're not going to emit anything.

Re-implement elog() as a trivial wrapper around ereport(), removing
the separate support infrastructure it used to have. Aside from
getting rid of some now-surplus code, this means that elog() now
really does have exactly the same semantics as ereport(), in particular
that it can skip evaluation work if the message is not to be emitted.

Andres Freund and Tom Lane

Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
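
A greatly simplified sketch of the shape this gives the macros (not the real definitions; emit/log_min_level are stand-ins):

    #include <stdarg.h>
    #include <stdio.h>

    #define WARNING 1
    #define ERROR   2
    #define log_min_level WARNING

    static void
    emit(int elevel, const char *file, int line, const char *fmt, ...)
    {
        va_list     ap;

        fprintf(stderr, "%s:%d: ", file, line);
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        fputc('\n', stderr);
    }

    /* Arguments are only evaluated if the message will be emitted, and
     * the location details are passed to the finishing call. */
    #define ereport_sketch(elevel, ...) \
        do { \
            if ((elevel) >= log_min_level) \
                emit(elevel, __FILE__, __LINE__, __VA_ARGS__); \
        } while (0)

    /* elog() becomes a trivial wrapper with identical semantics. */
    #define elog_sketch(...) ereport_sketch(__VA_ARGS__)

    int
    main(void)
    {
        elog_sketch(WARNING, "value = %d", 42);
        return 0;
    }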
SQL includes provisions for numeric Unicode escapes in string
literals and identifiers. Previously we only accepted those
if they represented ASCII characters or the server encoding
was UTF-8, making the conversion to internal form trivial.
This patch adjusts things so that we'll call the appropriate
encoding conversion function in less-trivial cases, allowing
the escape sequence to be accepted so long as it corresponds
to some character available in the server encoding.

This also applies to processing of Unicode escapes in JSONB.
However, the old restriction still applies to client-side
JSON processing, since that hasn't got access to the server's
encoding conversion infrastructure.

This patch includes some lexer infrastructure that simplifies
throwing errors with error cursors pointing into the middle of
a string (or other complex token). For the moment I only used
it for errors relating to Unicode escapes, but we might later
expand the usage to some other cases.

Patch by me, reviewed by John Naylor.

Discussion: https://postgr.es/m/2393.1578958316@sss.pgh.pa.us
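
For reference, the historically "trivial" case amounts to standard UTF-8 encoding of the escaped code point; other server encodings now route through a conversion function instead. A standalone encoder sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* Encode a Unicode code point as UTF-8. */
    static int
    to_utf8(uint32_t c, unsigned char *buf)
    {
        if (c <= 0x7F)   { buf[0] = c; return 1; }
        if (c <= 0x7FF)  { buf[0] = 0xC0 | (c >> 6);
                           buf[1] = 0x80 | (c & 0x3F); return 2; }
        if (c <= 0xFFFF) { buf[0] = 0xE0 | (c >> 12);
                           buf[1] = 0x80 | ((c >> 6) & 0x3F);
                           buf[2] = 0x80 | (c & 0x3F); return 3; }
        buf[0] = 0xF0 | (c >> 18);
        buf[1] = 0x80 | ((c >> 12) & 0x3F);
        buf[2] = 0x80 | ((c >> 6) & 0x3F);
        buf[3] = 0x80 | (c & 0x3F);
        return 4;
    }

    int
    main(void)
    {
        unsigned char buf[4];
        int         n = to_utf8(0x00E9, buf);   /* U&'\00E9', e-acute */

        for (int i = 0; i < n; i++)
            printf("%02X ", buf[i]);            /* prints C3 A9 */
        printf("\n");
        return 0;
    }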
Previously, the core scanner's yy_transition[] array had 37045 elements.
Since that number is larger than INT16_MAX, Flex generated the array to
contain 32-bit integers. By reimplementing some of the bulkier scanner
rules, this patch reduces the array to 20495 elements. The much smaller
total length, combined with the consequent use of 16-bit integers for
the array elements, reduces the binary size by over 200kB. This was
accomplished in two ways:

1. Consolidate handling of quote continuations into a new start condition,
rather than duplicating that logic for five different string types.

2. Treat Unicode strings and identifiers followed by a UESCAPE sequence
as three separate tokens, rather than one. The logic to de-escape
Unicode strings is moved to the filter code in parser.c, which already
had the ability to provide special processing for token sequences.
While we could have implemented the conversion in the grammar, that
approach was rejected for performance and maintainability reasons.

Performance in microbenchmarks of raw parsing seems equal or slightly
faster in most cases, and it's reasonable to expect that in real-world
usage (with more competition for the CPU cache) there will be a larger
win. The exception is UESCAPE sequences; lexing those is about 10%
slower, primarily because the scanner now has to be called three times
rather than once. This seems acceptable since that feature is very
rarely used.

The psql and ecpg lexers are likewise modified, primarily because we
want to keep them all in sync. Since those lexers don't use the
space-hogging -CF option, the space savings are much less, but it's
still good for perhaps 10kB apiece.

While at it, merge the ecpg lexer's handling of C-style comments used
in SQL and in C. Those have different rules regarding nested comments,
but since we already have the ability to keep track of the previous
start condition, we can use that to handle both cases within a single
start condition. This matches the core scanner more closely.

John Naylor

Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
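
A toy sketch of the three-token approach (token names hypothetical): the scanner returns the Unicode literal, UESCAPE, and the escape-character string separately, and the filter in front of the grammar folds them back into one plain string token:

    #include <stdio.h>

    enum token { T_USCONST, T_UESCAPE, T_SCONST, T_EOF };

    static enum token stream[] = {T_USCONST, T_UESCAPE, T_SCONST, T_EOF};
    static int  pos = 0;

    static enum token raw_lex(void) { return stream[pos++]; }

    /* What parser.c's filter does, schematically. */
    static enum token
    filtered_lex(void)
    {
        enum token  tok = raw_lex();

        if (tok == T_USCONST && stream[pos] == T_UESCAPE)
        {
            (void) raw_lex();       /* consume UESCAPE */
            (void) raw_lex();       /* consume the escape-char SCONST */
            /* ... de-escape using that character instead of '\\' ... */
        }
        return (tok == T_USCONST) ? T_SCONST : tok;
    }

    int
    main(void)
    {
        enum token  t;

        while ((t = filtered_lex()) != T_EOF)
            printf("token %d\n", (int) t);
        return 0;
    }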
Backpatch-through: update all files in master, backpatch legal files through 9.4
Previously, ScanKeywordLookup was passed an array of string pointers.
This had some performance deficiencies: the strings themselves might
be scattered all over the place depending on the compiler (and some
quick checking shows that at least with gcc-on-Linux, they indeed
weren't reliably close together). That led to very cache-unfriendly
behavior as the binary search touched strings in many different pages.
Also, depending on the platform, the string pointers might need to
be adjusted at program start, so that they couldn't be simple constant
data. And the ScanKeyword struct had been designed with an eye to
32-bit machines originally; on 64-bit it requires 16 bytes per
keyword, making it even more cache-unfriendly.

Redesign so that the keyword strings themselves are allocated
consecutively (as part of one big char-string constant), thereby
eliminating the touch-lots-of-unrelated-pages syndrome. And get
rid of the ScanKeyword array in favor of three separate arrays:
uint16 offsets into the keyword array, uint16 token codes, and
uint8 keyword categories. That reduces the overhead per keyword
to 5 bytes instead of 16 (even less in programs that only need
one of the token codes and categories); moreover, the binary search
only touches the offsets array, further reducing its cache footprint.
This also lets us put the token codes somewhere else than the
keyword strings are, which avoids some unpleasant build dependencies.

While we're at it, wrap the data used by ScanKeywordLookup into
a struct that can be treated as an opaque type by most callers.
That doesn't change things much right now, but it will make it
less painful to switch to a hash-based lookup method, as is being
discussed in the mailing list thread.

Most of the change here is associated with adding a generator
script that can build the new data structure from the same
list-of-PG_KEYWORD header representation we used before.
The PG_KEYWORD lists that plpgsql and ecpg used to embed in
their scanner .c files have to be moved into headers, and the
Makefiles have to be taught to invoke the generator script.
This work is also necessary if we're to consider hash-based lookup,
since the generator script is what would be responsible for
constructing a hash table.

Aside from saving a few kilobytes in each program that includes
the keyword table, this seems to speed up raw parsing (flex+bison)
by a few percent. So it's worth doing even as it stands, though
we think we can gain even more with a follow-on patch to switch
to hash-based lookup.

John Naylor, with further hacking by me

Discussion: https://postgr.es/m/CAJVSVGXdFVU2sgym89XPL=Lv1zOS5=EHHQ8XWNzFL=mTXkKMLw@mail.gmail.com
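
A condensed, self-contained sketch of the new layout (three keywords only; the token codes are made up):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* All keyword text in one constant blob; parallel arrays carry the
     * per-keyword data at ~5 bytes each instead of a 16-byte struct. */
    static const char kw_string[] =
        "abort\0"
        "absolute\0"
        "access";

    static const uint16_t kw_offsets[] = {0, 6, 15};
    static const uint16_t kw_tokens[] = {100, 101, 102};
    static const uint8_t kw_categories[] = {0, 0, 0};  /* all unreserved */

    /* Binary search touches only the offsets array (plus the blob). */
    static int
    kw_lookup(const char *word)
    {
        int         lo = 0;
        int         hi = (int) (sizeof(kw_offsets) / sizeof(kw_offsets[0])) - 1;

        while (lo <= hi)
        {
            int         mid = (lo + hi) / 2;
            int         cmp = strcmp(word, kw_string + kw_offsets[mid]);

            if (cmp == 0)
                return mid;
            else if (cmp < 0)
                hi = mid - 1;
            else
                lo = mid + 1;
        }
        return -1;
    }

    int
    main(void)
    {
        int         idx = kw_lookup("absolute");

        if (idx >= 0)
            printf("token %d, category %d\n",
                   kw_tokens[idx], kw_categories[idx]);
        return 0;
    }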
Backpatch-through: certain files through 9.4
The realfail1 and realfail2 backup-prevention rules always returned
token type FCONST, ignoring the possibility that what we've scanned
is more appropriately described as ICONST. I think that at the
time that code was added, it might actually have been safe to not
distinguish; but since we started allowing AS-less aliases in SELECT
target lists, it's definitely legal to have a number immediately
followed by an identifier.

In the SELECT case, it seems there's no visible consequence because
make_const() will change the type back to integer anyway. But I'm
worried that there are other contexts, or will be in future, where
it's more important to get the constant's type right.

Hence, use process_integer_literal to correctly determine which
token type to return.

Arguably this is a bug fix, but given the lack of evidence of
user-visible problems, I'll refrain from back-patching.

Discussion: https://postgr.es/m/21364.1542136808@sss.pgh.pa.us
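
Schematically, process_integer_literal tries the integer conversion and falls back to FCONST (a simplified sketch, not the exact backend code):

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    enum token { ICONST, FCONST };

    static enum token
    process_integer_literal(const char *text, long *ival)
    {
        char       *end;
        long        val;

        errno = 0;
        val = strtol(text, &end, 10);
        if (errno == ERANGE || *end != '\0')
            return FCONST;      /* doesn't fit: hand over as a string */
        *ival = val;
        return ICONST;
    }

    int
    main(void)
    {
        long        v = 0;

        printf("%d\n", process_integer_literal("123", &v));    /* ICONST */
        printf("%d\n", process_integer_literal("99999999999999999999", &v));   /* FCONST */
        return 0;
    }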
Make a bunch of basically-cosmetic changes to reduce the diffs between
the flex rules in scan.l, psqlscan.l, and pgc.l. Reorder some code,
adjust a lot of whitespace, sync some comments, and make use of flex
start condition scopes to do that.

There are a few non-cosmetic changes in the ECPG lexer:

* Bring over the decimalfail rule (and support function
process_integer_literal) so that ECPG will lex "1..10" into
the same tokens as the backend would. I'm not sure this makes any
visible difference to users, but I'm not sure it doesn't, either.

* <xdc><<EOF>> gets its own rule so as to produce a more on-point
error message.

* Remove duplicate <SQL>{xdstart} rule.

John Naylor, with a few additional changes by me

Discussion: https://postgr.es/m/CAJVSVGWGqY9YBs2EwtRUkbNv=hXkN8yRPOoD1wxE6COgvvrz5g@mail.gmail.com
Commits c6b3c939b (which fixed the precedence of >=, <=, <> operators)
and 865f14a2d (which added support for the standard => notation for
named arguments) created a class of lexer tokens which look like
multi-character operators but which have their own token IDs distinct
from Op. However, longest-match rules meant that following any of
these tokens with another operator character, as in (1<>-1), would
cause them to be incorrectly returned as Op.

The error here isn't immediately obvious, because the parser would
usually still find the correct operator via the Op token, but there
were more subtle problems:

1. If immediately followed by a comment or +-, >= <= <> would be given
the old precedence of Op rather than the correct new precedence;

2. If followed by a comment, != would be returned as Op rather than as
NOT_EQUAL, causing it not to be found at all;

3. If followed by a comment or +-, the => token for named arguments
would be lexed as Op, causing the argument to be mis-parsed as a
simple expression, usually causing an error.

Fix by explicitly checking for the operators in the {operator} code
block in addition to all the existing special cases there.

Backpatch to 9.5 where the problem was introduced.

Analysis and patch by me; review by Tom Lane.

Discussion: https://postgr.es/m/87va851ppl.fsf@news-spur.riddles.org.uk
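
Schematically, after the {operator} action trims trailing characters it now re-checks the remaining text (a sketch; the token names match the grammar's):

    #include <stdio.h>
    #include <string.h>

    enum token { Op, LESS_EQUALS, GREATER_EQUALS, NOT_EQUALS, EQUALS_GREATER };

    static enum token
    classify_operator(const char *op)   /* text left after trimming */
    {
        if (strcmp(op, "<=") == 0)
            return LESS_EQUALS;
        if (strcmp(op, ">=") == 0)
            return GREATER_EQUALS;
        if (strcmp(op, "<>") == 0 || strcmp(op, "!=") == 0)
            return NOT_EQUALS;
        if (strcmp(op, "=>") == 0)
            return EQUALS_GREATER;
        return Op;
    }

    int
    main(void)
    {
        /* In (1<>-1), "<>-" is scanned, "-" is trimmed and pushed back,
         * and the remaining "<>" must map to NOT_EQUALS, not Op. */
        printf("%d\n", (int) classify_operator("<>"));
        return 0;
    }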
The lexer's handling of operators contained an O(N^3) hazard when
dealing with long strings of + or - characters; it seems hard to
prevent this case from being O(N^2), but the additional N multiplier
was not needed.

Backpatch all the way since this has been there since 7.x, and it
presents at least a mild hazard in that trying to do Bind, PREPARE or
EXPLAIN on a hostile query could take excessive time (without
honouring cancels or timeouts) even if the query was never executed.
Several places used similar code to convert a string to an int, so take
the function that we already had and make it globally available.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
A Value node would store an integer as a long. This causes needless
portability risks, as long can be of varying sizes. Change it to use
int instead. All code using this was already careful to only store
32-bit values anyway.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Backpatch-through: certain files through 9.3
Flex generates a lot of functions that are not actually used. In order
to avoid coverage figures being ruined by that, mark up the parts of the
.l files where the generated code appears with lcov exclusion markers.
That way, lcov will typically only report on coverage for the .l file,
which is under our control, but not for the .c file.

Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
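
Schematically, the excluded region in the emitted .c file looks like this (LCOV_EXCL_START/STOP are lcov's standard exclusion markers):

    /* LCOV_EXCL_START */

    /* ... everything flex generates here is ignored by lcov ... */

    /* LCOV_EXCL_STOP */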
The U&'...' and U&"..." syntaxes silently discarded a surrogate pair
start (that is, a code between U+D800 and U+DBFF) if it occurred at
the very end of the string. This seems like an obvious oversight,
since we throw an error for every other invalid combination of surrogate
characters, including the very same situation in E'...' syntax.

This has been wrong since the pair processing was added (in 9.0),
so back-patch to all supported branches.

Discussion: https://postgr.es/m/19113.1482337898@sss.pgh.pa.us
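
For reference, the combination rule the pair processing implements (standard UTF-16 arithmetic):

    #include <stdio.h>

    /* Combine a high surrogate (U+D800..U+DBFF) and a low surrogate
     * (U+DC00..U+DFFF) into the code point they encode. */
    static unsigned int
    surrogate_pair(unsigned int hi, unsigned int lo)
    {
        return ((hi & 0x3FF) << 10) + (lo & 0x3FF) + 0x10000;
    }

    int
    main(void)
    {
        /* U&'\D83D\DE03' is U+1F603; a lone trailing \D83D is now an
         * error instead of being silently dropped. */
        printf("U+%X\n", surrogate_pair(0xD83D, 0xDE03));
        return 0;
    }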
Reported-by: David G. Johnston
Discussion: CAKFQuwZZvnxwSq9tNtvL+uyuDKGgV91zR_agtPxQHRWMWQRP8g@mail.gmail.com
This completes (at least for now) the project of getting rid of ad-hoc
linkages among the src/bin/ subdirectories. Everything they share is now
in src/fe_utils/ and is included from a static library at link time.

A side benefit is that we can restore the FLEX_NO_BACKUP check for
psqlscanslash.l. We might need to think of another way to do that check
if we ever need to build two lexers with that property in the same source
directory, but there's no foreseeable reason to need that.
Make some minor formatting adjustments to make it easier to diff these
files and see that they indeed implement the same flex rules (at least
to the extent that we want them to be the same).

(Someday it'd be nice to make ecpg's pgc.l more easily diff'able too,
but today is not that day.)

Also run relevant parts of these files and psqlscanslash.l through
pgindent.

No actual behavioral changes here, just obsessive neatnik-ism.
Now that we know about the %top{} trick, we can revert to building flex
lexers as separate .o files. This is worth doing for a couple of reasons
besides sheer cleanliness. We can narrow the scope of the -Wno-error flag
that's forced on scan.c. Also, since these grammar and lexer files are
so large, splitting them into separate build targets should have some
advantages in build speed, particularly in parallel or ccache'd builds.

We have quite a few other .l files that could be changed likewise, but the
above arguments don't apply to them, so the benefit of fixing them seems
pretty minimal. Leave the rest for some other day.
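
For reference, the %top{} trick in question: flex copies a %top block to the very top of the generated .c file, ahead of its own boilerplate, so that postgres.h can be included first. Schematically, in the .l file:

    %top{
    /* Emitted before everything flex generates. */
    #include "postgres.h"
    }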
Backpatch certain files through 9.1
While the SQL standard is pretty vague on the overall topic of operator
precedence (because it never presents a unified BNF for all expressions),
it does seem reasonable to conclude from the spec for <boolean value
expression> that OR has the lowest precedence, then AND, then NOT, then IS
tests, then the six standard comparison operators, then everything else
(since any non-boolean operator in a WHERE clause would need to be an
argument of one of these).

We were only sort of on board with that: most notably, while "<" ">" and
"=" had properly low precedence, "<=" ">=" and "<>" were treated as generic
operators and so had significantly higher precedence. And "IS" tests were
even higher precedence than those, which is very clearly wrong per spec.

Another problem was that "foo NOT SOMETHING bar" constructs, such as
"x NOT LIKE y", were treated inconsistently because of a bison
implementation artifact: they had the documented precedence with respect
to operators to their right, but behaved like NOT (i.e., very low priority)
with respect to operators to their left.

Fixing the precedence issues is just a small matter of rearranging the
precedence declarations in gram.y, except for the NOT problem, which
requires adding an additional lookahead case in base_yylex() so that we
can attach a different token precedence to NOT LIKE and allied two-word
operators.

The bulk of this patch is not the bug fix per se, but adding logic to
parse_expr.c to allow giving warnings if an expression has changed meaning
because of these precedence changes. These warnings are off by default
and are enabled by the new GUC operator_precedence_warning. It's believed
that very few applications will be affected by these changes, but it was
agreed that a warning mechanism is essential to help debug any that are.
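
The resulting ladder, sketched as bison precedence declarations (lowest first; a schematic derived from the description above, not the exact gram.y text):

    %left      OR
    %left      AND
    %right     NOT
    %nonassoc  IS ISNULL NOTNULL
    %nonassoc  '<' '>' '=' LESS_EQUALS GREATER_EQUALS NOT_EQUALS
    %nonassoc  BETWEEN IN_P LIKE ILIKE SIMILAR NOT_LA
    %left      Op OPERATOR        /* everything else */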
SQL has standardized on => as the use of to specify named parameters,
and we've wanted for many years to support the same syntax ourselves,
but this has been complicated by the possible use of => as an operator
name. In PostgreSQL 9.0, we began emitting a warning when an operator
named => was defined, and in PostgreSQL 9.2, we stopped shipping a
=>(text, text) operator as part of hstore. By the time the next major
version of PostgreSQL is released, => will have been deprecated for a
full five years, so hopefully there won't be too many people still
relying on it. We continue to support := for compatibility with
previous PostgreSQL releases.
Pavel Stehule, reviewed by Petr Jelinek, with a few documentation
tweaks by me.
contrib/pg_stat_statements will sometimes run the core lexer a second time
on submitted statements. Formerly, if you had standard_conforming_strings
turned off, this led to sometimes getting two copies of any warnings
enabled by escape_string_warning. While this is probably no longer a big
deal in the field, it's a pain for regression testing.

To fix, change the lexer so it doesn't consult the escape_string_warning
GUC variable directly, but looks at a copy in the core_yy_extra_type state
struct. Then, pg_stat_statements can change that copy to disable warnings
while it's redoing the lexing.

It seemed like a good idea to make this happen for all three of the GUCs
consulted by the lexer, not just escape_string_warning. There's not an
immediate use-case for callers to adjust the other two AFAIK, but making
it possible is easy enough and seems like good future-proofing.

Arguably this is a bug fix, but there doesn't seem to be enough interest to
justify a back-patch. We'd not be able to back-patch exactly as-is anyway,
for fear of breaking ABI compatibility of the struct. (We could perhaps
back-patch the addition of only escape_string_warning by adding it at the
end of the struct, where there's currently alignment padding space.)
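
Schematically, the flags move into per-scanner state seeded from the GUCs at scanner creation (a simplified sketch, not the real core_yy_extra_type):

    #include <stdbool.h>

    typedef struct core_yy_extra_sketch
    {
        /* copies of the GUCs; callers such as pg_stat_statements may
         * override these before re-lexing */
        int         backslash_quote;
        bool        escape_string_warning;
        bool        standard_conforming_strings;
    } core_yy_extra_sketch;

    static void
    scanner_init_sketch(core_yy_extra_sketch *yyext,
                        int bq_guc, bool esw_guc, bool scs_guc)
    {
        yyext->backslash_quote = bq_guc;
        yyext->escape_string_warning = esw_guc;
        yyext->standard_conforming_strings = scs_guc;
    }

    int
    main(void)
    {
        core_yy_extra_sketch yyext;

        scanner_init_sketch(&yyext, 0, true, true);
        return yyext.escape_string_warning ? 0 : 1;
    }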
Backpatch certain files through 9.0
We used the length of the input string, not the de-escaped string, as
the trigger for NAMEDATALEN truncation. AFAICS this would only result
in sometimes printing a phony truncation warning; but it's just luck
that there was no worse problem, since we were violating the API spec
for truncate_identifier(). Per bug #9204 from Joshua Yanovski.

This has been wrong since the Unicode-identifier support was added,
so back-patch to all supported branches.
Commit a5ff502fceadc7c203b0d7a11b45c73f1b421f69 was a brick shy of a load
in the backend lexer too, not just psql. Per further testing of bug #9068.
In passing, improve related comments.
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
The error rule used for avoiding backtracking with the U&'...' UESCAPE 'x'
syntax bloated the flex tables, so refactor it. This patch makes the error
rule shorter by introducing a new exclusive flex state that's entered after
parsing U&'...'. This shrinks the postgres binary by about 220kB.
In commit 71450d7fd6c7cf7b3e38ac56e363bff6a681973c, we added code to inform
suitably-intelligent compilers that ereport() doesn't return if the elevel
is ERROR or higher. This patch extends that to elog(), and also fixes a
double-evaluation hazard that the previous commit created in ereport(),
as well as reducing the emitted code size.

The elog() improvement requires the compiler to support __VA_ARGS__, which
should be available in just about anything nowadays since it's required by
C99. But our minimum language baseline is still C89, so add a configure
test for that.

The previous commit assumed that ereport's elevel could be evaluated twice,
which isn't terribly safe --- there are already counterexamples in xlog.c.
On compilers that have __builtin_constant_p, we can use that to protect the
second test, since there's no possible optimization gain if the compiler
doesn't know the value of elevel. Otherwise, use a local variable inside
the macros to prevent double evaluation. The local-variable solution is
inferior because (a) it leads to useless code being emitted when elevel
isn't constant, and (b) it increases the optimization level needed for the
compiler to recognize that subsequent code is unreachable. But it seems
better than not teaching non-gcc compilers about unreachability at all.

Lastly, if the compiler has __builtin_unreachable(), we can use that
instead of abort(), resulting in a noticeable code savings since no
function call is actually emitted. However, it seems wise to do this only
in non-assert builds. In an assert build, continue to use abort(), so that
the behavior will be predictable and debuggable if the "impossible"
happens.

These changes involve making the ereport and elog macros emit do-while
statement blocks not just expressions, which forces small changes in
a few call sites.

Andres Freund, Tom Lane, Heikki Linnakangas
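
A compressed sketch of the resulting macro shape (GCC-style builtins assumed; emit() is a stand-in for the real error machinery):

    #include <stdio.h>
    #include <stdlib.h>

    #define ERROR 2

    static void
    emit(int elevel, const char *msg)
    {
        fprintf(stderr, "%d: %s\n", elevel, msg);
        if (elevel >= ERROR)
            exit(1);            /* stand-in for longjmp to error handler */
    }

    /* A do-while block, not an expression; the second elevel test is
     * guarded by __builtin_constant_p, so a non-constant elevel is never
     * evaluated twice. */
    #define elog_sketch(elevel, msg) \
        do { \
            emit(elevel, msg); \
            if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
                __builtin_unreachable(); \
        } while (0)

    int
    main(void)
    {
        elog_sketch(1, "just a warning");
        elog_sketch(ERROR, "fatal; compiler knows we don't return");
        return 0;               /* unreachable */
    }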
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
Per discussion, we should enforce the policy of "no backtracking" in these
performance-sensitive scanners.
This change should be publicized to driver maintainers at once and
release-noted as an incompatibility with previous releases.
"val AS name" to "name := val", as per recent discussion.
This patch catches everything in the original named-parameters patch,
but I'm not certain that no other dependencies snuck in later (grepping
the source tree for all uses of AS soon proved unworkable).
In passing I note that we've dropped the ball at least once on keeping
ecpg's lexer (as opposed to parser) in sync with the backend. It would
be a good idea to go through all of pgc.l and see if it's in sync now.
I didn't attempt that at the moment.
argument, per warnings from buildfarm member pika. Also clean up code
formatting a trifle.
Have plpgsql use the core scanner directly. This was a lot of trouble,
but should be worth it in terms of not having to keep the plpgsql lexer
in step with core anymore. In addition the handling of keywords is
significantly better-structured, allowing us to de-reserve a number of
words that plpgsql formerly treated as reserved.
Make the core scanner usable in spite of different parsers having
different YYSTYPE unions that they want to use with it. I defined a new
union core_YYSTYPE that is just the (very short) list of semantic values
returned by the core scanner. I had originally worried that this would
require an extra interface layer, but actually we can have parser.c's
base_yylex (formerly filtered_base_yylex) take care of that at no extra
cost. Names associated with the core scanner are now "core_yy_foo",
with "base_yy_foo" being used in the core Bison parser and the parser.c
interface layer.

This solves the last serious stumbling block to eliminating plpgsql's
separate lexer. One restriction that will still be present is that
plpgsql and the core will have to agree on the token numbers assigned
to tokens that can be returned by the core lexer. Since Bison doesn't
seem willing to accept external assignments of those numbers, we'll
have to live with decreeing that core and plpgsql grammars declare
these tokens first and in the same order.
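
The shape of the split, schematically (a sketch; the real unions carry a few more members):

    /* The core scanner's semantic values are confined to a small union;
     * each grammar embeds it in its own YYSTYPE. */
    typedef union core_YYSTYPE
    {
        int         ival;           /* integer literals */
        char       *str;            /* strings and identifiers */
        const char *keyword;        /* keyword text */
    } core_YYSTYPE;

    /* A grammar's own YYSTYPE then starts with the core fields: */
    typedef union base_YYSTYPE
    {
        core_YYSTYPE core_yystype;
        /* ... plus this parser's node types, lists, etc. ... */
    } base_YYSTYPE;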
Marko Kreen, Tom Lane