aboutsummaryrefslogtreecommitdiff
path: root/src/backend/parser/parser.c
Commit message (Collapse)AuthorAge
* Replace the data structure used for keyword lookup.Tom Lane2019-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, ScanKeywordLookup was passed an array of string pointers. This had some performance deficiencies: the strings themselves might be scattered all over the place depending on the compiler (and some quick checking shows that at least with gcc-on-Linux, they indeed weren't reliably close together). That led to very cache-unfriendly behavior as the binary search touched strings in many different pages. Also, depending on the platform, the string pointers might need to be adjusted at program start, so that they couldn't be simple constant data. And the ScanKeyword struct had been designed with an eye to 32-bit machines originally; on 64-bit it requires 16 bytes per keyword, making it even more cache-unfriendly. Redesign so that the keyword strings themselves are allocated consecutively (as part of one big char-string constant), thereby eliminating the touch-lots-of-unrelated-pages syndrome. And get rid of the ScanKeyword array in favor of three separate arrays: uint16 offsets into the keyword array, uint16 token codes, and uint8 keyword categories. That reduces the overhead per keyword to 5 bytes instead of 16 (even less in programs that only need one of the token codes and categories); moreover, the binary search only touches the offsets array, further reducing its cache footprint. This also lets us put the token codes somewhere else than the keyword strings are, which avoids some unpleasant build dependencies. While we're at it, wrap the data used by ScanKeywordLookup into a struct that can be treated as an opaque type by most callers. That doesn't change things much right now, but it will make it less painful to switch to a hash-based lookup method, as is being discussed in the mailing list thread. Most of the change here is associated with adding a generator script that can build the new data structure from the same list-of-PG_KEYWORD header representation we used before. The PG_KEYWORD lists that plpgsql and ecpg used to embed in their scanner .c files have to be moved into headers, and the Makefiles have to be taught to invoke the generator script. This work is also necessary if we're to consider hash-based lookup, since the generator script is what would be responsible for constructing a hash table. Aside from saving a few kilobytes in each program that includes the keyword table, this seems to speed up raw parsing (flex+bison) by a few percent. So it's worth doing even as it stands, though we think we can gain even more with a follow-on patch to switch to hash-based lookup. John Naylor, with further hacking by me Discussion: https://postgr.es/m/CAJVSVGXdFVU2sgym89XPL=Lv1zOS5=EHHQ8XWNzFL=mTXkKMLw@mail.gmail.com
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Change representation of statement lists, and add statement location info.Tom Lane2017-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes several changes that improve the consistency of representation of lists of statements. It's always been the case that the output of parse analysis is a list of Query nodes, whatever the types of the individual statements in the list. This patch brings similar consistency to the outputs of raw parsing and planning steps: * The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that. * The output of pg_plan_queries() is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. This list representation is now used in Portal and CachedPlan plan lists, replacing the former convention of intermixing PlannedStmts with bare utility-statement nodes. Now, every list of statements has a consistent head-node type depending on how far along it is in processing. This allows changing many places that formerly used generic "Node *" pointers to use a more specific pointer type, thus reducing the number of IsA() tests and casts needed, as well as improving code clarity. Also, the post-parse-analysis representation of DECLARE CURSOR is changed so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. It's now true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a lot of places that were testing both fields. (I think some of those were just defensive programming, but in many places, it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.) Because PlannedStmt carries a canSetTag field, we're also able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement; specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list. While I see no near-term need for relaxing that restriction, it's nice to get rid of the ad-hocery. The API of ProcessUtility() is changed so that what it's passed is the wrapper PlannedStmt not just the bare utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial; see the affected contrib modules for examples of the minimum change needed. (Most compilers should give pointer-type-mismatch warnings for uncorrected code.) There's also a change in the API of ExplainOneQuery_hook, to pass through cursorOptions instead of expecting hook functions to know what to pick. This is needed because of the DECLARE CURSOR changes, but really should have been done in 9.6; it's unlikely that any extant hook functions know about using CURSOR_OPT_PARALLEL_OK. Finally, teach gram.y to save statement boundary locations in RawStmt nodes, and pass those through to Query and PlannedStmt nodes. This allows more intelligent handling of cases where a source query string contains multiple statements. This patch doesn't actually do anything with the information, but a follow-on patch will. (Passing this information through cleanly is the true motivation for these changes; while I think this is all good cleanup, it's unlikely we'd have bothered without this end goal.) catversion bump because addition of location fields to struct Query affects stored rules. This patch is by me, but it owes a good deal to Fabien Coelho who did a lot of preliminary work on the problem, and also reviewed the patch. Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Make operator precedence follow the SQL standard more closely.Tom Lane2015-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While the SQL standard is pretty vague on the overall topic of operator precedence (because it never presents a unified BNF for all expressions), it does seem reasonable to conclude from the spec for <boolean value expression> that OR has the lowest precedence, then AND, then NOT, then IS tests, then the six standard comparison operators, then everything else (since any non-boolean operator in a WHERE clause would need to be an argument of one of these). We were only sort of on board with that: most notably, while "<" ">" and "=" had properly low precedence, "<=" ">=" and "<>" were treated as generic operators and so had significantly higher precedence. And "IS" tests were even higher precedence than those, which is very clearly wrong per spec. Another problem was that "foo NOT SOMETHING bar" constructs, such as "x NOT LIKE y", were treated inconsistently because of a bison implementation artifact: they had the documented precedence with respect to operators to their right, but behaved like NOT (i.e., very low priority) with respect to operators to their left. Fixing the precedence issues is just a small matter of rearranging the precedence declarations in gram.y, except for the NOT problem, which requires adding an additional lookahead case in base_yylex() so that we can attach a different token precedence to NOT LIKE and allied two-word operators. The bulk of this patch is not the bug fix per se, but adding logic to parse_expr.c to allow giving warnings if an expression has changed meaning because of these precedence changes. These warnings are off by default and are enabled by the new GUC operator_precedence_warning. It's believed that very few applications will be affected by these changes, but it was agreed that a warning mechanism is essential to help debug any that are.
* Improve parser's one-extra-token lookahead mechanism.Tom Lane2015-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a couple of places in our grammar that fail to be strict LALR(1), by requiring more than a single token of lookahead to decide what to do. Up to now we've dealt with that by using a filter between the lexer and parser that merges adjacent tokens into one in the places where two tokens of lookahead are necessary. But that creates a number of user-visible anomalies, for instance that you can't name a CTE "ordinality" because "WITH ordinality AS ..." triggers folding of WITH and ORDINALITY into one token. I realized that there's a better way. In this patch, we still do the lookahead basically as before, but we never merge the second token into the first; we replace just the first token by a special lookahead symbol when one of the lookahead pairs is seen. This requires a couple extra productions in the grammar, but it involves fewer special tokens, so that the grammar tables come out a bit smaller than before. The filter logic is no slower than before, perhaps a bit faster. I also fixed the filter logic so that when backing up after a lookahead, the current token's terminator is correctly restored; this eliminates some weird behavior in error message issuance, as is shown by the one change in existing regression test outputs. I believe that this patch entirely eliminates odd behaviors caused by lookahead for WITH. It doesn't really improve the situation for NULLS followed by FIRST/LAST unfortunately: those sequences still act like a reserved word, even though there are cases where they should be seen as two ordinary identifiers, eg "SELECT nulls first FROM ...". I experimented with additional grammar hacks but couldn't find any simple solution for that. Still, this is better than before, and it seems much more likely that we *could* somehow solve the NULLS case on the basis of this filter behavior than the previous one.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF)Greg Stark2013-07-29
| | | | | Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Remove pg_parse_string_token() --- not needed anymore.Tom Lane2009-11-12
|
* Re-refactor the core scanner's API, in order to get out from under the problemTom Lane2009-11-09
| | | | | | | | | | | | | | | | | | of different parsers having different YYSTYPE unions that they want to use with it. I defined a new union core_YYSTYPE that is just the (very short) list of semantic values returned by the core scanner. I had originally worried that this would require an extra interface layer, but actually we can have parser.c's base_yylex (formerly filtered_base_yylex) take care of that at no extra cost. Names associated with the core scanner are now "core_yy_foo", with "base_yy_foo" being used in the core Bison parser and the parser.c interface layer. This solves the last serious stumbling block to eliminating plpgsql's separate lexer. One restriction that will still be present is that plpgsql and the core will have to agree on the token numbers assigned to tokens that can be returned by the core lexer. Since Bison doesn't seem willing to accept external assignments of those numbers, we'll have to live with decreeing that core and plpgsql grammars declare these tokens first and in the same order.
* Tweak the core scanner so that it can be used by plpgsql too.Tom Lane2009-07-14
| | | | | | | | | | | | | | Changes: Pass in the keyword lookup array instead of having it be hardwired. (This incidentally allows elimination of some duplicate coding in ecpg.) Re-order the token declarations in gram.y so that non-keyword tokens have numbers that won't change when keywords are added or removed. Add ".." and ":=" to the set of tokens recognized by scan.l. (Since these combinations are nowhere legal in core SQL, this does not change anything except the precise wording of the error you get when you write this.)
* Convert the core lexer and parser into fully reentrant code, by making useTom Lane2009-07-13
| | | | | | | | | | | of features added to flex and bison since this code was originally written. This change doesn't in itself offer any new capability, but it's needed infrastructure for planned improvements in plpgsql. Another feature now available in flex is the ability to make it use palloc instead of malloc, so do that to avoid possible memory leaks. (We should at some point change the other lexers likewise, but this commit doesn't touch them.)
* Move some declarations in the raw-parser header files to create a clearerTom Lane2009-07-12
| | | | | | | | | | distinction between the external API (parser.h) and declarations that only need to be visible within the raw parser code (gramparse.h, which now is only included by parser.c, gram.y, scan.l, and keywords.c). This is in preparation for the upcoming change to a reentrant lexer, which will require referencing YYSTYPE in the declarations of base_yylex and filtered_base_yylex, hence gram.h will have to be included by gramparse.h. We don't want any more files than absolutely necessary to depend on gram.h, so some cleanup is called for.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Rethink the idea of having plpgsql depend on parser/gram.h. Aside from theTom Lane2009-04-19
| | | | | | | | | fact that this is breaking the MSVC build, it's probably not really a good idea to expand the dependencies of gram.h any further than the core parser; for instance the value of SCONST might depend on which bison version you'd built with. Better to expose an additional call point in parser.c, so move what I had put into pl_funcs.c into parser.c. Also PGDLLIMPORT'ify the reference to standard_conforming_strings, per buildfarm results.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Add WITH [NO] DATA clause to CREATE TABLE AS, per SQL.Peter Eisentraut2008-10-28
| | | | | | | Also, since WITH is now a reserved word, simplify the token merging code to only deal with WITH_TIME. by Tom Lane and myself
* Remove all traces that suggest that a non-Bison yacc might be supported, andPeter Eisentraut2008-08-29
| | | | | change build system to use only Bison. Simplify build rules, make file names uniform. Don't build the token table header file where it is not needed.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LASTTom Lane2007-01-09
| | | | | | | | | | | | per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.
* Fix filtered_base_yylex() to save and restore base_yylval and base_yyllocTom Lane2007-01-06
| | | | | properly when doing a lookahead. The lack of this was causing various interesting misbehaviors when one tries to use "with" as a plain identifier.
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.
* pgindent run for 8.2.Bruce Momjian2006-10-04
|
* Fix some missing inclusions identified with new pgcheckdefines tool.Tom Lane2006-07-15
|
* Re-introduce the yylex filter function formerly used to support UNIONTom Lane2006-05-27
| | | | | | | JOIN, which I removed in a recent fit of over-optimism that we wouldn't have any future use for it. Now it's needed to support disambiguating WITH CHECK OPTION from WITH TIME ZONE. As proof of concept, add stub grammar productions for WITH CHECK OPTION.
* Remove the stub support we had for UNION JOIN; per discussion, this isTom Lane2006-03-07
| | | | | | not likely ever to be implemented seeing it's been removed from SQL2003. This allows getting rid of the 'filter' version of yylex() that we had in parser.c, which should save at least a few microseconds in parsing.
* Update copyright for 2006. Update scripts.Bruce Momjian2006-03-05
|
* Tag appropriate files for rc3PostgreSQL Daemon2004-12-31
| | | | | | | | Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
* Update copyright to 2004.Bruce Momjian2004-08-29
|
* Replace max_expr_depth parameter with a max_stack_depth parameter thatTom Lane2004-03-24
| | | | | | is measured in kilobytes and checked against actual physical execution stack depth, as per my proposal of 30-Dec. This gives us a fairly bulletproof defense against crashing due to runaway recursive functions.
* $Header: -> $PostgreSQL Changes ...PostgreSQL Daemon2003-11-29
|
* Update copyrights to 2003.Bruce Momjian2003-08-04
|
* Not sure why parser() was still doing clearerr(stdin) ... but it'sTom Lane2003-05-05
| | | | *got* to be pointless.
* Infrastructure for deducing Param types from context, in the same wayTom Lane2003-04-29
| | | | | | | | | | | that the types of untyped string-literal constants are deduced (ie, when coerce_type is applied to 'em, that's what the type must be). Remove the ancient hack of storing the input Param-types array as a global variable, and put the info into ParseState instead. This touches a lot of files because of adjustment of routine parameter lists, but it's really not a large patch. Note: PREPARE statement still insists on exact specification of parameter types, but that could easily be relaxed now, if we wanted to do so.
* Put back encoding-conversion step in processing of incoming queries;Tom Lane2003-04-27
| | | | | | | | | | I had inadvertently omitted it while rearranging things to support length-counted incoming messages. Also, change the parser's API back to accepting a 'char *' query string instead of 'StringInfo', as the latter wasn't buying us anything except overhead. (I think when I put it in I had some notion of making the parser API 8-bit-clean, but seeing that flex depends on null-terminated input, that's not really ever gonna happen.)
* pgindent run.Bruce Momjian2002-09-04
|
* PREPARE/EXECUTE statements. Patch by Neil Conway, some kibitzingTom Lane2002-08-27
| | | | from Tom Lane.
* Update copyright to 2002.Bruce Momjian2002-06-20
|
* Scanner performance improvementsPeter Eisentraut2002-04-20
| | | | | | | | Use flex flags -CF. Pass the to-be-scanned string around as StringInfo type, to avoid querying the length repeatedly. Clean up some code and remove lex-compatibility cruft. Escape backslash sequences inline. Use flex-provided yy_scan_buffer() function to set up input, rather than using myinput().
* New pgindent run with fixes suggested by Tom. Patch manually reviewed,Bruce Momjian2001-11-05
| | | | initdb/regression tests pass.