aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor
Commit message (Collapse)AuthorAge
* Generate fmgr prototypes automaticallyPeter Eisentraut2017-01-17
| | | | | | | | | | | | Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains prototypes for all functions registered in pg_proc.h. This avoids having to manually maintain these prototypes across a random variety of header files. It also automatically enforces a correct function signature, and since there are warnings about missing prototypes, it will detect functions that are defined but not registered in pg_proc.h (or otherwise used). Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
* Change representation of statement lists, and add statement location info.Tom Lane2017-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes several changes that improve the consistency of representation of lists of statements. It's always been the case that the output of parse analysis is a list of Query nodes, whatever the types of the individual statements in the list. This patch brings similar consistency to the outputs of raw parsing and planning steps: * The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that. * The output of pg_plan_queries() is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. This list representation is now used in Portal and CachedPlan plan lists, replacing the former convention of intermixing PlannedStmts with bare utility-statement nodes. Now, every list of statements has a consistent head-node type depending on how far along it is in processing. This allows changing many places that formerly used generic "Node *" pointers to use a more specific pointer type, thus reducing the number of IsA() tests and casts needed, as well as improving code clarity. Also, the post-parse-analysis representation of DECLARE CURSOR is changed so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. It's now true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a lot of places that were testing both fields. (I think some of those were just defensive programming, but in many places, it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.) Because PlannedStmt carries a canSetTag field, we're also able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement; specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list. While I see no near-term need for relaxing that restriction, it's nice to get rid of the ad-hocery. The API of ProcessUtility() is changed so that what it's passed is the wrapper PlannedStmt not just the bare utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial; see the affected contrib modules for examples of the minimum change needed. (Most compilers should give pointer-type-mismatch warnings for uncorrected code.) There's also a change in the API of ExplainOneQuery_hook, to pass through cursorOptions instead of expecting hook functions to know what to pick. This is needed because of the DECLARE CURSOR changes, but really should have been done in 9.6; it's unlikely that any extant hook functions know about using CURSOR_OPT_PARALLEL_OK. Finally, teach gram.y to save statement boundary locations in RawStmt nodes, and pass those through to Query and PlannedStmt nodes. This allows more intelligent handling of cases where a source query string contains multiple statements. This patch doesn't actually do anything with the information, but a follow-on patch will. (Passing this information through cleanly is the true motivation for these changes; while I think this is all good cleanup, it's unlikely we'd have bothered without this end goal.) catversion bump because addition of location fields to struct Query affects stored rules. This patch is by me, but it owes a good deal to Fabien Coelho who did a lot of preliminary work on the problem, and also reviewed the patch. Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
* Throw suitable error for COPY TO STDOUT/FROM STDIN in a SQL function.Tom Lane2017-01-14
| | | | | | | | | | | | | | | | | A client copy can't work inside a function because the FE/BE wire protocol doesn't support nesting of a COPY operation within query results. (Maybe it could, but the protocol spec doesn't suggest that clients should support this, and libpq for one certainly doesn't.) In most PLs, this prohibition is enforced by spi.c, but SQL functions don't use SPI. A comparison of _SPI_execute_plan() and init_execution_state() shows that rejecting client COPY is the only discrepancy in what they allow, so there's no other similar bugs. This is an astonishingly ancient oversight, so back-patch to all supported branches. Report: https://postgr.es/m/BY2PR05MB2309EABA3DEFA0143F50F0D593780@BY2PR05MB2309.namprd05.prod.outlook.com
* Fix incorrect function name in comment.Robert Haas2017-01-12
| | | | Amit Langote
* Repair commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e.Robert Haas2017-01-06
| | | | | | This commit purported to use a variable hash seed for Partial HashAggregate, but actually did the opposite - it made us use a variable seed for any HashAggregate that is NOT partial. Woops.
* Fix possible crash reading pg_stat_activity.Robert Haas2017-01-05
| | | | | | | | | | With the old code, a backend that read pg_stat_activity without ever having executed a parallel query might see a backend in the midst of executing one waiting on a DSA LWLock, resulting in a crash. The solution is for backends to register the tranche at startup time, not the first time a parallel query is executed. Report by Andreas Seltenreich. Patch by me, reviewed by Thomas Munro.
* Remove unnecessary arguments from partitioning functions.Robert Haas2017-01-04
| | | | | | | RelationGetPartitionQual() and generate_partition_qual() are always called with recurse = true, so we don't need an argument for that. Extracted by me from a larger patch by Amit Langote.
* Fix reporting of constraint violations for table partitioning.Robert Haas2017-01-04
| | | | | | | | | After a tuple is routed to a partition, it has been converted from the root table's row type to the partition's row type. ExecConstraints needs to report the failure using the original tuple and the parent's tuple descriptor rather than the ones for the selected partition. Amit Langote
* Move partition_tuple_slot out of EState.Robert Haas2017-01-04
| | | | | | | | | | | Commit 2ac3ef7a01df859c62d0a02333b646d65eaec5ff added a TupleTapleSlot for partition tuple slot to EState (es_partition_tuple_slot) but it's more logical to have it as part of ModifyTableState (mt_partition_tuple_slot) and CopyState (partition_tuple_slot). Discussion: http://postgr.es/m/1bd459d9-4c0c-197a-346e-e5e59e217d97@lab.ntt.co.jp Amit Langote, per a gripe from me
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Remove unnecessary casts of makeNode() resultPeter Eisentraut2016-12-23
| | | | | makeNode() is already a macro that has the right result pointer type, so casting it again to the same type is unnecessary.
* Spellcheck: s/descendent/descendant/gTom Lane2016-12-23
| | | | | | | | | | I got a little annoyed by reading documentation paragraphs containing both spellings within a few lines of each other. My dictionary says "descendant" is the preferred spelling, and it's certainly the majority usage in our tree, so standardize on that. For one usage in parallel.sgml, I thought it better to rewrite to avoid the term altogether.
* Fix tuple routing in cases where tuple descriptors don't match.Robert Haas2016-12-22
| | | | | | | | | | | | | | | The previous coding failed to work correctly when we have a multi-level partitioned hierarchy where tables at successive levels have different attribute numbers for the partition key attributes. To fix, have each PartitionDispatch object store a standalone TupleTableSlot initialized with the TupleDesc of the corresponding partitioned table, along with a TupleConversionMap to map tuples from the its parent's rowtype to own rowtype. After tuple routing chooses a leaf partition, we must use the leaf partition's tuple descriptor, not the root table's. To that end, a dedicated TupleTableSlot for tuple routing is now allocated in EState. Amit Langote
* Fix handling of expanded objects in CoerceToDomain and CASE execution.Tom Lane2016-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | When the input value to a CoerceToDomain expression node is a read-write expanded datum, we should pass a read-only pointer to any domain CHECK expressions and then return the original read-write pointer as the expression result. Previously we were blindly passing the same pointer to all the consumers of the value, making it possible for a function in CHECK to modify or even delete the expanded value. (Since a plpgsql function will absorb a passed-in read-write expanded array as a local variable value, it will in fact delete the value on exit.) A similar hazard of passing the same read-write pointer to multiple consumers exists in domain_check() and in ExecEvalCase, so fix those too. The fix requires adding MakeExpandedObjectReadOnly calls at the appropriate places, which is simple enough except that we need to get the data type's typlen from somewhere. For the domain cases, solve this by redefining DomainConstraintRef.tcache as okay for callers to access; there wasn't any reason for the original convention against that, other than not wanting the API of typcache.c to be any wider than it had to be. For CASE, there's no good solution except to add a syscache lookup during executor start. Per bug #14472 from Marcos Castedo. Back-patch to 9.5 where expanded values were introduced. Discussion: https://postgr.es/m/15225.1482431619@sss.pgh.pa.us
* Refactor partition tuple routing code to reduce duplication.Robert Haas2016-12-21
| | | | Amit Langote
* Fix minor oversights in nodeAgg.c.Tom Lane2016-12-20
| | | | | | | | | | | | | aggstate->evalproj is always set up by ExecInitAgg, so there's no need to test. Doing so led Coverity to think that we might be intending "slot" to be possibly NULL here, and it quite properly complained that the rest of combine_aggregates() wasn't prepared for that. Also fix a couple of obvious thinkos in Asserts checking that "inputoff" isn't past the end of the slot. Errors introduced in commit 8ed3f11bb, so no need for back-patch.
* Fix sharing Agg transition state of DISTINCT or ordered aggs.Heikki Linnakangas2016-12-20
| | | | | | | | | | | | | | | If a query contained two aggregates that could share the transition value, we would correctly collect the input into a tuplesort only once, but incorrectly run the transition function over the accumulated input twice, in finalize_aggregates(). That caused a crash, when we tried to call tuplesort_performsort() on an already-freed NULL tuplestore. Backport to 9.6, where sharing of transition state and this bug were introduced. Analysis by Tom Lane. Discussion: https://www.postgresql.org/message-id/ac5b0b69-744c-9114-6218-8300ac920e61@iki.fi
* Provide a DSA area for all parallel queries.Robert Haas2016-12-19
| | | | | | | This will allow future parallel query code to dynamically allocate storage shared by all participants. Thomas Munro, with assorted changes by me.
* Unbreak Finalize HashAggregate over Partial HashAggregate.Robert Haas2016-12-16
| | | | | | | | | | | | | | | Commit 5dfc198146b49ce7ecc8a1fc9d5e171fb75f6ba5 introduced the use of a new type of hash table with linear reprobing for hash aggregates. Such a hash table behaves very poorly if keys are inserted in hash order, which does in fact happen in the case where a query use a Finalize HashAggregate node fed (via Gather) by a Partial HashAggregate node. In fact, queries with this type of plan tend to run effectively forever. Fix that by seeding the hash value differently in each worker (and in the leader, if it participates). Andres Freund and Robert Haas
* Clean up code, comments, and formatting for table partitioning.Robert Haas2016-12-13
| | | | | Amit Langote, plus pgindent-ing by me. Inspired in part by review comments from Tomas Vondra.
* Fix bogus comment.Robert Haas2016-12-08
| | | | | | | Commit 4212cb73262bbdd164727beffa4c4744b4ead92d rendered a comment in execMain.c incorrect. Per complaint from Tom Lane, repair. Patch from Amit Kapila, per wording suggested by Tom Lane and me.
* Implement table partitioning.Robert Haas2016-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
* Fix interaction of parallel query with prepared statements.Robert Haas2016-12-06
| | | | | | | | | | | | | | | | | Previously, a prepared statement created via a Parse message could get a parallel plan, but one created with a PREPARE statement could not. This state of affairs was due to confusion on my (rhaas) part: I erroneously believed that a CREATE TABLE .. AS EXECUTE statement could only be performed with a prepared statement by PREPARE, but in fact one created by a Prepare message works just as well. Therefore, it makes no sense to allow parallel query in one case but not the other. To fix, allow parallel query with all prepared statements, but run the parallel plan serially (i.e. without workers) in the case of CREATE TABLE .. AS EXECUTE. Also, document this. Amit Kapila and Tobias Bussman, plus an extra sentence of documentation by me.
* Ensure gatherstate->nextreader is properly initialized.Robert Haas2016-12-05
| | | | | | | | | The previously code worked OK as long as a Gather node was never rescanned, or if it was rescanned, as long as it got at least as many workers on rescan as it had originally. But if the number of workers ever decreased on a rescan, then it could crash. Andreas Seltenreich
* User narrower representative tuples in the hash-agg hashtable.Andres Freund2016-11-30
| | | | | | | | | | | | | | | | | | | So far the hashtable stored representative tuples in the form of its input slot, with all columns in the hashtable that are not needed (i.e. not grouped upon or functionally dependent) set to NULL. Thats good for saving memory, but it turns out that having tuples full of NULL isn't free. slot_deform_tuple is faster if there's no NULL bitmap even if no NULLs are encountered, and skipping over leading NULLs isn't free. So compute a separate tuple descriptor that only contains the needed columns. As columns have already been moved in/out the slot for the hashtable that does not imply additional per-row overhead. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de
* Perform one only projection to compute agg arguments.Andres Freund2016-11-30
| | | | | | | | | | | | | | | | Previously we did a ExecProject() for each individual aggregate argument. That turned out to be a performance bottleneck in queries with multiple aggregates. Doing all the argument computations in one ExecProject() is quite a bit cheaper because ExecProject's fastpath can do the work at once in a relatively tight loop, and because it can get all the required columns with a single slot_getsomeattr and save some other redundant setup costs. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de
* Fix typo in commentMagnus Hagander2016-11-25
| | | | Thomas Munro
* Simplify code by getting rid of SPI_push, SPI_pop, SPI_restore_connection.Tom Lane2016-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea behind SPI_push was to allow transitioning back into an "unconnected" state when a SPI-using procedure calls unrelated code that might or might not invoke SPI. That sounds good, but in practice the only thing it does for us is to catch cases where a called SPI-using function forgets to call SPI_connect --- which is a highly improbable failure mode, since it would be exposed immediately by direct testing of said function. As against that, we've had multiple bugs induced by forgetting to call SPI_push/SPI_pop around code that might invoke SPI-using functions; these are much harder to catch and indeed have gone undetected for years in some cases. And we've had to band-aid around some problems of this ilk by introducing conditional push/pop pairs in some places, which really kind of defeats the purpose altogether; if we can't draw bright lines between connected and unconnected code, what's the point? Hence, get rid of SPI_push[_conditional], SPI_pop[_conditional], and the underlying state variable _SPI_curid. It turns out SPI_restore_connection can go away too, which is a nice side benefit since it was never more than a kluge. Provide no-op macros for the deleted functions so as to avoid an API break for external modules. A side effect of this removal is that SPI_palloc and allied functions no longer permit being called when unconnected; they'll throw an error instead. The apparent usefulness of the previous behavior was a mirage as well, because it was depended on by only a few places (which I fixed in preceding commits), and it posed a risk of allocations being unexpectedly long-lived if someone forgot a SPI_push call. Discussion: <20808.1478481403@sss.pgh.pa.us>
* Make SPI_fnumber() reject dropped columns.Tom Lane2016-11-08
| | | | | | | | | | | | | | | | | | | | | There's basically no scenario where it's sensible for this to match dropped columns, so put a test for dropped-ness into SPI_fnumber() itself, and excise the test from the small number of callers that were paying attention to the case. (Most weren't :-(.) In passing, normalize tests at call sites: always reject attnum <= 0 if we're disallowing system columns. Previously there was a mixture of "< 0" and "<= 0" tests. This makes no practical difference since SPI_fnumber() never returns 0, but I'm feeling pedantic today. Also, in the places that are actually live user-facing code and not legacy cruft, distinguish "column not found" from "can't handle system column". Per discussion with Jim Nasby; thi supersedes his original patch that just changed the behavior at one call site. Discussion: <b2de8258-c4c0-1cb8-7b97-e8538e5c975c@BlueTreble.com>
* Improve speed of aggregates that use array_append as transition function.Tom Lane2016-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the previous coding, if an aggregate's transition function returned an expanded array, nodeAgg.c and nodeWindowAgg.c would always copy it and thus force it into the flat representation. This led to ping-ponging between flat and expanded formats, which costs a lot. For an aggregate using array_append as transition function, I measured about a 15X slowdown compared to the pre-9.5 code, when working on simple int[] arrays. Of course, the old code was already O(N^2) in this usage due to copying flat arrays all the time, but it wasn't quite this inefficient. To fix, teach nodeAgg.c and nodeWindowAgg.c to allow expanded transition values without copying, so long as the transition function takes care to return the transition value already properly parented under the aggcontext. That puts a bit of extra responsibility on the transition function, but doing it this way allows us to not need any extra logic in the fast path of advance_transition_function (ie, with a pass-by-value transition value, or with a modified-in-place pass-by-reference value). We already know that that's a hot spot so I'm loath to add any cycles at all there. Also, while only array_append currently knows how to follow this convention, this solution allows other transition functions to opt-in without needing to have a whitelist in the core aggregation code. (The reason we would need a whitelist is that currently, if you pass a R/W expanded-object pointer to an arbitrary function, it's allowed to do anything with it including deleting it; that breaks the core agg code's assumption that it should free discarded values. Returning a value under aggcontext is the transition function's signal that it knows it is an aggregate transition function and will play nice. Possibly the API rules for expanded objects should be refined, but that would not be a back-patchable change.) With this fix, an aggregate using array_append is no longer O(N^2), so it's much faster than pre-9.5 code rather than much slower. It's still a bit slower than the bespoke infrastructure for array_agg, but the differential seems to be only about 10%-20% rather than orders of magnitude. Discussion: <6315.1477677885@sss.pgh.pa.us>
* Avoid using a C++ keyword in header filePeter Eisentraut2016-10-26
| | | | per cpluspluscheck
* Avoid testing tuple visibility without buffer lock.Tom Lane2016-10-23
| | | | | | | | | | | | | | | | | | | | INSERT ... ON CONFLICT (specifically ExecCheckHeapTupleVisible) contains another example of this unsafe coding practice. It is much harder to get a failure out of it than the case fixed in commit 6292c2339, because in most scenarios any hint bits that could be set would have already been set earlier in the command. However, Konstantin Knizhnik reported a failure with a custom transaction manager, and it's clearly possible to get a failure via a race condition in async-commit mode. For lack of a reproducible example, no regression test case in this commit. I did some testing with Asserts added to tqual.c's functions, and can say that running "make check-world" exposed these two bugs and no others. The Asserts are messy enough that I've not added them to the code for now. Report: <57EE93C8.8080504@postgrespro.ru> Related-Discussion: <CAO3NbwOycQjt2Oqy2VW-eLTq2M5uGMyHnGm=RNga4mjqcYD7gQ@mail.gmail.com>
* Don't throw serialization errors for self-conflicts in INSERT ON CONFLICT.Tom Lane2016-10-23
| | | | | | | | | | | | | | | | | A transaction that conflicts against itself, for example INSERT INTO t(pk) VALUES (1),(1) ON CONFLICT DO NOTHING; should behave the same regardless of isolation level. It certainly shouldn't throw a serialization error, as retrying will not help. We got this wrong due to the ON CONFLICT logic not considering the case, as reported by Jason Dusek. Core of this patch is by Peter Geoghegan (based on an earlier patch by Thomas Munro), though I didn't take his proposed code refactoring for fear that it might have unexpected side-effects. Test cases by Thomas Munro and myself. Report: <CAO3NbwOycQjt2Oqy2VW-eLTq2M5uGMyHnGm=RNga4mjqcYD7gQ@mail.gmail.com> Related-Discussion: <57EE93C8.8080504@postgrespro.ru>
* Use more efficient hashtable for execGrouping.c to speed up hash aggregation.Andres Freund2016-10-14
| | | | | | | | | | | | | | | | | | | | | The more efficient hashtable speeds up hash-aggregations with more than a few hundred groups significantly. Improvements of over 120% have been measured. Due to the the different hash table queries that not fully determined (e.g. GROUP BY without ORDER BY) may change their result order. The conversion is largely straight-forward, except that, due to the static element types of simplehash.h type hashes, the additional data some users store in elements (e.g. the per-group working data for hash aggregaters) is now stored in TupleHashEntryData->additional. The meaning of BuildTupleHashTable's entrysize (renamed to additionalsize) has been changed to only be about the additionally stored size. That size is only used for the initial sizing of the hash-table. Reviewed-By: Tomas Vondra Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
* Fix incorrect handling of polymorphic aggregates used as window functions.Tom Lane2016-10-09
| | | | | | | | | | | | | | | | | The transfunction was told that its first argument and result were of the window function output type, not the aggregate state type. This'd only matter if the transfunction consults get_fn_expr_argtype, which typically only polymorphic functions would do. Although we have several regression tests around polymorphic aggs, none of them detected this mistake --- in fact, they still didn't fail when I injected the same mistake into nodeAgg.c. So add some more tests covering both plain agg and window-function-agg cases. Per report from Sebastian Luque. Back-patch to 9.6 where the error was introduced (by sloppy refactoring in commit 804163bc2, looks like). Report: <87int2qkat.fsf@gmail.com>
* Extend framework from commit 53be0b1ad to report latch waits.Robert Haas2016-10-04
| | | | | | | | | | | | | | | | | | | | | | WaitLatch, WaitLatchOrSocket, and WaitEventSetWait now taken an additional wait_event_info parameter; legal values are defined in pgstat.h. This makes it possible to uniquely identify every point in the core code where we are waiting for a latch; extensions can pass WAIT_EXTENSION. Because latches were the major wait primitive not previously covered by this patch, it is now possible to see information in pg_stat_activity on a large number of important wait events not previously addressed, such as ClientRead, ClientWrite, and SyncRep. Unfortunately, many of the wait events added by this patch will fail to appear in pg_stat_activity because they're only used in background processes which don't currently appear in pg_stat_activity. We should fix this either by creating a separate view for such information, or else by deciding to include them in pg_stat_activity after all. Michael Paquier and Robert Haas, reviewed by Alexander Korotkov and Thomas Munro.
* Remove nearly-unused SizeOfIptrData macro.Tom Lane2016-09-22
| | | | | | | | | | | | | Past refactorings have removed all but one reference to SizeOfIptrData (and that one place was in a pretty noncritical spot). Since nobody's complained, it seems probable that there are no supported compilers that don't think sizeof(ItemPointerData) is 6. If there are, we're wasting MAXALIGN per heap tuple anyway, so it's rather silly to worry about whether we can shave space in places like WAL records. Pavan Deolasee Discussion: <CABOikdOOawDda4hwLOT6zdA6MFfPLu3Z2YBZkX0JdayNS6JOeQ@mail.gmail.com>
* Be sure to rewind the tuplestore read pointer in non-leader CTEScan nodes.Tom Lane2016-09-22
| | | | | | | | | | | | | | | | | | | | | ExecInitCteScan supposed that it didn't have to do anything to the extra tuplestore read pointer it gets from tuplestore_alloc_read_pointer. However, it needs this read pointer to be positioned at the start of the tuplestore, while tuplestore_alloc_read_pointer is actually defined as cloning the current position of read pointer 0. In normal situations that accidentally works because we initialize the whole plan tree at once, before anything gets read. But it fails in an EvalPlanQual recheck, as illustrated in bug #14328 from Dima Pavlov. To fix, just forcibly rewind the pointer after tuplestore_alloc_read_pointer. The cost of doing so is negligible unless the tuplestore is already in TSS_READFILE state, which wouldn't happen in normal cases. We could consider altering tuplestore's API to make that case cheaper, but that would make for a more invasive back-patch and it doesn't seem worth it. This has been broken probably for as long as we've had CTEs, so back-patch to all supported branches. Discussion: <32468.1474548308@sss.pgh.pa.us>
* Fix typo in comment.Robert Haas2016-09-15
| | | | Amit Langote
* Fix executor/README to reflect disallowing SRFs in UPDATE.Tom Lane2016-09-13
| | | | | The parenthetical comment here is obsoleted by commit a4c35ea1c. Noted by Andres Freund.
* Add macros to make AllocSetContextCreate() calls simpler and safer.Tom Lane2016-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>
* Fix improper repetition of previous results from a hashed aggregate.Tom Lane2016-08-24
| | | | | | | | | | | | | | | | | | | | | | ExecReScanAgg's check for whether it could re-use a previously calculated hashtable neglected the possibility that the Agg node might reference PARAM_EXEC Params that are not referenced by its input plan node. That's okay if the Params are in upper tlist or qual expressions; but if one appears in aggregate input expressions, then the hashtable contents need to be recomputed when the Param's value changes. To avoid unnecessary performance degradation in the case of a Param that isn't within an aggregate input, add logic to the planner to determine which Params are within aggregate inputs. This requires a new field in struct Agg, but fortunately we never write plans to disk, so this isn't an initdb-forcing change. Per report from Jeevan Chalke. This has been broken since forever, so back-patch to all supported branches. Andrew Gierth, with minor adjustments by me Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>
* Improve parsetree representation of special functions such as CURRENT_DATE.Tom Lane2016-08-16
| | | | | | | | | | | | | | We implement a dozen or so parameterless functions that the SQL standard defines special syntax for. Up to now, that was done by converting them into more or less ad-hoc constructs such as "'now'::text::date". That's messy for multiple reasons: it exposes what should be implementation details to users, and performance is worse than it needs to be in several cases. To improve matters, invent a new expression node type SQLValueFunction that can represent any of these parameterless functions. Bump catversion because this changes stored parsetrees for rules. Discussion: <30058.1463091294@sss.pgh.pa.us>
* Fix possible crash due to incorrect allocation context.Robert Haas2016-08-16
| | | | | | | | | | | | | Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 aimed to reduce leakage from tqueue.c, which is good. Unfortunately, by changing the memory context in which all of gather_readnext() executes, it also changed the context in which ExecShutdownGatherWorkers executes, which is not good, because that function eventually causes a call to ExecParallelRetrieveInstrumentation, which proceeds to allocate planstate->worker_instrument in a short-lived context, causing a crash. Rushabh Lathia, reviewed by Amit Kapila and by me.
* Add SQL-accessible functions for inspecting index AM properties.Tom Lane2016-08-13
| | | | | | | | | | | | | | | | | | | | | Per discussion, we should provide such functions to replace the lost ability to discover AM properties by inspecting pg_am (cf commit 65c5fcd35). The added functionality is also meant to displace any code that was looking directly at pg_index.indoption, since we'd rather not believe that the bit meanings in that field are part of any client API contract. As future-proofing, define the SQL API to not assume that properties that are currently AM-wide or index-wide will remain so unless they logically must be; instead, expose them only when inquiring about a specific index or even specific index column. Also provide the ability for an index AM to override the behavior. In passing, document pg_am.amtype, overlooked in commit 473b93287. Andrew Gierth, with kibitzing by me and others Discussion: <87mvl5on7n.fsf@news-spur.riddles.org.uk>
* Fix busted Assert for CREATE MATVIEW ... WITH NO DATA.Tom Lane2016-08-11
| | | | | | | | | | | | | | | | Commit 874fe3aea changed the command tag returned for CREATE MATVIEW/CREATE TABLE AS ... WITH NO DATA, but missed that there was code in spi.c that expected the command tag to always be "SELECT". Fortunately, the consequence was only an Assert failure, so this oversight should have no impact in production builds. Since this code path was evidently un-exercised, add a regression test. Per report from Shivam Saxena. Back-patch to 9.3, like the previous commit. Michael Paquier Report: <97218716-480B-4527-B5CD-D08D798A0C7B@dresources.com>
* Fix two errors with nested CASE/WHEN constructs.Tom Lane2016-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ExecEvalCase() tried to save a cycle or two by passing &econtext->caseValue_isNull as the isNull argument to its sub-evaluation of the CASE value expression. If that subexpression itself contained a CASE, then *isNull was an alias for econtext->caseValue_isNull within the recursive call of ExecEvalCase(), leading to confusion about whether the inner call's caseValue was null or not. In the worst case this could lead to a core dump due to dereferencing a null pointer. Fix by not assigning to the global variable until control comes back from the subexpression. Also, avoid using the passed-in isNull pointer transiently for evaluation of WHEN expressions. (Either one of these changes would have been sufficient to fix the known misbehavior, but it's clear now that each of these choices was in itself dangerous coding practice and best avoided. There do not seem to be any similar hazards elsewhere in execQual.c.) Also, it was possible for inlining of a SQL function that implements the equality operator used for a CASE comparison to result in one CASE expression's CaseTestExpr node being inserted inside another CASE expression. This would certainly result in wrong answers since the improperly nested CaseTestExpr would be caused to return the inner CASE's comparison value not the outer's. If the CASE values were of different data types, a crash might result; moreover such situations could be abused to allow disclosure of portions of server memory. To fix, teach inline_function to check for "bare" CaseTestExpr nodes in the arguments of a function to be inlined, and avoid inlining if there are any. Heikki Linnakangas, Michael Paquier, Tom Lane Report: https://github.com/greenplum-db/gpdb/pull/327 Report: <4DDCEEB8.50602@enterprisedb.com> Security: CVE-2016-5423
* Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch.Tom Lane2016-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | This coding pattern creates a race condition, because if an interesting interrupt happens after we've checked InterruptPending but before we reset our latch, the latch-setting done by the signal handler would get lost, and then we might block at WaitLatch in the next iteration without ever noticing the interrupt condition. You can put the CHECK_FOR_INTERRUPTS before WaitLatch or after ResetLatch, but not between them. Aside from fixing the bugs, add some explanatory comments to latch.h to perhaps forestall the next person from making the same mistake. In HEAD, also replace gather_readnext's direct call of HandleParallelMessages with CHECK_FOR_INTERRUPTS. It does not seem clean or useful for this one caller to bypass ProcessInterrupts and go straight to HandleParallelMessages; not least because that fails to consider the InterruptPending flag, resulting in useless work both here (if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call (if it is). This thinko seems to have been introduced in the initial coding of storage/ipc/shm_mq.c (commit ec9037df2), and then blindly copied into all the subsequent parallel-query support logic. Back-patch relevant hunks to 9.4 to extirpate the error everywhere. Discussion: <1661.1469996911@sss.pgh.pa.us>
* Code review for tqueue.c: fix memory leaks, speed it up, other fixes.Tom Lane2016-07-31
| | | | | | | | | | | | | | | | | | | | When doing record typmod remapping, tqueue.c did fresh catalog lookups for each tuple it processed, which was pretty horrible performance-wise (it seemed to about halve the already none-too-quick speed of bulk reads in parallel mode). Worse, it insisted on putting bits of that data into TopMemoryContext, from where it never freed them, causing a session-lifespan memory leak. (I suppose this was coded with the idea that the sender process would quit after finishing the query --- but the receiver uses the same code.) Restructure to avoid repetitive catalog lookups and to keep that data in a query-lifespan context, in or below the context where the TQueueDestReceiver or TupleQueueReader itself lives. Fix some other bugs such as continuing to use a tupledesc after releasing our refcount on it. Clean up cavalier datatype choices (typmods are int32, please, not int, and certainly not Oid). Improve comments and error message wording.
* Fix worst memory leaks in tqueue.c.Tom Lane2016-07-29
| | | | | | | | | | | | | | | | | | | | | | | | | TupleQueueReaderNext() leaks like a sieve if it has to do any tuple disassembly/reconstruction. While we could try to clean up its allocations piecemeal, it seems like a better idea just to insist that it should be run in a short-lived memory context, so that any transient space goes away automatically. I chose to have nodeGather.c switch into its existing per-tuple context before the call, rather than inventing a separate context inside tqueue.c. This is sufficient to stop all leakage in the simple case I exhibited earlier today (see link below), but it does not deal with leaks induced in more complex cases by tqueue.c's insistence on using TopMemoryContext for data that it's not actually trying hard to keep track of. That issue is intertwined with another major source of inefficiency, namely failure to cache lookup results across calls, so it seems best to deal with it separately. In passing, improve some comments, and modify gather_readnext's method for deciding when it's visited all the readers so that it's more obviously correct. (I'm not actually convinced that the previous code *is* correct in the case of a reader deletion; it certainly seems fragile.) Discussion: <32763.1469821037@sss.pgh.pa.us>