path: root/src/backend/nodes/readfuncs.c
* Further fixes for MULTIEXPR_SUBLINK fix. (Tom Lane, 2022-09-06)

Some more things I didn't think about in commits 3f7323cbb et al:

MULTIEXPR_SUBLINK subplans might have been converted to initplans instead of regular subplans, in which case they won't show up in the modified targetlist. Fortunately, this would only happen if they have no input parameters, which means that the problem we originally needed to fix can't happen with them. Therefore, there's no need to clone their output parameters, and thus it doesn't hurt that we'll fail to see them in the first pass over the targetlist. Nonetheless, this complicates matters greatly, because now we have to distinguish output Params of initplans (which shouldn't get renumbered) from those of regular subplans (which should).

This also breaks the simplistic scheme I used of assuming that the subplans found in the targetlist have consecutive subLinkIds. We really can't avoid the need to know the subplans' subLinkIds in this code. To fix that, add subLinkId as the last field of SubPlan. We can get away with that change in back branches because SubPlan nodes will never be stored in the catalogs, and there's no ABI break for external code that might be looking at the existing fields of SubPlan.

Secondly, rewriteTargetListIU might have rolled up multiple FieldStores or SubscriptingRefs into one targetlist entry, breaking the assumption that there's at most one Param to fix per targetlist entry. (That assumption is OK I think in the ruleutils.c code I stole the logic from in 18f51083c, because that only deals with pre-rewrite query trees. But it's definitely not OK here.) Abandon that shortcut and just do a full tree walk on the targetlist to ensure we find all the Params we have to change.

Per bug #17606 from Andre Lin. As before, only v10-v13 need the patch.

Discussion: https://postgr.es/m/17606-e5c8ad18d31db96a@postgresql.org

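A minimal sketch of the kind of struct change described (all fields other than subLinkId are illustrative placeholders, not the actual layout); the point is that appending the new field last preserves the offsets that existing external code may rely on:

    typedef struct SubPlan
    {
        Expr        xpr;
        SubLinkType subLinkType;   /* EXISTS_SUBLINK, MULTIEXPR_SUBLINK, ... */
        /* ... all pre-existing fields keep their old offsets ... */
        int         subLinkId;     /* appended last: no ABI break for code
                                    * reading the existing fields, and SubPlan
                                    * nodes are never stored in the catalogs */
    } SubPlan;
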
* Fix index-only scan plans, take 2. (Tom Lane, 2022-01-03)

Commit 4ace45677 failed to fix the problem fully, because the same issue of attempting to fetch a non-returnable index column can occur when rechecking the indexqual after using a lossy index operator. Moreover, it broke EXPLAIN for such indexquals (which indicates a gap in our test cases :-().

Revert the code changes of 4ace45677 in favor of adding a new field to struct IndexOnlyScan, containing a version of the indexqual that can be executed against the index-returned tuple without using any non-returnable columns. (The restrictions imposed by check_index_only guarantee this is possible, although we may have to recompute indexed expressions.) Support construction of that during setrefs.c processing by marking IndexOnlyScan.indextlist entries as resjunk if they can't be returned, rather than removing them entirely. (We could alternatively require setrefs.c to look up the IndexOptInfo again, but abusing resjunk this way seems like a reasonably safe way to avoid needing to do that.)

This solution isn't great from an API-stability standpoint: if there are any extensions out there that build IndexOnlyScan structs directly, they'll be broken in the next minor releases. However, only a very invasive extension would be likely to do such a thing. There's no change in the Path representation, so typical planner extensions shouldn't have a problem.

As before, back-patch to all supported branches.

Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us
Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org

* Get rid of trailing semicolons in C macro definitions. (Tom Lane, 2020-05-01)

Writing a trailing semicolon in a macro is almost never the right thing, because you almost always want to write a semicolon after each macro call instead. (Even if there was some reason to prefer not to, pgindent would probably make a hash of code formatted that way; so within PG the rule should basically be "don't do it".) Thus, if we have a semi inside the macro, the compiler sees "something;;". Much of the time the extra empty statement is harmless, but it could lead to mysterious syntax errors at call sites. In perhaps an overabundance of neatnik-ism, let's run around and get rid of the excess semicolons wherever possible.

The only thing worse than a mysterious syntax error is a mysterious syntax error that only happens in the back branches; therefore, backpatch these changes where relevant, which is most of them because most of these mistakes are old. (The lack of reported problems shows that this is largely a hypothetical issue, but still, it could bite us in some future patch.)

John Naylor and Tom Lane

Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com

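The hazard in miniature, with a made-up macro and helpers: the macro's own semicolon plus the one at the call site yields an extra empty statement, which detaches the "else".

    #define CLEANUP()  do { cleanup(); } while (0);   /* trailing ';' is the bug */

    if (failed)
        CLEANUP();          /* expands to "... while (0);;"                  */
    else                    /* syntax error: the stray empty statement ended */
        retry();            /* the if, so this "else" has nothing to bind to */
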
* Fix representation of hash keys in Hash/HashJoin nodes. (Andres Freund, 2019-08-02)

In 5f32b29c1819 I changed the creation of HashState.hashkeys to actually use HashState as the parent (instead of HashJoinState, which was incorrect, as they were executed below HashState), to fix the problem of hashkeys expressions otherwise relying on slot types appropriate for HashJoinState, rather than HashState as would be correct. That reliance was only introduced in 12, which is why it previously worked to use HashJoinState as the parent (although I'd be unsurprised if there were problematic cases).

Unfortunately that's not a sufficient solution, because before this commit, the to-be-hashed expressions referenced inner/outer as appropriate for the HashJoin, not Hash. That didn't have obvious bad consequences, because the slots containing the tuples were put into ecxt_innertuple when hashing a tuple for HashState (even though Hash doesn't have an inner plan).

There are less common cases where this can cause visible problems however (rather than just confusion when inspecting such executor trees). E.g. "ERROR: bogus varno: 65000", when explaining queries containing a HashJoin where the subsidiary Hash node's hash keys reference a subplan. While normally hashkeys aren't displayed by EXPLAIN, if one of those expressions references a subplan, that subplan may be printed as part of the Hash node - which then failed because an inner plan was referenced, and Hash doesn't have that. It seems quite possible that there are other broken cases, too.

Fix the problem by properly splitting the expression for the HashJoin and Hash nodes at plan time, and have them reference the proper subsidiary node. While other workarounds are possible, fixing this correctly seems easy enough. It was a pretty ugly hack to have ExecInitHashJoin put the expression into the already initialized HashState, in the first place.

I decided to not just split inner/outer hashkeys inside make_hashjoin(), but also to separate out hashoperators and hashcollations at plan time. Otherwise we would have ended up having two very similar loops, one at plan time and the other during executor startup. The work seems to more appropriately belong to plan time, anyway.

Reported-By: Nikita Glukhov, Alexander Korotkov
Author: Andres Freund
Reviewed-By: Tom Lane, in an earlier version
Discussion: https://postgr.es/m/CAPpHfdvGVegF_TKKRiBrSmatJL2dR9uwFCuR+teQ_8tEXU8mxg@mail.gmail.com
Backpatch: 12-

* Restructure creation of run-time pruning steps. (Tom Lane, 2019-05-17)

Previously, gen_partprune_steps() always built executor pruning steps using all suitable clauses, including those containing PARAM_EXEC Params. This meant that the pruning steps were only completely safe for executor run-time (scan start) pruning. To prune at executor startup, we had to ignore the steps involving exec Params. But this doesn't really work in general, since there may be logic changes needed as well --- for example, pruning according to the last operator's btree strategy is the wrong thing if we're not applying that operator. The rules embodied in gen_partprune_steps() and its minions are sufficiently complicated that tracking their incremental effects in other logic seems quite impractical.

Short of a complete redesign, the only safe fix seems to be to run gen_partprune_steps() twice, once to create executor startup pruning steps and then again for run-time pruning steps. We can save a few cycles however by noting during the first scan whether we rejected any clauses because they involved exec Params --- if not, we don't need to do the second scan.

In support of this, refactor the internal APIs in partprune.c to make more use of passing information in the GeneratePruningStepsContext struct, rather than as separate arguments.

This is, I hope, the last piece of our response to a bug report from Alan Jackson. Back-patch to v11 where this code came in.

Discussion: https://postgr.es/m/FAD28A83-AC73-489E-A058-2681FA31D648@tvsquared.com

* Generated columns. (Peter Eisentraut, 2019-03-30)

This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis. This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it.

Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com

* Collations with nondeterministic comparison. (Peter Eisentraut, 2019-03-22)

This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms.

This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider.

The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison).

This patch makes changes in three areas:

- CREATE COLLATION DDL changes and system catalog changes to support this new flag.

- Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information.

- String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm().

Reviewed-by: Daniel Verite <daniel@manitou-mail.org>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com

* Allow ATTACH PARTITION with only ShareUpdateExclusiveLock. (Robert Haas, 2019-03-07)

We still require AccessExclusiveLock on the partition itself, because otherwise an insert that violates the newly-imposed partition constraint could be in progress at the same time that we're changing that constraint; only the lock level on the parent relation is weakened.

To make this safe, we have to cope with (at least) three separate problems.

First, relevant DDL might commit while we're in the process of building a PartitionDesc. If so, find_inheritance_children() might see a new partition while the RELOID system cache still has the old partition bound cached, and even before invalidation messages have been queued. To fix that, if we see that the pg_class tuple seems to be missing or to have a null relpartbound, refetch the value directly from the table. We can't get the wrong value, because DETACH PARTITION still requires AccessExclusiveLock throughout; if we ever want to change that, this will need more thought. In testing, I found it quite difficult to hit even the null-relpartbound case; the race condition is extremely tight, but the theoretical risk is there.

Second, successive calls to RelationGetPartitionDesc might not return the same answer. The query planner will get confused if looking up the PartitionDesc for a particular relation does not return a consistent answer for the entire duration of query planning. Likewise, query execution will get confused if the same relation seems to have a different PartitionDesc at different times. Invent a new PartitionDirectory concept and use it to ensure consistency. This ensures that a single invocation of either the planner or the executor sees the same view of the PartitionDesc from beginning to end, but it does not guarantee that the planner and the executor see the same view. Since this allows pointers to old PartitionDesc entries to survive even after a relcache rebuild, also postpone removing the old PartitionDesc entry until we're certain no one is using it.

For the most part, it seems to be OK for the planner and executor to have different views of the PartitionDesc, because the executor will just ignore any concurrently added partitions which were unknown at plan time; those partitions won't be part of the inheritance expansion, but invalidation messages will trigger replanning at some point. Normally, this happens by the time the very next command is executed, but if the next command acquires no locks and executes a prepared query, it can manage not to notice until a new transaction is started. We might want to tighten that up, but it's material for a separate patch. There would still be a small window where a query that started just after an ATTACH PARTITION command committed might fail to notice its results -- but only if the command starts before the commit has been acknowledged to the user. All in all, the warts here around serializability seem small enough to be worth accepting for the considerable advantage of being able to add partitions without a full table lock.

Although in general the consequences of new partitions showing up between planning and execution are limited to the query not noticing the new partitions, run-time partition pruning will get confused in that case, so that's the third problem that this patch fixes. Run-time partition pruning assumes that indexes into the PartitionDesc are stable between planning and execution. So, add code so that if new partitions are added between plan time and execution time, the indexes stored in the subplan_map[] and subpart_map[] arrays within the plan's PartitionedRelPruneInfo get adjusted accordingly. There does not seem to be a simple way to generalize this scheme to cope with partitions that are removed, mostly because they could then get added back again with different bounds, but it works OK for added partitions.

This code does not try to ensure that every backend participating in a parallel query sees the same view of the PartitionDesc. That currently doesn't matter, because we never pass PartitionDesc indexes between backends. Each backend will ignore the concurrently added partitions which it notices, and it doesn't matter if different backends are ignoring different sets of concurrently added partitions. If in the future that matters, for example because we allow writes in parallel query and want all participants to do tuple routing to the same set of partitions, the PartitionDirectory concept could be improved to share PartitionDescs across backends. There is a draft patch to serialize and restore PartitionDescs on the thread where this patch was discussed, which may be a useful place to start.

Patch by me. Thanks to Alvaro Herrera, David Rowley, Simon Riggs, Amit Langote, and Michael Paquier for discussion, and to Alvaro Herrera for some review.

Discussion: http://postgr.es/m/CA+Tgmobt2upbSocvvDej3yzokd7AkiT+PvgFH+a9-5VV1oJNSQ@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoZE0r9-cyA-aY6f8WFEROaDLLL7Vf81kZ8MtFCkxpeQSw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoY13KQZF-=HNTrt9UYWYx3_oYOQpu9ioNT49jGgiDpUEA@mail.gmail.com

* Fix copy/out/readfuncs for accessMethod addition in 8586bf7ed8. (Andres Freund, 2019-03-06)

This includes a catversion bump, as IntoClause is theoretically speaking part of storable rules. In practice I don't think that can happen, but there's no reason to be stingy here.

Per buildfarm member calliphoridae.

* Allow user control of CTE materialization, and change the default behavior. (Tom Lane, 2019-02-16)

Historically we've always materialized the full output of a CTE query, treating WITH as an optimization fence (so that, for example, restrictions from the outer query cannot be pushed into it). This is appropriate when the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE query is non-recursive and side-effect-free, there's no hazard of changing the query results by pushing restrictions down.

Another argument for materialization is that it can avoid duplicate computation of an expensive WITH query --- but that only applies if the WITH query is called more than once in the outer query. Even then it could still be a net loss, if each call has restrictions that would allow just a small part of the WITH query to be computed.

Hence, let's change the behavior for WITH queries that are non-recursive and side-effect-free. By default, we will inline them into the outer query (removing the optimization fence) if they are called just once. If they are called more than once, we will keep the old behavior by default, but the user can override this and force inlining by specifying NOT MATERIALIZED. Lastly, the user can force the old behavior by specifying MATERIALIZED; this would mainly be useful when the query had deliberately been employing WITH as an optimization fence to prevent a poor choice of plan.

Andreas Karlsson, Andrew Gierth, David Fetter

Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk

* Renaming for new subscripting mechanism. (Alvaro Herrera, 2019-02-01)

Over at patch https://commitfest.postgresql.org/21/1062/ Dmitry wants to introduce a more generic subscripting mechanism, which allows subscripting not only arrays but also other object types such as JSONB. That functionality is introduced in a largish invasive patch, out of which this internal renaming patch was extracted.

Author: Dmitry Dolgov
Reviewed-by: Tom Lane, Arthur Zakirov
Discussion: https://postgr.es/m/CA+q6zcUK4EqPAu7XRRO5CCjMwhz5zvg+rfWuLzVoxp_5sKS6=w@mail.gmail.com

* In the planner, replace an empty FROM clause with a dummy RTE. (Tom Lane, 2019-01-28)

The fact that "SELECT expression" has no base relations has long been a thorn in the side of the planner. It makes it hard to flatten a sub-query that looks like that, or is a trivial VALUES() item, because the planner generally uses relid sets to identify sub-relations, and such a sub-query would have an empty relid set if we flattened it. prepjointree.c contains some baroque logic that works around this in certain special cases --- but there is a much better answer. We can replace an empty FROM clause with a dummy RTE that acts like a table of one row and no columns, and then there are no such corner cases to worry about. Instead we need some logic to get rid of useless dummy RTEs, but that's simpler and covers more cases than what was there before.

For really trivial cases, where the query is just "SELECT expression" and nothing else, there's a hazard that adding the extra RTE makes for a noticeable slowdown; even though it's not much processing, there's not that much for the planner to do overall. However testing says that the penalty is very small, close to the noise level. In more complex queries, this is able to find optimizations that we could not find before.

The new RTE type is called RTE_RESULT, since the "scan" plan type it gives rise to is a Result node (the same plan we produced for a "SELECT expression" query before). To avoid confusion, rename the old ResultPath path type to GroupResultPath, reflecting that it's only used in degenerate grouping cases where we know the query produces just one grouped row. (It wouldn't work to unify the two cases, because there are different rules about where the associated quals live during query_planner.)

Note: although this touches readfuncs.c, I don't think a catversion bump is required, because the added case can't occur in stored rules, only plans.

Patch by me, reviewed by David Rowley and Mark Dilger

Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us

* Update copyright for 2019. (Bruce Momjian, 2019-01-02)

Backpatch-through: certain files through 9.4

* Add stack depth checks to key recursive functions in backend/nodes/*.c. (Tom Lane, 2018-12-10)

Although copyfuncs.c has a check_stack_depth call in its recursion, equalfuncs.c, outfuncs.c, and readfuncs.c lacked one. This seems unwise. Likewise fix planstate_tree_walker(), in branches where that exists.

Discussion: https://postgr.es/m/30253.1544286631@sss.pgh.pa.us

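The pattern being added is just a guard at the top of each recursive entry point, so a pathologically deep node tree raises a clean error instead of overrunning the stack. A sketch (the function is hypothetical; check_stack_depth() is the real guard):

    #include "postgres.h"
    #include "miscadmin.h"          /* declares check_stack_depth() */

    static void *
    read_node_recurse(const char *token)
    {
        check_stack_depth();        /* ereports "stack depth limit exceeded"
                                     * rather than crashing on deep recursion */
        /* ... dispatch on token and recurse into child nodes ... */
        return NULL;
    }
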
* Remove some unnecessary fields from Plan trees. (Tom Lane, 2018-10-07)

In the wake of commit f2343653f, we no longer need some fields that were used before to control executor lock acquisitions:

* PlannedStmt.nonleafResultRelations can go away entirely.

* partitioned_rels can go away from Append, MergeAppend, and ModifyTable. However, ModifyTable still needs to know the RT index of the partition root table if any, which was formerly kept in the first entry of that list. Add a new field "rootRelation" to remember that. rootRelation is partly redundant with nominalRelation, in that if it's set it will have the same value as nominalRelation. However, the latter field has a different purpose so it seems best to keep them distinct.

Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp

* Centralize executor's opening/closing of Relations for rangetable entries. (Tom Lane, 2018-10-04)

Create an array estate->es_relations[] paralleling the es_range_table, and store references to Relations (relcache entries) there, so that any given RT entry is opened and closed just once per executor run. Scan nodes typically still call ExecOpenScanRelation, but ExecCloseScanRelation is no more; relation closing is now done centrally in ExecEndPlan.

This is slightly more complex than one would expect because of the interactions with relcache references held in ResultRelInfo nodes. The general convention is now that ResultRelInfo->ri_RelationDesc does not represent a separate relcache reference and so does not need to be explicitly closed; but there is an exception for ResultRelInfos in the es_trig_target_relations list, which are manufactured by ExecGetTriggerResultRel and have to be cleaned up by ExecCleanUpTriggerState. (That much was true all along, but these ResultRelInfos are now more different from others than they used to be.)

To allow the partition pruning logic to make use of es_relations[] rather than having its own relcache references, adjust PartitionedRelPruneInfo to store an RT index rather than a relation OID.

Amit Langote, reviewed by David Rowley and Jesper Pedersen, some mods by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp

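A sketch of the open-once pattern this creates, not the committed code: the helper name is hypothetical, rt_fetch() is the long-standing rangetable lookup macro, and NoLock is correct because the needed locks were already taken upstream.

    Relation
    executor_get_relation(EState *estate, Index rti)    /* hypothetical */
    {
        Relation    rel = estate->es_relations[rti - 1];

        if (rel == NULL)
        {
            RangeTblEntry *rte = rt_fetch(rti, estate->es_range_table);

            rel = heap_open(rte->relid, NoLock);   /* lock held since plan time */
            estate->es_relations[rti - 1] = rel;   /* cache for later callers */
        }
        return rel;
    }
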
* Create an RTE field to record the query's lock mode for each relation. (Tom Lane, 2018-09-30)

Add RangeTblEntry.rellockmode, which records the appropriate lock mode for each RTE_RELATION rangetable entry (either AccessShareLock, RowShareLock, or RowExclusiveLock depending on the RTE's role in the query).

This patch creates the field and makes all creators of RTE nodes fill it in reasonably, but for the moment nothing much is done with it. The plan is to replace assorted post-parser logic that re-determines the right lockmode to use with simple uses of rte->rellockmode. For now, just add Asserts in each of those places that the rellockmode matches what they are computing today. (In some cases the match isn't perfect, so the Asserts are weaker than you might expect; but this seems OK, as per discussion.)

This passes check-world for me, but it seems worth pushing in this state to see if the buildfarm finds any problems in cases I failed to test.

catversion bump due to change of stored rules.

Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me

Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp

* Add a debugging option to stress-test outfuncs.c and readfuncs.c. (Tom Lane, 2018-09-18)

In the normal course of operation, query trees will be serialized only if they are stored as views or rules; and plan trees will be serialized only if they get passed to parallel-query workers. This leaves an awful lot of opportunity for bugs/oversights to not get detected, as indeed we've just been reminded of the hard way.

To improve matters, this patch adds a new compile option WRITE_READ_PARSE_PLAN_TREES, which is modeled on the longstanding option COPY_PARSE_PLAN_TREES; but instead of passing all parse and plan trees through copyObject, it passes them through nodeToString + stringToNode. Enabling this option in a buildfarm animal or two will catch problems at least for cases that are exercised by the regression tests.

A small problem with this idea is that readfuncs.c historically has discarded location fields, on the reasonable grounds that parse locations in a retrieved view are not relevant to the current query. But doing that in WRITE_READ_PARSE_PLAN_TREES breaks pg_stat_statements, and it could cause problems for future improvements that might try to report error locations at runtime. To fix that, provide a variant behavior in readfuncs.c that makes it restore location fields when told to.

In passing, const-ify the string arguments of stringToNode and its subsidiary functions, just because it annoyed me that they weren't const already.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us

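The core of the option is a write/read round trip plus an equality check; a minimal sketch of how such a hook might look (placement and error handling are assumptions, the functions themselves are real):

    #ifdef WRITE_READ_PARSE_PLAN_TREES
        {
            char   *str = nodeToString(plan_tree);
            Node   *round_trip = (Node *) stringToNode(str);

            /* Missing or wrong out/read support for any field shows up
             * here as an inequality instead of going undetected. */
            if (!equal(round_trip, plan_tree))
                elog(WARNING, "outfuncs/readfuncs failed to produce an equal plan tree");
            pfree(str);
        }
    #endif
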
* Fix some minor issues exposed by outfuncs/readfuncs testing. (Tom Lane, 2018-09-18)

A test patch to pass parse and plan trees through outfuncs + readfuncs exposed several issues that need to be fixed to get clean matches:

Query.withCheckOptions failed to get copied; it's intentionally ignored by outfuncs/readfuncs on the grounds that it'd always be NIL anyway in stored rules. This seems less than future-proof, and it's not even saving very much, so just undo the decision and treat the field like all others.

Several places that convert a view RTE into a subquery RTE, or similar manipulations, failed to clear out fields that were specific to the original RTE type and should be zero in a subquery RTE. Since readfuncs.c will leave such fields as zero, equalfuncs.c thinks the nodes are different leading to a reported mismatch. It seems like a good idea to clear out the no-longer-needed fields, even though in principle nothing should look at them; the node ought to be indistinguishable from how it would look if we'd built a new node instead of scribbling on the old one.

BuildOnConflictExcludedTargetlist randomly set the resname of some TargetEntries to "" not NULL. outfuncs/readfuncs don't distinguish those cases, and so the string will read back in as NULL ... but equalfuncs.c does distinguish. Perhaps we ought to try to make things more consistent in this area --- but it's just useless extra code space for BuildOnConflictExcludedTargetlist to not use NULL here, so I fixed it for now by making it do that.

catversion bumped because the change in handling of Query.withCheckOptions affects stored rules.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us

* Fix some probably-minor oversights in readfuncs.c. (Tom Lane, 2018-09-18)

The system expects TABLEFUNC RTEs to have coltypes, coltypmods, and colcollations lists, but outfuncs doesn't dump them and readfuncs doesn't restore them. This doesn't cause obvious failures, because the only things that look at those fields are expandRTE() and get_rte_attribute_type(), which are mostly used during parse analysis, before anything would've passed the parsetree through outfuncs/readfuncs. But expandRTE() is used in build_physical_tlist(), which means that that function will return a wrong answer for a TABLEFUNC RTE that came from a view. Very accidentally, this doesn't cause serious problems, because what it will return is NIL which callers will interpret as "couldn't build a physical tlist because of dropped columns". So you still get a plan that works, though it's marginally less efficient than it could be.

There are also some other expandRTE() calls associated with transformation of whole-row Vars in the planner. I have been unable to exhibit misbehavior from that, and it may be unreachable in any case that anyone would care about ... but I'm not entirely convinced, so this seems like something we should back-patch a fix for. Fortunately, we can fix it without forcing a change of stored rules and a catversion bump, because we can just copy these lists from the subsidiary TableFunc object.

readfuncs.c was also missing support for NamedTuplestoreScan plan nodes. This accidentally fails to break parallel query because a query using a named tuplestore would never be considered parallel-safe anyway. However, project policy since parallel query came in is that all plan node types should have outfuncs/readfuncs support, so this is clearly an oversight that should be repaired.

Noted while fooling around with a patch to test outfuncs/readfuncs more thoroughly. That exposed some other issues too, but these are the only ones that seem worth back-patching.

Back-patch to v10 where both of these features came in.

Discussion: https://postgr.es/m/17114.1537138992@sss.pgh.pa.us

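The no-catversion-bump fix is possible because the RTE's lists can be rebuilt from the subsidiary TableFunc node at read time; a sketch of the idea in readfuncs.c's idiom (exact placement is an assumption):

    /* In the RTE_TABLEFUNC case of _readRangeTblEntry(), sketch only: */
    READ_NODE_FIELD(tablefunc);
    /* The column type lists were never written out; the TableFunc node
     * carries the same information, so copy it back into the RTE. */
    if (local_node->tablefunc)
    {
        TableFunc  *tf = local_node->tablefunc;

        local_node->coltypes = tf->coltypes;
        local_node->coltypmods = tf->coltypmods;
        local_node->colcollations = tf->colcollations;
    }
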
* Fix run-time partition pruning for appends with multiple source rels. (Tom Lane, 2018-08-01)

The previous coding here supposed that if run-time partitioning applied to a particular Append/MergeAppend plan, then all child plans of that node must be members of a single partitioning hierarchy. This is totally wrong, since an Append could be formed from a UNION ALL: we could have multiple hierarchies sharing the same Append, or child plans that aren't part of any hierarchy.

To fix, restructure the related plan-time and execution-time data structures so that we can have a separate list or array for each partitioning hierarchy. Also track subplans that are not part of any hierarchy, and make sure they don't get pruned.

Per reports from Phil Florent and others. Back-patch to v11, since the bug originated there.

David Rowley, with a lot of cosmetic adjustments by me; thanks also to Amit Langote for review.

Discussion: https://postgr.es/m/HE1PR03MB17068BB27404C90B5B788BCABA7B0@HE1PR03MB1706.eurprd03.prod.outlook.com

* Expand run-time partition pruning to work with MergeAppend. (Heikki Linnakangas, 2018-07-19)

This expands the run-time partition pruning support added for Append in 499be013de to also allow unneeded subnodes of a MergeAppend to be removed.

Author: David Rowley
Discussion: https://www.postgresql.org/message-id/CAKJS1f_F_V8D7Wu-HVdnH7zCUxhoGK8XhLLtd%3DCu85qDZzXrgg%40mail.gmail.com

* Assorted cosmetic cleanup of run-time-partition-pruning code. (Tom Lane, 2018-06-10)

Use "subplan" rather than "subnode" to refer to the child plans of a partitioning Append; this seems a bit more specific and hence clearer. Improve assorted comments. No non-cosmetic changes.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com

* Relocate partition pruning structs to a saner place. (Tom Lane, 2018-06-10)

These struct definitions were originally dropped into primnodes.h, which is a poor choice since that's mainly intended for primitive expression node types; these are not in that category. What they are is auxiliary info in Plan trees, so move them to plannodes.h. For consistency, also relocate some related code that was apparently placed with the aid of a dartboard.

There are no interesting code changes in this commit, just reshuffling.

David Rowley and Tom Lane

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com

* Improve run-time partition pruning to handle any stable expression. (Tom Lane, 2018-06-10)

The initial coding of the run-time-pruning feature only coped with cases where the partition key(s) are compared to Params. That is a bit silly; we can allow it to work with any non-Var-containing stable expression, as long as we take special care with expressions containing PARAM_EXEC Params. The code is hardly any longer this way, and it's considerably clearer (IMO at least). Per gripe from Pavel Stehule.

David Rowley, whacked around a bit by me

Discussion: https://postgr.es/m/CAFj8pRBjrufA3ocDm8o4LPGNye9Y+pm1b9kCwode4X04CULG3g@mail.gmail.com

* Fix incorrect field type for PlannedStmt.jitFlags in outfuncs/readfuncs. (Tom Lane, 2018-04-28)

This field was a bool at one point, but now it's an int. Spotted by Hari Babu; trivial patch is by Ashutosh Bapat.

Discussion: https://postgr.es/m/CAJrrPGedKiFE2fqntSauUfhapCksOJzam+QtHfSgx86LhXLeOQ@mail.gmail.com

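The whole fix amounts to a one-macro swap per file; sketched in the outfuncs/readfuncs idiom:

    /* outfuncs.c, in _outPlannedStmt(): jitFlags is an int now, not a bool */
    WRITE_INT_FIELD(jitFlags);      /* was WRITE_BOOL_FIELD(jitFlags) */

    /* readfuncs.c, in _readPlannedStmt(): */
    READ_INT_FIELD(jitFlags);       /* was READ_BOOL_FIELD(jitFlags) */
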
* Revert MERGE patch. (Simon Riggs, 2018-04-12)

This reverts commits d204ef63776b8a00ca220adec23979091564e465, 83454e3c2b28141c0db01c7d2027e01040df5249 and a few more commits thereafter (complete list at the end) related to MERGE feature.

While the feature was fully functional, with sufficient test coverage and necessary documentation, it was felt that some parts of the executor and parse-analyzer can use a different design and it wasn't possible to do that in the available time. So it was decided to revert the patch for PG11 and retry again in the future.

Thanks again to all reviewers and bug reporters.

List of commits reverted, in reverse chronological order:

f1464c5380 Improve parse representation for MERGE
ddb4158579 MERGE syntax diagram correction
530e69e59b Allow cpluspluscheck to pass by renaming variable
01b88b4df5 MERGE minor errata
3af7b2b0d4 MERGE fix variable warning in non-assert builds
a5d86181ec MERGE INSERT allows only one VALUES clause
4b2d44031f MERGE post-commit review
4923550c20 Tab completion for MERGE
aa3faa3c7a WITH support in MERGE
83454e3c2b New files for MERGE
d204ef6377 MERGE SQL Command following SQL:2016

Author: Pavan Deolasee
Reviewed-by: Michael Paquier

* Support partition pruning at execution time. (Alvaro Herrera, 2018-04-07)

Existing partition pruning is only able to work at plan time, for query quals that appear in the parsed query. This is good but limiting, as there can be parameters that appear later that can be usefully used to further prune partitions.

This commit adds support for pruning subnodes of Append which cannot possibly contain any matching tuples, during execution, by evaluating Params to determine the minimum set of subnodes that can possibly match. We support more than just simple Params in WHERE clauses. Support additionally includes:

1. Parameterized Nested Loop Joins: The parameter from the outer side of the join can be used to determine the minimum set of inner side partitions to scan.

2. Initplans: Once an initplan has been executed we can then determine which partitions match the value from the initplan.

Partition pruning is performed in two ways. When Params external to the plan are found to match the partition key we attempt to prune away unneeded Append subplans during the initialization of the executor. This allows us to bypass the initialization of non-matching subplans meaning they won't appear in the EXPLAIN or EXPLAIN ANALYZE output.

For parameters whose value is only known during the actual execution then the pruning of these subplans must wait. Subplans which are eliminated during this stage of pruning are still visible in the EXPLAIN output. In order to determine if pruning has actually taken place, the EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never executed due to the elimination of the partition then the execution timing area will state "(never executed)". Whereas, if, for example in the case of parameterized nested loops, the number of loops stated in the EXPLAIN ANALYZE output for certain subplans may appear lower than others due to the subplan having been scanned fewer times. This is due to the list of matching subnodes having to be evaluated whenever a parameter which was found to match the partition key changes.

This commit required some additional infrastructure that permits the building of a data structure which is able to perform the translation of the matching partition IDs, as returned by get_matching_partitions, into the list index of a subpaths list, as exist in node types such as Append, MergeAppend and ModifyTable. This allows us to translate a list of clauses into a Bitmapset of all the subpath indexes which must be included to satisfy the clause list.

Author: David Rowley, based on an earlier effort by Beena Emerson
Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi, Jesper Pedersen
Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com

* Faster partition pruning. (Alvaro Herrera, 2018-04-06)

Add a new module backend/partitioning/partprune.c, implementing a more sophisticated algorithm for partition pruning. The new module uses each partition's "boundinfo" for pruning instead of constraint exclusion, based on an idea proposed by Robert Haas of a "pruning program": a list of steps generated from the query quals which are run iteratively to obtain a list of partitions that must be scanned in order to satisfy those quals.

At present, this targets planner-time partition pruning, but there exist further patches to apply partition pruning at execution time as well.

This commit also moves some definitions from include/catalog/partition.h to a new file include/partitioning/partbounds.h, in an attempt to rationalize partitioning related code.

Authors: Amit Langote, David Rowley, Dilip Kumar
Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen
Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp

* Improve parse representation for MERGE. (Simon Riggs, 2018-04-06)

Separation of parser data structures from executor, as requested by Tom Lane. Further improvements possible.

While there, implement error for multiple VALUES clauses via parser to allow line number of error, as requested by Andres Freund.

Author: Pavan Deolasee
Discussion: https://www.postgresql.org/message-id/CABOikdPpqjectFchg0FyTOpsGXyPoqwgC==OLKWuxgBOsrDDZw@mail.gmail.com

* MERGE SQL Command following SQL:2016. (Simon Riggs, 2018-04-03)

MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows, a task that would otherwise require multiple PL statements. e.g.

MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
  UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
  DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
  INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
  DO NOTHING;

MERGE works with regular and partitioned tables, including column and row security enforcement, as well as support for row, statement and transition triggers.

MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used statically from PL/pgSQL.

MERGE does not yet support inheritance, write rules, RETURNING clauses, updatable views or foreign tables. MERGE follows SQL Standard per the most recent SQL:2016.

Includes full tests and documentation, including full isolation tests to demonstrate the concurrent behavior.

This version written from scratch in 2017 by Simon Riggs, using docs and tests originally written in 2009. Later work from Pavan Deolasee has been both complex and deep, leaving the lead author credit now in his hands. Extensive discussion of concurrency from Peter Geoghegan, with thanks for the time and effort contributed.

Various issues reported via sqlsmith by Andreas Seltenreich

Authors: Pavan Deolasee, Simon Riggs
Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs
Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com

* Revert "Modified files for MERGE"Simon Riggs2018-04-02
| | | | This reverts commit 354f13855e6381d288dfaa52bcd4f2cb0fd4a5eb.
* Modified files for MERGE. (Simon Riggs, 2018-04-02)

* C comments: "a" <--> "an" corrections. (Bruce Momjian, 2018-03-29)

Reported-by: Michael Paquier, Abhijit Menon-Sen
Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz
Author: Michael Paquier, Abhijit Menon-Sen, me

* Basic planner and executor integration for JIT. (Andres Freund, 2018-03-22)

This adds a simple cost-based plan-time decision about whether JIT should be performed. jit_above_cost and jit_optimize_above_cost are compared with the total cost of a plan; if the cost is above them, JIT is performed / optimization is performed, respectively.

For that PlannedStmt and EState have a jitFlags (es_jit_flags) field that stores information about what JIT operations should be performed.

EState now also has a new es_jit field, which can store a JitContext. When there are no errors the context is released in standard_ExecutorEnd().

It is likely that the default values for jit_[optimize_]above_cost will need to be adapted further, but in my test these values seem to work reasonably.

Author: Andres Freund, with feedback by Peter Eisentraut
Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de

* Support all SQL:2011 options for window frame clauses. (Tom Lane, 2018-02-07)

This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in a reasonably data-type-extensible fashion. That problem is resolved here by adding the ability for btree operator classes to provide an "in_range" support function that defines how to add or subtract the RANGE offset value. Factoring it this way also allows the operator class to avoid overflow problems near the ends of the datatype's range, if it wishes to expend effort on that. (In the committed patch, the integer opclasses handle that issue, but it did not seem worth the trouble to avoid overflow failures for datetime types.)

The patch includes in_range support for the integer_ops opfamily (int2/int4/int8) as well as the standard datetime types. Support for other numeric types has been requested, but that seems like suitable material for a follow-on patch.

In addition, the patch adds GROUPS mode which counts the offset in ORDER-BY peer groups rather than rows, and it adds the frame_exclusion options specified by SQL:2011. As far as I can see, we are now fully up to spec on window framing options.

Existing behaviors remain unchanged, except that I changed the errcode for a couple of existing error reports to meet the SQL spec's expectation that negative "offset" values should be reported as SQLSTATE 22013.

Internally and in relevant parts of the documentation, we now consistently use the terminology "offset PRECEDING/FOLLOWING" rather than "value PRECEDING/FOLLOWING", since the term "value" is confusingly vague.

Oliver Ford, reviewed and whacked around some by me

Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com

* Allow UPDATE to move rows between partitions. (Robert Haas, 2018-01-19)

When an UPDATE causes a row to no longer match the partition constraint, try to move it to a different partition where it does match the partition constraint. In essence, the UPDATE is split into a DELETE from the old partition and an INSERT into the new one. This can lead to surprising behavior in concurrency scenarios because EvalPlanQual rechecks won't work as they normally did; the known problems are documented. (There is a pending patch to improve the situation further, but it needs more review.)

Amit Khandekar, reviewed and tested by Amit Langote, David Rowley, Rajkumar Raghuwanshi, Dilip Kumar, Amul Sul, Thomas Munro, Álvaro Herrera, Amit Kapila, and me. A few final revisions by me.

Discussion: http://postgr.es/m/CAJ3gD9do9o2ccQ7j7+tSgiE1REY65XRiMb=yJO3u3QhyP8EEPQ@mail.gmail.com

* Update copyright for 2018. (Bruce Momjian, 2018-01-02)

Backpatch-through: certain files through 9.3

* Add parallel-aware hash joins. (Andres Freund, 2017-12-21)

Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel Hash Join with Parallel Hash. While hash joins could already appear in parallel queries, they were previously always parallel-oblivious and had a partial subplan only on the outer side, meaning that the work of the inner subplan was duplicated in every worker.

After this commit, the planner will consider using a partial subplan on the inner side too, using the Parallel Hash node to divide the work over the available CPU cores and combine its results in shared memory. If the join needs to be split into multiple batches in order to respect work_mem, then workers process different batches as much as possible and then work together on the remaining batches.

The advantages of a parallel-aware hash join over a parallel-oblivious hash join used in a parallel query are that it:

* avoids wasting memory on duplicated hash tables
* avoids wasting disk space on duplicated batch files
* divides the work of building the hash table over the CPUs

One disadvantage is that there is some communication between the participating CPUs which might outweigh the benefits of parallelism in the case of small hash tables. This is avoided by the planner's existing reluctance to supply partial plans for small scans, but it may be necessary to estimate synchronization costs in future if that situation changes. Another is that outer batch 0 must be written to disk if multiple batches are required.

A potential future advantage of parallel-aware hash joins is that right and full outer joins could be supported, since there is a single set of matched bits for each hashtable, but that is not yet implemented.

A new GUC enable_parallel_hash is defined to control the feature, defaulting to on.

Author: Thomas Munro
Reviewed-By: Andres Freund, Robert Haas
Tested-By: Rafia Sabih, Prabhat Sahu
Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com

* Support Parallel Append plan nodes. (Robert Haas, 2017-12-05)

When we create an Append node, we can spread out the workers over the subplans instead of piling on to each subplan one at a time, which should typically be a bit more efficient, both because the startup cost of any plan executed entirely by one worker is paid only once and also because of reduced contention. We can also construct Append plans using a mix of partial and non-partial subplans, which may allow for parallelism in places that otherwise couldn't support it.

Unfortunately, this patch doesn't handle the important case of parallelizing UNION ALL by running each branch in a separate worker; the executor infrastructure is added here, but more planner work is needed.

Amit Khandekar, Robert Haas, Amul Sul, reviewed and tested by Ashutosh Bapat, Amit Langote, Rafia Sabih, Amit Kapila, and Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/CAJ3gD9dy0K_E8r727heqXoBmWZ83HwLFwdcaSSmBQ1+S+vRuUQ@mail.gmail.com

* Pass InitPlan values to workers via Gather (Merge). (Robert Haas, 2017-11-16)

If a PARAM_EXEC parameter is used below a Gather (Merge) but the InitPlan that computes it is attached to or above the Gather (Merge), force the value to be computed before starting parallelism and pass it down to all workers. This allows us to use parallelism in cases where it previously would have had to be rejected as unsafe. We do - in this case - lose the optimization that the value is only computed if it's actually used. An alternative strategy would be to have the first worker that needs the value compute it, but one downside of that approach is that we'd then need to select a parallel-safe path to compute the parameter value; it couldn't for example contain a Gather (Merge) node. At some point in the future, we might want to consider both approaches.

Independent of that consideration, there is a great deal more work that could be done to make more kinds of PARAM_EXEC parameters parallel-safe. This infrastructure could be used to allow a Gather (Merge) on the inner side of a nested loop (although that's not a very appealing plan) and cases where the InitPlan is attached below the Gather (Merge) could be addressed as well using various techniques. But this is a good start.

Amit Kapila, reviewed and revised by me. Reviewing and testing from Kuntal Ghosh, Haribabu Kommi, and Tushar Ahuja.

Discussion: http://postgr.es/m/CAA4eK1LV0Y1AUV4cUCdC+sYOx0Z0-8NAJ2Pd9=UKsbQ5Sr7+JQ@mail.gmail.com

* Track in the plan the types associated with PARAM_EXEC parameters. (Robert Haas, 2017-11-13)

Up until now, we only tracked the number of parameters, which was sufficient to allocate an array of Datums of the appropriate size, but not sufficient to, for example, know how to serialize a Datum stored in one of those slots. An upcoming patch wants to do that, so add this tracking to make it possible.

Patch by me, reviewed by Tom Lane and Amit Kapila.

Discussion: http://postgr.es/m/CA+TgmoYqpxDKn8koHdW8BEKk8FMUL0=e8m2Qe=M+r0UBjr3tuQ@mail.gmail.com

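One plausible shape for that tracking, sketched: replace the bare count with a list of type OIDs indexed by PARAM_EXEC number (the field names here are assumptions, not necessarily the committed ones).

    /* Before: only how many PARAM_EXEC slots to allocate. */
    int         nParamExec;

    /* After: one type OID per slot, so the Datum in slot i can be
     * serialized with the output/send function for its recorded type. */
    List       *paramExecTypes;    /* OID list, indexed by PARAM_EXEC number */
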
* Add hash partitioning. (Robert Haas, 2017-11-09)

Hash partitioning is useful when you want to partition a growing data set evenly. This can be useful to keep table sizes reasonable, which makes maintenance operations such as VACUUM faster, or to enable partition-wise join.

At present, we still depend on constraint exclusion for partitioning pruning, and the shape of the partition constraints for hash partitioning is such that that doesn't work. Work is underway to fix that, which should both improve performance and make partitioning pruning work with hash partitioning.

Amul Sul, reviewed and tested by Dilip Kumar, Ashutosh Bapat, Yugo Nagata, Rajkumar Raghuwanshi, Jesper Pedersen, and by me. A few final tweaks also by me.

Discussion: http://postgr.es/m/CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com

* pg_stat_statements: Widen query IDs from 32 bits to 64 bits. (Robert Haas, 2017-10-11)

This takes advantage of the infrastructure introduced by commit 81c5e46c490e2426db243eada186995da5bb0ba7 to greatly reduce the likelihood that two different queries will end up with the same query ID. It's still possible, of course, but whereas before the chances of a collision reached 25% around 50,000 queries, it will now take more than 3 billion queries.

Backward incompatibility: Because the type exposed at the SQL level is int8, users may now see negative query IDs in the pg_stat_statements view (and also, query IDs more than 4 billion, which was the old limit).

Patch by me, reviewed by Michael Paquier and Peter Geoghegan.

Discussion: http://postgr.es/m/CA+TgmobG_Kp4cBKFmsznUAaM1GWW6hhRNiZC0KjRMOOeYnz5Yw@mail.gmail.com

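Those collision figures follow from the standard birthday approximation; as a rough check:

    p(n) ≈ 1 − exp(−n² / 2N)

    32-bit IDs: N = 2^32, n = 50,000  gives  n²/2N ≈ 0.29,  so p ≈ 25%
    64-bit IDs: N = 2^64, solving p = 25%  gives  n ≈ √(2N · 0.29) ≈ 3.3 billion
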
* Support arrays over domains. (Tom Lane, 2017-09-30)

Allowing arrays with a domain type as their element type was left un-done in the original domain patch, but not for any very good reason. This omission leads to such surprising results as array_agg() not working on a domain column, because the parser can't identify a suitable output type for the polymorphic aggregate.

In order to fix this, first clean up the APIs of coerce_to_domain() and some internal functions in parse_coerce.c so that we consistently pass around a CoercionContext along with CoercionForm. Previously, we sometimes passed an "isExplicit" boolean flag instead, which is strictly less information; and coerce_to_domain() didn't even get that, but instead had to reverse-engineer isExplicit from CoercionForm. That's contrary to the documentation in primnodes.h that says that CoercionForm only affects display and not semantics. I don't think this change fixes any live bugs, but it makes things more consistent. The main reason for doing it though is that now build_coercion_expression() receives ccontext, which it needs in order to be able to recursively invoke coerce_to_target_type().

Next, reimplement ArrayCoerceExpr so that the node does not directly know any details of what has to be done to the individual array elements while performing the array coercion. Instead, the per-element processing is represented by a sub-expression whose input is a source array element and whose output is a target array element. This simplifies life in parse_coerce.c, because it can build that sub-expression by a recursive invocation of coerce_to_target_type().

The executor now handles the per-element processing as a compiled expression instead of hard-wired code. The main advantage of this is that we can use a single ArrayCoerceExpr to handle as many as three successive steps per element: base type conversion, typmod coercion, and domain constraint checking. The old code used two stacked ArrayCoerceExprs to handle type + typmod coercion, which was pretty inefficient, and adding yet another array deconstruction to do domain constraint checking seemed very unappetizing.

In the case where we just need a single, very simple coercion function, doing this straightforwardly leads to a noticeable increase in the per-array-element runtime cost. Hence, add an additional shortcut evalfunc in execExprInterp.c that skips unnecessary overhead for that specific form of expression. The runtime speed of simple cases is within 1% or so of where it was before, while cases that previously required two levels of array processing are significantly faster.

Finally, create an implicit array type for every domain type, as we do for base types, enums, etc. Everything except the array-coercion case seems to just work without further effort.

Tom Lane, reviewed by Andrew Dunstan

Discussion: https://postgr.es/m/9852.1499791473@sss.pgh.pa.us

* Allow a partitioned table to have a default partition. (Robert Haas, 2017-09-08)

Any tuples that don't route to any other partition will route to the default partition.

Jeevan Ladhe, Beena Emerson, Ashutosh Bapat, Rahila Syed, and Robert Haas, with review and testing at various stages by (at least) Rushabh Lathia, Keith Fiske, Amit Langote, Amul Sul, Rajkumar Raghuanshi, Sven Kunze, Kyotaro Horiguchi, Thom Brown, Rafia Sabih, and Dilip Kumar.

Discussion: http://postgr.es/m/CAH2L28tbN4SYyhS7YV1YBWcitkqbhSWfQCy0G=apRcC_PEO-bg@mail.gmail.com
Discussion: http://postgr.es/m/CAOG9ApEYj34fWMcvBMBQ-YtqR9fTdXhdN82QEKG0SVZ6zeL1xg@mail.gmail.com

* Force rescanning of parallel-aware scan nodes below a Gather[Merge]. (Tom Lane, 2017-08-30)

The ExecReScan machinery contains various optimizations for postponing or skipping rescans of plan subtrees; for example a HashAgg node may conclude that it can re-use the table it built before, instead of re-reading its input subtree. But that is wrong if the input contains a parallel-aware table scan node, since the portion of the table scanned by the leader process is likely to vary from one rescan to the next. This explains the timing-dependent buildfarm failures we saw after commit a2b70c89c.

The established mechanism for showing that a plan node's output is potentially variable is to mark it as depending on some runtime Param. Hence, to fix this, invent a dummy Param (one that has a PARAM_EXEC parameter number, but carries no actual value) associated with each Gather or GatherMerge node, mark parallel-aware nodes below that node as dependent on that Param, and arrange for ExecReScanGather[Merge] to flag that Param as changed whenever the Gather[Merge] node is rescanned.

This solution breaks an undocumented assumption made by the parallel executor logic, namely that all rescans of nodes below a Gather[Merge] will happen synchronously during the ReScan of the top node itself. But that's fundamentally contrary to the design of the ExecReScan code, and so was doomed to fail someday anyway (even if you want to argue that the bug being fixed here wasn't a failure of that assumption). A follow-on patch will address that issue. In the meantime, the worst that's expected to happen is that given very bad timing luck, the leader might have to do all the work during a rescan, because workers think they have nothing to do, if they are able to start up before the eventual ReScan of the leader's parallel-aware table scan node has reset the shared scan state.

Although this problem exists in 9.6, there does not seem to be any way for it to manifest there. Without GatherMerge, it seems that a plan tree that has a rescan-short-circuiting node below Gather will always also have one above it that will short-circuit in the same cases, preventing the Gather from being rescanned. Hence we won't take the risk of back-patching this change into 9.6. But v10 needs it.

Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com

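A sketch of the rescan-side half of that mechanism (structure and field names are illustrative; the real function also shuts down workers and resets shared parallel state):

    void
    ExecReScanGather(GatherState *node)
    {
        Gather     *gather = (Gather *) node->ps.plan;
        PlanState  *outerPlan = outerPlanState(node);

        /* Flag the dummy Param as changed, so parallel-aware nodes below
         * us see a changed-parameter signal and really rescan instead of
         * short-circuiting. */
        outerPlan->chgParam = bms_add_member(outerPlan->chgParam,
                                             gather->rescan_param);
        /* ... shut down workers, reset shared scan state, etc. ... */
    }
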
* Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition bounds. (Dean Rasheed, 2017-07-21)

Previously, UNBOUNDED meant no lower bound when used in the FROM list, and no upper bound when used in the TO list, which was OK for single-column range partitioning, but problematic with multiple columns. For example, an upper bound of (10.0, UNBOUNDED) would not be collocated with a lower bound of (10.0, UNBOUNDED), thus making it difficult or impossible to define contiguous multi-column range partitions in some cases.

Fix this by using MINVALUE and MAXVALUE instead of UNBOUNDED to represent a partition column that is unbounded below or above respectively. This syntax removes any ambiguity, and ensures that if one partition's lower bound equals another partition's upper bound, then the partitions are contiguous.

Also drop the constraint prohibiting finite values after an unbounded column, and just document the fact that any values after MINVALUE or MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED multiple times, which was needlessly verbose.

Note: Forces a post-PG 10 beta2 initdb.

Report by Amul Sul, original patch by Amit Langote with some additional hacking by me.

Discussion: https://postgr.es/m/CAAJ_b947mowpLdxL3jo3YLKngRjrq9+Ej4ymduQTfYR+8=YAYQ@mail.gmail.com

* Code review for NextValueExpr expression node type. (Tom Lane, 2017-07-14)

Add missing infrastructure for this node type, notably in ruleutils.c where its lack could demonstrably cause EXPLAIN to fail. Add outfuncs/readfuncs support. (outfuncs support is useful today for debugging purposes. The readfuncs support may never be needed, since at present it would only matter for parallel query and NextValueExpr should never appear in a parallelizable query; but it seems like a bad idea to have a primnode type that isn't fully supported here.)

Teach planner infrastructure that NextValueExpr is a volatile, parallel-unsafe, non-leaky expression node with cost cpu_operator_cost. Given its limited scope of usage, there *might* be no live bug today from the lack of that knowledge, but it's certainly going to bite us on the rear someday.

Teach pg_stat_statements about the new node type, too.

While at it, also teach cost_qual_eval() that MinMaxExpr, SQLValueFunction, XmlExpr, and CoerceToDomain should be charged as cpu_operator_cost. Failing to do this for SQLValueFunction was an oversight in my commit 0bb51aa96. The others are longer-standing oversights, but no time like the present to fix them. (In principle, CoerceToDomain could have cost much higher than this, but it doesn't presently seem worth trying to examine the domain's constraints here.)

Modify execExprInterp.c to execute NextValueExpr as an out-of-line function; it seems quite unlikely to me that it's worth insisting that it be inlined in all expression eval methods. Besides, providing the out-of-line function doesn't stop anyone from inlining if they want to.

Adjust some places where NextValueExpr support had been inserted with the aid of a dartboard rather than keeping it in the same order as elsewhere.

Discussion: https://postgr.es/m/23862.1499981661@sss.pgh.pa.us

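The outfuncs half of that support is tiny; a sketch using the node's two fields:

    static void
    _outNextValueExpr(StringInfo str, const NextValueExpr *node)
    {
        WRITE_NODE_TYPE("NEXTVALUEEXPR");

        WRITE_OID_FIELD(seqid);     /* OID of the sequence to advance */
        WRITE_OID_FIELD(typeId);    /* result type of the expression */
    }
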
* Fix output of char node fields. (Peter Eisentraut, 2017-07-05)

WRITE_CHAR_FIELD() didn't do any escaping, so that for example a zero byte would cause the whole output string to be truncated. To fix, pass the char through outToken(), so it is escaped like a string. Adjust the reading side to handle this.

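A minimal sketch of the fix, assuming outToken()'s usual behavior of escaping special characters and emitting "<>" for an empty string:

    /* Write a char field as an escaped one-character token; a '\0' char
     * then round-trips as an empty token instead of truncating the rest
     * of the output. */
    static void
    outChar(StringInfo str, char c)
    {
        char        in[2];

        in[0] = c;
        in[1] = '\0';
        outToken(str, in);
    }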