aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor
Commit message (Collapse)AuthorAge
* ExecHashRemoveNextSkewBucket must physically copy tuples to main hashtable.Tom Lane2016-02-07
| | | | | | | | | | | | | | | | | | | | | | Commit 45f6240a8fa9d355 added an assumption in ExecHashIncreaseNumBatches and ExecHashIncreaseNumBuckets that they could find all tuples in the main hash table by iterating over the "dense storage" introduced by that patch. However, ExecHashRemoveNextSkewBucket continued its old practice of simply re-linking deleted skew tuples into the main table's hashchains. Hence, such tuples got lost during any subsequent increase in nbatch or nbuckets, and would never get joined, as reported in bug #13908 from Seth P. I (tgl) think that the aforesaid commit has got multiple design issues and should be reworked rather completely; but there is no time for that right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket physically copy deleted skew tuples into the "dense storage" arena. The added test case is able to exhibit the problem by means of fooling the planner with a WHERE condition that it will underestimate the selectivity of, causing the initial nbatch estimate to be too small. Tomas Vondra and Tom Lane. Thanks to David Johnston for initial investigation into the bug report.
* Improve HJDEBUG code a bit.Tom Lane2016-02-06
| | | | | | | | | | | | | | | Commit 30d7ae3c76d2de144232ae6ab328ca86b70e72c3 introduced an HJDEBUG stanza that probably didn't compile at the time, and definitely doesn't compile now, because it refers to a nonexistent variable. It doesn't seem terribly useful anyway, so just get rid of it. While I'm fooling with it, use %z modifier instead of the obsolete hack of casting size_t to unsigned long, and include the HashJoinTable's address in each printout so that it's possible to distinguish the activities of multiple hashjoins occurring in one query. Noted while trying to use HJDEBUG to investigate bug #13908. Back-patch to 9.5, because code that doesn't compile is certainly not very helpful.
* When modifying a foreign table, initialize tableoid field properly.Robert Haas2016-02-04
| | | | | | | Failure to do this can cause AFTER ROW triggers or RETURNING expressions that reference this field to misbehave. Etsuro Fujita, reviewed by Thom Brown
* Improve some messagesPeter Eisentraut2015-12-10
|
* Fix ON CONFLICT UPDATE bug breaking AFTER UPDATE triggers.Andres Freund2015-12-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ExecOnConflictUpdate() passed t_ctid of the to-be-updated tuple to ExecUpdate(). That's problematic primarily because of two reason: First and foremost t_ctid could point to a different tuple. Secondly, and that's what triggered the complaint by Stanislav, t_ctid is changed by heap_update() to point to the new tuple version. The behavior of AFTER UPDATE triggers was therefore broken, with NEW.* and OLD.* tuples spuriously identical within AFTER UPDATE triggers. To fix both issues, pass a pointer to t_self of a on-stack HeapTuple instead. Fixing this bug lead to one change in regression tests, which previously failed due to the first issue mentioned above. There's a reasonable expectation that test fails, as it updates one row repeatedly within one INSERT ... ON CONFLICT statement. That is only possible if the second update is triggered via ON CONFLICT ... SET, ON CONFLICT ... WHERE, or by a WITH CHECK expression, as those are executed after ExecOnConflictUpdate() does a visibility check. That could easily be prohibited, but given it's allowed for plain UPDATEs and a rare corner case, it doesn't seem worthwhile. Reported-By: Stanislav Grozev Author: Andres Freund and Peter Geoghegan Discussion: CAA78GVqy1+LisN-8DygekD_Ldfy=BJLarSpjGhytOsgkpMavfQ@mail.gmail.com Backpatch: 9.5, where ON CONFLICT was introduced
* Allow foreign and custom joins to handle EvalPlanQual rechecks.Robert Haas2015-12-08
| | | | | | | | | | | | | | | | | | | | | | | Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 provided basic infrastructure for allowing a foreign data wrapper or custom scan provider to replace a join of one or more tables with a scan. However, this infrastructure failed to take into account the need for possible EvalPlanQual rechecks, and ExecScanFetch would fail an assertion (or just overwrite memory) if such a check was attempted for a plan containing a pushed-down join. To fix, adjust the EPQ machinery to skip some processing steps when scanrelid == 0, making those the responsibility of scan's recheck method, which also has the responsibility in this case of correctly populating the relevant slot. To allow foreign scans to gain control in the right place to make use of this new facility, add a new, optional RecheckForeignScan method. Also, allow a foreign scan to have a child plan, which can be used to correctly populate the slot (or perhaps for something else, but this is the only use currently envisioned). KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro Horiguchi.
* Message style improvementsPeter Eisentraut2015-10-28
| | | | | Message style, plurals, quoting, spelling, consistency with similar messages
* Allow FDWs to push down quals without breaking EvalPlanQual rechecks.Robert Haas2015-10-15
| | | | | | | | | | | | | | | | | This fixes a long-standing bug which was discovered while investigating the interaction between the new join pushdown code and the EvalPlanQual machinery: if a ForeignScan appears on the inner side of a paramaterized nestloop, an EPQ recheck would re-return the original tuple even if it no longer satisfied the pushed-down quals due to changed parameter values. This fix adds a new member to ForeignScan and ForeignScanState and a new argument to make_foreignscan, and requires changes to FDWs which push down quals to populate that new argument with a list of quals they have chosen to push down. Therefore, I'm only back-patching to 9.5, even though the bug is not new in 9.5. Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
* Improve INSERT .. ON CONFLICT error message.Robert Haas2015-10-13
| | | | Peter Geoghegan, reviewed by me.
* Further twiddling of nodeHash.c hashtable sizing calculation.Tom Lane2015-10-04
| | | | | | | | | | | On reflection, the submitted patch didn't really work to prevent the request size from exceeding MaxAllocSize, because of the fact that we'd happily round nbuckets up to the next power of 2 after we'd limited it to max_pointers. The simplest way to enforce the limit correctly is to round max_pointers down to a power of 2 when it isn't one already. (Note that the constraint to INT_MAX / 2, if it were doing anything useful at all, is properly applied after that.)
* Fix some issues in new hashtable size calculations in nodeHash.c.Tom Lane2015-10-04
| | | | | | | | | | | | | | | | Limit the size of the hashtable pointer array to not more than MaxAllocSize, per reports from Kouhei Kaigai and others of "invalid memory alloc request size" failures. There was discussion of allowing the array to get larger than that by using the "huge" palloc API, but so far no proof that that is actually a good idea, and at this point in the 9.5 cycle major changes from old behavior don't seem like the way to go. Fix a rather serious secondary bug in the new code, which was that it didn't ensure nbuckets remained a power of 2 when recomputing it for the multiple-batch case. Clean up sloppy division of labor between ExecHashIncreaseNumBuckets and its sole call site.
* Fix ON CONFLICT DO UPDATE for tables with oids.Andres Freund2015-09-28
| | | | | | | | | | When taking the UPDATE path in an INSERT .. ON CONFLICT .. UPDATE tables with oids were not supported. The tuple generated by the update target list was projected without space for an oid - a simple oversight. Reported-By: Peter Geoghegan Author: Andres Freund Backpatch: 9.5, where ON CONFLICT was introduced
* RLS refactoringStephen Frost2015-09-15
| | | | | | | | | | | | | | | | This refactors rewrite/rowsecurity.c to simplify the handling of the default deny case (reducing the number of places where we check for and add the default deny policy from three to one) by splitting up the retrival of the policies from the application of them. This also allowed us to do away with the policy_id field. A policy_name field was added for WithCheckOption policies and is used in error reporting, when available. Patch by Dean Rasheed, with various mostly cosmetic changes by me. Back-patch to 9.5 where RLS was introduced to avoid unnecessary differences, since we're still in alpha, per discussion with Robert.
* Avoid O(N^2) behavior when enlarging SPI tuple table in spi_printtup().Tom Lane2015-08-21
| | | | | | | | | | | | | For no obvious reason, spi_printtup() was coded to enlarge the tuple pointer table by just 256 slots at a time, rather than doubling the size at each reallocation, as is our usual habit. For very large SPI results, this makes for O(N^2) time spent in repalloc(), which of course soon comes to dominate the runtime. Use the standard doubling approach instead. This is a longstanding performance bug, so back-patch to all active branches. Neil Conway
* Fix bug in calculations of hash join buckets.Kevin Grittner2015-08-19
| | | | | | | | | | | | | | Commit 8cce08f168481c5fc5be4e7e29b968e314f1b41e used a left-shift on a literal of 1 that could (in large allocations) be shifted by 31 or more bits. This was assigned to a local variable that was already declared to be a long to protect against overruns of int, but the literal in this shift needs to be declared long to allow it to work correctly in some compilers. Backpatch to 9.5, where the bug was introduced. Report and patch by KaiGai Kohei, slighly modified based on discussion.
* Correct type of waitMode variable in ExecInsertIndexTuples().Andres Freund2015-08-15
| | | | | | | | | It was a bool, even though it should be CEOUC_WAIT_MODE. That's unlikely to have a negative effect with the current definition of bool (char), but it's definitely wrong. Discussion: 20150812084351.GD8470@awork2.anarazel.de Backpatch: 9.5, where ON CONFLICT was merged
* Fix a number of places that produced XX000 errors in the regression tests.Tom Lane2015-08-02
| | | | | | | | | | | | | | | | | | | | It's against project policy to use elog() for user-facing errors, or to omit an errcode() selection for errors that aren't supposed to be "can't happen" cases. Fix all the violations of this policy that result in ERRCODE_INTERNAL_ERROR log entries during the standard regression tests, as errors that can reliably be triggered from SQL surely should be considered user-facing. I also looked through all the files touched by this commit and fixed other nearby problems of the same ilk. I do not claim to have fixed all violations of the policy, just the ones in these files. In a few places I also changed existing ERRCODE choices that didn't seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR by something more specific. Back-patch to 9.5, but no further; changing ERRCODE assignments in stable branches doesn't seem like a good idea.
* Avoid some zero-divide hazards in the planner.Tom Lane2015-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | Although I think on all modern machines floating division by zero results in Infinity not SIGFPE, we still don't want infinities running around in the planner's costing estimates; too much risk of that leading to insane behavior. grouping_planner() failed to consider the possibility that final_rel might be known dummy and hence have zero rowcount. (I wonder if it would be better to set a rows estimate of 1 for dummy relations? But at least in the back branches, changing this convention seems like a bad idea, so I'll leave that for another day.) Make certain that get_variable_numdistinct() produces a nonzero result. The case that can be shown to be broken is with stadistinct < 0.0 and small ntuples; we did not prevent the result from rounding to zero. For good luck I applied clamp_row_est() to all the nonconstant return values. In ExecChooseHashTableSize(), Assert that we compute positive nbuckets and nbatch. I know of no reason to think this isn't the case, but it seems like a good safety check. Per reports from Piotr Stefaniak. Back-patch to all active branches.
* Plug RLS related information leak in pg_stats view.Joe Conway2015-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The pg_stats view is supposed to be restricted to only show rows about tables the user can read. However, it sometimes can leak information which could not otherwise be seen when row level security is enabled. Fix that by not showing pg_stats rows to users that would be subject to RLS on the table the row is related to. This is done by creating/using the newly introduced SQL visible function, row_security_active(). Along the way, clean up three call sites of check_enable_rls(). The second argument of that function should only be specified as other than InvalidOid when we are checking as a different user than the current one, as in when querying through a view. These sites were passing GetUserId() instead of InvalidOid, which can cause the function to return incorrect results if the current user has the BYPASSRLS privilege and row_security has been set to OFF. Additionally fix a bug causing RI Trigger error messages to unintentionally leak information when RLS is enabled, and other minor cleanup and improvements. Also add WITH (security_barrier) to the definition of pg_stats. Bumped CATVERSION due to new SQL functions and pg_stats view definition. Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav. Patch by Joe Conway and Dean Rasheed with review and input by Michael Paquier and Stephen Frost.
* Remove false comment about speculative insertion.Heikki Linnakangas2015-07-27
| | | | | | | | There is no full discussion of speculative insertions in the executor README. There is a high-level explanation in execIndexing.c, but it doesn't seem necessary to refer it from here. Peter Geoghegan
* Redesign tablesample method API, and do extensive code review.Tom Lane2015-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original implementation of TABLESAMPLE modeled the tablesample method API on index access methods, which wasn't a good choice because, without specialized DDL commands, there's no way to build an extension that can implement a TSM. (Raw inserts into system catalogs are not an acceptable thing to do, because we can't undo them during DROP EXTENSION, nor will pg_upgrade behave sanely.) Instead adopt an API more like procedural language handlers or foreign data wrappers, wherein the only SQL-level support object needed is a single handler function identified by having a special return type. This lets us get rid of the supporting catalog altogether, so that no custom DDL support is needed for the feature. Adjust the API so that it can support non-constant tablesample arguments (the original coding assumed we could evaluate the argument expressions at ExecInitSampleScan time, which is undesirable even if it weren't outright unsafe), and discourage sampling methods from looking at invisible tuples. Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable within and across queries, as required by the SQL standard, and deal more honestly with methods that can't support that requirement. Make a full code-review pass over the tablesample additions, and fix assorted bugs, omissions, infelicities, and cosmetic issues (such as failure to put the added code stanzas in a consistent ordering). Improve EXPLAIN's output of tablesample plans, too. Back-patch to 9.5 so that we don't have to support the original API in production.
* Fix rescan of IndexScan node with the new lossy GiST distance functions.Heikki Linnakangas2015-05-25
| | | | | | Must reset the "reached end" flag and reorder queue at rescan. Per report from Regina Obe, bug #13349
* Manual cleanup of pgindent results.Tom Lane2015-05-24
| | | | | | Fix some places where pgindent did silly stuff, often because project style wasn't followed to begin with. (I've not touched the atomics headers, though.)
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Add error check for lossy distance functions in index-only scans.Tom Lane2015-05-23
| | | | | Maybe we should actually support this, but for the moment let's just throw an error if the opclass tries it.
* Still more fixes for lossy-GiST-distance-functions patch.Tom Lane2015-05-23
| | | | | | Fix confusion in documentation, substantial memory leakage if float8 or float4 are pass-by-reference, and assorted comments that were obsoleted by commit 98edd617f3b62a02cb2df9b418fcc4ece45c7ec0.
* More fixes for lossy-GiST-distance-functions patch.Tom Lane2015-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Paul Ramsey reported that commit 35fcb1b3d038a501f3f4c87c05630095abaaadab induced a core dump on commuted ORDER BY expressions, because it was assuming that the indexorderby expression could be found verbatim in the relevant equivalence class, but it wasn't there. We really don't need anything that complicated anyway; for the data types likely to be used for index ORDER BY operators in the foreseeable future, the exprType() of the ORDER BY expression will serve fine. (The case where we'd have to work harder is where the ORDER BY expression's result is only binary-compatible with the declared input type of the ordering operator; long before worrying about that, one would need to get rid of GiST's hard-wired assumption that said datatype is float8.) Aside from fixing that crash and adding a regression test for the case, I did some desultory code review: nodeIndexscan.c was likewise overthinking how hard it ought to work to identify the datatype of the ORDER BY expressions. Add comments explaining how come nodeIndexscan.c can get away with simplifying assumptions about NULLS LAST ordering and no backward scan. Revert no-longer-needed changes of find_ec_member_for_tle(); while the new definition was no worse than the old, it wasn't better either, and it might cause back-patching pain. Revert entirely bogus additions to genam.h.
* Collection of typo fixes.Heikki Linnakangas2015-05-20
| | | | | | | | | | | | | | | Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
* Attach ON CONFLICT SET ... WHERE to the correct planstate.Andres Freund2015-05-19
| | | | | | | | | | The previous coding was a leftover from attempting to hang all the on conflict logic onto modify table's child nodes. It appears to not have actually caused problems except for explain. Add test exercising the broken and some other code paths. Author: Peter Geoghegan and Andres Freund
* Fix typo in comment.Heikki Linnakangas2015-05-18
| | | | Jim Nasby
* Fix failure to copy IndexScan.indexorderbyops in copyfuncs.c.Tom Lane2015-05-17
| | | | | | | | | | | | | | | This oversight results in a crash at executor startup if the plan has been copied. outfuncs.c was missed as well. While we could probably have taught both those files to cope with the originally chosen representation of an Oid array, it would have been painful, not least because there'd be no easy way to verify the array length. An Oid List is far easier to work with. And AFAICS, there is no particular notational benefit to using an array rather than a list in the existing parts of the patch either. So just change it to a list. Error in commit 35fcb1b3d038a501f3f4c87c05630095abaaadab, which is new, so no need for back-patch.
* Support GROUPING SETS, CUBE and ROLLUP.Andres Freund2015-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
* TABLESAMPLE, SQL Standard and extensibleSimon Riggs2015-05-15
| | | | | | | | | | | | | | Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use cases for the sampling API follow in later commits. Petr Jelinek Reviewed by Michael Paquier and Simon Riggs
* Fix datatype confusion with the new lossy GiST distance functions.Heikki Linnakangas2015-05-15
| | | | | | | | | | | | | | | | | | | | | | | | We can only support a lossy distance function when the distance function's datatype is comparable with the original ordering operator's datatype. The distance function always returns a float8, so we are limited to float8, and float4 (by a hard-coded cast of the float8 to float4). In light of this limitation, it seems like a good idea to have a separate 'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy distance function, it still works with lossy quals. There are cases like that with the build-in or contrib opclasses, but it's plausible. There was a hidden assumption that the ORDER BY values returned by GiST match the original ordering operator's return type, but there are plenty of examples where that's not true, e.g. in btree_gist and pg_trgm. As long as the distance function is not lossy, we can tolerate that and just not return the distance to the executor (or rather, always return NULL). The executor doesn't need the distances if there are no lossy results. There was another little bug: the recheck variable was not initialized before calling the distance function. That revealed the bigger issue, as the executor tried to reorder tuples that didn't need reordering, and that failed because of the datatype mismatch.
* Allow GiST distance function to return merely a lower-bound.Heikki Linnakangas2015-05-15
| | | | | | | | | | | The distance function can now set *recheck = false, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly. This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box. Alexander Korotkov and me
* Support "expanded" objects, particularly arrays, for better performance.Tom Lane2015-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array. The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well. In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.) Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win. Tom Lane, reviewed by Andres Freund
* Extend abbreviated key infrastructure to datum tuplesorts.Robert Haas2015-05-13
| | | | Andrew Gierth, reviewed by Peter Geoghegan and by me.
* Fix postgres_fdw to return the right ctid value in EvalPlanQual cases.Tom Lane2015-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a postgres_fdw foreign table is a non-locked source relation in an UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its ctid column, the wrong value would be returned if an EvalPlanQual recheck occurred. This happened because the foreign table's result row was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks just unconditionally set the reconstructed tuple's t_self to "invalid". To fix that, we can have EvalPlanQualFetchRowMarks copy the composite datum's t_ctid field, and be sure to initialize that along with t_self when postgres_fdw constructs a tuple to return. If we just did that much then EvalPlanQualFetchRowMarks would start returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps does not matter much, but then again maybe it might. The cause of that is that heap_form_tuple, which is the ultimate source of all composite datums, simply leaves t_ctid as zeroes in newly constructed tuples. That seems like a bad idea on general principles: a field that's really not been initialized shouldn't appear to have a valid value. So let's eat the trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))" in heap_form_tuple. This closes out our handling of Etsuro Fujita's report that tableoid and ctid weren't correctly set in postgres_fdw EvalPlanQual cases. Along the way we did a great deal of work to improve FDWs' ability to control row locking behavior; which was not wasted effort by any means, but it didn't end up being a fix for this problem because that feature would be too expensive for postgres_fdw to use all the time. Although the fix for the tableoid misbehavior was back-patched, I'm hesitant to do so here; it seems far less likely that people would care about remote ctid than tableoid, and even such a minor behavioral change as this in heap_form_tuple is perhaps best not back-patched. So commit to HEAD only, at least for the moment. Etsuro Fujita, with some adjustments by me
* Fix ON CONFLICT bugs that manifest when used in rules.Andres Freund2015-05-13
| | | | | | | | | | | | | | Specifically the tlist and rti of the pseudo "excluded" relation weren't properly treated by expression_tree_walker, which lead to errors when excluded was referenced inside a rule because the varnos where not properly adjusted. Similar omissions in OffsetVarNodes and expression_tree_mutator had less impact, but should obviously be fixed nonetheless. A couple tests of for ON CONFLICT UPDATE into INSERT rule bearing relations have been added. In passing I updated a couple comments.
* Add support for doing late row locking in FDWs.Tom Lane2015-05-12
| | | | | | | | | | | | | | | | | | | | | Previously, FDWs could only do "early row locking", that is lock a row as soon as it's fetched, even though local restriction/join conditions might discard the row later. This patch adds callbacks that allow FDWs to do late locking in the same way that it's done for regular tables. To make use of this feature, an FDW must support the "ctid" column as a unique row identifier. Currently, since ctid has to be of type TID, the feature is of limited use, though in principle it could be used by postgres_fdw. We may eventually allow FDWs to specify another data type for ctid, which would make it possible for more FDWs to use this feature. This commit does not modify postgres_fdw to use late locking. We've tested some prototype code for that, but it's not in committable shape, and besides it's quite unclear whether it actually makes sense to do late locking against a remote server. The extra round trips required are likely to outweigh any benefit from improved concurrency. Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
* Code review for foreign/custom join pushdown patch.Tom Lane2015-05-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design decisions that seem pretty questionable to me, and there was quite a lot of stuff not to like about the documentation and comments. Clean up as follows: * Consider foreign joins only between foreign tables on the same server, rather than between any two foreign tables with the same underlying FDW handler function. In most if not all cases, the FDW would simply have had to apply the same-server restriction itself (far more expensively, both for lack of caching and because it would be repeated for each combination of input sub-joins), or else risk nasty bugs. Anyone who's really intent on doing something outside this restriction can always use the set_join_pathlist_hook. * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist to better reflect what they're for, and allow these custom scan tlists to be used even for base relations. * Change make_foreignscan() API to include passing the fdw_scan_tlist value, since the FDW is required to set that. Backwards compatibility doesn't seem like an adequate reason to expect FDWs to set it in some ad-hoc extra step, and anyway existing FDWs can just pass NIL. * Change the API of path-generating subroutines of add_paths_to_joinrel, and in particular that of GetForeignJoinPaths and set_join_pathlist_hook, so that various less-used parameters are passed in a struct rather than as separate parameter-list entries. The objective here is to reduce the probability that future additions to those parameter lists will result in source-level API breaks for users of these hooks. It's possible that this is even a small win for the core code, since most CPU architectures can't pass more than half a dozen parameters efficiently anyway. I kept root, joinrel, outerrel, innerrel, and jointype as separate parameters to reduce code churn in joinpath.c --- in particular, putting jointype into the struct would have been problematic because of the subroutines' habit of changing their local copies of that variable. * Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all right for it to know about IndexOnlyScan, but if the list is to grow we should refactor the knowledge out to the callers. * Restore nodeForeignscan.c's previous use of the relcache to avoid extra GetFdwRoutine lookups for base-relation scans. * Lots of cleanup of documentation and missed comments. Re-order some code additions into more logical places.
* Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.Andres Freund2015-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.
* Represent columns requiring insert and update privileges indentently.Andres Freund2015-05-08
| | | | | | | | | | | | | | | | | | | Previously, relation range table entries used a single Bitmapset field representing which columns required either UPDATE or INSERT privileges, despite the fact that INSERT and UPDATE privileges are separately cataloged, and may be independently held. As statements so far required either insert or update privileges but never both, that was sufficient. The required permission could be inferred from the top level statement run. The upcoming INSERT ... ON CONFLICT UPDATE feature needs to independently check for both privileges in one statement though, so that is not sufficient anymore. Bumps catversion as stored rules change. Author: Peter Geoghegan Reviewed-By: Andres Freund
* Use outerPlanState macro instead of referring to leffttree.Robert Haas2015-05-04
| | | | | | | This makes the executor code more consistent. It also removes an apparently superfluous NULL test in nodeGroup.c. Qingqing Zhou, reviewed by Tom Lane, and further revised by me.
* Allow FDWs and custom scan providers to replace joins with scans.Robert Haas2015-05-01
| | | | | | | | | | | | | | | | | Foreign data wrappers can use this capability for so-called "join pushdown"; that is, instead of executing two separate foreign scans and then joining the results locally, they can generate a path which performs the join on the remote server and then is scanned locally. This commit does not extend postgres_fdw to take advantage of this capability; it just provides the infrastructure. Custom scan providers can use this in a similar way. Previously, it was only possible for a custom scan provider to scan a single relation. Now, it can scan an entire join tree, provided of course that it knows how to produce the same results that the join would have produced if executed normally. KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
* Create an infrastructure for parallel computation in PostgreSQL.Robert Haas2015-04-30
| | | | | | | | | | | | | | | | | This does four basic things. First, it provides convenience routines to coordinate the startup and shutdown of parallel workers. Second, it synchronizes various pieces of state (e.g. GUCs, combo CID mappings, transaction snapshot) from the parallel group leader to the worker processes. Third, it prohibits various operations that would result in unsafe changes to that state while parallelism is active. Finally, it propagates events that would result in an ErrorResponse, NoticeResponse, or NotifyResponse message being sent to the client from the parallel workers back to the master, from which they can then be sent on to the client. Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke. Suggestions and review from Andres Freund, Heikki Linnakangas, Noah Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
* Fix various typos and grammar errors in comments.Andres Freund2015-04-26
| | | | | Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru
* Perform RLS WITH CHECK before constraints, etcStephen Frost2015-04-24
| | | | | | | | | | | | | | | | | | | | | | | The RLS capability is built on top of the WITH CHECK OPTION system which was added for auto-updatable views, however, unlike WCOs on views (which are mandated by the SQL spec to not fire until after all other constraints and checks are done), it makes much more sense for RLS checks to happen earlier than constraint and uniqueness checks. This patch reworks the structure which holds the WCOs a bit to be explicitly either VIEW or RLS checks and the RLS-related checks are done prior to the constraint and uniqueness checks. This also allows better error reporting as we are now reporting when a violation is due to a WITH CHECK OPTION and when it's due to an RLS policy violation, which was independently noted by Craig Ringer as being confusing. The documentation is also updated to include a paragraph about when RLS WITH CHECK handling is performed, as there have been a number of questions regarding that and the documentation was previously silent on the matter. Author: Dean Rasheed, with some kabitzing and comment changes by me.
* Add comments explaining how unique and exclusion constraints are enforced.Heikki Linnakangas2015-04-24
|
* Move functions related to index maintenance to separate source file.Heikki Linnakangas2015-04-24
| | | | | There is enough code here to deserve a file of their own, not be buried in the middle of execUtils.c.