path: root/src/backend/executor
Commit message (Author, Age)
...
* Avoid O(N^2) behavior when enlarging SPI tuple table in spi_printtup(). (Tom Lane, 2015-08-21)
  For no obvious reason, spi_printtup() was coded to enlarge the tuple pointer table by just 256 slots at a time, rather than doubling the size at each reallocation, as is our usual habit. For very large SPI results, this makes for O(N^2) time spent in repalloc(), which of course soon comes to dominate the runtime. Use the standard doubling approach instead.
  This is a longstanding performance bug, so back-patch to all active branches.
  Neil Conway
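The doubling approach described above can be sketched in plain C. This is only an illustration under stated assumptions: `PtrTable`, `ptr_table_append`, and `count_reallocs` are hypothetical names, standard `realloc()` stands in for `repalloc()`, and the initial size of 128 slots is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tuple pointer table; not PostgreSQL's real struct. */
typedef struct PtrTable
{
    void  **vals;       /* array of stored pointers */
    size_t  used;       /* slots currently in use */
    size_t  alloced;    /* slots currently allocated */
    size_t  nreallocs;  /* how many times the array was enlarged */
} PtrTable;

/* Append one pointer, doubling the capacity whenever the table is full.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int
ptr_table_append(PtrTable *t, void *p)
{
    if (t->used == t->alloced)
    {
        /* Doubling keeps the total copying cost linear in the final size. */
        size_t  newsize = (t->alloced == 0) ? 128 : t->alloced * 2;
        void  **newvals = realloc(t->vals, newsize * sizeof(void *));

        if (newvals == NULL)
            return -1;
        t->vals = newvals;
        t->alloced = newsize;
        t->nreallocs++;
    }
    t->vals[t->used++] = p;
    return 0;
}

/* Append n entries and report how many enlargements were needed (-1 on failure). */
static long
count_reallocs(size_t n)
{
    PtrTable t = {NULL, 0, 0, 0};
    long     result;

    for (size_t i = 0; i < n; i++)
    {
        if (ptr_table_append(&t, NULL) != 0)
        {
            free(t.vals);
            return -1;
        }
    }
    assert(t.used == n);
    result = (long) t.nreallocs;
    free(t.vals);
    return result;
}
```

With this sketch, one million appends need 14 enlargements, whereas growing by a fixed 256 slots would need several thousand, each of which may copy the whole array.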
* Fix bug in calculations of hash join buckets. (Kevin Grittner, 2015-08-19)
  Commit 8cce08f168481c5fc5be4e7e29b968e314f1b41e used a left-shift on a literal of 1 that could (in large allocations) be shifted by 31 or more bits. This was assigned to a local variable that was already declared to be a long to protect against overruns of int, but the literal in this shift needs to be declared long to allow it to work correctly in some compilers.
  Backpatch to 9.5, where the bug was introduced.
  Report and patch by KaiGai Kohei, slightly modified based on discussion.
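The pitfall described above is that the type of a shift expression comes from its left operand, not from the variable it is assigned to. A minimal sketch follows; `pow2_long` is a hypothetical helper, not the code touched by the commit.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: return 2^log2_n as a long. */
static long
pow2_long(int log2_n)
{
    /* The shift count must stay below the width of long minus the sign bit. */
    assert(log2_n >= 0 && log2_n < (int) (sizeof(long) * CHAR_BIT) - 1);

    /*
     * Writing (1 << log2_n) would shift a plain int, which is undefined for
     * counts of 31 or more when int is 32 bits wide, even if the result is
     * stored in a long.  The 1L literal makes the shift itself happen in long.
     */
    return 1L << log2_n;
}
```

On platforms where long is itself only 32 bits, the same limit applies to `1L`, so a wider type would be needed there.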
* Correct type of waitMode variable in ExecInsertIndexTuples(). (Andres Freund, 2015-08-15)
  It was a bool, even though it should be CEOUC_WAIT_MODE. That's unlikely to have a negative effect with the current definition of bool (char), but it's definitely wrong.
  Discussion: 20150812084351.GD8470@awork2.anarazel.de
  Backpatch: 9.5, where ON CONFLICT was merged
* Fix a number of places that produced XX000 errors in the regression tests. (Tom Lane, 2015-08-02)
  It's against project policy to use elog() for user-facing errors, or to omit an errcode() selection for errors that aren't supposed to be "can't happen" cases. Fix all the violations of this policy that result in ERRCODE_INTERNAL_ERROR log entries during the standard regression tests, as errors that can reliably be triggered from SQL surely should be considered user-facing.
  I also looked through all the files touched by this commit and fixed other nearby problems of the same ilk. I do not claim to have fixed all violations of the policy, just the ones in these files.
  In a few places I also changed existing ERRCODE choices that didn't seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR by something more specific.
  Back-patch to 9.5, but no further; changing ERRCODE assignments in stable branches doesn't seem like a good idea.
* Avoid some zero-divide hazards in the planner. (Tom Lane, 2015-07-30)
  Although I think on all modern machines floating division by zero results in Infinity not SIGFPE, we still don't want infinities running around in the planner's costing estimates; too much risk of that leading to insane behavior.
  grouping_planner() failed to consider the possibility that final_rel might be known dummy and hence have zero rowcount. (I wonder if it would be better to set a rows estimate of 1 for dummy relations? But at least in the back branches, changing this convention seems like a bad idea, so I'll leave that for another day.)
  Make certain that get_variable_numdistinct() produces a nonzero result. The case that can be shown to be broken is with stadistinct < 0.0 and small ntuples; we did not prevent the result from rounding to zero. For good luck I applied clamp_row_est() to all the nonconstant return values.
  In ExecChooseHashTableSize(), Assert that we compute positive nbuckets and nbatch. I know of no reason to think this isn't the case, but it seems like a good safety check.
  Per reports from Piotr Stefaniak. Back-patch to all active branches.
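The rounding-to-zero case mentioned above (negative stadistinct with a small tuple count) can be illustrated with a short C sketch. All function names here are hypothetical, and `clamp_estimate` is a simplified stand-in for the idea behind clamp_row_est(), not its actual definition.

```c
#include <assert.h>

/* Round a non-negative value to the nearest integer (illustrative only). */
static double
round_nonneg(double x)
{
    return (double) (long long) (x + 0.5);
}

/* Simplified stand-in for a row-estimate clamp: never return less than 1. */
static double
clamp_estimate(double n)
{
    /* Written with a negated comparison so that NaN is also clamped to 1. */
    return !(n >= 1.0) ? 1.0 : n;
}

/* A negative stadistinct is taken as a fraction of the row count. */
static double
ndistinct_unclamped(double stadistinct, double ntuples)
{
    return round_nonneg(-stadistinct * ntuples);
}

/* Same computation, but guaranteed nonzero and therefore safe as a divisor. */
static double
ndistinct_clamped(double stadistinct, double ntuples)
{
    return clamp_estimate(ndistinct_unclamped(stadistinct, ntuples));
}
```

For example, a fraction of 0.1 applied to 3 tuples rounds to 0 without the clamp and yields 1 with it, so a later division by the estimate cannot produce Infinity.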
* Plug RLS related information leak in pg_stats view. (Joe Conway, 2015-07-28)
  The pg_stats view is supposed to be restricted to only show rows about tables the user can read. However, it sometimes can leak information which could not otherwise be seen when row level security is enabled. Fix that by not showing pg_stats rows to users that would be subject to RLS on the table the row is related to. This is done by creating/using the newly introduced SQL visible function, row_security_active().
  Along the way, clean up three call sites of check_enable_rls(). The second argument of that function should only be specified as other than InvalidOid when we are checking as a different user than the current one, as in when querying through a view. These sites were passing GetUserId() instead of InvalidOid, which can cause the function to return incorrect results if the current user has the BYPASSRLS privilege and row_security has been set to OFF.
  Additionally fix a bug causing RI Trigger error messages to unintentionally leak information when RLS is enabled, and other minor cleanup and improvements. Also add WITH (security_barrier) to the definition of pg_stats.
  Bumped CATVERSION due to new SQL functions and pg_stats view definition.
  Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav. Patch by Joe Conway and Dean Rasheed with review and input by Michael Paquier and Stephen Frost.
* Remove false comment about speculative insertion. (Heikki Linnakangas, 2015-07-27)
  There is no full discussion of speculative insertions in the executor README. There is a high-level explanation in execIndexing.c, but it doesn't seem necessary to refer to it from here.
  Peter Geoghegan
* Redesign tablesample method API, and do extensive code review. (Tom Lane, 2015-07-25)
  The original implementation of TABLESAMPLE modeled the tablesample method API on index access methods, which wasn't a good choice because, without specialized DDL commands, there's no way to build an extension that can implement a TSM. (Raw inserts into system catalogs are not an acceptable thing to do, because we can't undo them during DROP EXTENSION, nor will pg_upgrade behave sanely.) Instead adopt an API more like procedural language handlers or foreign data wrappers, wherein the only SQL-level support object needed is a single handler function identified by having a special return type. This lets us get rid of the supporting catalog altogether, so that no custom DDL support is needed for the feature.
  Adjust the API so that it can support non-constant tablesample arguments (the original coding assumed we could evaluate the argument expressions at ExecInitSampleScan time, which is undesirable even if it weren't outright unsafe), and discourage sampling methods from looking at invisible tuples. Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable within and across queries, as required by the SQL standard, and deal more honestly with methods that can't support that requirement.
  Make a full code-review pass over the tablesample additions, and fix assorted bugs, omissions, infelicities, and cosmetic issues (such as failure to put the added code stanzas in a consistent ordering). Improve EXPLAIN's output of tablesample plans, too.
  Back-patch to 9.5 so that we don't have to support the original API in production.
* Fix rescan of IndexScan node with the new lossy GiST distance functions. (Heikki Linnakangas, 2015-05-25)
  Must reset the "reached end" flag and reorder queue at rescan.
  Per report from Regina Obe, bug #13349
* Manual cleanup of pgindent results. (Tom Lane, 2015-05-24)
  Fix some places where pgindent did silly stuff, often because project style wasn't followed to begin with. (I've not touched the atomics headers, though.)
* pgindent run for 9.5 (Bruce Momjian, 2015-05-23)
* Add error check for lossy distance functions in index-only scans. (Tom Lane, 2015-05-23)
  Maybe we should actually support this, but for the moment let's just throw an error if the opclass tries it.
* Still more fixes for lossy-GiST-distance-functions patch. (Tom Lane, 2015-05-23)
  Fix confusion in documentation, substantial memory leakage if float8 or float4 are pass-by-reference, and assorted comments that were obsoleted by commit 98edd617f3b62a02cb2df9b418fcc4ece45c7ec0.
* More fixes for lossy-GiST-distance-functions patch. (Tom Lane, 2015-05-21)
  Paul Ramsey reported that commit 35fcb1b3d038a501f3f4c87c05630095abaaadab induced a core dump on commuted ORDER BY expressions, because it was assuming that the indexorderby expression could be found verbatim in the relevant equivalence class, but it wasn't there. We really don't need anything that complicated anyway; for the data types likely to be used for index ORDER BY operators in the foreseeable future, the exprType() of the ORDER BY expression will serve fine. (The case where we'd have to work harder is where the ORDER BY expression's result is only binary-compatible with the declared input type of the ordering operator; long before worrying about that, one would need to get rid of GiST's hard-wired assumption that said datatype is float8.)
  Aside from fixing that crash and adding a regression test for the case, I did some desultory code review:
  nodeIndexscan.c was likewise overthinking how hard it ought to work to identify the datatype of the ORDER BY expressions.
  Add comments explaining how come nodeIndexscan.c can get away with simplifying assumptions about NULLS LAST ordering and no backward scan.
  Revert no-longer-needed changes of find_ec_member_for_tle(); while the new definition was no worse than the old, it wasn't better either, and it might cause back-patching pain.
  Revert entirely bogus additions to genam.h.
* Collection of typo fixes. (Heikki Linnakangas, 2015-05-20)
  Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep.
  Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/"through" mixup reported by Euler Taveira.
  Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
* Attach ON CONFLICT SET ... WHERE to the correct planstate. (Andres Freund, 2015-05-19)
  The previous coding was a leftover from attempting to hang all the on conflict logic onto modify table's child nodes. It appears to not have actually caused problems except for explain.
  Add test exercising the broken and some other code paths.
  Author: Peter Geoghegan and Andres Freund
* Fix typo in comment. (Heikki Linnakangas, 2015-05-18)
  Jim Nasby
* Fix failure to copy IndexScan.indexorderbyops in copyfuncs.c. (Tom Lane, 2015-05-17)
  This oversight results in a crash at executor startup if the plan has been copied. outfuncs.c was missed as well.
  While we could probably have taught both those files to cope with the originally chosen representation of an Oid array, it would have been painful, not least because there'd be no easy way to verify the array length. An Oid List is far easier to work with. And AFAICS, there is no particular notational benefit to using an array rather than a list in the existing parts of the patch either. So just change it to a list.
  Error in commit 35fcb1b3d038a501f3f4c87c05630095abaaadab, which is new, so no need for back-patch.
* Support GROUPING SETS, CUBE and ROLLUP. (Andres Freund, 2015-05-16)
  This SQL standard functionality allows aggregating data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL.
  This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data.
  The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data.
  The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation.
  Instead of, as in earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg nodes to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows avoiding the sorting step if the underlying data is sorted by other means.
  A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted.
  Needs a catversion bump because stored rules may change.
  Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
  Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
  Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
* TABLESAMPLE, SQL Standard and extensible (Simon Riggs, 2015-05-15)
  Add a TABLESAMPLE clause to SELECT statements that allows the user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows the SQL standard exactly. Usable concrete use cases for the sampling API follow in later commits.
  Petr Jelinek
  Reviewed by Michael Paquier and Simon Riggs
* Fix datatype confusion with the new lossy GiST distance functions. (Heikki Linnakangas, 2015-05-15)
  We can only support a lossy distance function when the distance function's datatype is comparable with the original ordering operator's datatype. The distance function always returns a float8, so we are limited to float8, and float4 (by a hard-coded cast of the float8 to float4).
  In light of this limitation, it seems like a good idea to have a separate 'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy distance function, it still works with lossy quals. There are no cases like that with the built-in or contrib opclasses, but it's plausible.
  There was a hidden assumption that the ORDER BY values returned by GiST match the original ordering operator's return type, but there are plenty of examples where that's not true, e.g. in btree_gist and pg_trgm. As long as the distance function is not lossy, we can tolerate that and just not return the distance to the executor (or rather, always return NULL). The executor doesn't need the distances if there are no lossy results.
  There was another little bug: the recheck variable was not initialized before calling the distance function. That revealed the bigger issue, as the executor tried to reorder tuples that didn't need reordering, and that failed because of the datatype mismatch.
* Allow GiST distance function to return merely a lower-bound. (Heikki Linnakangas, 2015-05-15)
  The distance function can now set *recheck = true, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly.
  This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box.
  Alexander Korotkov and me
* Support "expanded" objects, particularly arrays, for better performance. (Tom Lane, 2015-05-14)
  This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array.
  The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well.
  In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.)
  Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win.
  Tom Lane, reviewed by Andres Freund
* Extend abbreviated key infrastructure to datum tuplesorts. (Robert Haas, 2015-05-13)
  Andrew Gierth, reviewed by Peter Geoghegan and by me.
* Fix postgres_fdw to return the right ctid value in EvalPlanQual cases. (Tom Lane, 2015-05-13)
  If a postgres_fdw foreign table is a non-locked source relation in an UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its ctid column, the wrong value would be returned if an EvalPlanQual recheck occurred. This happened because the foreign table's result row was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks just unconditionally set the reconstructed tuple's t_self to "invalid".
  To fix that, we can have EvalPlanQualFetchRowMarks copy the composite datum's t_ctid field, and be sure to initialize that along with t_self when postgres_fdw constructs a tuple to return.
  If we just did that much then EvalPlanQualFetchRowMarks would start returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps does not matter much, but then again maybe it might. The cause of that is that heap_form_tuple, which is the ultimate source of all composite datums, simply leaves t_ctid as zeroes in newly constructed tuples. That seems like a bad idea on general principles: a field that's really not been initialized shouldn't appear to have a valid value. So let's eat the trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))" in heap_form_tuple.
  This closes out our handling of Etsuro Fujita's report that tableoid and ctid weren't correctly set in postgres_fdw EvalPlanQual cases. Along the way we did a great deal of work to improve FDWs' ability to control row locking behavior; which was not wasted effort by any means, but it didn't end up being a fix for this problem because that feature would be too expensive for postgres_fdw to use all the time.
  Although the fix for the tableoid misbehavior was back-patched, I'm hesitant to do so here; it seems far less likely that people would care about remote ctid than tableoid, and even such a minor behavioral change as this in heap_form_tuple is perhaps best not back-patched. So commit to HEAD only, at least for the moment.
  Etsuro Fujita, with some adjustments by me
* Fix ON CONFLICT bugs that manifest when used in rules. (Andres Freund, 2015-05-13)
  Specifically the tlist and rti of the pseudo "excluded" relation weren't properly treated by expression_tree_walker, which led to errors when excluded was referenced inside a rule because the varnos were not properly adjusted. Similar omissions in OffsetVarNodes and expression_tree_mutator had less impact, but should obviously be fixed nonetheless.
  A couple of tests for ON CONFLICT UPDATE into INSERT rule bearing relations have been added.
  In passing I updated a couple of comments.
* Add support for doing late row locking in FDWs. (Tom Lane, 2015-05-12)
  Previously, FDWs could only do "early row locking", that is lock a row as soon as it's fetched, even though local restriction/join conditions might discard the row later. This patch adds callbacks that allow FDWs to do late locking in the same way that it's done for regular tables.
  To make use of this feature, an FDW must support the "ctid" column as a unique row identifier. Currently, since ctid has to be of type TID, the feature is of limited use, though in principle it could be used by postgres_fdw. We may eventually allow FDWs to specify another data type for ctid, which would make it possible for more FDWs to use this feature.
  This commit does not modify postgres_fdw to use late locking. We've tested some prototype code for that, but it's not in committable shape, and besides it's quite unclear whether it actually makes sense to do late locking against a remote server. The extra round trips required are likely to outweigh any benefit from improved concurrency.
  Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
* Code review for foreign/custom join pushdown patch. (Tom Lane, 2015-05-10)
  Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design decisions that seem pretty questionable to me, and there was quite a lot of stuff not to like about the documentation and comments. Clean up as follows:
  * Consider foreign joins only between foreign tables on the same server, rather than between any two foreign tables with the same underlying FDW handler function. In most if not all cases, the FDW would simply have had to apply the same-server restriction itself (far more expensively, both for lack of caching and because it would be repeated for each combination of input sub-joins), or else risk nasty bugs. Anyone who's really intent on doing something outside this restriction can always use the set_join_pathlist_hook.
  * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist to better reflect what they're for, and allow these custom scan tlists to be used even for base relations.
  * Change make_foreignscan() API to include passing the fdw_scan_tlist value, since the FDW is required to set that. Backwards compatibility doesn't seem like an adequate reason to expect FDWs to set it in some ad-hoc extra step, and anyway existing FDWs can just pass NIL.
  * Change the API of path-generating subroutines of add_paths_to_joinrel, and in particular that of GetForeignJoinPaths and set_join_pathlist_hook, so that various less-used parameters are passed in a struct rather than as separate parameter-list entries. The objective here is to reduce the probability that future additions to those parameter lists will result in source-level API breaks for users of these hooks. It's possible that this is even a small win for the core code, since most CPU architectures can't pass more than half a dozen parameters efficiently anyway. I kept root, joinrel, outerrel, innerrel, and jointype as separate parameters to reduce code churn in joinpath.c --- in particular, putting jointype into the struct would have been problematic because of the subroutines' habit of changing their local copies of that variable.
  * Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all right for it to know about IndexOnlyScan, but if the list is to grow we should refactor the knowledge out to the callers.
  * Restore nodeForeignscan.c's previous use of the relcache to avoid extra GetFdwRoutine lookups for base-relation scans.
  * Lots of cleanup of documentation and missed comments. Re-order some code additions into more logical places.
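The idea of passing less-used hook parameters in a struct, as described in the list above, can be sketched generically in C. Every name below (`JoinExtraSketch`, `join_cost_hook_sketch`, `example_hook`, `call_hook_demo`) is hypothetical and chosen for illustration; none of it reproduces the actual PostgreSQL hook signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the less-used arguments; not PostgreSQL's real struct. */
typedef struct JoinExtraSketch
{
    int     use_penalty;    /* illustrative flag */
    double  factor;         /* illustrative cost multiplier */
} JoinExtraSketch;

/* Frequently used arguments stay separate; everything else rides in the struct. */
typedef double (*join_cost_hook_sketch) (int outer_rows, int inner_rows,
                                         const JoinExtraSketch *extra);

/* An example hook implementation that reads two fields from the struct. */
static double
example_hook(int outer_rows, int inner_rows, const JoinExtraSketch *extra)
{
    double cost = (double) outer_rows * (double) inner_rows * extra->factor;

    if (extra->use_penalty)
        cost += 100.0;
    return cost;
}

/* Call the hook through its function pointer, as a caller of the API would. */
static double
call_hook_demo(void)
{
    join_cost_hook_sketch hook = example_hook;
    JoinExtraSketch extra = {1, 0.5};

    assert(hook != NULL);
    return hook(10, 20, &extra);
}
```

The design point is that adding a field to the struct later leaves the hook's parameter list unchanged, so existing hook implementations keep compiling.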
* Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE. (Andres Freund, 2015-05-08)
  The newly added ON CONFLICT clause allows specifying an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using an inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias.
  This feature is often referred to as upsert.
  This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted.
  To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table.
  Bumps catversion as stored rules change.
  Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
  Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.
* Represent columns requiring insert and update privileges independently. (Andres Freund, 2015-05-08)
  Previously, relation range table entries used a single Bitmapset field representing which columns required either UPDATE or INSERT privileges, despite the fact that INSERT and UPDATE privileges are separately cataloged, and may be independently held. As statements so far required either insert or update privileges but never both, that was sufficient. The required permission could be inferred from the top level statement run.
  The upcoming INSERT ... ON CONFLICT UPDATE feature needs to independently check for both privileges in one statement though, so that is not sufficient anymore.
  Bumps catversion as stored rules change.
  Author: Peter Geoghegan
  Reviewed-By: Andres Freund
* Use outerPlanState macro instead of referring to lefttree. (Robert Haas, 2015-05-04)
  This makes the executor code more consistent. It also removes an apparently superfluous NULL test in nodeGroup.c.
  Qingqing Zhou, reviewed by Tom Lane, and further revised by me.
* Allow FDWs and custom scan providers to replace joins with scans. (Robert Haas, 2015-05-01)
  Foreign data wrappers can use this capability for so-called "join pushdown"; that is, instead of executing two separate foreign scans and then joining the results locally, they can generate a path which performs the join on the remote server and then is scanned locally. This commit does not extend postgres_fdw to take advantage of this capability; it just provides the infrastructure.
  Custom scan providers can use this in a similar way. Previously, it was only possible for a custom scan provider to scan a single relation. Now, it can scan an entire join tree, provided of course that it knows how to produce the same results that the join would have produced if executed normally.
  KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
* Create an infrastructure for parallel computation in PostgreSQL. (Robert Haas, 2015-04-30)
  This does four basic things. First, it provides convenience routines to coordinate the startup and shutdown of parallel workers. Second, it synchronizes various pieces of state (e.g. GUCs, combo CID mappings, transaction snapshot) from the parallel group leader to the worker processes. Third, it prohibits various operations that would result in unsafe changes to that state while parallelism is active. Finally, it propagates events that would result in an ErrorResponse, NoticeResponse, or NotifyResponse message being sent to the client from the parallel workers back to the master, from which they can then be sent on to the client.
  Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke. Suggestions and review from Andres Freund, Heikki Linnakangas, Noah Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
* Fix various typos and grammar errors in comments. (Andres Freund, 2015-04-26)
  Author: Dmitriy Olshevskiy
  Discussion: 553D00A6.4090205@bk.ru
* Perform RLS WITH CHECK before constraints, etc (Stephen Frost, 2015-04-24)
  The RLS capability is built on top of the WITH CHECK OPTION system which was added for auto-updatable views, however, unlike WCOs on views (which are mandated by the SQL spec to not fire until after all other constraints and checks are done), it makes much more sense for RLS checks to happen earlier than constraint and uniqueness checks.
  This patch reworks the structure which holds the WCOs a bit to be explicitly either VIEW or RLS checks and the RLS-related checks are done prior to the constraint and uniqueness checks. This also allows better error reporting as we are now reporting when a violation is due to a WITH CHECK OPTION and when it's due to an RLS policy violation, which was independently noted by Craig Ringer as being confusing.
  The documentation is also updated to include a paragraph about when RLS WITH CHECK handling is performed, as there have been a number of questions regarding that and the documentation was previously silent on the matter.
  Author: Dean Rasheed, with some kibitzing and comment changes by me.
* Add comments explaining how unique and exclusion constraints are enforced. (Heikki Linnakangas, 2015-04-24)
* Move functions related to index maintenance to separate source file. (Heikki Linnakangas, 2015-04-24)
  There is enough code here to deserve a file of their own, not be buried in the middle of execUtils.c.
* Fix ExecOpenScanRelation to take a lock on a ROW_MARK_COPY relation. | Tom Lane | 2015-03-24
| | | | | | | | | | | | | | | | | | ExecOpenScanRelation assumed that any relation listed in the ExecRowMark list has been locked by InitPlan; but this is not true if the rel's markType is ROW_MARK_COPY, which is possible if it's a foreign table. In most (possibly all) cases, failure to acquire a lock here isn't really problematic because the parser, planner, or plancache would have taken the appropriate lock already. In principle though it might leave us vulnerable to working with a relation that we hold no lock on, and in any case if the executor isn't depending on previously-taken locks otherwise then it should not do so for ROW_MARK_COPY relations. Noted by Etsuro Fujita. Back-patch to all active versions, since the inconsistency has been there a long time. (It's almost certainly irrelevant in 9.0, since that predates foreign tables, but the code's still wrong on its own terms.)
* Allow foreign tables to participate in inheritance. | Tom Lane | 2015-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Foreign tables can now be inheritance children, or parents. Much of the system was already ready for this, but we had to fix a few things of course, mostly in the area of planner and executor handling of row locks. As side effects of this, allow foreign tables to have NOT VALID CHECK constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS. Continuing to disallow these things would've required bizarre and inconsistent special cases in inheritance behavior. Since foreign tables don't enforce CHECK constraints anyway, a NOT VALID one is a complete no-op, but that doesn't mean we shouldn't allow it. And it's possible that some FDWs might have use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops for most. An additional change in support of this is that when a ModifyTable node has multiple target tables, they will all now be explicitly identified in EXPLAIN output, for example: Update on pt1 (cost=0.00..321.05 rows=3541 width=46) Update on pt1 Foreign Update on ft1 Foreign Update on ft2 Update on child3 -> Seq Scan on pt1 (cost=0.00..0.00 rows=1 width=46) -> Foreign Scan on ft1 (cost=100.00..148.03 rows=1170 width=46) -> Foreign Scan on ft2 (cost=100.00..148.03 rows=1170 width=46) -> Seq Scan on child3 (cost=0.00..25.00 rows=1200 width=46) This was done mainly to provide an unambiguous place to attach "Remote SQL" fields, but it is useful for inherited updates even when no foreign tables are involved. Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro Horiguchi, some additional hacking by me
* Ensure tableoid reads correctly in EvalPlanQual-manufactured tuples. | Tom Lane | 2015-03-12
| | | | | | | | | | | | | | | | | | | | The ROW_MARK_COPY path in EvalPlanQualFetchRowMarks() was just setting tableoid to InvalidOid, I think on the assumption that the referenced RTE must be a subquery or other case without a meaningful OID. However, foreign tables also use this code path, and they do have meaningful table OIDs; so failure to set the tuple field can lead to user-visible misbehavior. Fix that by fetching the appropriate OID from the range table. There's still an issue about whether CTID can ever have a meaningful value in this case; at least with postgres_fdw foreign tables, it does. But that is a different problem that seems to require a significantly different patch --- it's debatable whether postgres_fdw really wants to use this code path at all. Simplified version of a patch by Etsuro Fujita, who also noted the problem to begin with. The issue can be demonstrated in all versions having FDWs, so back-patch to 9.1.
* Use the typcache to cache constraints for domain types. | Tom Lane | 2015-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we cached domain constraints for the life of a query, or really for the life of the FmgrInfo struct that was used to invoke domain_in() or domain_check(). But plpgsql (and probably other places) are set up to cache such FmgrInfos for the whole lifespan of a session, which meant they could be enforcing really stale sets of constraints. On the other hand, searching pg_constraint once per query gets kind of expensive too: testing says that as much as half the runtime of a trivial query such as "SELECT 0::domaintype" went into that. To fix this, delegate the responsibility for tracking a domain's constraints to the typcache, which has the infrastructure needed to detect syscache invalidation events that signal possible changes. This not only removes unnecessary repeat reads of pg_constraint, but ensures that we never apply stale constraint data: whatever we use is the current data according to syscache rules. Unfortunately, the current configuration of the system catalogs means we have to flush cached domain-constraint data whenever either pg_type or pg_constraint changes, which happens rather a lot (eg, creation or deletion of a temp table will do it). It might be worth rearranging things to split pg_constraint into two catalogs, of which the domain constraint one would probably be very low-traffic. That's a job for another patch though, and in any case this patch should improve matters materially even with that handicap. This patch makes use of the recently-added memory context reset callback feature to manage the lifespan of domain constraint caches, so that we don't risk deleting a cache that might be in the midst of evaluation. Although this is a bug fix as well as a performance improvement, no back-patch. 
There haven't been many if any field complaints about stale domain constraint checks, so it doesn't seem worth taking the risk of modifying data structures as basic as MemoryContexts in back branches.
* In array_agg(), don't create a new context for every group. | Jeff Davis | 2015-02-21
| | | | | | | | | | | | | | | Previously, each new array created a new memory context that started out at 8kB. This is incredibly wasteful when there are lots of small groups of just a few elements each. Change initArrayResult() and friends to accept a "subcontext" argument to indicate whether the caller wants the ArrayBuildState allocated in a new subcontext or not. If not, it can no longer be released separately from the rest of the memory context. Fixes bug report by Frank van Vugt on 2013-10-19. Tomas Vondra. Reviewed by Ali Akbar, Tom Lane, and me.
* Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[]. | Tom Lane | 2015-02-21
| | | | | | | This requires changing quite a few places that were depending on sizeof(HeapTupleHeaderData), but it seems for the best. Michael Paquier, some adjustments by me
* Use FLEXIBLE_ARRAY_MEMBER in some more places. | Tom Lane | 2015-02-20
| | | | | | Fix a batch of structs that are only visible within individual .c files. Michael Paquier
* Use FLEXIBLE_ARRAY_MEMBER in a bunch more places. | Tom Lane | 2015-02-20
| | | | | | | | | | | | | | | | Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]". Aside from being more self-documenting, this should help prevent bogus warnings from static code analyzers and perhaps compiler misoptimizations. This patch is just a down payment on eliminating the whole problem, but it gets rid of a lot of easy-to-fix cases. Note that the main problem with doing this is that one must no longer rely on computing sizeof(the containing struct), since the result would be compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf also warns against spelling that offsetof(struct, lastfield[0]). Michael Paquier, review and additional fixes by me.
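The sizing rule described above (use offsetof() on the flexible member rather than sizeof() on the containing struct) can be sketched as follows. This is only an illustration under stated assumptions: IntList, intlist_size and intlist_make are hypothetical names, not PostgreSQL code, and the plain C99 `items[]` spelling stands in for `items[FLEXIBLE_ARRAY_MEMBER]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length struct; not a PostgreSQL type. */
typedef struct IntList
{
    int length;
    int items[];    /* C99 flexible array member, in place of the old "items[1]" */
} IntList;

/*
 * Bytes needed for a list of n items.  The header size is taken with
 * offsetof(), because sizeof(IntList) may or may not include trailing
 * padding depending on the compiler.
 */
size_t
intlist_size(int n)
{
    assert(n >= 0);
    return offsetof(IntList, items) + (size_t) n * sizeof(int);
}

/* Allocate a list and copy n ints into it; returns NULL if malloc fails. */
IntList *
intlist_make(const int *src, int n)
{
    IntList *list = malloc(intlist_size(n));

    if (list == NULL)
        return NULL;
    list->length = n;
    if (n > 0)
        memcpy(list->items, src, (size_t) n * sizeof(int));
    return list;
}
```

A caller would build a list with intlist_make() and release it with free(). Note that the offset is taken of `items`, not `items[0]`, in line with the Autoconf warning mentioned in the commit message.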
* Rationalize the APIs of array element/slice access functions. | Tom Lane | 2015-02-16
| | | | | | | | | | | | | | | | | | | | | | The four functions array_ref, array_set, array_get_slice, array_set_slice have traditionally declared their array inputs and results as being of type "ArrayType *". This is a lie, and has been since Berkeley days, because they actually also support "fixed-length array" types such as "name" and "point"; not to mention that the inputs could be toasted. These values should be declared Datum instead to avoid confusion. The current coding already risks possible misoptimization by compilers, and it'll get worse when "expanded" array representations become a valid alternative. However, there's a fair amount of code using array_ref and array_set with arrays that *are* known to be ArrayType structures, and there might be more such places in third-party code. Rather than cluttering those call sites with PointerGetDatum/DatumGetArrayTypeP cruft, what I did was to rename the existing functions to array_get_element/array_set_element, fix their signatures, then reincarnate array_ref/array_set as backwards compatibility wrappers. array_get_slice/array_set_slice have no such constituency in the core code, and probably not in third-party code either, so I just changed their APIs.
* Fix null-pointer-deref crash while doing COPY IN with check constraints. | Tom Lane | 2015-02-15
| | | | | | | | | | | | | | | | | | | | In commit bf7ca15875988a88e97302e012d7c4808bef3ea9 I introduced an assumption that an RTE referenced by a whole-row Var must have a valid eref field. This is false for RTEs constructed by DoCopy, and there are other places taking similar shortcuts. Perhaps we should make all those places go through addRangeTableEntryForRelation or its siblings instead of having ad-hoc logic, but the most reliable fix seems to be to make the new code in ExecEvalWholeRowVar cope if there's no eref. We can reasonably assume that there's no need to insert column aliases if no aliases were provided. Add a regression test case covering this, and also verifying that a sane column name is in fact available in this situation. Although the known case only crashes in 9.4 and HEAD, it seems prudent to back-patch the code change to 9.2, since all the ingredients for a similar failure exist in the variant patch applied to 9.3 and 9.2. Per report from Jean-Pierre Pelletier.
* Fix reference-after-free when waiting for another xact due to constraint. | Heikki Linnakangas | 2015-02-04
| | | | | | | | | | | | | | | | | | | If an insertion or update had to wait for another transaction to finish, because there was another insertion with conflicting key in progress, we would pass a just-free'd item pointer to XactLockTableWait(). All calls to XactLockTableWait() and MultiXactIdWait() had similar issues. Some passed a pointer to a buffer in the buffer cache, after already releasing the lock. The call in EvalPlanQualFetch had already released the pin too. All but the call in execUtils.c would merely lead to reporting a bogus ctid, however (or an assertion failure, if enabled). All the callers that passed HeapTuple->t_data->t_ctid were slightly bogus anyway: if the tuple was updated (again) in the same transaction, its ctid field would point to the next tuple in the chain, not the tuple itself. Backpatch to 9.4, where the 'ctid' argument to XactLockTableWait was added (in commit f88d4cfc)
* Fix column-privilege leak in error-message paths | Stephen Frost | 2015-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Further, in master only, do not return any data for cases where row security is enabled on the relation and row security should be applied for the user. This required a bit of refactoring and moving of things around related to RLS- note the addition of utils/misc/rls.c. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit.
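The all-or-nothing rule for key descriptions in the message above can be sketched roughly as below. This is not the BuildIndexValueDescription code: key_description, append_str and the can_select array are hypothetical stand-ins for the real privilege checks, and only the "(cols)=(values)" output shape is modelled on PostgreSQL's key-violation detail messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to buf at *used; fails if it would not fit with its terminator. */
static bool
append_str(char *buf, size_t buflen, size_t *used, const char *s)
{
    size_t len = strlen(s);

    if (len >= buflen - *used)
        return false;
    memcpy(buf + *used, s, len + 1);
    *used += len;
    return true;
}

/*
 * Hypothetical sketch: write "(col1, col2)=(val1, val2)" into buf and
 * return buf.  Returns NULL, so the caller can omit the detail entirely,
 * when the user lacks SELECT on any key column (a partial key is never
 * shown) or when buf is too small.
 */
const char *
key_description(const char *const *names, const char *const *values,
                const bool *can_select, size_t nkeys,
                char *buf, size_t buflen)
{
    size_t used = 0;

    assert(names != NULL && values != NULL && can_select != NULL);
    if (buf == NULL || buflen == 0)
        return NULL;

    /* A single unreadable column suppresses the whole key. */
    for (size_t i = 0; i < nkeys; i++)
        if (!can_select[i])
            return NULL;

    /* Pass 0 emits the column names, pass 1 the values. */
    for (int pass = 0; pass < 2; pass++)
    {
        const char *const *src = (pass == 0) ? names : values;

        if (!append_str(buf, buflen, &used, (pass == 0) ? "(" : "=("))
            return NULL;
        for (size_t i = 0; i < nkeys; i++)
        {
            if (i > 0 && !append_str(buf, buflen, &used, ", "))
                return NULL;
            if (!append_str(buf, buflen, &used, src[i]))
                return NULL;
        }
        if (!append_str(buf, buflen, &used, ")"))
            return NULL;
    }
    return buf;
}
```

Returning NULL rather than a redacted string keeps the decision with the caller, which can then leave out the detail line altogether, as the commit message describes.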
* Use abbreviated keys for faster sorting of text datums. | Robert Haas | 2015-01-19
| | | | | | | | | | | | | | | | | | | | | This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the abbreviated comparison is insufficient to resolve the comparison, we fall back on the normal comparator. This can be much faster than the old way of doing sorting if the first few bytes of the string are usually sufficient to resolve the comparison. There is the potential for a performance regression if all of the strings to be sorted are identical for the first 8+ characters and differ only in later positions; therefore, the SortSupport machinery now provides an infrastructure to abort the use of abbreviation if it appears that abbreviation is producing comparatively few distinct keys. HyperLogLog, a streaming cardinality estimator, is included in this commit and used to make that determination for text. Peter Geoghegan, reviewed by me.
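The abbreviate-then-fall-back idea can be illustrated with a minimal sketch. It makes simplifying assumptions that differ from the commit: the abbreviation is built from the raw leading bytes of the string rather than from a strxfrm() blob, so it orders strings the way strcmp() does, and the cardinality-based abort is omitted. SortItem, abbrev_key, sort_item_cmp and sort_strings are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One sortable entry: the precomputed abbreviation plus the full string. */
typedef struct SortItem
{
    uint64_t    abbrev;
    const char *full;
} SortItem;

/*
 * Pack the first 8 bytes of s into an integer, most significant byte
 * first and zero-padded, so that unsigned integer order matches
 * byte-wise string order on that prefix.
 */
uint64_t
abbrev_key(const char *s)
{
    uint64_t key = 0;
    size_t   len = strlen(s);

    for (size_t i = 0; i < sizeof(key); i++)
    {
        unsigned char c = (i < len) ? (unsigned char) s[i] : 0;

        key = (key << 8) | c;
    }
    return key;
}

/* Cheap integer comparison first; full comparison only on a tie. */
int
sort_item_cmp(const void *pa, const void *pb)
{
    const SortItem *a = pa;
    const SortItem *b = pb;

    if (a->abbrev != b->abbrev)
        return (a->abbrev < b->abbrev) ? -1 : 1;
    return strcmp(a->full, b->full);
}

/* Sort an array of C strings in place; returns 0 on success, -1 on OOM. */
int
sort_strings(const char **strs, size_t n)
{
    SortItem *items;

    if (n == 0)
        return 0;
    assert(strs != NULL);
    items = malloc(n * sizeof(SortItem));
    if (items == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
    {
        items[i].abbrev = abbrev_key(strs[i]);   /* computed once per string */
        items[i].full = strs[i];
    }
    qsort(items, n, sizeof(SortItem), sort_item_cmp);
    for (size_t i = 0; i < n; i++)
        strs[i] = items[i].full;
    free(items);
    return 0;
}
```

When most strings differ within their first eight bytes, nearly every comparison is settled by the integer test. When they share a long common prefix, every comparison pays for both the integer test and strcmp(), which is the regression the commit message guards against by aborting abbreviation.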