postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
...
*	Fix datatype confusion with the new lossy GiST distance functions.	Heikki Linnakangas	2015-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can only support a lossy distance function when the distance function's datatype is comparable with the original ordering operator's datatype. The distance function always returns a float8, so we are limited to float8, and float4 (by a hard-coded cast of the float8 to float4). In light of this limitation, it seems like a good idea to have a separate 'recheck' flag for the ORDER BY expressions, so that if you have a non-lossy distance function, it still works with lossy quals. There are cases like that with the build-in or contrib opclasses, but it's plausible. There was a hidden assumption that the ORDER BY values returned by GiST match the original ordering operator's return type, but there are plenty of examples where that's not true, e.g. in btree_gist and pg_trgm. As long as the distance function is not lossy, we can tolerate that and just not return the distance to the executor (or rather, always return NULL). The executor doesn't need the distances if there are no lossy results. There was another little bug: the recheck variable was not initialized before calling the distance function. That revealed the bigger issue, as the executor tried to reorder tuples that didn't need reordering, and that failed because of the datatype mismatch.
*	Allow GiST distance function to return merely a lower-bound.	Heikki Linnakangas	2015-05-15
\| \| \| \| \| \| \| \| \| \| \|	The distance function can now set *recheck = false, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly. This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box. Alexander Korotkov and me
*	Support "expanded" objects, particularly arrays, for better performance.	Tom Lane	2015-05-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array. The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well. In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.) Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win. Tom Lane, reviewed by Andres Freund
*	Extend abbreviated key infrastructure to datum tuplesorts.	Robert Haas	2015-05-13
\| \| \| \|	Andrew Gierth, reviewed by Peter Geoghegan and by me.
*	Fix postgres_fdw to return the right ctid value in EvalPlanQual cases.	Tom Lane	2015-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a postgres_fdw foreign table is a non-locked source relation in an UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its ctid column, the wrong value would be returned if an EvalPlanQual recheck occurred. This happened because the foreign table's result row was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks just unconditionally set the reconstructed tuple's t_self to "invalid". To fix that, we can have EvalPlanQualFetchRowMarks copy the composite datum's t_ctid field, and be sure to initialize that along with t_self when postgres_fdw constructs a tuple to return. If we just did that much then EvalPlanQualFetchRowMarks would start returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps does not matter much, but then again maybe it might. The cause of that is that heap_form_tuple, which is the ultimate source of all composite datums, simply leaves t_ctid as zeroes in newly constructed tuples. That seems like a bad idea on general principles: a field that's really not been initialized shouldn't appear to have a valid value. So let's eat the trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))" in heap_form_tuple. This closes out our handling of Etsuro Fujita's report that tableoid and ctid weren't correctly set in postgres_fdw EvalPlanQual cases. Along the way we did a great deal of work to improve FDWs' ability to control row locking behavior; which was not wasted effort by any means, but it didn't end up being a fix for this problem because that feature would be too expensive for postgres_fdw to use all the time. Although the fix for the tableoid misbehavior was back-patched, I'm hesitant to do so here; it seems far less likely that people would care about remote ctid than tableoid, and even such a minor behavioral change as this in heap_form_tuple is perhaps best not back-patched. So commit to HEAD only, at least for the moment. Etsuro Fujita, with some adjustments by me
*	Fix ON CONFLICT bugs that manifest when used in rules.	Andres Freund	2015-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Specifically the tlist and rti of the pseudo "excluded" relation weren't properly treated by expression_tree_walker, which lead to errors when excluded was referenced inside a rule because the varnos where not properly adjusted. Similar omissions in OffsetVarNodes and expression_tree_mutator had less impact, but should obviously be fixed nonetheless. A couple tests of for ON CONFLICT UPDATE into INSERT rule bearing relations have been added. In passing I updated a couple comments.
*	Add support for doing late row locking in FDWs.	Tom Lane	2015-05-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, FDWs could only do "early row locking", that is lock a row as soon as it's fetched, even though local restriction/join conditions might discard the row later. This patch adds callbacks that allow FDWs to do late locking in the same way that it's done for regular tables. To make use of this feature, an FDW must support the "ctid" column as a unique row identifier. Currently, since ctid has to be of type TID, the feature is of limited use, though in principle it could be used by postgres_fdw. We may eventually allow FDWs to specify another data type for ctid, which would make it possible for more FDWs to use this feature. This commit does not modify postgres_fdw to use late locking. We've tested some prototype code for that, but it's not in committable shape, and besides it's quite unclear whether it actually makes sense to do late locking against a remote server. The extra round trips required are likely to outweigh any benefit from improved concurrency. Etsuro Fujita, reviewed by Ashutosh Bapat, and hacked up a lot by me
*	Code review for foreign/custom join pushdown patch.	Tom Lane	2015-05-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design decisions that seem pretty questionable to me, and there was quite a lot of stuff not to like about the documentation and comments. Clean up as follows: * Consider foreign joins only between foreign tables on the same server, rather than between any two foreign tables with the same underlying FDW handler function. In most if not all cases, the FDW would simply have had to apply the same-server restriction itself (far more expensively, both for lack of caching and because it would be repeated for each combination of input sub-joins), or else risk nasty bugs. Anyone who's really intent on doing something outside this restriction can always use the set_join_pathlist_hook. * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist to better reflect what they're for, and allow these custom scan tlists to be used even for base relations. * Change make_foreignscan() API to include passing the fdw_scan_tlist value, since the FDW is required to set that. Backwards compatibility doesn't seem like an adequate reason to expect FDWs to set it in some ad-hoc extra step, and anyway existing FDWs can just pass NIL. * Change the API of path-generating subroutines of add_paths_to_joinrel, and in particular that of GetForeignJoinPaths and set_join_pathlist_hook, so that various less-used parameters are passed in a struct rather than as separate parameter-list entries. The objective here is to reduce the probability that future additions to those parameter lists will result in source-level API breaks for users of these hooks. It's possible that this is even a small win for the core code, since most CPU architectures can't pass more than half a dozen parameters efficiently anyway. I kept root, joinrel, outerrel, innerrel, and jointype as separate parameters to reduce code churn in joinpath.c --- in particular, putting jointype into the struct would have been problematic because of the subroutines' habit of changing their local copies of that variable. * Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all right for it to know about IndexOnlyScan, but if the list is to grow we should refactor the knowledge out to the callers. * Restore nodeForeignscan.c's previous use of the relcache to avoid extra GetFdwRoutine lookups for base-relation scans. * Lots of cleanup of documentation and missed comments. Re-order some code additions into more logical places.
*	Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.	Andres Freund	2015-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.
*	Represent columns requiring insert and update privileges indentently.	Andres Freund	2015-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, relation range table entries used a single Bitmapset field representing which columns required either UPDATE or INSERT privileges, despite the fact that INSERT and UPDATE privileges are separately cataloged, and may be independently held. As statements so far required either insert or update privileges but never both, that was sufficient. The required permission could be inferred from the top level statement run. The upcoming INSERT ... ON CONFLICT UPDATE feature needs to independently check for both privileges in one statement though, so that is not sufficient anymore. Bumps catversion as stored rules change. Author: Peter Geoghegan Reviewed-By: Andres Freund
*	Use outerPlanState macro instead of referring to leffttree.	Robert Haas	2015-05-04
\| \| \| \| \| \| \|	This makes the executor code more consistent. It also removes an apparently superfluous NULL test in nodeGroup.c. Qingqing Zhou, reviewed by Tom Lane, and further revised by me.
*	Allow FDWs and custom scan providers to replace joins with scans.	Robert Haas	2015-05-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Foreign data wrappers can use this capability for so-called "join pushdown"; that is, instead of executing two separate foreign scans and then joining the results locally, they can generate a path which performs the join on the remote server and then is scanned locally. This commit does not extend postgres_fdw to take advantage of this capability; it just provides the infrastructure. Custom scan providers can use this in a similar way. Previously, it was only possible for a custom scan provider to scan a single relation. Now, it can scan an entire join tree, provided of course that it knows how to produce the same results that the join would have produced if executed normally. KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
*	Create an infrastructure for parallel computation in PostgreSQL.	Robert Haas	2015-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This does four basic things. First, it provides convenience routines to coordinate the startup and shutdown of parallel workers. Second, it synchronizes various pieces of state (e.g. GUCs, combo CID mappings, transaction snapshot) from the parallel group leader to the worker processes. Third, it prohibits various operations that would result in unsafe changes to that state while parallelism is active. Finally, it propagates events that would result in an ErrorResponse, NoticeResponse, or NotifyResponse message being sent to the client from the parallel workers back to the master, from which they can then be sent on to the client. Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke. Suggestions and review from Andres Freund, Heikki Linnakangas, Noah Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
*	Fix various typos and grammar errors in comments.	Andres Freund	2015-04-26
\| \| \| \| \|	Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru
*	Perform RLS WITH CHECK before constraints, etc	Stephen Frost	2015-04-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RLS capability is built on top of the WITH CHECK OPTION system which was added for auto-updatable views, however, unlike WCOs on views (which are mandated by the SQL spec to not fire until after all other constraints and checks are done), it makes much more sense for RLS checks to happen earlier than constraint and uniqueness checks. This patch reworks the structure which holds the WCOs a bit to be explicitly either VIEW or RLS checks and the RLS-related checks are done prior to the constraint and uniqueness checks. This also allows better error reporting as we are now reporting when a violation is due to a WITH CHECK OPTION and when it's due to an RLS policy violation, which was independently noted by Craig Ringer as being confusing. The documentation is also updated to include a paragraph about when RLS WITH CHECK handling is performed, as there have been a number of questions regarding that and the documentation was previously silent on the matter. Author: Dean Rasheed, with some kabitzing and comment changes by me.
*	Add comments explaining how unique and exclusion constraints are enforced.	Heikki Linnakangas	2015-04-24
\|
*	Move functions related to index maintenance to separate source file.	Heikki Linnakangas	2015-04-24
\| \| \| \| \|	There is enough code here to deserve a file of their own, not be buried in the middle of execUtils.c.
*	Fix ExecOpenScanRelation to take a lock on a ROW_MARK_COPY relation.	Tom Lane	2015-03-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ExecOpenScanRelation assumed that any relation listed in the ExecRowMark list has been locked by InitPlan; but this is not true if the rel's markType is ROW_MARK_COPY, which is possible if it's a foreign table. In most (possibly all) cases, failure to acquire a lock here isn't really problematic because the parser, planner, or plancache would have taken the appropriate lock already. In principle though it might leave us vulnerable to working with a relation that we hold no lock on, and in any case if the executor isn't depending on previously-taken locks otherwise then it should not do so for ROW_MARK_COPY relations. Noted by Etsuro Fujita. Back-patch to all active versions, since the inconsistency has been there a long time. (It's almost certainly irrelevant in 9.0, since that predates foreign tables, but the code's still wrong on its own terms.)
*	Allow foreign tables to participate in inheritance.	Tom Lane	2015-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Foreign tables can now be inheritance children, or parents. Much of the system was already ready for this, but we had to fix a few things of course, mostly in the area of planner and executor handling of row locks. As side effects of this, allow foreign tables to have NOT VALID CHECK constraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and to accept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS. Continuing to disallow these things would've required bizarre and inconsistent special cases in inheritance behavior. Since foreign tables don't enforce CHECK constraints anyway, a NOT VALID one is a complete no-op, but that doesn't mean we shouldn't allow it. And it's possible that some FDWs might have use for SET STORAGE or SET WITH OIDS, though doubtless they will be no-ops for most. An additional change in support of this is that when a ModifyTable node has multiple target tables, they will all now be explicitly identified in EXPLAIN output, for example: Update on pt1 (cost=0.00..321.05 rows=3541 width=46) Update on pt1 Foreign Update on ft1 Foreign Update on ft2 Update on child3 -> Seq Scan on pt1 (cost=0.00..0.00 rows=1 width=46) -> Foreign Scan on ft1 (cost=100.00..148.03 rows=1170 width=46) -> Foreign Scan on ft2 (cost=100.00..148.03 rows=1170 width=46) -> Seq Scan on child3 (cost=0.00..25.00 rows=1200 width=46) This was done mainly to provide an unambiguous place to attach "Remote SQL" fields, but it is useful for inherited updates even when no foreign tables are involved. Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro Horiguchi, some additional hacking by me
*	Ensure tableoid reads correctly in EvalPlanQual-manufactured tuples.	Tom Lane	2015-03-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ROW_MARK_COPY path in EvalPlanQualFetchRowMarks() was just setting tableoid to InvalidOid, I think on the assumption that the referenced RTE must be a subquery or other case without a meaningful OID. However, foreign tables also use this code path, and they do have meaningful table OIDs; so failure to set the tuple field can lead to user-visible misbehavior. Fix that by fetching the appropriate OID from the range table. There's still an issue about whether CTID can ever have a meaningful value in this case; at least with postgres_fdw foreign tables, it does. But that is a different problem that seems to require a significantly different patch --- it's debatable whether postgres_fdw really wants to use this code path at all. Simplified version of a patch by Etsuro Fujita, who also noted the problem to begin with. The issue can be demonstrated in all versions having FDWs, so back-patch to 9.1.
*	Use the typcache to cache constraints for domain types.	Tom Lane	2015-03-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we cached domain constraints for the life of a query, or really for the life of the FmgrInfo struct that was used to invoke domain_in() or domain_check(). But plpgsql (and probably other places) are set up to cache such FmgrInfos for the whole lifespan of a session, which meant they could be enforcing really stale sets of constraints. On the other hand, searching pg_constraint once per query gets kind of expensive too: testing says that as much as half the runtime of a trivial query such as "SELECT 0::domaintype" went into that. To fix this, delegate the responsibility for tracking a domain's constraints to the typcache, which has the infrastructure needed to detect syscache invalidation events that signal possible changes. This not only removes unnecessary repeat reads of pg_constraint, but ensures that we never apply stale constraint data: whatever we use is the current data according to syscache rules. Unfortunately, the current configuration of the system catalogs means we have to flush cached domain-constraint data whenever either pg_type or pg_constraint changes, which happens rather a lot (eg, creation or deletion of a temp table will do it). It might be worth rearranging things to split pg_constraint into two catalogs, of which the domain constraint one would probably be very low-traffic. That's a job for another patch though, and in any case this patch should improve matters materially even with that handicap. This patch makes use of the recently-added memory context reset callback feature to manage the lifespan of domain constraint caches, so that we don't risk deleting a cache that might be in the midst of evaluation. Although this is a bug fix as well as a performance improvement, no back-patch. There haven't been many if any field complaints about stale domain constraint checks, so it doesn't seem worth taking the risk of modifying data structures as basic as MemoryContexts in back branches.
*	In array_agg(), don't create a new context for every group.	Jeff Davis	2015-02-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, each new array created a new memory context that started out at 8kB. This is incredibly wasteful when there are lots of small groups of just a few elements each. Change initArrayResult() and friends to accept a "subcontext" argument to indicate whether the caller wants the ArrayBuildState allocated in a new subcontext or not. If not, it can no longer be released separately from the rest of the memory context. Fixes bug report by Frank van Vugt on 2013-10-19. Tomas Vondra. Reviewed by Ali Akbar, Tom Lane, and me.
*	Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].	Tom Lane	2015-02-21
\| \| \| \| \| \| \|	This requires changing quite a few places that were depending on sizeof(HeapTupleHeaderData), but it seems for the best. Michael Paquier, some adjustments by me
*	Use FLEXIBLE_ARRAY_MEMBER in some more places.	Tom Lane	2015-02-20
\| \| \| \| \| \|	Fix a batch of structs that are only visible within individual .c files. Michael Paquier
*	Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.	Tom Lane	2015-02-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]". Aside from being more self-documenting, this should help prevent bogus warnings from static code analyzers and perhaps compiler misoptimizations. This patch is just a down payment on eliminating the whole problem, but it gets rid of a lot of easy-to-fix cases. Note that the main problem with doing this is that one must no longer rely on computing sizeof(the containing struct), since the result would be compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf also warns against spelling that offsetof(struct, lastfield[0]). Michael Paquier, review and additional fixes by me.
*	Rationalize the APIs of array element/slice access functions.	Tom Lane	2015-02-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The four functions array_ref, array_set, array_get_slice, array_set_slice have traditionally declared their array inputs and results as being of type "ArrayType ". This is a lie, and has been since Berkeley days, because they actually also support "fixed-length array" types such as "name" and "point"; not to mention that the inputs could be toasted. These values should be declared Datum instead to avoid confusion. The current coding already risks possible misoptimization by compilers, and it'll get worse when "expanded" array representations become a valid alternative. However, there's a fair amount of code using array_ref and array_set with arrays that are* known to be ArrayType structures, and there might be more such places in third-party code. Rather than cluttering those call sites with PointerGetDatum/DatumGetArrayTypeP cruft, what I did was to rename the existing functions to array_get_element/array_set_element, fix their signatures, then reincarnate array_ref/array_set as backwards compatibility wrappers. array_get_slice/array_set_slice have no such constituency in the core code, and probably not in third-party code either, so I just changed their APIs.
*	Fix null-pointer-deref crash while doing COPY IN with check constraints.	Tom Lane	2015-02-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit bf7ca15875988a88e97302e012d7c4808bef3ea9 I introduced an assumption that an RTE referenced by a whole-row Var must have a valid eref field. This is false for RTEs constructed by DoCopy, and there are other places taking similar shortcuts. Perhaps we should make all those places go through addRangeTableEntryForRelation or its siblings instead of having ad-hoc logic, but the most reliable fix seems to be to make the new code in ExecEvalWholeRowVar cope if there's no eref. We can reasonably assume that there's no need to insert column aliases if no aliases were provided. Add a regression test case covering this, and also verifying that a sane column name is in fact available in this situation. Although the known case only crashes in 9.4 and HEAD, it seems prudent to back-patch the code change to 9.2, since all the ingredients for a similar failure exist in the variant patch applied to 9.3 and 9.2. Per report from Jean-Pierre Pelletier.
*	Fix reference-after-free when waiting for another xact due to constraint.	Heikki Linnakangas	2015-02-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an insertion or update had to wait for another transaction to finish, because there was another insertion with conflicting key in progress, we would pass a just-free'd item pointer to XactLockTableWait(). All calls to XactLockTableWait() and MultiXactIdWait() had similar issues. Some passed a pointer to a buffer in the buffer cache, after already releasing the lock. The call in EvalPlanQualFetch had already released the pin too. All but the call in execUtils.c would merely lead to reporting a bogus ctid, however (or an assertion failure, if enabled). All the callers that passed HeapTuple->t_data->t_ctid were slightly bogus anyway: if the tuple was updated (again) in the same transaction, its ctid field would point to the next tuple in the chain, not the tuple itself. Backpatch to 9.4, where the 'ctid' argument to XactLockTableWait was added (in commit f88d4cfc)
*	Fix column-privilege leak in error-message paths	Stephen Frost	2015-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Further, in master only, do not return any data for cases where row security is enabled on the relation and row security should be applied for the user. This required a bit of refactoring and moving of things around related to RLS- note the addition of utils/misc/rls.c. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit.
*	Use abbreviated keys for faster sorting of text datums.	Robert Haas	2015-01-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the abbreviated comparison is insufficent to resolve the comparison, we fall back on the normal comparator. This can be much faster than the old way of doing sorting if the first few bytes of the string are usually sufficient to resolve the comparison. There is the potential for a performance regression if all of the strings to be sorted are identical for the first 8+ characters and differ only in later positions; therefore, the SortSupport machinery now provides an infrastructure to abort the use of abbreviation if it appears that abbreviation is producing comparatively few distinct keys. HyperLogLog, a streaming cardinality estimator, is included in this commit and used to make that determination for text. Peter Geoghegan, reviewed by me.
*	Typo fix.	Robert Haas	2015-01-19
\| \| \| \|	Etsuro Fujita
*	Fix use-of-already-freed-memory problem in EvalPlanQual processing.	Tom Lane	2015-01-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Up to now, the "child" executor state trees generated for EvalPlanQual rechecks have simply shared the ResultRelInfo arrays used for the original execution tree. However, this leads to dangling-pointer problems, because ExecInitModifyTable() is all too willing to scribble on some fields of the ResultRelInfo(s) even when it's being run in one of those child trees. This trashes those fields from the perspective of the parent tree, because even if the generated subtree is logically identical to what was in use in the parent, it's in a memory context that will go away when we're done with the child state tree. We do however want to share information in the direction from the parent down to the children; in particular, fields such as es_instrument must be shared or we'll lose the stats arising from execution of the children. So the simplest fix is to make a copy of the parent's ResultRelInfo array, but not copy any fields back at end of child execution. Per report from Manuel Kniep. The added isolation test is based on his example. In an unpatched memory-clobber-enabled build it will reliably fail with "ctid is NULL" errors in all branches back to 9.1, as a consequence of junkfilter->jf_junkAttNo being overwritten with $7f7f. This test cannot be run as-is before that for lack of WITH syntax; but I have no doubt that some variant of this problem can arise in older branches, so apply the code change all the way back.
*	Fix typo in execMain.c	Stephen Frost	2015-01-09
\| \| \| \| \| \|	Wee -> We. Pointed out by Etsuro Fujita.
*	Update copyright for 2015	Bruce Momjian	2015-01-06
\| \| \| \|	Backpatch certain files through 9.0
*	Fix corner case where SELECT FOR UPDATE could return a row twice.	Tom Lane	2014-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo WHERE-clause checking on rows that have been updated since the SELECT's snapshot, it invokes EvalPlanQual processing to do that. If this first occurs within a non-first child table of an inheritance tree, the previous coding could accidentally re-return a matching row from an earlier, already-scanned child table. (And, to add insult to injury, I think this could make it miss returning a row that should have been returned, if the updated row that this happens on should still have passed the WHERE qual.) Per report from Kyotaro Horiguchi; the added isolation test is based on his test case. This has been broken for quite awhile, so back-patch to all supported branches.
*	Add bms_next_member(), and use it where appropriate.	Tom Lane	2014-11-28
\| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a way of iterating through the members of a bitmapset nondestructively, unlike the old way with bms_first_member(). While bms_next_member() is very slightly slower than bms_first_member() (at least for typical-size bitmapsets), eliminating the need to palloc and pfree a temporary copy of the target bitmapset is a significant win. So this method should be preferred in all cases where a temporary copy would be necessary. Tom Lane, with suggestions from Dean Rasheed and David Rowley
*	Rename pg_rowsecurity -> pg_policy and other fixes	Stephen Frost	2014-11-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As pointed out by Robert, we should really have named pg_rowsecurity pg_policy, as the objects stored in that catalog are policies. This patch fixes that and updates the column names to start with 'pol' to match the new catalog name. The security consideration for COPY with row level security, also pointed out by Robert, has also been addressed by remembering and re-checking the OID of the relation initially referenced during COPY processing, to make sure it hasn't changed under us by the time we finish planning out the query which has been built. Robert and Alvaro also commented on missing OCLASS and OBJECT entries for POLICY (formerly ROWSECURITY or POLICY, depending) in various places. This patch fixes that too, which also happens to add the ability to COMMENT on policies. In passing, attempt to improve the consistency of messages, comments, and documentation as well. This removes various incarnations of 'row-security', 'row-level security', 'Row-security', etc, in favor of 'policy', 'row level security' or 'row_security' as appropriate. Happy Thanksgiving!
*	Support arrays as input to array_agg() and ARRAY(SELECT ...).	Tom Lane	2014-11-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These cases formerly failed with errors about "could not find array type for data type". Now they yield arrays of the same element type and one higher dimension. The implementation involves creating functions with API similar to the existing accumArrayResult() family. I (tgl) also extended the base family by adding an initArrayResult() function, which allows callers to avoid special-casing the zero-inputs case if they just want an empty array as result. (Not all do, so the previous calling convention remains valid.) This allowed simplifying some existing code in xml.c and plperl.c. Ali Akbar, reviewed by Pavel Stehule, significantly modified by me
*	Remove dead code supporting mark/restore in SeqScan, TidScan, ValuesScan.	Tom Lane	2014-11-20
\| \| \| \| \| \| \| \| \| \| \| \|	There seems no prospect that any of this will ever be useful, and indeed it's questionable whether some of it would work if it ever got called; it's certainly not been exercised in a very long time, if ever. So let's get rid of it, and make the comments about mark/restore in execAmi.c less wishy-washy. The mark/restore support for Result nodes is also currently dead code, but that's due to planner limitations not because it's impossible that it could be useful. So I left it in.
*	Initial code review for CustomScan patch.	Tom Lane	2014-11-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get rid of the pernicious entanglement between planner and executor headers introduced by commit 0b03e5951bf0a1a8868db13f02049cf686a82165. Also, rearrange the CustomFoo struct/typedef definitions so that all the typedef names are seen as used by the compiler. Without this pgindent will mess things up a bit, which is not so important perhaps, but it also removes a bizarre discrepancy between the declaration arrangement used for CustomExecMethods and that used for CustomScanMethods and CustomPathMethods. Clean up the commentary around ExecSupportsMarkRestore to reflect the rather large change in its API. Const-ify register_custom_path_provider's argument. This necessitates casting away const in the function, but that seems better than forcing callers of the function to do so (or else not const-ify their method pointer structs, which was sort of the whole point). De-export fix_expr_common. I don't like the exporting of fix_scan_expr or replace_nestloop_params either, but this one surely has got little excuse.
*	Fix another oversight in CustomScan patch.	Tom Lane	2014-11-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	execCurrent.c's search_plan_tree() must recognize a CustomScan on the target relation. This would only be helpful for custom providers that support CurrentOfExpr quals, which is probably a bit far-fetched, but it's not impossible I think. But even without assuming that, we need to recognize a scanned-relation match so that we will properly throw error if the desired relation is being scanned with both a CustomScan and a regular scan (ie, self-join). Also recognize ForeignScanState for similar reasons. Supporting WHERE CURRENT OF on a foreign table is probably even more far-fetched than it is for custom scans, but I think in principle you could do it with postgres_fdw (or another FDW that supports the ctid column). This would be a back-patchable bug fix if existing FDWs handled CurrentOfExpr, but I doubt any do so I won't bother back-patching.
*	Explicitly support the case that a plancache's raw_parse_tree is NULL.	Tom Lane	2014-11-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This only happens if a client issues a Parse message with an empty query string, which is a bit odd; but since it is explicitly called out as legal by our FE/BE protocol spec, we'd probably better continue to allow it. Fix by adding tests everywhere that the raw_parse_tree field is passed to functions that don't or shouldn't accept NULL. Also make it clear in the relevant comments that NULL is an expected case. This reverts commits a73c9dbab0165b3395dfe8a44a7dfd16166963c4 and 2e9650cbcff8c8fb0d9ef807c73a44f241822eee, which fixed specific crash symptoms by hacking things at what now seems to be the wrong end, ie the callee functions. Making the callees allow NULL is superficially more robust, but it's not always true that there is a defensible thing for the callee to do in such cases. The caller has more context and is better able to decide what the empty-query case ought to do. Per followup discussion of bug #11335. Back-patch to 9.2. The code before that is sufficiently different that it would require development of a separate patch, which doesn't seem worthwhile for what is believed to be an essentially cosmetic change.
*	Ensure that RowExprs and whole-row Vars produce the expected column names.	Tom Lane	2014-11-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At one time it wasn't terribly important what column names were associated with the fields of a composite Datum, but since the introduction of operations like row_to_json(), it's important that looking up the rowtype ID embedded in the Datum returns the column names that users would expect. That did not work terribly well before this patch: you could get the column names of the underlying table, or column aliases from any level of the query, depending on minor details of the plan tree. You could even get totally empty field names, which is disastrous for cases like row_to_json(). To fix this for whole-row Vars, look to the RTE referenced by the Var, and make sure its column aliases are applied to the rowtype associated with the result Datums. This is a tad scary because we might have to return a transient RECORD type even though the Var is declared as having some named rowtype. In principle it should be all right because the record type will still be physically compatible with the named rowtype; but I had to weaken one Assert in ExecEvalConvertRowtype, and there might be third-party code containing similar assumptions. Similarly, RowExprs have to be willing to override the column names coming from a named composite result type and produce a RECORD when the column aliases visible at the site of the RowExpr differ from the underlying table's column names. In passing, revert the decision made in commit 398f70ec070fe601 to add an alias-list argument to ExecTypeFromExprList: better to provide that functionality in a separate function. This also reverts most of the code changes in d68581483564ec0f, which we don't need because we're no longer depending on the tupdesc found in the child plan node's result slot to be blessed. Back-patch to 9.4, but not earlier, since this solution changes the results in some cases that users might not have realized were buggy. We'll apply a more restricted form of this patch in older branches.
*	Introduce custom path and scan providers.	Robert Haas	2014-11-07
\| \| \| \| \| \| \| \| \| \| \|	This allows extension modules to define their own methods for scanning a relation, and get the core code to use them. It's unclear as yet how much use this capability will find, but we won't find out if we never commit it. KaiGai Kohei, reviewed at various times and in various levels of detail by Shigeru Hanada, Tom Lane, Andres Freund, Álvaro Herrera, and myself.
*	C comments: adjust execTuples.c for new structure	Bruce Momjian	2014-10-13
\| \| \| \|	Report by Peter Geoghegan
*	Increase number of hash join buckets for underestimate.	Kevin Grittner	2014-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we expect batching at the very beginning, we size nbuckets for "full work_mem" (see how many tuples we can get into work_mem, while not breaking NTUP_PER_BUCKET threshold). If we expect to be fine without batching, we start with the 'right' nbuckets and track the optimal nbuckets as we go (without actually resizing the hash table). Once we hit work_mem (considering the optimal nbuckets value), we keep the value. At the end of the first batch, we check whether (nbuckets != nbuckets_optimal) and resize the hash table if needed. Also, we keep this value for all batches (it's OK because it assumes full work_mem, and it makes the batchno evaluation trivial). So the resize happens only once. There could be cases where it would improve performance to allow the NTUP_PER_BUCKET threshold to be exceeded to keep everything in one batch rather than spilling to a second batch, but attempts to generate such a case have so far been unsuccessful; that issue may be addressed with a follow-on patch after further investigation. Tomas Vondra with minor format and comment cleanup by me Reviewed by Robert Haas, Heikki Linnakangas, and Kevin Grittner
*	Implement SKIP LOCKED for row-level locks	Alvaro Herrera	2014-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This clause changes the behavior of SELECT locking clauses in the presence of locked rows: instead of causing a process to block waiting for the locks held by other processes (or raise an error, with NOWAIT), SKIP LOCKED makes the new reader skip over such rows. While this is not appropriate behavior for general purposes, there are some cases in which it is useful, such as queue-like tables. Catalog version bumped because this patch changes the representation of stored rules. Reviewed by Craig Ringer (based on a previous attempt at an implementation by Simon Riggs, who also provided input on the syntax used in the current patch), David Rowley, and Álvaro Herrera. Author: Thomas Munro
*	Row-Level Security Policies (RLS)	Stephen Frost	2014-09-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Building on the updatable security-barrier views work, add the ability to define policies on tables to limit the set of rows which are returned from a query and which are allowed to be added to a table. Expressions defined by the policy for filtering are added to the security barrier quals of the query, while expressions defined to check records being added to a table are added to the with-check options of the query. New top-level commands are CREATE/ALTER/DROP POLICY and are controlled by the table owner. Row Security is able to be enabled and disabled by the owner on a per-table basis using ALTER TABLE .. ENABLE/DISABLE ROW SECURITY. Per discussion, ROW SECURITY is disabled on tables by default and must be enabled for policies on the table to be used. If no policies exist on a table with ROW SECURITY enabled, a default-deny policy is used and no records will be visible. By default, row security is applied at all times except for the table owner and the superuser. A new GUC, row_security, is added which can be set to ON, OFF, or FORCE. When set to FORCE, row security will be applied even for the table owner and superusers. When set to OFF, row security will be disabled when allowed and an error will be thrown if the user does not have rights to bypass row security. Per discussion, pg_dump sets row_security = OFF by default to ensure that exports and backups will have all data in the table or will error if there are insufficient privileges to bypass row security. A new option has been added to pg_dump, --enable-row-security, to ask pg_dump to export with row security enabled. A new role capability, BYPASSRLS, which can only be set by the superuser, is added to allow other users to be able to bypass row security using row_security = OFF. Many thanks to the various individuals who have helped with the design, particularly Robert Haas for his feedback. Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean Rasheed, with additional changes and rework by me. Reviewers have included all of the above, Greg Smith, Jeff McCormick, and Robert Haas.
*	Fix pointer type in size passed to memset.	Heikki Linnakangas	2014-09-14
\| \| \| \| \| \| \|	Pointers are all the same size, so it makes no practical difference, but let's be tidy. Found by Coverity, noted off-list by Tom Lane.
*	Change NTUP_PER_BUCKET to 1 to improve hash join lookup speed.	Robert Haas	2014-09-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since this makes the bucket headers use ~10x as much memory, properly account for that memory when we figure out whether everything fits in work_mem. This might result in some cases that previously used only a single batch getting split into multiple batches, but it's unclear as yet whether we need defenses against that case, and if so, what the shape of those defenses should be. It's worth noting that even in these edge cases, users should still be no worse off than they would have been last week, because commit 45f6240a8fa9d35548eb2ef23dba2c11540aa02a saved a big pile of memory on exactly the same workloads. Tomas Vondra, reviewed and somewhat revised by me.