aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/cache
Commit message (Collapse)AuthorAge
* Fix a crash in logical replicationPeter Eisentraut2019-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bug was that determining which columns are part of the replica identity index using RelationGetIndexAttrBitmap() would run eval_const_expressions() on index expressions and predicates across all indexes of the table, which in turn might require a snapshot, but there wasn't one set, so it crashes. There were actually two separate bugs, one on the publisher and one on the subscriber. To trigger the bug, a table that is part of a publication or subscription needs to have an index with a predicate or expression that lends itself to constant expressions simplification. The fix is to avoid the constant expressions simplification in RelationGetIndexAttrBitmap(), so that it becomes safe to call in these contexts. The constant expressions simplification comes from the calls to RelationGetIndexExpressions()/RelationGetIndexPredicate() via BuildIndexInfo(). But RelationGetIndexAttrBitmap() calling BuildIndexInfo() is overkill. The latter just takes pg_index catalog information, packs it into the IndexInfo structure, which former then just unpacks again and throws away. We can just do this directly with less overhead and skip the troublesome calls to eval_const_expressions(). This also removes the awkward cross-dependency between relcache.c and index.c. Bug: #15114 Reported-by: Петър Славов <pet.slavov@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/152110589574.1223.17983600132321618383@wrigleys.postgresql.org/
* PANIC on fsync() failure.Thomas Munro2018-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some operating systems, it doesn't make sense to retry fsync(), because dirty data cached by the kernel may have been dropped on write-back failure. In that case the only remaining copy of the data is in the WAL. A subsequent fsync() could appear to succeed, but not have flushed the data. That means that a future checkpoint could apparently complete successfully but have lost data. Therefore, violently prevent any future checkpoint attempts by panicking on the first fsync() failure. Note that we already did the same for WAL data; this change extends that behavior to non-temporary data files. Provide a GUC data_sync_retry to control this new behavior, for users of operating systems that don't eject dirty data, and possibly forensic/testing uses. If it is set to on and the write-back error was transient, a later checkpoint might genuinely succeed (on a system that does not throw away buffers on failure); if the error is permanent, later checkpoints will continue to fail. The GUC defaults to off, meaning that we panic. Back-patch to all supported releases. There is still a narrow window for error-loss on some operating systems: if the file is closed and later reopened and a write-back error occurs in the intervening time, but the inode has the bad luck to be evicted due to memory pressure before we reopen, we could miss the error. A later patch will address that with a scheme for keeping files with dirty data open at all times, but we judge that to be too complicated to back-patch. Author: Craig Ringer, with some adjustments by Thomas Munro Reported-by: Craig Ringer Reviewed-by: Robert Haas, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/20180427222842.in2e4mibx45zdth5%40alap3.anarazel.de
* Limit depth of forced recursion for CLOBBER_CACHE_RECURSIVELY.Tom Lane2018-09-07
| | | | | | | | | | | | It's somewhat surprising that we got away with this before. (Actually, since nobody tests this routinely AFAIK, it might've been broken for awhile. But it's definitely broken in the wake of commit f868a8143.) It seems sufficient to limit the forced recursion to a small number of levels. Back-patch to all supported branches, like the preceding patch. Discussion: https://postgr.es/m/12259.1532117714@sss.pgh.pa.us
* Fix bugs in vacuum of shared rels, by keeping their relcache entries current.Andres Freund2018-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When vacuum processes a relation it uses the corresponding relcache entry's relfrozenxid / relminmxid as a cutoff for when to remove tuples etc. Unfortunately for nailed relations (i.e. critical system catalogs) bugs could frequently lead to the corresponding relcache entry being stale. This set of bugs could cause actual data corruption as vacuum would potentially not remove the correct row versions, potentially reviving them at a later point. After 699bf7d05c some corruptions in this vein were prevented, but the additional error checks could also trigger spuriously. Examples of such errors are: ERROR: found xmin ... from before relfrozenxid ... and ERROR: found multixact ... from before relminmxid ... To be caused by this bug the errors have to occur on system catalog tables. The two bugs are: 1) Invalidations for nailed relations were ignored, based on the theory that the relcache entry for such tables doesn't change. Which is largely true, except for fields like relfrozenxid etc. This means that changes to relations vacuumed in other sessions weren't picked up by already existing sessions. Luckily autovacuum doesn't have particularly longrunning sessions. 2) For shared *and* nailed relations, the shared relcache init file was never invalidated while running. That means that for such tables (e.g. pg_authid, pg_database) it's not just already existing sessions that are affected, but even new connections are as well. That explains why the reports usually were about pg_authid et. al. To fix 1), revalidate the rd_rel portion of a relcache entry when invalid. This implies a bit of extra complexity to deal with bootstrapping, but it's not too bad. The fix for 2) is simpler, simply always remove both the shared and local init files. Author: Andres Freund Reviewed-By: Alvaro Herrera Discussion: https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com Backpatch: 9.3-
* Fix improper uses of canonicalize_qual().Tom Lane2018-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the things canonicalize_qual() does is to remove constant-NULL subexpressions of top-level AND/OR clauses. It does that on the assumption that what it's given is a top-level WHERE clause, so that NULL can be treated like FALSE. Although this is documented down inside a subroutine of canonicalize_qual(), it wasn't mentioned in the documentation of that function itself, and some callers hadn't gotten that memo. Notably, commit d007a9505 caused get_relation_constraints() to apply canonicalize_qual() to CHECK constraints. That allowed constraint exclusion to misoptimize situations in which a CHECK constraint had a provably-NULL subclause, as seen in the regression test case added here, in which a child table that should be scanned is not. (Although this thinko is ancient, the test case doesn't fail before 9.2, for reasons I've not bothered to track down in detail. There may be related cases that do fail before that.) More recently, commit f0e44751d added an independent bug by applying canonicalize_qual() to index expressions, which is even sillier since those might not even be boolean. If they are, though, I think this could lead to making incorrect index entries for affected index expressions in v10. I haven't attempted to prove that though. To fix, add an "is_check" parameter to canonicalize_qual() to specify whether it should assume WHERE or CHECK semantics, and make it perform NULL-elimination accordingly. Adjust the callers to apply the right semantics, or remove the call entirely in cases where it's not known that the expression has one or the other semantics. I also removed the call in some cases involving partition expressions, where it should be a no-op because such expressions should be canonical already ... and was a no-op, independently of whether it could in principle have done something, because it was being handed the qual in implicit-AND format which isn't what it expects. In HEAD, add an Assert to catch that type of mistake in future. This represents an API break for external callers of canonicalize_qual(). While that's intentional in HEAD to make such callers think about which case applies to them, it seems like something we probably wouldn't be thanked for in released branches. Hence, in released branches, the extra parameter is added to a new function canonicalize_qual_ext(), and canonicalize_qual() is a wrapper that retains its old behavior. Patch by me with suggestions from Dean Rasheed. Back-patch to all supported branches. Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
* Fix typcache's failure to treat ranges as container types.Tom Lane2017-10-20
| | | | | | | | | | | | | | | | | Like the similar logic for arrays and records, it's necessary to examine the range's subtype to decide whether the range type can support hashing. We can omit checking the subtype for btree-defined operations, though, since range subtypes are required to have those operations. (Possibly that simplification for btree cases led us to overlook that it does not apply for hash cases.) This is only an issue if the subtype lacks hash support, which is not true of any built-in range type, but it's easy to demonstrate a problem with a range type over, eg, money: you can get a "could not identify a hash function" failure when the planner is misled into thinking that hash join or aggregation would work. This was born broken, so back-patch to all supported branches.
* Remove AtEOXact_CatCache().Tom Lane2017-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
* Fix new warnings from GCC 7Peter Eisentraut2017-05-16
| | | | | This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
* Avoid searching for callback functions in CallSyscacheCallbacks().Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | We have now grown enough registerable syscache-invalidation callback functions that the original assumption that there would be few of them is causing performance problems. In particular, let's fix things so that CallSyscacheCallbacks doesn't have to search the whole array to find which callback(s) to invoke for a given cache ID. Preserve the original behavior that callbacks are called in order of registration, just in case there's someplace that depends on that (which I doubt). In support of this, export the number of syscaches from syscache.h. People could have found that out anyway from the enum, but adding a #define makes that much safer. This provides a useful additional speedup in Mathieu Fenniak's logical-decoding test case, although we're reaching the point of diminishing returns there. I think any further improvement will have to come from reducing the number of cache invalidations that are triggered in the first place. Still, we can hope that this change gives some incremental benefit for all invalidation scenarios. Back-patch to 9.4 where logical decoding was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Reduce initial size of RelfilenodeMapHash.Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | A test case provided by Mathieu Fenniak shows that hash_seq_search'ing this hashtable can consume a very significant amount of overhead during logical decoding, which triggers frequent cache invalidation. Testing suggests that the actual population of the hashtable is often no more than a few dozen entries, so we can cut the overhead just by dropping the initial number of buckets down from 1024 --- I chose to cut it to 64. (In situations where we do have a significant number of entries, we shouldn't get any real penalty from doing this, as the dynahash.c code will resize the hashtable automatically.) This gives a further factor-of-two savings in Mathieu's test case. That may be overly optimistic for real-world benefit, as real cases may have larger average table populations, but it's hard to see it turning into a net negative for any workload. Back-patch to 9.4 where relfilenodemap.c was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Avoid searching for the target catcache in CatalogCacheIdInvalidate.Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | A test case provided by Mathieu Fenniak shows that the initial search for the target catcache in CatalogCacheIdInvalidate consumes a very significant amount of overhead in cases where cache invalidation is triggered but has little useful work to do. There is no good reason for that search to exist at all, as the index array maintained by syscache.c allows direct lookup of the catcache from its ID. We just need a frontend function in syscache.c, matching the division of labor for most other cache-accessing operations. While there's more that can be done in this area, this patch alone reduces the runtime of Mathieu's example by 2X. We can hope that it offers some useful benefit in other cases too, although usually cache invalidation overhead is not such a striking fraction of the total runtime. Back-patch to 9.4 where logical decoding was introduced. It might be worth going further back, but presently the only case we know of where cache invalidation is really a significant burden is in logical decoding. Also, older branches have fewer catcaches, reducing the possible benefit. (Note: although this nominally changes catcache's API, we have always documented CatalogCacheIdInvalidate as a private function, so I would have little sympathy for an external module calling it directly. So backpatching should be fine.) Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Increase MAX_SYSCACHE_CALLBACKS to provide more room for extensions.Tom Lane2017-05-11
| | | | | | | | | | | | | | | | | | Increase from the historical value of 32 to 64. We are up to 31 callers of CacheRegisterSyscacheCallback() in HEAD, so if they were all to be exercised in one process that would leave only one slot for add-on modules. It's probably not possible for that to happen, but still we clearly need more daylight here. (At some point it might be worth making the array dynamically resizable; but since we've never heard a complaint of "out of syscache_callback_list slots" happening in the field, I doubt it's worth it yet.) Back-patch as far as 9.4, which is where we increased the companion limit MAX_RELCACHE_CALLBACKS (cf commit f01d1ae3a). It's not as urgent in released branches, which have only a couple dozen call sites in core, but it still seems that somebody might hit the limit before these branches die. Discussion: https://postgr.es/m/12184.1494450131@sss.pgh.pa.us
* Further patch rangetypes_selfuncs.c's statistics slot management.Tom Lane2017-05-08
| | | | | | | | | | | | | | | Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8, not of the type of the column the statistics are for. This bug is at least partly the fault of sloppy specification comments for get_attstatsslot()/free_attstatsslot(): the type OID they want is that of the stavalues entries, not of the underlying column. (I double-checked other callers and they seem to get this right.) Adjust the comments to be more correct. Per buildfarm. Security: CVE-2017-7484
* Avoid returning stale attribute bitmaps in RelationGetIndexAttrBitmap().Tom Lane2017-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem with the original coding here is that we might receive (and clear) a relcache invalidation signal for the target relation down inside one of the index_open calls we're doing. Since the target is open, we would not drop the relcache entry, just reset its rd_indexvalid and rd_indexlist fields. But RelationGetIndexAttrBitmap() kept going, and would eventually cache and return potentially-obsolete attribute bitmaps. The case where this matters is where the inval signal was from a CREATE INDEX CONCURRENTLY telling us about a new index on a formerly-unindexed column. (In all other cases, the lock we hold on the target rel should prevent any concurrent change in index state.) Even just returning the stale attribute bitmap is not such a problem, because it shouldn't matter during the transaction in which we receive the signal. What hurts is caching the stale data, because it can survive into later transactions, breaking CREATE INDEX CONCURRENTLY's expectation that later transactions will not create new broken HOT chains. The upshot is that there's a window for building corrupted indexes during CREATE INDEX CONCURRENTLY. This patch fixes the problem by rechecking that the set of index OIDs is still the same at the end of RelationGetIndexAttrBitmap() as it was at the start. If not, we loop back and try again. That's a little more than is strictly necessary to fix the bug --- in principle, we could return the stale data but not cache it --- but it seems like a bad idea on general principles for relcache to return data it knows is stale. There might be more hazards of the same ilk, or there might be a better way to fix this one, but this patch definitely improves matters and seems unlikely to make anything worse. So let's push it into today's releases even as we continue to study the problem. Pavan Deolasee and myself Discussion: https://postgr.es/m/CABOikdM2MUq9cyZJi1KyLmmkCereyGp5JQ4fuwKoyKEde_mzkQ@mail.gmail.com
* Fix typos in comments.Heikki Linnakangas2017-02-06
| | | | | | | | | Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
* Invalidate cached plans on FDW option changes.Tom Lane2017-01-06
| | | | | | | | | | | | | | | | | | | | | | | | This fixes problems where a plan must change but fails to do so, as seen in a bug report from Rajkumar Raghuwanshi. For ALTER FOREIGN TABLE OPTIONS, do this through the standard method of forcing a relcache flush on the table. For ALTER FOREIGN DATA WRAPPER and ALTER SERVER, just flush the whole plan cache on any change in pg_foreign_data_wrapper or pg_foreign_server. That matches the way we handle some other low-probability cases such as opclass changes, and it's unclear that the case arises often enough to be worth working harder. Besides, that gives a patch that is simple enough to back-patch with confidence. Back-patch to 9.3. In principle we could apply the code change to 9.2 as well, but (a) we lack postgres_fdw to test it with, (b) it's doubtful that anyone is doing anything exciting enough with FDWs that far back to need this desperately, and (c) the patch doesn't apply cleanly. Patch originally by Amit Langote, reviewed by Etsuro Fujita and Ashutosh Bapat, who each contributed substantial changes as well. Discussion: https://postgr.es/m/CAKcux6m5cA6rRPTKkqVdJ-R=KKDfe35Q_ZuUqxDSV_4hwga=og@mail.gmail.com
* Silence compiler warningsJoe Conway2017-01-02
| | | | | | | | | | | | In GetCachedPlan(), initialize 'plan' to silence a compiler warning, but also add an Assert() to make sure we don't ever actually fall through with 'plan' still being set to NULL, since we are about to dereference it. Back-patch back to 9.2. Author: Stephen Frost Discussion: https://postgr.es/m/20161129152102.GR13284%40tamriel.snowman.net
* Reset plan->row_security_env and planUserIdStephen Frost2016-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the plancache, we check if the environment we planned the query under has changed in a way which requires us to re-plan, such as when the user for whom the plan was prepared changes and RLS is being used (and, therefore, there may be different policies to apply). Unfortunately, while those values were set and checked, they were not being reset when the query was re-planned and therefore, in cases where we change role, re-plan, and then change role again, we weren't re-planning again. This leads to potentially incorrect policies being applied in cases where role-specific policies are used and a given query is planned under one role and then executed under other roles, which could happen under security definer functions or when a common user and query is planned initially and then re-used across multiple SET ROLEs. Further, extensions which made use of CopyCachedPlan() may suffer from similar issues as the RLS-related fields were not properly copied as part of the plan and therefore RevalidateCachedQuery() would copy in the current settings without invalidating the query. Fix by using the same approach used for 'search_path', where we set the correct values in CompleteCachedPlan(), check them early on in RevalidateCachedQuery() and then properly reset them if re-planning. Also, copy through the values during CopyCachedPlan(). Pointed out by Ashutosh Bapat. Reviewed by Michael Paquier. Back-patch to 9.5 where RLS was introduced. Security: CVE-2016-2193
* Avoid caching expression state trees for domain constraints across queries.Tom Lane2015-11-29
| | | | | | | | | | | | | | | | | | | | | | In commit 8abb3cda0ddc00a0ab98977a1633a95b97068d4e I attempted to cache the expression state trees constructed for domain CHECK constraints for the life of the backend (assuming the domain's constraints don't get redefined). However, this turns out not to work very well, because execQual.c will run those state trees with ecxt_per_query_memory pointing to a query-lifespan context, and in some situations we'll end up with pointers into that context getting stored into the state trees. This happens in particular with SQL-language functions, as reported by Emre Hasegeli, but there are many other cases. To fix, keep only the expression plan trees for domain CHECK constraints in the typcache's data structure, and revert to performing ExecInitExpr (at least) once per query to set up expression state trees in the query's context. Eventually it'd be nice to undo this, but that will require some careful thought about memory management for expression state trees, and it seems far too late for any such redesign in 9.5. This way is still much more efficient than what happened before 8abb3cda0.
* Remove the SECURITY_ROW_LEVEL_DISABLED security context bit.Noah Misch2015-09-20
| | | | | | | | | | This commit's parent made superfluous the bit's sole usage. Referential integrity checks have long run as the subject table's owner, and that now implies RLS bypass. Safe use of the bit was tricky, requiring strict control over the SQL expressions evaluating therein. Back-patch to 9.5, where the bit was introduced. Based on a patch by Stephen Frost.
* RLS refactoringStephen Frost2015-09-15
| | | | | | | | | | | | | | | | This refactors rewrite/rowsecurity.c to simplify the handling of the default deny case (reducing the number of places where we check for and add the default deny policy from three to one) by splitting up the retrival of the policies from the application of them. This also allowed us to do away with the policy_id field. A policy_name field was added for WithCheckOption policies and is used in error reporting, when available. Patch by Dean Rasheed, with various mostly cosmetic changes by me. Back-patch to 9.5 where RLS was introduced to avoid unnecessary differences, since we're still in alpha, per discussion with Robert.
* Fix subtransaction cleanup after an outer-subtransaction portal fails.Tom Lane2015-09-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Formerly, we treated only portals created in the current subtransaction as having failed during subtransaction abort. However, if the error occurred while running a portal created in an outer subtransaction (ie, a cursor declared before the last savepoint), that has to be considered broken too. To allow reliable detection of which ones those are, add a bookkeeping field to struct Portal that tracks the innermost subtransaction in which each portal has actually been executed. (Without this, we'd end up failing portals containing functions that had called the subtransaction, thereby breaking plpgsql exception blocks completely.) In addition, when we fail an outer-subtransaction Portal, transfer its resources into the subtransaction's resource owner, so that they're released early in cleanup of the subxact. This fixes a problem reported by Jim Nasby in which a function executed in an outer-subtransaction cursor could cause an Assert failure or crash by referencing a relation created within the inner subtransaction. The proximate cause of the Assert failure is that AtEOSubXact_RelationCache assumed it could blow away a relcache entry without first checking that the entry had zero refcount. That was a bad idea on its own terms, so add such a check there, and to the similar coding in AtEOXact_RelationCache. This provides an independent safety measure in case there are still ways to provoke the situation despite the Portal-level changes. This has been broken since subtransactions were invented, so back-patch to all supported branches. Tom Lane and Michael Paquier
* Plug RLS related information leak in pg_stats view.Joe Conway2015-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The pg_stats view is supposed to be restricted to only show rows about tables the user can read. However, it sometimes can leak information which could not otherwise be seen when row level security is enabled. Fix that by not showing pg_stats rows to users that would be subject to RLS on the table the row is related to. This is done by creating/using the newly introduced SQL visible function, row_security_active(). Along the way, clean up three call sites of check_enable_rls(). The second argument of that function should only be specified as other than InvalidOid when we are checking as a different user than the current one, as in when querying through a view. These sites were passing GetUserId() instead of InvalidOid, which can cause the function to return incorrect results if the current user has the BYPASSRLS privilege and row_security has been set to OFF. Additionally fix a bug causing RI Trigger error messages to unintentionally leak information when RLS is enabled, and other minor cleanup and improvements. Also add WITH (security_barrier) to the definition of pg_stats. Bumped CATVERSION due to new SQL functions and pg_stats view definition. Back-patch to 9.5 where RLS was introduced. Reported by Yaroslav. Patch by Joe Conway and Dean Rasheed with review and input by Michael Paquier and Stephen Frost.
* Redesign tablesample method API, and do extensive code review.Tom Lane2015-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original implementation of TABLESAMPLE modeled the tablesample method API on index access methods, which wasn't a good choice because, without specialized DDL commands, there's no way to build an extension that can implement a TSM. (Raw inserts into system catalogs are not an acceptable thing to do, because we can't undo them during DROP EXTENSION, nor will pg_upgrade behave sanely.) Instead adopt an API more like procedural language handlers or foreign data wrappers, wherein the only SQL-level support object needed is a single handler function identified by having a special return type. This lets us get rid of the supporting catalog altogether, so that no custom DDL support is needed for the feature. Adjust the API so that it can support non-constant tablesample arguments (the original coding assumed we could evaluate the argument expressions at ExecInitSampleScan time, which is undesirable even if it weren't outright unsafe), and discourage sampling methods from looking at invisible tuples. Make sure that the BERNOULLI and SYSTEM methods are genuinely repeatable within and across queries, as required by the SQL standard, and deal more honestly with methods that can't support that requirement. Make a full code-review pass over the tablesample additions, and fix assorted bugs, omissions, infelicities, and cosmetic issues (such as failure to put the added code stanzas in a consistent ordering). Improve EXPLAIN's output of tablesample plans, too. Back-patch to 9.5 so that we don't have to support the original API in production.
* Fix the logic for putting relations into the relcache init file.Tom Lane2015-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit f3b5565dd4e59576be4c772da364704863e6a835 was a couple of bricks shy of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index into the relcache init file, because that index is not used by any syscache. However, we have historically nailed that index into cache for performance reasons. The upshot was that load_relcache_init_file always decided that the init file was busted and silently ignored it, resulting in a significant hit to backend startup speed. To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around RelationSupportsSysCache(), which can know about additional relations that should be in the init file despite being unknown to syscache.c. Also install some guards against future mistakes of this type: make write_relcache_init_file Assert that all nailed relations get written to the init file, and make load_relcache_init_file emit a WARNING if it takes the "wrong number of nailed relations" exit path. Now that we remove the init files during postmaster startup, that case should never occur in the field, even if we are starting a minor-version update that added or removed rels from the nailed set. So the warning shouldn't ever be seen by end users, but it will show up in the regression tests if somebody breaks this logic. Back-patch to all supported branches, like the previous commit.
* Use a safer method for determining whether relcache init file is stale.Tom Lane2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit 786340441706ac1957a031f11ad1c2e5b6e18314), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Collection of typo fixes.Heikki Linnakangas2015-05-20
| | | | | | | | | | | | | | | Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
* TABLESAMPLE, SQL Standard and extensibleSimon Riggs2015-05-15
| | | | | | | | | | | | | | Add a TABLESAMPLE clause to SELECT statements that allows user to specify random BERNOULLI sampling or block level SYSTEM sampling. Implementation allows for extensible sampling functions to be written, using a standard API. Basic version follows SQLStandard exactly. Usable concrete use cases for the sampling API follow in later commits. Petr Jelinek Reviewed by Michael Paquier and Simon Riggs
* Add new OID alias type regnamespaceAndrew Dunstan2015-05-09
| | | | | | Catalog version bumped Kyotaro HORIGUCHI
* Add new OID alias type regroleAndrew Dunstan2015-05-09
| | | | | | | | | | | | | | The new type has the scope of whole the database cluster so it doesn't behave the same as the existing OID alias types which have database scope, concerning object dependency. To avoid confusion constants of the new type are prohibited from appearing where dependencies are made involving it. Also, add a note to the docs about possible MVCC violation and optimization issues, which are general over the all reg* types. Kyotaro Horiguchi
* Introduce replication progress tracking infrastructure.Andres Freund2015-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit 73c986add) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de
* Fix various typos and grammar errors in comments.Andres Freund2015-04-26
| | | | | Author: Dmitriy Olshevskiy Discussion: 553D00A6.4090205@bk.ru
* Add transforms featurePeter Eisentraut2015-04-26
| | | | | | | | This provides a mechanism for specifying conversions between SQL data types and procedural languages. As examples, there are transforms for hstore and ltree for PL/Perl and PL/Python. reviews by Pavel Stěhule and Andres Freund
* Fix typo in relcache's equalPolicy()Stephen Frost2015-04-17
| | | | | | | | | | | | | The USING policies were not being checked for differences as the same policy was being passed in to both sides of the equal(). This could result in backends not realizing that a policy had been changed, if none of the other attributes had been changed. Fix by passing to equal() the policy1 and policy2 using quals for comparison. No need to back-patch as this is not yet released. Noticed while testing changes to RLS proposed by Dean Rasheed.
* Reorganize our CRC source files again.Heikki Linnakangas2015-04-14
| | | | | | | | | | Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.
* pg_event_trigger_dropped_objects: add is_temp columnAlvaro Herrera2015-04-06
| | | | | | | | | | It now also reports temporary objects dropped that are local to the backend. Previously we weren't reporting any temp objects because it was deemed unnecessary; but as it turns out, it is necessary if we want to keep close track of DDL command execution inside one session. Temp objects are reported as living in schema pg_temp, which works because such a schema-qualification always refers to the temp objects of the current session.
* Apply table and domain CHECK constraints in name order.Tom Lane2015-03-23
| | | | | | | | | | | | | | | | | | | | | | | | Previously, CHECK constraints of the same scope were checked in whatever order they happened to be read from pg_constraint. (Usually, but not reliably, this would be creation order for domain constraints and reverse creation order for table constraints, because of differing implementation details.) Nondeterministic results of this sort are problematic at least for testing purposes, and in discussion it was agreed to be a violation of the principle of least astonishment. Therefore, borrow the principle already established for triggers, and apply such checks in name order (using strcmp() sort rules). This lets users control the check order if they have a mind to. Domain CHECK constraints still follow the rule of checking lower nested domains' constraints first; the name sort only applies to multiple constraints attached to the same domain. In passing, I failed to resist the temptation to wordsmith a bit in create_domain.sgml. Apply to HEAD only, since this could result in a behavioral change in existing applications, and the potential regression test failures have not actually been observed in our buildfarm.
* Use the typcache to cache constraints for domain types.Tom Lane2015-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we cached domain constraints for the life of a query, or really for the life of the FmgrInfo struct that was used to invoke domain_in() or domain_check(). But plpgsql (and probably other places) are set up to cache such FmgrInfos for the whole lifespan of a session, which meant they could be enforcing really stale sets of constraints. On the other hand, searching pg_constraint once per query gets kind of expensive too: testing says that as much as half the runtime of a trivial query such as "SELECT 0::domaintype" went into that. To fix this, delegate the responsibility for tracking a domain's constraints to the typcache, which has the infrastructure needed to detect syscache invalidation events that signal possible changes. This not only removes unnecessary repeat reads of pg_constraint, but ensures that we never apply stale constraint data: whatever we use is the current data according to syscache rules. Unfortunately, the current configuration of the system catalogs means we have to flush cached domain-constraint data whenever either pg_type or pg_constraint changes, which happens rather a lot (eg, creation or deletion of a temp table will do it). It might be worth rearranging things to split pg_constraint into two catalogs, of which the domain constraint one would probably be very low-traffic. That's a job for another patch though, and in any case this patch should improve matters materially even with that handicap. This patch makes use of the recently-added memory context reset callback feature to manage the lifespan of domain constraint caches, so that we don't risk deleting a cache that might be in the midst of evaluation. Although this is a bug fix as well as a performance improvement, no back-patch. There haven't been many if any field complaints about stale domain constraint checks, so it doesn't seem worth taking the risk of modifying data structures as basic as MemoryContexts in back branches.
* Use FLEXIBLE_ARRAY_MEMBER in some more places.Tom Lane2015-02-20
| | | | | | Fix a batch of structs that are only visible within individual .c files. Michael Paquier
* Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.Tom Lane2015-02-20
| | | | | | | | | | | | | | | | Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]". Aside from being more self-documenting, this should help prevent bogus warnings from static code analyzers and perhaps compiler misoptimizations. This patch is just a down payment on eliminating the whole problem, but it gets rid of a lot of easy-to-fix cases. Note that the main problem with doing this is that one must no longer rely on computing sizeof(the containing struct), since the result would be compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf also warns against spelling that offsetof(struct, lastfield[0]). Michael Paquier, review and additional fixes by me.
* Fix column-privilege leak in error-message pathsStephen Frost2015-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Further, in master only, do not return any data for cases where row security is enabled on the relation and row security should be applied for the user. This required a bit of refactoring and moving of things around related to RLS- note the addition of utils/misc/rls.c. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit.
* Clean up some mess in row-security patches.Tom Lane2015-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix unsafe coding around PG_TRY in RelationBuildRowSecurity: can't change a variable inside PG_TRY and then use it in PG_CATCH without marking it "volatile". In this case though it seems saner to avoid that by doing a single assignment before entering the TRY block. I started out just intending to fix that, but the more I looked at the row-security code the more distressed I got. This patch also fixes incorrect construction of the RowSecurityPolicy cache entries (there was not sufficient care taken to copy pass-by-ref data into the cache memory context) and a whole bunch of sloppiness around the definition and use of pg_policy.polcmd. You can't use nulls in that column because initdb will mark it NOT NULL --- and I see no particular reason why a null entry would be a good idea anyway, so changing initdb's behavior is not the right answer. The internal value of '\0' wouldn't be suitable in a "char" column either, so after a bit of thought I settled on using '*' to represent ALL. Chasing those changes down also revealed that somebody wasn't paying attention to what the underlying values of ACL_UPDATE_CHR etc really were, and there was a great deal of lackadaiscalness in the catalogs.sgml documentation for pg_policy and pg_policies too. This doesn't pretend to be a complete code review for the row-security stuff, it just fixes the things that were in my face while dealing with the bugs in RelationBuildRowSecurity.
* Correctly handle relcache invalidation corner case during logical decoding.Andres Freund2015-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using a historic snapshot for logical decoding it can validly happen that a relation that's in the relcache isn't visible to that historic snapshot. E.g. if a newly created relation is referenced in the query that uses the SQL interface for logical decoding and a sinval reset occurs. The earlier commit that fixed the error handling for that corner case already improves the situation as a ERROR is better than hitting an assertion... But it's obviously not good enough. So additionally allow that case without an error if a historic snapshot is set up - that won't allow an invalid entry to stay in the cache because it's a) already marked invalid and will thus be rebuilt during the next access b) the syscaches will be reset at the end of decoding. There might be prettier solutions to handle this case, but all that we could think of so far end up being much more complex than this quite simple fix. This fixes the assertion failures reported by the buildfarm (markhor, tick, leech) after the introduction of new regression tests in 89fd41b390a4. The failure there weren't actually directly caused by CLOBBER_CACHE_ALWAYS but the extraordinary long runtimes due to it lead to sinval resets triggering the behaviour. Discussion: 22459.1418656530@sss.pgh.pa.us Backpatch to 9.4 where logical decoding was introduced.
* Improve relcache invalidation handling of currently invisible relations.Andres Freund2015-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | The corner case where a relcache invalidation tried to rebuild the entry for a referenced relation but couldn't find it in the catalog wasn't correct. The code tried to RelationCacheDelete/RelationDestroyRelation the entry. That didn't work when assertions are enabled because the latter contains an assertion ensuring the refcount is zero. It's also more generally a bad idea, because by virtue of being referenced somebody might actually look at the entry, which is possible if the error is trapped and handled via a subtransaction abort. Instead just error out, without deleting the entry. As the entry is marked invalid, the worst that can happen is that the invalid (and at some point unused) entry lingers in the relcache. Discussion: 22459.1418656530@sss.pgh.pa.us There should be no way to hit this case < 9.4 where logical decoding introduced a bug that can hit this. But since the code for handling the corner case is there it should do something halfway sane, so backpatch all the the way back. The logical decoding bug will be handled in a separate commit.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Improve hash_create's API for selecting simple-binary-key hash functions.Tom Lane2014-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate. Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely. In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before. For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c. Teodor Sigaev and Tom Lane
* Event Trigger for table_rewriteSimon Riggs2014-12-08
| | | | | | | | | | | | | | Generate a table_rewrite event when ALTER TABLE attempts to rewrite a table. Provide helper functions to identify table and reason. Intended use case is to help assess or to react to schema changes that might hold exclusive locks for long periods. Dimitri Fontaine, triggering an edit by Simon Riggs Reviewed in detail by Michael Paquier
* Improve typcache: cache negative lookup results, add invalidation logic.Tom Lane2014-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if the typcache had for example tried and failed to find a hash opclass for a given data type, it would nonetheless repeat the unsuccessful catalog lookup each time it was asked again. This can lead to a significant amount of useless bufmgr traffic, as in a recent report from Scott Marlowe. Like the catalog caches, typcache should be able to cache negative results. This patch arranges that by making use of separate flag bits to remember whether a particular item has been looked up, rather than treating a zero OID as an indicator that no lookup has been done. Also, install a credible invalidation mechanism, namely watching for inval events in pg_opclass. The sole advantage of the lack of negative caching was that the code would cope if operators or opclasses got added for a type mid-session; to preserve that behavior we have to be able to invalidate stale lookup results. Updates in pg_opclass should be pretty rare in production systems, so it seems sufficient to just invalidate all the dependent data whenever one happens. Adding proper invalidation also means that this code will now react sanely if an opclass is dropped mid-session. Arguably, that's a back-patchable bug fix, but in view of the lack of complaints from the field I'll refrain from back-patching. (Probably, in most cases where an opclass is dropped, the data type itself is dropped soon after, so that this misfeasance has no bad consequences.)
* Rename pg_rowsecurity -> pg_policy and other fixesStephen Frost2014-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | As pointed out by Robert, we should really have named pg_rowsecurity pg_policy, as the objects stored in that catalog are policies. This patch fixes that and updates the column names to start with 'pol' to match the new catalog name. The security consideration for COPY with row level security, also pointed out by Robert, has also been addressed by remembering and re-checking the OID of the relation initially referenced during COPY processing, to make sure it hasn't changed under us by the time we finish planning out the query which has been built. Robert and Alvaro also commented on missing OCLASS and OBJECT entries for POLICY (formerly ROWSECURITY or POLICY, depending) in various places. This patch fixes that too, which also happens to add the ability to COMMENT on policies. In passing, attempt to improve the consistency of messages, comments, and documentation as well. This removes various incarnations of 'row-security', 'row-level security', 'Row-security', etc, in favor of 'policy', 'row level security' or 'row_security' as appropriate. Happy Thanksgiving!