aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/cache/relcache.c
Commit message (Collapse)AuthorAge
* Arrange to cache FdwRoutine structs in foreign tables' relcache entries.Tom Lane2013-03-06
| | | | | | | This saves several catalog lookups per reference. It's not all that exciting right now, because we'd managed to minimize the number of places that need to fetch the data; but the upcoming writable-foreign-tables patch needs this info in a lot more places.
* Add a materialized view relations.Kevin Grittner2013-03-03
| | | | | | | | | | | | | | | | | | | | | | A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.
* Move relpath() to libpgcommonAlvaro Herrera2013-02-21
| | | | | | | This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.
* Provide database object names as separate fields in error messages.Tom Lane2013-01-29
| | | | | | | | | | | | | | | | | | This patch addresses the problem that applications currently have to extract object names from possibly-localized textual error messages, if they want to know for example which index caused a UNIQUE_VIOLATION failure. It adds new error message fields to the wire protocol, which can carry the name of a table, table column, data type, or constraint associated with the error. (Since the protocol spec has always instructed clients to ignore unrecognized field types, this should not create any compatibility problem.) Support for providing these new fields has been added to just a limited set of error reports (mainly, those in the "integrity constraint violation" SQLSTATE class), but we will doubtless add them to more calls in future. Pavel Stehule, reviewed and extensively revised by Peter Geoghegan, with additional hacking by Tom Lane.
* Improve concurrency of foreign key lockingAlvaro Herrera2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
* Fix an O(N^2) performance issue for sessions modifying many relations.Tom Lane2013-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | AtEOXact_RelationCache() scanned the entire relation cache at the end of any transaction that created a new relation or assigned a new relfilenode. Thus, clients such as pg_restore had an O(N^2) performance problem that would start to be noticeable after creating 10000 or so tables. Since typically only a small number of relcache entries need any cleanup, we can fix this by keeping a small list of their OIDs and doing hash_searches for them. We fall back to the full-table scan if the list overflows. Ideally, the maximum list length would be set at the point where N hash_searches would cost just less than the full-table scan. Some quick experimentation says that point might be around 50-100; I (tgl) conservatively set MAX_EOXACT_LIST = 32. For the case that we're worried about here, which is short single-statement transactions, it's unlikely there would ever be more than about a dozen list entries anyway; so it's probably not worth being too tense about the value. We could avoid the hash_searches by instead keeping the target relcache entries linked into a list, but that would be noticeably more complicated and bug-prone because of the need to maintain such a list in the face of relcache entry drops. Since a relcache entry can only need such cleanup after a somewhat-heavyweight filesystem operation, trying to save a hash_search per cleanup doesn't seem very useful anyway --- it's the scan over all the not-needing-cleanup entries that we wish to avoid here. Jeff Janes, reviewed and tweaked a bit by Tom Lane
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Keep rd_newRelfilenodeSubid across overflow.Simon Riggs2012-12-24
| | | | | | | Teach RelationCacheInvalidate() to keep rd_newRelfilenodeSubid across rel cache message overflows, so that behaviour is now fully deterministic. Noah Misch
* Fix failure to ignore leftover temp tables after a server crash.Tom Lane2012-12-17
| | | | | | | | | | | | | | | | | | | During crash recovery, we remove disk files belonging to temporary tables, but the system catalog entries for such tables are intentionally not cleaned up right away. Instead, the first backend that uses a temp schema is expected to clean out any leftover objects therein. This approach requires that we be careful to ignore leftover temp tables (since any actual access attempt would fail), *even if their BackendId matches our session*, if we have not yet established use of the session's corresponding temp schema. That worked fine in the past, but was broken by commit debcec7dc31a992703911a9953e299c8d730c778 which incorrectly removed the rd_islocaltemp relcache flag. Put it back, and undo various changes that substituted tests like "rel->rd_backend == MyBackendId" for use of a state-aware flag. Per trouble report from Heikki Linnakangas. Back-patch to 9.1 where the erroneous change was made. In the back branches, be careful to add rd_islocaltemp in a spot in the struct that was alignment padding before, so as not to break existing add-on code.
* Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.Tom Lane2012-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.
* Make equal() ignore CoercionForm fields for better planning with casts.Tom Lane2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it. Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.) Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.
* Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera2012-08-30
| | | | | | | | | | | | This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
* Split resowner.hAlvaro Herrera2012-08-28
| | | | | This lets files that are mere users of ResourceOwner not automatically include the headers for stuff that is managed by the resowner mechanism.
* Remove RELKIND_UNCATALOGED.Robert Haas2012-06-14
| | | | | | | This may have been important at some point in the past, but it no longer does anything useful. Review by Tom Lane.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Recast "ONLY" column CHECK constraints as NO INHERITAlvaro Herrera2012-04-20
| | | | | | | | | | | | | | | | | The original syntax wasn't universally loved, and it didn't allow its usage in CREATE TABLE, only ALTER TABLE. It now works everywhere, and it also allows using ALTER TABLE ONLY to add an uninherited CHECK constraint, per discussion. The pg_constraint column has accordingly been renamed connoinherit. This commit partly reverts some of the changes in 61d81bd28dbec65a6b144e0cd3d0bfe25913c3ac, particularly some pg_dump and psql bits, because now pg_get_constraintdef includes the necessary NO INHERIT within the constraint definition. Author: Nikhil Sontakke Some tweaks by me
* Add DROP INDEX CONCURRENTLY [IF EXISTS], uses ShareUpdateExclusiveLockSimon Riggs2012-04-06
|
* Do not access indclass through Form_pg_indexPeter Eisentraut2012-01-27
| | | | | | | | | | | | Normally, accessing variable-length members of catalog structures past the first one doesn't work at all. Here, it happened to work because indnatts was checked to be 1, and so the defined FormData_pg_index layout, using int2vector[1] and oidvector[1] for variable-length arrays, happened to match the actual memory layout. But it's a very fragile assumption, and it's not in a performance-critical path, so code it properly using heap_getattr() instead. bug analysis by Tom Lane
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Add a security_barrier option for views.Robert Haas2011-12-22
| | | | | | | | | | | | | | When a view is marked as a security barrier, it will not be pulled up into the containing query, and no quals will be pushed down into it, so that no function or operator chosen by the user can be applied to rows not exposed by the view. Views not configured with this option cannot provide robust row-level security, but will perform far better. Patch by KaiGai Kohei; original problem report by Heikki Linnakangas (in October 2009!). Review (in earlier versions) by Noah Misch and others. Design advice by Tom Lane and myself. Further review and cleanup by me.
* Allow CHECK constraints to be declared ONLYAlvaro Herrera2011-12-19
| | | | | | | | | | | | | This makes them enforceable only on the parent table, not on children tables. This is useful in various situations, per discussion involving people bitten by the restrictive behavior introduced in 8.4. Message-Id: 8762mp93iw.fsf@comcast.net CAFaPBrSMMpubkGf4zcRL_YL-AERUbYF_-ZNNYfb3CVwwEqc9TQ@mail.gmail.com Authors: Nikhil Sontakke, Alex Hunsaker Reviewed by Robert Haas and myself
* Measure the number of all-visible pages for use in index-only scan costing.Tom Lane2011-10-14
| | | | | | | | | | | | | | | | | Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. This is pretty primitive and will no doubt see refinements once we've acquired more field experience with the index-only scan mechanism, but it's way better than using a constant. Note: I had to adjust an underspecified query in the window.sql regression test, because it was changing answers when the plan changed to use an index-only scan. Some of the adjacent tests perhaps should be adjusted as well, but I didn't do that here.
* Support index-only scans using the visibility map to avoid heap fetches.Tom Lane2011-10-07
| | | | | | | | | | | | | When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
* Clean up the #include mess a little.Tom Lane2011-09-04
| | | | | | | | | | | | | | | | | walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Fix a missed case in code for "moving average" estimate of reltuples.Tom Lane2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | It is possible for VACUUM to scan no pages at all, if the visibility map shows that all pages are all-visible. In this situation VACUUM has no new information to report about the relation's tuple density, so it wasn't changing pg_class.reltuples ... but it updated pg_class.relpages anyway. That's wrong in general, since there is no evidence to justify changing the density ratio reltuples/relpages, but it's particularly bad if the previous state was relpages=reltuples=0, which means "unknown tuple density". We just replaced "unknown" with "zero". ANALYZE would eventually recover from this, but it could take a lot of repetitions of ANALYZE to do so if the relation size is much larger than the maximum number of pages ANALYZE will scan, because of the moving-average behavior introduced by commit b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. The only known situation where we could have relpages=reltuples=0 and yet the visibility map asserts everything's visible is immediately following a pg_upgrade. It might be advisable for pg_upgrade to try to preserve the relpages/reltuples statistics; but in any case this code is wrong on its own terms, so fix it. Per report from Sergey Koposov. Back-patch to 8.4, where the visibility map was introduced, same as the previous change.
* Fix incorrect order of operations during sinval reset processing.Tom Lane2011-08-16
| | | | | | | | | | | | | | | | | | | | | | | | | We have to be sure that we have revalidated each nailed-in-cache relcache entry before we try to use it to load data for some other relcache entry. The introduction of "mapped relations" in 9.0 broke this, because although we updated the state kept in relmapper.c early enough, we failed to propagate that information into relcache entries soon enough; in particular, we could try to fetch pg_class rows out of pg_class before we'd updated its relcache entry's rd_node.relNode value from the map. This bug accounts for Dave Gould's report of failures after "vacuum full pg_class", and I believe that there is risk for other system catalogs as well. The core part of the fix is to copy relmapper data into the relcache entries during "phase 1" in RelationCacheInvalidate(), before they'll be used in "phase 2". To try to future-proof the code against other similar bugs, I also rearranged the order in which nailed relations are visited during phase 2: now it's pg_class first, then pg_class_oid_index, then other nailed relations. This should ensure that RelationClearRelation can apply RelationReloadIndexInfo to all nailed indexes without risking use of not-yet-revalidated relcache entries. Back-patch to 9.0 where the relation mapper was introduced.
* Fix race condition in relcache init file invalidation.Tom Lane2011-08-16
| | | | | | | | | | | | | | | | The previous code tried to synchronize by unlinking the init file twice, but that doesn't actually work: it leaves a window wherein a third process could read the already-stale init file but miss the SI messages that would tell it the data is stale. The result would be bizarre failures in catalog accesses, typically "could not read block 0 in file ..." later during startup. Instead, hold RelCacheInitLock across both the unlink and the sending of the SI messages. This is more straightforward, and might even be a bit faster since only one unlink call is needed. This has been wrong since it was put in (in 2002!), so back-patch to all supported releases.
* Avoid index rebuild for no-rewrite ALTER TABLE .. ALTER TYPE.Robert Haas2011-07-18
| | | | Noah Misch. Review and minor cosmetic changes by me.
* Enable CHECK constraints to be declared NOT VALIDAlvaro Herrera2011-06-30
| | | | | | | | | | | | | | | | | | | | | | | This means that they can initially be added to a large existing table without checking its initial contents, but new tuples must comply to them; a separate pass invoked by ALTER TABLE / VALIDATE can verify existing data and ensure it complies with the constraint, at which point it is marked validated and becomes a normal part of the table ecosystem. An non-validated CHECK constraint is ignored in the planner for constraint_exclusion purposes; when validated, cached plans are recomputed so that partitioning starts working right away. This patch also enables domains to have unvalidated CHECK constraints attached to them as well by way of ALTER DOMAIN / ADD CONSTRAINT / NOT VALID, which can later be validated with ALTER DOMAIN / VALIDATE CONSTRAINT. Thanks to Thom Brown, Dean Rasheed and Jaime Casanova for the various reviews, and Robert Hass for documentation wording improvement suggestions. This patch was sponsored by Enova Financial.
* Unify spelling of "canceled", "canceling", "cancellation"Peter Eisentraut2011-06-29
| | | | | We had previously (af26857a2775e7ceb0916155e931008c2116632f) established the U.S. spellings as standard.
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Avoid potential deadlock in InitCatCachePhase2().Tom Lane2011-03-22
| | | | | | | | | | | | | | | Opening a catcache's index could require reading from that cache's own catalog, which of course would acquire AccessShareLock on the catalog. So the original coding here risks locking index before heap, which could deadlock against another backend trying to get exclusive locks in the normal order. Because InitCatCachePhase2 is only called when a backend has to start up without a relcache init file, the deadlock was seldom seen in the field. (And by the same token, there's no need to worry about any performance disadvantage; so not much point in trying to distinguish exactly which catalogs have the risk.) Bug report, diagnosis, and patch by Nikhil Sontakke. Additional commentary by me. Back-patch to all supported branches.
* Per-column collation supportPeter Eisentraut2011-02-08
| | | | | | | | This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Support unlogged tables.Robert Haas2010-12-29
| | | | | | | The contents of an unlogged table are WAL-logged; thus, they are not available on standby servers and are truncated whenever the database system enters recovery. Indexes on unlogged tables are also unlogged. Unlogged GiST indexes are not currently supported.
* Generalize concept of temporary relations to "relation persistence".Robert Haas2010-12-13
| | | | | | | | | | | | | | | This commit replaces pg_class.relistemp with pg_class.relpersistence; and also modifies the RangeVar node type to carry relpersistence rather than istemp. It also removes removes rd_istemp from RelationData and instead performs the correct computation based on relpersistence. For clarity, we add three new macros: RelationNeedsWAL(), RelationUsesLocalBuffers(), and RelationUsesTempNamespace(), so that we can clarify the purpose of each check that previous depended on rd_istemp. This is intended as infrastructure for the upcoming unlogged tables patch, as well as for future possible work on global temporary tables.
* Simplify and speed up mapping of index opfamilies to pathkeys.Tom Lane2010-11-29
| | | | | | | | | | | | | | | | | | | | | | Formerly we looked up the operators associated with each index (caching them in relcache) and then the planner looked up the btree opfamily containing such operators in order to build the btree-centric pathkey representation that describes the index's sort order. This is quite pointless for btree indexes: we might as well just use the index's opfamily information directly. That saves syscache lookup cycles during planning, and furthermore allows us to eliminate the relcache's caching of operators altogether, which may help in reducing backend startup time. I added code to plancat.c to perform the same type of double lookup on-the-fly if it's ever faced with a non-btree amcanorder index AM. If such a thing actually becomes interesting for production, we should replace that logic with some more-direct method for identifying the corresponding btree opfamily; but it's not worth spending effort on now. There is considerably more to do pursuant to my recent proposal to get rid of sort-operator-based representations of sort orderings, but this patch grabs some of the low-hanging fruit. I'll look at the remainder of that work after the current commitfest.
* Make TRUNCATE ... RESTART IDENTITY restart sequences transactionally.Tom Lane2010-11-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the previous coding, we simply issued ALTER SEQUENCE RESTART commands, which do not roll back on error. This meant that an error between truncating and committing left the sequences out of sync with the table contents, with potentially bad consequences as were noted in a Warning on the TRUNCATE man page. To fix, create a new storage file (relfilenode) for a sequence that is to be reset due to RESTART IDENTITY. If the transaction aborts, we'll automatically revert to the old storage file. This acts just like a rewriting ALTER TABLE operation. A penalty is that we have to take exclusive lock on the sequence, but since we've already got exclusive lock on its owning table, that seems unlikely to be much of a problem. The interaction of this with usual nontransactional behaviors of sequence operations is a bit weird, but it's hard to see what would be completely consistent. Our choice is to discard cached-but-unissued sequence values both when the RESTART is executed, and at rollback if any; but to not touch the currval() state either time. In passing, move the sequence reset operations to happen before not after any AFTER TRUNCATE triggers are fired. The previous ordering was not logically sensible, but was forced by the need to minimize inconsistency if the triggers caused an error. Transactional rollback is a much better solution to that. Patch by Steve Singer, rather heavily adjusted by me.
* Correct poor grammar in comment.Robert Haas2010-11-14
|
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Fix up flushing of composite-type typcache entries to be driven directly byTom Lane2010-09-02
| | | | | | | | | | | | | | | | | | | | | | SI invalidation events, rather than indirectly through the relcache. In the previous coding, we had to flush a composite-type typcache entry whenever we discarded the corresponding relcache entry. This caused problems at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report from Jeff Davis, and might result in real-world problems given the kind of unexpected relcache flush that that test mechanism is intended to model. The new coding decouples relcache and typcache management, which is a good thing anyway from a structural perspective. The cost is that we have to search the typcache linearly to find entries that need to be flushed. There are a couple of ways we could avoid that, but at the moment it's not clear it's worth any extra trouble, because the typcache contains very few entries in typical operation. Back-patch to 8.2, the same as some other recent fixes in this general area. The patch could be carried back to 8.0 with some additional work, but given that it's only hypothetical whether we're fixing any problem observable in the field, it doesn't seem worth the work now.
* Include the backend ID in the relpath of temporary relations.Robert Haas2010-08-13
| | | | | | | | | | | | | | | | | This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.
* pgindent run for 9.0, second runBruce Momjian2010-07-06
|
* Arrange for client authentication to occur before we select a specificTom Lane2010-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | database to connect to. This is necessary for the walsender code to work properly (it was previously using an untenable assumption that template1 would always be available to connect to). This also gets rid of a small security shortcoming that was introduced in the original patch to eliminate the flat authentication files: before, you could find out whether or not the requested database existed even if you couldn't pass the authentication checks. The changes needed to support this are mainly just to treat pg_authid and pg_auth_members as nailed relations, so that we can read them without having to be able to locate real pg_class entries for them. This mechanism was already debugged for pg_database, but we hadn't recognized the value of applying it to those catalogs too. Since the current code doesn't have support for accessing toast tables before we've brought up all of the relcache, remove pg_authid's toast table to ensure that no one can store an out-of-line toasted value of rolpassword. The case seems quite unlikely to occur in practice, and was effectively unsupported anyway in the old "flatfiles" implementation. Update genbki.pl to actually implement the same rules as bootstrap.c does for not-nullability of catalog columns. The previous coding was a bit cheesy but worked all right for the previous set of bootstrap catalogs. It does not work for pg_authid, where rolvaliduntil needs to be nullable. Initdb forced due to minor catalog changes (mainly the toast table removal).
* Fix a problem introduced by my patch of 2010-01-12 that revised the wayTom Lane2010-04-14
| | | | | | | | | | | | | | | | | | | | | relcache reload works. In the patched code, a relcache entry in process of being rebuilt doesn't get unhooked from the relcache hash table; which means that if a cache flush occurs due to sinval queue overrun while we're rebuilding it, the entry could get blown away by RelationCacheInvalidate, resulting in crash or misbehavior. Fix by ensuring that an entry being rebuilt has positive refcount, so it won't be seen as a target for removal if a cache flush occurs. (This will mean that the entry gets rebuilt twice in such a scenario, but that's okay.) It appears that the problem can only arise within a transaction that has previously reassigned the relfilenode of a pre-existing table, via TRUNCATE or a similar operation. Per bug #5412 from Rusty Conover. Back-patch to 8.2, same as the patch that introduced the problem. I think that the failure can't actually occur in 8.2, since it lacks the rd_newRelfilenodeSubid optimization, but let's make it work like the later branches anyway. Patch by Heikki, slightly editorialized on by me.
* pgindent run for 9.0Bruce Momjian2010-02-26
|
* Stamp HEAD as 9.0devel, and update various places that were referring to 8.5Tom Lane2010-02-17
| | | | (hope I got 'em all). Per discussion, this release will be 9.0 not 8.5.
* Wrap calls to SearchSysCache and related functions using macros.Robert Haas2010-02-14
| | | | | | | | | | | | The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.
* Fix up rickety handling of relation-truncation interlocks.Tom Lane2010-02-09
| | | | | | | | | | | | | | | | | | | | Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.