aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam/multixact.c
Commit message (Collapse)AuthorAge
...
* Defer flushing of SLRU files.Thomas Munro2020-09-25
| | | | | | | | | | | | | | | | | | | | | | | | Previously, we called fsync() after writing out individual pg_xact, pg_multixact and pg_commit_ts pages due to cache pressure, leading to regular I/O stalls in user backends and recovery. Collapse requests for the same file into a single system call as part of the next checkpoint, as we already did for relation files, using the infrastructure developed by commit 3eb77eba. This can cause a significant improvement to recovery performance, especially when it's otherwise CPU-bound. Hoist ProcessSyncRequests() up into CheckPointGuts() to make it clearer that it applies to all the SLRU mini-buffer-pools as well as the main buffer pool. Rearrange things so that data collected in CheckpointStats includes SLRU activity. Also remove the Shutdown{CLOG,CommitTS,SUBTRANS,MultiXact}() functions, because they were redundant after the shutdown checkpoint that immediately precedes them. (I'm not sure if they were ever needed, but they aren't now.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (parts) Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Discussion: https://postgr.es/m/CA+hUKGLJ=84YT+NvhkEEDAuUtVHMfQ9i-N7k_o50JmQ6Rpj_OQ@mail.gmail.com
* Rename VariableCacheData.nextFullXid to nextXid.Andres Freund2020-08-11
| | | | | | | | | Including Full in variable names duplicates the type information and leads to overly long names. As FullTransactionId cannot accidentally be casted to TransactionId that does not seem necessary. Author: Andres Freund Discussion: https://postgr.es/m/20200724011143.jccsyvsvymuiqfxu@alap3.anarazel.de
* Change XID and mxact limits to warn at 40M and stop at 3M.Noah Misch2020-08-01
| | | | | | | | | | | | | | | We have edge-case bugs when assigning values in the last few dozen pages before the wrap limit. We may introduce similar bugs in the future. At default BLCKSZ, this makes such bugs unreachable outside of single-user mode. Also, when VACUUM began to consume mxacts, multiStopLimit did not change to compensate. pg_upgrade may fail on a cluster that was already printing "must be vacuumed" warnings. Follow the warning's instructions to clear the warning, then run pg_upgrade again. One can still, peacefully consume 98% of XIDs or mxacts, so DBAs need not change routine VACUUM settings. Discussion: https://postgr.es/m/20200621083513.GA3074645@rfd.leadboat.com
* Rename SLRU structures and associated LWLocks.Tom Lane2020-05-15
| | | | | | | | | | | | | | | | | | | | | | | | | | Originally, the names assigned to SLRUs had no purpose other than being shmem lookup keys, so not a lot of thought went into them. As of v13, though, we're exposing them in the pg_stat_slru view and the pg_stat_reset_slru function, so it seems advisable to take a bit more care. Rename them to names based on the associated on-disk storage directories (which fortunately we *did* think about, to some extent; since those are also visible to DBAs, consistency seems like a good thing). Also rename the associated LWLocks, since those names are likewise user-exposed now as wait event names. For the most part I only touched symbols used in the respective modules' SimpleLruInit() calls, not the names of other related objects. This renaming could have been taken further, and maybe someday we will do so. But for now it seems undesirable to change the names of any globally visible functions or structs, so some inconsistency is unavoidable. (But I *did* terminate "oldserxid" with prejudice, as I found that name both unreadable and not descriptive of the SLRU's contents.) Table 27.12 needs re-alphabetization now, but I'll leave that till after the other LWLock renamings I have in mind. Discussion: https://postgr.es/m/28683.1589405363@sss.pgh.pa.us
* Fix some typosMichael Paquier2020-04-27
| | | | | Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
* snapshot scalability: Move delayChkpt from PGXACT to PGPROC.Andres Freund2020-04-07
| | | | | | | | | | | | | | | | | | | | | The goal of separating hotly accessed per-backend data from PGPROC into PGXACT is to make accesses fast (GetSnapshotData() in particular). But delayChkpt is not actually accessed frequently; only when starting a checkpoint. As it is frequently modified (multiple times in the course of a single transaction), storing it in the same cacheline as hotly accessed data unnecessarily dirties a contended cacheline. Therefore move delayChkpt to PGPROC. This is part of a larger series of patches intending to improve GetSnapshotData() scalability. It is committed and pushed separately, as it is independently beneficial (small but measurable win, limited by the other frequent modifications of PGXACT). Author: Andres Freund Reviewed-By: Robert Haas, Thomas Munro, David Rowley Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
* Remove useless pfree()s at the ends of various ValuePerCall SRFs.Tom Lane2020-03-16
| | | | | | | | | | | | | | | | | | | | We don't need to manually clean up allocations in a SRF's multi_call_memory_ctx, because the SRF_RETURN_DONE infrastructure takes care of that (and also ensures that it will happen even if the function never gets a final call, which simple manual cleanup cannot do). Hence, the code removed by this patch is a waste of code and cycles. Worse, it gives the impression that cleaning up manually is a thing, which can lead to more serious errors such as those fixed in commits 085b6b667 and b4570d33a. So we should get rid of it. These are not quite actual bugs though, so I couldn't muster the enthusiasm to back-patch. Fix in HEAD only. Justin Pryzby Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Fix inconsistencies and typos in the tree, take 9Michael Paquier2019-08-05
| | | | | | | | This addresses more issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
* Fix inconsistencies in the codeMichael Paquier2019-07-08
| | | | | | | | | | | This addresses a couple of issues in the code: - Typos and inconsistencies in comments and function declarations. - Removal of unreferenced function declarations. - Removal of unnecessary compile flags. - A cleanup error in regressplans.sh. Author: Alexander Lakhin Discussion: https://postgr.es/m/0c991fdf-2670-1997-c027-772a420c4604@gmail.com
* Update stale comments, and fix comment typos.Noah Misch2019-06-08
|
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Add basic infrastructure for 64 bit transaction IDs.Thomas Munro2019-03-28
| | | | | | | | | | | | | | | | | | | | | | | | Instead of inferring epoch progress from xids and checkpoints, introduce a 64 bit FullTransactionId type and use it to track xid generation. This fixes an unlikely bug where the epoch is reported incorrectly if the range of active xids wraps around more than once between checkpoints. The only user-visible effect of this commit is to correct the epoch used by txid_current() and txid_status(), also visible with pg_controldata, in those rare circumstances. It also creates some basic infrastructure so that later patches can use 64 bit transaction IDs in more places. The new type is a struct that we pass by value, as a form of strong typedef. This prevents the sort of accidental confusion between TransactionId and FullTransactionId that would be possible if we were to use a plain old uint64. Author: Thomas Munro Reported-by: Amit Kapila Reviewed-by: Andres Freund, Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com
* Make release of 2PC identifier and locks consistent in COMMIT PREPAREDMichael Paquier2019-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When preparing a transaction in two-phase commit, a dummy PGPROC entry holding the GID used for the transaction is registered, which gets released once COMMIT PREPARED is run. Prior releasing its shared memory state, all the locks taken in the prepared transaction are released using a dedicated set of callbacks (pgstat and multixact having similar callbacks), which may cause the locks to be released before the GID is set free. Hence, there is a small window where lock conflicts could happen, for example: - Transaction A releases its locks, still holding its GID in shared memory. - Transaction B held a lock which conflicted with locks of transaction A. - Transaction B continues its processing, reusing the same GID as transaction A. - Transaction B fails because of a conflicting GID, already in use by transaction A. This commit changes the shared memory state release so as post-commit callbacks and predicate lock cleanup happen consistently with the shared memory state cleanup for the dummy PGPROC entry. The race window is small and 2PC had this issue from the start, so no backpatch is done. On top if that fixes discussed involved ABI breakages, which are not welcome in stable branches. Reported-by: Oleksii Kliukin, Ildar Musin Diagnosed-by: Oleksii Kliukin, Ildar Musin Author: Michael Paquier Reviewed-by: Masahiko Sawada, Oleksii Kliukin Discussion: https://postgr.es/m/BF9B38A4-2BFF-46E8-BA87-A2D00A8047A6@hintbits.com
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Remove WITH OIDS support, change oid catalog column visibility.Andres Freund2018-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default. The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing, but upcoming work aiming to make table storage pluggable, would have required expanding and duplicating that "specialness" significantly. WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing includes: - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out) - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column). - restoring an pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column) - COPY will refuse to load binary dump that includes oids. - pg_upgrade will error out when encountering tables declared WITH OIDS, they have to be altered to remove the oid column first. - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed. The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them. The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such. This obviously requires adapting a lot code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used), only oids assigned later by oids will be above FirstBootstrapObjectId. As the oid column now is a normal column the special bootstrap syntax for oids has been removed. Oids are not automatically assigned during insertion anymore, all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for the new pg_nextoid() function can be used (which only works on catalog tables). The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line. While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to get merge this now. It's painful to maintain externally, too complicated to commit after the code code freeze, and a dependency of a number of other patches. Catversion bump, for obvious reasons. Author: Andres Freund, with contributions by John Naylor Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
* Fix misc typos, mostly in comments.Heikki Linnakangas2018-07-18
| | | | | | | | A collection of typos I happened to spot while reading code, as well as grepping for common mistakes. Backpatch to all supported versions, as applicable, to avoid conflicts when backporting other commits in the future.
* Fix typo in comment.Robert Haas2018-03-22
| | | | | | Michael Paquier Discussion: http://postgr.es/m/20180205071404.GB17337@paquier.xyz
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Extend near-wraparound hints to include replication slotsSimon Riggs2017-12-29
| | | | | Author: Feike Steenbergen Reviewed-by: Michael Paquier
* Change TRUE/FALSE to true/falsePeter Eisentraut2017-11-08
| | | | | | | | | | | | | | The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
* Remove uses of "slave" in replication contextsPeter Eisentraut2017-08-10
| | | | | This affects mostly code comments, some documentation, and tests. Official APIs already used "standby".
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Rename "pg_clog" directory to "pg_xact".Robert Haas2017-03-17
| | | | | | | | | | | Names containing the letters "log" sometimes confuse users into believing that only non-critical data is present. It is hoped this renaming will discourage ill-considered removals of transaction status data. Michael Paquier Discussion: http://postgr.es/m/CA+Tgmoa9xFQyjRZupbdEFuwUerFTvC6HjZq1ud6GYragGDFFgA@mail.gmail.com
* Make logging about multixact wraparound protection less chatty.Tom Lane2017-03-14
| | | | | | | | | | | | | | The original messaging design, introduced in commit 068cfadf9, seems too chatty now that some time has elapsed since the bug fix; most installations will be in good shape and don't really need a reminder about this on every postmaster start. Hence, arrange to suppress the "wraparound protections are now enabled" message during startup (specifically, during the TrimMultiXact() call). The message will still appear if protection becomes effective at some later point. Discussion: https://postgr.es/m/17211.1489189214@sss.pgh.pa.us
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Add macros to make AllocSetContextCreate() calls simpler and safer.Tom Lane2016-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>
* Fix typosPeter Eisentraut2016-08-16
| | | | From: Alexander Law <exclusion@gmail.com>
* Fix handling of multixacts predating pg_upgradeAlvaro Herrera2016-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After pg_upgrade, it is possible that some tuples' Xmax have multixacts corresponding to the old installation; such multixacts cannot have running members anymore. In many code sites we already know not to read them and clobber them silently, but at least when VACUUM tries to freeze a multixact or determine whether one needs freezing, there's an attempt to resolve it to its member transactions by calling GetMultiXactIdMembers, and if the multixact value is "in the future" with regards to the current valid multixact range, an error like this is raised: ERROR: MultiXactId 123 has not been created yet -- apparent wraparound and vacuuming fails. Per discussion with Andrew Gierth, it is completely bogus to try to resolve multixacts coming from before a pg_upgrade, regardless of where they stand with regards to the current valid multixact range. It's possible to get from under this problem by doing SELECT FOR UPDATE of the problem tuples, but if tables are large, this is slow and tedious, so a more thorough solution is desirable. To fix, we realize that multixacts in xmax created in 9.2 and previous have a specific bit pattern that is never used in 9.3 and later (we already knew this, per comments and infomask tests sprinkled in various places, but we weren't leveraging this knowledge appropriately). Whenever the infomask of the tuple matches that bit pattern, we just ignore the multixact completely as if Xmax wasn't set; or, in the case of tuple freezing, we act as if an unwanted value is set and clobber it without decoding. This guarantees that no errors will be raised, and that the values will be progressively removed until all tables are clean. Most callers of GetMultiXactIdMembers are patched to recognize directly that the value is a removable "empty" multixact and avoid calling GetMultiXactIdMembers altogether. To avoid changing the signature of GetMultiXactIdMembers() in back branches, we keep the "allow_old" boolean flag but rename it to "from_pgupgrade"; if the flag is true, we always return an empty set instead of looking up the multixact. (I suppose we could remove the argument in the master branch, but I chose not to do so in this commit). This was broken all along, but the error-facing message appeared first because of commit 8e9a16ab8f7f and was partially fixed in a25c2b7c4db3. This fix, backpatched all the way back to 9.3, goes approximately in the same direction as a25c2b7c4db3 but should cover all cases. Bug analysis by Andrew Gierth and Álvaro Herrera. A number of public reports match this bug: https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org
* pgindent run for 9.6Robert Haas2016-06-09
|
* Make all built-in lwlock tranche IDs fixed.Robert Haas2016-02-02
| | | | | | | This makes the values more stable, which seems like a good thing for anybody who needs to look at at them. Alexander Korotkov and Amit Kapila
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Fix comments about WAL rule "write xlog before data" versus pg_multixact.Noah Misch2016-01-01
| | | | | | | | | | | Recovery does not achieve its goal of zeroing all pg_multixact entries whose accompanying WAL records never reached disk. Remove that claim and justify its expendability. Detail the need for TrimMultiXact(), which has little in common with the TrimCLOG() rationale. Merge two tightly-related comments. Stop presenting pg_multixact as specific to heap_lock_tuple(); PostgreSQL 9.3 extended its use to heap_update(). Noticed while investigating a report from Andres Freund.
* Fix bug in SetOffsetVacuumLimit() triggered by find_multixact_start() failure.Andres Freund2015-12-14
| | | | | | | | | | | | | | | | Previously, if find_multixact_start() failed, SetOffsetVacuumLimit() would install 0 into MultiXactState->offsetStopLimit if it previously succeeded. Luckily, there are no known cases where find_multixact_start() will return an error in 9.5 and above. But if it were to happen, for example due to filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId() could continue allocating mxids even if close to a wraparound, or it could erroneously stop allocating mxids, even if no wraparound is looming. The wrong value would be corrected the next time SetOffsetVacuumLimit() is called, or by a restart. Reported-By: Noah Misch, although this is not his preferred fix Discussion: 20151210140450.GA22278@alap3.anarazel.de Backpatch: 9.5, where the bug was introduced as part of 4f627f
* Move each SLRU's lwlocks to a separate tranche.Robert Haas2015-11-12
| | | | | | | | | | This makes it significantly easier to identify these lwlocks in LWLOCK_STATS or Trace_lwlocks output. It's also arguably better from a modularity standpoint, since lwlock.c no longer needs to know anything about the LWLock needs of the higher-level SLRU facility. Ildus Kurbangaliev, reviewd by Álvaro Herrera and by me.
* Message style improvementsPeter Eisentraut2015-10-28
| | | | | Message style, plurals, quoting, spelling, consistency with similar messages
* Fix typos in comments.Robert Haas2015-10-22
| | | | CharSyam
* Remove legacy multixact truncation support.Andres Freund2015-09-26
| | | | | | | | | | | | | In 9.5 and master there is no need to support legacy truncation. This is just committed separately to make it easier to backpatch the WAL logged multixact truncation to 9.3 and 9.4 if we later decide to do so. I bumped master's magic from 0xD086 to 0xD088 and 9.5's from 0xD085 to 0xD087 to avoid 9.5 reusing a value that has been in use on master while keeping the numbers increasing between major versions. Discussion: 20150621192409.GA4797@alap3.anarazel.de Backpatch: 9.5
* Rework the way multixact truncations work.Andres Freund2015-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fact that multixact truncations are not WAL logged has caused a fair share of problems. Amongst others it requires to do computations during recovery while the database is not in a consistent state, delaying truncations till checkpoints, and handling members being truncated, but offset not. We tried to put bandaids on lots of these issues over the last years, but it seems time to change course. Thus this patch introduces WAL logging for multixact truncations. This allows: 1) to perform the truncation directly during VACUUM, instead of delaying it to the checkpoint. 2) to avoid looking at the offsets SLRU for truncation during recovery, we can just use the master's values. 3) simplify a fair amount of logic to keep in memory limits straight, this has gotten much easier During the course of fixing this a bunch of additional bugs had to be fixed: 1) Data was not purged from memory the member's SLRU before deleting segments. This happened to be hard or impossible to hit due to the interlock between checkpoints and truncation. 2) find_multixact_start() relied on SimpleLruDoesPhysicalPageExist - but that doesn't work for offsets that haven't yet been flushed to disk. Add code to flush the SLRUs to fix. Not pretty, but it feels slightly safer to only make decisions based on actual on-disk state. 3) find_multixact_start() could be called concurrently with a truncation and thus fail. Via SetOffsetVacuumLimit() that could lead to a round of emergency vacuuming. The problem remains in pg_get_multixact_members(), but that's quite harmless. For now this is going to only get applied to 9.5+, leaving the issues in the older branches in place. It is quite possible that we need to backpatch at a later point though. For the case this gets backpatched we need to handle that an updated standby may be replaying WAL from a not-yet upgraded primary. We have to recognize that situation and use "old style" truncation (i.e. looking at the SLRUs) during WAL replay. In contrast to before, this now happens in the startup process, when replaying a checkpoint record, instead of the checkpointer. Doing truncation in the restartpoint is incorrect, they can happen much later than the original checkpoint, thereby leading to wraparound. To avoid "multixact_redo: unknown op code 48" errors standbys would have to be upgraded before primaries. A later patch will bump the WAL page magic, and remove the legacy truncation codepaths. Legacy truncation support is just included to make a possible future backpatch easier. Discussion: 20150621192409.GA4797@alap3.anarazel.de Reviewed-By: Robert Haas, Alvaro Herrera, Thomas Munro Backpatch: 9.5 for now
* Fix off-by-one error in calculating subtrans/multixact truncation point.Heikki Linnakangas2015-07-23
| | | | | | | | | | If there were no subtransactions (or multixacts) active, we would calculate the oldestxid == next xid. That's correct, but if next XID happens to be on the next pg_subtrans (pg_multixact) page, the page does not exist yet, and SimpleLruTruncate will produce an "apparent wraparound" warning. The warning is harmless in this case, but looks very alarming to users. Backpatch to all supported versions. Patch and analysis by Thomas Munro.
* Improve multixact emergency autovacuum logic.Andres Freund2015-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Previously autovacuum was not necessarily triggered if space in the members slru got tight. The first problem was that the signalling was tied to values in the offsets slru, but members can advance much faster. Thats especially a problem if old sessions had been around that previously prevented the multixact horizon to increase. Secondly the skipping logic doesn't work if the database was restarted after autovacuum was triggered - that knowledge is not preserved across restart. This is especially a problem because it's a common panic-reaction to restart the database if it gets slow to anti-wraparound vacuums. Fix the first problem by separating the logic for members from offsets. Trigger autovacuum whenever a multixact crosses a segment boundary, as the current member offset increases in irregular values, so we can't use a simple modulo logic as for offsets. Add a stopgap for the second problem, by signalling autovacuum whenver ERRORing out because of boundaries. Discussion: 20150608163707.GD20772@alap3.anarazel.de Backpatch into 9.3, where it became more likely that multixacts wrap around.
* Fix corner case in autovacuum-forcing logic for multixact wraparound.Robert Haas2015-06-19
| | | | | | | | | | | | | | | Since find_multixact_start() relies on SimpleLruDoesPhysicalPageExist(), and that function looks only at the on-disk state, it's possible for it to fail to find a page that exists in the in-memory SLRU that has not been written yet. If that happens, SetOffsetVacuumLimit() will erroneously decide to force emergency autovacuuming immediately. We should probably fix find_multixact_start() to consider the data cached in memory as well as on the on-disk state, but that's no excuse for SetOffsetVacuumLimit() to be stupid about the case where it can no longer read the value after having previously succeeded in doing so. Report by Andres Freund.
* Cope with possible failure of the oldest MultiXact to exist.Robert Haas2015-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent commits, mainly b69bf30b9bfacafc733a9ba77c9587cf54d06c0c and 53bb309d2d5a9432d2602c93ed18e58bd2924e15, introduced mechanisms to protect against wraparound of the MultiXact member space: the number of multixacts that can exist at one time is limited to 2^32, but the total number of members in those multixacts is also limited to 2^32, and older code did not take care to enforce the second limit, potentially allowing old data to be overwritten while it was still needed. Unfortunately, these new mechanisms failed to account for the fact that the code paths in which they run might be executed during recovery or while the cluster was in an inconsistent state. Also, they failed to account for the fact that users who used pg_upgrade to upgrade a PostgreSQL version between 9.3.0 and 9.3.4 might have might oldestMultiXid = 1 in the control file despite the true value being larger. To fix these problems, first, avoid unnecessarily examining the mmembers of MultiXacts when the cluster is not known to be consistent. TruncateMultiXact has done this for a long time, and this patch does not fix that. But the new calls used to prevent member wraparound are not needed until we reach normal running, so avoid calling them earlier. (SetMultiXactIdLimit is actually called before InRecovery is set, so we can't rely on that; we invent our own multixact-specific flag instead.) Second, make failure to look up the members of a MultiXact a non-fatal error. Instead, if we're unable to determine the member offset at which wraparound would occur, postpone arming the member wraparound defenses until we are able to do so. If we're unable to determine the member offset that should force autovacuum, force it continuously until we are able to do so. If we're unable to deterine the member offset at which we should truncate the members SLRU, log a message and skip truncation. An important consequence of these changes is that anyone who does have a bogus oldestMultiXid = 1 value in pg_control will experience immediate emergency autovacuuming when upgrading to a release that contains this fix. The release notes should highlight this fact. If a user has no pg_multixact/offsets/0000 file, but has oldestMultiXid = 1 in the control file, they may wish to vacuum any tables with relminmxid = 1 prior to upgrading in order to avoid an immediate emergency autovacuum after the upgrade. This must be done with a PostgreSQL version 9.3.5 or newer and with vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age set to 0. This patch also adds an additional log message at each database server startup, indicating either that protections against member wraparound have been engaged, or that they have not. In the latter case, once autovacuum has advanced oldestMultiXid to a sane value, the message indicating that the guards have been engaged will appear at the next checkpoint. A few additional messages have also been added at the DEBUG1 level so that the correct operation of this code can be properly audited. Along the way, this patch fixes another, related bug in TruncateMultiXact that has existed since PostgreSQL 9.3.0: when no MultiXacts exist at all, the truncation code looks up NextMultiXactId, which doesn't exist yet. This can lead to TruncateMultiXact removing every file in pg_multixact/offsets instead of keeping one around, as it should. This in turn will cause the database server to refuse to start afterwards. Patch by me. Review by Álvaro Herrera, Andres Freund, Noah Misch, and Thomas Munro.
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Fix whitespacePeter Eisentraut2015-05-16
|
* Increase threshold for multixact member emergency autovac to 50%.Robert Haas2015-05-11
| | | | | | | | | | | | | | | | | | | | | | | Analysis by Noah Misch shows that the 25% threshold set by commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 is lower than any other, similar autovac threshold. While we don't know exactly what value will be optimal for all users, it is better to err a little on the high side than on the low side. A higher value increases the risk that users might exhaust the available space and start seeing errors before autovacuum can clean things up sufficiently, but a user who hits that problem can compensate for it by reducing autovacuum_multixact_freeze_max_age to a value dependent on their average multixact size. On the flip side, if the emergency cap imposed by that patch kicks in too early, the user will experience excessive wraparound scanning and will be unable to mitigate that problem by configuration. The new value will hopefully reduce the risk of such bad experiences while still providing enough headroom to avoid multixact member exhaustion for most users. Along the way, adjust the documentation to reflect the effects of commit 04e6d3b877e060d8445eb653b7ea26b1ee5cec6b, which taught autovacuum to run for multixact wraparound even when autovacuum is configured off.
* Even when autovacuum=off, force it for members as we do in other cases.Robert Haas2015-05-11
| | | | Thomas Munro, with some adjustments by me.
* Advance the stop point for multixact offset creation only at checkpoint.Robert Haas2015-05-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c advanced the stop point at vacuum time, but this has subsequently been shown to be unsafe as a result of analysis by myself and Thomas Munro and testing by Thomas Munro. The crux of the problem is that the SLRU deletion logic may get confused about what to remove if, at exactly the right time during the checkpoint process, the head of the SLRU crosses what used to be the tail. This patch, by me, fixes the problem by advancing the stop point only following a checkpoint. This has the additional advantage of making the removal logic work during recovery more like the way it works during normal running, which is probably good. At least one of the calls to DetermineSafeOldestOffset which this patch removes was already dead, because MultiXactAdvanceOldest is called only during recovery and DetermineSafeOldestOffset was set up to do nothing during recovery. That, however, is inconsistent with the principle that recovery and normal running should work similarly, and was confusing to boot. Along the way, fix some comments that previous patches in this area neglected to update. It's not clear to me whether there's any concrete basis for the decision to use only half of the multixact ID space, but it's neither necessary nor sufficient to prevent multixact member wraparound, so the comments should not say otherwise.
* Fix DetermineSafeOldestOffset for the case where there are no mxacts.Robert Haas2015-05-10
| | | | | | | | Commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c failed to take into account the possibility that there might be no multixacts in existence at all. Report by Thomas Munro; patch by me.
* Fix whitespacePeter Eisentraut2015-05-08
|