aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Cancel CV sleep during subtransaction abort.Robert Haas2017-12-21
| | | | | | | | | | | | | | Generally, error recovery paths that need to do things like LWLockReleaseAll and pgstat_report_wait_end also need to call ConditionVariableCancelSleep, but AbortSubTransaction was missed. Since subtransaction abort might destroy up the DSM segment that contains the ConditionVariable stored in cv_sleep_target, this can result in a crash for anything using condition variables. Reported and diagnosed by Andres Freund. Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de
* Fix bug in cancellation of non-exclusive backup to avoid assertion failure.Fujii Masao2017-12-19
| | | | | | | | | | | | | | | | | | | | | | | | Previously an assertion failure occurred when pg_stop_backup() for non-exclusive backup was aborted while it's waiting for WAL files to be archived. This assertion failure happened in do_pg_abort_backup() which was called when a non-exclusive backup was canceled. do_pg_abort_backup() assumes that there is at least one non-exclusive backup running when it's called. But pg_stop_backup() can be canceled even after it marks the end of non-exclusive backup (e.g., during waiting for WAL archiving). This broke the assumption that do_pg_abort_backup() relies on, and which caused an assertion failure. This commit changes do_pg_abort_backup() so that it does nothing when non-exclusive backup has been already marked as completed. That is, the asssumption is also changed, and do_pg_abort_backup() now can handle even the case where it's called when there is no running backup. Backpatch to 9.6 where SQL-callable non-exclusive backup was added. Author: Masahiko Sawada and Michael Paquier Reviewed-By: Robert Haas and Fujii Masao Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
* Perform a lot more sanity checks when freezing tuples.Andres Freund2017-12-14
| | | | | | | | | | | | | | | | The previous commit has shown that the sanity checks around freezing aren't strong enough. Strengthening them seems especially important because the existance of the bug has caused corruption that we don't want to make even worse during future vacuum cycles. The errors are emitted with ereport rather than elog, despite being "should never happen" messages, so a proper error code is emitted. To avoid superflous translations, mark messages as internal. Author: Andres Freund and Alvaro Herrera Reviewed-By: Alvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de Backpatch: 9.3-
* Fix parallel index scan hang with deleted or half-dead pages.Robert Haas2017-12-13
| | | | | | | | | | The previous coding forgot to release the scan before seizing it again, leading to a lockup. Report by Patrick Hemmer. Diagnosis by Thomas Munro. Patch by Amit Kapila. Discussion: http://postgr.es/m/CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com
* Fix mistake in commentPeter Eisentraut2017-12-08
| | | | Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
* Clean up assorted messiness around AllocateDir() usage.Tom Lane2017-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a couple of low-probability bugs that could lead to reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE) concerning directory-open or file-open failures. It also fixes places where we took shortcuts in reporting such errors, either by using elog instead of ereport or by using ereport but forgetting to specify an errcode. And it eliminates a lot of just plain redundant error-handling code. In service of all this, export fd.c's formerly-static function ReadDirExtended, so that external callers can make use of the coding pattern dir = AllocateDir(path); while ((de = ReadDirExtended(dir, path, LOG)) != NULL) if they'd like to treat directory-open failures as mere LOG conditions rather than errors. Also fix FreeDir to be a no-op if we reach it with dir == NULL, as such a coding pattern would cause. Then, remove code at many call sites that was throwing an error or log message for AllocateDir failure, as ReadDir or ReadDirExtended can handle that job just fine. Aside from being a net code savings, this gets rid of a lot of not-quite-up-to-snuff reports, as mentioned above. (In some places these changes result in replacing a custom error message such as "could not open tablespace directory" with more generic wording "could not open directory", but it was agreed that the custom wording buys little as long as we report the directory name.) In some other call sites where we can't just remove code, change the error reports to be fully project-style-compliant. Also reorder code in restoreTwoPhaseData that was acquiring a lock between AllocateDir and ReadDir; in the unlikely but surely not impossible case that LWLockAcquire changes errno, AllocateDir failures would be misreported. There is no great value in opening the directory before acquiring TwoPhaseStateLock, so just do it in the other order. Also fix CheckXLogRemoved to guarantee that it preserves errno, as quite a number of call sites are implicitly assuming. (Again, it's unlikely but I think not impossible that errno could change during a SpinLockAcquire. If so, this function was broken for its own purposes as well as breaking callers.) And change a few places that were using not-per-project-style messages, such as "could not read directory" when "could not open directory" is more correct. Back-patch the exporting of ReadDirExtended, in case we have occasion to back-patch some fix that makes use of it; it's not needed right now but surely making it global is pretty harmless. Also back-patch the restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is essentially cosmetic and need not get back-patched. Michael Paquier, with a bit of additional work by me Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
* Fix ReinitializeParallelDSM to tolerate finding no error queues.Robert Haas2017-11-28
| | | | | | | | | | | | | | | | Commit d4663350646ca0c069a36d906155a0f7e3372eb7 changed things so that shm_toc_lookup would fail with an error rather than silently returning NULL in the hope that such failures would be reported in a useful way rather than via a system crash. However, it overlooked the fact that the lookup of PARALLEL_KEY_ERROR_QUEUE in ReinitializeParallelDSM is expected to fail when no DSM segment was created in the first place; in that case, we end up with a backend-private memory segment that still contains an entry for PARALLEL_KEY_FIXED but no others. Consequently a benign failure to initialize parallelism can escalate into an elog(ERROR); repair. Discussion: http://postgr.es/m/CA+Tgmob8LFw55DzH1QEREpBEA9RJ_W_amhBFCVZ6WMwUhVpOqg@mail.gmail.com
* Pad XLogReaderState's main_data buffer more aggressively.Tom Lane2017-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | Originally, we palloc'd this buffer just barely big enough to hold the largest xlog record seen so far. It turns out that that can result in valgrind complaints, because some compilers will emit code that assumes it can safely fetch padding bytes at the end of a struct, and those padding bytes were unallocated so far as aset.c was concerned. We can fix that by MAXALIGN'ing the palloc request size, ensuring that it is big enough to include any possible padding that might've been omitted from the on-disk record. An additional objection to the original coding is that it could result in many repeated palloc cycles, in the worst case where we see a series of gradually larger xlog records. We can ameliorate that cheaply by imposing a minimum buffer size that's large enough for most xlog records. BLCKSZ/2 was chosen after a bit of discussion. In passing, remove an obsolete comment in struct xl_heap_new_cid that the combocid field is free due to alignment considerations. Perhaps that was true at some point, but it's not now. Back-patch to 9.5 where this code came in. Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
* Fix broken cleanup interlock for GIN pending list.Robert Haas2017-11-16
| | | | | | | | | | | | | The pending list must (for correctness) always be cleaned up by vacuum, and should (for the avoidance of surprising behavior) always be cleaned up by an explicit call to gin_clean_pending_list, but cleanup is optional when inserting. The old logic got this backward: cleanup was forced if (stats == NULL), but that's going to be *false* when vacuuming and *true* for inserts. Masahiko Sawada, reviewed by me. Discussion: http://postgr.es/m/CAD21AoBLUSyiYKnTYtSAbC+F=XDjiaBrOUEGK+zUXdQ8owfPKw@mail.gmail.com
* Fix thinkos in BRIN summarizationAlvaro Herrera2017-11-03
| | | | | | The previous commit contained a thinko that made a single-range summarization request process from there to end of table. Fix by setting the correct end range point. Per buildfarm.
* Fix BRIN summarization concurrent with extensionAlvaro Herrera2017-11-03
| | | | | | | | | | | | | | | | | If a process is extending a table concurrently with some BRIN summarization process, it is possible for the latter to miss pages added by the former because the number of pages is computed ahead of time. Fix by determining a fresh relation size after inserting the placeholder tuple: any process that further extends the table concurrently will update the placeholder tuple, while previous pages will be processed by the heap scan. Reported-by: Tomas Vondra Reviewed-by: Tom Lane Author: Álvaro Herrera Discussion: https://postgr.es/m/083d996a-4a8a-0e13-800a-851dd09ad8cc@2ndquadrant.com Backpatch-to: 9.5
* Fix corner-case errors in brin_doupdate().Tom Lane2017-11-02
| | | | | | | | | | | | | | | | | | | In some cases the BRIN code releases lock on an index page, and later re-acquires lock and tries to check that the tuple it was working on is still there. That check was a couple bricks shy of a load. It didn't consider that the page might have turned into a "revmap" page. (The samepage code path doesn't call brin_getinsertbuffer(), so it isn't protected by the checks for revmap status there.) It also didn't check whether the tuple offset was now off the end of the linepointer array. Since commit 24992c6db the latter case is pretty common, but at least in principle it could have occurred before that. The net result is that concurrent updates of a BRIN index could fail with errors like "invalid index offnum" or "inconsistent range map". Per report from Tomas Vondra. Back-patch to 9.5, since this code is substantially the same in all versions containing BRIN. Discussion: https://postgr.es/m/10d2b9f9-f427-03b8-8ad9-6af4ecacbee9@2ndquadrant.com
* Revert bogus fixes of HOT-freezing bugAlvaro Herrera2017-11-02
| | | | | | | | | | It turns out we misdiagnosed what the real problem was. Revert the previous changes, because they may have worse consequences going forward. A better fix is forthcoming. The simplistic test case is kept, though disabled. Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
* Fix problems with the "role" GUC and parallel query.Robert Haas2017-10-29
| | | | | | | | | | Without this fix, dropping a role can sometimes result in parallel query failures in sessions that have used "SET ROLE" to assume the dropped role, even if that setting isn't active any more. Report by Pavan Deolasee. Patch by Amit Kapila, reviewed by me. Discussion: http://postgr.es/m/CABOikdOomRcZsLsLK+Z+qENM1zxyaWnAvFh3MJZzZnnKiF+REg@mail.gmail.com
* Fix access-off-end-of-array in clog.c.Tom Lane2017-10-06
| | | | | | | | | | | | | Sloppy loop coding in set_status_by_pages() resulted in fetching one array element more than it should from the subxids[] array. The odds of this resulting in SIGSEGV are pretty small, but we've certainly seen that happen with similar mistakes elsewhere. While at it, we can get rid of an extra TransactionIdToPage() calculation per loop. Per report from David Binderman. Back-patch to all supported branches, since this code is quite old. Discussion: https://postgr.es/m/HE1PR0802MB2331CBA919CBFFF0C465EB429C710@HE1PR0802MB2331.eurprd08.prod.outlook.com
* Fix traversal of half-frozen update chainsAlvaro Herrera2017-10-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When some tuple versions in an update chain are frozen due to them being older than freeze_min_age, the xmax/xmin trail can become broken. This breaks HOT (and probably other things). A subsequent VACUUM can break things in more serious ways, such as leaving orphan heap-only tuples whose root HOT redirect items were removed. This can be seen because index creation (or REINDEX) complain like ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t" Because of relfrozenxid contraints, we cannot avoid the freezing of the early tuples, so we must cope with the results: whenever we see an Xmin of FrozenTransactionId, consider it a match for whatever the previous Xmax value was. This problem seems to have appeared in 9.3 with multixact changes, though strictly speaking it seems unrelated. Since 9.4 we have commit 37484ad2a "Change the way we mark tuples as frozen", so the fix is simple: just compare the raw Xmin (still stored in the tuple header, since freezing merely set an infomask bit) to the Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so the original value is lost and we have nothing to compare the Xmax with. To cope with that case we need to compare the Xmin with FrozenXid, assume it's a match, and hope for the best. Sadly, since you can pg_upgrade a 9.3 instance containing half-frozen pages to newer releases, we need to keep the old check in newer versions too, which seems a bit brittle; I hope we can somehow get rid of that. I didn't optimize the new function for performance. The new coding is probably a bit slower than before, since there is a function call rather than a straight comparison, but I'd rather have it work correctly than be fast but wrong. This is a followup after 20b655224249 fixed a few related problems. Apparently, in 9.6 and up there are more ways to get into trouble, but in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so there must be a separate bug. Reported-by: Peter Geoghegan Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood, Yi Wen Wong, Álvaro Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
* Fix freezing of a dead HOT-updated tupleAlvaro Herrera2017-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vacuum calls page-level HOT prune to remove dead HOT tuples before doing liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But concurrent transaction commit/abort may turn DEAD some of the HOT tuples that survived the prune, before HeapTupleSatisfiesVacuum tests them. This happens to activate the code that decides to freeze the tuple ... which resuscitates it, duplicating data. (This is especially bad if there's any unique constraints, because those are now internally violated due to the duplicate entries, though you won't know until you try to REINDEX or dump/restore the table.) One possible fix would be to simply skip doing anything to the tuple, and hope that the next HOT prune would remove it. But there is a problem: if the tuple is older than freeze horizon, this would leave an unfrozen XID behind, and if no HOT prune happens to clean it up before the containing pg_clog segment is truncated away, it'd later cause an error when the XID is looked up. Fix the problem by having the tuple freezing routines cope with the situation: don't freeze the tuple (and keep it dead). In the cases that the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag so that there is no need to look up the XID in pg_clog later on. An isolation test is included, authored by Michael Paquier, loosely based on Daniel Wood's original reproducer. It only tests one particular scenario, though, not all the possible ways for this problem to surface; it be good to have a more reliable way to test this more fully, but it'd require more work. In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql I outlined another test case (more closely matching Dan Wood's) that exposed a few more ways for the problem to occur. Backpatch all the way back to 9.3, where this problem was introduced by multixact juggling. In branches 9.3 and 9.4, this includes a backpatch of commit e5ff9fefcd50 (of 9.5 era), since the original is not correctable without matching the coding pattern in 9.5 up. Reported-by: Daniel Wood Diagnosed-by: Daniel Wood Reviewed-by: Yi Wen Wong, Michaël Paquier Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
* Revert to 9.6 treatment of ALTER TYPE enumtype ADD VALUE.Tom Lane2017-09-27
| | | | | | | | | | | | | This reverts commit 15bc038f9, along with the followon commits 1635e80d3 and 984c92074 that tried to clean up the problems exposed by bug #14825. The result was incomplete because it failed to address parallel-query requirements. With 10.0 release so close upon us, now does not seem like the time to be adding more code to fix that. I hope we can un-revert this code and add the missing parallel query support during the v11 cycle. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
* Use a blacklist to distinguish original from add-on enum values.Tom Lane2017-09-26
| | | | | | | | | | | | | | | | | | | | | | | | Commit 15bc038f9 allowed ALTER TYPE ADD VALUE to be executed inside transaction blocks, by disallowing the use of the added value later in the same transaction, except under limited circumstances. However, the test for "limited circumstances" was heuristic and could reject references to enum values that were created during CREATE TYPE AS ENUM, not just later. This breaks the use-case of restoring pg_dump scripts in a single transaction, as reported in bug #14825 from Balazs Szilfai. We can improve this by keeping a "blacklist" table of enum value OIDs created by ALTER TYPE ADD VALUE during the current transaction. Any visible-but-uncommitted value whose OID is not in the blacklist must have been created by CREATE TYPE AS ENUM, and can be used safely because it could not have a lifespan shorter than its parent enum type. This change also removes the restriction that a renamed enum value can't be used before being committed (unless it was on the blacklist). Andrew Dunstan, with cosmetic improvements by me. Back-patch to v10. Discussion: https://postgr.es/m/20170922185904.1448.16585@wrigleys.postgresql.org
* For wal_consistency_checking, mask page checksum as well as page LSN.Robert Haas2017-09-22
| | | | | | | | If the LSN is different, the checksum will be different, too. Ashwin Agrawal, reviewed by Michael Paquier and Kuntal Ghosh Discussion: http://postgr.es/m/CALfoeis5iqrAU-+JAN+ZzXkpPr7+-0OAGv7QUHwFn=-wDy4o4Q@mail.gmail.com
* Message style fixesPeter Eisentraut2017-09-11
|
* Clean up shm_mq cleanup.Tom Lane2017-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | The logic around shm_mq_detach was a few bricks shy of a load, because (contrary to the comments for shm_mq_attach) all it did was update the shared shm_mq state. That left us leaking a bit of process-local memory, but much worse, the on_dsm_detach callback for shm_mq_detach was still armed. That means that whenever we ultimately detach from the DSM segment, we'd run shm_mq_detach again for already-detached, possibly long-dead queues. This accidentally fails to fail today, because we only ever re-use a shm_mq's memory for another shm_mq, and multiple detach attempts on the last such shm_mq are fairly harmless. But it's gonna bite us someday, so let's clean it up. To do that, change shm_mq_detach's API so it takes a shm_mq_handle not the underlying shm_mq. This makes the callers simpler in most cases anyway. Also fix a few places in parallel.c that were just pfree'ing the handle structs rather than doing proper cleanup. Back-patch to v10 because of the risk that the revenant shm_mq_detach callbacks would cause a live bug sometime. Since this is an API change, it's too late to do it in 9.6. (We could make a variant patch that preserves API, but I'm not excited enough to do that.) Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
* Separate reinitialization of shared parallel-scan state from ExecReScan.Tom Lane2017-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the parallel executor logic did reinitialization of shared state within the ExecReScan code for parallel-aware scan nodes. This is problematic, because it means that the ExecReScan call has to occur synchronously (ie, during the parent Gather node's ReScan call). That is swimming very much against the tide so far as the ExecReScan machinery is concerned; the fact that it works at all today depends on a lot of fragile assumptions, such as that no plan node between Gather and a parallel-aware scan node is parameterized. Another objection is that because ExecReScan might be called in workers as well as the leader, hacky extra tests are needed in some places to prevent unwanted shared-state resets. Hence, let's separate this code into two functions, a ReInitializeDSM call and the ReScan call proper. ReInitializeDSM is called only in the leader and is guaranteed to run before we start new workers. ReScan is returned to its traditional function of resetting only local state, which means that ExecReScan's usual habits of delaying or eliminating child rescan calls are safe again. As with the preceding commit 7df2c1f8d, it doesn't seem to be necessary to make these changes in 9.6, which is a good thing because the FDW and CustomScan APIs are impacted. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
* Final pgindent + perltidy run for v10.Tom Lane2017-08-14
|
* Handle elog(FATAL) during ROLLBACK more robustly.Tom Lane2017-08-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stress testing by Andreas Seltenreich disclosed longstanding problems that occur if a FATAL exit (e.g. due to receipt of SIGTERM) occurs while we are trying to execute a ROLLBACK of an already-failed transaction. In such a case, xact.c is in TBLOCK_ABORT state, so that AbortOutOfAnyTransaction would skip AbortTransaction and go straight to CleanupTransaction. This led to an assert failure in an assert-enabled build (due to the ROLLBACK's portal still having a cleanup hook) or without assertions, to a FATAL exit complaining about "cannot drop active portal". The latter's not disastrous, perhaps, but it's messy enough to want to improve it. We don't really want to run all of AbortTransaction in this code path. The minimum required to clean up the open portal safely is to do AtAbort_Memory and AtAbort_Portals. It seems like a good idea to do AtAbort_Memory unconditionally, to be entirely sure that we are starting with a safe CurrentMemoryContext. That means that if the main loop in AbortOutOfAnyTransaction does nothing, we need an extra step at the bottom to restore CurrentMemoryContext = TopMemoryContext, which I chose to do by invoking AtCleanup_Memory. This'll result in calling AtCleanup_Memory twice in many of the paths through this function, but that seems harmless and reasonably inexpensive. The original motivation for the assertion in AtCleanup_Portals was that we wanted to be sure that any user-defined code executed as a consequence of the cleanup hook runs during AbortTransaction not CleanupTransaction. That still seems like a valid concern, and now that we've seen one case of the assertion firing --- which means that exactly that would have happened in a production build --- let's replace the Assert with a runtime check. If we see the cleanup hook still set, we'll emit a WARNING and just drop the hook unexecuted. This has been like this a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/877ey7bmun.fsf@ansel.ydns.eu
* Remove AtEOXact_CatCache().Tom Lane2017-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sole useful effect of this function, to check that no catcache entries have positive refcounts at transaction end, has really been obsolete since we introduced ResourceOwners in PG 8.1. We reduced the checks to assertions years ago, so that the function was a complete no-op in production builds. There have been previous discussions about removing it entirely, but consensus up to now was that it had some small value as a cross-check for bugs in the ResourceOwner logic. However, it now emerges that it's possible to trigger these assertions if you hit an assert-enabled backend with SIGTERM during a call to SearchCatCacheList, because that function temporarily increases the refcounts of entries it's intending to add to a catcache list construct. In a normal ERROR scenario, the extra refcounts are cleaned up by SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a transaction abort and exit without ever executing PG_CATCH handlers. There's a case to be made that this is a generic hazard and we should consider restructuring elog(FATAL) handling so that pending PG_CATCH handlers do get run. That's pretty scary though: it could easily create more problems than it solves. Preliminary stress testing by Andreas Seltenreich suggests that there are not many live problems of this ilk, so we rejected that idea. There are more-localized ways to fix the problem; the most principled one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY. But adding cycles to SearchCatCacheList isn't very appealing. We could also weaken the assertions in AtEOXact_CatCache in some more or less ad-hoc way, but that just makes its raison d'etre even less compelling. In the end, the most reasonable solution seems to be to just remove AtEOXact_CatCache altogether, on the grounds that it's not worth trying to fix it. It hasn't found any bugs for us in many years. Per report from Jeevan Chalke. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
* Remove uses of "slave" in replication contextsPeter Eisentraut2017-08-10
| | | | | This affects mostly code comments, some documentation, and tests. Official APIs already used "standby".
* Remove incorrect assertion in clog.cRobert Haas2017-08-10
| | | | | | | | | | | We must advance the oldest XID that can be safely looked up in clog *before* truncating CLOG, and the oldest XID that can't be reused *after* truncating CLOG. This assertion, and the accompanying comment, are confused; remove them. Reported by Neha Sharma. Discussion: http://postgr.es/m/CANiYTQumC3T=UMBMd1Hor=5XWZYuCEQBioL3ug0YtNQCMMT5wQ@mail.gmail.com
* Reword some unclear commentsAlvaro Herrera2017-08-08
|
* Make pg_stop_backup's wait_for_archive flag work on standbys.Robert Haas2017-08-05
| | | | | | | | | | | | Previously, it had no effect. Now, if archive_mode=always, it will work, and if not, you'll get a warning. Masahiko Sawada, Michael Paquier, and Robert Haas. The patch as submitted also changed the behavior so that we would write and remove history files on standbys, but that seems like material for a separate patch to me. Discussion: http://postgr.es/m/CAD21AoC2Xw6M=ZJyejq_9d_iDkReC_=rpvQRw5QsyzKQdfYpkw@mail.gmail.com
* Only kill sync workers at commit time in subscription DDLPeter Eisentraut2017-08-04
| | | | | | This allows a transaction abort to avoid killing those workers. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
* hash: Immediately after a bucket split, try to clean the old bucket.Robert Haas2017-08-04
| | | | | | | | | | | | If it works, then we won't be storing two copies of all the tuples that were just moved. If not, VACUUM will still take care of it eventually. Per a report from AP and analysis from Amit Kapila, it seems that a bulk load can cause splits fast enough that VACUUM won't deal with the problem in time to prevent bloat. Amit Kapila; I rewrote the comment. Discussion: http://postgr.es/m/20170704105728.mwb72jebfmok2nm2@zip.com.au
* Fix concurrent locking of tuple update chainAlvaro Herrera2017-07-26
| | | | | | | | | | | | | | | | | | | If several sessions are concurrently locking a tuple update chain with nonconflicting lock modes using an old snapshot, and they all succeed, it may happen that some of them fail because of restarting the loop (due to a concurrent Xmax change) and getting an error in the subsequent pass while trying to obtain a tuple lock that they already have in some tuple version. This can only happen with very high concurrency (where a row is being both updated and FK-checked by multiple transactions concurrently), but it's been observed in the field and can have unpleasant consequences such as an FK check failing to see a tuple that definitely exists: ERROR: insert or update on table "child_table" violates foreign key constraint "fk_constraint_name" DETAIL: Key (keyid)=(123456) is not present in table "parent_table". (where the key is observably present in the table). Discussion: https://postgr.es/m/20170714210011.r25mrff4nxjhmf3g@alvherre.pgsql
* Fix double shared memory allocation.Teodor Sigaev2017-07-21
| | | | | | | | | | | SLRU buffer lwlocks are allocated twice by oversight in commit fe702a7b3f9f2bc5bf6d173166d7d55226af82c8 where that locks were moved to separate tranche. The bug doesn't have user-visible effects except small overspending of shared memory. Backpatch to 9.6 where it was introduced. Alexander Korotkov with small editorization by me.
* Add static assertions about pg_control fitting into one disk sector.Tom Lane2017-07-19
| | | | | | | | | | | | | | | | | When pg_control was first designed, sizeof(ControlFileData) was small enough that a comment seemed like plenty to document the assumption that it'd fit into one disk sector. Now it's nearly 300 bytes, raising the possibility that somebody would carelessly add enough stuff to create a problem. Let's add a StaticAssertStmt() to ensure that the situation doesn't pass unnoticed if it ever occurs. While at it, rename PG_CONTROL_SIZE to PG_CONTROL_FILE_SIZE to make it clearer what that symbol means, and convert the existing runtime comparisons of sizeof(ControlFileData) vs. PG_CONTROL_FILE_SIZE to be static asserts --- we didn't have that technology when this code was first written. Discussion: https://postgr.es/m/9192.1500490591@sss.pgh.pa.us
* hash: Fix write-ahead logging bugs related to init forks.Robert Haas2017-07-17
| | | | | | | | | | | | | | | | | | One, logging for CREATE INDEX was oblivious to the fact that when an unlogged table is created, *only* operations on the init fork should be logged. Two, init fork buffers need to be flushed after they are written; otherwise, a filesystem-level copy following recovery may do the wrong thing. (There may be a better fix for this issue than the one used here, but this is transposed from the similar logic already present in XLogReadBufferForRedoExtended, and a broader refactoring after beta2 seems inadvisable.) Amit Kapila, reviewed by Ashutosh Sharma, Kyotaro Horiguchi, and Michael Paquier Discussion: http://postgr.es/m/CAA4eK1JpcMsEtOL_J7WODumeEfyrPi7FPYHeVdS7fyyrCrgp4w@mail.gmail.com
* Fix potential data corruption during freezeTeodor Sigaev2017-07-06
| | | | | | | Fix oversight in 3b97e6823b94 bug fix. Bitwise AND is used instead of OR and it cleans all bits in t_infomask heap tuple field. Backpatch to 9.3
* Fix typo in commentPeter Eisentraut2017-06-30
| | | | Author: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
* Fix typo in commentAlvaro Herrera2017-06-22
| | | | | | | | Once upon a time, WAL pointers could be NULL, but no longer. We talk about "valid" now. Reported-by: Amit Langote Discussion: https://postgr.es/m/33e9617d-27f1-eee8-3311-e27af98eaf2b@lab.ntt.co.jp
* Fix possibility of creating a "phantom" segment after promotion.Andres Freund2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When promoting a standby just after a XLOG_SWITCH record was replayed, and next segment(s) are already are locally available (via walsender, restore_command + trigger/recovery target), that segment could accidentally be recycled onto the past of the new timeline. Later checkpointer would create a .ready file for it, assuming there was an error during creation, and it would get archived. That causes trouble if another standby is later brought up from a basebackup from before the timeline creation, because it would try to read the segment, because XLogFileReadAnyTLI just tries all possible timelines, which doesn't have valid contents. Thus replay would fail. The problem, if already occurred, can be fixed by removing the segment and/or having restore_command filter it out. The reason for the creation of such "phantom" segments was, that after an XLOG_SWITCH record the EndOfLog variable points to the beginning of the next segment, and RemoveXlogFile() used XLByteToPrevSeg(). Normally RemoveXlogFile() doing so is harmless, because the last segment will still exist preventing InstallXLogFileSegment() from causing harm, but just after promotion there's no previous segment on the new timeline. Fix that by using XLByteToSeg() instead of XLByteToPrevSeg(). Author: Andres Freund Reported-By: Greg Burek Discussion: https://postgr.es/m/20170619073026.zcwpe6mydsaz5ygd@alap3.anarazel.de Backpatch: 9.2-, bug older than all supported versions
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Phase 2 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Fix typos in commentsMagnus Hagander2017-06-17
| | | | Author: Daniel Gustafsson <daniel@yesql.se>
* Avoid bogus TwoPhaseState locking sequencesAlvaro Herrera2017-06-14
| | | | | | | | | | | | | The optimized code in 728bd991c3c4 contains a few invalid locking sequences. To wit, the original code would try to acquire an lwlock that it already holds. Avoid this by moving lock acquisitions to higher-level code, and install appropriate assertions in low-level that the correct mode is held. Authors: Michael Paquier, Álvaro Herrera Reported-By: chuanting wang Bug: #14680 Discussion: https://postgr.es/m/20170531033228.1487.10124@wrigleys.postgresql.org
* Re-run pgindent.Tom Lane2017-06-13
| | | | | | | | This is just to have a clean base state for testing of Piotr Stefaniak's latest version of FreeBSD indent. I fixed up a couple of places where pgindent would have changed format not-nicely. perltidy not included. Discussion: https://postgr.es/m/VI1PR03MB119959F4B65F000CA7CD9F6BF2CC0@VI1PR03MB1199.eurprd03.prod.outlook.com
* Formatting improvements in config file samplesPeter Eisentraut2017-06-09
|
* Update code commentsPeter Eisentraut2017-06-09
| | | | Author: Neha Khatri <nehakhatri5@gmail.com>
* Clean up latch related code.Andres Freund2017-06-06
| | | | | | | | | | | | | | | | | | | | | | The larger part of this patch replaces usages of MyProc->procLatch with MyLatch. The latter works even early during backend startup, where MyProc->procLatch doesn't yet. While the affected code shouldn't run in cases where it's not initialized, it might get copied into places where it might. Using MyLatch is simpler and a bit faster to boot, so there's little point to stick with the previous coding. While doing so I noticed some weaknesses around newly introduced uses of latches that could lead to missed events, and an omitted CHECK_FOR_INTERRUPTS() call in worker_spi. As all the actual bugs are in v10 code, there doesn't seem to be sufficient reason to backpatch this. Author: Andres Freund Discussion: https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de Backpatch: -
* Prevent possibility of panics during shutdown checkpoint.Andres Freund2017-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit. Author: Andres Freund, based on an earlier patch by Michael Paquier Reported-By: Fujii Masao, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de Backpatch: 9.4, where logical decoding was introduced