aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Cope with possible failure of the oldest MultiXact to exist.Robert Haas2015-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent commits, mainly b69bf30b9bfacafc733a9ba77c9587cf54d06c0c and 53bb309d2d5a9432d2602c93ed18e58bd2924e15, introduced mechanisms to protect against wraparound of the MultiXact member space: the number of multixacts that can exist at one time is limited to 2^32, but the total number of members in those multixacts is also limited to 2^32, and older code did not take care to enforce the second limit, potentially allowing old data to be overwritten while it was still needed. Unfortunately, these new mechanisms failed to account for the fact that the code paths in which they run might be executed during recovery or while the cluster was in an inconsistent state. Also, they failed to account for the fact that users who used pg_upgrade to upgrade a PostgreSQL version between 9.3.0 and 9.3.4 might have might oldestMultiXid = 1 in the control file despite the true value being larger. To fix these problems, first, avoid unnecessarily examining the mmembers of MultiXacts when the cluster is not known to be consistent. TruncateMultiXact has done this for a long time, and this patch does not fix that. But the new calls used to prevent member wraparound are not needed until we reach normal running, so avoid calling them earlier. (SetMultiXactIdLimit is actually called before InRecovery is set, so we can't rely on that; we invent our own multixact-specific flag instead.) Second, make failure to look up the members of a MultiXact a non-fatal error. Instead, if we're unable to determine the member offset at which wraparound would occur, postpone arming the member wraparound defenses until we are able to do so. If we're unable to determine the member offset that should force autovacuum, force it continuously until we are able to do so. If we're unable to deterine the member offset at which we should truncate the members SLRU, log a message and skip truncation. An important consequence of these changes is that anyone who does have a bogus oldestMultiXid = 1 value in pg_control will experience immediate emergency autovacuuming when upgrading to a release that contains this fix. The release notes should highlight this fact. If a user has no pg_multixact/offsets/0000 file, but has oldestMultiXid = 1 in the control file, they may wish to vacuum any tables with relminmxid = 1 prior to upgrading in order to avoid an immediate emergency autovacuum after the upgrade. This must be done with a PostgreSQL version 9.3.5 or newer and with vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age set to 0. This patch also adds an additional log message at each database server startup, indicating either that protections against member wraparound have been engaged, or that they have not. In the latter case, once autovacuum has advanced oldestMultiXid to a sane value, the message indicating that the guards have been engaged will appear at the next checkpoint. A few additional messages have also been added at the DEBUG1 level so that the correct operation of this code can be properly audited. Along the way, this patch fixes another, related bug in TruncateMultiXact that has existed since PostgreSQL 9.3.0: when no MultiXacts exist at all, the truncation code looks up NextMultiXactId, which doesn't exist yet. This can lead to TruncateMultiXact removing every file in pg_multixact/offsets instead of keeping one around, as it should. This in turn will cause the database server to refuse to start afterwards. Patch by me. Review by Álvaro Herrera, Andres Freund, Noah Misch, and Thomas Munro.
* pgindent run on access/transam/multixact.cAlvaro Herrera2015-06-04
| | | | | | | | | This file has been patched over and over, and the differences to master caused by pgindent are annoying enough that it seems saner to make the older branches look the same. Backpatch to 9.3, which is as far back as backpatching of bugfixes is necessary.
* Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.Tom Lane2015-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the inner side of a nestloop SEMI or ANTI join is an indexscan that uses all the join clauses as indexquals, it can be presumed that both matched and unmatched outer rows will be processed very quickly: for matched rows, we'll stop after fetching one row from the indexscan, while for unmatched rows we'll have an indexscan that finds no matching index entries, which should also be quick. The planner already knew about this, but it was nonetheless charging for at least one full run of the inner indexscan, as a consequence of concerns about the behavior of materialized inner scans --- but those concerns don't apply in the fast case. If the inner side has low cardinality (many matching rows) this could make an indexscan plan look far more expensive than it actually is. To fix, rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we don't add the inner scan cost until we've inspected the indexquals, and then we can add either the full-run cost or just the first tuple's cost as appropriate. Experimentation with this fix uncovered another problem: add_path and friends were coded to disregard cheap startup cost when considering parameterized paths. That's usually okay (and desirable, because it thins the path herd faster); but in this fast case for SEMI/ANTI joins, it could result in throwing away the desired plain indexscan path in favor of a bitmap scan path before we ever get to the join costing logic. In the many-matching-rows cases of interest here, a bitmap scan will do a lot more work than required, so this is a problem. To fix, add a per-relation flag consider_param_startup that works like the existing consider_startup flag, but applies to parameterized paths, and set it for relations that are the inside of a SEMI or ANTI join. To make this patch reasonably safe to back-patch, care has been taken to avoid changing the planner's behavior except in the very narrow case of SEMI/ANTI joins with inner indexscans. There are places in compare_path_costs_fuzzily and add_path_precheck that are not terribly consistent with the new approach, but changing them will affect planner decisions at the margins in other cases, so we'll leave that for a HEAD-only fix. Back-patch to 9.3; before that, the consider_startup flag didn't exist, meaning that the second aspect of the patch would be too invasive. Per a complaint from Peter Holzer and analysis by Tomas Vondra.
* Remove special cases for ETXTBSY from new fsync'ing logic.Tom Lane2015-05-29
| | | | | | | | | | The argument that this is a sufficiently-expected case to be silently ignored seems pretty thin. Andres had brought it up back when we were still considering that most fsync failures should be hard errors, and it probably would be legit not to fail hard for ETXTBSY --- but the same is true for EROFS and other cases, which is why we gave up on hard failures. ETXTBSY is surely not a normal case, so logging the failure seems fine from here.
* Fix fsync-at-startup code to not treat errors as fatal.Tom Lane2015-05-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2ce439f3379aed857517c8ce207485655000fc8e introduced a rather serious regression, namely that if its scan of the data directory came across any un-fsync-able files, it would fail and thereby prevent database startup. Worse yet, symlinks to such files also caused the problem, which meant that crash restart was guaranteed to fail on certain common installations such as older Debian. After discussion, we agreed that (1) failure to start is worse than any consequence of not fsync'ing is likely to be, therefore treat all errors in this code as nonfatal; (2) we should not chase symlinks other than those that are expected to exist, namely pg_xlog/ and tablespace links under pg_tblspc/. The latter restriction avoids possibly fsync'ing a much larger part of the filesystem than intended, if the user has left random symlinks hanging about in the data directory. This commit takes care of that and also does some code beautification, mainly moving the relevant code into fd.c, which seems a much better place for it than xlog.c, and making sure that the conditional compilation for the pre_sync_fname pass has something to do with whether pg_flush_data works. I also relocated the call site in xlog.c down a few lines; it seems a bit silly to be doing this before ValidateXLOGDirectoryStructure(). The similar logic in initdb.c ought to be made to match this, but that change is noncritical and will be dealt with separately. Back-patch to all active branches, like the prior commit. Abhijit Menon-Sen and Tom Lane
* Fix pg_get_functiondef() to print a function's LEAKPROOF property.Tom Lane2015-05-28
| | | | | | | Seems to have been an oversight in the original leakproofness patch. Per report and patch from Jeevan Chalke. In passing, prettify some awkward leakproof-related code in AlterFunction.
* Revert "Add all structured objects passed to pushJsonbValue piecewise."Andrew Dunstan2015-05-26
| | | | | | | This reverts commit 54547bd87f49326d67051254c363e6597d16ffda. This appears to have been a thinko on my part. I will try to come up wioth a better solution.
* Add all structured objects passed to pushJsonbValue piecewise.Andrew Dunstan2015-05-26
| | | | | | | | | | Commit 9b74f32cdbff8b9be47fc69164eae552050509ff did this for objects of type jbvBinary, but in trying further to simplify some of the new jsonb code I discovered that objects of type jbvObject or jbvArray passed as WJB_ELEM or WJB_VALUE also caused problems. These too are now added component by component. Backpatch to 9.4.
* Update README.tuplockAlvaro Herrera2015-05-25
| | | | | | | | | | | Multixact truncation is now handled differently, and this file hadn't gotten the memo. Per note from Amit Langote. I didn't use his patch, though. Also update the description of infomask bits, which weren't completely up to date either. This commit also propagates b01a4f6838 back to 9.3 and 9.4, which apparently I failed to do back then.
* Rename pg_shdepend.c's typedef "objectType" to SharedDependencyObjectType.Tom Lane2015-05-24
| | | | | | | | | | | | The name objectType is widely used as a field name, and it's pure luck that this conflict has not caused pgindent to go crazy before. It messed up pg_audit.c pretty good though. Since pg_shdepend.c doesn't export this typedef and only uses it in three places, changing that seems saner than changing the field usages. Back-patch because we're contemplating using the union of all branch typedefs for future pgindent runs, so this won't fix anything if it stays the same in back branches.
* Unpack jbvBinary objects passed to pushJsonbValueAndrew Dunstan2015-05-22
| | | | | | | | | | | | | | | | | | | pushJsonbValue was accepting jbvBinary objects passed as WJB_ELEM or WJB_VALUE data. While this succeeded, when those objects were later encountered in attempting to convert the result to Jsonb, errors occurred. With this change we ghuarantee that a JSonbValue constructed from calls to pushJsonbValue does not contain any jbvBinary objects. This cures a problem observed with jsonb_delete. This means callers of pushJsonbValue no longer need to perform this unpacking themselves. A subsequent patch will perform some cleanup in that area. The error was not triggered by any 9.4 code, but this is a publicly visible routine, and so the error could be exercised by third party code, therefore backpatch to 9.4. Bug report from Peter Geoghegan, fix by me.
* Fix spelling in commentSimon Riggs2015-05-19
|
* Fix off-by-one error in Assertion.Heikki Linnakangas2015-05-19
| | | | | | | | | | | | The point of the assertion is to ensure that the arrays allocated in stack are large enough, but the check was one item short. This won't matter in practice because MaxIndexTuplesPerPage is an overestimate, so you can't have that many items on a page in reality. But let's be tidy. Spotted by Anastasia Lubennikova. Backpatch to all supported versions, like the patch that added the assertion.
* Fix error message in pre_sync_fname.Robert Haas2015-05-18
| | | | | | | The old one didn't include %m anywhere, and required extra translation. Report by Peter Eisentraut. Fix by me. Review by Tom Lane.
* Check return values of sensitive system library calls.Noah Misch2015-05-18
| | | | | | | | | | | | | | | | | | | | | | PostgreSQL already checked the vast majority of these, missing this handful that nearly cannot fail. If putenv() failed with ENOMEM in pg_GSS_recvauth(), authentication would proceed with the wrong keytab file. If strftime() returned zero in cache_locale_time(), using the unspecified buffer contents could lead to information exposure or a crash. Back-patch to 9.0 (all supported versions). Other unchecked calls to these functions, especially those in frontend code, pose negligible security concern. This patch does not address them. Nonetheless, it is always better to check return values whose specification provides for indicating an error. In passing, fix an off-by-one error in strftime_win32()'s invocation of WideCharToMultiByte(). Upon retrieving a value of exactly MAX_L10N_DATA bytes, strftime_win32() would overrun the caller's buffer by one byte. MAX_L10N_DATA is chosen to exceed the length of every possible value, so the vulnerable scenario probably does not arise. Security: CVE-2015-3166
* Prevent a double free by not reentering be_tls_close().Noah Misch2015-05-18
| | | | | | | | | | | | Reentering this function with the right timing caused a double free, typically crashing the backend. By synchronizing a disconnection with the authentication timeout, an unauthenticated attacker could achieve this somewhat consistently. Call be_tls_close() solely from within proc_exit_prepare(). Back-patch to 9.0 (all supported versions). Benkocs Norbert Attila Security: CVE-2015-3165
* Translation updatesPeter Eisentraut2015-05-18
| | | | | Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: d5e6d568c213297ebd5530ad14fb26d75ed66c25
* Fix whitespacePeter Eisentraut2015-05-16
|
* Fix RBM_ZERO_AND_LOCK mode to not acquire lock on local buffers.Heikki Linnakangas2015-05-13
| | | | | | | | | | | | | | | Commit 81c45081 introduced a new RBM_ZERO_AND_LOCK mode to ReadBuffer, which takes a lock on the buffer before zeroing it. However, you cannot take a lock on a local buffer, and you got a segfault instead. The version of that patch committed to master included a check for !isLocalBuf, and therefore didn't crash, but oddly I missed that in the back-patched versions. This patch adds that check to the back-branches too. RBM_ZERO_AND_LOCK mode is only used during WAL replay, and in hash indexes. WAL replay only deals with shared buffers, so the only way to trigger the bug is with a temporary hash index. Reported by Artem Ignatyev, analysis by Tom Lane.
* Fix incorrect checking of deferred exclusion constraint after a HOT update.Tom Lane2015-05-11
| | | | | | | | | | | | | If a row that potentially violates a deferred exclusion constraint is HOT-updated later in the same transaction, the exclusion constraint would be reported as violated when the check finally occurs, even if the row(s) the new row originally conflicted with have since been removed. This happened because the wrong TID was passed to check_exclusion_constraint(), causing the live HOT-updated row to be seen as a conflicting row rather than recognized as the row-under-test. Per bug #13148 from Evan Martin. It's been broken since exclusion constraints were invented, so back-patch to all supported branches.
* Increase threshold for multixact member emergency autovac to 50%.Robert Haas2015-05-11
| | | | | | | | | | | | | | | | | | | | | | | Analysis by Noah Misch shows that the 25% threshold set by commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 is lower than any other, similar autovac threshold. While we don't know exactly what value will be optimal for all users, it is better to err a little on the high side than on the low side. A higher value increases the risk that users might exhaust the available space and start seeing errors before autovacuum can clean things up sufficiently, but a user who hits that problem can compensate for it by reducing autovacuum_multixact_freeze_max_age to a value dependent on their average multixact size. On the flip side, if the emergency cap imposed by that patch kicks in too early, the user will experience excessive wraparound scanning and will be unable to mitigate that problem by configuration. The new value will hopefully reduce the risk of such bad experiences while still providing enough headroom to avoid multixact member exhaustion for most users. Along the way, adjust the documentation to reflect the effects of commit 04e6d3b877e060d8445eb653b7ea26b1ee5cec6b, which taught autovacuum to run for multixact wraparound even when autovacuum is configured off.
* Even when autovacuum=off, force it for members as we do in other cases.Robert Haas2015-05-11
| | | | Thomas Munro, with some adjustments by me.
* Advance the stop point for multixact offset creation only at checkpoint.Robert Haas2015-05-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c advanced the stop point at vacuum time, but this has subsequently been shown to be unsafe as a result of analysis by myself and Thomas Munro and testing by Thomas Munro. The crux of the problem is that the SLRU deletion logic may get confused about what to remove if, at exactly the right time during the checkpoint process, the head of the SLRU crosses what used to be the tail. This patch, by me, fixes the problem by advancing the stop point only following a checkpoint. This has the additional advantage of making the removal logic work during recovery more like the way it works during normal running, which is probably good. At least one of the calls to DetermineSafeOldestOffset which this patch removes was already dead, because MultiXactAdvanceOldest is called only during recovery and DetermineSafeOldestOffset was set up to do nothing during recovery. That, however, is inconsistent with the principle that recovery and normal running should work similarly, and was confusing to boot. Along the way, fix some comments that previous patches in this area neglected to update. It's not clear to me whether there's any concrete basis for the decision to use only half of the multixact ID space, but it's neither necessary nor sufficient to prevent multixact member wraparound, so the comments should not say otherwise.
* Fix DetermineSafeOldestOffset for the case where there are no mxacts.Robert Haas2015-05-10
| | | | | | | | Commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c failed to take into account the possibility that there might be no multixacts in existence at all. Report by Thomas Munro; patch by me.
* Teach autovacuum about multixact member wraparound.Robert Haas2015-05-08
| | | | | | | | | | | | | | | | | | | | | The logic introduced in commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c and repaired in commits 669c7d20e6374850593cb430d332e11a3992bbcf and 7be47c56af3d3013955c91c2877c08f2a0e3e6a2 helps to ensure that we don't overwrite old multixact member information while it is still needed, but a user who creates many large multixacts can still exhaust the member space (and thus start getting errors) while autovacuum stands idly by. To fix this, progressively ramp down the effective value (but not the actual contents) of autovacuum_multixact_freeze_max_age as member space utilization increases. This makes autovacuum more aggressive and also reduces the threshold for a manual VACUUM to perform a full-table scan. This patch leaves unsolved the problem of ensuring that emergency autovacuums are triggered even when autovacuum=off. We'll need to fix that via a separate patch. Thomas Munro and Robert Haas
* Fix incorrect math in DetermineSafeOldestOffset.Robert Haas2015-05-07
| | | | | | | | | | The old formula didn't have enough parentheses, so it would do the wrong thing, and it used / rather than % to find a remainder. The effect of these oversights is that the stop point chosen by the logic introduced in commit b69bf30b9bfacafc733a9ba77c9587cf54d06c0c might be rather meaningless. Thomas Munro, reviewed by Kevin Grittner, with a whitespace tweak by me.
* Fix some problems with patch to fsync the data directory.Robert Haas2015-05-05
| | | | | | | | pg_win32_is_junction() was a typo for pgwin32_is_junction(). open() was used not only in a two-argument form, which breaks on Windows, but also where BasicOpenFile() should have been used. Per reports from Andrew Dunstan and David Rowley.
* Recursively fsync() the data directory after a crash.Robert Haas2015-05-04
| | | | | | | | | | | Otherwise, if there's another crash, some writes from after the first crash might make it to disk while writes from before the crash fail to make it to disk. This could lead to data corruption. Back-patch to all supported versions. Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised by me.
* Fix two small bugs in json's populate_record_workerAndrew Dunstan2015-05-04
| | | | | | | | | | The first bug is not releasing a tupdesc when doing an early return out of the function. The second bug is a logic error in choosing when to do an early return if given an empty jsonb object. Bug reports from Pavel Stehule and Tom Lane respectively. Backpatch to 9.4 where these were introduced.
* Fix overlooked relcache invalidation in ALTER TABLE ... ALTER CONSTRAINT.Tom Lane2015-05-03
| | | | | | | | | | | | | | | When altering the deferredness state of a foreign key constraint, we correctly updated the catalogs and then invalidated the relcache state for the target relation ... but that's not the only relation with relevant triggers. Must invalidate the other table as well, or the state change fails to take effect promptly for operations triggered on the other table. Per bug #13224 from Christian Ullrich. In passing, reorganize regression test case for this feature so that it isn't randomly injected into the middle of an unrelated test sequence. Oversight in commit f177cbfe676dc2c7ca2b206c54d6bf819feeea8b. Back-patch to 9.4 where the faulty code was added.
* Mark views created from tables as replication identity 'nothing'Bruce Momjian2015-05-01
| | | | | | | | | pg_dump turns tables into views using a method that was not setting pg_class.relreplident properly. Patch by Marko Tiikkaja Backpatch through 9.4
* Fix pg_upgrade's multixact handling (again)Alvaro Herrera2015-04-30
| | | | | | | | | | | | | | | We need to create the pg_multixact/offsets file deleted by pg_upgrade much earlier than we originally were: it was in TrimMultiXact(), which runs after we exit recovery, but it actually needs to run earlier than the first call to SetMultiXactIdLimit (before recovery), because that routine already wants to read the first offset segment. Per pg_upgrade trouble report from Jeff Janes. While at it, silence a compiler warning about a pointless assert that an unsigned variable was being tested non-negative. This was a signed constant in Thomas Munro's patch which I changed to unsigned before commit. Pointed out by Andres Freund.
* Code review for multixact bugfixAlvaro Herrera2015-04-28
| | | | | | Reword messages, rename a confusingly named function. Per Robert Haas.
* Protect against multixact members wraparoundAlvaro Herrera2015-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multixact member files are subject to early wraparound overflow and removal: if the average multixact size is above a certain threshold (see note below) the protections against offset overflow are not enough: during multixact truncation at checkpoint time, some pg_multixact/members files would be removed because the server considers them to be old and not needed anymore. This leads to loss of files that are critical to interpret existing tuples's Xmax values. To protect against this, since we don't have enough info in pg_control and we can't modify it in old branches, we maintain shared memory state about the oldest value that we need to keep; we use this during new multixact creation to abort if an old still-needed file would get overwritten. This value is kept up to date by checkpoints, which makes it not completely accurate but should be good enough. We start emitting warnings sometime earlier, so that the eventual multixact-shutdown doesn't take DBAs completely by surprise (more precisely: once 20 members SLRU segments are remaining before shutdown.) On troublesome average multixact size: The threshold size depends on the multixact freeze parameters. The oldest age is related to the greater of multixact_freeze_table_age and multixact_freeze_min_age: anything older than that should be removed promptly by autovacuum. If autovacuum is keeping up with multixact freezing, the troublesome multixact average size is (2^32-1) / Max(freeze table age, freeze min age) or around 28 members per multixact. Having an average multixact size larger than that will eventually cause new multixact data to overwrite the data area for older multixacts. (If autovacuum is not able to keep up, or there are errors in vacuuming, the actual maximum is multixact_freeeze_max_age instead, at which point multixact generation is stopped completely. The default value for this limit is 400 million, which means that the multixact size that would cause trouble is about 10 members). Initial bug report by Timothy Garnett, bug #12990 Backpatch to 9.3, where the problem was introduced. Authors: Álvaro Herrera, Thomas Munro Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner
* Use a fd opened for read/write when syncing slots during startup.Andres Freund2015-04-28
| | | | | | | | | | | | | | | | | | | | | Some operating systems, including the reporter's windows, return EBADFD or similar when fsync() is invoked on a O_RDONLY file descriptor. Unfortunately RestoreSlotFromDisk() does exactly that; which causes failures after restarts in at least some scenarios. If you hit the bug the error message will be something like ERROR: could not fsync file "pg_replslot/$name/state": Bad file descriptor Simply use O_RDWR instead of O_RDONLY when opening the relevant file descriptor to fix the bug. Unfortunately I have no way of verifying the fix, but we've seen similar problems in the past. This bug goes back to 9.4 where slots were introduced. Backpatch accordingly. Reported-By: Patrice Drolet Bug: #13143: Discussion: 20150424101006.2556.60897@wrigleys.postgresql.org
* Prevent improper reordering of antijoins vs. outer joins.Tom Lane2015-04-25
| | | | | | | | | | | An outer join appearing within the RHS of an antijoin can't commute with the antijoin, but somehow I missed teaching make_outerjoininfo() about that. In Teodor Sigaev's recent trouble report, this manifests as a "could not find RelOptInfo for given relids" error within eqjoinsel(); but I think silently wrong query results are possible too, if the planner misorders the joins and doesn't happen to trigger any internal consistency checks. It's broken as far back as we had antijoins, so back-patch to all supported branches.
* Fix obsolete comment in set_rel_size().Tom Lane2015-04-24
| | | | | | | | | | | The cross-reference to set_append_rel_pathlist() was obsoleted by commit e2fa76d80ba571d4de8992de6386536867250474, which split what had been set_rel_pathlist() and child routines into two sets of functions. But I (tgl) evidently missed updating this comment. Back-patch to 9.2 to avoid unnecessary divergence among branches. Amit Langote
* Fix deadlock at startup, if max_prepared_transactions is too small.Heikki Linnakangas2015-04-23
| | | | | | | | | | | | | | | When the startup process recovers transactions by scanning pg_twophase directory, it should clear MyLockedGxact after it's done processing each transaction. Like we do during normal operation, at PREPARE TRANSACTION. Otherwise, if the startup process exits due to an error, it will try to clear the locking_backend field of the last recovered transaction. That's usually harmless, but if the error happens in MarkAsPreparing, while holding TwoPhaseStateLock, the shmem-exit hook will try to acquire TwoPhaseStateLock again, and deadlock with itself. This fixes bug #13128 reported by Grant McAlister. The bug was introduced by commit bb38fb0d, so backpatch to all supported versions like that commit.
* Fix typo in commentAlvaro Herrera2015-04-14
| | | | | | | | SLRU_SEGMENTS_PER_PAGE -> SLRU_PAGES_PER_SEGMENT I introduced this ancient typo in subtrans.c and later propagated it to multixact.c. I fixed the latter in f741300c, but only back to 9.3; backpatch to all supported branches for consistency.
* Don't archive bogus recycled or preallocated files after timeline switch.Heikki Linnakangas2015-04-13
| | | | | | | | | | | | | | | | | | | After a timeline switch, we would leave behind recycled WAL segments that are in the future, but on the old timeline. After promotion, and after they become old enough to be recycled again, we would notice that they don't have a .ready or .done file, create a .ready file for them, and archive them. That's bogus, because the files contain garbage, recycled from an older timeline (or prealloced as zeros). We shouldn't archive such files. This could happen when we're following a timeline switch during replay, or when we switch to new timeline at end-of-recovery. To fix, whenever we switch to a new timeline, scan the data directory for WAL segments on the old timeline, but with a higher segment number, and remove them. Those don't belong to our timeline history, and are most likely bogus recycled or preallocated files. They could also be valid files that we streamed from the primary ahead of time, but in any case, they're not needed to recover to the new timeline.
* Remove duplicated words in comments.Heikki Linnakangas2015-04-12
| | | | David Rowley
* Fix autovacuum launcher shutdown sequenceAlvaro Herrera2015-04-08
| | | | | | | | | | | | | | | It was previously possible to have the launcher re-execute its main loop before shutting down if some other signal was received or an error occurred after getting SIGTERM, as reported by Qingqing Zhou. While investigating, Tom Lane further noticed that if autovacuum had been disabled in the config file, it would misbehave by trying to start a new worker instead of bailing out immediately -- it would consider itself as invoked in emergency mode. Fix both problems by checking the shutdown flag in a few more places. These problems have existed since autovacuum was introduced, so backpatch all the way back.
* Fix incorrect matching of subexpressions in outer-join plan nodes.Tom Lane2015-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we would re-use input subexpressions in all expression trees attached to a Join plan node. However, if it's an outer join and the subexpression appears in the nullable-side input, this is potentially incorrect for apparently-matching subexpressions that came from above the outer join (ie, targetlist and qpqual expressions), because the executor will treat the subexpression value as NULL when maybe it should not be. The case is fairly hard to hit because (a) you need a non-strict subexpression (else NULL is correct), and (b) we don't usually compute expressions in the outputs of non-toplevel plan nodes. But we might do so if the expressions are sort keys for a mergejoin, for example. Probably in the long run we should make a more explicit distinction between Vars appearing above and below an outer join, but that will be a major planner redesign and not at all back-patchable. For the moment, just hack set_join_references so that it will not match any non-Var expressions coming from nullable inputs to expressions that came from above the join. (This is somewhat overkill, in that a strict expression could still be matched, but it doesn't seem worth the effort to check that.) Per report from Qingqing Zhou. The added regression test case is based on his example. This has been broken for a very long time, so back-patch to all active branches.
* Remove unnecessary variables in _hash_splitbucket().Tom Lane2015-04-03
| | | | | | | | | | | Commit ed9cc2b5df59fdbc50cce37399e26b03ab2c1686 made it unnecessary to pass start_nblkno to _hash_splitbucket(), and for that matter unnecessary to have the internal nblkno variable either. My compiler didn't complain about that, but some did. I also rearranged the use of oblkno a bit to make that case more parallel. Report and initial patch by Petr Jelinek, rearranged a bit by me. Back-patch to all branches, like the previous patch.
* Fix rare startup failure induced by MVCC-catalog-scans patch.Tom Lane2015-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While a new backend nominally participates in sinval signaling starting from the SharedInvalBackendInit call near the top of InitPostgres, it cannot recognize sinval messages for unshared catalogs of its database until it has set up MyDatabaseId. This is not problematic for the catcache or relcache, which by definition won't have loaded any data from or about such catalogs before that point. However, commit 568d4138c646cd7c introduced a mechanism for re-using MVCC snapshots for catalog scans, and made invalidation of those depend on recognizing relevant sinval messages. So it's possible to establish a catalog snapshot to read pg_authid and pg_database, then before we set MyDatabaseId, receive sinval messages that should result in invalidating that snapshot --- but do not, because we don't realize they are for our database. This mechanism explains the intermittent buildfarm failures we've seen since commit 31eae6028eca4365. That commit was not itself at fault, but it introduced a new regression test that does reconnections concurrently with the "vacuum full pg_am" command in vacuum.sql. This allowed the pre-existing error to be exposed, given just the right timing, because we'd fail to update our information about how to access pg_am. In principle any VACUUM FULL on a system catalog could have created a similar hazard for concurrent incoming connections. Perhaps there are more subtle failure cases as well. To fix, force invalidation of the catalog snapshot as soon as we've set MyDatabaseId. Back-patch to 9.4 where the error was introduced.
* After a crash, don't restart workers with BGW_NEVER_RESTART.Robert Haas2015-04-02
| | | | Amit Khandekar
* Correct comment to use RS_EPHEMERALSimon Riggs2015-04-02
|
* Remove spurious semicolons.Heikki Linnakangas2015-03-31
| | | | Petr Jelinek
* Fix bogus concurrent use of _hash_getnewbuf() in bucket split code.Tom Lane2015-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | _hash_splitbucket() obtained the base page of the new bucket by calling _hash_getnewbuf(), but it held no exclusive lock that would prevent some other process from calling _hash_getnewbuf() at the same time. This is contrary to _hash_getnewbuf()'s API spec and could in fact cause failures. In practice, we must only call that function while holding write lock on the hash index's metapage. An additional problem was that we'd already modified the metapage's bucket mapping data, meaning that failure to extend the index would leave us with a corrupt index. Fix both issues by moving the _hash_getnewbuf() call to just before we modify the metapage in _hash_expandtable(). Unfortunately there's still a large problem here, which is that we could also incur ENOSPC while trying to get an overflow page for the new bucket. That would leave the index corrupt in a more subtle way, namely that some index tuples that should be in the new bucket might still be in the old one. Fixing that seems substantially more difficult; even preallocating as many pages as we could possibly need wouldn't entirely guarantee that the bucket split would complete successfully. So for today let's just deal with the base case. Per report from Antonin Houska. Back-patch to all active branches.
* Fix rare core dump in BackendIdGetTransactionIds().Tom Lane2015-03-30
| | | | | | | | | | BackendIdGetTransactionIds() neglected the possibility that the PROC pointer in a ProcState array entry is null. In current usage, this could only crash if the other backend had exited since pgstat_read_current_status saw it as active, which is a pretty narrow window. But it's reachable in the field, per bug #12918 from Vladimir Borodin. Back-patch to 9.4 where the faulty code was introduced.