path: root/src/backend/access
Commit message | Author | Age
...
* Do wal_level and hot standby checks when doing crash-then-archive recovery. | Heikki Linnakangas | 2014-03-05
  CheckRequiredParameterValues() should perform the checks if archive recovery was requested, even if we are going to perform crash recovery first. Reported by Kyotaro HORIGUCHI. Backpatch to 9.2, like the crash-then-archive recovery mode.
* Fix lastReplayedEndRecPtr calculation when starting from shutdown checkpoint. | Heikki Linnakangas | 2014-03-05
  When entering crash recovery followed by archive recovery, and the latest checkpoint is a shutdown checkpoint, and there are no more WAL records to replay before transitioning from crash to archive recovery, we would not immediately allow read-only connections in hot standby mode even if we could. That's because when starting from a shutdown checkpoint, we set lastReplayedEndRecPtr incorrectly to the record before the checkpoint record, instead of the checkpoint record itself. We don't run the redo routine of the shutdown checkpoint record, but starting recovery from it goes through the same motions, so it should be considered as replayed. Reported by Kyotaro HORIGUCHI. All versions with hot standby are affected, so backpatch to 9.0.
* Introduce logical decoding. | Robert Haas | 2014-03-03
  This feature, building on previous commits, allows the write-ahead log stream to be decoded into a series of logical changes; that is, inserts, updates, and deletes and the transactions which contain them. It is capable of handling decoding even across changes to the schema of the affected tables. The output format is controlled by a so-called "output plugin"; an example is included. To make use of this in a real replication system, the output plugin will need to be modified to produce output in the format appropriate to that system, and to perform filtering. Currently, information can be extracted from the logical decoding system only via SQL; future commits will add the ability to stream changes via walsender. Andres Freund, with review and other contributions from many other people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Geoghegan, Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao, Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
* Remove bogus while-loop. | Heikki Linnakangas | 2014-02-28
  Commit abf5c5c9a4f142b3343614746bb9e99a794f8e7b added a bogus while statement after the for(;;) loop. It went unnoticed in testing, because it was dead code. Report by KONDO Mitsumasa. Backpatch to 9.3. The commit that introduced this was also applied to 9.2, but not the bogus while-loop part, because the code in 9.2 looks quite different.
* Fix WAL replay of locking an updated tuple | Alvaro Herrera | 2014-02-27
  We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on WAL replay of a tuple-lock operation, which is incorrect when the tuple is already updated. Back-patch to 9.3. The clearing of both header elements was there previously, but since no update could be present on a tuple that was being locked, it was harmless. Bug reported by Peter Geoghegan and Greg Stark in CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com and CAM-w4HPTOeMT4KP0OJK+mGgzgcTOtLRTvFZyvD0O4aH-7dxo3Q@mail.gmail.com respectively; diagnosis by Andres Freund.
* btbuild no longer calls _bt_doinsert(), update comment. | Heikki Linnakangas | 2014-02-26
  Peter Geoghegan
* Improve comment on setting data_checksum GUC. | Heikki Linnakangas | 2014-02-20
  There was an extra space there, and "fixed" wasn't very descriptive.
* Switch various builtin functions to use pg_lsn instead of text. | Robert Haas | 2014-02-19
  The functions in slotfuncs.c don't exist in any released version, but the changes to xlogfuncs.c represent backward-incompatibilities. Per discussion, we're hoping that the queries using these functions are few enough and simple enough that this won't cause too much breakage for users. Michael Paquier, reviewed by Andres Freund and further modified by me.
* Fix comment; checkpointer, not bgwriter, performs checkpoints since 9.2. | Heikki Linnakangas | 2014-02-18
  Amit Langote
* Prevent potential overruns of fixed-size buffers. | Tom Lane | 2014-02-17
  Coverity identified a number of places in which it couldn't prove that a string being copied into a fixed-size buffer would fit. We believe that most, perhaps all of these are in fact safe, or are copying data that is coming from a trusted source so that any overrun is not really a security issue. Nonetheless it seems prudent to forestall any risk by using strlcpy() and similar functions. Fixes by Peter Eisentraut and Jozef Mlich based on Coverity reports.
  In addition, fix a potential null-pointer-dereference crash in contrib/chkpass. The crypt(3) function is defined to return NULL on failure, but chkpass.c didn't check for that before using the result. The main practical case in which this could be an issue is if libc is configured to refuse to execute unapproved hashing algorithms (e.g., "FIPS mode"). This ideally should've been a separate commit, but since it touches code adjacent to one of the buffer overrun changes, I included it in this commit to avoid last-minute merge issues. This issue was reported by Honza Horak.
  Security: CVE-2014-0065 for buffer overruns, CVE-2014-0066 for crypt()
* Change the order that pg_xlog and WAL archive are polled for WAL segments. | Heikki Linnakangas | 2014-02-14
  If there is a WAL segment with the same ID but a different TLI present in both the WAL archive and pg_xlog, prefer the one with the higher TLI. Before this patch, the archive was polled first, for all expected TLIs, and only if no file was found was pg_xlog scanned. This was a change in behavior made in 9.3; 9.2 first scanned the archive and pg_xlog for the highest TLI, then the archive and pg_xlog for the next-highest TLI, and so forth. This patch reverts the behavior back to what it was in 9.2. The reason for this is that if, for example, you try to do archive recovery to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is not archived yet, we would replay past the timeline switch point on timeline 1 using the archived files, before even looking at timeline 2's files in pg_xlog. Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior was changed.
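The per-TLI search order described above can be sketched in C as follows. This is a minimal illustration, not the xlog.c code: the `wal_source` enum, `find_segment()` and the boolean availability arrays are hypothetical stand-ins for actually probing the archive and pg_xlog.

```c
#include <stdbool.h>
#include <stddef.h>

/* Where a WAL segment was found (hypothetical names for this sketch). */
typedef enum { SRC_NONE, SRC_ARCHIVE, SRC_PG_XLOG } wal_source;

/*
 * tlis[] holds the expected timelines, highest first; in_archive[i] and
 * in_pg_xlog[i] say whether a copy of the segment on tlis[i] exists in
 * each location. Both locations are checked for one TLI before moving on
 * to the next-lower TLI, so a copy on a higher TLI always wins.
 */
static wal_source
find_segment(const unsigned *tlis, const bool *in_archive,
             const bool *in_pg_xlog, size_t n, unsigned *found_tli)
{
    for (size_t i = 0; i < n; i++)
    {
        if (in_archive[i])
        {
            *found_tli = tlis[i];
            return SRC_ARCHIVE;
        }
        if (in_pg_xlog[i])
        {
            *found_tli = tlis[i];
            return SRC_PG_XLOG;
        }
    }
    return SRC_NONE;
}
```

With TLIs {2, 1}, where the timeline 2 segment exists only in pg_xlog and the timeline 1 segment only in the archive, this returns SRC_PG_XLOG with TLI 2; polling the archive for every TLI first would have picked timeline 1.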
* Separate multixact freezing parameters from xid's | Alvaro Herrera | 2014-02-13
  Previously we were piggybacking on transaction ID parameters to freeze multixacts; but since there isn't necessarily any relationship between rates of Xid and multixact consumption, this turns out not to be a good idea. Therefore, we now have multixact-specific freezing parameters:
    vacuum_multixact_freeze_min_age: when to remove multis as we come across them in vacuum (defaults to 5 million, i.e. early in comparison to Xid's default of 50 million)
    vacuum_multixact_freeze_table_age: when to force whole-table scans instead of scanning only the pages not marked as all-visible in the visibility map (defaults to 150 million, same as for Xids). Whichever of the two (Xid or multixact age) reaches its 150 million mark first will cause a whole-table scan.
    autovacuum_multixact_freeze_max_age: when to force emergency, uninterruptible whole-table scans (defaults to 400 million, double that for Xids). This means there shouldn't be more frequent emergency vacuuming than previously, unless multixacts are being used very rapidly.
  Backpatch to 9.3 where multixacts were made to persist enough to require freezing. To avoid an ABI break in 9.3, VacuumStmt has a couple of fields in an unnatural place, and StdRdOptions is split in two so that the newly added fields can go at the end. Patch by me, reviewed by Robert Haas, with additional input from Andres Freund and Tom Lane.
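The interplay of the new parameters can be sketched as below, assuming ages are measured as the modulo-2^32 distance from a value to the next counter value. The helper names are hypothetical, and the comparisons only approximate what vacuum actually does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Age of a 32-bit counter value relative to the next value to be assigned. */
static uint32_t
counter_age(uint32_t value, uint32_t next)
{
    return next - value;        /* unsigned, so this wraps modulo 2^32 */
}

/* A multixact is frozen once it is older than vacuum_multixact_freeze_min_age. */
static bool
multixact_should_freeze(uint32_t mxid, uint32_t next_mxid, uint32_t min_age)
{
    return counter_age(mxid, next_mxid) > min_age;
}

/*
 * A whole-table scan is forced when either relfrozenxid or relminmxid is
 * older than its own freeze_table_age; whichever limit is hit first wins.
 */
static bool
needs_whole_table_scan(uint32_t xid_age, uint32_t mxid_age,
                       uint32_t xid_table_age, uint32_t mxid_table_age)
{
    return xid_age > xid_table_age || mxid_age > mxid_table_age;
}
```

With the defaults, a table whose Xid age is only 100 million but whose multixact age is 160 million still gets a whole-table scan, because the multixact limit of 150 million is reached first.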
* In XLogReadBufferExtended, don't assume P_NEW yields consecutive pages. | Tom Lane | 2014-02-12
  In a database that's not yet reached consistency, it's possible that some segments of a relation are not full-size but are not the last ones either. Because of the way smgrnblocks() works, asking for a new page with P_NEW will fill in the last not-full-size segment --- and if that makes it full size, the apparent EOF of the relation will increase by more than one page, so that the next P_NEW request will yield a page past the next consecutive one. This breaks the relation-extension logic in XLogReadBufferExtended, possibly allowing a page update to be applied to some page far past where it was intended to go. This appears to be the explanation for reports of table bloat on replication slaves compared to their masters, and probably explains some corrupted-slave reports as well. Fix the loop to check the page number it actually got, rather than merely Assert()'ing that dead reckoning got it to the desired place. AFAICT, there are no other places that make assumptions about exactly which page they'll get from P_NEW. Problem identified by Greg Stark, though this is not the same as his proposed patch. It's been like this for a long time, so back-patch to all supported branches.
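A toy model of the P_NEW problem and of the fix, in C. Everything here (`toy_rel`, `toy_extend()`, the `jump_at`/`jump_to` fields) is hypothetical; it only mimics an apparent EOF that advances by more than one page when a short segment fills up.

```c
#include <stdint.h>

/*
 * Toy relation: nblocks is the apparent EOF. When an extension fills a
 * short segment (modelled as the new EOF reaching jump_at), the apparent
 * EOF jumps to jump_to instead.
 */
typedef struct
{
    uint32_t nblocks;
    uint32_t jump_at;
    uint32_t jump_to;
} toy_rel;

/* P_NEW-style extension: returns the block it allocated. */
static uint32_t
toy_extend(toy_rel *rel)
{
    uint32_t blk = rel->nblocks;

    rel->nblocks = (blk + 1 == rel->jump_at) ? rel->jump_to : blk + 1;
    return blk;
}

/* Old logic: count extensions and trust dead reckoning. */
static uint32_t
get_block_dead_reckoning(toy_rel *rel, uint32_t target)
{
    uint32_t lastblock = rel->nblocks;
    uint32_t got = target;

    while (target >= lastblock)
    {
        got = toy_extend(rel);
        lastblock++;
    }
    return got;                 /* may be far past target */
}

/* Fixed logic: look at the block number actually returned. */
static uint32_t
get_block_checked(toy_rel *rel, uint32_t target)
{
    uint32_t got;

    if (target < rel->nblocks)
        return target;          /* page exists; read it by number */
    do
    {
        got = toy_extend(rel);
    } while (got < target);

    /*
     * got >= target here. If we overshot, target now lies below the
     * apparent EOF, so it is read by number instead of using got.
     */
    return target;
}
```

Starting from an apparent EOF of 5 that jumps to 10, asking for block 7 makes the dead-reckoning version land on block 11, while the checked version stops extending once it reaches or passes block 7 and then uses block 7 itself.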
* Fix WakeupWaiters() to not wake up an exclusive locker unnecessarily. | Heikki Linnakangas | 2014-02-10
  WakeupWaiters() is supposed to wake up all LW_WAIT_UNTIL_FREE waiters of the slot, but the loop incorrectly also woke up the first LW_EXCLUSIVE waiter, if there were no LW_WAIT_UNTIL_FREE waiters in the queue. Noted by Andres Freund. This code is new in 9.4, so no backpatching.
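The intended wakeup rule can be shown with a simplified queue, assuming waiters are held in an array rather than PostgreSQL's linked list of PGPROCs. The `wait_mode` enum mirrors the lock modes named in the commit message but is a local stand-in, and `wake_until_free_waiters()` is hypothetical.

```c
#include <stdbool.h>

/* Local stand-in for the lock modes named in the commit message. */
typedef enum { LW_EXCLUSIVE, LW_SHARED, LW_WAIT_UNTIL_FREE } wait_mode;

/*
 * Mark every LW_WAIT_UNTIL_FREE waiter for wakeup and leave all other
 * waiters queued; returns the number woken. Unlike the loop described
 * above, an LW_EXCLUSIVE waiter is never woken, even when no
 * LW_WAIT_UNTIL_FREE waiters are queued.
 */
static int
wake_until_free_waiters(const wait_mode *queue, int n, bool *woken)
{
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        woken[i] = (queue[i] == LW_WAIT_UNTIL_FREE);
        if (woken[i])
            count++;
    }
    return count;
}
```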
* Initialize the entryRes array between each call to triConsistent. | Heikki Linnakangas | 2014-02-07
  The shimTriConsistentFn, which calls the opclass's consistent function with all combinations of TRUE/FALSE for any MAYBE argument, modifies the entryRes array passed by the caller. Change startScanKey to re-initialize it between each call to accommodate that. It's actually a bad habit of shimTriConsistentFn to modify its argument. But the only caller that doesn't already re-initialize the entryRes array was startScanKey, and it's easy for startScanKey to do so. Add a comment to shimTriConsistentFn about that. Note: this does not give a free pass to opclass-provided consistent functions to modify the entryRes argument; shimTriConsistentFn assumes that they don't, even though it does it itself. While at it, refactor startScanKey to allocate the requiredEntries and additionalEntries after it knows exactly how large they need to be. Saves a little bit of memory, and looks nicer anyway. Per complaint by Tom Lane, buildfarm and the pg_trgm regression test.
* Speed up "rare & frequent" type GIN queries. | Heikki Linnakangas | 2014-02-07
  If you have a GIN query like "rare & frequent", we currently fetch all the items that match either rare or frequent, call the consistent function for each item, and let the consistent function filter out items that only match one of the terms. However, if we can deduce that "rare" must be present for the overall qual to be true, we can scan all the rare items, and for each rare item, skip over to the next frequent item with the same or greater TID. That greatly speeds up "rare & frequent" type queries. To implement that, introduce the concept of a tri-state consistent function, where the 3rd value is MAYBE, indicating that we don't know if that term is present. Operator classes only provide a boolean consistent function, so we simulate the tri-state consistent function by calling the boolean function several times, with the MAYBE arguments set to all combinations of TRUE and FALSE. Testing all combinations is only feasible for a small number of MAYBE arguments, but it is envisioned that we'll provide a way for operator classes to provide a native tri-state consistent function, which can be much more efficient. But that is not included in this patch. We were already using that trick for lossy pages, calling the consistent function with the lossy entry set to TRUE and FALSE. Now that we have the tri-state consistent function, use it for lossy pages too. Alexander Korotkov, with a fair amount of refactoring by me.
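The tri-state emulation described above can be sketched like this. The sketch also follows the rule from the entryRes fix listed earlier in this log by working on a private copy, so the caller's array is never modified. The `tri_value` enum, the `MAX_ENTRIES`/`MAX_MAYBE` limits and the function names are illustrative assumptions, not the ginlogic.c definitions.

```c
#include <stdbool.h>

/* Local stand-in for a GIN ternary value. */
typedef enum { TRI_FALSE = 0, TRI_TRUE = 1, TRI_MAYBE = 2 } tri_value;

/* Boolean consistent function: check[i] says whether entry i is present. */
typedef bool (*bool_consistent_fn) (const bool *check, int nentries);

#define MAX_ENTRIES 16          /* arbitrary limits for this sketch */
#define MAX_MAYBE   4

/*
 * Emulate a tri-state consistent function by calling the boolean one with
 * every TRUE/FALSE combination of the MAYBE entries.
 */
static tri_value
shim_tri_consistent(bool_consistent_fn consistent,
                    const tri_value *entry_res, int nentries)
{
    bool check[MAX_ENTRIES];
    int maybe_idx[MAX_ENTRIES];
    int nmaybe = 0;
    bool any_true = false;
    bool any_false = false;

    if (nentries > MAX_ENTRIES)
        return TRI_MAYBE;
    for (int i = 0; i < nentries; i++)
    {
        if (entry_res[i] == TRI_MAYBE)
            maybe_idx[nmaybe++] = i;
        check[i] = (entry_res[i] == TRI_TRUE);
    }
    if (nmaybe > MAX_MAYBE)
        return TRI_MAYBE;       /* too many combinations to try */

    for (unsigned mask = 0; mask < (1u << nmaybe); mask++)
    {
        for (int j = 0; j < nmaybe; j++)
            check[maybe_idx[j]] = (mask >> j) & 1u;
        if (consistent(check, nentries))
            any_true = true;
        else
            any_false = true;
        if (any_true && any_false)
            return TRI_MAYBE;   /* result depends on the unknown entries */
    }
    return any_true ? TRI_TRUE : TRI_FALSE;
}

/* Example boolean consistent function for "a & b": all entries required. */
static bool
and_consistent(const bool *check, int nentries)
{
    for (int i = 0; i < nentries; i++)
        if (!check[i])
            return false;
    return true;
}
```

For "rare & frequent" with `and_consistent()`, passing {FALSE, MAYBE} yields FALSE: when the rare entry is absent, no value of the frequent entry can satisfy the query, which is what lets a scan treat the rare term as required. Passing {TRUE, MAYBE} yields MAYBE.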
* Remove unnecessary relcache flushes after changing btree metapages. | Tom Lane | 2014-02-05
  These flushes were added in my commit d2896a9ed, which added the btree logic that keeps a cached copy of the index metapage data in index relcache entries. The idea was to ensure that other backends would promptly update their cached copies after a change. However, this is not really necessary, since _bt_getroot() has adequate defenses against believing a stale root page link, and _bt_getrootheight() doesn't have to be 100% right. Moreover, if it were necessary, a relcache flush would be an unreliable way to do it, since the sinval mechanism believes that relcache flush requests represent transactional updates, and therefore discards them on transaction rollback. Therefore, we might as well drop these flush requests and save the time to rebuild the whole relcache entry after a metapage change. If we ever try to support in-place truncation of btree indexes, it might be necessary to revisit this issue so that _bt_getroot() can't get caught by trying to follow a metapage link to a page that no longer exists. A possible solution to that is to make use of an smgr, rather than relcache, inval request to force other backends to discard their cached metapages. But for the moment this is not worth pursuing.
* Add primary_slotname to recovery.conf.sample. | Fujii Masao | 2014-02-03
* Introduce replication slots. | Robert Haas | 2014-01-31
  Replication slots are a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation. In a few places, we refer to the type of replication slots introduced by this patch as "physical" slots, because forthcoming patches for logical decoding will also have slots, but with somewhat different properties. Andres Freund and Robert Haas
* Further optimize GIN multi-key searches. | Heikki Linnakangas | 2014-01-29
  When skipping over some items in a posting tree, re-find the new location by descending the tree from root, rather than walking the right links. This can save a lot of I/O. Heavily modified from Alexander Korotkov's fast scan patch.
* Further optimize multi-key GIN searches. | Heikki Linnakangas | 2014-01-29
  If we're skipping past a certain TID, avoid decoding posting list segments that only contain smaller TIDs. Extracted from Alexander Korotkov's fast scan patch, heavily modified.
* Allow skipping some items in a multi-key GIN search. | Heikki Linnakangas | 2014-01-29
  In a multi-key search, i.e. something like "col @> 'foo' AND col @> 'bar'", as soon as we find the next item that matches the first criterion, we don't need to check the second criterion for TIDs smaller than the first match. That saves a lot of effort, especially if one of the terms is rare, while the second occurs very frequently. Based on ideas from Alexander Korotkov's fast scan patch.
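Skipping ahead in the frequent list can be sketched as a sorted-list intersection. This assumes TIDs are modelled as plain 64-bit integers and both lists are in-memory arrays; the real code advances posting-list and posting-tree scans instead, and both function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the first element >= key in the sorted range arr[lo, n). */
static size_t
skip_to(const uint64_t *arr, size_t lo, size_t n, uint64_t key)
{
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (arr[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Intersect a short ("rare") and a long ("frequent") sorted TID list.
 * Instead of visiting every frequent TID, jump straight to the first one
 * that is >= the current rare TID. Returns the number of matches in out.
 */
static size_t
intersect_with_skip(const uint64_t *rare, size_t nrare,
                    const uint64_t *freq, size_t nfreq, uint64_t *out)
{
    size_t j = 0;
    size_t nout = 0;

    for (size_t i = 0; i < nrare && j < nfreq; i++)
    {
        j = skip_to(freq, j, nfreq, rare[i]);
        if (j < nfreq && freq[j] == rare[i])
            out[nout++] = rare[i];
    }
    return nout;
}
```

The work is driven by the number of rare items, each costing a binary search, rather than by the length of the frequent list.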
* Revert C comment change in slot_attisnull() | Bruce Momjian | 2014-01-28
  Revert 89774b58b0ea2874765cae10c094bb6aaf707feb
* Relax the requirement that all lwlocks be stored in a single array. | Robert Haas | 2014-01-27
  This makes it possible to store lwlocks as part of some other data structure in the main shared memory segment, or in a dynamic shared memory segment. There is still a main LWLock array and this patch does not move anything out of it, but it provides necessary infrastructure for doing that in the future. This change is likely to increase the size of LWLockPadded on some platforms, especially 32-bit platforms where it was previously only 16 bytes. Patch by me. Review by Andres Freund and KaiGai Kohei.
* Adjust C comment in slot_attisnull() regarding nulls. | Bruce Momjian | 2014-01-25
* Add recovery_target='immediate' option. | Heikki Linnakangas | 2014-01-25
  This allows ending recovery as soon as a consistent state has been reached. Without this, there was no easy way to, e.g., restore an online backup without replaying any extra WAL after the backup ended. MauMau and me.
* Reset unused fields in GIN data leaf page footer. | Heikki Linnakangas | 2014-01-24
  The maxoff field is not used in the new, compressed page format. Let's reset it when converting an old-format page to the new format. The code won't care either way, but this makes it possible to use the field for something else in the future.
* Fix off-by-one in newly-introduced GIN assertion. | Heikki Linnakangas | 2014-01-24
  Spotted by Alexander Korotkov
* In GIN recompression code, use memmove rather than memcpy, for vacuum. | Heikki Linnakangas | 2014-01-24
  When vacuuming a data leaf page, any compressed posting lists that are not modified are copied back to the buffer from a later location in the same buffer rather than from a palloc'd copy. IOW, they are just moved downwards in the same buffer. Because the source and destination addresses can overlap, we must use memmove rather than memcpy. Report and fix by Alexander Korotkov.
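A minimal C illustration of why moving data downwards within one buffer needs memmove(); `move_down()` is a hypothetical helper, not a GIN function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Move len bytes that start at buf + src_off down to buf + dst_off in the
 * same buffer. The two ranges may overlap, which makes memcpy() undefined
 * behaviour; memmove() copies as if through a temporary buffer.
 */
static void
move_down(char *buf, size_t dst_off, size_t src_off, size_t len)
{
    memmove(buf + dst_off, buf + src_off, len);
}
```

Moving the six bytes "ABCDEF" from offset 2 of "xxABCDEF" down to offset 0 overlaps source and destination by four bytes; memmove() produces "ABCDEFEF" (the last two bytes are left untouched).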
* Allow use of "z" flag in our printf calls, and use it where appropriate. | Tom Lane | 2014-01-23
  Since C99, it's been standard for printf and friends to accept a "z" size modifier, meaning "whatever size size_t has". Up to now we've generally dealt with printing size_t values by explicitly casting them to unsigned long and using the "l" modifier; but this is really the wrong thing on platforms where pointers are wider than longs (such as Win64). So let's start using "z" instead. To ensure we can do that on all platforms, teach src/port/snprintf.c to understand "z", and add a configure test to force use of that implementation when the platform's version doesn't handle "z". Having done that, modify a bunch of places that were using the unsigned-long hack to use "z" instead. This patch doesn't pretend to have gotten everyplace that could benefit, but it catches many of them. I made an effort in particular to ensure that all uses of the same error message text were updated together, so as not to increase the number of translatable strings. It's possible that this change will result in format-string warnings from pre-C99 compilers. We might have to reconsider if there are any popular compilers that will warn about this; but let's start by seeing what the buildfarm thinks. Andres Freund, with a little additional work by me
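A minimal example of the "z" modifier; `format_size()` is a hypothetical helper. It assumes a C99-conforming snprintf(), which is what the src/port fallback described above provides on platforms whose own version lacks "z" support.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t with the C99 "z" length modifier. Casting to unsigned
 * long and using %lu would truncate on LLP64 platforms such as Win64,
 * where long is 32 bits but size_t is 64 bits.
 */
static int
format_size(char *out, size_t outlen, size_t nbytes)
{
    return snprintf(out, outlen, "allocated %zu bytes", nbytes);
}
```

For example, `format_size(msg, sizeof msg, 1234)` writes "allocated 1234 bytes" and returns 20, the number of characters produced.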
* Fix alignment of GIN in-line posting lists stored in entry tuples. | Heikki Linnakangas | 2014-01-23
  The Sparc machines in the buildfarm are crashing because of misaligned access to posting lists stored in entry tuples. I accidentally removed a critical SHORTALIGN() from ginFormTuple, as part of the packed posting lists patch. Perhaps I thought it was unnecessary, because the index_form_tuple() call above the SHORTALIGN already aligned the size, missing the fact that the null-category byte makes it misaligned again (I think the SHORTALIGN is indeed unnecessary if there's no null-category byte, but let's just play it safe...)
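The alignment issue can be illustrated with a simplified layout, assuming a 2-byte alignment rule like SHORTALIGN and a one-byte null category placed between the already-aligned tuple data and the posting list. `SHORT_ALIGN` and `posting_list_offset()` are local stand-ins, not the GIN source.

```c
#include <stddef.h>
#include <stdint.h>

/* Round LEN up to a multiple of 2 (a local stand-in for SHORTALIGN). */
#define SHORT_ALIGN(LEN) (((uintptr_t) (LEN) + 1) & ~((uintptr_t) 1))

/*
 * Offset at which an in-line posting list would start in this simplified
 * layout: tuple header and key (already aligned), then an optional
 * one-byte null category, then the posting list, which is assumed to
 * contain 16-bit fields and so must be 2-byte aligned.
 */
static size_t
posting_list_offset(size_t aligned_tuple_size, int has_null_category)
{
    size_t off = aligned_tuple_size;

    if (has_null_category)
        off += 1;               /* this byte breaks the alignment again */
    return (size_t) SHORT_ALIGN(off);
}
```

Without a null category an aligned size of 16 stays 16, so the extra alignment step is a no-op; with one, the offset would be 17, and only the realignment moves it to 18.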
* Silence compiler warning. | Heikki Linnakangas | 2014-01-23
  Not all compilers understand that elog(ERROR, ...) never returns.
* Fix declaration of GinVacuumState. | Heikki Linnakangas | 2014-01-22
  gcc 4.8 was happy with having a duplicate typedef, but most compilers seem not to be, per buildfarm.
* Compress GIN posting lists, for smaller index size. | Heikki Linnakangas | 2014-01-22
  GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might come in handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
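A generic varbyte encoding (7 bits per byte, high bit as a continuation flag) of sorted item numbers is sketched below. It is a simplified assumption rather than the on-disk GIN format: item pointers are treated as plain, strictly increasing 64-bit integers, each stored as the delta from its predecessor.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode sorted, strictly increasing item numbers as varbyte deltas.
 * Each output byte holds 7 bits of a delta; the high bit means that more
 * bytes of the same delta follow. Returns the number of bytes written.
 */
static size_t
encode_varbyte(const uint64_t *items, size_t n, unsigned char *out)
{
    size_t len = 0;
    uint64_t prev = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint64_t delta = items[i] - prev;

        prev = items[i];
        while (delta > 0x7F)
        {
            out[len++] = (unsigned char) ((delta & 0x7F) | 0x80);
            delta >>= 7;
        }
        out[len++] = (unsigned char) delta;
    }
    return len;
}

/* Decode len bytes produced by encode_varbyte(); returns the item count. */
static size_t
decode_varbyte(const unsigned char *in, size_t len, uint64_t *items)
{
    size_t pos = 0;
    size_t n = 0;
    uint64_t prev = 0;

    while (pos < len)
    {
        uint64_t delta = 0;
        int shift = 0;
        unsigned char b;

        do
        {
            b = in[pos++];
            delta |= (uint64_t) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) && pos < len);
        prev += delta;
        items[n++] = prev;
    }
    return n;
}
```

With items {5, 130, 131, 100000}, the deltas 5, 125, 1 and 99869 take 1, 1, 1 and 3 bytes, 6 bytes in total, compared with 24 bytes for four 6-byte ItemPointers.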
* Fix missing parentheses resulting in wrong order of dereference. | Robert Haas | 2014-01-15
  This could result in referencing uninitialized memory. Michael Paquier, in response to a complaint from Andres Freund
* Fix multiple bugs in index page locking during hot-standby WAL replay. | Tom Lane | 2014-01-14
  In ordinary operation, VACUUM must be careful to take a cleanup lock on each leaf page of a btree index; this ensures that no indexscans could still be "in flight" to heap tuples due to be deleted. (Because of possible index-tuple motion due to concurrent page splits, it's not enough to lock only the pages we're deleting index tuples from.) In Hot Standby, the WAL replay process must likewise lock every leaf page. There were several bugs in the code for that:
    * The replay scan might come across unused, all-zero pages in the index. While btree_xlog_vacuum itself did the right thing (ie, nothing) with such pages, xlogutils.c supposed that such pages must be corrupt and would throw an error. This accounts for various reports of replication failures with "PANIC: WAL contains references to invalid pages". To fix, add a ReadBufferMode value that instructs XLogReadBufferExtended not to complain when we're doing this.
    * btree_xlog_vacuum performed the extra locking if standbyState == STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up for hot standby queries until the database has reached consistency, and we don't want to do the extra locking till then either, for fear of reading corrupted pages (which bufmgr.c would complain about). Fix by exporting a new function from xlog.c that will report whether we're actually in hot standby replay mode.
    * To ensure full coverage of the index in the replay scan, btvacuumscan would emit a dummy WAL record for the last page of the index, if no vacuuming work had been done on that page. However, if the last page of the index is all-zero, that would result in corruption of said page, since the functions called on it weren't prepared to handle that case. There's no need to lock any such pages, so change the logic to target the last normal leaf page instead.
  The first two of these bugs were diagnosed by Andres Freund, the other one by me. Fixes based on ideas from Heikki Linnakangas and myself. This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
* Accept pg_upgraded tuples during multixact freezing | Alvaro Herrera | 2014-01-10
  The new MultiXact freezing routines introduced by commit 8e9a16ab8f7 neglected to consider tuples that came from a pg_upgrade'd database; a vacuum run that tried to freeze such tuples would die with an error such as
    ERROR: MultiXactId 11415437 does no longer exist -- apparent wraparound
  To fix, ensure that GetMultiXactIdMembers is allowed to return empty multis when the infomask bits are right, as is done in other callsites. Per trouble report from F-Secure.
  In passing, fix a copy&paste bug reported by Andrey Karpov of VIVA64, found with their PVS-Studio static checker: instead of setting relminmxid to Invalid, we were setting relfrozenxid twice. Not an important mistake because that code branch is about relations for which we don't use the frozenxid/minmxid values at all in the first place, but it seems to warrant a fix nonetheless.
* Refactor checking whether we've reached the recovery target. | Heikki Linnakangas | 2014-01-09
  Makes the replay loop slightly more readable, by separating the concerns of whether to stop and whether to delay, and how to extract the timestamp from a record. This has the user-visible change that the timestamp of the last applied record is now updated after actually applying it. Before, it was updated just before applying it. That meant that pg_last_xact_replay_timestamp() could return the timestamp of a commit record that is in process of being replayed, but not yet applied. Normally the difference is small, but if min_recovery_apply_delay is set, there could be a significant delay between reading a record and applying it. Another behavioral change is that if you recover to a restore point, we stop after the restore point record, not before it. It makes no difference as far as running queries on the server is concerned, as applying a restore point record changes nothing, but if you examine the timeline history you will see that the new timeline branched off just after the restore point record, not before it. One practical consequence is that if you do PITR to the new timeline, and set recovery target to the same named restore point again, it will find and stop recovery at the same restore point. Conceptually, I think it makes more sense to consider the restore point as part of the new timeline's history than not. In principle, setting the last-replayed timestamp before actually applying the record was a bug all along, but it doesn't seem worth the risk to backpatch, since min_recovery_apply_delay was only added in 9.4.
* Fix pause_at_recovery_target + recovery_target_inclusive combination. | Heikki Linnakangas | 2014-01-08
  If pause_at_recovery_target is set, recovery pauses *before* applying the target record, even if recovery_target_inclusive is set. If you then continue with pg_xlog_replay_resume(), it will apply the target record before ending recovery. In other words, if you log in while it's paused and verify that the database looks OK, ending recovery changes its state again, possibly destroying data that you were trying to salvage with PITR. Backpatch to 9.1; this has been broken since pause_at_recovery_target was added.
* If multiple recovery_targets are specified, use the latest one. | Heikki Linnakangas | 2014-01-08
  The docs say that only one of recovery_target_xid, recovery_target_time, or recovery_target_name can be specified. But the code actually did something different, so that a name overrode time, and xid overrode both time and name. Now the target specified last takes effect, whether it's an xid, time or name. With this patch, we still accept multiple recovery_target settings, even though docs say that only one can be specified. It's a general property of the recovery.conf file parser that if you specify the same option twice, the last one takes effect, like with postgresql.conf.
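The "last one wins" rule can be sketched as a single pass over the options in file order. The enum and `last_recovery_target()` are hypothetical; only the option names come from the commit message.

```c
#include <stddef.h>
#include <string.h>

typedef enum
{
    TARGET_NONE, TARGET_XID, TARGET_TIME, TARGET_NAME
} recovery_target_kind;

/*
 * Scan option names in the order they appear in recovery.conf and
 * remember only the most recent recovery_target_* setting.
 */
static recovery_target_kind
last_recovery_target(const char *const *names, size_t n)
{
    recovery_target_kind kind = TARGET_NONE;

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(names[i], "recovery_target_xid") == 0)
            kind = TARGET_XID;
        else if (strcmp(names[i], "recovery_target_time") == 0)
            kind = TARGET_TIME;
        else if (strcmp(names[i], "recovery_target_name") == 0)
            kind = TARGET_NAME;
    }
    return kind;
}
```

For a file that sets recovery_target_xid and then recovery_target_time, this yields TARGET_TIME, whereas the old precedence rule would have let the xid win.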
* Fix bug in determining when recovery has reached consistency. | Heikki Linnakangas | 2014-01-08
  When starting WAL replay from an online checkpoint, the last replayed WAL record variable was initialized using the checkpoint record's location, even though the records between the REDO location and the checkpoint record had not been replayed yet. That was noted as "slightly confusing" but harmless in the comment, but in some cases, it fooled CheckRecoveryConsistency into incorrectly concluding that we had already reached a consistent state immediately at the beginning of WAL replay. That caused the system to accept read-only connections in hot standby mode too early, and also caused PANICs with the message "WAL contains references to invalid pages". Fix by initializing the variables to the REDO location instead. In 9.2 and above, change CheckRecoveryConsistency() to use lastReplayedEndRecPtr variable when checking if backup end location has been reached. It was inconsistently using EndRecPtr for that check, but lastReplayedEndRecPtr when checking min recovery point. It made no difference before this patch, because in all the places where CheckRecoveryConsistency was called the two variables were the same, but it was always an accident waiting to happen, and would have been wrong after this patch anyway. Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0, where hot standby was introduced.
* Update copyright for 2014 | Bruce Momjian | 2014-01-07
  Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Move permissions check from do_pg_start_backup to pg_start_backup | Magnus Hagander | 2014-01-07
  And the same for do_pg_stop_backup. The code in do_pg_* is not allowed to access the catalogs. For manual base backups, the permissions check can be handled in the calling function, and for streaming base backups only users with the required permissions can get past the authentication step in the first place. Reported by Antonin Houska, diagnosed by Andres Freund
* Add more use of psprintf() | Peter Eisentraut | 2014-01-06
* Handle 5-char filenames in SlruScanDirectory | Alvaro Herrera | 2014-01-02
  Original users of slru.c were all producing 4-digit filenames, so that was all that that code was prepared to handle. Changes to multixact.c in the course of commit 0ac5ad5134f made pg_multixact/members create 5-digit filenames once a certain threshold was reached, which SlruScanDirectory wasn't prepared to deal with; in particular, 5-digit-name files were not removed during truncation. Change that routine to make it aware of those files, and have it process them just like any others. Right now, some pg_multixact/members directories will contain a mixture of 4-char and 5-char filenames. A future commit is expected to fix things so that each slru.c user declares the correct maximum width for the files it produces, to avoid such unsightly mixtures. Noticed while investigating bug #8673 reported by Serge Negodyuck.
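A sketch of a name check that accepts both widths, assuming segment names consist of uppercase hexadecimal digits; `parse_segment_name()` is a hypothetical helper, not the slru.c routine.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Accept SLRU segment file names of 4 or 5 uppercase hex digits and
 * return the segment number they encode.
 */
static bool
parse_segment_name(const char *name, long *segno)
{
    size_t len = strlen(name);

    if ((len != 4 && len != 5) ||
        strspn(name, "0123456789ABCDEF") != len)
        return false;
    *segno = strtol(name, NULL, 16);
    return true;
}
```

A check that insisted on exactly four characters would skip a file such as "1A2B3" entirely, which is how 5-digit files escaped truncation.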
* Wrap multixact/members correctly during extension | Alvaro Herrera | 2014-01-02
  In the 9.2 code for extending multixact/members, the logic was very simple because the number of entries in a members page was a proper divisor of 2^32, and thus at 2^32 wraparound the logic for page switch was identical to that at any other page boundary. In commit 0ac5ad5134f I failed to realize this and introduced code that was not able to go over the 2^32 boundary. Fix that by ensuring that when we reach the last page of the last segment we correctly zero the initial page of the initial segment, using correct uint32-wraparound-safe arithmetic. Noticed while investigating bug #8673 reported by Serge Negodyuck, as diagnosed by Andres Freund.
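The wraparound at the end of the member space can be illustrated as follows, assuming 8 kB pages holding 1636 members each, a count that does not divide 2^32, so the last page is only partly usable. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: 1636 members per page, which does not divide 2^32. */
#define MEMBERS_PER_PAGE 1636u
#define MAX_MEMBER_PAGE  (UINT32_MAX / MEMBERS_PER_PAGE)

static uint32_t
member_offset_to_page(uint32_t offset)
{
    return offset / MEMBERS_PER_PAGE;
}

/*
 * Page to zero when moving past 'page'. After the last (partial) page,
 * the uint32 offset counter wraps to 0, so the next page is page 0,
 * not page + 1.
 */
static uint32_t
next_member_page(uint32_t page)
{
    return (page == MAX_MEMBER_PAGE) ? 0 : page + 1;
}
```

Here `member_offset_to_page(UINT32_MAX)` is the last page, and the offset after it, which wraps to 0, maps to page 0, matching `next_member_page()` at the boundary.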
* Handle wraparound during truncation in multixact/members | Alvaro Herrera | 2014-01-02
  In pg_multixact/members, relying on modulo-2^32 arithmetic for wraparound handling doesn't work all that well. Because we don't explicitly track wraparound of the allocation counter for members, it is possible that the "live" area exceeds 2^31 entries; trying to remove SLRU segments that are "old" according to the original logic might lead to removal of segments still in use. To fix, have the truncation routine use a tailored SlruScanDirectory callback that keeps track of the live area in actual use; that way, when the live range exceeds 2^31 entries, the oldest segments still live will not be removed prematurely. This new SlruScanDir callback needs to take care not to remove segments that are "in the future": if new SLRU segments appear while the truncation is ongoing, make sure we don't remove them. This requires examination of shared memory state to recheck for false positives, but testing suggests that this doesn't cause a problem. The original coding didn't suffer from this pitfall because segments created when truncation is running are never considered to be removable. Per Andres Freund's investigation of bug #8673 reported by Serge Negodyuck.
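The difference between a modulo-2^32 comparison and a range-based liveness check can be sketched as below; both helpers are hypothetical. The signed cast relies on the usual two's-complement conversion, as circular XID comparisons in C commonly do.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Modulo-2^32 "a precedes b". Only meaningful when a and b are less
 * than 2^31 apart.
 */
static bool
offset_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Range-based test: is 'off' within the live range [oldest, next)?
 * Correct for any live range shorter than 2^32 entries.
 */
static bool
offset_is_live(uint32_t off, uint32_t oldest, uint32_t next)
{
    return (uint32_t) (off - oldest) < (uint32_t) (next - oldest);
}
```

With a live range [0, 0xC0000000), which is wider than 2^31, offset 0x90000000 "precedes" the oldest offset under the modulo comparison and would look removable, yet the range check correctly reports it as live.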
* Aggressively freeze tables when CLUSTER or VACUUM FULL rewrites them. | Robert Haas | 2014-01-02
  We haven't wanted to do this in the past on the grounds that in rare cases the original xmin value will be needed for forensic purposes, but commit 37484ad2aacef5ec794f4dd3d5cf814475180a78 removes that objection, so now we can. Per extensive discussion, among many people, on pgsql-hackers.
* Rename walLogHints to wal_log_hints for easier grepping. | Robert Haas | 2014-01-01
  Michael Paquier
* Revise documentation for new freezing method. | Robert Haas | 2013-12-23
  Commit 37484ad2aacef5ec794f4dd3d5cf814475180a78 invalidated a good chunk of documentation, so patch it up to reflect the new state of play. Along the way, patch remaining documentation references to FrozenXID to say instead FrozenTransactionId, so that they match the way we actually spell it in the code.