postgresql - postgresql mirror

	Commit message	Author	Date
*	Add buffer_std flag to MarkBufferDirtyHint().	Jeff Davis	2013-06-17
\| \| \| \| \| \| \| \| \| \|	MarkBufferDirtyHint() writes WAL, and should know if it's got a standard buffer or not. Currently, the only callers where buffer_std is false are related to the FSM. In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive. Back-patch to 9.3.
*	Avoid deadlocks during insertion into SP-GiST indexes.	Tom Lane	2013-06-14
\| \| \| \| \| \| \| \| \| \| \| \| \|	SP-GiST's original scheme for avoiding deadlocks during concurrent index insertions doesn't work, as per report from Hailong Li, and there isn't any evident way to make it work completely. We could possibly lock individual inner tuples instead of their whole pages, but preliminary experimentation suggests that the performance penalty would be huge. Instead, if we fail to get a buffer lock while descending the tree, just restart the tree descent altogether. We keep the old tuple positioning rules, though, in hopes of reducing the number of cases where this can happen. Teodor Sigaev, somewhat edited by Tom Lane
*	Remove special-case treatment of LOG severity level in standalone mode.	Tom Lane	2013-06-13
\| \| \| \| \| \| \| \| \| \| \| \| \|	elog.c has historically treated LOG messages as low-priority during bootstrap and standalone operation. This has led to confusion and even masked a bug, because the normal expectation of code authors is that elog(LOG) will put something into the postmaster log, and that wasn't happening during initdb. So get rid of the special-case rule and make the priority order the same as it is in normal operation. To keep from cluttering initdb's output and the behavior of a standalone backend, tweak the severity level of three messages routinely issued by xlog.c during startup and shutdown so that they won't appear in these cases. Per my proposal back in December.
*	Observe array length in HaveVirtualXIDsDelayingChkpt().	Noah Misch	2013-06-12
\| \| \| \| \| \| \| \|	Since commit f21bb9cfb5646e1793dcc9c0ea697bab99afa523, this function ignores the caller-provided length and loops until it finds a terminator, which GetVirtualXIDsDelayingChkpt() never adds. Restore the previous loop control logic. In passing, revert the addition of an unused variable by the same commit, presumably a debugging relic.
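The loop-control problem described above can be illustrated in isolation. This is a minimal sketch, not the PostgreSQL code: `VirtualXid` and `any_still_delaying` are hypothetical names, and the point is only that the loop is bounded by the caller-provided length instead of scanning for a terminator that the producer never writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtual transaction ID. */
typedef struct
{
    int      backend_id;
    unsigned local_xid;
} VirtualXid;

static bool
vxid_equal(VirtualXid a, VirtualXid b)
{
    return a.backend_id == b.backend_id && a.local_xid == b.local_xid;
}

/*
 * Return true if any of the first nvxids entries of vxids is also present
 * among the first nactive entries of active.  Both loops stop at the
 * caller-provided lengths; neither array is assumed to end in a sentinel.
 */
bool
any_still_delaying(const VirtualXid *vxids, size_t nvxids,
                   const VirtualXid *active, size_t nactive)
{
    assert(vxids != NULL || nvxids == 0);
    assert(active != NULL || nactive == 0);

    for (size_t i = 0; i < nvxids; i++)
    {
        for (size_t j = 0; j < nactive; j++)
        {
            if (vxid_equal(vxids[i], active[j]))
                return true;
        }
    }
    return false;
}
```

Passing a length of zero makes the function return false without touching either array, which is the behavior a sentinel-based loop cannot guarantee.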
*	Fix typo in comment.	Heikki Linnakangas	2013-06-06
*	Additional spelling corrections	Stephen Frost	2013-06-03
\| \| \| \| \| \|	A few more minor spelling corrections, no functional changes. Thom Brown
*	Code review of recycling WAL segments in a restartpoint.	Heikki Linnakangas	2013-06-03
\| \| \| \| \| \| \| \|	Seems cleaner to get the currently-replayed TLI in the same call to GetXLogReplayRecPtr that we get the WAL position. Make it more clear in the comment what the code does when recovery has already ended (RecoveryInProgress() will set ThisTimeLineID in that case). Finally, make resetting ThisTimeLineID afterwards more explicit.
*	Minor spelling fixes	Stephen Frost	2013-06-01
\| \| \| \| \| \|	Fix a few spelling mistakes. Per bug report #8193 from Lajos Veres.
*	Post-pgindent cleanup	Stephen Frost	2013-06-01
\| \| \| \| \| \| \| \| \| \|	Make slightly better decisions about indentation than what pgindent is capable of. Mostly breaking out long function calls into one line per argument, with a few other minor adjustments. No functional changes; all whitespace. pgindent ran cleanly (didn't change anything) after. Passes all regressions.
*	pgindent run for release 9.3	Bruce Momjian	2013-05-29
\| \| \| \| \|	This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
*	After fast promotion use CHECKPOINT_FORCE	Simon Riggs	2013-05-21
\| \| \| \| \| \| \|	Not necessary for correctness, just to make log_checkpoints output look less singular. Requested by Fujii Masao
*	Maintain ThisTimeLineID correctly in checkpointer	Simon Riggs	2013-05-21
\| \| \| \| \| \| \| \| \| \| \| \|	checkpointer needs to reset ThisTimeLineID after a restartpoint to allow installing/recycling new WAL files. If recovery has already ended this would leave ThisTimeLineID set incorrectly and so we must reset it otherwise later checkpoints do not have the correct timeline. Bug report by Heikki Linnakangas. Further investigation by Heikki and myself.
*	Init crash recovery using the latest available TLI	Simon Riggs	2013-05-19
\| \| \| \| \| \| \| \|	This simplifies the handling of crashes after fast promotion and various minor cases that can exist in short timing windows around that case. Broad fix to bug reported by Michael Paquier on -hackers, approach prompted by Heikki Linnakangas
*	Emit msg correctly for timeline-crossing crash	Simon Riggs	2013-05-19
*	Remove single space on end of a line in xlog.c	Simon Riggs	2013-05-19
\| \| \| \|	Michael Paquier
*	Fix handling of OID wraparound while in standalone mode.	Tom Lane	2013-05-13
\| \| \| \| \| \| \| \| \| \| \|	If OID wraparound should occur while in standalone mode (unlikely but possible), we want to advance the counter to FirstNormalObjectId not FirstBootstrapObjectId. Otherwise, user objects might be created with OIDs in the system-reserved range. That isn't immediately harmful but it poses a risk of conflicts during future pg_upgrade operations. Noted by Andres Freund. Back-patch to all supported branches, since all of them are supported sources for pg_upgrade operations.
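The wraparound rule described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual allocator: the function name `next_oid`, the `under_postmaster` flag, and the two constant values (10000 and 16384, recalled from the 9.3-era headers) are assumptions and should be checked against the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Assumed values; verify against the real headers before relying on them. */
#define FIRST_BOOTSTRAP_OBJECT_ID 10000u
#define FIRST_NORMAL_OBJECT_ID    16384u

/*
 * Hand out the next OID from *counter.  If the counter has wrapped into the
 * system-reserved range, jump to FIRST_NORMAL_OBJECT_ID, both in normal
 * operation and in standalone mode.  Values inside the bootstrap range are
 * left alone in standalone mode so that an initdb-style run can still
 * assign them.
 */
Oid
next_oid(Oid *counter, bool under_postmaster)
{
    assert(counter != NULL);

    if (*counter < FIRST_NORMAL_OBJECT_ID)
    {
        if (under_postmaster)
            *counter = FIRST_NORMAL_OBJECT_ID;  /* wrapped in normal operation */
        else if (*counter < FIRST_BOOTSTRAP_OBJECT_ID)
            *counter = FIRST_NORMAL_OBJECT_ID;  /* wrapped in standalone mode */
        /* otherwise: standalone run handing out bootstrap-range OIDs */
    }
    return (*counter)++;
}
```

Jumping to the bootstrap boundary instead would let user objects land in the reserved range, which is the pg_upgrade conflict risk the commit message describes.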
*	Fix management of fn_extra caching during repeated GiST index scans.	Tom Lane	2013-05-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit d22a09dc70f9830fa78c1cd1a3a453e4e473d354 introduced official support for GiST consistentFns that want to cache data using the FmgrInfo fn_extra pointer: the idea was to preserve the cached values across gistrescan(), whereas formerly they'd been leaked. However, there was an oversight in that, namely that multiple scan keys might reference the same column's consistentFn; the code would result in propagating the same cache value into multiple scan keys, resulting in crashes or wrong answers. Use a separate array instead to ensure that each scan key keeps its own state. Per bug #8143 from Joel Roller. Back-patch to 9.2 where the bug was introduced.
*	Fix walsender failure at promotion.	Heikki Linnakangas	2013-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a standby server has a cascading standby server connected to it, it's possible that WAL has already been sent up to the next WAL page boundary, splitting a WAL record in the middle, when the first standby server is promoted. Don't throw an assertion failure or error in walsender if that happens. Also, fix a variant of the same bug in pg_receivexlog: if it had already received WAL on previous timeline up to a segment boundary, when the upstream standby server is promoted so that the timeline switch record falls on the previous segment, pg_receivexlog would miss the segment containing the timeline switch. To fix that, have walsender send the position of the timeline switch at end-of-streaming, in addition to the next timeline's ID. It was previously assumed that the switch happened exactly where the streaming stopped. Note: this is an incompatible change in the streaming protocol. You might get an error if you try to stream over timeline switches, if the client is running 9.3beta1 and the server is more recent. It should be fine after a reconnect, however. Reported by Fujii Masao.
*	Use the term "radix tree" instead of "suffix tree" for SP-GiST text opclass.	Heikki Linnakangas	2013-05-08
\| \| \| \| \| \| \|	What we have implemented is a radix tree (or a radix trie or a patricia trie), but the docs and code comments incorrectly called it a "suffix tree". Alexander Korotkov
*	Record data_checksum_version in control file.	Simon Riggs	2013-04-30
\| \| \| \| \| \|	The value is not used anywhere in code, but will allow future changes to the checksum version should that become necessary in the future.
*	Make fast promotion the default promotion mode.	Simon Riggs	2013-04-24
\| \| \| \| \|	Continue to allow a request for synchronous checkpoints as a mechanism in case of problems.
*	Remove some unused and seldom used fields from RelationAmInfo.	Heikki Linnakangas	2013-04-16
\| \| \| \| \| \| \|	This saves some memory from each index relcache entry. At least on a 64-bit machine, it saves just enough to shrink a typical relcache entry's memory usage from 2k to 1k. That's nice if you have a lot of backends and a lot of indexes.
*	Remove duplicate initialization in XLogReadRecord.	Robert Haas	2013-04-09
\| \| \| \|	Per a note from Dickson S. Guedes.
*	Fix calculation of how many segments to retain for wal_keep_segments.	Heikki Linnakangas	2013-04-08
\| \| \| \| \| \| \|	KeepLogSeg function was broken when we switched to use a 64-bit int for the segment number. Per report from Jeff Janes.
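The commit message does not say what exactly went wrong in KeepLogSeg. The sketch below only illustrates one hazard of this kind of arithmetic once the segment number is an unsigned 64-bit value: subtracting the number of segments to keep can underflow. The name `oldest_segment_to_keep` and the convention that segment numbering starts at 1 are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t SegNo;

/*
 * Given the current segment number and the number of older segments to
 * retain, return the oldest segment that must not be removed.  The
 * subtraction is guarded so that it cannot wrap around below segment 1.
 */
SegNo
oldest_segment_to_keep(SegNo current, uint64_t keep_segments)
{
    assert(current >= 1);       /* assumed: numbering starts at 1 */

    if (keep_segments == 0)
        return current;         /* nothing extra to retain */
    if (current <= keep_segments)
        return 1;               /* would underflow: keep everything */
    return current - keep_segments;
}
```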
*	Skip extraneous locking in XLogCheckBuffer().	Simon Riggs	2013-04-08
\| \| \| \| \| \| \|	Heikki reported comment was wrong, so fixed code to match the comment: we only need to take additional locking precautions when we have a shared lock on the buffer.
*	Avoid tricky race condition recording XLOG_HINT	Simon Riggs	2013-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We copy the buffer before inserting an XLOG_HINT to avoid WAL CRC errors caused by concurrent hint writes to buffer while share locked. To make this work we refactor RestoreBackupBlock() to allow an XLOG_HINT to avoid the normal path for backup blocks, which assumes the underlying buffer is exclusive locked. Resulting code completely changes layout of XLOG_HINT WAL records, but this isn't even beta code, so this is a low impact change. In passing, avoid taking WALInsertLock for full page writes on checksummed hints, remove related cruft from XLogInsert() and improve xlog_desc record for XLOG_HINT. Andres Freund. Bug report by Fujii Masao, testing by Jeff Janes and Jaime Casanova, review by Jeff Davis and Simon Riggs. Applied with changes from review and some comment editing.
*	Fix checksums for CLUSTER, VACUUM FULL etc.	Simon Riggs	2013-04-07
\| \| \| \| \| \| \| \| \|	In CLUSTER, VACUUM FULL and ALTER TABLE SET TABLESPACE I erroneously set checksum before log_newpage, which sets the LSN and invalidates the checksum. So set checksum immediately after log_newpage. Bug report by Fujii Masao; fix and patch by Jeff Davis.
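Why the ordering matters can be shown with a toy page whose checksum covers the LSN field. Everything here is hypothetical (`ToyPage`, `toy_sum`, `toy_log_newpage`, and the byte-sum "checksum" are illustrative stand-ins, not PostgreSQL code); the sketch only demonstrates that stamping the LSN after computing the checksum leaves a stale checksum behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy page: the checksum is computed over the LSN and the data bytes. */
typedef struct
{
    uint64_t      lsn;
    uint16_t      checksum;
    unsigned char data[32];
} ToyPage;

/* Deliberately simple stand-in for a real checksum algorithm. */
static uint16_t
toy_sum(const ToyPage *p)
{
    uint32_t sum = 0;

    for (int shift = 0; shift < 64; shift += 8)
        sum += (uint32_t) ((p->lsn >> shift) & 0xFF);
    for (size_t i = 0; i < sizeof p->data; i++)
        sum += p->data[i];
    return (uint16_t) (sum & 0xFFFF);
}

/* Stand-in for WAL-logging the page: it stamps the page with an LSN. */
void
toy_log_newpage(ToyPage *p, uint64_t lsn)
{
    assert(p != NULL);
    p->lsn = lsn;
}

void
toy_set_checksum(ToyPage *p)
{
    assert(p != NULL);
    p->checksum = toy_sum(p);
}

bool
toy_verify(const ToyPage *p)
{
    assert(p != NULL);
    return p->checksum == toy_sum(p);
}
```

Calling `toy_set_checksum` before `toy_log_newpage` produces a page that fails `toy_verify`; reversing the two calls produces one that passes.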
*	Make REPLICATION privilege checks test current user not authenticated user.	Tom Lane	2013-04-01
\| \| \| \| \| \| \| \| \| \| \|	The pg_start_backup() and pg_stop_backup() functions checked the privileges of the initially-authenticated user rather than the current user, which is wrong. For example, a user-defined index function could successfully call these functions when executed by ANALYZE within autovacuum. This could allow an attacker with valid but low-privilege database access to interfere with creation of routine backups. Reported and fixed by Noah Misch. Security: CVE-2013-1901
*	Revoke bc5334d8679c428a709d150666b288171795bd76	Simon Riggs	2013-03-28
*	Fix buffer pin leak in heap update redo routine.	Heikki Linnakangas	2013-03-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a heap update, if the old and new tuple were on different pages, and the new page no longer existed (because it was subsequently truncated away by vacuum), heap_xlog_update forgot to release the pin on the old buffer. This bug was introduced by the "Fix multiple problems in WAL replay" patch, commit 3bbf668de9f1bc172371681e80a4e769b6d014c8 (on master branch). With full_page_writes=off, this triggered an "incorrect local pin count" error later in replay, if the old page was vacuumed. This fixes bug #7969, reported by Yunong Xiao. Backpatch to 9.0, like the commit that introduced this bug.
*	Allow external recovery_config_directory	Simon Riggs	2013-03-27
\| \| \| \| \|	If required, recovery.conf can now be located outside of the data directory. Server needs read/write permissions on this directory.
*	Fix grammatical errors in some new message strings.	Tom Lane	2013-03-26
\| \| \| \|	Daniele Varrazzo
*	Allow I/O reliability checks using 16-bit checksums	Simon Riggs	2013-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Checksums are set immediately prior to flush out of shared buffers and checked when pages are read in again. Hint bit setting will require full page write when block is dirtied, which causes various infrastructure changes. Extensive comments, docs and README. WARNING message thrown if checksum fails on non-all zeroes page; ERROR thrown but can be disabled with ignore_checksum_failure = on. Feature enabled by an initdb option, since transition from option off to option on is long and complex and has not yet been implemented. Default is not to use checksums. Checksum used is WAL CRC-32 truncated to 16 bits. Simon Riggs, Jeff Davis, Greg Smith. Wide input and assistance from many community members. Thank you.
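The idea of truncating a 32-bit CRC to a 16-bit page checksum can be sketched as follows. This is not the code from the commit: the bitwise CRC-32 below uses the common reflected polynomial 0xEDB88320, and keeping the low 16 bits is an assumption, since the message does not say which CRC variant or which half is used. A real page checksum would also have to exclude the checksum field itself from the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a byte buffer (reflected polynomial 0xEDB88320). */
uint32_t
crc32_bytes(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
        {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Assumed truncation rule: keep the low 16 bits of the 32-bit CRC. */
uint16_t
page_checksum16(const unsigned char *page, size_t len)
{
    return (uint16_t) (crc32_bytes(page, len) & 0xFFFFu);
}
```

With only 16 bits, a random corruption goes undetected with probability of roughly 1 in 65536, which is the trade-off for fitting the value into the two bytes freed up in the page header.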
*	Remove PageSetTLI and rename pd_tli to pd_checksum	Simon Riggs	2013-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove use of PageSetTLI() from all page manipulation functions and adjust README to indicate change in the way we make changes to pages. Repurpose those bytes into the pd_checksum field and explain how that works in comments about page header. Refactoring ahead of actual feature patch which would make use of the checksum field, arriving later. Jeff Davis, with comments and doc changes by Simon Riggs. Direction suggested by Robert Haas; many others providing review comments.
*	Move pqsignal() to libpgport.	Tom Lane	2013-03-17
\| \| \| \| \| \| \| \| \|	We had two copies of this function in the backend and libpq, which was already pretty bogus, but it turns out that we need it in some other programs that don't use libpq (such as pg_test_fsync). So put it where it probably should have been all along. The signal-mask-initialization support in src/backend/libpq/pqsignal.c stays where it is, though, since we only need that in the backend.
*	Fix tli history file fetching, broken by the archive after crash recovery patch.	Heikki Linnakangas	2013-03-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we were about to enter archive recovery after crash recovery, we scanned the archive for the latest tli history file, and set the recovery target timeline to that. However, when we actually tried to read the history file, we would not fetch the file from the archive, because we were not in archive recovery yet. To fix, make readTimeLineHistory and existsTimeLineHistory always fetch the file from archive if archive recovery is requested, even if we're not in archive recovery yet. Backpatch to 9.2. Mitsumasa KONDO
*	Add materialized view relations.	Kevin Grittner	2013-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table; references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized view in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja. Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.
*	Fix SQL function execution to be safe with long-lived FmgrInfos.	Tom Lane	2013-03-03
\| \| \| \| \| \| \| \| \| \| \| \|	fmgr_sql had been designed on the assumption that the FmgrInfo it's called with has only query lifespan. This is demonstrably unsafe in connection with range types, as shown in bug #7881 from Andrew Gierth. Fix things so that we re-generate the function's cache data if the (sub)transaction it was made in is no longer active. Back-patch to 9.2. This might be needed further back, but it's not clear whether the case can realistically arise without range types, so for now I'll desist from back-patching further.
*	Fix thinko in previous commit.	Heikki Linnakangas	2013-02-22
\| \| \| \| \| \|	We must still initialize minRecoveryPoint if we start straight with archive recovery, e.g. when recovering from a normal base backup taken with pg_start/stop_backup. Otherwise we never consider the system consistent.
*	If recovery.conf is created after "pg_ctl stop -m i", do crash recovery.	Heikki Linnakangas	2013-02-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If you create a base backup using an atomic filesystem snapshot, and try to perform PITR starting from that base backup, or if you just kill a master server and create recovery.conf to put it into standby mode, we don't know how far we need to recover before reaching consistency. Normally in crash recovery, we replay all the WAL present in pg_xlog, and assume that we're consistent after that. And normally in archive recovery, minRecoveryPoint, backupEndRequired, or backupEndPoint is set in the control file, indicating how far we need to replay to reach consistency. But if the server was previously up and running normally, and you kill -9 it or take an atomic filesystem snapshot, none of those fields are set in the control file. The solution is to perform crash recovery first, replaying all the WAL in pg_xlog. After that's done, we assume that the system is consistent like in normal crash recovery, and switch to archive recovery mode after that. Per report from Kyotaro HORIGUCHI. In his scenario, recovery.conf was created after "pg_ctl stop -m i". I'm not sure we need to support that exact scenario, but we should support backing up using a filesystem snapshot, which looks identical. This issue goes back to at least 9.0, where hot standby was introduced and we started to track when consistency is reached. In 9.1 and 9.2, we would open up for hot standby too early, and queries could briefly see an inconsistent state. But 9.2 made it more visible, as we started to PANIC if we see a reference to a non-existing page during recovery, if we've already reached consistency. This is a fairly big patch, so back-patch to 9.2 only, where the issue is more visible. We can consider back-patching further after this has received some more testing in 9.2 and master.
*	Move relpath() to libpgcommon	Alvaro Herrera	2013-02-21
\| \| \| \| \| \| \|	This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.
*	Better fix for "unarchived WAL files get deleted on crash recovery" bug.	Heikki Linnakangas	2013-02-15
\| \| \| \| \| \| \| \| \| \|	Revert my earlier fix for the bug that unarchived WAL files get deleted on crash recovery, commit c9cc7e05c6d82a9781883a016c70d95aa4923122. We create a .done file for files streamed or restored from archive, so the WAL file recycling logic used during normal operation works just as well during archive recovery. Per Fujii Masao's suggestion.
*	Force archive_status of .done for xlogs created by dearchival/replication.	Simon Riggs	2013-02-15
\| \| \| \| \| \| \| \| \|	This is a forward-patch of commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d, applied to 9.2 back in August. The plan was to do something else in master, but it looks like it's not going to happen, so let's just apply the 9.2 solution to master as well. Fujii Masao
*	Don't delete unarchived WAL files during crash recovery.	Heikki Linnakangas	2013-02-15
\| \| \| \| \|	Bug reported by Jehan-Guillaume (ioguix) de Rorthais. This was introduced with the change to keep WAL files restored from archive in pg_xlog, in 9.2.
*	Invent pre-commit/pre-prepare/pre-subcommit events for xact callbacks.	Tom Lane	2013-02-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently it's only possible for loadable modules to get control during post-commit cleanup of a transaction. That doesn't work too well if they want to do something that could throw an error; for example, an FDW might need to issue a remote commit, which could well fail. To improve matters, extend the existing APIs for XactCallback and SubXactCallback functions to provide new pre-commit events for this purpose. The release notes will need to mention that existing callback functions should be checked to make sure they don't do something unwanted when one of the new event types occurs. In the examples within our source tree, contrib/sepgsql was fine but plpgsql had been a bit too cute.
*	Support unlogged GiST index.	Heikki Linnakangas	2013-02-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The reason this wasn't supported before was that GiST indexes need an increasing sequence to detect concurrent page-splits. In a regular WAL-logged GiST index, the LSN of the page-split record is used for that purpose, and in a temporary index, we can get away with a backend-local counter. Neither of those methods works for an unlogged relation. To provide such an increasing sequence of numbers, create a "fake LSN" counter that is saved and restored across shutdowns. On recovery, unlogged relations are blown away, so the counter doesn't need to survive that either. Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
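The lifecycle of such a counter can be sketched as below. All names (`FakeLSNCounter`, `get_fake_lsn`, `clean_shutdown`, `startup`) are hypothetical, and the sketch ignores locking and on-disk storage; it only shows a counter that keeps increasing across clean restarts and may safely restart from 1 after a crash, because the unlogged indexes that used the old values no longer exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t FakeLSN;

typedef struct
{
    FakeLSN next;   /* next value to hand out (in-memory state) */
    FakeLSN saved;  /* value persisted at the last clean shutdown */
} FakeLSNCounter;

/* Hand out a strictly increasing value. */
FakeLSN
get_fake_lsn(FakeLSNCounter *c)
{
    assert(c != NULL);
    return c->next++;
}

/* Persist the counter so that it keeps increasing after a clean restart. */
void
clean_shutdown(FakeLSNCounter *c)
{
    assert(c != NULL);
    c->saved = c->next;
}

/*
 * After a crash the unlogged relations are reset, so nothing on disk still
 * carries an old fake LSN and the counter can start over.  After a clean
 * shutdown it resumes from the saved value.
 */
void
startup(FakeLSNCounter *c, bool after_crash)
{
    assert(c != NULL);
    c->next = after_crash ? 1 : c->saved;
}
```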
*	Fix checkpoint after fast promotion.	Heikki Linnakangas	2013-02-11
\| \| \| \| \| \| \| \| \| \|	The intention was to request a regular online checkpoint immediately after end of recovery, when performing "fast promotion". However, because the checkpoint was requested before other backends were allowed to write WAL, the checkpointer process performed a restartpoint rather than a checkpoint. Delay the RequestCheckPoint call until after recovery has truly ended, so that you get a real checkpoint.
*	Include previous TLI in end-of-recovery and shutdown checkpoint records.	Heikki Linnakangas	2013-02-11
\| \| \| \| \| \|	This isn't used for anything but a sanity check at the moment, but it could be highly valuable for debugging purposes. It could also be used to recreate timeline history by traversing WAL, which seems useful.
*	Further cleanup of gistsplit.c.	Tom Lane	2013-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After further reflection I was unconvinced that the existing coding is guaranteed to return valid union datums in every code path for multi-column indexes. Fix that by forcing a gistunionsubkey() call at the end of the recursion. Having done that, we can remove some clearly-redundant calls elsewhere. This should be a little faster for multi-column indexes (since the previous coding would uselessly do such a call for each column while unwinding the recursion), as well as much harder to break. Also, simplify the handling of cases where one side or the other of a primary split contains only don't-care tuples. The previous coding used a very ugly hack in removeDontCares() that essentially forced one random tuple to be treated as non-don't-care, providing a random initial choice of seed datum for the secondary split. It seems unlikely that that method will give better-than-random splits. Instead, treat such a split as degenerate and just let the next column determine the split, the same way that we handle fully degenerate cases where the two sides produce identical union datums.
*	Remove useless picksplit-doesn't-support-secondary-split log spam.	Tom Lane	2013-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This LOG message was put in over five years ago with the evident expectation that we'd make all GiST opclasses support secondary split directly. However, no such thing ever happened, and indeed the number of opclasses supporting it decreased to zero in 9.2. The reason is that improving on the default implementation isn't that easy --- the opclass-specific code that did exist, before 9.2, doesn't appear to have been any improvement over the default. Hence, remove the message altogether. There's certainly no point in nagging users about this in released branches, but I doubt that we'll ever implement complete opclass-specific support anyway.