| Commit message | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In writeListPage, never take a full-page image of the page, because we
have all the information required to re-initialize it in the WAL record
anyway. Before this fix, a full-page image was always generated unless
full_page_writes=off, because when the page is initialized its LSN is
always 0. In stable branches, keep the code to restore the backup blocks
if they exist, in case the WAL was generated with an older minor
version, but in master, Assert that there are no full-page images.
In the redo routine, add missing "off++". Otherwise the tuples are added
to the page in reverse order. That happens to be harmless because we
always scan and remove all the tuples together, but it was clearly wrong.
Also, it was masked by the first bug unless full_page_writes=off, because
the page was always restored from a full-page image.
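For illustration, here is a minimal, self-contained sketch (not the actual
redo code) of why the missing increment reverses the order: inserting every
tuple at the same offset pushes the previously added tuples to higher
offsets, roughly the way PageAddItem shifts line pointers when given an
explicit offset.

    #include <stdio.h>
    #include <string.h>

    #define MAX_ITEMS 8

    /* Insert value at 1-based position 'off', shifting later items right. */
    static void
    add_item(int *items, int *nitems, int off, int value)
    {
        memmove(&items[off], &items[off - 1],
                (*nitems - (off - 1)) * sizeof(int));
        items[off - 1] = value;
        (*nitems)++;
    }

    int
    main(void)
    {
        int items[MAX_ITEMS];
        int nitems;
        int off;

        /* Buggy redo loop: 'off' is never advanced, so every tuple is
         * inserted at the same position and ends up in reverse order. */
        nitems = 0;
        off = 1;
        for (int tup = 1; tup <= 3; tup++)
            add_item(items, &nitems, off, tup);
        printf("without off++:");
        for (int i = 0; i < nitems; i++)
            printf(" %d", items[i]);
        printf("\n");                   /* prints: 3 2 1 */

        /* Fixed loop: advancing 'off' preserves the original order. */
        nitems = 0;
        off = 1;
        for (int tup = 1; tup <= 3; tup++)
            add_item(items, &nitems, off++, tup);
        printf("with off++:   ");
        for (int i = 0; i < nitems; i++)
            printf(" %d", items[i]);
        printf("\n");                   /* prints: 1 2 3 */

        return 0;
    }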
Backpatch to all supported versions.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As noted some time ago, the original coding had a typo ("|" for "^")
that made the result less unique than intended. Even the intended
behavior is obsolete since it was based on wanting to produce a
usable value even if we didn't have int64 arithmetic --- a limitation
we stopped supporting years ago. Instead, let's redefine the system
identifier as tv_sec in the upper 32 bits (same as before), tv_usec
in the next 20 bits, and the low 12 bits of getpid() in the remaining
bits. This is still hardly guaranteed-universally-unique, but it's
noticeably better than before. Per my proposal at
<29019.1374535940@sss.pgh.pa.us>
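As a rough illustration of the bit layout described above (a sketch only,
not a copy of the actual BootStrapXLOG code; the helper name is made up):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    /*
     * System-identifier layout described above:
     *   bits 63..32  tv_sec   (same as before)
     *   bits 31..12  low 20 bits of tv_usec
     *   bits 11..0   low 12 bits of getpid()
     */
    static uint64_t
    compute_system_identifier(void)
    {
        struct timeval tv;
        uint64_t       sysid;

        gettimeofday(&tv, NULL);
        sysid = (uint64_t) tv.tv_sec << 32;
        sysid |= ((uint64_t) tv.tv_usec & 0xFFFFF) << 12;
        sysid |= (uint64_t) getpid() & 0xFFF;
        return sysid;
    }

    int
    main(void)
    {
        printf("system identifier: %llu\n",
               (unsigned long long) compute_system_identifier());
        return 0;
    }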
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a tuple is locked, and this lock is later upgraded either to an
update or to a stronger lock, and in the meantime some other process
tries to lock, update or delete the same tuple, it (the tuple) could end
up being updated twice, or having conflicting locks held.
The reason for this is that the second updater checks for a change in
Xmax value, or in the HEAP_XMAX_IS_MULTI infomask bit, after noticing
the first lock; and if there's a change, it restarts and re-evaluates
its ability to update the tuple. But it neglected to check for changes
in lock strength or in lock-vs-update status when those two properties
stayed the same. This would lead it to take the wrong decision and
continue with its own update, when in reality it shouldn't do so but
instead restart from the top.
This could lead to either an assertion failure much later (when a
multixact containing multiple updates is detected), or duplicate copies
of tuples.
To fix, make sure to compare the other relevant infomask bits alongside
the Xmax value and HEAP_XMAX_IS_MULTI bit, and restart from the top if
necessary.
Also, in the belt-and-suspenders spirit, add a check to
MultiXactCreateFromMembers that a multixact being created does not have
two or more members that are claimed to be updates. This should protect
against other bugs that might cause similar bogus situations.
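To illustrate the shape of the fixed check, here is a small self-contained
sketch; the bit values and names below are made up for the example and are
not the real infomask definitions from htup_details.h:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-ins for heap infomask bits (values are made up). */
    #define XMAX_IS_MULTI     0x0001  /* xmax is a MultiXactId */
    #define XMAX_LOCK_ONLY    0x0002  /* xmax is only a locker, not an updater */
    #define XMAX_KEYSHR_LOCK  0x0004
    #define XMAX_SHR_LOCK     0x0008
    #define XMAX_EXCL_LOCK    0x0010
    #define LOCK_STRENGTH_MASK (XMAX_KEYSHR_LOCK | XMAX_SHR_LOCK | XMAX_EXCL_LOCK)

    typedef uint32_t TransactionId;

    /*
     * Decide whether a concurrent updater must restart from the top.  Before
     * the fix only the xmax value and the IS_MULTI bit were compared; the fix
     * also compares lock strength and lock-vs-update status.
     */
    static bool
    must_restart(TransactionId old_xmax, uint16_t old_infomask,
                 TransactionId new_xmax, uint16_t new_infomask)
    {
        const uint16_t interesting =
            XMAX_IS_MULTI | XMAX_LOCK_ONLY | LOCK_STRENGTH_MASK;

        if (old_xmax != new_xmax)
            return true;
        return (old_infomask & interesting) != (new_infomask & interesting);
    }

    int
    main(void)
    {
        /* Same xmax, but the lock was upgraded from shared to exclusive:
         * the old check would miss this; the fixed check restarts. */
        printf("restart needed: %d\n",
               must_restart(717, XMAX_LOCK_ONLY | XMAX_SHR_LOCK,
                            717, XMAX_LOCK_ONLY | XMAX_EXCL_LOCK));
        return 0;
    }

In the example, the xmax stays the same but the lock strength changed, so
the fixed check forces a restart where the old one would not.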
Backpatch to 9.3, where the possibility of multixacts containing updates
was introduced. (In prior versions it was possible to have the tuple
lock upgraded from shared to exclusive, and an update would not restart
from the top; yet we're protected against a bug there because there's
always a sleep to wait for the locking transaction to complete before
continuing to do anything. Really, the fact that tuple locks always
conflicted with concurrent updates is what protected against bugs here.)
Per report from Andrew Dunstan and Josh Berkus in thread at
http://www.postgresql.org/message-id/534C8B33.9050807@pgexperts.com
Bug analysis by Andres Freund.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once we've completed a PREPARE, our session is not running a transaction,
so its entry in pg_stat_activity should show xact_start as null, rather
than leaving the value as the start time of the now-prepared transaction.
I think possibly this oversight was triggered by faulty extrapolation
from the adjacent comment that says PrepareTransaction should not call
AtEOXact_PgStat, so tweak the wording of that comment.
Noted by Andres Freund while considering bug #10123 from Maxim Boguk,
although this error doesn't seem to explain that report.
Back-patch to all active branches.
|
|
|
|
| |
We no longer have a TLI field in the page header.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When marking a branch as half-dead, a pointer to the top of the branch is
stored in the leaf block's high key. During normal operation, the high key
was left in place, and the block number was just stored in the ctid field
of the high key tuple, but in WAL replay, the high key was recreated as a
truncated tuple with zero columns. For the sake of easier debugging, also
truncate the tuple in normal operation, so that the page is identical
after WAL replay. Also, rename the 'downlink' field in the WAL record to
'topparent', as that seems like a more descriptive name. And make sure
it's set to invalid when unlinking the leaf page.
|
|
|
|
|
| |
It's blatantly obvious that commit 4d0d607a454ee832574afd52a3c515099cc85eb3
wasn't tested. The leak's real enough, though.
|
|
|
|
| |
Revert due to contrib/test_decoding regression failure
|
|
|
|
| |
Patch by Ants Aasma
|
|
|
|
|
|
| |
Forgot to update the LSN of the left sibling's page when creating a new root.
I fixed this for regular insertions and page splits earlier, but missed
new root creation.
|
|
|
|
|
|
|
| |
The README incorrectly claimed that GIN posting tree pages contain an array
of uncompressed items in addition to compressed posting lists. Earlier
versions of the GIN posting list compression patch worked that way, but not
the one that was committed.
|
|
|
|
|
| |
When modifying a page, we must hold an exclusive lock. A shared lock is
obviously not good enough.
|
|
|
|
|
| |
It makes no difference to the system, but minimizing the differences
between a master and standby makes debugging simpler.
|
|
|
|
| |
A couple of typos from my refactoring of the page deletion patch.
|
|
|
|
| |
Etsuro Fujita
|
|
|
|
| |
Amit Langote
|
|
|
|
|
|
|
| |
Permissions might prevent the existence of the trigger file from being
checked.
Per report from Andres Freund
|
|
|
|
|
|
| |
I mixed up BLCKSZ and XLOG_BLCKSZ when I changed the way the buffer is
allocated a couple of weeks ago. With the default settings, they are both
8k, but they can be changed at compile-time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows squeezing out the unused space in full-page writes. And more
importantly, it can be a useful debugging aid.
In hindsight we should've done this back when GIN was added - we wouldn't
need the 'maxoff' field in the page opaque struct if we had used pd_lower
and pd_upper like on normal pages. But as long as there can be pages in the
index that have been binary-upgraded from pre-9.4 versions, we can't rely
on that, and have to continue using 'maxoff'.
Most of the code churn comes from renaming some macros, now that they're
used on internal pages, too.
This change is completely backwards-compatible and has no effect on pg_upgrade.
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure we throw an error instead of silently doing the wrong thing when
fed a strategy number we don't recognize. Also, in the places that did
already throw an error, spell the error message in a way more consistent
with our message style guidelines.
Per report from Paul Jones. Although this is a bug, it won't occur unless
a superuser tries to do something he shouldn't, so it doesn't seem worth
back-patching.
|
|
|
|
|
| |
In some places, the function assumes the left page is valid, and in others,
it checks if it is valid. Remove all the checks.
|
|
|
|
|
|
| |
The entry B-tree pages all follow the standard page layout. The 9.3 code has
this right. I inadvertently changed this at some point during the big
refactorings in git master.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were a couple of bugs here. First, if the fuzzy limit was exceeded,
the loop in entryGetItem might drop out too soon if a whole block needs to
be skipped because it's < advancePast ("continue" in a while-loop checks the
loop condition too). Secondly, the loop checked when stepping to a new page
that there is at least one offset on the page < advancePast, but we cannot
rely on that on subsequent calls of entryGetItem, because advancePast might
change in between. That caused the skipping loop to read bogus items in the
TbmIterateResult's offset array.
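A generic illustration of that "continue" pitfall (not the actual
entryGetItem code): because 'continue' re-evaluates the while condition,
the skip is cut short as soon as the condition turns false, leaving the
scan position parked on an item that was never examined.

    #include <stdio.h>

    int
    main(void)
    {
        int items[] = {7, 1, 2, 8, 9};
        int nitems = 5;
        int advancePast = 4;    /* items <= 4 must be skipped, not returned */
        int fuzzy_limit = 2;    /* stop after examining this many items */
        int nexamined = 0;
        int i = 0;

        while (nexamined < fuzzy_limit && i < nitems)
        {
            int item = items[i++];

            nexamined++;        /* may make the condition false ... */

            if (item <= advancePast)
                continue;       /* ... in which case this ends the loop,
                                 * leaving the remaining small items
                                 * unskipped */

            printf("returning %d\n", item);
        }
        printf("loop ended at index %d\n", i);
        return 0;
    }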
First bug and fix by Alexander Korotkov; the second bug was pointed out by
Fabrízio de Royes Mello, using a small variation of Alexander's test query.
|
|
|
|
| |
Tomonari Katsumata
|
|
|
|
|
|
|
|
|
|
| |
Don't reset the rightlink of a page when replaying a page update record.
This was a leftover from pre-hot standby days, when it was not possible to
have scans concurrent with WAL replay. Resetting the right-link was not
necessary back then either, but it was done for the sake of tidiness. But
with hot standby, it's wrong, because a concurrent scan might still need it.
Backpatch to all versions with hot standby, 9.0 and above.
|
|
|
|
| |
This isn't strictly necessary, but helps debugging.
|
|
|
|
|
|
|
|
| |
Forgot to set the incomplete-split flag on the left page half, in redo of a
page split.
Spotted this by comparing the page contents on master and standby, after
inserting/applying each WAL record.
|
|
|
|
|
|
|
|
| |
Also add a regression test for a GIN index with enough items with the same
key, so that a GIN posting tree gets created. Apparently none of the
existing GIN tests were large enough for that.
This code is new, no backpatching required.
|
|
|
|
| |
Andres Freund
|
|
|
|
|
|
| |
It can fail if you run out of memory.
This call was added in 9.3, so backpatch to 9.3 only.
|
|
|
|
|
|
| |
GetVirtualXIDsDelayingChkpt calls palloc, which isn't safe in a critical
section. I thought I covered this case with the exemption for the
checkpointer, but CreateCheckPoint is also called from the startup process.
|
|
|
|
| |
If a palloc in a critical section fails, it becomes a PANIC.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory allocation can fail if you run out of memory, and inside a critical
section that will lead to a PANIC. Use conservatively-sized arrays on the
stack instead.
There was previously no explicit limit on the number of pages a GiST split
can produce; it was only limited by the number of LWLocks that can be held
simultaneously (100 at the moment). This patch adds an explicit limit of 75
pages. That should be plenty; a typical split shouldn't produce more than
2-3 page halves.
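A sketch of the pattern with made-up names (this is not the real GiST
code): check the limit before entering the critical section, where
erroring out is still safe, and use a fixed-size stack array inside it so
that no allocation can fail there.

    #include <stdio.h>

    #define MAX_SPLIT_PAGES 75      /* limit from the commit message */

    typedef struct SplitPage
    {
        int blkno;                  /* stand-in for a real page/buffer */
    } SplitPage;

    static void
    log_split(const SplitPage *pages, int npages)
    {
        /* --- begin "critical section": no allocations allowed here --- */
        SplitPage copies[MAX_SPLIT_PAGES];  /* stack array, cannot fail */

        for (int i = 0; i < npages; i++)
            copies[i] = pages[i];
        for (int i = 0; i < npages; i++)
            printf("WAL-logging split page %d\n", copies[i].blkno);
        /* --- end critical section --- */
    }

    int
    main(void)
    {
        SplitPage pages[] = {{1}, {2}, {3}};
        int npages = 3;

        /* Enforce the limit before the critical section starts. */
        if (npages > MAX_SPLIT_PAGES)
        {
            fprintf(stderr, "split produced too many pages (%d)\n", npages);
            return 1;
        }
        log_split(pages, npages);
        return 0;
    }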
The bug has been there forever, but only backpatch down to 9.1. The code
was changed significantly in 9.1, and it doesn't seem worth the risk or
trouble to adapt this for 9.0 and 8.4.
|
|
|
|
|
|
|
|
| |
Inserting a downlink to an internal page clears the incomplete-split flag
of the child's left sibling, so the left sibling's LSN also needs to be
updated and it needs to be marked dirty. The codepath for an insertion got
this right, but the case where the internal node is split because of
inserting the new downlink missed that.
|
|
|
|
|
| |
We don't use backup blocks with GIN vacuum records anymore; the page is
always recreated from scratch.
|
|
|
|
|
|
| |
Inserting a downlink to an internal page clears the incomplete-split flag
of the child's left sibling, so the left sibling's LSN also needs to be
updated.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Inserting (in retail) into the new 9.4 format GIN posting tree created much
larger WAL records than in 9.3. The previous strategy for WAL-logging was
basically to log the whole page on each change, with the exception of
completely unmodified segments up to the first modified one. That was not
too bad when appending to the end of the page, as only the last segment had
to be WAL-logged, but per Fujii Masao's testing, even that produced 2x the
WAL volume that 9.3 did.
The new strategy is to keep track of changes to the posting lists in a more
fine-grained fashion, and also to make the "repacking" code smarter, to avoid
decoding and re-encoding segments unnecessarily.
|
|
|
|
|
| |
It's more descriptive. Also, get rid of the enum, and use #defines instead,
per Greg Stark's suggestion.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If you compile with WAL_DEBUG and enable it with wal_debug=on, we used to
only pass the first XLogRecData entry to the rm_desc routine. I think the
original assumption was that the first XLogRecData entry contains all the
necessary information for the rm_desc routine, but that's a pretty shaky
assumption. At least standby_redo didn't get the memo.
To fix, piece together all the data in a temporary buffer, and pass that to
the rm_desc routine.
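Roughly, the assembly step looks like the following sketch; the struct is a
stripped-down stand-in for the real XLogRecData chain, which carries more
fields.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct XLogRecDataSketch
    {
        struct XLogRecDataSketch *next;   /* next chunk in the chain */
        const char *data;                 /* start of this chunk */
        size_t      len;                  /* length of this chunk */
    } XLogRecDataSketch;

    /* Concatenate every chunk of the chain into one contiguous buffer. */
    static char *
    assemble_record(const XLogRecDataSketch *rdata, size_t *total_len)
    {
        size_t len = 0;
        size_t off = 0;
        char  *buf;

        for (const XLogRecDataSketch *r = rdata; r != NULL; r = r->next)
            len += r->len;

        buf = malloc(len);
        if (buf == NULL)
            return NULL;
        for (const XLogRecDataSketch *r = rdata; r != NULL; r = r->next)
        {
            memcpy(buf + off, r->data, r->len);
            off += r->len;
        }
        *total_len = len;
        return buf;     /* pass this single buffer to the desc routine */
    }

    int
    main(void)
    {
        XLogRecDataSketch second = {NULL, "world", 5};
        XLogRecDataSketch first = {&second, "hello ", 6};
        size_t len;
        char  *buf = assemble_record(&first, &len);

        if (buf == NULL)
            return 1;
        printf("%.*s (%zu bytes)\n", (int) len, buf, len);
        free(buf);
        return 0;
    }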
It's been like this forever, but the patch didn't apply cleanly to
back-branches. Probably wouldn't be hard to fix the conflicts, but it's
not worth the trouble.
|
|
|
|
| |
Backpatch to 9.0, where the XLOG_PARAMETER_CHANGE record was introduced.
|
|
|
|
|
| |
That seems nicer than making it the caller's responsibility to pass a
suitable-sized array. All the callers were just palloc'ing an array anyway.
|
|
|
|
|
| |
The 'cbuffer' variable was left over from an earlier version of the patch to
rewrite the incomplete split handling.
|
|
|
|
| |
Erik Rijkers
|
|
|
|
|
|
|
|
|
| |
equalTupleDescs() neglected both of these ConstrCheck fields, and
CreateTupleDescCopyConstr() neglected ccnoinherit. At this time, the
only known behavior defect resulting from these omissions is constraint
exclusion disregarding a CHECK constraint validated by an ALTER TABLE
VALIDATE CONSTRAINT statement issued earlier in the same transaction.
Back-patch to 9.2, where these fields were introduced.
|
|
|
|
|
|
|
|
|
|
| |
The special feature the XLogInsert slots had over regular LWLocks is the
insertingAt value that was updated atomically with releasing backends
waiting on it. Add new functions to the LWLock API to do that, and replace
the slots with LWLocks. This reduces the amount of duplicated code.
(There's still some duplication, but at least it's all in lwlock.c now.)
Reviewed by Andres Freund.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this in place, a session blocking behind another one because of
tuple locks will get a context line mentioning the relation name, tuple
TID, and the operation being done on the tuple. For example:
LOG: process 11367 still waiting for ShareLock on transaction 717 after 1000.108 ms
DETAIL: Process holding the lock: 11366. Wait queue: 11367.
CONTEXT: while updating tuple (0,2) in relation "foo"
STATEMENT: UPDATE foo SET value = 3;
Most usefully, the new line is displayed by log entries due to
log_lock_waits, although of course it will be printed by any other log
message as well.
Author: Christian Kruse, some tweaks by Álvaro Herrera
Reviewed-by: Amit Kapila, Andres Freund, Tom Lane, Robert Haas
|
|
|
|
|
|
|
|
|
| |
It is no longer used; none of the resource managers have multi-record
actions that would make it unsafe to perform a restartpoint.
Also don't allow rm_cleanup to write WAL records; that is no longer
required either. Move the call to the rm_cleanup routines to make it more
symmetric with rm_startup.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Splitting a page consists of two separate steps: splitting the child page,
and inserting the downlink for the new right page to the parent. Previously,
we handled the case that you crash in between those steps with a cleanup
routine that ran after WAL recovery had finished and completed the incomplete
split. However, that doesn't help if the page split is interrupted but the
database doesn't crash, so that you don't perform WAL recovery. That could
happen for example if you run out of disk space.
Remove the end-of-recovery cleanup step. Instead, when a page is split, the
left page is marked with a new INCOMPLETE_SPLIT flag, and when the downlink
is inserted to the parent, the flag is cleared again. If an insertion sees
a page with the flag set, it knows that the split was interrupted for some
reason, and inserts the missing downlink before proceeding.
I used the same approach to fix GIN and GiST split algorithms earlier. This
was the last WAL cleanup routine, so we could get rid of that whole
machinery now, but I'll leave that for a separate patch.
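The protocol, as a toy model (all names and structures below are
illustrative, not the real nbtree code):

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct ToyPage
    {
        int  blkno;
        bool incomplete_split;  /* set on the left half of a split until the
                                 * downlink for the right half is in the
                                 * parent */
    } ToyPage;

    static ToyPage parent = {1, false};
    static ToyPage left_child = {2, false};
    static ToyPage right_child = {3, false};

    /* Step 2 of a split: insert the downlink and clear the flag. */
    static void
    insert_downlink(ToyPage *par, ToyPage *left)
    {
        printf("inserting downlink into page %d\n", par->blkno);
        left->incomplete_split = false;
    }

    /* Step 1 of a split: mark the left half incomplete. */
    static void
    split_page(ToyPage *left, ToyPage *right)
    {
        printf("splitting page %d, new right sibling %d\n",
               left->blkno, right->blkno);
        left->incomplete_split = true;
        /* Normally step 2 follows immediately; if we are interrupted here,
         * the flag stays set and a later insertion finishes the job. */
    }

    static void
    insert_tuple(ToyPage *target)
    {
        if (target->incomplete_split)
        {
            /* Finish the interrupted split before our own insertion. */
            insert_downlink(&parent, target);
        }
        printf("inserting tuple into page %d\n", target->blkno);
    }

    int
    main(void)
    {
        split_page(&left_child, &right_child);
        /* Pretend we ran out of disk space here, so the downlink was never
         * inserted.  The next insertion detects and completes the split. */
        insert_tuple(&left_child);
        return 0;
    }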
Reviewed by Peter Geoghegan.
|
|
|
|
| |
While we're at it, also improve comments in ginlogic.c.
|