aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
...
* Reset pg_stat_activity.xact_start during PREPARE TRANSACTION.Tom Lane2014-04-24
| | | | | | | | | | | | | | | Once we've completed a PREPARE, our session is not running a transaction, so its entry in pg_stat_activity should show xact_start as null, rather than leaving the value as the start time of the now-prepared transaction. I think possibly this oversight was triggered by faulty extrapolation from the adjacent comment that says PrepareTransaction should not call AtEOXact_PgStat, so tweak the wording of that comment. Noted by Andres Freund while considering bug #10123 from Maxim Boguk, although this error doesn't seem to explain that report. Back-patch to all active branches.
* Update obsolete comments.Heikki Linnakangas2014-04-23
| | | | We no longer have a TLI field in the page header.
* Fix typos in comment.Heikki Linnakangas2014-04-23
|
* Cleanup of new b-tree page deletion code.Heikki Linnakangas2014-04-23
| | | | | | | | | | | | When marking a branch as half-dead, a pointer to the top of the branch is stored in the leaf block's hi-key. During normal operation, the high key was left in place, and the block number was just stored in the ctid field of the high key tuple, but in WAL replay, the high key was recreated as a truncated tuple with zero columns. For the sake of easier debugging, also truncate the tuple in normal operation, so that the page is identical after WAL replay. Also, rename the 'downlink' field in the WAL record to 'topparent', as that seems like a more descriptive name. And make sure it's set to invalid when unlinking the leaf page.
* Fix broken logic in logical_heap_rewrite_flush_mappings().Tom Lane2014-04-22
| | | | | It's blatantly obvious that commit 4d0d607a454ee832574afd52a3c515099cc85eb3 wasn't tested. The leak's real enough, though.
* revert 4d0d607a454ee832574afd52a3c515099cc85eb3Bruce Momjian2014-04-22
| | | | Revert due to contrib/test_decoding regression failure
* release memory used while flushing logical mappingsBruce Momjian2014-04-22
| | | | Patch by Ants Aasma
* Fix bug in the new B-tree incomplete-split code.Heikki Linnakangas2014-04-22
| | | | | | Forgot to update LSN of left sibling's page, when creating a new root. I fixed this for regular insertions and page splits earlier, but missed new root creation.
* Fix Gin README.Heikki Linnakangas2014-04-22
| | | | | | | The README incorrectly claimed that GIN posting tree pages contain an array of uncompressed items in addition to compressed posting lists. Earlier versions of the GIN posting list compression patch worked that way, but not the one that was committed.
* Fix bug in new B-tree page deletion code.Heikki Linnakangas2014-04-22
| | | | | When modifying a page, must hold an exclusive lock. A shared lock is obviously not good enough.
* Retain original physical order of tuples in redo of b-tree splits.Heikki Linnakangas2014-04-22
| | | | | It makes no difference to the system, but minimizing the differences between a master and standby makes debugging simpler.
* Fix rm_desc routine of b-tree page delete records.Heikki Linnakangas2014-04-22
| | | | A couple of typos from my refactoring of the page deletion patch.
* Fix typo.Robert Haas2014-04-20
| | | | Etsuro Fujita
* Fix typoMagnus Hagander2014-04-18
| | | | Amit Langote
* report stat() error in trigger file checkBruce Momjian2014-04-17
| | | | | | | Permissions might prevent the existence of the trigger file from being checked. Per report from Andres Freund
* Use correctly-sized buffer when zero-filling a WAL file.Heikki Linnakangas2014-04-16
| | | | | | I mixed up BLCKSZ and XLOG_BLCKSZ when I changed the way the buffer is allocated a couple of weeks ago. With the default settings, they are both 8k, but they can be changed at compile-time.
* Set pd_lower on internal GIN posting tree pages.Heikki Linnakangas2014-04-14
| | | | | | | | | | | | | | | | This allows squeezing out the unused space in full-page writes. And more importantly, it can be a useful debugging aid. In hindsight we should've done this back when GIN was added - we wouldn't need the 'maxoff' field in the page opaque struct if we had used pd_lower and pd_upper like on normal pages. But as long as there can be pages in the index that have been binary-upgraded from pre-9.4 versions, we can't rely on that, and have to continue using 'maxoff'. Most of the code churn comes from renaming some macros, now that they're used on internal pages, too. This change is completely backwards-compatible, no effect on pg_upgrade.
* Fix bogus handling of bad strategy number in GIST consistent() functions.Tom Lane2014-04-14
| | | | | | | | | | | Make sure we throw an error instead of silently doing the wrong thing when fed a strategy number we don't recognize. Also, in the places that did already throw an error, spell the error message in a way more consistent with our message style guidelines. Per report from Paul Jones. Although this is a bug, it won't occur unless a superuser tries to do something he shouldn't, so it doesn't seem worth back-patching.
* Remove dead checks for invalid left page in ginDeletePage.Heikki Linnakangas2014-04-14
| | | | | In some places, the function assumes the left page is valid, and in others, it checks if it is valid. Remove all the checks.
* GIN entry pages follow the standard page layout - tell XLogInsert.Heikki Linnakangas2014-04-14
| | | | | | The entry B-tree pages all follow the standard page layout. The 9.3 code has this right. I inadvertently changed this at some point during the big refactorings in git master.
* Fix bugs in GIN "fast scan" with partial match.Heikki Linnakangas2014-04-10
| | | | | | | | | | | | | | There were a couple of bugs here. First, if the fuzzy limit was exceeded, the loop in entryGetItem might drop out too soon if a whole block needs to be skipped because it's < advancePast ("continue" in a while-loop checks the loop condition too). Secondly, the loop checked when stepping to a new page that there is at least one offset on the page < advancePast, but we cannot rely on that on subsequent calls of entryGetItem, because advancePast might change in between. That caused the skipping loop to read bogus items in the TbmIterateResult's offset array. First item and fix by Alexander Korotkov, second bug pointed out by Fabrízio de Royes Mello, by a small variation of Alexander's test query.
* Fix typo in comment.Heikki Linnakangas2014-04-10
| | | | Tomonari Katsumata
* Fix hot standby bug with GiST scans.Heikki Linnakangas2014-04-08
| | | | | | | | | | Don't reset the rightlink of a page when replaying a page update record. This was a leftover from pre-hot standby days, when it was not possible to have scans concurrent with WAL replay. Resetting the right-link was not necessary back then either, but it was done for the sake of tidiness. But with hot standby, it's wrong, because a concurrent scan might still need it. Backpatch all versions with hot standby, 9.0 and above.
* Zero padding byte at end of GIN posting list.Heikki Linnakangas2014-04-07
| | | | This isn't strictly necessary, but helps debugging.
* Fix WAL replay bug in the new GIN incomplete-split code.Heikki Linnakangas2014-04-07
| | | | | | | | Forgot to set the incomplete-split flag on the left page half, in redo of a page split. Spotted this by comparing the page contents on master and standby, after inserting/applying each WAL record.
* Fix another palloc in critical section.Heikki Linnakangas2014-04-05
| | | | | | | | Also add a regression test for a GIN index with enough items with the same key, so that a GIN posting tree gets created. Apparently none of the existing GIN tests were large enough for that. This code is new, no backpatching required.
* Fix some compiler warnings that clang emits with -pedantic.Robert Haas2014-04-04
| | | | Andres Freund
* Move multixid allocation out of critical section.Heikki Linnakangas2014-04-04
| | | | | | It can fail if you run out of memory. This call was added in 9.3, so backpatch to 9.3 only.
* In checkpoint, move the check for in-progress xacts out of critical section.Heikki Linnakangas2014-04-04
| | | | | | GetVirtualXIDsDelayingChkpt calls palloc, which isn't safe in a critical section. I thought I covered this case with the exemption for the checkpointer, but CreateCheckPoint is also called from the startup process.
* Avoid allocations in critical sections.Heikki Linnakangas2014-04-04
| | | | If a palloc in a critical section fails, it becomes a PANIC.
* Avoid palloc in critical section in GiST WAL-logging.Heikki Linnakangas2014-04-03
| | | | | | | | | | | | | | | | Memory allocation can fail if you run out of memory, and inside a critical section that will lead to a PANIC. Use conservatively-sized arrays in stack instead. There was previously no explicit limit on the number of pages a GiST split can produce, it was only limited by the number of LWLocks that can be held simultaneously (100 at the moment). This patch adds an explicit limit of 75 pages. That should be plenty, a typical split shouldn't produce more than 2-3 page halves. The bug has been there forever, but only backpatch down to 9.1. The code was changed significantly in 9.1, and it doesn't seem worth the risk or trouble to adapt this for 9.0 and 8.4.
* Fix bug in the new GIN incomplete-split code.Heikki Linnakangas2014-04-01
| | | | | | | | Inserting a downlink to an internal page clears the incomplete-split flag of the child's left sibling, so the left sibling's LSN also needs to be updated and it needs to be marked dirty. The codepath for an insertion got this right, but the case where the internal node is split because of inserting the new downlink missed that.
* Remove dead check for backup block, replace with Assert.Heikki Linnakangas2014-04-01
| | | | | We don't use backup blocks with GIN vacuum records anymore, the page is always recreated from scratch.
* Fix bug in the new B-tree incomplete-split code.Heikki Linnakangas2014-04-01
| | | | | | Inserting a downlink to an internal page clears the incomplete-split flag of the child's left sibling, so the left sibling's LSN also needs to be updated.
* Rewrite the way GIN posting lists are packed on a page, to reduce WAL volume.Heikki Linnakangas2014-03-31
| | | | | | | | | | | | | | Inserting (in retail) into the new 9.4 format GIN posting tree created much larger WAL records than in 9.3. The previous strategy to WAL logging was basically to log the whole page on each change, with the exception of completely unmodified segments up to the first modified one. That was not too bad when appending to the end of the page, as only the last segment had to be WAL-logged, but per Fujii Masao's testing, even that produced 2x the WAL volume that 9.3 did. The new strategy is to keep track of changes to the posting lists in a more fine-grained fashion, and also make the repacking" code smarter to avoid decoding and re-encoding segments unnecessarily.
* Rename GinLogicValue to GinTernaryValue.Heikki Linnakangas2014-03-31
| | | | | It's more descriptive. Also, get rid of the enum, and use #defines instead, per Greg Stark's suggestion.
* Pass more than the first XLogRecData entry to rm_desc, with WAL_DEBUG.Heikki Linnakangas2014-03-26
| | | | | | | | | | | | | | | If you compile with WAL_DEBUG and enable it with wal_debug=on, we used to only pass the first XLogRecData entry to the rm_desc routine. I think the original assumprion was that the first XLogRecData entry contains all the necessary information for the rm_desc routine, but that's a pretty shaky assumption. At least standby_redo didn't get the memo. To fix, piece together all the data in a temporary buffer, and pass that to the rm_desc routine. It's been like this forever, but the patch didn't apply cleanly to back-branches. Probably wouldn't be hard to fix the conflicts, but it's not worth the trouble.
* Don't forget to flush XLOG_PARAMETER_CHANGE record.Fujii Masao2014-03-26
| | | | Backpatch to 9.0 where XLOG_PARAMETER_CHANGE record was instroduced.
* Change ginMergeItemPointers to return a palloc'd array.Heikki Linnakangas2014-03-24
| | | | | That seems nicer than making it the caller's responsibility to pass a suitable-sized array. All the callers were just palloc'ing an array anyway.
* Remove dead code and add comments.Heikki Linnakangas2014-03-24
| | | | | 'cbuffer' variable was left over from an earlier version of the patch to rewrite the incomplete split handling.
* Fix "the the" typos.Heikki Linnakangas2014-03-24
| | | | Erik Rijkers
* Address ccvalid/ccnoinherit in TupleDesc support functions.Noah Misch2014-03-23
| | | | | | | | | equalTupleDescs() neglected both of these ConstrCheck fields, and CreateTupleDescCopyConstr() neglected ccnoinherit. At this time, the only known behavior defect resulting from these omissions is constraint exclusion disregarding a CHECK constraint validated by an ALTER TABLE VALIDATE CONSTRAINT statement issued earlier in the same transaction. Back-patch to 9.2, where these fields were introduced.
* Replace the XLogInsert slots with regular LWLocks.Heikki Linnakangas2014-03-21
| | | | | | | | | | The special feature the XLogInsert slots had over regular LWLocks is the insertingAt value that was updated atomically with releasing backends waiting on it. Add new functions to the LWLock API to do that, and replace the slots with LWLocks. This reduces the amount of duplicated code. (There's still some duplication, but at least it's all in lwlock.c now.) Reviewed by Andres Freund.
* Setup error context callback for transaction lock waitsAlvaro Herrera2014-03-19
| | | | | | | | | | | | | | | | | | With this in place, a session blocking behind another one because of tuple locks will get a context line mentioning the relation name, tuple TID, and operation being done on tuple. For example: LOG: process 11367 still waiting for ShareLock on transaction 717 after 1000.108 ms DETAIL: Process holding the lock: 11366. Wait queue: 11367. CONTEXT: while updating tuple (0,2) in relation "foo" STATEMENT: UPDATE foo SET value = 3; Most usefully, the new line is displayed by log entries due to log_lock_waits, although of course it will be printed by any other log message as well. Author: Christian Kruse, some tweaks by Álvaro Herrera Reviewed-by: Amit Kapila, Andres Freund, Tom Lane, Robert Haas
* Remove rm_safe_restartpoint machinery.Heikki Linnakangas2014-03-18
| | | | | | | | | It is no longer used, none of the resource managers have multi-record actions that would make it unsafe to perform a restartpoint. Also don't allow rm_cleanup to write WAL records, it's also no longer required. Move the call to rm_cleanup routines to make it more symmetric with rm_startup.
* Make the handling of interrupted B-tree page splits more robust.Heikki Linnakangas2014-03-18
| | | | | | | | | | | | | | | | | | | | | | Splitting a page consists of two separate steps: splitting the child page, and inserting the downlink for the new right page to the parent. Previously, we handled the case that you crash in between those steps with a cleanup routine after the WAL recovery had finished, which finished the incomplete split. However, that doesn't help if the page split is interrupted but the database doesn't crash, so that you don't perform WAL recovery. That could happen for example if you run out of disk space. Remove the end-of-recovery cleanup step. Instead, when a page is split, the left page is marked with a new INCOMPLETE_SPLIT flag, and when the downlink is inserted to the parent, the flag is cleared again. If an insertion sees a page with the flag set, it knows that the split was interrupted for some reason, and inserts the missing downlink before proceeding. I used the same approach to fix GIN and GiST split algorithms earlier. This was the last WAL cleanup routine, so we could get rid of that whole machinery now, but I'll leave that for a separate patch. Reviewed by Peter Geoghegan.
* Fix thinko: have trueTriConsistentFn return GIN_TRUE.Heikki Linnakangas2014-03-17
| | | | While we're at it, also improve comments in ginlogic.c.
* Fix typos in comments.Fujii Masao2014-03-17
| | | | Thom Brown
* Fix race condition in B-tree page deletion.Heikki Linnakangas2014-03-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In short, we don't allow a page to be deleted if it's the rightmost child of its parent, but that situation can change after we check for it. Problem ------- We check that the page to be deleted is not the rightmost child of its parent, and then lock its left sibling, the page itself, its right sibling, and the parent, in that order. However, if the parent page is split after the check but before acquiring the locks, the target page might become the rightmost child, if the split happens at the right place. That leads to an error in vacuum (I reproduced this by setting a breakpoint in debugger): ERROR: failed to delete rightmost child 41 of block 3 in index "foo_pkey" We currently re-check that the page is still the rightmost child, and throw the above error if it's not. We could easily just give up rather than throw an error, but that approach doesn't scale to half-dead pages. To recap, although we don't normally allow deleting the rightmost child, if the page is the *only* child of its parent, we delete the child page and mark the parent page as half-dead in one atomic operation. But before we do that, we check that the parent can later be deleted, by checking that it in turn is not the rightmost child of the grandparent (potentially recursing all the way up to the root). But the same situation can arise there - the grandparent can be split while we're not holding the locks. We end up with a half-dead page that we cannot delete. To make things worse, the keyspace of the deleted page has already been transferred to its right sibling. As the README points out, the keyspace at the grandparent level is "out-of-whack" until the half-dead page is deleted, and if enough tuples with keys in the transferred keyspace are inserted, the page might get split and a downlink might be inserted into the grandparent that is out-of-order. That might not cause any serious problem if it's transient (as the README ponders), but is surely bad if it stays that way. Solution -------- This patch changes the page deletion algorithm to avoid that problem. After checking that the topmost page in the chain of to-be-deleted pages is not the rightmost child of its parent, and then deleting the pages from bottom up, unlink the pages from top to bottom. This way, the intermediate stages are similar to the intermediate stages in page splitting, and there is no transient stage where the keyspace is "out-of-whack". The topmost page in the to-be-deleted chain doesn't have a downlink pointing to it, like a page split before the downlink has been inserted. This also allows us to get rid of the cleanup step after WAL recovery, if we crash during page deletion. The deletion will be continued at next VACUUM, but the tree is consistent for searches and insertions at every step. This bug is old, all supported versions are affected, but this patch is too big to back-patch (and changes the WAL record formats of related records). We have not heard any reports of the bug from users, so clearly it's not easy to bump into. Maybe backpatch later, after this has had some field testing. Reviewed by Kevin Grittner and Peter Geoghegan.
* C comments: remove odd blank lines after #ifdef WIN32 linesBruce Momjian2014-03-13
|