| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 4b0d28de06 has removed the prior checkpoint and related
facilities but has left WAL recycling based on the LSN of the prior
checkpoint, which causes incorrect calculations for WAL removal and
recycling for max_wal_size and min_wal_size. This commit changes things
so as the base calculation point is the last checkpoint generated.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20180723.135748.42558387.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch: 11-, where the prior checkpoint has been removed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PostgreSQL 9.4 introduces posting list compression in GIN. This feature
supports online upgrade, so that after pg_upgrade uncompressed posting
lists are compressed on-the-fly. Underlying code appears to always
expect at least one item on uncompressed posting list page. But there
could be completely empty pages, because VACUUM never deletes leftmost
and rightmost pages from posting trees. This commit fixes that.
Reported-by: Sivasubramanian Ramasubramanian
Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com
Author: Sivasubramanian Ramasubramanian, Alexander Korotkov
Backpatch-through: 9.4
|
|
|
|
|
|
|
|
| |
A collection of typos I happened to spot while reading code, as well as
grepping for common mistakes.
Backpatch to all supported versions, as applicable, to avoid conflicts
when backporting other commits in the future.
|
|
|
|
|
|
|
|
|
|
|
| |
Since the old logic was completely unaware of subtransactions, a
change made in a subsequently-aborted subtransaction would still cause
workers to be stopped at toplevel transaction commit. Fix that by
managing a stack of worker lists rather than just one.
Amit Khandekar and Robert Haas
Discussion: http://postgr.es/m/CAJ3gD9eaG_mWqiOTA2LfAug-VRNn1hrhf50Xi1YroxL37QkZNg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When reading an existing FSM or VM page that was found to be corrupt by the
buffer manager, the code applied PageInit() to reinitialize the page, but
did so without any locking. There is thus a hazard that two backends might
concurrently do PageInit, which in itself would still be OK, but the slower
one might then zero over subsequent data changes applied by the faster one.
Even that is unlikely to be fatal; but it's not desirable, so add locking
to prevent it.
This does not add any locking overhead in the normal code path where the
page is OK. It's not immediately obvious that that's safe, but I believe
it is, for reasons explained in the added comments.
Problem noted by R P Asim. It's been like this for a long time, so
back-patch to all supported branches.
Discussion: https://postgr.es/m/CANXE4Te4G0TGq6cr0-TvwP0H4BNiK_-hB5gHe8mF+nz0mcYfMQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit fafa374f2 caused _bt_getbuf() to possibly emit a WAL record for
a page that it was about to recycle. However, it failed to distinguish
all-zero pages from dead pages, which is important because only the
latter have valid btpo.xact values, or indeed any special space at all.
Recycling an all-zero page with XLogStandbyInfoActive() enabled therefore
led to an Assert failure, or to emission of a WAL record containing a
bogus cutoff XID, which might lead to unnecessary query cancellations
on hot standby servers.
Per reports from Antonin Houska and 自己. Amit Kapila was first to
propose this fix, and Robert Haas, myself, and Kyotaro Horiguchi
reviewed it at various times.
This is an old bug, so back-patch to all supported branches.
Discussion: https://postgr.es/m/2628.1474272158@localhost
Discussion: https://postgr.es/m/48875502.f4a0.1635f0c27b0.Coremail.zoulx1982@163.com
|
|
|
|
|
|
|
|
|
| |
Commit fc49e24fa69a added an input argument after the existing output
argument. Flip those.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20180708182345.imdgovmkffgtihhk@alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A critical failure in some of the end-of-recovery actions before the
end-of-recovery record is written can cause PostgreSQL to react
inconsistently with the rest of the cluster in the event of a crash
before the final record is written. Two such failures are for example
an error while processing a two-phase state files or when operating on
recovery.conf. With this commit, the failures are still considered
FATAL, but the write of the timeline history file is delayed as much as
possible so as the window between the moment the file is written and the
end-of-recovery record is generated gets minimized. This way, in the
event of a crash or a failure, the new timeline decided at promotion
will not seem taken by other nodes in the cluster. It is not really
possible to reduce to zero this window, hence one could still see
failures if a crash happens between the history file write and the
end-of-recovery record, so any future code should be careful when
adding new end-of-recovery actions. The original report from Magnus
Hagander mentioned a renamed recovery.conf as original end-of-recovery
failure which caused a timeline to be seen as taken but the subsequent
processing on the now-missing recovery.conf cause the startup process to
issue stop on FATAL, which at follow-up startup made the system
inconsistent because of on-disk changes which already happened.
Processing of two-phase state files still needs some work as corrupted
entries are simply ignored now. This is left as a future item and this
commit fixes the original complain.
Reported-by: Magnus Hagander
Author: Heikki Linnakangas
Reviewed-by: Alexander Korotkov, Michael Paquier, David Steele
Discussion: https://postgr.es/m/CABUevEz09XY2EevA2dLjPCY-C5UO4Hq=XxmXLmF6ipNFecbShQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a standby crashes after promotion before having completed its first
post-recovery checkpoint, then the minimal recovery point which marks
the LSN position where the cluster is able to reach consistency may be
set to a position older than the first end-of-recovery checkpoint while
all the WAL available should be replayed. This leads to the instance
thinking that it contains inconsistent pages, causing a PANIC and a hard
instance crash even if all the WAL available has not been replayed for
certain sets of records replayed. When in crash recovery,
minRecoveryPoint is expected to always be set to InvalidXLogRecPtr,
which forces the recovery to replay all the WAL available, so this
commit makes sure that the local copy of minRecoveryPoint from the
control file is initialized properly and stays as it is while crash
recovery is performed. Once switching to archive recovery or if crash
recovery finishes, then the local copy minRecoveryPoint can be safely
updated.
Pavan Deolasee has reported and diagnosed the failure in the first
place, and the base fix idea to rely on the local copy of
minRecoveryPoint comes from Kyotaro Horiguchi, which has been expanded
into a full-fledged patch by me. The test included in this commit has
been written by Álvaro Herrera and Pavan Deolasee, which I have modified
to make it faster and more reliable with sleep phases.
Backpatch down to all supported versions where the bug appears, aka 9.3
which is where the end-of-recovery checkpoint is not run by the startup
process anymore. The test gets easily supported down to 10, still it
has been tested on all branches.
Reported-by: Pavan Deolasee
Diagnosed-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro
Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When deleting pages the nbtree code has to walk through siblings of a
tree node. When those sibling links are corrupted that can lead to
endless loops - which are currently not interruptible. This is
especially problematic if autovacuum is repeatedly blocked on such
indexes, as it can be hard to get out of that situation without
resorting to single user mode.
Thus add interrupt checks to appropriate places in such
loops. Unfortunately in one of the cases it's it's not easy to do so.
Between 9.3 and 9.4 the page deletion (and page split) code changed
significantly. Before it was significantly less robust against
interruptions. Therefore don't backpatch to 9.3.
Author: Andres Freund
Discussion: https://postgr.es/m/20180627191629.wkunw2qbibnvlz53@alap3.anarazel.de
Backpatch: 9.4-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When multiple relations are deleted at the same transaction,
the files of those relations are deleted by one call to smgrdounlinkall(),
which leads to scan whole shared_buffers only one time. OTOH,
previously, during recovery, smgrdounlink() (not smgrdounlinkall()) was
called for each file to delete, which led to scan shared_buffers
multiple times. Obviously this could cause to increase the WAL replay
time very much especially when shared_buffers was huge.
To alleviate this situation, this commit changes the recovery so that
it also calls smgrdounlinkall() only one time to delete multiple
relation files.
This is just fix for oversight of commit 279628a0a7, not new feature.
So, per discussion on pgsql-hackers, we concluded to backpatch this
to all supported versions.
Author: Fujii Masao
Reviewed-by: Michael Paquier, Andres Freund, Thomas Munro, Kyotaro Horiguchi, Takayuki Tsunakawa
Discussion: https://postgr.es/m/CAHGQGwHVQkdfDqtvGVkty+19cQakAydXn1etGND3X0PHbZ3+6w@mail.gmail.com
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Changed the name of few structure members for the sake of clarity and
removed spurious whitespace.
Reported-by: Amit Kapila
Author: Amit Kapila, based on suggestion by Andrew Dunstan
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
6ca33a88 sets upper limit for vacuum_cleanup_index_scale_factor to
DBL_MAX. DBL_MAX appears to be platform-dependent. That causes
many buildfarm animals to fail, because we check boundaries of
vacuum_cleanup_index_scale_factor in regression tests.
This commit changes upper limit from DBL_MAX to just "large enough"
limit, which was arbitrary selected as 1e10.
Author: Alexander Korotkov
Reported-by: Tom Lane, Darafei Praliaskouski
Discussion: https://postgr.es/m/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
Discussion: https://postgr.es/m/CAC8Q8tLYFOpKNaPS_E7V8KtPdE%3D_TnAn16t%3DA3LuL%3DXjfOO-BQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Building a new nbtree index through incremental insertions would always
be slower than our actual approach of sorting using tuplesort,
assembling leaf pages from tuplesort output, and writing and WAL-logging
whole pages. Remove a comment block from the Berkeley days claiming
that incremental insertions might be slightly faster with presorted
input.
Discussion: https://postgr.es/m/CAH2-WzmKs4mLAoFgJ3yHMRYc849efc=dw+pNRb3NEog2oJoCNw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Upper limits for vacuum_cleanup_index_scale_factor GUC and reloption
were initially set to 100.0 in 857f9c36. However, after further
discussion, it appears that some users like to disable B-tree cleanup
index scan completely (assuming there are no deleted pages).
vacuum_cleanup_index_scale_factor is used barely to protect against
stalled index statistics. And after detailed consideration it appears
that risk of stalled index statistics is low. And it would be nice to
allow advanced users setting higher values of
vacuum_cleanup_index_scale_factor. So, set upper limit for these
GUC and reloption to DBL_MAX.
Author: Alexander Korotkov
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8tJCb%3DgxhzcV7T6ctx7PY-Ux1oA-AsTJc6cAVNsQiYcCzA%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
System calls mixed up in error code paths are causing two issues which
several code paths have not correctly handled:
1) For write() calls, sometimes the system may return less bytes than
what has been written without errno being set. Some paths were careful
enough to consider that case, and assumed that errno should be set to
ENOSPC, other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.
This patch uses the brute-force approach of correcting all those code
paths. Some refactoring could happen in the future, but this is let as
future work, which is not targeted for back-branches anyway.
Author: Michael Paquier
Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
|
|
|
|
| |
Author: Shao Bret
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a standby's WAL receiver stops reading WAL from a WAL stream, it
writes data to the current WAL segment without having priorily zero'ed
the page currently written to, which can cause the WAL reader to read
junk data from a past recycled segment and then it would try to get a
record from it. While sanity checks in place provide most of the
protection needed, in some rare circumstances, with chances increasing
when a record header crosses a page boundary, then the startup process
could fail violently on an allocation failure, as follows:
FATAL: invalid memory alloc request size XXX
This is confusing for the user and also unhelpful as this requires in
the worst case a manual restart of the instance, impacting potentially
the availability of the cluster, and this also makes WAL data look like
it is in a corrupted state.
The chances of seeing failures are higher if the connection between the
standby and its root node is unstable, causing WAL pages to be written
in the middle. A couple of approaches have been discussed, like
zero-ing new WAL pages within the WAL receiver itself but this has the
disadvantage of impacting performance of any existing instances as this
breaks the sequential writes done by the WAL receiver. This commit
deals with the problem with a more simple approach, which has no
performance impact without reducing the detection of the problem: if a
record is found with a length higher than 1GB for backends, then do not
try any allocation and report a soft failure which will force the
standby to retry reading WAL. It could be possible that the allocation
call passes and that an unnecessary amount of memory is allocated,
however follow-up checks on records would just fail, making this
allocation short-lived anyway.
This patch owes a great deal to Tsunakawa Takayuki for reporting the
failure first, and then discussing a couple of potential approaches to
the problem.
Backpatch down to 9.5, which is where palloc_extended has been
introduced.
Reported-by: Tsunakawa Takayuki
Reviewed-by: Tsunakawa Takayuki
Author: Michael Paquier
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Issues relate only to subtransactions that hold AccessExclusiveLocks
when replayed on standby.
Prior to PG10, aborting subtransactions that held an
AccessExclusiveLock failed to release the lock until top level commit or
abort. 49bff5300d527 fixed that.
However, 49bff5300d527 also introduced a similar bug where subtransaction
commit would fail to release an AccessExclusiveLock, leaving the lock to
be removed sometimes early and sometimes late. This commit fixes
that bug also. Backpatch to PG10 needed.
Tested by observation. Note need for multi-node isolationtester to improve
test coverage for this and other HS cases.
Reported-by: Simon Riggs
Author: Simon Riggs
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 1eb6d6527aae introduced zeroed alignment bytes in the GID field
of commit/abort WAL records. Fixup commit cf5a1890592b later changed
that representation into a regular cstring with a single terminating
zero byte, but it also introduced an off-by-one mistake. Fix that.
Author: Nikhil Sontakke
Reported-by: Nikhil Sontakke
Discussion: https://postgr.es/m/CAMGcDxey6dG1DP34_tJMoWPcp5sPJUAL4K5CayUUXLQSx2GQpA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
They already fail anyway, but prior to this patch they raise an ugly
error message about a lock that cannot be acquired. This just improves
the message.
Author: Masahiko Sawada
Reported-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoBZau4g4_NUf3BKNd=CdYK+xaPdtJCzvOC1TxGdTiJx_Q@mail.gmail.com
Reviewed-by: Kuntal Ghosh, Alexander Korotkov, Simon Riggs, Michaël Paquier, Álvaro Herrera
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This bug causes a lseek() failure to be reported as a "could not open"
failure in the error message, muddling bug reports. I introduced this
copy-and-pasteo in commit 78e122010422.
Noticed while reviewing code for bug report #15221, from lily liang. In
version 10 the affected function is only used by multixact.c and
commit_ts, and only in corner-case circumstances, neither of which are
involved in the reported bug (a pg_subtrans failure.)
Author: Álvaro Herrera
|
|
|
|
|
|
|
|
| |
Any changes on page should be done in critical section, so move
_bt_upgrademetapage into critical section. Improve comment. Found by Amit
Kapila during post-commit review of 857f9c36.
Author: Amit Kapila
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "l" (ell) width spec means something in the corresponding scanf usage,
but not here. While modern POSIX says that applying "l" to "f" and other
floating format specs is a no-op, SUSv2 says it's undefined. Buildfarm
experience says that some old compilers emit warnings about it, and at
least one old stdio implementation (mingw's "ANSI" option) actually
produces wrong answers and/or crashes.
Discussion: https://postgr.es/m/21670.1526769114@sss.pgh.pa.us
Discussion: https://postgr.es/m/c085e1da-0d64-1c15-242d-c921f32e0d5c@dunslane.net
|
|
|
|
|
| |
Instead of saying "error: success", indicate that we got a working read
but it was too short.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Change vacuum_cleanup_index_scale_factor GUC to PGC_USERSET.
vacuum_cleanup_index_scale_factor GUC was defined as PGC_SIGHUP. But this
GUC affects not only autovacuum. So it might be useful to change it from user
session in order to influence manually runned VACUUM.
- Add missing tab-complete support for vacuum_cleanup_index_scale_factor
reloption.
- Fix condition for B-tree index cleanup.
Zero value of vacuum_cleanup_index_scale_factor means that user wants B-tree
index cleanup to be never skipped.
- Documentation and comment improvements
Authors: Justin Pryzby, Alexander Korotkov, Liudmila Mantrova
Reviewed by: all authors and Robert Haas
Discussion: https://www.postgresql.org/message-id/flat/20180502023025.GD7631%40telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a continuation record is split so that its first half has already been
removed from the master, and is only present in pg_wal, and there is a
recycled WAL segment in the standby server that looks like it would
contain the second half, recovery would get stuck. The code in
XLogPageRead() incorrectly started streaming at the beginning of the
WAL record, even if we had already read the first page.
Backpatch to 9.4. In principle, older versions have the same problem, but
without replication slots, there was no straightforward mechanism to
prevent the master from recycling old WAL that was still needed by standby.
Without such a mechanism, I think it's reasonable to assume that there's
enough slack in how many old segments are kept around to not run into this,
or you have a WAL archive.
Reported by Jonathon Nelson. Analysis and patch by Kyotaro HORIGUCHI, with
some extra comments by me.
Discussion: https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP%2BJoJAjoGx%3DGNuOAshEDWCext7BFvCQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dan Wood diagnosed a long-standing problem that pages containing tuples
that are locked by multixacts containing live lockers may spuriously end
up as candidates for getting their all-visible flag set. This has the
long-term effect that multixacts remain unfrozen; this may previously
pass undetected, but since commit XYZ it would be reported as
"ERROR: found multixact 134100944 from before relminmxid 192042633"
because when a later vacuum tries to freeze the page it detects that a
multixact that should have gotten frozen, wasn't.
Dan proposed a (correct) patch that simply sets a variable to its
correct value, after a bogus initialization. But, per discussion, it
seems better coding to avoid the bogus initializations altogether, since
they could give rise to more bugs later. Therefore this fix rewrites
the logic a little bit to avoid depending on the bogus initializations.
This bug was part of a family introduced in 9.6 by commit a892234f830e;
later, commit 38e9f90a227d fixed most of them, but this one was
unnoticed.
Authors: Dan Wood, Pavan Deolasee, Álvaro Herrera
Reviewed-by: Masahiko Sawada, Pavan Deolasee, Álvaro Herrera
Discussion: https://postgr.es/m/84EBAC55-F06D-4FBE-A3F3-8BDA093CE3E3@amazon.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
nbtsort.c does not need to truncate away non-key attributes for the
minimum key of the leftmost page on a level, since this is only used to
build a minus infinity downlink for the level's leftmost page.
Truncating away non-key attributes in advance of truncating away all
attributes in _bt_sortaddtup() does not affect the correctness of CREATE
INDEX, but it is misleading.
Author: Peter Geoghegan
Discussion: https://www.postgresql.org/message-id/CAH2-WzkAS2M3ussHG-s_Av=Zo6dPjOxyu5fNRkYnxQV+YzGQ4w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The principle behind the locking was not very well thought-out, and not
documented. Add a section in the README to explain how it's supposed to
work, and change the code so that it actually works that way.
This fixes two bugs:
1. If fast update was turned on concurrently, subsequent inserts to the
pending list would not conflict with predicate locks that were acquired
earlier, on entry pages. The included 'predicate-gin-fastupdate' test
demonstrates that. To fix, make all scans acquire a predicate lock on
the metapage. That lock represents a scan of the pending list, whether
or not there is a pending list at the moment. Forget about the
optimization to skip locking/checking for locks, when fastupdate=off.
2. If a scan finds no match, it still needs to lock the entry page. The
point of predicate locks is to lock the gabs between values, whether
or not there is a match. The included 'predicate-gin-nomatch' test
tests that case.
In addition to those two bug fixes, this removes some unnecessary locking,
following the principle laid out in the README. Because all items in
a posting tree have the same key value, a lock on the posting tree root is
enough to cover all the items. (With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that a
"skip scan" with a query like "A & B", you could avoid unnecessary conflict
if a new tuple is inserted with A but !B. But let's keep this simple.)
Also, some spelling fixes.
Author: Heikki Linnakangas with some editorization by me
Review: Andrey Borodin, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an interrupt arrives in the middle of FinishPreparedTransaction
and any callback decide to call CHECK_FOR_INTERRUPTS (e.g.
RemoveTwoPhaseFile can write a warning with ereport, which checks for
interrupts) then it's possible to leave current GXact undeleted.
Backpatch to all supported branches
Stas Kelvich
Discussion: ihttps://www.postgresql.org/message-id/3AD85097-A3F3-4EBA-99BD-C38EDF8D2949@postgrespro.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recent gcc can warn about switch-case fall throughs that are not
explicitly labeled as intentional. This seems like a good thing,
so clean up the warnings exposed thereby by labeling all such
cases with comments that gcc will recognize.
In files that already had one or more suitable comments, I generally
matched the existing style of those. Otherwise I went with
/* FALLTHROUGH */, which is one of the spellings approved at the
more-restrictive-than-default level -Wimplicit-fallthrough=4.
(At the default level you can also spell it /* FALL ?THRU */,
and it's not picky about case. What you can't do is include
additional text in the same comment, so some existing comments
containing versions of this aren't good enough.)
Testing with gcc 8.0.1 (Fedora 28's current version), I found that
I also had to put explicit "break"s after elog(ERROR) or ereport(ERROR);
apparently, for this purpose gcc doesn't recognize that those don't
return. That seems like possibly a gcc bug, but it's fine because
in most places we did that anyway; so this amounts to a visit from the
style police.
Discussion: https://postgr.es/m/15083.1525207729@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
| |
This change makes this module act more like most of our other low-level
resource management modules. It's a caller error if something is not
explicitly closed by the end of a successful transaction, so issue
a WARNING about it. This would not actually have caught the file leak
bug fixed in commit 231bcd080, because that was in a transaction-abort
path; but it still seems like a good, and pretty cheap, cross-check.
Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org
|
|
|
|
| |
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After introducing usage of t_tid of inner or page high key for storing
number of attributes of tuple, validation of tuple's ItemPointer with
ItemPointerIsValid becomes incorrect, it's need to validate only blocknumber of
ItemPointer. Missing this causes a incorrect page deletion, fix that. Test is
added.
BTW, current contrib/amcheck doesn't fail on index corrupted by this way.
Also introduce BTreeTupleGetTopParent/BTreeTupleSetTopParent macroses to improve
code readability and to avoid possible confusion with page high key: high key
is used to store top-parent link for branch to remove.
Bug found by Michael Paquier, but bug doesn't exist in previous versions because
t_tid was set to P_HIKEY.
Author: Teodor Sigaev
Reviewer: Peter Geoghegan
Discussion: https://www.postgresql.org/message-id/flat/20180419052436.GA16000%40paquier.xyz
|
|
|
|
|
|
|
|
| |
Remove an obsolete reference to the 'afteritem' argument, which was
removed by commit bc292937. Add a comment that clarifies how
_bt_insertonpg() indirectly handles the insertion of high key items.
Author: Peter Geoghegan
|
|
|
|
|
|
|
|
| |
New WAL record XLOG_BTREE_META_CLEANUP introduced in 857f9c36 has no handling
in btree_desc() and btree_identify(). This patch implements corresponding
handling.
Alexander Korotkov
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add several assertions that ensure that we're dealing with a pivot tuple
without non-key attributes where that's expected. Also, remove the
assertion within _bt_isequal(), restoring the v10 function signature. A
similar check will be performed for the page highkey within
_bt_moveright() in most cases. Also avoid dropping all objects within
regression tests, to increase pg_dump test coverage for INCLUDE indexes.
Rather than using infrastructure that's generally intended to be used
with reference counted heap tuple descriptors during truncation, use the
same function that was introduced to store flat TupleDescs in shared
memory (we use a temp palloc'd buffer). This isn't strictly necessary,
but seems more future-proof than the old approach. It also lets us
avoid including rel.h within indextuple.c, which was arguably a
modularity violation. Also, we now call index_deform_tuple() with the
truncated TupleDesc, not the source TupleDesc, since that's more robust,
and saves a few cycles.
In passing, fix a memory leak by pfree'ing truncated pivot tuple memory
during CREATE INDEX. Also pfree during a page split, just to be
consistent.
Refactor _bt_check_natts() to be more readable.
Author: Peter Geoghegan with some editorization by me
Reviewed by: Alexander Korotkov, Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/CAH2-Wz%3DkCWuXeMrBCopC-tFs3FbiVxQNjjgNKdG2sHxZ5k2y3w%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Review of commit 1eb6d652: It's pointless to add padding to the GID fields,
when the code that follows assumes that there is no alignment, and uses
memcpy(). Remove the pointless padding.
Update comments to note the new fields in the WAL records.
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/33b787bf-dc20-1161-54e9-3f3b607bf59d%40iki.fi
|
|
|
|
|
| |
Author: Michael Paquier
Discussion: https://www.postgresql.org/message-id/20180411075223.GB19732%40paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
spg_text_leaf_consistent() supposed that it should compare only
Min(querylen, entrylen) bytes of the two strings, and then deal with
any excess bytes in one string or the other by assuming the longer
string is greater if the prefixes are equal. Quite aside from the
fact that that's just wrong in some locales (e.g., 'ch' is not less
than 'd' in cs_CZ), it also risked passing incomplete multibyte
characters to strcoll(), with ensuing bad results.
Instead, just pass the full strings to varstr_cmp, and let it decide
what to do about unequal-length strings.
Fortunately, this error doesn't imply any index corruption, it's just
that searches might return the wrong set of entries.
Per report from Emre Hasegeli, though this is not his patch.
Thanks to Peter Geoghegan for review and discussion.
This code was born broken, so back-patch to all supported branches.
In HEAD, I failed to resist the temptation to do a bit of cosmetic
cleanup/pgindent'ing on 710d90da1, too.
Discussion: https://postgr.es/m/CAE2gYzzb6K51VnTq5i5p52z+j9p2duEa-K1T3RrC_GQEynAKEg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
| |
Commit 16828d5c forgot to check that it had a set of missing values
before trying to retrieve a value from it.
An additional query to add coverage for this code is added to the
regression test.
Per bug report from Andreas Seltenreich.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commits d204ef63776b8a00ca220adec23979091564e465,
83454e3c2b28141c0db01c7d2027e01040df5249 and a few more commits thereafter
(complete list at the end) related to MERGE feature.
While the feature was fully functional, with sufficient test coverage and
necessary documentation, it was felt that some parts of the executor and
parse-analyzer can use a different design and it wasn't possible to do that in
the available time. So it was decided to revert the patch for PG11 and retry
again in the future.
Thanks again to all reviewers and bug reporters.
List of commits reverted, in reverse chronological order:
f1464c5380 Improve parse representation for MERGE
ddb4158579 MERGE syntax diagram correction
530e69e59b Allow cpluspluscheck to pass by renaming variable
01b88b4df5 MERGE minor errata
3af7b2b0d4 MERGE fix variable warning in non-assert builds
a5d86181ec MERGE INSERT allows only one VALUES clause
4b2d44031f MERGE post-commit review
4923550c20 Tab completion for MERGE
aa3faa3c7a WITH support in MERGE
83454e3c2b New files for MERGE
d204ef6377 MERGE SQL Command following SQL:2016
Author: Pavan Deolasee
Reviewed-by: Michael Paquier
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The nextOid value is from the start of the checkpoint and may well be stale
compared to values from more recent XLOG_NEXTOID records. Previously, we
adopted it anyway, allowing the OID counter to go backwards during a crash.
While this should be harmless, it contributed to the severity of the bug
fixed in commit 0408e1ed5, by allowing duplicate TOAST OIDs to be assigned
immediately following a crash. Without this error, that issue would only
have arisen when TOAST objects just younger than a multiple of 2^32 OIDs
were deleted and then not vacuumed in time to avoid a conflict.
Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When selecting a new OID, we take care to avoid picking one that's already
in use in the target table, so as not to create duplicates after the OID
counter has wrapped around. However, up to now we used SnapshotDirty when
scanning for pre-existing entries. That ignores committed-dead rows, so
that we could select an OID matching a deleted-but-not-yet-vacuumed row.
While that mostly worked, it has two problems:
* If recently deleted, the dead row might still be visible to MVCC
snapshots, creating a risk for duplicate OIDs when examining the catalogs
within our own transaction. Such duplication couldn't be visible outside
the object-creating transaction, though, and we've heard few if any field
reports corresponding to such a symptom.
* When selecting a TOAST OID, deleted toast rows definitely *are* visible
to SnapshotToast, and will remain so until vacuumed away. This leads to
a conflict that will manifest in errors like "unexpected chunk number 0
(expected 1) for toast value nnnnn". We've been seeing reports of such
errors from the field for years, but the cause was unclear before.
The fix is simple: just use SnapshotAny to search for conflicting rows.
This results in a slightly longer window before object OIDs can be
recycled, but that seems unlikely to create any large problems.
Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This optimization was introduced in commit 2b272734. The changes include
some additional comments and documentation, and also these more
substantive changes:
. ensure the optimization is only applied on the leaf node of a tree
whose root is on level 2 or more. It's of little value on small trees.
. Delay calling RelationSetTargetBlock() until after the critical
section of _bt_insertonpg
. ensure the optimization is also applied to unlogged tables.
Pavan Deolasee and Peter Geoghegan with some very light editing from me.
Discussion: https://postgr.es/m/CABOikdO8jhRarNC60nZLktZYhxt+TK8z_V97+Ny499YQdyAfug@mail.gmail.com
|
|
|
|
|
|
| |
The comment earlier in the function correctly states "and the insertion
key is strictly greater than the first key in this page". That is what
we check here, not "greater than or equal".
|