| Commit message | Author | Age |
| |
Etsuro Fujita
| |
It seems to make more sense to use "cutoff multixact" terminology
throughout the backend code; "freeze" is associated with replacing an
Xid with FrozenTransactionId, which is not what we do for MultiXactIds.
Andres Freund
Some adjustments by Álvaro Herrera
| |
Bernd Helmle
| |
This allows a 32-bit field to represent an *optional* command ID
without a separate flag bit.
Andres Freund
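A rough illustration of the technique (not the actual backend code; the type and helper names here are hypothetical): reserving one sentinel value lets a plain 32-bit field encode "no command ID" without spending a separate flag bit.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint32_t CommandId;

    /* Hypothetical sentinel: one value is reserved to mean "not set". */
    #define InvalidCommandId ((CommandId) 0xFFFFFFFF)

    static bool
    command_id_is_valid(CommandId cid)
    {
        return cid != InvalidCommandId;
    }

    int
    main(void)
    {
        CommandId cid = InvalidCommandId;   /* optional field starts "empty" */

        if (!command_id_is_valid(cid))
            printf("no command ID assigned yet\n");

        cid = 42;                           /* later, a real value is stored */
        if (command_id_is_valid(cid))
            printf("command ID = %u\n", (unsigned) cid);
        return 0;
    }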
| |
This reverts commit 269e780822abb2e44189afaccd6b0ee7aefa7ddd
and commit 5b571bb8c8d2bea610e01ae1ee7bc05adcfff528.
Unfortunately, the initial patch had insufficient performance testing,
and resulted in a regression.
Per report by Thom Brown.
| |
Performance testing shows that if the insertpos_lck spinlock and the fields
that it protects are on the same cache line as other variables that are
frequently accessed, the false sharing can hurt performance a lot. Keep
them apart by adding some padding.
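A minimal sketch of the padding idea, assuming a 64-byte cache line and a made-up field layout (this is not the actual xlog.c struct): the spinlock and the fields it protects get a cache line to themselves, so other hot fields cannot falsely share it.

    #include <stdint.h>

    #define CACHE_LINE_SIZE 64            /* assumed cache line size */

    typedef volatile int slock_t;         /* stand-in for the real spinlock type */

    typedef struct XLogInsertStateSketch
    {
        slock_t   insertpos_lck;          /* protects the two positions below */
        uint64_t  CurrBytePos;
        uint64_t  PrevBytePos;

        /* Pad to a full cache line so the frequently-read field below does
         * not share a line with the spinlock (avoiding false sharing). */
        char      padding[CACHE_LINE_SIZE - sizeof(slock_t) - 2 * sizeof(uint64_t)];

        uint64_t  some_frequently_read_value;   /* hypothetical hot neighbour */
    } XLogInsertStateSketch;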
| |
This keeps the usual trigger file name unchanged from 9.2, avoiding nasty
issues if you use a pre-9.3 pg_ctl binary with a 9.3 server or vice versa.
The fallback behavior of creating a full checkpoint before starting up is now
triggered by a file called "fallback_promote". That can be useful for
debugging purposes, but we don't expect any users to need to resort to it,
and we might want to remove it in the future, which is why the fallback
mechanism is undocumented.
| |
When upgrading from servers of versions 9.2 and older, and MultiXactIds
have been used in the old server beyond the first page (that is, 2048
multis or more in the default 8kB-page build), pg_upgrade would set the
next multixact offset to use beyond what has been allocated in the new
cluster. This would cause a failure the first time the new cluster
needs to use this value, because the pg_multixact/offsets/ file wouldn't
exist or wouldn't be large enough. To fix, ensure that the transient
server instances launched by pg_upgrade extend the file as necessary.
Per report from Jesse Denardo in
CANiVXAj4c88YqipsyFQPboqMudnjcNTdB3pqe8ReXqAFQ=HXyA@mail.gmail.com
| |
Author: Andrew Gierth, David Fetter
Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost
| |
As far as I can determine, there's no code in the core distribution
that fails to explicitly set the snapshot of a scan or executor
state. If there is any such code, this will probably cause it to
seg fault; friendlier suggestions were discussed on pgsql-hackers,
but there was no consensus that anything more than this was
needed.
This is another step towards the hoped-for complete removal of
SnapshotNow.
| |
Previously, these functions took a HeapTupleHeader, but upcoming
patches for logical replication will introduce a new snapshot
type under which the tuple's TID will be used to look up (CMIN, CMAX)
for visibility determination purposes. This makes that information
available. Code churn is minimal since HeapTupleSatisfiesVisibility
took the HeapTuple anyway, and dereferenced it before calling the
satisfies function.
Independently of logical replication, this allows t_tableOid and
t_self to be cross-checked via assertions in tqual.c. This seems
like a useful way to make sure that all callers are setting these
values properly, which has been previously put forward as
desirable.
Andres Freund, reviewed by Álvaro Herrera
| |
For simple views which are automatically updatable, this patch allows
the user to specify what level of checking should be done on records
being inserted or updated. For 'LOCAL CHECK', new tuples are validated
against the conditionals of the view they are being inserted into, while
for 'CASCADED CHECK' the new tuples are validated against the
conditionals for all views involved (from the top down).
This option is part of the SQL specification.
Dean Rasheed, reviewed by Pavel Stehule
| |
Also, in another comment, explain why holding an insertion slot is a
critical section.
Per review by Amit Kapila.
| |
Initialization of the first XLOG buffer at end-of-recovery was broken for
the case where the last read WAL record ended at a page boundary. Instead of
trying to copy the last full xlog page to the buffer cache in that case,
just set shared state so that the next page is initialized when the first
WAL record after startup is inserted. (That's what we did in earlier
versions, too.)
To make the shared state required for that case less surprising, replace the
XLogCtl->curridx variable, which was the index of the latest initialized
buffer, with an XLogRecPtr of how far the buffers have been initialized.
That also allows us to get rid of the XLogRecEndPtrToBufIdx macro.
While we're at it, make a similar change for XLogCtl->Write.curridx, getting
rid of that variable and calculating the next buffer to write from
XLogCtl->LogwrtResult instead.
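In outline, the reason an XLogRecPtr can replace a stored buffer index: with a fixed number of fixed-size page buffers, the buffer holding any WAL position is just a division and a modulo away, so it never needs to be remembered separately. A small sketch with assumed page/buffer counts and hypothetical names (not the real xlog.c code):

    #include <stdint.h>
    #include <stdio.h>

    #define XLOG_PAGE_SIZE    8192    /* assumed WAL page size */
    #define NUM_XLOG_BUFFERS    16    /* assumed number of in-memory page buffers */

    typedef uint64_t XLogRecPtr;      /* byte position in the WAL stream */

    /* Which in-memory buffer holds the page containing this position? */
    static int
    buffer_index_for(XLogRecPtr ptr)
    {
        return (int) ((ptr / XLOG_PAGE_SIZE) % NUM_XLOG_BUFFERS);
    }

    int
    main(void)
    {
        /* "Initialized up to here" replaces remembering the latest buffer
         * index; the index can always be recomputed from the position. */
        XLogRecPtr initialized_upto = 3 * XLOG_PAGE_SIZE + 123;

        printf("that position lives in buffer %d\n",
               buffer_index_for(initialized_upto));
        return 0;
    }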
| |
Since this function assumed non-MVCC snapshots, it broke when commit
568d4138c646cd7cd8a837ac244ef2caf27c6bb8 switched its one caller from
SnapshotNow scans to MVCC-snapshot scans.
Reviewed by Robert Haas, Tom Lane and Andres Freund.
| |
Was broken by my xloginsert scaling patch. The XLogCtl global variable needs
to be initialized in each process, as it's not inherited by fork() on
Windows.
| |
This patch replaces WALInsertLock with a number of WAL insertion slots,
allowing multiple backends to insert WAL records into the WAL buffers
concurrently. This is particularly useful for loading large amounts of data
in parallel on a system with many CPUs.
This has one user-visible change: switching to a new WAL segment with
pg_switch_xlog() now fills the remaining unused portion of the segment with
zeros. This potentially adds some overhead, but it has been a very common
practice among DBAs to clear the "tail" of the segment with an external
pg_clearxlogtail utility anyway, to make the WAL files compress better.
With this patch, it's no longer necessary to do that.
This patch adds a new GUC, xloginsert_slots, to tune the number of WAL
insertion slots. Performance testing suggests that the default, 8, works
pretty well for all kinds of workloads, but I left the GUC in place to allow
others with different hardware to test that easily. We might want to remove
that before release.
Reviewed by Andres Freund.
| |
On some platforms, posix_fallocate() is available but may still return
EINVAL if the underlying filesystem does not support it. So, in case
of an error, fall through to the alternate implementation that just
writes zeros.
Per buildfarm failure and analysis by Tom Lane.
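A hedged sketch of the fallback pattern (hypothetical helper, not the backend's implementation; assumes a platform that declares posix_fallocate()): try posix_fallocate() first, and on any error fall back to writing zeros explicitly.

    #define _XOPEN_SOURCE 600
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* Fill fd with 'size' bytes of zeros; 0 on success, -1 on failure. */
    static int
    zero_fill_file(int fd, off_t size)
    {
        /* Preferred path: let the filesystem allocate the space. */
        if (posix_fallocate(fd, 0, size) == 0)
            return 0;

        /* Fallback: some filesystems report EINVAL (or another error) even
         * though posix_fallocate() exists; write zeros the slow way. */
        char  zbuf[8192];
        off_t done = 0;

        memset(zbuf, 0, sizeof(zbuf));
        if (lseek(fd, 0, SEEK_SET) < 0)
            return -1;

        while (done < size)
        {
            size_t  chunk = sizeof(zbuf);
            ssize_t n;

            if ((off_t) chunk > size - done)
                chunk = (size_t) (size - done);

            n = write(fd, zbuf, chunk);
            if (n < 0)
            {
                if (errno == EINTR)
                    continue;           /* interrupted: retry the write */
                return -1;
            }
            done += n;
        }
        return 0;
    }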
| |
All instances of the verbiage lagging the code. Back-patch to 9.3,
where materialized views were introduced.
| |
This function is more efficient than actually writing out zeroes to
the new file, per microbenchmarks by Jon Nelson. Also, it may reduce
the likelihood of WAL file fragmentation.
Jon Nelson, with review by Andres Freund, Greg Smith and me.
| |
Michael Paquier
| |
In 9.3, there's no particular limit on the number of bgworkers;
instead, we just count up the number that are actually registered,
and use that to set MaxBackends. However, that approach causes
problems for Hot Standby, which needs both MaxBackends and the
size of the lock table to be the same on the standby as on the
master, yet it may not be desirable to run the same bgworkers in
both places. 9.3 handles that by failing to notice the problem,
which will probably work fine in nearly all cases anyway, but is
not theoretically sound.
A further problem with simply counting the number of registered
workers is that new workers can't be registered without a
postmaster restart. This is inconvenient for administrators,
since bouncing the postmaster causes an interruption of service.
Moreover, there are a number of applications for background
processes where, by necessity, the background process must be
started on the fly (e.g. parallel query). While this patch
doesn't actually make it possible to register new background
workers after startup time, it's a necessary prerequisite.
Patch by me. Review by Michael Paquier.
| |
Treat a TOAST index just the same as a normal one, and get the OID
of the TOAST index from pg_index rather than pg_class.reltoastidxid.
This change allows us to handle multiple TOAST indexes, and is
required infrastructure for the upcoming REINDEX CONCURRENTLY feature.
Patch by Michael Paquier, reviewed by Andres Freund and me.
| |
To that end, support tags rather than lengths for external datums.
As an example of how this can be used, add support for "indirect"
tuples which point to some externally allocated memory containing
a toast tuple. Similar infrastructure could be used for other
purposes, including, perhaps, support for alternative compression
algorithms.
Andres Freund, reviewed by Hitoshi Harada and myself
| |
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue that has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
| |
We don't normally bother retrying when the number of bytes written by
write() is short of what was requested. It is generally assumed that a
write() to disk doesn't return short, unless you run out of disk space.
While writing the WAL, however, it seems prudent to try a bit harder,
because a failure leads to PANIC. The write() is also much larger than most
write()s in the backend (up to wal_buffers), so there's more room for
surprises.
Also retry on EINTR. All signals used in the backend are flagged SA_RESTART
nowadays, so it shouldn't happen, but better to be defensive.
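The "try harder" loop described above, in miniature (an illustrative helper, not xlog.c verbatim): keep calling write() until the whole buffer has been accepted, retrying both short writes and EINTR.

    #include <errno.h>
    #include <unistd.h>

    /* Write exactly 'count' bytes, retrying short writes and EINTR.
     * Returns 0 on success, -1 on a real error (errno is set). */
    static int
    write_all(int fd, const char *buf, size_t count)
    {
        while (count > 0)
        {
            ssize_t written = write(fd, buf, count);

            if (written < 0)
            {
                if (errno == EINTR)
                    continue;       /* interrupted by a signal: just retry */
                return -1;          /* real error: let the caller PANIC/report */
            }
            /* A short write is not an error per POSIX; write the rest. */
            buf   += written;
            count -= (size_t) written;
        }
        return 0;
    }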
| |
The ginCompareItemPointers function is called heavily in GIN index scans;
inlining it speeds up some kinds of queries a lot.
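The shape of the change, sketched with stand-in types rather than PostgreSQL's ItemPointerData: marking the hot comparator static inline lets the compiler avoid a function call for every comparison inside tight scan loops.

    #include <stdint.h>

    /* Stand-ins for BlockNumber/OffsetNumber/ItemPointerData. */
    typedef uint32_t BlockNumber;
    typedef uint16_t OffsetNumber;

    typedef struct ItemPointerSketch
    {
        BlockNumber  blkno;     /* heap block */
        OffsetNumber offset;    /* line pointer within the block */
    } ItemPointerSketch;

    /* Returns -1/0/1; inlined so scan loops pay no call overhead. */
    static inline int
    compare_item_pointers(const ItemPointerSketch *a, const ItemPointerSketch *b)
    {
        if (a->blkno != b->blkno)
            return (a->blkno < b->blkno) ? -1 : 1;
        if (a->offset != b->offset)
            return (a->offset < b->offset) ? -1 : 1;
        return 0;
    }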
| |
Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its
Memcheck tool about the PostgreSQL allocator. This makes Valgrind
roughly as sensitive to memory errors involving palloc chunks as it is
to memory errors involving malloc chunks. Further client requests in
PageAddItem() and printtup() verify that all bits being added to a
buffer page or furnished to an output function are predictably-defined.
Those tests catch failures of C-language functions to fully initialize
the bits of a Datum, which in turn stymie optimizations that rely on
_equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to
enable these additions. An included "suppression file" silences nominal
errors we don't plan to fix.
Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
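A small sketch of what a Memcheck "client request" looks like, applied to a toy allocator that recycles chunks from its own free list (this is not the aset.c code; it only assumes the Valgrind development headers are installed):

    #include <stdlib.h>
    #include <valgrind/memcheck.h>

    #define CHUNK_SIZE 64

    static char *recycled_chunk = NULL;     /* one-slot toy free list */

    static void *
    toy_alloc(void)
    {
        char *p;

        if (recycled_chunk != NULL)
        {
            p = recycled_chunk;
            recycled_chunk = NULL;
            /* The recycled memory is addressable again, but its contents are
             * garbage: tell Memcheck so that reads of it get reported. */
            VALGRIND_MAKE_MEM_UNDEFINED(p, CHUNK_SIZE);
        }
        else
            p = malloc(CHUNK_SIZE);
        return p;
    }

    static void
    toy_free(void *p)
    {
        recycled_chunk = p;
        /* The chunk never goes back to libc, so plain malloc/free checking
         * would miss use-after-free; mark it NOACCESS so Memcheck catches it. */
        VALGRIND_MAKE_MEM_NOACCESS(p, CHUNK_SIZE);
    }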
| |
Every other core buffer page consumer initializes the bytes it furnishes
to PageAddItem(). For consistency, do the same here. No back-patch;
regardless, we couldn't count on the fix so long as binary upgrade can
carry forward affected index builds.
| |
Clang 3.3 correctly complains that a variable of type enum
MultiXactStatus cannot hold a value of -1, which makes sense. Change
the declared type of the variable to int instead, and apply casting as
necessary to avoid the warning.
Per notice from Andres Freund
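The workaround in miniature, with a hypothetical enum (same pattern as the fix): keep the variable as a plain int so it can hold the out-of-range "none" marker, and cast where an actual enum value is required.

    #include <stdio.h>

    typedef enum SampleStatus
    {
        STATUS_A,
        STATUS_B,
        STATUS_C
    } SampleStatus;

    int
    main(void)
    {
        /* The enum type may be unsigned, so it can't reliably hold -1 ... */
        int status = -1;                /* -1 means "no status yet" */

        if (status == -1)
            status = (int) STATUS_B;    /* store real values through the int */

        /* ... and cast back to the enum type where one is expected. */
        switch ((SampleStatus) status)
        {
            case STATUS_A:
            case STATUS_B:
            case STATUS_C:
                printf("status = %d\n", status);
                break;
        }
        return 0;
    }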
| |
In some cases with higher numbers of subtransactions,
it was possible for us to incorrectly initialize
subtrans, leading to complaints of missing pages.
Bug report by Sergey Konoplev
Analysis and fix by Andres Freund
| |
Most of the documentation uses "single-user mode", so use that in the
code as well. Adjust the documentation to match the new error message
wording. Also add a documentation index entry for "single-user mode".
Based-on-patch-by: Jeff Janes <jeff.janes@gmail.com>
| |
MarkBufferDirtyHint() writes WAL, and should know if it's got a
standard buffer or not. Currently, the only callers where buffer_std
is false are related to the FSM.
In passing, rename XLOG_HINT to XLOG_FPI, which is more descriptive.
Back-patch to 9.3.
| |
SP-GiST's original scheme for avoiding deadlocks during concurrent index
insertions doesn't work, as per report from Hailong Li, and there isn't any
evident way to make it work completely. We could possibly lock individual
inner tuples instead of their whole pages, but preliminary experimentation
suggests that the performance penalty would be huge. Instead, if we fail
to get a buffer lock while descending the tree, just restart the tree
descent altogether. We keep the old tuple positioning rules, though, in
hopes of reducing the number of cases where this can happen.
Teodor Sigaev, somewhat edited by Tom Lane
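In outline, the new strategy is "give up and retry" rather than a deadlock-avoidance protocol. A generic lock-coupling sketch with POSIX trylocks (purely illustrative; the real code works on buffer locks, not mutexes, and non-leaf nodes are assumed to always have a child here):

    #include <pthread.h>
    #include <stdbool.h>

    /* Toy tree node protected by its own lock. */
    typedef struct Node
    {
        pthread_mutex_t lock;
        struct Node    *child;      /* a single child keeps the sketch short */
        bool            is_leaf;
    } Node;

    /* Descend with lock coupling: hold the parent while trying the child.
     * If the child's lock can't be taken immediately, release everything
     * and restart the whole descent instead of risking a deadlock. */
    static Node *
    descend_to_leaf(Node *root)
    {
        for (;;)
        {
            Node *cur = root;
            bool  restart = false;

            pthread_mutex_lock(&cur->lock);
            while (!cur->is_leaf)
            {
                Node *next = cur->child;

                if (pthread_mutex_trylock(&next->lock) != 0)
                {
                    /* Contention: back out and retry from the top. */
                    pthread_mutex_unlock(&cur->lock);
                    restart = true;
                    break;
                }
                pthread_mutex_unlock(&cur->lock);   /* keep only the child */
                cur = next;
            }
            if (!restart)
                return cur;         /* leaf reached; its lock is still held */
        }
    }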
| |
elog.c has historically treated LOG messages as low-priority during
bootstrap and standalone operation. This has led to confusion and even
masked a bug, because the normal expectation of code authors is that
elog(LOG) will put something into the postmaster log, and that wasn't
happening during initdb. So get rid of the special-case rule and make
the priority order the same as it is in normal operation. To keep from
cluttering initdb's output and the behavior of a standalone backend,
tweak the severity level of three messages routinely issued by xlog.c
during startup and shutdown so that they won't appear in these cases.
Per my proposal back in December.
| |
Since commit f21bb9cfb5646e1793dcc9c0ea697bab99afa523, this function
ignores the caller-provided length and loops until it finds a
terminator, which GetVirtualXIDsDelayingChkpt() never adds. Restore the
previous loop control logic. In passing, revert the addition of an
unused variable by the same commit, presumably a debugging relic.
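The difference in loop control, in miniature (hypothetical helpers with stand-in types, not the actual VirtualTransactionId code): the fixed version trusts the caller-supplied element count instead of scanning for a terminator that was never written.

    #include <stdbool.h>
    #include <stddef.h>

    typedef int VirtualXid;                 /* stand-in element type */

    #define INVALID_VXID 0                  /* sentinel the broken loop expected */

    /* Broken pattern: relies on a terminator the producer never appends,
     * so the loop can run off the end of the array. */
    static bool
    contains_sentinel_scan(const VirtualXid *arr, VirtualXid target)
    {
        for (size_t i = 0; arr[i] != INVALID_VXID; i++)    /* unbounded! */
            if (arr[i] == target)
                return true;
        return false;
    }

    /* Fixed pattern: honor the length the caller supplies. */
    static bool
    contains_counted_scan(const VirtualXid *arr, size_t n, VirtualXid target)
    {
        for (size_t i = 0; i < n; i++)
            if (arr[i] == target)
                return true;
        return false;
    }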
| |
A few more minor spelling corrections, no functional changes.
Thom Brown
| |
Seems cleaner to get the currently-replayed TLI in the same call to
GetXLogReplayRecPtr in which we get the WAL position. Make it clearer in the
comment what the code does when recovery has already ended
(RecoveryInProgress() will set ThisTimeLineID in that case). Finally, make
resetting ThisTimeLineID afterwards more explicit.
| |
Fix a few spelling mistakes.
Per bug report #8193 from Lajos Veres.
| |
Make slightly better decisions about indentation than what pgindent
is capable of. Mostly this breaks out long function calls into one
line per argument, with a few other minor adjustments.
No functional changes; all whitespace.
pgindent ran cleanly (didn't change anything) afterwards.
Passes all regression tests.
| |
This is the first run of the Perl-based pgindent script. Also update
pgindent instructions.
| |
Not necessary for correctness, just to make
log_checkpoints output look less singular.
Requested by Fujii Masao
| |
The checkpointer needs to reset ThisTimeLineID after
a restartpoint to allow installing/recycling new
WAL files. If recovery has already ended, this
would leave ThisTimeLineID set incorrectly, and
so we must reset it; otherwise later checkpoints
do not have the correct timeline.
Bug report by Heikki Linnakangas.
Further investigation by Heikki and myself.
| |
This simplifies the handling of crashes after fast promotion and various
minor cases that can exist in short timing windows around that case.
Broad fix to bug reported by Michael Paquier on -hackers,
approach prompted by Heikki Linnakangas
| |
Michael Paquier
| |
If OID wraparound should occur while in standalone mode (unlikely but
possible), we want to advance the counter to FirstNormalObjectId not
FirstBootstrapObjectId. Otherwise, user objects might be created with OIDs
in the system-reserved range. That isn't immediately harmful but it poses
a risk of conflicts during future pg_upgrade operations.
Noted by Andres Freund. Back-patch to all supported branches, since all of
them are supported sources for pg_upgrade operations.
|