| Commit message (Collapse) | Author | Age |
... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Recently-introduced code in reconstruct.c was using "unsigned"
to store the result of read(), pg_pread(), or write(). This is
completely bogus: it breaks subsequent tests for the result being
negative, as we're being reminded of by a chorus of buildfarm
warnings. Switch to "int" as was doubtless intended. (There are
several other uses of "unsigned" in this file that also look poorly
chosen to me, but for now I'm just trying to clean up the buildfarm.)
A larger problem is that "int" is not necessarily wide enough to hold
the result: per POSIX, all these functions return ssize_t. In places
where the requested read or write length clearly fits in int, that's
academic. It may be academic anyway as long as we constrain
individual data files to 1GB, since even a readv or writev-like
operation would then not be responsible for transferring more than
1GB. Nonetheless it seems like trouble waiting to happen, so I made
a pass over readv and writev calls and fixed the result variables
where that seemed appropriate. We might want to think about changing
some of the fd.c functions to return ssize_t too, for future-proofing;
but I didn't tackle that here.
Discussion: https://postgr.es/m/1672202.1703441340@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't think there's a real problem here, because if we reach
the loop over 'tles' then we will either find at least one
TimeLineHistoryEntry such that oldest_segno != 0, in which case
unsummarized_lsn will be initialized, or else unsummarized_tli
will remain 0 and an error will occur before unsummarized_lsn
is used for anything. But some compilers are complainining, as
reported on list by Nathan Bossart and off-list by Andrew Dunstan.
Discussion: http://postgr.es/m/20231223215147.GA69623@nathanxps13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
e0b1ee17dc introduced optimization for matching B-tree scan keys required for
the directional scan. However, it incorrectly assumed that all keys required
for opposite direction scan are satisfied by _bt_first(). It has been
illustrated that with multiple scan keys over the same column, a lesser one
(according to the scan direction) could win leaving the other one unsatisfied.
Instead of relying on _bt_first() this commit introduces code that memorizes
whether there was at least one match on the page. If that's true we know that
keys required for opposite-direction scan are satisfied as soon as
corresponding values are not NULLs.
Also, this commit simplifies the description for the optimization of keys
required for the current direction scan. Now the flag used for this is named
continuescanPrechecked and means exactly that *continuescan flag is known
to be true for the last item on the page.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ%2BKOK-WZ%2BQtTQ%40mail.gmail.com
Reviewed-by: Pavel Borisov
|
|
|
|
|
|
|
|
|
|
| |
It's not necessary to keep the firstPage flag as a field of BTScanOpaqueData.
This commit makes it an argument of the _bt_readpage() function. We can easily
distinguish first-time and repeated calls (within the scan) of this function.
Reported-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzk4SOsw%2BtHuTFiz8U9Jqj-R77rYPkhWKODCBb1mdHACXA%40mail.gmail.com
Reviewed-by: Pavel Borisov
|
|
|
|
|
|
|
|
|
| |
Follow up to dc2123400
Kyotaro Horiguchi
Discussion: https://postgr.es/m/20231222.154939.1509525390095583358.horikyota.ntt@gmail.com
Discussion: https://postgr.es/m/20231225.145124.1745560266993421173.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
|
|
|
| |
There are a lot of situations when we share the same pointer to a Bitmapset
structure across different places. In order to evade undesirable side effects
replace_relid() function should always return a copy.
Reported-by: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com
Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov
|
|
|
|
|
|
|
|
| |
This option forces each bitmapset modification to reallocate bitmapset. This
is useful for debugging hangling pointers to bitmapset's.
Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com
Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov
|
|
|
|
|
|
|
|
| |
New asserts validate that arguments are really bitmapsets. This should help
to early detect accesses to dangling pointers.
Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com
Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
set_config_option() bails out early if it detects that the option to
be set is PGC_BACKEND or PGC_SU_BACKEND class and we're reading the
config file in a postmaster child; we don't want to apply any new
value in such a case. That's fine as far as it goes, but it fails
to consider the requirements of the pg_file_settings view: for that,
we need to check validity of the value even though we have no
intention to apply it. Because we didn't, even very silly values
for affected GUCs would be reported as valid by the view. There
are only half a dozen such GUCs, which perhaps explains why this
got overlooked for so long.
Fix by continuing when changeVal is false; this parallels the logic
in some other early-exit paths.
Also, the check added by commit 924bcf4f1 to prevent GUC changes in
parallel workers seems a few bricks shy of a load: it's evidently
assuming that ereport(elevel, ...) won't return. Make sure we
bail out if it does. The lack of trouble reports suggests that
this is only a latent bug, i.e. parallel workers don't actually
reach here with elevel < ERROR. (Per the code coverage report,
we never reach here at all in the regression suite.) But we clearly
don't want to risk proceeding if that does happen.
Per report from Rıdvan Korkmaz. These are ancient bugs, so back-patch
to all supported branches.
Discussion: https://postgr.es/m/2089235.1703617353@sss.pgh.pa.us
|
|
|
|
| |
Usage was removed in 6c5576075b but the definition was not removed.
|
|
|
|
|
|
| |
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
|
|
|
|
|
| |
Self-join removal appears to be safe to apply with placeholder variables
as long as we handle PlaceHolderVar in replace_varno_walker() and replace
relid in phinfo->ph_lateral.
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
|
|
|
|
| |
This commit also retires sje_walker. This increases the generalty of replacing
varno in the parse tree and simplifies the code.
Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org
Author: Richard Guo
Reviewed-by: Andrei Lepikhov
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding
three new columns: restartpoints_timed, restartpoints_req, and
restartpoints_done. These additions aim to improve the visibility and
monitoring of restartpoint processes on replicas.
Previously, it was challenging to differentiate between successful and failed
restartpoint requests. This limitation arises because restartpoints on replicas
are dependent on checkpoint records from the primary, and cannot occur more
frequently than these checkpoints.
The new columns allow for clear distinction and tracking of restartpoint
requests, their triggers, and successful completions. This enhancement aids
database administrators and developers in better understanding and diagnosing
issues related to restartpoint behavior, particularly in scenarios where
restartpoint requests may fail.
System catalog is changed. Catversion is bumped.
Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru
Author: Anton A. Melnikov
Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a column is dropped, the fields attacl, attoptions, and
attfdwoptions were kept unchanged. This is probably harmless, but it
seems wasteful, and leaves potentially dangling data lying around (for
example, attacl could contain references to users that are later also
dropped).
Change this to set those fields to null when a column is marked as
dropped.
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://www.postgresql.org/message-id/flat/249d819d-1763-4580-8110-0bf91a0f08b7@eisentraut.org
|
|
|
|
|
|
| |
Per report from Alexander Lakhin.
Discussion: http://postgr.es/m/061cccf7-0cac-804f-4c2a-9d6da8e3848b@gmail.com
|
|
|
|
|
|
|
|
|
| |
Apparently, spell check would have been a really good idea.
Alexander Lakhin, with a few additions as per an off-list report
from Andres Freund.
Discussion: http://postgr.es/m/f08f7c60-1ad3-0b57-d580-54b11f07cddf@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is necessary when spgcanreturn() is invoked on a partitioned
index, and the failure might be reachable in other scenarios as
well. The rest of what spgGetCache() does is perfectly sensible
for a partitioned index, so we should allow it to go through.
I think the main takeaway from this is that we lack sufficient test
coverage for non-btree partitioned indexes. Therefore, I added
simple test cases for brin and gin as well as spgist (hash and
gist AMs were covered already in indexing.sql).
Per bug #18256 from Alexander Lakhin. Although the known test case
only fails since v16 (3c569049b), I've got no faith at all that there
aren't other ways to reach this problem; so back-patch to all
supported branches.
Discussion: https://postgr.es/m/18256-0b0e1b6e4a620f1b@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a bug during MERGE if a cross-partition update is attempted on a
partitioned table with a BEFORE DELETE ROW trigger that returns NULL,
to prevent the update. This would cause an error to be thrown, or an
assert failure in an assert-enabled build.
This was an oversight in 9321c79c86, which failed to properly
distinguish a DELETE prevented by a trigger from one prevented by a
concurrent update. Fix by having ExecDelete() return the TM_Result
status to ExecCrossPartitionUpdate(), so that it can distinguish the
two cases, and make ExecCrossPartitionUpdate() return the TM_Result
status to ExecUpdateAct(), so that it can return the correct status
from a concurrent update.
In addition, ensure that the command tag is correctly updated by
having ExecMergeMatched() pass canSetTag to ExecUpdateAct(), rather
than passing false, so that it updates the command tag if it does a
cross-partition update, making this code path in ExecMergeMatched()
consistent with ExecUpdate().
Per bug #18238 from Alexander Lakhin. Back-patch to v15, where MERGE
was introduced.
Dean Rasheed, reviewed by Richard Guo and Jian He.
Discussion: https://postgr.es/m/18238-2f2bdc7f720180b9%40postgresql.org
|
|
|
|
|
|
|
|
|
|
| |
The callers of this function assume that the first Oid in the list
returned by this function corresponds to the immediate parent and the
last on corresponds to the topmost parent. Make that explicit in the
function prologue.
Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAExHW5vCbATEmht861=G-BFPHNwLUqyeGa_=8-xibJ6Q1UxAeA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 6af179395 added isCatalogRel field to some WAL record types,
but this field was not shown in the rmgr descriptions. This commit
changes the several rmgr descriptions to display the isCatalogRel
field.
Author: Bertrand Drouvot
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/957dc8f9-2a02-4640-9c01-9dcbf97c4187%40gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To take an incremental backup, you use the new replication command
UPLOAD_MANIFEST to upload the manifest for the prior backup. This
prior backup could either be a full backup or another incremental
backup. You then use BASE_BACKUP with the INCREMENTAL option to take
the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST
option to trigger this behavior.
An incremental backup is like a regular full backup except that
some relation files are replaced with files with names like
INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains
additional lines identifying it as an incremental backup. The new
pg_combinebackup tool can be used to reconstruct a data directory
from a full backup and a series of incremental backups.
Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub
Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to
Jakub for incredibly helpful and extensive testing.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When active, this process writes WAL summary files to
$PGDATA/pg_wal/summaries. Each summary file contains information for a
certain range of LSNs on a certain TLI. For each relation, it stores a
"limit block" which is 0 if a relation is created or destroyed within
a certain range of WAL records, or otherwise the shortest length to
which the relation was truncated during that range of WAL records, or
otherwise InvalidBlockNumber. In addition, it stores a list of blocks
which have been modified during that range of WAL records, but
excluding blocks which were removed by truncation after they were
modified and never subsequently modified again.
In other words, it tells us which blocks need to copied in case of an
incremental backup covering that range of WAL records. But this
doesn't yet add the capability to actually perform an incremental
backup; the next patch will do that.
A new parameter summarize_wal enables or disables this new background
process. The background process also automatically deletes summary
files that are older than wal_summarize_keep_time, if that parameter
has a non-zero value and the summarizer is configured to run.
Patch by me, with some design help from Dilip Kumar and Andres Freund.
Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter
Eisentraut, and Álvaro Herrera.
Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
First, mark the xlblocks member with InvalidXLogRecPtr, then issue a
write barrier, then initialize it. That ensures that the xlblocks
member doesn't appear valid while the contents are being initialized.
In preparation for reading WAL buffer contents without a lock.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVfFMfqD5oLzZSQQZWfXiJqd-NdX0_317veP6FuB31QWA@mail.gmail.com
Reviewed-by: Andres Freund
|
|
|
|
|
|
|
|
|
|
| |
In preparation for reading the contents of WAL buffers without a
lock. Also, avoids the previously-needed comment in GetXLogBuffer()
explaining why it's safe from torn reads.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/CALj2ACVfFMfqD5oLzZSQQZWfXiJqd-NdX0_317veP6FuB31QWA@mail.gmail.com
Reviewed-by: Andres Freund
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit removes all the scripts located in src/tools/msvc/ to build
PostgreSQL with Visual Studio on Windows, meson becoming the recommended
way to achieve that. The scripts held some information that is still
relevant with meson, information kept and moved to better locations.
Comments that referred directly to the scripts are removed.
All the documentation still relevant that was in install-windows.sgml
has been moved to installation.sgml under a new subsection for Visual.
All the content specific to the scripts is removed. Some adjustments
for the documentation are planned in a follow-up set of changes.
Author: Michael Paquier
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://postgr.es/m/ZQzp_VMJcerM1Cs_@paquier.xyz
|
|
|
|
|
|
|
|
|
| |
The previous logic failed to work for anything other than the first
segment of a relation.
Report by Jakub Wartak. Patch by me.
Discussion: http://postgr.es/m/CAKZiRmwd3KTNMQhm9Bv4oR_1uMehXroO6kGyJQkiw9DfM8cMwQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's at least theoretically possible to overflow int32 when adding up
column width estimates to make a row width estimate. (The bug example
isn't terribly convincing as a real use-case, but perhaps wide joins
would provide a more plausible route to trouble.) This'd lead to
assertion failures or silly planner behavior. To forestall it, make
the relevant functions compute their running sums in int64 arithmetic
and then clamp to int32 range at the end. We can reasonably assume
that MaxAllocSize is a hard limit on actual tuple width, so clamping
to that is simply a correction for dubious input values, and there's
no need to go as far as widening width variables to int64 everywhere.
Per bug #18247 from RekGRpth. There've been no reports of this issue
arising in practical cases, so I feel no need to back-patch.
Richard Guo and Tom Lane
Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Remove MemoryContextAllocZeroAligned(). It was supposed to be a
faster version of MemoryContextAllocZero(), but modern compilers turn
the MemSetLoop() into a call to memset() anyway, making it more or
less identical to MemoryContextAllocZero(). That was the only user of
MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast().
- Convert newNode() to a static inline function. When this was
originally originally written, it was written as a macro because
testing showed that gcc didn't inline the size check as we
intended. Modern compiler versions do, and now that it just calls
palloc0() there is no size-check to inline anyway.
One nice effect is that the palloc0() takes one less argument than
MemoryContextAllocZeroAligned(), which saves a few instructions in the
callers of newNode().
Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro
Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, we give a misleading error if the user omits to pass the
required parameter 'proto_version' in SQL API
pg_logical_slot_get_changes() or during START_REPLICATION protocol
message. The error raised is as follows which indicates that the wrong
version is passed.
ERROR: client sent proto_version=0 but server only supports protocol 1 or higher
Author: Emre Hasegeli
Reviewed-by: Peter Smith, Amit Kapila
Discussion: https://postgr.es/m/CAE2gYzwdwtUbs-tPSV-QBwgTubiyGD2ZGsSnAVsDfAGGLDrGOA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The value was double in the original implementation of this logic.
Commit da08a6598 pulled it out into a subroutine, but carelessly
declared the parameter as int when it should have been double.
On most platforms, the only ill effect would be to clamp the value
to be not more than INT_MAX, which would seldom be exceeded and
probably wouldn't change the estimates too much anyway. Nonetheless,
it's wrong and can cause complaints from ubsan.
While here, improve the comments and parameter names.
This is an ABI change in a globally exposed subroutine, so
back-patching would create some risk of breaking extensions.
The value of the fix doesn't seem high enough to warrant taking
that risk, so fix in HEAD only.
Per report from Alexander Lakhin.
Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com
|
|
|
|
|
|
|
|
|
| |
Commit dc3f9bc549 mainly targeted the JSONTYPE_NUMERIC code path.
This commit applies similar optimizations (e.g., removing
unnecessary runtime calls to strlen() and palloc()) to nearby code.
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20231208203708.GA4126315%40nathanxps13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
smgrreadv() and smgrwritev() and their md.c implementations call
FileReadV() and FileWriteV(). A range of disk blocks beginning at
'blocknum' and extending for 'nblocks' can be scattered to or gathered
from multiple buffers with a single system call. The traditional
smgrread() and smgrwrite() functions are implemented in terms of the new
functions.
Later commits will introduce calls with nblocks > 1, but the following
behavioral changes can be seen already:
* After a short transfer we'll now retry until we eventually read 0
bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we
would infer the reason. Retrying is consistent with xlog.c's
treatment of large WAL writes, and arguably also xlog.c and fd.c's
treatment of EINTR. Arbitrary short returns for larger transfers have
been observed on several OSes, and might in theory also happen for
transient reasons with our own pg_p*v() fallback code.
* After unexpected EOF or -1, the error thrown now talks about
a range even for the single block case, eg "blocks 42..42".
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
Originally, this routine relied on track_io_timing to check if a time
interval for an I/O operation stored in pg_stat_io should be initialized
or not. However, the addition of WAL statistics to pg_stat_io requires
that the initialization happens when track_wal_io_timing is enabled,
which is dependent on the code path where the I/O operation happens.
Author: Nazir Bilal Yavuz
Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
During the development that led to commit 357889eb17bb, for a time we
had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely
removed later on, before commit. Complete the removal now.
Author: Zhang Mingli <avamingli@gmail.com>
Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a
single block at a time. Add an nblocks argument so that we can do the
same for a range of blocks. This usually produces a single system call,
but might need to loop if it crosses a segment boundary. Initially it
is only called with nblocks == 1, but proposed patches will make wider
calls.
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version)
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In v16 and up (since commit afbfc0298), large object ownership
checking has been broken because object_ownercheck() didn't take care
of the discrepancy between our object-address representation of large
objects (classId == LargeObjectRelationId) and the catalog where their
ownership info is actually stored (LargeObjectMetadataRelationId).
This resulted in failures such as "unrecognized class ID: 2613"
when trying to update blob properties as a non-superuser.
Poking around for related bugs, I found that AlterObjectOwner_internal
would pass the wrong classId to the PostAlterHook in the no-op code
path where the large object already has the desired owner. Also,
recordExtObjInitPriv checked for the wrong classId; that bug is only
latent because the stanza is dead code anyway, but as long as we're
carrying it around it should be less wrong. These bugs are quite old.
In HEAD, we can reduce the scope for future bugs of this ilk by
changing AlterObjectOwner_internal's API to let the translation happen
inside that function, rather than requiring callers to know about it.
A more bulletproof fix, perhaps, would be to start using
LargeObjectMetadataRelationId as the dependency and object-address
classId for blobs. However that has substantial risk of breaking
third-party code; even within our own code, it'd create hassles
for pg_dump which would have to cope with a version-dependent
representation. For now, keep the status quo.
Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Dead tuples are ignored and are not marked as dead during recovery, as
it can lead to MVCC issues on a standby because its xmin may not match
with the primary. This information is tracked by a field called
"xactStartedInRecovery" in the transaction state data, switched on when
starting a transaction in recovery.
Unfortunately, this information was not correctly tracked when starting
a subtransaction, because the transaction state used for the
subtransaction did not update "xactStartedInRecovery" based on the state
of its parent. This would cause index scans done in subtransactions to
return inconsistent data, depending on how the xmin of the primary
and/or the standby evolved.
This is broken since the introduction of hot standby in efc16ea52067, so
backpatch all the way down.
Author: Fei Changhong
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/tencent_C4D907A5093C071A029712E73B43C6512706@qq.com
Backpatch-through: 12
|
|
|
|
|
|
|
|
| |
Commit 98e675ed7af accidentally mistyped IDENTIFY_SYSTEM as
IDENTIFY_SERVER. Backpatch to all supported branches.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/68138521-5345-8780-4390-1474afdcba1f@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for
fd.c's virtual file descriptors. The simple FileRead() and FileWrite()
functions are now implemented in terms of the vectored functions, to
avoid code duplication, and they are converted back to the corresponding
simple system calls further down (commit 15c9ac36). Later work will
make more interesting multi-iovec calls.
The traditional behavior of reporting a "fake" ENOSPC error is
simplified. It's now always set for non-failing writes, for the benefit
of callers that expect to log a meaningful "%m" if they determine that
the write was short. (Perhaps we should consider getting rid of that
expectation one day.)
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenSSL will sometimes return SSL_ERROR_SYSCALL without having set
errno; this is apparently a reflection of recv(2)'s habit of not
setting errno when reporting EOF. Ensure that we treat such cases
the same as read EOF. Previously, we'd frequently report them like
"could not accept SSL connection: Success" which is confusing, or
worse report them with an unrelated errno left over from some
previous syscall.
To fix, ensure that errno is zeroed immediately before the call,
and report its value only when it's not zero afterwards; otherwise
report EOF.
For consistency, I've applied the same coding pattern in libpq's
pqsecure_raw_read(). Bare recv(2) shouldn't really return -1 without
setting errno, but in case it does we might as well cope.
Per report from Andres Freund. Back-patch to all supported versions.
Discussion: https://postgr.es/m/20231208181451.deqnflwxqoehhxpe@awork3.anarazel.de
|
|
|
|
|
|
|
|
|
| |
This removes the production json_encoding_clause_opt, instead merging
it into json_format_clause. Also remove the auxiliary
makeJsonEncoding() function.
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql
|
|
|
|
|
|
|
|
|
|
|
|
| |
This GUC was intended as a debugging help in the 9.0 area when hot
standby and streaming replication were being developped, able to offer
more information at LOG level rather than DEBUGn. There are more tools
available these days that are able to offer rather equivalent
information, like pg_waldump introduced in 9.3. It is not obvious how
this facility is useful these days, so let's remove it.
Author: Bharath Rupireddy
Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The apply worker needs to update the state of the subscription tables to
'READY' during the synchronization phase which requires locking the
corresponding subscription. The apply worker also waits for the
subscription tables to reach the 'SYNCDONE' state after holding the locks
on the subscription and the wait is done using WaitLatch. The 'SYNCDONE'
state is changed by tablesync workers again by locking the corresponding
subscription. Both the state updates use AccessShareLock mode to lock the
subscription, so they can't block each other. However, a backend can
simultaneously try to acquire a lock on the same subscription using
AccessExclusiveLock mode to alter the subscription. Now, the backend's
wait on a lock can sneak in between the apply worker and table sync worker
causing deadlock.
In other words, apply_worker waits for tablesync worker which waits for
backend, and backend waits for apply worker. This is not detected by the
deadlock detector because apply worker uses WaitLatch.
The fix is to release existing locks in apply worker before it starts to
wait for tablesync worker to change the state.
Reported-by: Tomas Vondra
Author: Shlok Kyal
Reviewed-by: Amit Kapila, Peter Smith
Backpatch-through: 12
Discussion: https://postgr.es/m/d291bb50-12c4-e8af-2af2-7bb9bb4d8e3e@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
| |
Remove comments that supposed that holding a pin was a useful interlock
for _bt_walk_left(). There are times when _bt_walk_left() doesn't hold
either a lock or a pin on any page, so clearly this can't be true.
_bt_walk_left() is even prepared to deal with concurrent deletion of
both the original page and any pages to its left.
Oversight in commit 2ed5b87f96.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does the following:
* In datum_to_json_internal(), the call to IsValidJsonNumber() is
replaced with simplified validation code. This avoids an extra
call to strlen() in this path, and it avoids validating the
entire string (which is okay since we know we're dealing with a
numeric data type's output).
* In datum_to_json_internal(), the call to escape_json() in the
JSONTYPE_NUMERIC path is replaced with code that just surrounds
the string with quotes. In passing, some other nearby calls to
appendStringInfo() have been replaced with similar code to avoid
unnecessary calls to vsnprintf().
* In composite_to_json(), the length of the separator is now
determined at compile time to avoid unnecessary calls to
strlen().
On my machine, this speeds up a benchmark for the proposed COPY TO
(FORMAT json) command with many integers by upwards of 20%. There
are likely other code paths that could be given a similar
treatment, but that is left as a future exercise.
Reviewed-by: Jeff Davis, Tom Lane, David Rowley, John Naylor
Discussion: https://postgr.es/m/20231207231251.GB3359478%40nathanxps13
|
|
|
|
|
|
|
|
|
|
| |
When setting GUCs from proconfig, performance is important, and hash
lookups in the GUC table are significant.
Per suggestion from Robert Haas.
Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com
Reviewed-by: John Naylor
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach _bt_binsrch (and related helper routines like _bt_search and
_bt_compare) about the initial positioning requirements of backward
scans. Routines like _bt_binsrch already know all about "nextkey"
searches, so it seems natural to teach them about "goback"/backward
searches, too. These concepts are closely related, and are much easier
to understand when discussed together.
Now that certain implementation details are hidden from _bt_first, it's
straightforward to add a new optimization: backward scans using the <
strategy now avoid extra leaf page accesses in certain "boundary cases".
Consider the following example, which uses the tenk1 table (and its
tenk1_hundred index) from the standard regression tests:
SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1;
Before this commit, nbtree would scan two leaf pages, even though it was
only really necessary to scan one leaf page. We'll now descend straight
to the leaf page containing a (12, -inf) high key instead. The scan
will locate matching non-pivot tuples with "hundred" values starting
from the value 11. The scan won't waste a page access on the right
sibling leaf page, which cannot possibly contain any matching tuples.
You can think of the optimization added by this commit as disabling an
optimization (the _bt_compare "!pivotsearch" behavior that was added to
Postgres 12 in commit dd299df8) for a small subset of cases where it was
always counterproductive.
Equivalently, you can think of the new optimization as extending the
"pivotsearch" behavior that page deletion by VACUUM has long required
(since the aforementioned Postgres 12 commit went in) to other, similar
cases. Obviously, this isn't strictly necessary for these new cases
(unlike VACUUM, _bt_first is prepared to move the scan to the left once
on the leaf level), but the underlying principle is the same.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow using multiple worker processes to build BRIN index, which until
now was supported only for BTREE indexes. For large tables this often
results in significant speedup when the build is CPU-bound.
The work is split in a simple way - each worker builds BRIN summaries on
a subset of the table, determined by the regular parallel scan used to
read the data, and feeds them into a shared tuplesort which sorts them
by blkno (start of the range). The leader then reads this sorted stream
of ranges, merges duplicates (which may happen if the parallel scan does
not align with BRIN pages_per_range), and adds the resulting ranges into
the index.
The number of duplicate results produced by workers (requiring merging
in the leader process) should be fairly small, thanks to how parallel
scans assign chunks to workers. The likelihood of duplicate results may
increase for higher pages_per_range values, but then there are fewer
page ranges in total. In any case, we expect the merging to be much
cheaper than summarization, so this should be a win.
Most of the parallelism infrastructure is a simplified copy of the code
used by BTREE indexes, omitting the parts irrelevant for BRIN indexes
(e.g. uniqueness checks).
This also introduces a new index AM flag amcanbuildparallel, determining
whether to attempt to start parallel workers for the index build.
Original patch by me, with reviews and substantial reworks by Matthias
van de Meent, certainly enough to make him a co-author.
Author: Tomas Vondra, Matthias van de Meent
Reviewed-by: Matthias van de Meent
Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
|