aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Show isCatalogRel in several rmgr descriptions.Masahiko Sawada2023-12-21
| | | | | | | | | | | Commit 6af179395 added isCatalogRel field to some WAL record types, but this field was not shown in the rmgr descriptions. This commit changes the several rmgr descriptions to display the isCatalogRel field. Author: Bertrand Drouvot Reviewed-by: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/957dc8f9-2a02-4640-9c01-9dcbf97c4187%40gmail.com
* Add support for incremental backup.Robert Haas2023-12-20
| | | | | | | | | | | | | | | | | | | | | | To take an incremental backup, you use the new replication command UPLOAD_MANIFEST to upload the manifest for the prior backup. This prior backup could either be a full backup or another incremental backup. You then use BASE_BACKUP with the INCREMENTAL option to take the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST option to trigger this behavior. An incremental backup is like a regular full backup except that some relation files are replaced with files with names like INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains additional lines identifying it as an incremental backup. The new pg_combinebackup tool can be used to reconstruct a data directory from a full backup and a series of incremental backups. Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to Jakub for incredibly helpful and extensive testing. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
* Add a new WAL summarizer process.Robert Haas2023-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When active, this process writes WAL summary files to $PGDATA/pg_wal/summaries. Each summary file contains information for a certain range of LSNs on a certain TLI. For each relation, it stores a "limit block" which is 0 if a relation is created or destroyed within a certain range of WAL records, or otherwise the shortest length to which the relation was truncated during that range of WAL records, or otherwise InvalidBlockNumber. In addition, it stores a list of blocks which have been modified during that range of WAL records, but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. In other words, it tells us which blocks need to copied in case of an incremental backup covering that range of WAL records. But this doesn't yet add the capability to actually perform an incremental backup; the next patch will do that. A new parameter summarize_wal enables or disables this new background process. The background process also automatically deletes summary files that are older than wal_summarize_keep_time, if that parameter has a non-zero value and the summarizer is configured to run. Patch by me, with some design help from Dilip Kumar and Andres Freund. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com
* Additional write barrier in AdvanceXLInsertBuffer().Jeff Davis2023-12-19
| | | | | | | | | | | | First, mark the xlblocks member with InvalidXLogRecPtr, then issue a write barrier, then initialize it. That ensures that the xlblocks member doesn't appear valid while the contents are being initialized. In preparation for reading WAL buffer contents without a lock. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACVfFMfqD5oLzZSQQZWfXiJqd-NdX0_317veP6FuB31QWA@mail.gmail.com Reviewed-by: Andres Freund
* Use 64-bit atomics for xlblocks array elements.Jeff Davis2023-12-19
| | | | | | | | | | In preparation for reading the contents of WAL buffers without a lock. Also, avoids the previously-needed comment in GetXLogBuffer() explaining why it's safe from torn reads. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACVfFMfqD5oLzZSQQZWfXiJqd-NdX0_317veP6FuB31QWA@mail.gmail.com Reviewed-by: Andres Freund
* Remove MSVC scriptsMichael Paquier2023-12-20
| | | | | | | | | | | | | | | | | This commit removes all the scripts located in src/tools/msvc/ to build PostgreSQL with Visual Studio on Windows, meson becoming the recommended way to achieve that. The scripts held some information that is still relevant with meson, information kept and moved to better locations. Comments that referred directly to the scripts are removed. All the documentation still relevant that was in install-windows.sgml has been moved to installation.sgml under a new subsection for Visual. All the content specific to the scripts is removed. Some adjustments for the documentation are planned in a follow-up set of changes. Author: Michael Paquier Reviewed-by: Peter Eisentraut, Andres Freund Discussion: https://postgr.es/m/ZQzp_VMJcerM1Cs_@paquier.xyz
* Fix brown paper bag bug in 5c47c6546c413d5eb51c1626070a807026e6139d.Robert Haas2023-12-19
| | | | | | | | | The previous logic failed to work for anything other than the first segment of a relation. Report by Jakub Wartak. Patch by me. Discussion: http://postgr.es/m/CAKZiRmwd3KTNMQhm9Bv4oR_1uMehXroO6kGyJQkiw9DfM8cMwQ@mail.gmail.com
* Prevent integer overflow when forming tuple width estimates.Tom Lane2023-12-19
| | | | | | | | | | | | | | | | | | | | It's at least theoretically possible to overflow int32 when adding up column width estimates to make a row width estimate. (The bug example isn't terribly convincing as a real use-case, but perhaps wide joins would provide a more plausible route to trouble.) This'd lead to assertion failures or silly planner behavior. To forestall it, make the relevant functions compute their running sums in int64 arithmetic and then clamp to int32 range at the end. We can reasonably assume that MaxAllocSize is a hard limit on actual tuple width, so clamping to that is simply a correction for dubious input values, and there's no need to go as far as widening width variables to int64 everywhere. Per bug #18247 from RekGRpth. There've been no reports of this issue arising in practical cases, so I feel no need to back-patch. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org
* Simplify newNode() by removing special casesHeikki Linnakangas2023-12-19
| | | | | | | | | | | | | | | | | | | | | - Remove MemoryContextAllocZeroAligned(). It was supposed to be a faster version of MemoryContextAllocZero(), but modern compilers turn the MemSetLoop() into a call to memset() anyway, making it more or less identical to MemoryContextAllocZero(). That was the only user of MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast(). - Convert newNode() to a static inline function. When this was originally originally written, it was written as a macro because testing showed that gcc didn't inline the size check as we intended. Modern compiler versions do, and now that it just calls palloc0() there is no size-check to inline anyway. One nice effect is that the palloc0() takes one less argument than MemoryContextAllocZeroAligned(), which saves a few instructions in the callers of newNode(). Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
* pgoutput: Raise an error for missing protocol version parameter.Amit Kapila2023-12-19
| | | | | | | | | | | | | Currently, we give a misleading error if the user omits to pass the required parameter 'proto_version' in SQL API pg_logical_slot_get_changes() or during START_REPLICATION protocol message. The error raised is as follows which indicates that the wrong version is passed. ERROR: client sent proto_version=0 but server only supports protocol 1 or higher Author: Emre Hasegeli Reviewed-by: Peter Smith, Amit Kapila Discussion: https://postgr.es/m/CAE2gYzwdwtUbs-tPSV-QBwgTubiyGD2ZGsSnAVsDfAGGLDrGOA@mail.gmail.com
* compute_bitmap_pages' loop_count parameter should be double not int.Tom Lane2023-12-18
| | | | | | | | | | | | | | | | | | | | | The value was double in the original implementation of this logic. Commit da08a6598 pulled it out into a subroutine, but carelessly declared the parameter as int when it should have been double. On most platforms, the only ill effect would be to clamp the value to be not more than INT_MAX, which would seldom be exceeded and probably wouldn't change the estimates too much anyway. Nonetheless, it's wrong and can cause complaints from ubsan. While here, improve the comments and parameter names. This is an ABI change in a globally exposed subroutine, so back-patching would create some risk of breaking extensions. The value of the fix doesn't seem high enough to warrant taking that risk, so fix in HEAD only. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com
* Micro-optimize datum_to_json_internal() some more.Nathan Bossart2023-12-18
| | | | | | | | | Commit dc3f9bc549 mainly targeted the JSONTYPE_NUMERIC code path. This commit applies similar optimizations (e.g., removing unnecessary runtime calls to strlen() and palloc()) to nearby code. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/20231208203708.GA4126315%40nathanxps13
* Provide vectored variants of smgrread() and smgrwrite().Thomas Munro2023-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | smgrreadv() and smgrwritev() and their md.c implementations call FileReadV() and FileWriteV(). A range of disk blocks beginning at 'blocknum' and extending for 'nblocks' can be scattered to or gathered from multiple buffers with a single system call. The traditional smgrread() and smgrwrite() functions are implemented in terms of the new functions. Later commits will introduce calls with nblocks > 1, but the following behavioral changes can be seen already: * After a short transfer we'll now retry until we eventually read 0 bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we would infer the reason. Retrying is consistent with xlog.c's treatment of large WAL writes, and arguably also xlog.c and fd.c's treatment of EINTR. Arbitrary short returns for larger transfers have been observed on several OSes, and might in theory also happen for transient reasons with our own pg_p*v() fallback code. * After unexpected EOF or -1, the error thrown now talks about a range even for the single block case, eg "blocks 42..42". Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
* Refactor pgstat_prepare_io_time() with an input argument instead of a GUCMichael Paquier2023-12-16
| | | | | | | | | | | Originally, this routine relied on track_io_timing to check if a time interval for an I/O operation stored in pg_stat_io should be initialized or not. However, the addition of WAL statistics to pg_stat_io requires that the initialization happens when track_wal_io_timing is enabled, which is dependent on the code path where the I/O operation happens. Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
* Remove useless LIMIT_OPTION_DEFAULT value from LimitOptionAlvaro Herrera2023-12-16
| | | | | | | | | During the development that led to commit 357889eb17bb, for a time we had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely removed later on, before commit. Complete the removal now. Author: Zhang Mingli <avamingli@gmail.com> Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark
* Provide multi-block smgrprefetch().Thomas Munro2023-12-16
| | | | | | | | | | | | Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a single block at a time. Add an nblocks argument so that we can do the same for a range of blocks. This usually produces a single system call, but might need to loop if it crosses a segment boundary. Initially it is only called with nblocks == 1, but proposed patches will make wider calls. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version) Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
* Fix bugs in manipulation of large objects.Tom Lane2023-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In v16 and up (since commit afbfc0298), large object ownership checking has been broken because object_ownercheck() didn't take care of the discrepancy between our object-address representation of large objects (classId == LargeObjectRelationId) and the catalog where their ownership info is actually stored (LargeObjectMetadataRelationId). This resulted in failures such as "unrecognized class ID: 2613" when trying to update blob properties as a non-superuser. Poking around for related bugs, I found that AlterObjectOwner_internal would pass the wrong classId to the PostAlterHook in the no-op code path where the large object already has the desired owner. Also, recordExtObjInitPriv checked for the wrong classId; that bug is only latent because the stanza is dead code anyway, but as long as we're carrying it around it should be less wrong. These bugs are quite old. In HEAD, we can reduce the scope for future bugs of this ilk by changing AlterObjectOwner_internal's API to let the translation happen inside that function, rather than requiring callers to know about it. A more bulletproof fix, perhaps, would be to start using LargeObjectMetadataRelationId as the dependency and object-address classId for blobs. However that has substantial risk of breaking third-party code; even within our own code, it'd create hassles for pg_dump which would have to cope with a version-dependent representation. For now, keep the status quo. Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us
* Prevent tuples to be marked as dead in subtransactions on standbysMichael Paquier2023-12-12
| | | | | | | | | | | | | | | | | | | | | | | Dead tuples are ignored and are not marked as dead during recovery, as it can lead to MVCC issues on a standby because its xmin may not match with the primary. This information is tracked by a field called "xactStartedInRecovery" in the transaction state data, switched on when starting a transaction in recovery. Unfortunately, this information was not correctly tracked when starting a subtransaction, because the transaction state used for the subtransaction did not update "xactStartedInRecovery" based on the state of its parent. This would cause index scans done in subtransactions to return inconsistent data, depending on how the xmin of the primary and/or the standby evolved. This is broken since the introduction of hot standby in efc16ea52067, so backpatch all the way down. Author: Fei Changhong Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/tencent_C4D907A5093C071A029712E73B43C6512706@qq.com Backpatch-through: 12
* Fix typo in commentDaniel Gustafsson2023-12-12
| | | | | | | | Commit 98e675ed7af accidentally mistyped IDENTIFY_SYSTEM as IDENTIFY_SERVER. Backpatch to all supported branches. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/68138521-5345-8780-4390-1474afdcba1f@gmail.com
* Provide vectored variants of FileRead() and FileWrite().Thomas Munro2023-12-12
| | | | | | | | | | | | | | | | | | FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for fd.c's virtual file descriptors. The simple FileRead() and FileWrite() functions are now implemented in terms of the vectored functions, to avoid code duplication, and they are converted back to the corresponding simple system calls further down (commit 15c9ac36). Later work will make more interesting multi-iovec calls. The traditional behavior of reporting a "fake" ENOSPC error is simplified. It's now always set for non-failing writes, for the benefit of callers that expect to log a meaningful "%m" if they determine that the write was short. (Perhaps we should consider getting rid of that expectation one day.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com
* Be more wary about OpenSSL not setting errno on error.Tom Lane2023-12-11
| | | | | | | | | | | | | | | | | | | | | | OpenSSL will sometimes return SSL_ERROR_SYSCALL without having set errno; this is apparently a reflection of recv(2)'s habit of not setting errno when reporting EOF. Ensure that we treat such cases the same as read EOF. Previously, we'd frequently report them like "could not accept SSL connection: Success" which is confusing, or worse report them with an unrelated errno left over from some previous syscall. To fix, ensure that errno is zeroed immediately before the call, and report its value only when it's not zero afterwards; otherwise report EOF. For consistency, I've applied the same coding pattern in libpq's pqsecure_raw_read(). Bare recv(2) shouldn't really return -1 without setting errno, but in case it does we might as well cope. Per report from Andres Freund. Back-patch to all supported versions. Discussion: https://postgr.es/m/20231208181451.deqnflwxqoehhxpe@awork3.anarazel.de
* Simplify productions for FORMAT JSON [ ENCODING name ]Alvaro Herrera2023-12-11
| | | | | | | | | This removes the production json_encoding_clause_opt, instead merging it into json_format_clause. Also remove the auxiliary makeJsonEncoding() function. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql
* Remove trace_recovery_messagesMichael Paquier2023-12-11
| | | | | | | | | | | | This GUC was intended as a debugging help in the 9.0 area when hot standby and streaming replication were being developped, able to offer more information at LOG level rather than DEBUGn. There are more tools available these days that are able to offer rather equivalent information, like pg_waldump introduced in 9.3. It is not obvious how this facility is useful these days, so let's remove it. Author: Bharath Rupireddy Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz
* Fix an undetected deadlock due to apply worker.Amit Kapila2023-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The apply worker needs to update the state of the subscription tables to 'READY' during the synchronization phase which requires locking the corresponding subscription. The apply worker also waits for the subscription tables to reach the 'SYNCDONE' state after holding the locks on the subscription and the wait is done using WaitLatch. The 'SYNCDONE' state is changed by tablesync workers again by locking the corresponding subscription. Both the state updates use AccessShareLock mode to lock the subscription, so they can't block each other. However, a backend can simultaneously try to acquire a lock on the same subscription using AccessExclusiveLock mode to alter the subscription. Now, the backend's wait on a lock can sneak in between the apply worker and table sync worker causing deadlock. In other words, apply_worker waits for tablesync worker which waits for backend, and backend waits for apply worker. This is not detected by the deadlock detector because apply worker uses WaitLatch. The fix is to release existing locks in apply worker before it starts to wait for tablesync worker to change the state. Reported-by: Tomas Vondra Author: Shlok Kyal Reviewed-by: Amit Kapila, Peter Smith Backpatch-through: 12 Discussion: https://postgr.es/m/d291bb50-12c4-e8af-2af2-7bb9bb4d8e3e@enterprisedb.com
* Fix nbtree backward scan race condition comments.Peter Geoghegan2023-12-08
| | | | | | | | | | Remove comments that supposed that holding a pin was a useful interlock for _bt_walk_left(). There are times when _bt_walk_left() doesn't hold either a lock or a pin on any page, so clearly this can't be true. _bt_walk_left() is even prepared to deal with concurrent deletion of both the original page and any pages to its left. Oversight in commit 2ed5b87f96.
* Micro-optimize JSONTYPE_NUMERIC code path in json.c.Nathan Bossart2023-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit does the following: * In datum_to_json_internal(), the call to IsValidJsonNumber() is replaced with simplified validation code. This avoids an extra call to strlen() in this path, and it avoids validating the entire string (which is okay since we know we're dealing with a numeric data type's output). * In datum_to_json_internal(), the call to escape_json() in the JSONTYPE_NUMERIC path is replaced with code that just surrounds the string with quotes. In passing, some other nearby calls to appendStringInfo() have been replaced with similar code to avoid unnecessary calls to vsnprintf(). * In composite_to_json(), the length of the separator is now determined at compile time to avoid unnecessary calls to strlen(). On my machine, this speeds up a benchmark for the proposed COPY TO (FORMAT json) command with many integers by upwards of 20%. There are likely other code paths that could be given a similar treatment, but that is left as a future exercise. Reviewed-by: Jeff Davis, Tom Lane, David Rowley, John Naylor Discussion: https://postgr.es/m/20231207231251.GB3359478%40nathanxps13
* Cache opaque handle for GUC option to avoid repeasted lookups.Jeff Davis2023-12-08
| | | | | | | | | | When setting GUCs from proconfig, performance is important, and hash lookups in the GUC table are significant. Per suggestion from Robert Haas. Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com Reviewed-by: John Naylor
* Optimize nbtree backward scan boundary cases.Peter Geoghegan2023-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach _bt_binsrch (and related helper routines like _bt_search and _bt_compare) about the initial positioning requirements of backward scans. Routines like _bt_binsrch already know all about "nextkey" searches, so it seems natural to teach them about "goback"/backward searches, too. These concepts are closely related, and are much easier to understand when discussed together. Now that certain implementation details are hidden from _bt_first, it's straightforward to add a new optimization: backward scans using the < strategy now avoid extra leaf page accesses in certain "boundary cases". Consider the following example, which uses the tenk1 table (and its tenk1_hundred index) from the standard regression tests: SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1; Before this commit, nbtree would scan two leaf pages, even though it was only really necessary to scan one leaf page. We'll now descend straight to the leaf page containing a (12, -inf) high key instead. The scan will locate matching non-pivot tuples with "hundred" values starting from the value 11. The scan won't waste a page access on the right sibling leaf page, which cannot possibly contain any matching tuples. You can think of the optimization added by this commit as disabling an optimization (the _bt_compare "!pivotsearch" behavior that was added to Postgres 12 in commit dd299df8) for a small subset of cases where it was always counterproductive. Equivalently, you can think of the new optimization as extending the "pivotsearch" behavior that page deletion by VACUUM has long required (since the aforementioned Postgres 12 commit went in) to other, similar cases. Obviously, this isn't strictly necessary for these new cases (unlike VACUUM, _bt_first is prepared to move the scan to the left once on the leaf level), but the underlying principle is the same. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com
* Allow parallel CREATE INDEX for BRIN indexesTomas Vondra2023-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow using multiple worker processes to build BRIN index, which until now was supported only for BTREE indexes. For large tables this often results in significant speedup when the build is CPU-bound. The work is split in a simple way - each worker builds BRIN summaries on a subset of the table, determined by the regular parallel scan used to read the data, and feeds them into a shared tuplesort which sorts them by blkno (start of the range). The leader then reads this sorted stream of ranges, merges duplicates (which may happen if the parallel scan does not align with BRIN pages_per_range), and adds the resulting ranges into the index. The number of duplicate results produced by workers (requiring merging in the leader process) should be fairly small, thanks to how parallel scans assign chunks to workers. The likelihood of duplicate results may increase for higher pages_per_range values, but then there are fewer page ranges in total. In any case, we expect the merging to be much cheaper than summarization, so this should be a win. Most of the parallelism infrastructure is a simplified copy of the code used by BTREE indexes, omitting the parts irrelevant for BRIN indexes (e.g. uniqueness checks). This also introduces a new index AM flag amcanbuildparallel, determining whether to attempt to start parallel workers for the index build. Original patch by me, with reviews and substantial reworks by Matthias van de Meent, certainly enough to make him a co-author. Author: Tomas Vondra, Matthias van de Meent Reviewed-by: Matthias van de Meent Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com
* Add empty BRIN ranges during CREATE INDEXTomas Vondra2023-12-08
| | | | | | | | | | | | | | | | | | | | | | | When building BRIN indexes, the brinbuildCallback only advances to the next page range when seeing a tuple that doesn't belong to the current one. This means that the index may end up missing ranges at the end of the table, if those pages do not contain any indexable tuples. We tend not to have completely empty pages at the end of a relation, but this also applies to partial indexes, where the tuples may simply not match the index predicate. This results in inefficient scans using the affected BRIN index - without the summaries, the page ranges have to be read and processed, which consumes I/O and possibly also CPU time. The existing code already added empty ranges for earlier parts of the table, this commit makes sure we add them for the ranges at the end of the table too. Patch by Matthias van de Meent, with review/improvements by me. Author: Matthias van de Meent Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/CAEze2WiMsPZg%3DxkvSF_jt4%3D69k6K7gz5B8V2wY3gCGZ%2B1BzCbQ%40mail.gmail.com
* Remove some unnecessary #includes of postmaster/interrupt.hHeikki Linnakangas2023-12-08
| | | | | Commit 44fc6e259b removed a bunch of references to SignalHandlerForCrashExit, leaving these #includes unneeded.
* Rename ShmemVariableCache to TransamVariablesHeikki Linnakangas2023-12-08
| | | | | | | | The old name was misleading: It's not a cache, the values kept in the struct are the authoritative source. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
* Initialize ShmemVariableCache like other shmem areasHeikki Linnakangas2023-12-08
| | | | | | | For sake of consistency. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi
* Don't try to open visibilitymap when analyzing a foreign tableHeikki Linnakangas2023-12-08
| | | | | | | | It's harmless, visibilitymap_count() returns 0 if the file doesn't exist. But it's also very pointless. I noticed this when I added an assertion in smgropen() that the relnumber is valid. Discussion: https://www.postgresql.org/message-id/621a52fd-3cd8-4f5d-a561-d510b853bbaf@iki.fi
* Fix potential pointer overflow in xlogreader.c.Thomas Munro2023-12-08
| | | | | | | | | | | | | | | | | | | While checking if a record could fit in the circular WAL decoding buffer, the coding from commit 3f1ce973 used arithmetic that could overflow. 64 bit systems were unaffected for various technical reasons, which probably explains the lack of problem reports. Likewise for 32 bit systems running known 32 bit kernels. The systems at risk of problems appear to be 32 bit processes running on 64 bit kernels, with unlucky placement in memory. Per complaint from GCC -fsanitize=undefined -m32, while testing variations of 039_end_of_wal.pl. Back-patch to 15. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKH0oRPOX7DhiQ_b51sM8HqcPp2J3WA-Oen%3DdXog%2BAGGQ%40mail.gmail.com
* Verify that attribute counts match in ExecCopySlotDavid Rowley2023-12-07
| | | | | | | | | | | | | | | | | | | | | | | | tts_virtual_copyslot() contained an Assert that checked that the srcslot contained <= attributes than the dstslot. This seems to be backwards as if the srcslot contained fewer attributes then the dstslot could be left with stale Datum values from the previously stored tuple. It might make more sense to allow the source to contain additional attributes and only copy the leading ones that also exist in the destination, however, that's not what we're doing here. Here we just remove the Assert from tts_virtual_copyslot() and add an Assert to ExecCopySlot() to verify the attribute counts match. There does not seem to be any places where the destination contains fewer attributes, so instead of going to the trouble of making the code properly handle this, let's just ensure the attribute counts match. If this Assert fails then that will indicate that we do have cases that require us to handle the srcslot with more attributes than the dstslot. It seems better to only write this code if there's a genuine requirement for it rather than write it now only to leave it untested. Thanks to Andres Freund for helping with the analysis of this. Discussion: https://postgr.es/m/CAApHDvpMAvBL0T+TRORquyx1iqFQKMVTXtqNocOw0Pa2uh1heg@mail.gmail.com
* Improve some error messages with invalid indexes for REINDEX CONCURRENTLYMichael Paquier2023-12-07
| | | | | | | | | | | | | | An invalid index is skipped when doing REINDEX CONCURRENTLY at table level, with INDEX_CORRUPTED used as errcode. This is confusing, because an invalid index could exist after an interruption. The errcode is switched to OBJECT_NOT_IN_PREREQUISITE_STATE instead, as per a suggestion from Andres Freund. While on it, the error messages are reworded, and a hint is added, telling how to rebuild an invalid index in this case. This has been suggested by Noah Misch. Discussion: https://postgr.es/m/20231118230958.4fm3fhk4ypshxopa@awork3.anarazel.de
* Fix issues in binary_upgrade_logical_slot_has_caught_up().Amit Kapila2023-12-07
| | | | | | | | | | | | | | | | | The commit 29d0a77fa6 labelled binary_upgrade_logical_slot_has_caught_up() as a non-strict function to allow providing a better error message to callers in case the passed slot_name is NULL. On further discussion, it seems that it is not helpful to have a different error message for NULL input in this function, so this patch marks the function as strict. This patch also removes the explicit permission check to use replication slots as this function is invoked only by superusers and instead adds an Assert. Reported-by: Masahiko Sawada Author: Hayato Kuroda Reviewed-by: Vignesh C Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com
* Fix assertion failure with REINDEX and event triggersMichael Paquier2023-12-07
| | | | | | | | | | | | | | | | | | | | | | | A REINDEX CONCURRENTLY run on a table with no indexes would always pop the topmost snapshot from the active snapshot stack, making the snapshot handling inconsistent between the multiple-relation and single-relation cases. This commit slightly changes the snapshot stack handling so as a snapshot is popped only ReindexMultipleInternal() in this case after a relation has been reindexed, fixing a problem where an event trigger function may need a snapshot but does not have one. This also keeps the places where PopActiveSnapshot() is called closer to each other. While on it, this expands the existing tests to cover all the cases that could be faced with REINDEX commands and such event triggers, for one or more relations, with or without indexes. This behavior is inconsistent since 5dc92b844e68, but we've never had a need for an active snapshot at the end of a REINDEX until now. Thanks also to Jian He for the input. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/cb538743-484c-eb6a-a8c5-359980cd3a17@gmail.com
* Fix compilation on Windows with WAL_DEBUGMichael Paquier2023-12-06
| | | | | | | | | | This has been broken since b060dbe0001a that has reworked the callback mechanism of XLogReader, most likely unnoticed because any form of development involving WAL happens on platforms where this compiles fine. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACVF14WKQMFwcJ=3okVDhiXpuK5f7YdT+BdYXbbypMHqWA@mail.gmail.com Backpatch-through: 13
* Fix indentationDaniel Gustafsson2023-12-05
| | | | | | When preparing commit 98e675ed7af I accidentally forgot to run pgindent, which did produce a diff. Fix by adding the required whitespace per the koel buildfarm failure.
* Fix incorrect error message for IDENTIFY_SYSTEMDaniel Gustafsson2023-12-05
| | | | | | | | | | | | | | | | Commit 5a991ef8692e accidentally reversed the order of the tuples and fields parameters, making the error message incorrectly refer to 3 tuples with 1 field when IDENTIFY_SYSTEM returns 1 tuple and 3 or 4 fields. Fix by changing the order of the parameters. This also adds a comment describing why we check for < 3 when postgres since 9.4 has been sending 4 fields. Backpatch all the way since the bug is almost a decade old. Author: Tomonari Katsumata <t.katsumata1122@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Bug: #18224 Backpatch-through: v12
* Optimize SearchPathCache by saving the last entry.Jeff Davis2023-12-04
| | | | | | | Repeated lookups are common, so it's worth it to check the last entry before doing another hash lookup. Discussion: https://postgr.es/m/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com
* Teach convert() and friends to avoid copying when possible.Nathan Bossart2023-12-04
| | | | | | | | | | | | | Presently, pg_convert() allocates a new bytea and copies the result regardless of whether any conversion actually happened. This commit adjusts this function to return the source pointer as-is if no conversion occurred. This optimization isn't expected to make a tremendous difference, but it still seems worthwhile to avoid unnecessary memory allocations. Author: Yurii Rashkovskii Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CA%2BRLCQyknBPSWXRBQGOi6aYEcdQ9RpH9Kch4GjoeY8dQ3D%2Bvhw%40mail.gmail.com
* Remove now-unnecessary Autovacuum[Launcher|Worker]IAm functionsHeikki Linnakangas2023-12-04
| | | | | | | After commit fd5e8b440d, InitProcess() is called later in the EXEC_BACKEND startup sequence, so it's enough to set the am_autovacuum_[launcher|worker] variables at the same place as in the !EXEC_BACKEND case.
* Add support for REINDEX in event triggersMichael Paquier2023-12-04
| | | | | | | | | | | | | | This commit adds support for REINDEX in event triggers, making this command react for the events ddl_command_start and ddl_command_end. The indexes rebuilt are collected with the ReindexStmt emitted by the caller, for the concurrent and non-concurrent paths. Thanks to that, it is possible to know a full list of the indexes that a single REINDEX command has worked on. Author: Garrett Thornburg, Jian He Reviewed-by: Jim Jones, Michael Paquier Discussion: https://postgr.es/m/CAEEqfk5bm32G7sbhzHbES9WejD8O8DCEOaLkxoBP7HNWxjPpvg@mail.gmail.com
* Refactor how InitProcess is calledHeikki Linnakangas2023-12-03
| | | | | | | | | | | The order of process initialization steps is now more consistent between !EXEC_BACKEND and EXEC_BACKEND modes. InitProcess() is called at the same place in either mode. We can now also move the AttachSharedMemoryStructs() call into InitProcess() itself. This reduces the number of "#ifdef EXEC_BACKEND" blocks. Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
* Pass BackgroundWorker entry in the parameter file in EXEC_BACKEND modeHeikki Linnakangas2023-12-03
| | | | | | | | | | | | The BackgroundWorker struct is now passed the same way as the Port struct. Seems more consistent. This makes it possible to move InitProcess later in SubPostmasterMain (in next commit), as we no longer need to access shared memory to read background worker entry. Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
* Refactor CreateSharedMemoryAndSemaphoresHeikki Linnakangas2023-12-03
| | | | | | | | | | | | For clarity, have separate functions for *creating* the shared memory and semaphores at postmaster or single-user backend startup, and for *attaching* to existing shared memory structures in EXEC_BACKEND case. CreateSharedMemoryAndSemaphores() is now called only at postmaster startup, and a new AttachSharedMemoryStructs() function is called at backend startup in EXEC_BACKEND mode. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
* Silence Valgrind complaint with EXEC_BACKENDHeikki Linnakangas2023-12-01
| | | | | | | | | | The padding bytes written to the backend params file were uninitialized. That's harmless because we don't access the padding bytes when we read the file back in, but Valgrind doesn't know that. In any case, clear the padding bytes to make Valgrind happy. Reported-by: Alexander Lakhin Discussion: https://www.postgresql.org/message-id/014768ed-8b39-c44f-b07c-098c87b1644c@gmail.com