...
* Support retrieval of results in chunks with libpq. (Tom Lane, 2024-04-06)
  This patch generalizes libpq's existing single-row mode to allow individual
  partial-result PGresults to contain up to N rows, rather than always one row.
  This reduces malloc overhead compared to plain single-row mode, and it is very
  useful for psql's FETCH_COUNT feature, since otherwise we'd have to add code
  (and cycles) to either merge single-row PGresults into a bigger one or teach
  psql's results-printing logic to accept arrays of PGresults.

  To avoid API breakage, PQsetSingleRowMode() remains the same, and we add a new
  function PQsetChunkedRowsMode() to invoke the more general case. Also,
  PGresults obtained the old way continue to carry the PGRES_SINGLE_TUPLE status
  code, while if PQsetChunkedRowsMode() is used then their status code is
  PGRES_TUPLES_CHUNK. The underlying logic is the same either way, though.

  Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked around a bit
  by me, so any remaining bugs are my fault)

  Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
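  A minimal sketch of how a client might use the new mode, based on the
  description above (error handling and connection setup are elided, the query
  and table name are made up, and the exact calling conventions should be
  checked against the libpq documentation):

      #include <stdio.h>
      #include <libpq-fe.h>

      static void
      fetch_in_chunks(PGconn *conn)
      {
          /* As with single-row mode, send the query first ... */
          PQsendQuery(conn, "SELECT id, payload FROM big_table");

          /* ... then ask for partial results of up to 1000 rows each. */
          PQsetChunkedRowsMode(conn, 1000);

          for (;;)
          {
              PGresult   *res = PQgetResult(conn);

              if (res == NULL)
                  break;          /* no more results for this query */

              if (PQresultStatus(res) == PGRES_TUPLES_CHUNK)
              {
                  /* process a chunk of up to 1000 rows */
                  for (int i = 0; i < PQntuples(res); i++)
                      printf("%s\n", PQgetvalue(res, i, 0));
              }
              else if (PQresultStatus(res) != PGRES_TUPLES_OK)
                  fprintf(stderr, "query failed: %s", PQerrorMessage(conn));

              PQclear(res);
          }
      }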
* Change BitmapAdjustPrefetchIterator to accept BlockNumber (Tomas Vondra, 2024-04-07)
  BitmapAdjustPrefetchIterator() only used the blockno member of the passed-in
  TBMIterateResult to ensure that the prefetch iterator and regular iterator
  stay in sync. Pass it the BlockNumber only, so that we can move away from
  using the TBMIterateResult outside of table AM specific code.

  Author: Melanie Plageman
  Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
  Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
* BitmapHeapScan: Use correct recheck flag for skip_fetch (Tomas Vondra, 2024-04-07)
  As of 7c70996ebf0949b142a9, BitmapPrefetch() used the recheck flag for the
  current block to determine whether or not it should skip prefetching the
  proposed prefetch block. As explained in the comment, this assumed the index
  AM will report the same recheck value for the future page as it did for the
  current page - but there's no guarantee.

  This only affects prefetching - if the recheck flag changes, we may prefetch
  blocks unnecessarily and not prefetch blocks that will be needed. But we
  don't need to rely on that assumption - we know the recheck flag for the
  block we're considering prefetching, so we can use that.

  The impact is very limited in practice - the opclass would need to assign
  different recheck flags to different blocks, but none of the built-in
  opclasses seems to do that.

  Author: Melanie Plageman
  Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
  Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
* BitmapHeapScan: Push skip_fetch optimization into table AM (Tomas Vondra, 2024-04-07)
  Commit 7c70996ebf0949b142 introduced an optimization to allow bitmap scans to
  operate like index-only scans by not fetching a block from the heap if none
  of the underlying data is needed and the block is marked all visible in the
  visibility map.

  With the introduction of table AMs, a FIXME was added to this code indicating
  that the skip_fetch logic should be pushed into the table AM-specific code,
  as not all table AMs may use a visibility map in the same way.

  This commit resolves this FIXME for the current block. The layering violation
  is still present in BitmapHeapScan's prefetching code, which uses the
  visibility map to decide whether or not to prefetch a block. However, this
  can be addressed independently.

  Author: Melanie Plageman
  Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
  Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
* Implement ALTER TABLE ... SPLIT PARTITION ... command (Alexander Korotkov, 2024-04-07)
  This new DDL command splits a single partition into several partitions.
  Just like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions
  are created using the createPartitionTable() function with the parent
  partition as the template.

  This commit comprises a quite naive implementation which works in a single
  process and holds an ACCESS EXCLUSIVE LOCK on the parent table during all
  the operations, including the tuple routing. This is why this new DDL
  command can't be recommended for large partitioned tables under a high load.
  However, this implementation comes in handy in certain cases even as is.
  Also, it could be used as a foundation for future implementations with less
  locking and possibly parallelism.

  Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
  Author: Dmitry Koval
  Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
  Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
* Implement ALTER TABLE ... MERGE PARTITIONS ... command (Alexander Korotkov, 2024-04-07)
  This new DDL command merges several partitions into one partition of the
  target table. The target partition is created using the new
  createPartitionTable() function with the parent partition as the template.

  This commit comprises a quite naive implementation which works in a single
  process and holds an ACCESS EXCLUSIVE LOCK on the parent table during all
  the operations, including the tuple routing. This is why this new DDL
  command can't be recommended for large partitioned tables under a high load.
  However, this implementation comes in handy in certain cases even as is.
  Also, it could be used as a foundation for future implementations with less
  locking and possibly parallelism.

  Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
  Author: Dmitry Koval
  Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
  Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
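  As a rough sketch of how the two commands above might be invoked from a
  libpq client (the table name, partition names, and bounds are made up, and
  the exact grammar should be checked against the committed documentation):

      #include <libpq-fe.h>

      static void
      reshape_partitions(PGconn *conn)
      {
          /* split one yearly range partition into two half-year partitions */
          PQclear(PQexec(conn,
                         "ALTER TABLE measurements SPLIT PARTITION m_2024 INTO "
                         "  (PARTITION m_2024_h1 FOR VALUES FROM ('2024-01-01') TO ('2024-07-01'), "
                         "   PARTITION m_2024_h2 FOR VALUES FROM ('2024-07-01') TO ('2025-01-01'))"));

          /* ... and merge them back into a single partition */
          PQclear(PQexec(conn,
                         "ALTER TABLE measurements MERGE PARTITIONS (m_2024_h1, m_2024_h2) "
                         "  INTO m_2024"));
      }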
* BitmapHeapScan: postpone setting can_skip_fetch (Tomas Vondra, 2024-04-06)
  Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of in
  ExecInitBitmapHeapScan(). This is a preliminary step to pushing the skip
  fetch optimization into heap AM code.

  Author: Melanie Plageman
  Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
  Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
* Call WaitLSNCleanup() in AbortTransaction() (Alexander Korotkov, 2024-04-07)
  Even though waiting for a replay LSN happens without an explicit
  transaction, AbortTransaction() is responsible for the cleanup of the shared
  memory if the error is thrown in a stored procedure. So, we need to do
  WaitLSNCleanup() there to clean up after an unexpected error that happened
  while waiting for the replay LSN.

  Discussion: https://postgr.es/m/202404051815.eri4u5q6oj26%40alvherre.pgsql
  Author: Alvaro Herrera
* Clarify what is protected by WaitLSNLock (Alexander Korotkov, 2024-04-07)
  Not just WaitLSNState.waitersHeap, but also WaitLSNState.procInfos and the
  updating of WaitLSNState.minWaitedLSN are protected by WaitLSNLock. There is
  one now-documented exception: the fast-path check of the
  WaitLSNProcInfo.inHeap flag.

  Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
* Use an LWLock instead of a spinlock in waitlsn.c (Alexander Korotkov, 2024-04-07)
  This should prevent busy-waiting when the number of waiting processes is
  high.

  Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
  Author: Alvaro Herrera
* BitmapHeapScan: begin scan after bitmap creation (Tomas Vondra, 2024-04-06)
  Change the order so that the table scan is initialized only after
  initializing the index scan and building the bitmap.

  This is mostly a cosmetic change for now, but later commits will need to
  pass parameters to table_beginscan_bm() that are unavailable in
  ExecInitBitmapHeapScan().

  Author: Melanie Plageman
  Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
  Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
* Backport IPC::Run optimization to src/test/perl. (Noah Misch, 2024-04-06)
  This one-liner makes the TAP portion of "make check-world" 7% faster on a
  non-Windows machine.

  Discussion: https://postgr.es/m/20240331050310.09@rfd.leadboat.com
* Enhance nbtree ScalarArrayOp execution. (Peter Geoghegan, 2024-04-06)
  Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals natively.
  This works by pushing down the full context (the array keys) to the nbtree
  index AM, enabling it to execute multiple primitive index scans that the
  planner treats as one continuous index scan/index path. This earlier
  enhancement enabled nbtree ScalarArrayOp index-only scans. It also allowed
  scans with ScalarArrayOp quals to return ordered results (with some notable
  restrictions, described further down).

  Take this general approach a lot further: teach nbtree SAOP index scans to
  decide how to execute ScalarArrayOp scans (when and where to start the next
  primitive index scan) based on physical index characteristics. This can be
  far more efficient. All SAOP scans will now reliably avoid duplicative leaf
  page accesses (just like any other nbtree index scan). SAOP scans whose
  array keys are naturally clustered together now require far fewer index
  descents, since we'll reliably avoid starting a new primitive scan just to
  get to a later offset from the same leaf page.

  The scan's arrays now advance using binary searches for the array element
  that best matches the next tuple's attribute value. Required scan key arrays
  (i.e. arrays from scan keys that can terminate the scan) ratchet forward in
  lockstep with the index scan. Non-required arrays (i.e. arrays from scan
  keys that can only exclude non-matching tuples) "advance" without the
  process ever rolling over to a higher-order array.

  Naturally, only required SAOP scan keys trigger skipping over leaf pages
  (non-required arrays cannot safely end or start primitive index scans).
  Consequently, even index scans of a composite index with a high-order
  inequality scan key (which we'll mark required) and a low-order SAOP scan
  key (which we won't mark required) now avoid repeating leaf page accesses --
  that benefit isn't limited to simpler equality-only cases.

  In general, all nbtree index scans now output tuples as if they were one
  continuous index scan -- even scans that mix a high-order inequality with
  lower-order SAOP equalities reliably output tuples in index order. This
  allows us to remove a couple of special cases that were applied when
  building index paths with SAOP clauses during planning.

  Bugfix commit 807a40c5 taught the planner to avoid generating unsafe path
  keys: path keys on a multicolumn index path, with a SAOP clause on any
  attribute beyond the first/most significant attribute. These cases are now
  all safe, so we go back to generating path keys without regard for the
  presence of SAOP clauses (just like with any other clause type). Affected
  queries can now exploit scan output order in all the usual ways (e.g.,
  certain "ORDER BY ... LIMIT n" queries can now terminate early).

  Also undo changes from follow-up bugfix commit a4523c5a, which taught the
  planner to produce alternative index paths, with path keys, but without
  low-order SAOP index quals (filter quals were used instead). We'll no longer
  generate these alternative paths, since they can no longer offer any
  meaningful advantages over standard index qual paths. Affected queries
  thereby avoid all of the disadvantages that come from using filter quals
  within index scan nodes. They can avoid extra heap page accesses from using
  filter quals to exclude non-matching tuples (index quals will never have
  that problem). They can also skip over irrelevant sections of the index in
  more cases (though only when nbtree determines that starting another
  primitive scan actually makes sense).

  There is a theoretical risk that removing restrictions on SAOP index paths
  from the planner will break compatibility with amcanorder-based index AMs
  maintained as extensions. Such an index AM could have the same limitations
  around ordered SAOP scans as nbtree had up until now. Adding a pro forma
  incompatibility item about the issue to the Postgres 17 release notes seems
  like a good idea.

  Author: Peter Geoghegan <pg@bowt.ie>
  Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
  Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
  Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
  Reviewed-By: Tomas Vondra <tomas.vondra@enterprisedb.com>
  Discussion: https://postgr.es/m/CAH2-Wz=ksvN_sjcnD1+Bt-WtifRA5ok48aDYnq3pkKhxgMQpcw@mail.gmail.com
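  A hypothetical example of the kind of query shape the planner change
  affects (the table and index are invented for illustration): with an index
  on (customer_id, item_id), the SAOP clause on the non-leading column no
  longer prevents the planner from generating path keys for the index, so the
  LIMIT can stop an ordered index scan early.

      #include <libpq-fe.h>

      static PGresult *
      recent_items_for_customer(PGconn *conn)
      {
          /* assumed index: CREATE INDEX ON orders (customer_id, item_id) */
          return PQexec(conn,
                        "SELECT * FROM orders "
                        "WHERE customer_id = 42 "
                        "  AND item_id = ANY (ARRAY[101, 205, 307]) "
                        "ORDER BY customer_id, item_id "
                        "LIMIT 10");
      }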
* Remove obsolete comment in CopyReadLineText(). (Tom Lane, 2024-04-06)
  When this bit of commentary was written, it was alluding to the fact that we
  looked for newlines and EOD markers in the raw (not yet encoding-converted)
  input data. We don't do that anymore, preferring to batch the conversion of
  larger chunks of input and split it into lines later. Hence there's no
  longer any need for assumptions about the relevant characters being
  encoding-invariant, and we should remove this comment saying we assume that.

  Discussion: https://postgr.es/m/1461688.1712347668@sss.pgh.pa.us
* Speed up tail processing when hashing aligned C strings, take two (John Naylor, 2024-04-06)
  After encountering the NUL terminator, the word-at-a-time loop exits and we
  must hash the remaining bytes. Previously we calculated the terminator's
  position and re-loaded the remaining bytes from the input string. This was
  slower than the unaligned case for very short strings. We already have all
  the data we need in a register, so let's just mask off the bytes we need and
  hash them immediately.

  In addition to endianness issues, the previous attempt upset valgrind in the
  way it computed the mask. Whether by accident or by wisdom, the author's
  proposed method passes locally with valgrind 3.22.

  Ants Aasma, with cosmetic adjustments by me

  Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
* Teach fasthash_accum to use platform endianness for bytewise loads (John Naylor, 2024-04-06)
  This function previously used a mix of word-wise loads and bytewise loads.
  The bytewise loads happened to be little-endian regardless of platform. This
  in itself is not a problem. However, a future commit will require the same
  result whether A) the input is loaded as a word with the relevant bytes
  masked off, or B) the input is loaded one byte at a time.

  While at it, improve debuggability of the internal hash state.

  Discussion: https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com
* Increase default vacuum_buffer_usage_limit to 2MB. (Thomas Munro, 2024-04-06)
  The BAS_VACUUM ring size has been 256kB since commit d526575f introduced the
  mechanism 17 years ago. Commit 1cbbee03 recently made it configurable but
  retained the traditional default. The correct default size has been debated
  for years, but 256kB is certainly very small. VACUUM soon needs to write
  back data it dirtied only 32 blocks ago, which usually requires flushing the
  WAL. New experiments in prefetching pages for VACUUM exacerbated the problem
  by crashing into dirty data even sooner. Let's make the default 2MB. That's
  1.6% of the default toy buffer pool size, and 0.2% of 1GB, which would be
  considered a small shared_buffers setting for a real system these days.
  Users are still free to set the GUC to a different value.

  Reviewed-by: Andres Freund <andres@anarazel.de>
  Discussion: https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
  Discussion: https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com
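  The new value only changes the default; a quick sketch of how a session
  could still override it, either through the GUC or through VACUUM's
  BUFFER_USAGE_LIMIT option added by commit 1cbbee03 (the table name below is
  made up):

      #include <libpq-fe.h>

      static void
      vacuum_with_custom_ring(PGconn *conn)
      {
          /* revert to the old default for this session only */
          PQclear(PQexec(conn, "SET vacuum_buffer_usage_limit = '256kB'"));

          /* or pick a ring size for a single command */
          PQclear(PQexec(conn, "VACUUM (BUFFER_USAGE_LIMIT '8MB') mytable"));
      }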
* Allow BufferAccessStrategy to limit pin count. (Thomas Munro, 2024-04-06)
  While pinning extra buffers to look ahead, users of strategies are in danger
  of using too many buffers. For some strategies, that means "escaping" from
  the ring, and in others it means forcing dirty data to disk very frequently
  with associated WAL flushing. Since external code has no insight into any of
  that, allow individual strategy types to expose a clamp that should be
  applied when deciding how many buffers to pin at once.

  Reviewed-by: Andres Freund <andres@anarazel.de>
  Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com
* Convert uses of hash_string_pointer to fasthash equivalent (John Naylor, 2024-04-06)
  Remove duplicate hash_string_pointer() function definitions by creating a
  new inline function hash_string() for this purpose. This has the added
  advantage of avoiding strlen() calls when doing hash lookups. It's not clear
  how many of these are performance-sensitive enough to benefit from that, but
  the simplification is worth it on its own.

  Reviewed by Jeff Davis

  Discussion: https://postgr.es/m/CANWCAZbg_XeSeY0a_PqWmWqeRATvzTzUNYRLeT%2Bbzs%2BYQdC92g%40mail.gmail.com
* Add macro to disable address safety instrumentation (John Naylor, 2024-04-06)
  fasthash_accum_cstring_aligned() uses a technique, found in various strlen()
  implementations, to detect a string's NUL terminator by reading a word at a
  time. That triggers failures when testing with "-fsanitize=address", at
  least with frontend code. To enable using this function anywhere, add a
  function attribute macro to disable such testing.

  Reviewed by Jeff Davis

  Discussion: https://postgr.es/m/CANWCAZbwvp7oUEkbw-xP4L0_S_WNKq-J-ucP4RCNDPJnrakUPw@mail.gmail.com
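  A sketch of the general shape such a macro can take (the macro name below is
  illustrative, not necessarily the one the commit adds; it relies on the
  no_sanitize_address attribute understood by both gcc and clang):

      #include <stddef.h>

      #if defined(__has_feature)
      #if __has_feature(address_sanitizer)        /* clang's way of reporting ASan */
      #define NO_ASAN __attribute__((no_sanitize_address))
      #endif
      #endif
      #if !defined(NO_ASAN)
      #if defined(__SANITIZE_ADDRESS__)           /* gcc's way */
      #define NO_ASAN __attribute__((no_sanitize_address))
      #else
      #define NO_ASAN
      #endif
      #endif

      /*
       * The word-at-a-time loop may deliberately read a few bytes past the NUL
       * terminator (never past the containing aligned word), so ask ASan not
       * to instrument this function.
       */
      NO_ASAN static size_t aligned_cstring_length(const char *str);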
* Fix incorrect return type (John Naylor, 2024-04-06)
  fasthash32() calculates a 32-bit hashcode, but the return type was uint64.
  Change to uint32.

  Noted by Jeff Davis

  Discussion: https://postgr.es/m/b16c93e6c736a422d4de668343515375664eb05d.camel%40j-davis.com
* Improve read_stream.c's fast path. (Thomas Munro, 2024-04-06)
  The "fast path" for well cached scans that don't do any I/O was accidentally
  coded in a way that could only be triggered by pg_prewarm's usage pattern,
  which starts out with a higher distance because of the flags it passes in.
  We want it to work for streaming sequential scans too, once that patch is
  committed. Adjust.

  Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
  Discussion: https://postgr.es/m/CA%2BhUKGKXZALJ%3D6aArUsXRJzBm%3Dqvc4AWp7%3DiJNXJQqpbRLnD_w%40mail.gmail.com
* Fix headerscheck violation introduced in f8ce4ed78ca (Andres Freund, 2024-04-05)
  Per ci.
* Silence some compiler warnings in commit 3311ea86ed (Andrew Dunstan, 2024-04-05)
  Per report from Nathan Bossart
* Fix incorrect calculation in BlockRefTableEntryGetBlocks. (Robert Haas, 2024-04-05)
  The previous formula was incorrect in the case where the function's nblocks
  argument was a multiple of BLOCKS_PER_CHUNK, which happens whenever a
  relation segment file is exactly 512MB or exactly 1GB in length. In such
  cases, the formula would calculate a stop_offset of 0 rather than 65536,
  resulting in modified blocks in the second half of a 1GB file, or all the
  modified blocks in a 512MB file, being omitted from the incremental backup.

  Reported off-list by Tomas Vondra and Jakub Wartak.

  Discussion: http://postgr.es/m/CA+TgmoYwy_KHp1-5GYNmVa=zdeJWhNH1T0SBmEuvqQNJEHj1Lw@mail.gmail.com
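  A hypothetical illustration of the boundary condition being described, not
  the actual committed code (it assumes BLOCKS_PER_CHUNK is 65536, the value
  implied by the message above): a plain modulo yields 0 for the final chunk
  of a file whose block count is an exact multiple of the chunk size, when the
  correct stop offset is the full chunk.

      #define BLOCKS_PER_CHUNK (1 << 16)      /* assumed value */

      static unsigned
      stop_offset_in_last_chunk(unsigned nblocks)
      {
          unsigned    stop_offset = nblocks % BLOCKS_PER_CHUNK;

          /* a completely full final chunk must report 65536 blocks, not 0 */
          if (stop_offset == 0 && nblocks > 0)
              stop_offset = BLOCKS_PER_CHUNK;
          return stop_offset;
      }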
* Check HAVE_COPY_FILE_RANGE before calling copy_file_range (Tomas Vondra, 2024-04-05)
  Fix a mistake in ac8110155132 - write_reconstructed_file() called
  copy_file_range() without properly checking HAVE_COPY_FILE_RANGE.

  Reported by several macOS machines. Also reported by cfbot, but I missed
  that issue before commit.
* Allow using copy_file_range in write_reconstructed_file (Tomas Vondra, 2024-04-05)
  This commit allows using copy_file_range() for efficient combining of data
  from multiple files, instead of simply reading/writing the blocks. Depending
  on the filesystem and other factors (size of the increment, distribution of
  modified blocks etc.) this may be faster than the block-by-block copy, but
  more importantly it enables various features provided by CoW filesystems.

  If a checksum needs to be calculated for the file, the same strategy as when
  copying whole files is used - copy_file_range is used to copy the blocks,
  but the file is also read for the checksum calculation.

  While the checksum calculation is rarely needed when cloning whole files,
  when reconstructing the files from multiple backups it needs to happen
  almost always (the only exception is when the user specified --no-manifest).

  Author: Tomas Vondra
  Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas
  Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com
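  A minimal sketch of the kind of copy loop this describes, assuming a
  platform where copy_file_range() is available (the function below is
  illustrative rather than the pg_combinebackup code; a 64-bit off_t is
  assumed, and the caller is expected to fall back to plain read()/write()
  when -1 is returned):

      #define _GNU_SOURCE
      #include <unistd.h>

      static int
      copy_byte_range(int src_fd, off_t src_off, int dst_fd, off_t dst_off, size_t len)
      {
          while (len > 0)
          {
              ssize_t nbytes = copy_file_range(src_fd, &src_off,
                                               dst_fd, &dst_off, len, 0);

              /* unsupported filesystem pair, short source file, or hard error */
              if (nbytes <= 0)
                  return -1;
              len -= (size_t) nbytes;
          }
          return 0;
      }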
* Make libpqsrv_cancel's return const char *, not char * (Alvaro Herrera, 2024-04-05)
  Per headerscheck's C++ check.

  Discussion: https://postgr.es/m/372769.1712179784@sss.pgh.pa.us
* Remove unused variable in checksum_file() (Tomas Vondra, 2024-04-05)
  The 'offset' variable was set but otherwise unused.

  Per buildfarm animals with clang, e.g. sifaka and longlin.
* Allow copying files using clone/copy_file_range (Tomas Vondra, 2024-04-05)
  Adds --clone/--copy-file-range options to pg_combinebackup, to allow copying
  files using file cloning or copy_file_range(). These methods may be faster
  than the standard block-by-block copy, but the main advantage is that they
  enable various features provided by CoW filesystems.

  This commit only uses these copy methods for files that did not change and
  can be copied as a whole from a single backup.

  These new copy methods may not be available on all platforms, in which case
  the command throws an error (immediately, even if no files would be copied
  as a whole). This early failure seems better than failing later when trying
  to copy the first file, after performing a lot of work on earlier files.

  If the requested copy method is available, but a checksum needs to be
  recalculated (e.g. because of a different checksum type), the file is still
  copied using the requested method, but it is also read for the checksum
  calculation. Depending on the filesystem this may be more expensive than
  just performing the simple copy, but it does enable the CoW benefits.

  Initial patch by Jakub Wartak, various reworks and improvements by me.

  Author: Tomas Vondra, Jakub Wartak
  Reviewed-by: Thomas Munro, Jakub Wartak, Robert Haas
  Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com
* Suppress "variable may be used uninitialized" warning.Tom Lane2024-04-05
| | | | | | | | Buildfarm member caiman is showing this, which surprises me because it's very late-model gcc (14.0.1) and ought to be smart enough to know that elog(ERROR) doesn't return. But we're likely to see the same from stupider compilers too, so add a dummy initialization in our usual style.
* docs: Merge separate chapters on built-in index AMs into one. (Robert Haas, 2024-04-05)
  The documentation index is getting very long, which makes it hard to find
  things. Since these chapters are all very similar in structure and content,
  merging them is a natural way of reducing the size of the toplevel index.

  Rather than actually combining all of the SGML into a single file, keep one
  file per <sect1>, and add a glue file that includes all of them.

  Discussion: http://postgr.es/m/CA+Tgmob7_uoYuS2=rVwpVXaRwP-UXz+++saYTC-BCZ42QzSNKQ@mail.gmail.com
* Align blocks in incremental backups to BLCKSZ (Tomas Vondra, 2024-04-05)
  Align blocks stored in incremental files to BLCKSZ, so that the incremental
  backups work well with CoW filesystems.

  The header of the incremental file is padded with \0 to a multiple of
  BLCKSZ, so that the block data (also BLCKSZ) is aligned to BLCKSZ. The
  padding is added only to files containing block data, so files with just the
  header remain small. This adds a bit of extra space, but as the number of
  blocks increases the overhead gets negligible very quickly. And as the
  padding is \0 bytes, it does compress extremely well.

  The alignment is important for CoW filesystems that usually require the
  blocks to be aligned to the filesystem page size for features like block
  sharing, deduplication etc. to work well. With the variable-sized header the
  blocks in the increments were not aligned at all, negating the benefits of
  the CoW filesystems.

  This matters even for non-CoW filesystems, for example when placed on a RAID
  array. If the block is not aligned, it may easily span multiple devices,
  causing read and write amplification.

  It might be better to align the blocks to the filesystem page, not BLCKSZ,
  but we have no good way to determine that. Even if we determine the page
  size at the time of taking the backup, the backup may move. For now BLCKSZ
  seems sufficient - the filesystem page is usually 4K, so the default BLCKSZ
  (8K) is aligned to that.

  Author: Tomas Vondra
  Reviewed-by: Robert Haas, Jakub Wartak
  Discussion: https://postgr.es/m/3024283a-7491-4240-80d0-421575f6bb23%40enterprisedb.com
* Operate XLogCtl->log{Write,Flush}Result with atomics (Alvaro Herrera, 2024-04-05)
  This removes the need to hold both the info_lck spinlock and WALWriteLock to
  update them. We use stock atomic write instead, with WALWriteLock held.
  Readers can use atomic read, without any locking.

  This allows for some code to be reordered: some places were a bit contorted
  to avoid repeated spinlock acquisition, but that's no longer a concern, so
  we can turn them to more natural coding. Some further changes are possible
  (maybe yielding performance wins), but in this commit I did rather minimal
  ones only, to avoid increasing the blast radius.

  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Reviewed-by: Jeff Davis <pgsql@j-davis.com>
  Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
  Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql
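  A sketch of the locking pattern the message describes, using PostgreSQL's
  atomics API (the struct and field names here are illustrative, not the real
  XLogCtl layout, and initialization of the atomics is elided):

      #include "postgres.h"
      #include "port/atomics.h"

      typedef struct DemoWalProgress
      {
          pg_atomic_uint64 logWriteResult;    /* last byte + 1 written out */
          pg_atomic_uint64 logFlushResult;    /* last byte + 1 flushed */
      } DemoWalProgress;

      /* writer: called with WALWriteLock held, so a plain atomic store suffices */
      static void
      demo_advance_write_result(DemoWalProgress *prog, uint64 upto)
      {
          pg_atomic_write_u64(&prog->logWriteResult, upto);
      }

      /* reader: no spinlock or LWLock needed, the 8-byte load is atomic */
      static uint64
      demo_read_write_result(DemoWalProgress *prog)
      {
          return pg_atomic_read_u64(&prog->logWriteResult);
      }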
* Allow synced slots to have their inactive_since. (Amit Kapila, 2024-04-05)
  This commit does two things:
  1) Maintains inactive_since for sync slots whenever the slot is released,
     just like any other regular slot.
  2) Ensures the value is set to the current timestamp during the promotion of
     standby to help correctly interpret the time after promotion. We don't
     want the slots to appear inactive for a long time after promotion if they
     haven't been synchronized recently. This would also avoid the
     invalidation of such slots immediately after promotion if tomorrow we
     have a feature that invalidates slots based on their inactivity time.

  Whoever acquires the slot, i.e. makes the slot active, will reset it to
  NULL.

  Author: Bharath Rupireddy
  Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada
  Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com
  Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
  Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
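  A quick sketch of how a client might inspect this (it assumes the
  inactive_since and synced columns exposed by pg_replication_slots on a
  version 17 server):

      #include <stdio.h>
      #include <libpq-fe.h>

      static void
      show_slot_inactivity(PGconn *conn)
      {
          PGresult *res = PQexec(conn,
                                 "SELECT slot_name, synced, inactive_since "
                                 "FROM pg_replication_slots");

          if (PQresultStatus(res) == PGRES_TUPLES_OK)
              for (int i = 0; i < PQntuples(res); i++)
                  printf("%s synced=%s inactive_since=%s\n",
                         PQgetvalue(res, i, 0),
                         PQgetvalue(res, i, 1),
                         PQgetvalue(res, i, 2));
          PQclear(res);
      }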
* Add "ABI_compatibility" regions to wait_event_names.txtMichael Paquier2024-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current design behind the automatic generation of the C code and documentation related to wait events introduced in fa88928470b5 does not offer a way to attach new wait events without breaking ABI compatibility, as all the events are forcibly reordered for each section in the input file wait_event_names.txt. Adding new wait events to stable branches is something that has happened in the past, 0b6517a3b79a being a recent example of that with VERSION_FILE_SYNC, so we need a way to generate any C code for wait events while maintaining compatibility on stable branches already released. This commit solves this issue by adding a new region called "ABI_compatibility" (keyword could be updated to something else if someone had a better idea) to each section of wait_event_names.txt, so as one can add new wait events to stable branches in wait_event_names.txt while keeping the code ABI-compatible. "ABI_compatibility" has no impact on the documentation generated: all the wait events of one section are still alphabetically ordered. LWLock and Lock sections generate their C code elsewhere, so they do not need an "ABI_compatibility" region. For example, let's imagine a wait_event_names.txt like that: Section: ClassName - Foo FOO_1 "Waiting in Foo 1" FOO_2 "Waiting in Foo 2" ABI_compatibility: NEW_FOO_1 "Waiting in New Foo 1" NEW_BAR_1 "Waiting in New Bar 1" This results in the following enum, where the events in the ABI region are listed last with the same ordering as in wait_event_names.txt: typedef enum { WAIT_EVENT_FOO_1, WAIT_EVENT_FOO_2, WAIT_EVENT_NEW_FOO_1, WAIT_EVENT_NEW_BAR_1 } WaitEventFoo; New wait events added in stable branches should be added at the end of each ABI_compatibility region, and ABI_compatibility should remain empty on HEAD and unreleased stable branches. This design has been suggested by Noah Misch and me. Reported-by: Noah Misch Author: Bertrand Drouvot Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/20240317183114.16@rfd.leadboat.com
* Fix test failures when language environment is not UTF-8. (Jeff Davis, 2024-04-04)
  For tests that depend on UTF-8 encoding, force LC_COLLATE=C and LC_CTYPE=C
  to avoid an encoding mismatch.

  Reported-by: Thomas Munro
  Discussion: https://postgr.es/m/CA+hUKGK-ZqV1njkG_=xcCqXh2fcMkz85FTMnhS2opm4ZerH=xw@mail.gmail.com
* Fix old, misleading comment for PGRES_POLLING_ACTIVE. (Robert Haas, 2024-04-04)
  The comment implies that we can eventually remove this, but per discussion,
  we actually don't want to do that ever, in order to maintain compatibility.

  Jelte Fennema-Nio, reviewed by Tristan Partin

  Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com
* Remove reachable call to pg_unreachable(). (Robert Haas, 2024-04-04)
  The loop just before this uses break, not return, so this line is reachable.
  Commit cafe1056558fe07cdc52b95205588fcd80870362 introduced this issue.

  Jelte Fennema-Nio, reviewed by Tristan Partin

  Discussion: http://postgr.es/m/CAGECzQTO72jKed5461W8cytV2Msh_e+WUZjOyX_RUQCbjk4LRA@mail.gmail.com
* Fix ecpg's mechanism for detecting unsupported cases in the grammar. (Tom Lane, 2024-04-04)
  ecpg wants to emit a warning if it parses a SQL construct that the backend
  can parse but will immediately throw a FEATURE_NOT_SUPPORTED error for. The
  way it was testing for this was to see if the string
  ERRCODE_FEATURE_NOT_SUPPORTED appeared anywhere in the gram.y code. This is,
  of course, not nearly good enough, as there are plenty of rules in gram.y
  that throw that error only conditionally. There was a hack dating to 2008 to
  suppress the warning in one rule that doesn't even exist anymore, but
  nothing for other cases we've created since then. End result was that you
  could get "unsupported feature will be passed to server" warnings while
  compiling perfectly good SQL code in ecpg. Somehow we'd not heard complaints
  about this, but it was exposed by the recent addition of an ecpg test for a
  SQL/JSON construct.

  To fix, suppress the warning if the rule contains any "if" statement. Manual
  comparison of gram.y with the generated preproc.y file shows that the
  warning is now emitted only in rules where it's sensible.

  This problem has existed for a long time, so back-patch to all supported
  branches.

  Discussion: https://postgr.es/m/603615.1712245382@sss.pgh.pa.us
* Further cleanup for recent JSON-related commits. (Tom Lane, 2024-04-04)
  The link commands in test_json_parser/Makefile were a long way shy of a
  load, as evidenced by buildfarm failures. Model them on pgxs.mk's PROGRAM
  rule. (Probably we should have put these two test programs in different
  subdirectories so we could actually use the PROGRAM rule. But I won't
  question that decision today.)
* Further cleanup for recent JSON-related commits. (Tom Lane, 2024-04-04)
  Add overlooked .gitignore entries.

  Fix test_json_parser/Makefile to use the pgxs.mk clean rule instead of
  fighting it. Suppresses a warning from make, at least for me.
* Tidy up after incremental JSON parser patch (Andrew Dunstan, 2024-04-04)
  Remove junk left over from non-vpath builds.

  Try to remedy gettext error on some platforms.
* Fix warnings re typedef redefinition in ea7b4e9a2a and 3311ea86ed (Andrew Dunstan, 2024-04-04)
  Per gripe from Tom Lane and the buildfarm
* Add missing initialization in transformJsonFuncExpr() (Amit Langote, 2024-04-04)
  de3600452b added some code for the new JSON_TABLE_OP to that function but
  failed to initialize the default_format variable.

  Reported-by: Erik Rijkers <er@xs4all.nl>
  Discussion: https://postgr.es/m/254b2fa2-2f6b-a30a-20ee-21f8a2c12a50@xs4all.nl
* Fix typo introduced in 6185c9737 (Amit Langote, 2024-04-04)
  Reported-by: Jian He <jian.universality@gmail.com>
  Discussion: https://postgr.es/m/CACJufxGHiU0p0usjh5hnR0_ByZn4tq1FC3eKAtrQgJeKU6W9kw@mail.gmail.com
* Add basic JSON_TABLE() functionality (Amit Langote, 2024-04-04)
  JSON_TABLE() allows JSON data to be converted into a relational view and
  thus used, for example, in a FROM clause, like other tabular data. Data to
  show in the view is selected from a source JSON object using a JSON path
  expression to get a sequence of JSON objects that's called a "row pattern",
  which becomes the source to compute the SQL/JSON values that populate the
  view's output columns. Column values themselves are computed using JSON
  path expressions applied to each of the JSON objects comprising the "row
  pattern", for which the SQL/JSON query functions added in 6185c9737cf4 are
  used.

  To implement JSON_TABLE() as a table function, this augments the TableFunc
  and TableFuncScanState nodes that are currently used to support XMLTABLE()
  with some JSON_TABLE()-specific fields.

  Note that the JSON_TABLE() spec includes NESTED COLUMNS and PLAN clauses,
  which are required to provide more flexibility to extract data out of nested
  JSON objects, but they are not implemented here to keep this commit of
  manageable size.

  Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
  Author: Teodor Sigaev <teodor@sigaev.ru>
  Author: Oleg Bartunov <obartunov@gmail.com>
  Author: Alexander Korotkov <aekorotkov@gmail.com>
  Author: Andrew Dunstan <andrew@dunslane.net>
  Author: Amit Langote <amitlangote09@gmail.com>
  Author: Jian He <jian.universality@gmail.com>

  Reviewers have included (in no particular order): Andres Freund, Alexander
  Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu
  Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He

  Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
  Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
  Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
  Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com
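  A small sketch of what a basic call might look like when issued from a
  libpq client (the JSON literal and column list are invented; NESTED COLUMNS
  is omitted since, per the message above, it is not part of this commit):

      #include <stdio.h>
      #include <libpq-fe.h>

      static void
      json_table_demo(PGconn *conn)
      {
          PGresult *res = PQexec(conn,
              "SELECT * FROM JSON_TABLE("
              "  '[{\"id\": 1, \"name\": \"apple\"}, {\"id\": 2, \"name\": \"pear\"}]',"
              "  '$[*]' COLUMNS (id int PATH '$.id', name text PATH '$.name')"
              ") AS jt");

          if (PQresultStatus(res) == PGRES_TUPLES_OK)
              for (int i = 0; i < PQntuples(res); i++)
                  printf("%s: %s\n", PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
          PQclear(res);
      }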
* pg_upgrade: Fix typo in message (Peter Eisentraut, 2024-04-04)
* Use incremental parsing of backup manifests. (Andrew Dunstan, 2024-04-04)
  This changes the three callers of json_parse_manifest() to use
  json_parse_manifest_incremental_chunk() if appropriate. In the case of the
  backend caller, since we don't know the size of the manifest in advance we
  always call the incremental parser.

  Author: Andrew Dunstan
  Reviewed-By: Jacob Champion
  Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net
* Add support for incrementally parsing backup manifests (Andrew Dunstan, 2024-04-04)
  This adds the infrastructure for using the new non-recursive JSON parser in
  processing manifests. It's important that callers make sure that the last
  piece of JSON handed to the incremental manifest parser contains the entire
  last few lines of the manifest, including the checksum.

  Author: Andrew Dunstan
  Reviewed-By: Jacob Champion
  Discussion: https://postgr.es/m/7b0a51d6-0d9d-7366-3a1a-f74397a02f55@dunslane.net