postgresql - postgresql mirror

	Commit message	Author	Age
...
*	Remove references to old function name	Andres Freund	2024-04-07
	In a97bbe1f1df I accidentally referenced heapgetpage(), both in a function name and a comment. But since 44086b09753 the relevant function is named heap_prepare_pagescan(). Rename the new function to page_collect_tuples(). Reported-by: Melanie Plageman <melanieplageman@gmail.com> Reported-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/20240407172615.cocrsvboqm3ttqe4@awork3.anarazel.de Discussion: https://postgr.es/m/CAApHDvp4SniHopTrVeKWcEvNXFtdki0utAvO=5R7H6TNhtULRQ@mail.gmail.com
*	Add pg_buffercache_evict() function for testing.	Thomas Munro	2024-04-08
	When testing buffer pool logic, it is useful to be able to evict arbitrary blocks. This function can be used in SQL queries over the pg_buffercache view to set up a wide range of buffer pool states. Of course, buffer mappings might change concurrently, so you might evict a block other than the one you had in mind, and another session might bring it back in at any time. That's OK for the intended purpose of setting up developer testing scenarios, and more complicated interlocking schemes to give stronger guarantees about that would likely be less flexible for actual testing work anyway. Superuser-only. Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks) Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com> Reviewed-by: Jim Nasby <jim.nasby@gmail.com> Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com
*	Fix alignment of stack variable	John Naylor	2024-04-08
	Declare with union similar to PGAlignedBlock. Report and fix by Andres Freund Discussion: https://postgr.es/m/20240407190731.izm3mdazednrsiqk%40awork3.anarazel.de
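A minimal sketch of the union-alignment technique this commit refers to, assuming a hypothetical 64-byte scratch buffer named AlignedScratch (not a PostgreSQL type). A bare char array on the stack only needs 1-byte alignment; placing it in a union with double and int64_t members makes the compiler align the whole object at least as strictly as those members, which is the same idea behind PGAlignedBlock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_SIZE 64   /* hypothetical buffer size for this sketch */

/*
 * A plain "char buf[SCRATCH_SIZE]" local variable is only guaranteed
 * 1-byte alignment. The extra union members are never used for storage;
 * they exist only to raise the alignment of the whole union.
 */
typedef union AlignedScratch
{
	char		data[SCRATCH_SIZE];
	double		force_align_d;
	int64_t		force_align_i64;
} AlignedScratch;

/* Returns 1 if p is aligned to 'align' bytes (align must be a power of 2). */
static int
is_aligned(const void *p, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t) p & (align - 1)) == 0;
}
```

Because every member of a union starts at offset 0, `data` inherits the union's alignment, so code can safely reinterpret the buffer as wider types.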
*	Add more tab completion support for ALTER DEFAULT PRIVILEGES in psql.	Masahiko Sawada	2024-04-08
	This adds tab completion of "GRANT" and "REVOKE [GRANT OPTION FOR]" for ALTER DEFAULT PRIVILEGES, and adds "WITH GRANT OPTION" for ALTER DEFAULT PRIVILEGES ... GRANT ... TO role. Author: Vignesh C, with cosmetic adjustments by me Reviewed-by: Shubham Khanna, Masahiko Sawada Discussion: https://postgr.es/m/CALDaNm1aEdJb-QJi%3DGWStkfj_%2BEDUK_VtDkn%2BTjQ2z7HyU0MBw%40mail.gmail.com
*	Remove redundant nbtree preprocessing assertions.	Peter Geoghegan	2024-04-07
	One of the assertions was the subject of a false positive complaint from Coverity, but none of the assertions added much, so get rid of them. Reported-By: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3000247.1712537309@sss.pgh.pa.us
*	simplehash: Free collisions array in SH_STAT	Andres Freund	2024-04-07
	While SH_STAT() is only used for debugging, the allocated array can be large, and therefore should be freed. It's unclear why Coverity started warning now. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Coverity Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us Backpatch: 12-
*	Fix check for 'outlen' return from SSL_select_next_proto()	Heikki Linnakangas	2024-04-08
	Fixes compiler warning reported by Andres Freund. Discussion: https://www.postgresql.org/message-id/20240408015055.xsuahullywpfwyvu@awork3.anarazel.de
*	Silence perlcritic warnings in new libpq tests	Heikki Linnakangas	2024-04-08
	Per buildfarm member 'koel'.
*	Send ALPN in TLS handshake, require it in direct SSL connections	Heikki Linnakangas	2024-04-08
	libpq now always tries to send ALPN. With the traditional negotiated SSL connections, the server accepts the ALPN, and refuses the connection if it's not what we expect, but connecting without ALPN is still OK. With the new direct SSL connections, ALPN is mandatory. NOTE: This uses "TBD-pgsql" as the protocol ID. We must register a proper one with IANA before the release! Author: Greg Stark, Heikki Linnakangas Reviewed-by: Matthias van de Meent, Jacob Champion
*	Support TLS handshake directly without SSLRequest negotiation	Heikki Linnakangas	2024-04-08
	By skipping SSLRequest, you can eliminate one round-trip when establishing a TLS connection. It is also more friendly to generic TLS proxies that don't understand the PostgreSQL protocol. This is disabled by default in libpq, because the direct TLS handshake will fail with old server versions. It can be enabled with the sslnegotiation=direct option. It will still fall back to the negotiated TLS handshake if the server rejects the direct attempt, either because it is an older version or the server doesn't support TLS at all, but the fallback can be disabled with the sslnegotiation=requiredirect option. Author: Greg Stark, Heikki Linnakangas Reviewed-by: Matthias van de Meent, Jacob Champion
*	Refactor libpq state machine for negotiating encryption	Heikki Linnakangas	2024-04-08
	This fixes the few corner cases noted in commit 705843d294, as shown by the changes in the test. Author: Heikki Linnakangas, Matthias van de Meent Reviewed-by: Jacob Champion
*	Use streaming I/O in ANALYZE.	Thomas Munro	2024-04-08
	The ANALYZE command prefetches and reads sample blocks chosen by a BlockSampler algorithm. Instead of calling [Prefetch|Read]Buffer() for each block, ANALYZE now uses the streaming API introduced in b5a9b18cd0. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAN55FZ0UhXqk9v3y-zW_fp4-WCp43V8y0A72xPmLkOM%2B6M%2BmJg%40mail.gmail.com
*	injection_points: Introduce runtime conditions	Michael Paquier	2024-04-08
	This adds a new SQL function injection_points_set_local() that can be used to force injection points to be run only in the process where they are attached. This is handy for SQL tests to: - Automatically detach injection points when the process exits. - Allow tests with injection points to run concurrently with other test suites, so that such modules do not have to be marked with NO_INSTALLCHECK. Currently, the only condition that can be registered is for a PID. This could be extended to more kinds later, if required, like database names/OIDs, roles, or more concepts I did not consider. Using a single function for SQL scripts is an idea from Heikki Linnakangas. Reviewed-by: Andrey Borodin Discussion: https://postgr.es/m/ZfP7IDs9TvrKe49x@paquier.xyz
*	Enhance libpq encryption negotiation tests with new GUC	Heikki Linnakangas	2024-04-08
	The new "log_connection_negotiation" server option causes the server to print messages to the log when it receives an SSLRequest or GSSENCRequest packet from the client. Together with "log_connections", it gives a trace of how a connection and encryption are negotiated. Use the option in the libpq_encryption test, to verify in more detail how libpq negotiates encryption with different gssencmode and sslmode options. This revealed a couple of cases where libpq retries encryption or authentication, when it should already know that it cannot succeed. I marked them with XXX comments in the test tables. They only happen when the connection was going to fail anyway, and only with rare combinations of options, so they're not serious. Discussion: https://www.postgresql.org/message-id/CAEze2Wja8VUoZygCepwUeiCrWa4jP316k0mvJrOW4PFmWP0Tcw@mail.gmail.com
*	With gssencmode='require', check credential cache before connecting	Heikki Linnakangas	2024-04-08
	Previously, libpq would establish the TCP connection, and then immediately disconnect if the credentials were not available. The same thing happened if you tried to use a Unix domain socket with gssencmode=require. Check those conditions before establishing the TCP connection. This is a very minor issue, but my motivation to do this now is that I'm about to add more detail to the tests for encryption negotiation. This makes the case of gssencmode=require but no credentials configured fail at the same stage as with gssencmode=require and GSSAPI support not compiled at all. That avoids having to deal with variations in expected output depending on build options. Discussion: https://www.postgresql.org/message-id/CAEze2Wja8VUoZygCepwUeiCrWa4jP316k0mvJrOW4PFmWP0Tcw@mail.gmail.com
*	Add tests for libpq gssencmode and sslmode options	Heikki Linnakangas	2024-04-08
	Test all combinations of gssencmode, sslmode, whether the server supports SSL and/or GSSAPI encryption, and whether they are accepted by pg_hba.conf. This is in preparation for refactoring that code in libpq, and for adding a new option for "direct SSL" connections, which adds another dimension to the logic. If we add even more options in the future, testing all combinations will become unwieldy and we'll need to rethink this, but for now an exhaustive test is nice. Author: Heikki Linnakangas, Matthias van de Meent Reviewed-by: Jacob Champion Discussion: https://www.postgresql.org/message-id/a3af4070-3556-461d-aec8-a8d794f94894@iki.fi
*	Move Kerberos module	Heikki Linnakangas	2024-04-08
	So that we can reuse it in new tests. Discussion: https://www.postgresql.org/message-id/a3af4070-3556-461d-aec8-a8d794f94894@iki.fi Reviewed-by: Jacob Champion, Matthias van de Meent
*	Make GIN test using injection points repeatable	Michael Paquier	2024-04-08
	As written, the test would fail when run repeatedly because one of the injection points attached was not detached. This would not matter if the test is rewritten to be concurrently safe, but let's be clean, as it is good practice. Oversight in 6a1ea02c491d. Discussion: https://postgr.es/m/ZfP7IDs9TvrKe49x@paquier.xyz
*	Fix incorrect KeeperBlock macro in bump.c	David Rowley	2024-04-08
	The macro was missing a MAXALIGN around the sizeof(BumpContext), which would cause problems detecting the keeper block on 32-bit systems that have a MAXALIGN value of 8. Thank you to Andres Freund, Tomas Vondra and Tom Lane for investigating and testing. Reported-by: Melanie Plageman, Tomas Vondra Discussion: https://postgr.es/m/CAAKRu_Y6dZjiJEZghgNZp0Gjar1JVq-CH7XGDqExDVHnPgDjuw@mail.gmail.com Discussion: https://postgr.es/m/a4a10b89-6ba8-4abd-b449-019aafff04fc@enterprisedb.com
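To illustrate the arithmetic behind this fix, here is a sketch that assumes the keeper block is placed directly after a MAXALIGN'd context header, and assumes a hypothetical 52-byte header size (the real sizeof(BumpContext) differs by platform). The TYPEALIGN/MAXALIGN macros follow the shape of PostgreSQL's c.h definitions, with MAXIMUM_ALIGNOF fixed at 8 to model the affected 32-bit systems; the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as TYPEALIGN/MAXALIGN in c.h; MAXIMUM_ALIGNOF is fixed at 8
 * to model a 32-bit platform that still needs 8-byte alignment
 * (an assumption for this sketch). */
#define MAXIMUM_ALIGNOF 8
#define TYPEALIGN(ALIGNVAL, LEN) \
	(((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
#define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

/* Buggy: keeper block assumed to start right at the raw header size. */
static size_t
keeper_offset_buggy(size_t header_size)
{
	return header_size;
}

/* Fixed: round the header size up first, matching where the context
 * creation code is assumed to have placed the keeper block. */
static size_t
keeper_offset_fixed(size_t header_size)
{
	size_t		off = MAXALIGN(header_size);

	assert(off >= header_size && off % MAXIMUM_ALIGNOF == 0);
	return off;
}
```

With a 52-byte header the unaligned computation lands 4 bytes before the real block, while the two agree whenever the header size is already a multiple of 8. That is presumably why builds where the header size happens to be a multiple of 8 did not show the problem.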
*	Fix usage of same ListCell in transform_or_to_any()'s nested loops	Alexander Korotkov	2024-04-08
	Discussion: https://postgr.es/m/CAAKRu_b4SXNW4GAM0bv3e6wcL5ODSXg1ZdRCn6uyLLjSPbveBg%40mail.gmail.com Author: Melanie Plageman
*	Transform OR clauses to ANY expression	Alexander Korotkov	2024-04-08
	Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) at the preliminary stage of optimization, when we are still working with the expression tree. Here Cn is the n-th constant expression, 'expr' is a non-constant expression, and 'op' is an operator which returns a boolean result and has a commuter (for the case of reverse order of constant and non-constant parts of the expression, like 'Cn op expr'). Sometimes this can lead to a suboptimal plan. This is why there is an or_to_any_transform_limit GUC. It specifies the threshold number of arguments in an OR expression that triggers the OR-to-ANY transformation. Generally, more groupable OR arguments mean that the transformation will be more likely to win than to lose. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com>
*	Change debug printing to log filename	Daniel Gustafsson	2024-04-08
	When restarting the cluster fails, the code introduced in 33774978c78 printed the full log contents to aid debugging. For cases when the logfile is large, this adds unnecessary overhead. Reduce to printing the logfile path instead. Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20240406214439.2n4zf2w7ukhf7dsy@awork3.anarazel.de
*	Remove useless duplicate call of defGetBoolean().	Tom Lane	2024-04-07
	Seems to be a copy-and-paste error dating to dc2123400. Noted while reviewing a related documentation patch.
*	Use condition variable to wait for next MultiXact offset	Alvaro Herrera	2024-04-07
	In one multixact.c edge case, we need a mechanism to wait for one multixact offset to be written before being allowed to read the next one. We used to handle this case by sleeping for one millisecond and retrying, but such sleeps have been reported as problematic in production cases. We can avoid the problem by using a condition variable: readers sleep on it and then every creator of multixacts broadcasts into the CV when creation is sufficiently far along. Author: Kyotaro Horiguchi <horikyotajntt@gmail.com> Reviewed-by: Andrey Borodin <amborodin@acm.org> Discussion: https://postgr.es/m/47A598F4-B4E7-4029-8FEC-A06A6C3CB4B5@yandex-team.ru Discussion: https://postgr.es/m/20200515.090333.24867479329066911.horikyota.ntt
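The reader/writer handshake described above can be sketched with POSIX threads standing in for PostgreSQL's own ConditionVariable API. OffsetSlot, wait_for_offset(), publish_offset() and writer_thread() are hypothetical names for illustration; the real multixact.c code works with shared memory and SLRU pages rather than a single struct.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: the next offset becomes readable once a
 * writer has published it. */
typedef struct OffsetSlot
{
	pthread_mutex_t lock;
	pthread_cond_t cv;
	int			written;		/* 0 until the writer has stored 'offset' */
	unsigned	offset;
} OffsetSlot;

/* Reader: sleep on the CV instead of polling with a 1 ms sleep-and-retry. */
static unsigned
wait_for_offset(OffsetSlot *slot)
{
	unsigned	result;

	pthread_mutex_lock(&slot->lock);
	while (!slot->written)		/* loop: wakeups can be spurious */
		pthread_cond_wait(&slot->cv, &slot->lock);
	result = slot->offset;
	pthread_mutex_unlock(&slot->lock);
	return result;
}

/* Writer: publish the value, then wake every waiter at once. */
static void
publish_offset(OffsetSlot *slot, unsigned offset)
{
	pthread_mutex_lock(&slot->lock);
	assert(!slot->written);
	slot->offset = offset;
	slot->written = 1;
	pthread_cond_broadcast(&slot->cv);
	pthread_mutex_unlock(&slot->lock);
}

static void *
writer_thread(void *arg)
{
	publish_offset((OffsetSlot *) arg, 42);
	return NULL;
}
```

Broadcasting rather than signalling mirrors the commit's description: any number of readers may be waiting, and each rechecks its own condition after waking.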
*	Avoid extra lookups with nbtree array inequalities.	Peter Geoghegan	2024-04-07
	nbtree index scans with SAOP inequalities (but no SAOP equalities) performed extra ORDER proc lookups for any remaining equality strategy scan keys. This could waste cycles, and caused assertion failures. Keeping around a separate ORDER proc is only necessary for a scan's non-array/non-SAOP equality scan keys when the scan has at least one other SAOP equality strategy key (a SAOP inequality shouldn't count). To fix, replace _bt_preprocess_array_keys_final's assertion with a test that makes the function return early when the scan has no SAOP equality scan keys. Oversight in commit 1b134ca5, which enhanced nbtree ScalarArrayOp execution. Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/0539d3d3-a402-0a49-ed5e-26429dffc4bd@gmail.com
*	Don't clobber test exit code at cleanup in LDAP/Kerberos tests	Heikki Linnakangas	2024-04-07
	If the test script die()d before running the first test, the whole test was interpreted as SKIPped rather than failed. The PostgreSQL::Cluster module got this right. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/fb898a70-3a88-4629-88e9-f2375020061d@iki.fi
*	Improve check in LDAP test to find the OpenLDAP installation	Heikki Linnakangas	2024-04-07
	If the OpenLDAP installation directory is not found, set $setup to 0 so that the LDAP tests are skipped. The macOS checks were already doing that, but the checks on other OS's were not. While we're at it, improve the error message when the tests are skipped, to specify whether the OS is supported at all, or if we just didn't find the installation directory. This was accidentally "working" without this, i.e. we were skipping the tests if the OpenLDAP installation was not found, because of a bug in the LdapServer test module: the END block clobbered the exit code so if the script die()s before running the first subtest, the whole test script was marked as SKIPped. The next commit will fix that bug, but we need to fix the setup code first. These checks should probably go into configure/meson, but this is better than nothing and allows fixing the bug in the END block. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/fb898a70-3a88-4629-88e9-f2375020061d@iki.fi
*	Use streaming I/O in sequential scans.	Thomas Munro	2024-04-08
	Instead of calling ReadBuffer() for each block, heap sequential scans and TID range scans now use the streaming API introduced in b5a9b18cd0. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/flat/CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg%3DgEQ%40mail.gmail.com
*	Use bump memory context for tuplesorts	David Rowley	2024-04-08
	29f6a959c added a bump allocator type for efficient compact allocations. Here we make use of this for non-bounded tuplesorts to store tuples. This is very space efficient when storing narrow tuples due to bump.c not having chunk headers. This means we can fit more tuples in work_mem before spilling to disk, or perform an in-memory sort touching fewer cachelines. Author: David Rowley Reviewed-by: Nathan Bossart Reviewed-by: Matthias van de Meent Reviewed-by: Tomas Vondra Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
*	Add XLogCtl->logInsertResult	Alvaro Herrera	2024-04-07
	This tracks the position of WAL that's been fully copied into WAL buffers by all processes emitting WAL. (For some reason we call that "WAL insertion"). This is updated using an atomic monotonic advance during WaitXLogInsertionsToFinish, which is not when the insertions actually occur, but it's the only place where we know that all the insertions have completed. This value is useful in WALReadFromBuffers, which can verify that callers don't try to read past what has been inserted. (However, more infrastructure is needed in order to actually use WAL after the flush point, since it could be lost.) The value is also useful in WaitXLogInsertionsToFinish() itself, since we can now exit quickly when all WAL has been already inserted, without even having to take any locks.
*	Introduce a bump memory allocator	David Rowley	2024-04-08
	This introduces a bump MemoryContext type. The bump context is best suited for short-lived memory contexts which require only allocations of memory and never a pfree or repalloc, which are unsupported. Memory palloc'd into a bump context has no chunk header. This makes bump a useful context type when lots of small allocations need to be done without any need to pfree those allocations. Allocation sizes are rounded up to the next MAXALIGN boundary, so with this and no chunk header, allocations are very compact indeed. Allocations are also very fast as bump does not check any freelists to try and make use of previously free'd chunks. It just checks if there is enough room on the current block, and if so it bumps the freeptr beyond this chunk and returns the value that the freeptr was previously pointing to. Simple and fast. A new block is malloc'd when there's not enough space in the current block. Code using the bump allocator must take care never to call any functions which could try to call realloc() (or variants), pfree(), GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated chunk. Due to lack of chunk headers, these operations are unsupported. To increase the chances of catching such issues, when compiled with MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and any attempt to perform an unsupported operation will result in an ERROR. Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported operation could result in a segfault. A follow-on commit will implement the first user of bump. Author: David Rowley Reviewed-by: Nathan Bossart Reviewed-by: Matthias van de Meent Reviewed-by: Tomas Vondra Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
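A minimal sketch of the bump technique described above: round each request up to MAXALIGN, check the room left in the current block, bump freeptr, and malloc a new block when there is not enough space. BumpArena, Block, bump_alloc() and bump_reset() are hypothetical names; the sketch assumes an 8-byte MAXALIGN and omits MemoryContext integration, MEMORY_CONTEXT_CHECKING headers, and block-size growth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAXIMUM_ALIGNOF 8		/* assumption for this sketch */
#define MAXALIGN(LEN) \
	(((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

typedef struct Block
{
	struct Block *next;			/* older blocks, released all at once */
	char	   *freeptr;		/* next unused byte in this block */
	char	   *endptr;			/* one past the last usable byte */
} Block;

typedef struct BumpArena
{
	Block	   *head;			/* current block; NULL before first alloc */
	size_t		block_size;		/* usable bytes per regular block */
} BumpArena;

static Block *
new_block(size_t size)
{
	Block	   *b = malloc(MAXALIGN(sizeof(Block)) + size);

	if (b == NULL)
		return NULL;
	b->next = NULL;
	b->freeptr = (char *) b + MAXALIGN(sizeof(Block));
	b->endptr = b->freeptr + size;
	return b;
}

/* No chunk header, no freelist: round up, check room, bump freeptr. */
static void *
bump_alloc(BumpArena *arena, size_t size)
{
	size_t		need = MAXALIGN(size);
	Block	   *b;
	char	   *result;

	assert(arena != NULL);
	b = arena->head;
	if (b == NULL || (size_t) (b->endptr - b->freeptr) < need)
	{
		size_t		sz = need > arena->block_size ? need : arena->block_size;
		Block	   *nb = new_block(sz);

		if (nb == NULL)
			return NULL;
		nb->next = b;
		arena->head = b = nb;
	}
	result = b->freeptr;
	b->freeptr += need;
	return result;
}

/* The only way to give memory back: free every block together. */
static void
bump_reset(BumpArena *arena)
{
	Block	   *b = arena->head;

	while (b != NULL)
	{
		Block	   *next = b->next;

		free(b);
		b = next;
	}
	arena->head = NULL;
}
```

Because no header precedes a returned chunk, there is nothing for a pfree()-style call to inspect, which is why the commit lists those operations as unsupported.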
*	Enlarge bit-space for MemoryContextMethodID	David Rowley	2024-04-07
	Reserve 4 bits for MemoryContextMethodID rather than 3. 3 bits did technically allow a maximum of 8 memory context types; however, we've opted to reserve some bit patterns, which left us with only 4 slots, all of which were used. Here we add another bit, which frees up 8 slots for future memory context types. In passing, adjust the enum names in MemoryContextMethodID to make it more clear which ones can be used and which ones are reserved. Author: Matthias van de Meent, David Rowley Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
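The bit arithmetic behind this change, as a sketch: n bits give 2^n patterns, so moving from 3 to 4 bits doubles the pattern space from 8 to 16. The extract helper assumes, for illustration, that the method ID sits in the low bits of a 64-bit chunk header word; method_id_slots(), extract_method_id() and METHODID_MASK are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low 'bits' bits of a 64-bit header word. */
#define METHODID_MASK(bits) ((((uint64_t) 1) << (bits)) - 1)

/* Number of distinct method IDs that fit in 'bits' bits. */
static unsigned
method_id_slots(unsigned bits)
{
	assert(bits < 32);
	return 1u << bits;
}

/* Read the method ID from the low bits of a (hypothetical) header word. */
static unsigned
extract_method_id(uint64_t header, unsigned bits)
{
	return (unsigned) (header & METHODID_MASK(bits));
}
```

The extra bit is also why readers must use the new mask: the same header word decodes to a different ID under a 3-bit mask than under a 4-bit one.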
*	Avoid needless large memcpys in libpq socket writing	David Rowley	2024-04-07
	Until now, when calling pq_putmessage to write new data to a libpq socket, all writes are copied into a buffer and that buffer gets flushed when full to avoid having to perform small writes to the socket. There are cases where we must write large amounts of data to the socket, sometimes larger than the size of the buffer. In this case, it's wasteful to memcpy this data into the buffer and flush it out; instead, we can send it directly from the memory location that the data is already stored in. Here we adjust internal_putbytes() so that after having just flushed the buffer to the socket, if the remaining bytes to send are as big or bigger than the buffer size, we just send directly rather than needlessly copying into the PqSendBuffer buffer first. Examples of operations that write large amounts of data in one message are: outputting large tuples with SELECT or COPY TO STDOUT and pg_basebackup. Author: Melih Mutlu Reviewed-by: Heikki Linnakangas Reviewed-by: Jelte Fennema-Nio Reviewed-by: David Rowley Reviewed-by: Ranier Vilela Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAGPVpCR15nosj0f6xe-c2h477zFR88q12e6WjEoEZc8ZYkTh3Q@mail.gmail.com
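A sketch of the adjusted buffering rule, assuming a tiny 8-byte buffer and an in-memory array standing in for the socket. Sender, put_bytes(), flush_buf() and sock_write() are hypothetical names, not the internal_putbytes() code itself. Small writes are still coalesced in the buffer, but once the buffer is empty and at least a full buffer's worth of data remains, that data is sent directly from the caller's memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEND_BUF_SIZE 8			/* tiny for illustration */

typedef struct Sender
{
	char		buf[SEND_BUF_SIZE];
	size_t		len;			/* bytes currently buffered */
	char		sink[256];		/* stands in for the socket */
	size_t		sink_len;
	size_t		copied;			/* bytes memcpy'd into buf, for observation */
} Sender;

/* Stand-in for a socket send(). */
static void
sock_write(Sender *s, const char *p, size_t n)
{
	assert(s->sink_len + n <= sizeof(s->sink));
	memcpy(s->sink + s->sink_len, p, n);
	s->sink_len += n;
}

static void
flush_buf(Sender *s)
{
	sock_write(s, s->buf, s->len);
	s->len = 0;
}

static void
put_bytes(Sender *s, const char *p, size_t n)
{
	while (n > 0)
	{
		/* If the buffer is full, flush it out first. */
		if (s->len == SEND_BUF_SIZE)
			flush_buf(s);

		if (s->len == 0 && n >= SEND_BUF_SIZE)
		{
			/* Empty buffer and a large remainder: skip the copy. */
			sock_write(s, p, n);
			return;
		}
		else
		{
			/* Otherwise copy as much as fits into the buffer. */
			size_t		amount = SEND_BUF_SIZE - s->len;

			if (amount > n)
				amount = n;
			memcpy(s->buf + s->len, p, amount);
			s->len += amount;
			s->copied += amount;
			p += amount;
			n -= amount;
		}
	}
}
```

In the test sequence, only the 8 bytes needed to fill the buffer pass through memcpy into it; the remaining 15 bytes bypass it.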
*	Reduce branches in heapgetpage()'s per-tuple loop	Andres Freund	2024-04-06
	Until now, heapgetpage()'s loop over all tuples performed some conditional checks for each tuple, even though the conditions did not change across the loop. This commit fixes that by moving the loop into an inline function. By calling it with different constant arguments, the compiler can generate an optimized loop for the different conditions, at the price of two per-page checks. For cases of all-visible tables and an isolation level other than serializable, speedups of up to 25% have been measured. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Zhang Mingli <zmlpostgres@gmail.com> Tested-by: Quan Zongliang <quanzongliang@yeah.net> Discussion: https://postgr.es/m/20230716015656.xjvemfbp5fysjiea@awork3.anarazel.de Discussion: https://postgr.es/m/2ef7ff1b-3d18-2283-61b1-bbd25fc6c7ce@yeah.net
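A sketch of the constant-argument specialization trick with a single hypothetical condition (all_visible) and a made-up visibility test; the real code also specializes on the serializable check. PostgreSQL forces inlining with pg_attribute_always_inline, whereas plain `static inline` here is only a hint, so whether the two calls really compile to branch-free loops depends on the compiler and optimization level. collect_tuples(), collect_page() and tuple_is_visible() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-tuple visibility check used only for illustration. */
static bool
tuple_is_visible(int tuple)
{
	return tuple % 2 == 0;
}

/*
 * all_visible never changes within a page. When this function is inlined
 * into a call site that passes a literal true/false, the compiler can fold
 * the test and emit a specialized loop for each case.
 */
static inline int
collect_tuples(const int *tuples, int ntuples, bool all_visible, int *out)
{
	int			nout = 0;

	assert(ntuples >= 0);
	for (int i = 0; i < ntuples; i++)
	{
		if (all_visible || tuple_is_visible(tuples[i]))
			out[nout++] = tuples[i];
	}
	return nout;
}

/* One per-page branch selects which specialized loop runs. */
static int
collect_page(const int *tuples, int ntuples, bool all_visible, int *out)
{
	if (all_visible)
		return collect_tuples(tuples, ntuples, true, out);
	else
		return collect_tuples(tuples, ntuples, false, out);
}
```

The duplicated-looking branch in collect_page() is deliberate: it trades one check per page for removing the check from every iteration.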
*	Optimize visibilitymap_count() with AVX-512 instructions.	Nathan Bossart	2024-04-06
	Commit 792752af4e added infrastructure for using AVX-512 intrinsic functions, and this commit uses that infrastructure to optimize visibilitymap_count(). Specifically, a new pg_popcount_masked() function is introduced that applies a bitmask to every byte in the buffer prior to calculating the population count, which is used to filter out the all-visible or all-frozen bits as needed. Platforms without AVX-512 support should also see a nice speedup due to the reduced number of calls to a function pointer. Co-authored-by: Ants Aasma Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
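A portable sketch of a masked population count in the spirit of pg_popcount_masked(), without any AVX-512 code. It assumes the visibility map layout of two bits per heap block with the all-visible bit in the lower position, so a byte mask of 0x55 keeps only all-visible bits and 0xAA keeps only all-frozen bits; popcount8() and popcount_masked() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bit count of one byte (clears the lowest set bit each step). */
static int
popcount8(uint8_t b)
{
	int			n = 0;

	while (b)
	{
		b &= (uint8_t) (b - 1);
		n++;
	}
	return n;
}

/* Apply the mask to every byte, then count the surviving set bits. */
static uint64_t
popcount_masked(const uint8_t *buf, size_t len, uint8_t mask)
{
	uint64_t	total = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		total += popcount8((uint8_t) (buf[i] & mask));
	return total;
}
```

Counting all-visible and all-frozen blocks then needs just two calls over the same buffer, one per mask.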
*	Optimize pg_popcount() with AVX-512 instructions.	Nathan Bossart	2024-04-06
	Presently, pg_popcount() processes data in 32-bit or 64-bit chunks when possible. Newer hardware that supports AVX-512 instructions can use 512-bit chunks, which provides a nice speedup, especially for larger buffers. This commit introduces the infrastructure required to detect compiler and CPU support for the required AVX-512 intrinsic functions, and it adds a new pg_popcount() implementation that uses these functions. If CPU support for this optimized implementation is detected at runtime, a function pointer is updated so that it is used by subsequent calls to pg_popcount(). Most of the existing in-tree calls to pg_popcount() should benefit from these instructions, and calls with smaller buffers should at least not regress compared to v16. The new infrastructure introduced by this commit can also be used to optimize visibilitymap_count(), but that is left for a follow-up commit. Co-authored-by: Paul Amonson, Ants Aasma Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
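The runtime-dispatch pattern described above can be sketched as a function pointer that starts out pointing at a chooser, which probes the CPU once, rebinds the pointer, and forwards the call. cpu_has_wide_popcount() is a hypothetical placeholder that always returns false (a real check would query CPUID/XGETBV), and popcount_fast() simply delegates to the byte loop, so this shows only the dispatch mechanics, not AVX-512 code. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t popcount_choose(const char *buf, size_t len);

/* Starts at the chooser; rebound to a concrete implementation on first use. */
static uint64_t (*popcount_impl) (const char *buf, size_t len) = popcount_choose;

/* Hypothetical CPU feature probe; always false in this sketch. */
static bool
cpu_has_wide_popcount(void)
{
	return false;
}

static uint64_t
popcount_slow(const char *buf, size_t len)
{
	uint64_t	total = 0;

	for (size_t i = 0; i < len; i++)
	{
		unsigned char b = (unsigned char) buf[i];

		while (b)
		{
			b &= (unsigned char) (b - 1);
			total++;
		}
	}
	return total;
}

/* Placeholder for an implementation using wider instructions. */
static uint64_t
popcount_fast(const char *buf, size_t len)
{
	return popcount_slow(buf, len);
}

/* Runs once: pick an implementation, remember it, and forward the call. */
static uint64_t
popcount_choose(const char *buf, size_t len)
{
	popcount_impl = cpu_has_wide_popcount() ? popcount_fast : popcount_slow;
	return popcount_impl(buf, len);
}

/* Public entry point: always goes through the pointer. */
static uint64_t
sketch_popcount(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	return popcount_impl(buf, len);
}
```

After the first call, every subsequent call pays only the indirect jump, which is the per-call function-pointer cost the follow-up visibilitymap_count() commit reduces.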
*	Fix if/while thinko in read_stream.c edge case.	Thomas Munro	2024-04-07
	When we determine that a wanted block can't be combined with the current pending read, it's time to start that read to get it out of the way. An "if" in that code path should have been a "while", because it might take more than one go in case of partial reads. This was only broken for smaller ranges, as the more common case of io_combine_limit-sized ranges is handled earlier in the code and knows how to loop, hiding the bug for a while. Discovered while testing large parallel sequential scans of partially cached tables. The ramp-up-and-down block allocator for parallel scans could hit the problem case and skip some blocks near the end that should have been streamed. Defect in commit b5a9b18c. Discussion: https://postgr.es/m/CA%2BhUKG%2Bh8Whpv0YsJqjMVkjYX%2B80fTVc6oi-V%2BzxJvykLpLHYQ%40mail.gmail.com
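The difference between the two shapes can be sketched with a hypothetical I/O model that starts at most max_per_call blocks per attempt, standing in for partial reads. PendingRead, start_some() and the two flush helpers are illustrative names, not the read_stream.c API. With "if", part of the pending range can be left behind; with "while", the loop repeats until every pending block has been started, assuming each attempt makes some progress.

```c
#include <assert.h>

/* Hypothetical pending-read state; max_per_call models partial reads. */
typedef struct PendingRead
{
	int			next_block;		/* first block not yet started */
	int			nblocks;		/* blocks still pending */
	int			max_per_call;	/* most blocks one attempt can start (> 0) */
	int			started;		/* total blocks started, for observation */
} PendingRead;

/* Starts as many pending blocks as one attempt allows. */
static int
start_some(PendingRead *pr)
{
	int			n = pr->nblocks < pr->max_per_call ? pr->nblocks : pr->max_per_call;

	assert(n >= 0);
	pr->next_block += n;
	pr->nblocks -= n;
	pr->started += n;
	return n;
}

/* Buggy shape: one attempt may leave part of the pending range unstarted. */
static void
flush_pending_if(PendingRead *pr)
{
	if (pr->nblocks > 0)
		start_some(pr);
}

/* Fixed shape: keep going until the whole pending range has been started. */
static void
flush_pending_while(PendingRead *pr)
{
	assert(pr->max_per_call > 0);	/* otherwise the loop could not finish */
	while (pr->nblocks > 0)
		start_some(pr);
}
```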
*	Disable parallel query in psql error-with-FETCH_COUNT test.	Tom Lane	2024-04-06
	The buildfarm members using debug_parallel_query = regress are mostly unhappy with this test. I guess what is happening is that rows generated by a parallel worker are buffered, and might or might not get to the leader before the expected error occurs. We did not see any variability in the old version of this test because each FETCH would succeed or fail atomically, leading to a predictable number of rows emitted before failure. I don't find this to be a bug, just unspecified behavior, so let's disable parallel query for this one test case to make the results stable.
*	Re-implement psql's FETCH_COUNT feature atop libpq's chunked mode.	Tom Lane	2024-04-06
	Formerly this was done with a cursor, which is problematic since not all result-set-returning query types can be put into a cursor. The new implementation is better integrated into other psql features, too. Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked around a bit by me, so any remaining bugs are my fault) Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
*	Support retrieval of results in chunks with libpq.	Tom Lane	2024-04-06
	This patch generalizes libpq's existing single-row mode to allow individual partial-result PGresults to contain up to N rows, rather than always one row. This reduces malloc overhead compared to plain single-row mode, and it is very useful for psql's FETCH_COUNT feature, since otherwise we'd have to add code (and cycles) to either merge single-row PGresults into a bigger one or teach psql's results-printing logic to accept arrays of PGresults. To avoid API breakage, PQsetSingleRowMode() remains the same, and we add a new function PQsetChunkedRowsMode() to invoke the more general case. Also, PGresults obtained the old way continue to carry the PGRES_SINGLE_TUPLE status code, while if PQsetChunkedRowsMode() is used then their status code is PGRES_TUPLES_CHUNK. The underlying logic is the same either way, though. Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked around a bit by me, so any remaining bugs are my fault) Discussion: https://postgr.es/m/CAKZiRmxsVTkO928CM+-ADvsMyePmU3L9DQCa9NwqjvLPcEe5QA@mail.gmail.com
*	Change BitmapAdjustPrefetchIterator to accept BlockNumber	Tomas Vondra	2024-04-07
	BitmapAdjustPrefetchIterator() only used the blockno member of the passed in TBMIterateResult to ensure that the prefetch iterator and regular iterator stay in sync. Pass it the BlockNumber only, so that we can move away from using the TBMIterateResult outside of table AM specific code. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
*	BitmapHeapScan: Use correct recheck flag for skip_fetch	Tomas Vondra	2024-04-07
	As of 7c70996ebf0949b142a9, BitmapPrefetch() used the recheck flag for the current block to determine whether or not it should skip prefetching the proposed prefetch block. As explained in the comment, this assumed the index AM will report the same recheck value for the future page as it did for the current page - but there's no guarantee. This only affects prefetching - if the recheck flag changes, we may prefetch blocks unnecessarily and not prefetch blocks that will be needed. But we don't need to rely on that assumption - we know the recheck flag for the block we're considering prefetching, so we can use that. The impact is very limited in practice - the opclass would need to assign different recheck flags to different blocks, but none of the built-in opclasses seems to do that. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us
*	BitmapHeapScan: Push skip_fetch optimization into table AM	Tomas Vondra	2024-04-07
	Commit 7c70996ebf0949b142 introduced an optimization to allow bitmap scans to operate like index-only scans by not fetching a block from the heap if none of the underlying data is needed and the block is marked all visible in the visibility map. With the introduction of table AMs, a FIXME was added to this code indicating that the skip_fetch logic should be pushed into the table AM-specific code, as not all table AMs may use a visibility map in the same way. This commit resolves this FIXME for the current block. The layering violation is still present in BitmapHeapScan's prefetching code, which uses the visibility map to decide whether or not to prefetch a block. However, this can be addressed independently. Author: Melanie Plageman Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
*	Implement ALTER TABLE ... SPLIT PARTITION ... command	Alexander Korotkov	2024-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new DDL command splits a single partition into several partitions. Just like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions are created using the createPartitionTable() function with the parent partition as the template. This commit comprises a quite naive implementation which works in a single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations, including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation can come in handy in certain cases even as is. Also, it could be used as a foundation for future implementations with less locking and possibly parallelism. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
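The tuple-routing step of a split can be modeled as a single pass that sends each row of the old partition to the new partition whose range contains its key. This toy C sketch assumes integer keys and half-open `[lower, upper)` range bounds; `RangeBound`, `route_key` and `split_partition` are hypothetical, and the sketch only counts rows rather than moving them. Locking is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range bound for a new partition: lower inclusive, upper exclusive. */
typedef struct RangeBound
{
    int lower;
    int upper;
} RangeBound;

/*
 * Route one partition-key value to the index of the new partition whose
 * range contains it. Returns -1 when no new partition accepts the value.
 */
static int
route_key(const RangeBound *parts, size_t nparts, int key)
{
    for (size_t i = 0; i < nparts; i++)
        if (key >= parts[i].lower && key < parts[i].upper)
            return (int) i;
    return -1;
}

/*
 * Naive single-pass split: count how many rows of the old partition land in
 * each new partition. Returns 0 on success, -1 if some row has no home.
 */
static int
split_partition(const int *keys, size_t nkeys,
                const RangeBound *parts, size_t nparts, size_t *counts)
{
    assert(counts != NULL);
    for (size_t i = 0; i < nparts; i++)
        counts[i] = 0;
    for (size_t k = 0; k < nkeys; k++)
    {
        int target = route_key(parts, nparts, keys[k]);

        if (target < 0)
            return -1;
        counts[target]++;
    }
    return 0;
}
```

Returning -1 for an unroutable row is this sketch's own choice. The single sequential pass also illustrates why the commit describes the approach as naive: every row is visited once while the parent stays locked.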
*	Implement ALTER TABLE ... MERGE PARTITIONS ... command	Alexander Korotkov	2024-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new DDL command merges several partitions into one partition of the target table. The target partition is created using the new createPartitionTable() function with the parent partition as the template. This commit comprises a quite naive implementation which works in a single process and holds the ACCESS EXCLUSIVE LOCK on the parent table during all the operations, including the tuple routing. This is why this new DDL command can't be recommended for large partitioned tables under a high load. However, this implementation can come in handy in certain cases even as is. Also, it could be used as a foundation for future implementations with less locking and possibly parallelism. Discussion: https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru Author: Dmitry Koval Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires
*	BitmapHeapScan: postpone setting can_skip_fetch	Tomas Vondra	2024-04-06
\| \| \| \| \| \| \| \| \| \|	Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the skip fetch optimization into heap AM code. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
*	Call WaitLSNCleanup() in AbortTransaction()	Alexander Korotkov	2024-04-07
\| \| \| \| \| \| \| \| \| \| \|	Even though waiting for replay LSN happens without an explicit transaction, AbortTransaction() is responsible for cleaning up the shared memory if an error is thrown in a stored procedure. So, we need to call WaitLSNCleanup() there to clean up after an unexpected error occurs while waiting for replay LSN. Discussion: https://postgr.es/m/202404051815.eri4u5q6oj26%40alvherre.pgsql Author: Alvaro Herrera
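The failure mode can be illustrated with `setjmp`/`longjmp`, which is roughly how PostgreSQL's `ereport(ERROR)` unwinds to an error handler. In this hypothetical sketch, `wait_lsn_cleanup` and `abort_transaction` stand in for `WaitLSNCleanup()` and `AbortTransaction()`, and the per-process `waiting` array is an assumed toy model of the shared-memory state.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Toy "shared memory": one slot per process saying whether it waits for an LSN. */
#define MAX_PROCS 4
static bool waiting[MAX_PROCS];

static jmp_buf error_context;   /* stand-in for the error longjmp target */

/* Hypothetical stand-in for WaitLSNCleanup(): idempotent removal of our entry. */
static void
wait_lsn_cleanup(int procno)
{
    waiting[procno] = false;
}

/* Hypothetical stand-in for AbortTransaction(): releases per-backend state. */
static void
abort_transaction(int procno)
{
    wait_lsn_cleanup(procno);
}

/* Register as a waiter, then hit an error before the wait completes. */
static void
wait_for_lsn_then_fail(int procno)
{
    waiting[procno] = true;
    longjmp(error_context, 1);          /* simulate an ERROR being thrown */
}

/* Returns true if the process slot was cleaned up after the error. */
static bool
run_procedure(int procno)
{
    assert(procno >= 0 && procno < MAX_PROCS);
    if (setjmp(error_context) == 0)
        wait_for_lsn_then_fail(procno);
    else
        abort_transaction(procno);      /* error path */
    return !waiting[procno];
}
```

Without the call in `abort_transaction`, the slot would stay marked as waiting after the error. Because the cleanup runs on every abort, it should be safe to call even when the process was not waiting, which is why the sketch's `wait_lsn_cleanup` is idempotent.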
*	Clarify what is protected by WaitLSNLock	Alexander Korotkov	2024-04-07
\| \| \| \| \| \| \| \| \|	Not just WaitLSNState.waitersHeap, but also WaitLSNState.procInfos and updates of WaitLSNState.minWaitedLSN are protected by WaitLSNLock. There is one exception, now documented: the fast-path check of the WaitLSNProcInfo.inHeap flag. Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
*	Use an LWLock instead of a spinlock in waitlsn.c	Alexander Korotkov	2024-04-07
\| \| \| \| \| \| \|	This should prevent busy-waiting when the number of waiting processes is high. Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql Author: Alvaro Herrera
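A hedged C11 sketch of the locking rules described in the two commits above. A blocking `pthread_mutex_t` stands in for the LWLock (contended waiters block instead of busy-waiting), it guards the per-process entries and the recomputation of `minWaitedLSN`, and `inHeap` is read without the lock on the delete fast path. `WaitState`, `ProcInfo`, `add_waiter` and `delete_waiter` are hypothetical, and a linear scan replaces the real waiters heap for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PROCS 8
#define INVALID_LSN UINT64_MAX

typedef struct ProcInfo
{
    uint64_t    waitLSN;   /* LSN this process waits for; protected by lock */
    atomic_bool inHeap;    /* may be read without the lock (fast path) */
} ProcInfo;

typedef struct WaitState
{
    pthread_mutex_t  lock;          /* blocks rather than spins, like an LWLock */
    ProcInfo         procInfos[MAX_PROCS];
    _Atomic uint64_t minWaitedLSN;  /* written only while holding lock */
} WaitState;

static WaitState state = {.lock = PTHREAD_MUTEX_INITIALIZER,
                          .minWaitedLSN = INVALID_LSN};

/* Recompute the minimum waited-for LSN; caller must hold state.lock. */
static void
update_min_locked(void)
{
    uint64_t min = INVALID_LSN;

    for (int i = 0; i < MAX_PROCS; i++)
        if (atomic_load(&state.procInfos[i].inHeap) &&
            state.procInfos[i].waitLSN < min)
            min = state.procInfos[i].waitLSN;
    atomic_store(&state.minWaitedLSN, min);
}

static void
add_waiter(int procno, uint64_t lsn)
{
    assert(procno >= 0 && procno < MAX_PROCS);
    pthread_mutex_lock(&state.lock);
    state.procInfos[procno].waitLSN = lsn;
    atomic_store(&state.procInfos[procno].inHeap, true);
    update_min_locked();
    pthread_mutex_unlock(&state.lock);
}

static void
delete_waiter(int procno)
{
    assert(procno >= 0 && procno < MAX_PROCS);
    /*
     * Fast path without the lock: only the owning process sets its own flag
     * to true, so reading false is reliable; a stale true is rechecked below.
     */
    if (!atomic_load(&state.procInfos[procno].inHeap))
        return;
    pthread_mutex_lock(&state.lock);
    if (atomic_load(&state.procInfos[procno].inHeap))
    {
        atomic_store(&state.procInfos[procno].inHeap, false);
        update_min_locked();
    }
    pthread_mutex_unlock(&state.lock);
}
```

Keeping `minWaitedLSN` readable without the lock lets a hypothetical replay loop compare against it cheaply, while every change to it happens under the same lock as the waiter entries.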
*	BitmapHeapScan: begin scan after bitmap creation	Tomas Vondra	2024-04-06
\| \| \| \| \| \| \| \| \| \| \| \| \|	Change the order so that the table scan is initialized only after initializing the index scan and building the bitmap. This is mostly a cosmetic change for now, but later commits will need to pass parameters to table_beginscan_bm() that are unavailable in ExecInitBitmapHeapScan(). Author: Melanie Plageman Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas Discussion: https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com