path: root/src/include/port/pg_bitutils.h
* Optimize popcount functions with ARM SVE intrinsics.  (Nathan Bossart, 2025-03-28)

  This commit introduces SVE implementations of pg_popcount{32,64}. Unlike the Neon versions, we need an additional configure-time check to determine if the compiler supports SVE intrinsics, and we need a runtime check to determine if the current CPU supports SVE instructions.

  Our testing showed that the SVE implementations are much faster for larger inputs and are comparable to the status quo for smaller inputs.

  Author: "Devanga.Susmitha@fujitsu.com" <Devanga.Susmitha@fujitsu.com>
  Co-authored-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
  Co-authored-by: "Malladi, Rama" <ramamalladi@hotmail.com>
  Reviewed-by: John Naylor <johncnaylorls@gmail.com>
  Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
  Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
  Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com
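  For illustration only, a minimal SVE popcount loop might look like the sketch below. The names are hypothetical, not the committed code, and the real implementations also handle trailing bytes:

      #include <arm_sve.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Build with SVE enabled, e.g. -march=armv8-a+sve. */
      static uint64_t
      popcount_words_sve(const uint64_t *words, size_t nwords)
      {
          uint64_t    popcnt = 0;

          for (size_t i = 0; i < nwords; i += svcntd())
          {
              /* the predicate masks off lanes past the end of the buffer */
              svbool_t    pred = svwhilelt_b64_u64(i, nwords);
              svuint64_t  vec = svld1_u64(pred, words + i);

              /* per-lane bit count, then horizontal sum of active lanes */
              popcnt += svaddv_u64(pred, svcnt_u64_x(pred, vec));
          }
          return popcnt;
      }

  On Linux, the runtime check mentioned above can be done with something like getauxval(AT_HWCAP) & HWCAP_SVE before selecting this implementation.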
* Optimize popcount functions with ARM Neon intrinsics.  (Nathan Bossart, 2025-03-28)

  This commit introduces Neon implementations of pg_popcount{32,64}, pg_popcount(), and pg_popcount_masked(). As in simd.h, we assume that all available AArch64 hardware supports Neon, so we don't need any new configure-time or runtime checks.

  Some compilers already emit Neon instructions for these functions, but our hand-rolled implementations for pg_popcount() and pg_popcount_masked() performed better in testing, likely due to better instruction-level parallelism.

  Author: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
  Reviewed-by: John Naylor <johncnaylorls@gmail.com>
  Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
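  A minimal sketch of the Neon technique (hypothetical names; the committed code also unrolls the loop and uses a lookup table for the tail):

      #include <arm_neon.h>
      #include <stddef.h>
      #include <stdint.h>

      static uint64_t
      popcount_bytes_neon(const uint8_t *buf, size_t bytes)
      {
          uint64_t    popcnt = 0;

          while (bytes >= 16)
          {
              uint8x16_t  vec = vld1q_u8(buf);

              /* vcntq_u8 counts bits per byte; vaddvq_u8 sums the 16 lanes */
              popcnt += vaddvq_u8(vcntq_u8(vec));
              buf += 16;
              bytes -= 16;
          }
          while (bytes--)
              popcnt += __builtin_popcount(*buf++);   /* scalar tail */
          return popcnt;
      }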
* Rename TRY_POPCNT_FAST to TRY_POPCNT_X86_64.  (Nathan Bossart, 2025-03-28)

  This macro protects x86_64-specific code, and a subsequent commit will introduce AArch64-specific versions of that code. To prevent confusion, let's rename it to clearly indicate that it's for x86_64.

  We should likely move this code to its own file (perhaps merging it with the AVX-512 popcount code), but that is left as a future exercise.

  Reviewed-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
  Reviewed-by: John Naylor <johncnaylorls@gmail.com>
  Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
* Fix comment about AVX-512 popcount support.  (Nathan Bossart, 2025-01-22)

  Since commit f78667bd91, we've used __attribute__((target(...))) instead of extra compiler flags for AVX-512 support, but this comment still says that we put the code in a separate file because it might require extra compiler flags. Let's just remove that part of the comment.
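  For readers unfamiliar with the mechanism: the target attribute scopes an ISA extension to a single function, so the containing file no longer needs extension-specific compiler flags. A minimal illustration (not the committed code):

      #include <immintrin.h>

      /* Only this function may use AVX-512 instructions; the rest of the
       * translation unit is compiled with the default flags. */
      __attribute__((target("avx512vpopcntdq,avx512f")))
      static __m512i
      per_lane_popcount(__m512i vec)
      {
          return _mm512_popcnt_epi64(vec);
      }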
* Update copyright for 2025  (Bruce Momjian, 2025-01-01)

  Backpatch-through: 13
* Use <stdint.h> and <inttypes.h> for c.h integers.  (Thomas Munro, 2024-12-04)

  Redefine our exact-width types with standard C99 types and macros, including int64_t, INT64_MAX, INT64_C(), PRId64, etc. We were already using <stdint.h> types in a few places.

  One complication is that Windows' <inttypes.h> uses format strings like "%I64d", "%I32", "%I" for PRI*64, PRI*32, PRI*PTR, instead of mapping to other standardized format strings like "%lld" etc. as seen on other known systems. Teach our snprintf.c to understand them.

  This removes a lot of configure clutter, and should also allow 64-bit numbers and other standard types to be used in localized messages without casting.

  Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
  Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
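  A small usage example of the standard macros in question:

      #include <inttypes.h>
      #include <stdint.h>
      #include <stdio.h>

      int
      main(void)
      {
          int64_t bytes = INT64_C(5368709120);

          /* PRId64 expands to the right conversion for the platform
           * ("lld" on most systems, "I64d" with Windows' <inttypes.h>,
           * which snprintf.c now understands) */
          printf("processed %" PRId64 " bytes\n", bytes);
          return 0;
      }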
* Optimize visibilitymap_count() with AVX-512 instructions.  (Nathan Bossart, 2024-04-06)

  Commit 792752af4e added infrastructure for using AVX-512 intrinsic functions, and this commit uses that infrastructure to optimize visibilitymap_count(). Specifically, a new pg_popcount_masked() function is introduced that applies a bitmask to every byte in the buffer prior to calculating the population count, which is used to filter out the all-visible or all-frozen bits as needed.

  Platforms without AVX-512 support should also see a nice speedup due to the reduced number of calls to a function pointer.

  Co-authored-by: Ants Aasma
  Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
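  A portable sketch of the masked-popcount idea (not the committed implementation; pg_number_of_ones is the real lookup table from pg_bitutils.h, the function name here is hypothetical):

      #include <stddef.h>
      #include <stdint.h>

      /* maps a byte value to its population count, as in pg_bitutils */
      extern const uint8_t pg_number_of_ones[256];

      /* AND each byte with the mask before counting; a visibility-map
       * caller can pass a mask keeping only the all-visible or only the
       * all-frozen bit of each two-bit pair */
      static uint64_t
      popcount_masked_portable(const char *buf, size_t bytes, uint8_t mask)
      {
          uint64_t    popcnt = 0;

          for (size_t i = 0; i < bytes; i++)
              popcnt += pg_number_of_ones[((uint8_t) buf[i]) & mask];
          return popcnt;
      }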
* Optimize pg_popcount() with AVX-512 instructions.  (Nathan Bossart, 2024-04-06)

  Presently, pg_popcount() processes data in 32-bit or 64-bit chunks when possible. Newer hardware that supports AVX-512 instructions can use 512-bit chunks, which provides a nice speedup, especially for larger buffers. This commit introduces the infrastructure required to detect compiler and CPU support for the required AVX-512 intrinsic functions, and it adds a new pg_popcount() implementation that uses these functions.

  If CPU support for this optimized implementation is detected at runtime, a function pointer is updated so that it is used by subsequent calls to pg_popcount(). Most of the existing in-tree calls to pg_popcount() should benefit from these instructions, and calls with smaller buffers should at least not regress compared to v16.

  The new infrastructure introduced by this commit can also be used to optimize visibilitymap_count(), but that is left for a follow-up commit.

  Co-authored-by: Paul Amonson, Ants Aasma
  Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley
  Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
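  A sketch of the 512-bit-chunk technique, assuming the VPOPCNTDQ extension (hypothetical names; the committed code is reached only through the runtime-selected function pointer and also handles the tail and aligned loads):

      #include <immintrin.h>
      #include <stddef.h>
      #include <stdint.h>

      __attribute__((target("avx512vpopcntdq,avx512f")))
      static uint64_t
      popcount_avx512_sketch(const char *buf, size_t bytes)
      {
          __m512i     accum = _mm512_setzero_si512();

          for (; bytes >= 64; buf += 64, bytes -= 64)
          {
              __m512i vec = _mm512_loadu_si512((const void *) buf);

              /* 64-bit-lane popcounts, accumulated per lane */
              accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(vec));
          }
          /* trailing (bytes % 64) bytes handled separately (omitted) */
          return (uint64_t) _mm512_reduce_add_epi64(accum);
      }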
* Inline pg_popcount() for small buffers.  (Nathan Bossart, 2024-04-03)

  If there aren't many bytes to process, the function call overhead of the optimized implementation isn't worth taking, so instead we inline a loop that consults pg_number_of_ones in that case. If there are many bytes to process, we accept the function call overhead because the optimized versions are likely to be faster.

  The threshold at which we use the optimized implementation is set to the smallest amount of data required to use special popcount instructions.

  Reviewed-by: Alvaro Herrera, Tom Lane
  Discussion: https://postgr.es/m/20240402155301.GA2750455%40nathanxps13
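  The shape of the header-inlined wrapper, as a sketch (the threshold and the pg_popcount_optimized name are illustrative assumptions, not necessarily the tree's exact definitions):

      #include <stdint.h>

      extern const uint8_t pg_number_of_ones[256];
      extern uint64_t pg_popcount_optimized(const char *buf, int bytes);

      static inline uint64_t
      pg_popcount_sketch(const char *buf, int bytes)
      {
          /* tiny inputs: count in place, no function call */
          if (bytes < 8)
          {
              uint64_t    popcnt = 0;

              while (bytes--)
                  popcnt += pg_number_of_ones[(unsigned char) *buf++];
              return popcnt;
          }
          /* larger inputs: the call overhead pays for itself */
          return pg_popcount_optimized(buf, bytes);
      }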
* Inline pg_popcount{32,64} into pg_popcount().  (Nathan Bossart, 2024-03-19)

  On some systems, calls to pg_popcount{32,64} are indirected through a function pointer. This commit converts pg_popcount() to a function pointer on those systems so that we can inline the appropriate pg_popcount{32,64} implementations into each of the pg_popcount() implementations. Since pg_popcount() may call pg_popcount{32,64} several times, this can significantly improve its performance.

  Suggested-by: David Rowley
  Reviewed-by: David Rowley
  Discussion: https://postgr.es/m/CAApHDvrb7MJRB6JuKLDEY2x_LKdFHwVbogpjZBCX547i5%2BrXOQ%40mail.gmail.com
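  The point is to pay for one indirect call per buffer rather than one per word. A sketch under hypothetical names:

      #include <stdint.h>
      #include <string.h>

      /* stand-in for whichever pg_popcount64 variant this build inlines */
      static inline uint64_t
      popcount64_inline(uint64_t word)
      {
          return (uint64_t) __builtin_popcountll(word);
      }

      static uint64_t
      pg_popcount_impl(const char *buf, int bytes)
      {
          uint64_t    popcnt = 0;

          while (bytes >= 8)
          {
              uint64_t    word;

              memcpy(&word, buf, 8);
              popcnt += popcount64_inline(word);  /* inlined, not indirected */
              buf += 8;
              bytes -= 8;
          }
          while (bytes--)
              popcnt += popcount64_inline((unsigned char) *buf++);
          return popcnt;
      }

      /* callers make exactly one indirect call per invocation */
      uint64_t (*pg_popcount_ptr)(const char *buf, int bytes) = pg_popcount_impl;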
* Fix link error for test_radixtree module on Windows  (John Naylor, 2024-03-08)

  Add PGDLLIMPORT to pg_popcount32/64. In passing, fix a typo.

  Diagnosis by Masahiko Sawada, patch by David Rowley

  Per buildfarm members drongo and fairywren

  Discussion: https://postgr.es/m/CAD21AoAMm1mQd%3Dw4PrfrKK%3DOMP8j8%3D7ntJRPF8%2B%3D10iUuvwiCA%40mail.gmail.com
  Discussion: https://postgr.es/m/CAApHDvov7724UrD1Ug0D1eV%2B9Pd_x5VEQmw-6HVG9w1WdCxXPA%40mail.gmail.com
* Update copyright for 2024  (Bruce Momjian, 2024-01-03)

  Reported-by: Michael Paquier
  Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
  Backpatch-through: 12
* Bring some MSVC asserts in line with other platforms  (John Naylor, 2023-07-31)

  MSVC's _BitScan* functions return a boolean indicating whether any bits were set in the input, and we were previously asserting that they returned true, per our API. This is correct. However, other platforms simply assert that the input is non-zero, so do that to be more consistent.

  Noted while investigating a hypothesis from Ranier Vilela about undefined behavior, but this is not his proposed patch.

  Discussion: https://www.postgresql.org/message-id/CAEudQAoDhUZyKGJ1vbMGcgVUOcsixe-%3DjcVaDWarqkUg163D2w%40mail.gmail.com
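  The convention in question, sketched (standard assert() shown here in place of PostgreSQL's Assert macro):

      #ifdef _MSC_VER
      #include <assert.h>
      #include <intrin.h>

      static inline int
      leftmost_one_pos32_msvc(unsigned long word)
      {
          unsigned long result;

          /* assert on the input, as the other platforms do, rather than
           * on _BitScanReverse's boolean return value */
          assert(word != 0);
          _BitScanReverse(&result, word);
          return (int) result;
      }
      #endif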
* Don't use _BitScanForward64/_BitScanReverse64 on 32-bit MSVC builds  (David Rowley, 2023-06-08)

  677319746 added support for making use of MSVC's bit scanning functions. However, that commit failed to consider 32-bit MSVC builds where the 64-bit versions of these functions are unavailable. This resulted in compilation failures on 32-bit MSVC.

  Here we adjust the code so we fall back on the manual way of finding the bit positions for 64-bit integers when building on 32-bit MSVC.

  Bug: #17967
  Reported-by: Youmiu Mo
  Discussion: https://postgr.es/m/17967-cd21e34a314141b2@postgresql.org
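  To illustrate the portability hazard: the committed fix falls back on the existing manual implementation, but one could also scan a 64-bit value with the 32-bit intrinsic in two halves, as in this sketch:

      #if defined(_MSC_VER)
      #include <assert.h>
      #include <intrin.h>
      #include <stdint.h>

      static inline int
      leftmost_one_pos64_sketch(uint64_t word)
      {
          unsigned long result;

          assert(word != 0);
      #if defined(_M_AMD64) || defined(_M_ARM64)
          /* 64-bit targets: the 64-bit intrinsic exists */
          _BitScanReverse64(&result, word);
          return (int) result;
      #else
          /* 32-bit MSVC: scan the high half first, then the low half */
          if (_BitScanReverse(&result, (unsigned long) (word >> 32)))
              return (int) result + 32;
          _BitScanReverse(&result, (unsigned long) word);
          return (int) result;
      #endif
      }
      #endif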
* Remove newly added asserts from pg_bitutils.h  (John Naylor, 2023-02-22)

  These were valuable during development, but are unlikely to tell us anything going forward. This reverts 204b0cbec and adjusts the content of 677319746 to more closely match the more-readable original style.

  Per review from Tom Lane

  Discussion: https://www.postgresql.org/message-id/3567481.1676906261%40sss.pgh.pa.us
* Add MSVC support for pg_leftmost_one_pos32() and friends  (John Naylor, 2023-02-20)

  To allow testing for general support for fast bitscan intrinsics, add symbols HAVE_BITSCAN_REVERSE and HAVE_BITSCAN_FORWARD.

  Also do related cleanup in AllocSetFreeIndex(): Previously, we tested for HAVE__BUILTIN_CLZ and copied the relevant internals of pg_leftmost_one_pos32(), with a special fallback that does less work than the general fallback for that function. Now that we have a more general test, we just call pg_leftmost_one_pos32() directly for platforms with intrinsic support. On gcc at least, there is no difference in the binary for non-assert builds.

  Discussion: https://www.postgresql.org/message-id/CAFBsxsEPc%2BFnX_0vmmQ5DHv60sk4rL_RZJ%2BMD6ei%3D76L0kFMvA%40mail.gmail.com
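  A sketch of how the general capability symbol simplifies callers (illustrative only; the gcc builtin is shown, and the MSVC branch would use _BitScanReverse, while the real AllocSetFreeIndex() also shifts by ALLOC_MINBITS):

      #include <stdint.h>

      /* capability symbols as added by the commit */
      #if defined(HAVE__BUILTIN_CLZ) || defined(_MSC_VER)
      #define HAVE_BITSCAN_REVERSE
      #endif

      /* one general test instead of per-builtin #ifdefs; assumes size > 1 */
      static inline int
      free_list_index(uint32_t size)
      {
      #ifdef HAVE_BITSCAN_REVERSE
          /* equivalent to pg_leftmost_one_pos32(size - 1) + 1 */
          return 31 - __builtin_clz(size - 1) + 1;
      #else
          int     idx = 0;

          for (uint32_t t = size - 1; t != 0; t >>= 1)
              idx++;
          return idx;
      #endif
      }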
* Add assert checking to pg_leftmost_one_pos32() and friends  (John Naylor, 2023-02-20)

  Discussion: https://www.postgresql.org/message-id/CAFBsxsEPc%2BFnX_0vmmQ5DHv60sk4rL_RZJ%2BMD6ei%3D76L0kFMvA%40mail.gmail.com
* Update copyright for 2023  (Bruce Momjian, 2023-01-02)

  Backpatch-through: 11
* Extend size_t support in pg_bitutils.h.  (Thomas Munro, 2022-07-22)

  Use a more compact notation that allows us to add more size_t variants as required. This will be used by a later commit.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/CA%2BhUKG%2B7dSX1XF8yFGmYk-%3D48dbjH2kmzZj16XvhbrWP-9BzRg%40mail.gmail.com
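  The compact notation plausibly looks like the following sketch: map each size_t helper onto the right fixed-width implementation once, keyed on the configure-detected SIZEOF_SIZE_T (names assumed from pg_bitutils.h):

      #if SIZEOF_SIZE_T == 4
      #define pg_leftmost_one_pos_size_t pg_leftmost_one_pos32
      #define pg_nextpower2_size_t       pg_nextpower2_32
      #else
      #define pg_leftmost_one_pos_size_t pg_leftmost_one_pos64
      #define pg_nextpower2_size_t       pg_nextpower2_64
      #endif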
* Use bitwise rotate functions in more places  (John Naylor, 2022-02-20)

  There were a number of places in the code that used bespoke bit-twiddling expressions to do bitwise rotation. While we've had pg_rotate_right32() for a while now, we hadn't gotten around to standardizing on that. Do so now. Since many potential call sites look more natural with the "left" equivalent, add that function too.

  Reviewed by Tom Lane and Yugo Nagata

  Discussion: https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com
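  The general shape of such helpers, as a sketch (n must be in [1, 31] here to avoid an undefined full-width shift):

      #include <stdint.h>

      static inline uint32_t
      rotate_left32(uint32_t word, int n)
      {
          return (word << n) | (word >> (32 - n));
      }

      static inline uint32_t
      rotate_right32(uint32_t word, int n)
      {
          return (word >> n) | (word << (32 - n));
      }

  Modern compilers recognize this idiom and emit a single rotate instruction, which is why a shared helper beats scattered bespoke expressions.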
* Update copyright for 2022  (Bruce Momjian, 2022-01-07)

  Backpatch-through: 10
* Simplify declaring variables exported from libpgcommon and libpgport.  (Tom Lane, 2021-11-29)

  This reverts commits c2d1eea9e and 11b500072, as well as similar hacks elsewhere, in favor of setting up the PGDLLIMPORT macro so that it can just be used unconditionally. That can work because in frontend code, we need no marking in either the defining or consuming files for a variable exported from these libraries; and frontend code has no need to access variables exported from the core backend, either.

  While at it, write some actual documentation about the PGDLLIMPORT and PGDLLEXPORT macros.

  Patch by me, based on a suggestion from Robert Haas.

  Discussion: https://postgr.es/m/1160385.1638165449@sss.pgh.pa.us
* Use direct function calls for pg_popcount{32,64} on non-x86 platforms  (John Naylor, 2021-08-16)

  Previously, all pg_popcount{32,64} calls were indirected through a function pointer, even though we had no fast implementation for non-x86 platforms. Instead, for those platforms use wrappers around the pg_popcount{32,64}_slow functions.

  Review and additional hacking by David Rowley
  Reviewed by Álvaro Herrera

  Discussion: https://www.postgresql.org/message-id/flat/CAFBsxsE7otwnfA36Ly44zZO%2Bb7AEWHRFANxR1h1kxveEV%3DghLQ%40mail.gmail.com
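  A sketch of the split (TRY_POPCNT_FAST was the guard macro at the time; it was later renamed TRY_POPCNT_X86_64, see above):

      #include <stdint.h>

      extern int pg_popcount32_slow(uint32_t word);   /* portable version */

      #ifdef TRY_POPCNT_FAST
      /* x86_64: resolved at runtime via CPUID, so keep the pointer */
      extern int (*pg_popcount32) (uint32_t word);
      #else
      /* elsewhere there is nothing to dispatch: a direct, inlinable
       * wrapper avoids the pointless indirection */
      static inline int
      pg_popcount32(uint32_t word)
      {
          return pg_popcount32_slow(word);
      }
      #endif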
* Get rid of artificial restriction on hash table sizes on Windows.  (Tom Lane, 2021-07-25)

  The point of introducing the hash_mem_multiplier GUC was to let users reproduce the old behavior of hash aggregation, i.e. that it could use more than work_mem at need. However, the implementation failed to get the job done on Win64, where work_mem is clamped to 2GB to protect various places that calculate memory sizes using "long int". As written, the same clamp was applied to hash_mem. This resulted in severe performance regressions for queries requiring a bit more than 2GB for hash aggregation, as they now spill to disk and there's no way to stop that.

  Getting rid of the work_mem restriction seems like a good idea, but it's a big job and could not conceivably be back-patched. However, there's only a fairly small number of places that are concerned with the hash_mem value, and it turns out to be possible to remove the restriction there without too much code churn or any ABI breaks. So, let's do that for now to fix the regression, and leave the larger task for another day.

  This patch does introduce a bit more infrastructure that should help with the larger task, namely pg_bitutils.h support for working with size_t values.

  Per gripe from Laurent Hasson. Back-patch to v13 where the behavior change came in.

  Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
  Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
* Update copyright for 2021  (Bruce Momjian, 2021-01-02)

  Backpatch-through: 9.5
* Modify various power 2 calculations to use new helper functions  (David Rowley, 2020-04-08)

  First pass of modifying various places that obtain the next power of 2 of a number and make them use the new functions added in pg_bitutils.h instead.

  This also removes the _hash_log2() function. There are no longer any callers in core. Other users can swap their _hash_log2(n) call to make use of pg_ceil_log2_32(n).

  Author: David Fetter, with some minor adjustments by me
  Reviewed-by: John Naylor, Jesse Zhang
  Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
* Add functions to calculate the next power of 2  (David Rowley, 2020-04-08)

  There are many areas in the code where we need to determine the next highest power of 2 of a given number. We tend to always do that in an ad-hoc way each time, generally with some tight for loop which performs a bitshift left once per loop and goes until it finds a number above the given number. Here we add two generic functions which make use of the existing pg_leftmost_one_pos* functions which, when available, will allow us to calculate the next power of 2 without any looping.

  Here we don't add any code which uses these new functions. That will be done in follow-up commits.

  Author: David Fetter, with some minor adjustments by me
  Reviewed-by: John Naylor, Jesse Zhang
  Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
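  The loop-free idea, sketched with the gcc builtin standing in for pg_leftmost_one_pos64 (valid for 0 < num <= UINT64_MAX / 2 + 1):

      #include <stdint.h>

      static inline uint64_t
      nextpower2_64(uint64_t num)
      {
          if ((num & (num - 1)) == 0)
              return num;             /* already a power of 2 (incl. 1) */
          /* 63 - clz gives the position of the leftmost one bit; the
           * next power of 2 is one bit to its left */
          return UINT64_C(2) << (63 - __builtin_clzll(num));
      }

  For example, nextpower2_64(5) finds the leftmost one bit of 5 (position 2) and returns 2 << 2 == 8, with no shift-until-bigger loop.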
* Update copyrights for 2020  (Bruce Momjian, 2020-01-01)

  Backpatch-through: update all files in master, backpatch legal files through 9.4
* Rotate instead of shifting hash join batch number.  (Thomas Munro, 2019-12-24)

  Our algorithm for choosing batch numbers turned out not to work effectively for multi-billion key inner relations. We would use more hash bits than we have, and effectively concentrate all tuples into a smaller number of batches than we intended.

  While ideally we should switch to wider hashes, for now, change the algorithm to one that effectively gives up bits from the bucket number when we don't have enough bits. That means we'll finish up with longer bucket chains than would be ideal, but that's better than having batches that don't fit in work_mem and can't be divided.

  Back-patch to all supported releases.

  Author: Thomas Munro
  Reviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussion
  Reported-by: James Coleman
  Discussion: https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
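  A sketch of the bucket/batch split this enables (nbuckets and nbatch are powers of 2, and 0 < log2_nbuckets < 32 is assumed; pg_rotate_right32, which this commit added to pg_bitutils.h, is sketched locally here):

      #include <stdint.h>

      static inline uint32_t
      rotate_right32(uint32_t word, int n)
      {
          return (word >> n) | (word << (32 - n));
      }

      /* rotating instead of shifting means that when the hash runs out
       * of bits, batch selection wraps around and reuses bucket bits
       * rather than collapsing into few batches */
      static inline void
      get_bucket_and_batch(uint32_t hashvalue, int log2_nbuckets,
                           uint32_t nbuckets, uint32_t nbatch,
                           uint32_t *bucketno, uint32_t *batchno)
      {
          *bucketno = hashvalue & (nbuckets - 1);
          *batchno = rotate_right32(hashvalue, log2_nbuckets) & (nbatch - 1);
      }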
* Fix more typos and inconsistencies in the tree  (Michael Paquier, 2019-06-17)

  Author: Alexander Lakhin
  Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
* Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT.  (Tom Lane, 2019-02-15)

  Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions.

  On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases.

  I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays.

  While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code.

  David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane

  Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
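  The CPUID-plus-function-pointer pattern, sketched for gcc/clang (hypothetical names; the first call goes through a chooser that repoints the pointer, so later calls pay only the indirection):

      #include <cpuid.h>
      #include <stdint.h>

      static int popcount32_choose(uint32_t word);
      static int (*popcount32_ptr)(uint32_t word) = popcount32_choose;

      static int
      popcount32_fast(uint32_t word)
      {
          int     res;

          /* POPCNT opcode via inline asm, usable even when the compiler
           * defaults don't permit -mpopcnt codegen */
          __asm__ __volatile__("popcntl %1, %0" : "=q"(res) : "rm"(word) : "cc");
          return res;
      }

      static int
      popcount32_slow(uint32_t word)
      {
          int     n = 0;

          for (; word != 0; word >>= 1)
              n += word & 1;
          return n;
      }

      static int
      popcount32_choose(uint32_t word)
      {
          unsigned int eax, ebx, ecx, edx;

          if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & bit_POPCNT))
              popcount32_ptr = popcount32_fast;
          else
              popcount32_ptr = popcount32_slow;
          return popcount32_ptr(word);
      }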
* Revert attempts to use POPCNT etc instructions  (Alvaro Herrera, 2019-02-15)

  This reverts commits fc6c72747ae6, 109de05cbb03, d0b4663c23b7 and 711bab1e4d19. Somebody will have to try harder before submitting this patch again. I've spent entirely too much time on it already, and the #ifdef maze yet to be written in order for it to build at all got on my nerves. The amount of work needed to get a platform-specific performance improvement that's barely above the noise level is not worth it.
* Fix compiler builtin usage in new pg_bitutils.c  (Alvaro Herrera, 2019-02-15)

  Split out these new functions in three parts: one in a new file that uses the compiler builtin and gets compiled with the -mpopcnt compiler option if it exists; another one that uses the compiler builtin but not the compiler option; and finally the fallback with open-coded algorithms.

  Split out the configure logic: in the original commit, it was selecting to use the -mpopcnt compiler switch together with deciding whether to use the compiler builtin, but those two things are really separate. Split them out. Also, expose whether the builtin exists to Makefile.global, so that src/port's Makefile can decide whether to compile the hw-optimized file.

  Remove CPUID test for CTZ/CLZ. Make pg_{right,left}most_ones use either the compiler intrinsic or open-coded algo; trying to use the HW-optimized version is a waste of time. Make them static inline functions.

  Discussion: https://postgr.es/m/20190213221719.GA15976@alvherre.pgsql
* Add basic support for using the POPCNT and SSE4.2's LZCNT opcodes  (Alvaro Herrera, 2019-02-13)

  These opcodes have been around in the AMD world since 2007, and 2008 in the case of Intel. They're supported in GCC and Clang via some __builtin macros. The opcodes may be unavailable during runtime, in which case we fall back on a C-based implementation of the code. In order to get the POPCNT instruction we must pass the -mpopcnt option to the compiler. We do this only for the pg_bitutils.c file.

  David Rowley (with fragments taken from a patch by Thomas Munro)

  Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com