path: root/src/port/pg_bitutils.c
* Optimize popcount functions with ARM Neon intrinsics. (Nathan Bossart, 2025-03-28)

  This commit introduces Neon implementations of pg_popcount{32,64},
  pg_popcount(), and pg_popcount_masked().  As in simd.h, we assume that all
  available AArch64 hardware supports Neon, so we don't need any new
  configure-time or runtime checks.  Some compilers already emit Neon
  instructions for these functions, but our hand-rolled implementations for
  pg_popcount() and pg_popcount_masked() performed better in testing, likely
  due to better instruction-level parallelism.

  Author: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
  Reviewed-by: John Naylor <johncnaylorls@gmail.com>
  Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
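  A minimal sketch of the kind of Neon popcount loop this commit describes,
  assuming GCC/Clang on AArch64; the function name is invented, and the real
  PostgreSQL code unrolls more aggressively to extract instruction-level
  parallelism:

    #include <arm_neon.h>
    #include <stddef.h>
    #include <stdint.h>

    static uint64_t
    neon_popcount_sketch(const char *buf, size_t bytes)
    {
        uint64x2_t  accum = vdupq_n_u64(0);
        uint64_t    popcnt = 0;

        /* Count bits 16 bytes at a time: vcntq_u8 yields per-byte popcounts. */
        for (; bytes >= 16; buf += 16, bytes -= 16)
        {
            uint8x16_t  chunk = vld1q_u8((const uint8_t *) buf);

            /* Widen the per-byte counts pairwise and accumulate into 64-bit lanes. */
            accum = vaddq_u64(accum,
                              vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(vcntq_u8(chunk)))));
        }

        popcnt = vgetq_lane_u64(accum, 0) + vgetq_lane_u64(accum, 1);

        /* Finish the trailing bytes one at a time. */
        for (; bytes > 0; buf++, bytes--)
            popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf);

        return popcnt;
    }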
* Rename TRY_POPCNT_FAST to TRY_POPCNT_X86_64. (Nathan Bossart, 2025-03-28)

  This macro protects x86_64-specific code, and a subsequent commit will
  introduce AArch64-specific versions of that code.  To prevent confusion,
  let's rename it to clearly indicate that it's for x86_64.  We should likely
  move this code to its own file (perhaps merging it with the AVX-512
  popcount code), but that is left as a future exercise.

  Reviewed-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
  Reviewed-by: John Naylor <johncnaylorls@gmail.com>
  Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
* Update copyright for 2025 (Bruce Momjian, 2025-01-01)

  Backpatch-through: 13
* Use <stdint.h> and <inttypes.h> for c.h integers. (Thomas Munro, 2024-12-04)

  Redefine our exact width types with standard C99 types and macros,
  including int64_t, INT64_MAX, INT64_C(), PRId64 etc.  We were already
  using <stdint.h> types in a few places.

  One complication is that Windows' <inttypes.h> uses format strings like
  "%I64d", "%I32", "%I" for PRI*64, PRI*32, PRI*PTR, instead of mapping to
  other standardized format strings like "%lld" etc. as seen on other known
  systems.  Teach our snprintf.c to understand them.

  This removes a lot of configure clutter, and should also allow 64-bit
  numbers and other standard types to be used in localized messages without
  casting.

  Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
  Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
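  A small illustration of what the standard macros buy us; the variable and
  message are made up, and the Windows-specific expansion in the comment is
  the one described above:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        int64_t     nrows = INT64_C(123456789012);

        /*
         * PRId64 expands to "lld" on most systems but to "I64d" on Windows;
         * src/port/snprintf.c now understands both, so no casting is needed.
         */
        printf("processed %" PRId64 " rows\n", nrows);
        return 0;
    }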
* Optimize visibilitymap_count() with AVX-512 instructions. (Nathan Bossart, 2024-04-06)

  Commit 792752af4e added infrastructure for using AVX-512 intrinsic
  functions, and this commit uses that infrastructure to optimize
  visibilitymap_count().  Specifically, a new pg_popcount_masked() function
  is introduced that applies a bitmask to every byte in the buffer prior to
  calculating the population count, which is used to filter out the
  all-visible or all-frozen bits as needed.  Platforms without AVX-512
  support should also see a nice speedup due to the reduced number of calls
  to a function pointer.

  Co-authored-by: Ants Aasma
  Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
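  A portable sketch of the masked-popcount idea, roughly what a non-SIMD
  fallback would do; the real pg_popcount_masked() is dispatched through a
  function pointer and also has AVX-512 (and later Neon) variants:

    #include <stddef.h>
    #include <stdint.h>

    static uint64_t
    popcount_masked_sketch(const char *buf, size_t bytes, uint8_t mask)
    {
        uint64_t    popcnt = 0;

        /* Apply the mask to every byte before counting its bits. */
        for (size_t i = 0; i < bytes; i++)
            popcnt += (uint64_t) __builtin_popcount((unsigned char) buf[i] & mask);

        return popcnt;
    }

  visibilitymap_count() can then pass a mask that keeps only the all-visible
  or only the all-frozen bit of each heap page's two-bit entry.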
* Optimize pg_popcount() with AVX-512 instructions. (Nathan Bossart, 2024-04-06)

  Presently, pg_popcount() processes data in 32-bit or 64-bit chunks when
  possible.  Newer hardware that supports AVX-512 instructions can use
  512-bit chunks, which provides a nice speedup, especially for larger
  buffers.  This commit introduces the infrastructure required to detect
  compiler and CPU support for the required AVX-512 intrinsic functions, and
  it adds a new pg_popcount() implementation that uses these functions.  If
  CPU support for this optimized implementation is detected at runtime, a
  function pointer is updated so that it is used by subsequent calls to
  pg_popcount().

  Most of the existing in-tree calls to pg_popcount() should benefit from
  these instructions, and calls with smaller buffers should at least not
  regress compared to v16.  The new infrastructure introduced by this commit
  can also be used to optimize visibilitymap_count(), but that is left for a
  follow-up commit.

  Co-authored-by: Paul Amonson, Ants Aasma
  Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, Alvaro Herrera, Andres Freund, David Rowley
  Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
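  A rough sketch of the 512-bit-at-a-time approach, assuming a compiler and
  CPU with AVX-512F plus the VPOPCNTDQ extension (e.g. built with
  -mavx512vpopcntdq); the actual implementation also handles unaligned heads
  and tails with masked loads and is only selected after runtime CPU checks:

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    static uint64_t
    avx512_popcount_sketch(const char *buf, size_t bytes)
    {
        __m512i     accum = _mm512_setzero_si512();
        uint64_t    popcnt = 0;

        /* Count bits 64 bytes at a time with the VPOPCNTQ instruction. */
        for (; bytes >= sizeof(__m512i); buf += sizeof(__m512i), bytes -= sizeof(__m512i))
        {
            __m512i     chunk = _mm512_loadu_si512((const void *) buf);

            accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(chunk));
        }

        popcnt = (uint64_t) _mm512_reduce_add_epi64(accum);

        /* Finish whatever is left with a simple byte loop. */
        for (; bytes > 0; buf++, bytes--)
            popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf);

        return popcnt;
    }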
* Inline pg_popcount() for small buffers. (Nathan Bossart, 2024-04-03)

  If there aren't many bytes to process, the function call overhead of the
  optimized implementation isn't worth taking, so instead we inline a loop
  that consults pg_number_of_ones in that case.  If there are many bytes to
  process, we accept the function call overhead because the optimized
  versions are likely to be faster.  The threshold at which we use the
  optimized implementation is set to the smallest amount of data required to
  use special popcount instructions.

  Reviewed-by: Alvaro Herrera, Tom Lane
  Discussion: https://postgr.es/m/20240402155301.GA2750455%40nathanxps13
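  A self-contained sketch of the small-buffer fast path; the 8-byte threshold
  and the names are illustrative (PostgreSQL's lookup table is called
  pg_number_of_ones, and the "optimized" stub here stands in for the
  function-pointer-dispatched implementation):

    #include <stddef.h>
    #include <stdint.h>

    /* Popcount of every possible byte value, generated with the classic trick. */
    #define B2(n)  (n), (n) + 1, (n) + 1, (n) + 2
    #define B4(n)  B2(n), B2((n) + 1), B2((n) + 1), B2((n) + 2)
    #define B6(n)  B4(n), B4((n) + 1), B4((n) + 1), B4((n) + 2)
    static const uint8_t number_of_ones[256] = {B6(0), B6(1), B6(1), B6(2)};

    /* Stand-in for the optimized, function-pointer-dispatched implementation. */
    static uint64_t
    popcount_optimized_stub(const char *buf, size_t bytes)
    {
        uint64_t    popcnt = 0;

        while (bytes--)
            popcnt += number_of_ones[(unsigned char) *buf++];
        return popcnt;
    }

    static inline uint64_t
    popcount_sketch(const char *buf, size_t bytes)
    {
        /* Below the threshold, the call overhead isn't worth it: loop inline. */
        if (bytes < 8)
        {
            uint64_t    popcnt = 0;

            while (bytes--)
                popcnt += number_of_ones[(unsigned char) *buf++];
            return popcnt;
        }

        /* Larger buffers amortize the call into the optimized version. */
        return popcount_optimized_stub(buf, bytes);
    }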
* Refactor code for setting pg_popcount* function pointers. (Nathan Bossart, 2024-04-02)

  Presently, there are three copies of this code, and a proposed follow-up
  patch would add more code to each copy.  This commit introduces a new
  inline function for this code and makes use of it in the
  pg_popcount*_choose functions, thereby reducing code duplication.

  Author: Paul Amonson
  Discussion: https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
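  An illustrative sketch of the consolidation, with invented names: each
  *_choose function calls one shared inline helper that points every
  pg_popcount* function pointer at the right implementation, instead of
  repeating the assignments (the CPU probe is a stand-in for the real CPUID
  check):

    #include <stdbool.h>
    #include <stdint.h>

    static int  popcount32_choose(uint32_t word);
    static int  popcount64_choose(uint64_t word);

    static int (*popcount32_ptr) (uint32_t) = popcount32_choose;
    static int (*popcount64_ptr) (uint64_t) = popcount64_choose;

    static bool
    popcnt_available(void)
    {
        return false;           /* stand-in for the runtime CPUID probe */
    }

    static int
    popcount64_slow(uint64_t word)
    {
        int         n = 0;

        for (; word != 0; word &= word - 1)     /* clear lowest set bit */
            n++;
        return n;
    }

    static int
    popcount32_slow(uint32_t word)
    {
        return popcount64_slow(word);
    }

    static int
    popcount32_fast(uint32_t word)
    {
        return __builtin_popcount(word);        /* stand-in for the POPCNT path */
    }

    static int
    popcount64_fast(uint64_t word)
    {
        return __builtin_popcountll(word);      /* stand-in for the POPCNT path */
    }

    /* The new shared helper: choose all implementations in one place. */
    static inline void
    choose_popcount_functions(void)
    {
        if (popcnt_available())
        {
            popcount32_ptr = popcount32_fast;
            popcount64_ptr = popcount64_fast;
        }
        else
        {
            popcount32_ptr = popcount32_slow;
            popcount64_ptr = popcount64_slow;
        }
    }

    static int
    popcount32_choose(uint32_t word)
    {
        choose_popcount_functions();
        return popcount32_ptr(word);
    }

    static int
    popcount64_choose(uint64_t word)
    {
        choose_popcount_functions();
        return popcount64_ptr(word);
    }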
* Inline pg_popcount{32,64} into pg_popcount(). (Nathan Bossart, 2024-03-19)

  On some systems, calls to pg_popcount{32,64} are indirected through a
  function pointer.  This commit converts pg_popcount() to a function pointer
  on those systems so that we can inline the appropriate pg_popcount{32,64}
  implementations into each of the pg_popcount() implementations.  Since
  pg_popcount() may call pg_popcount{32,64} several times, this can
  significantly improve its performance.

  Suggested-by: David Rowley
  Reviewed-by: David Rowley
  Discussion: https://postgr.es/m/CAApHDvrb7MJRB6JuKLDEY2x_LKdFHwVbogpjZBCX547i5%2BrXOQ%40mail.gmail.com
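  A brief sketch of the effect described above: with pg_popcount() itself
  behind a function pointer, each buffer-walking implementation can call a
  static inline word popcount directly, avoiding one indirect call per
  8-byte chunk (names are invented; GCC/Clang builtins stand in for the real
  per-word implementations):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    static inline int
    popcount64_inline(uint64_t word)
    {
        return __builtin_popcountll(word);      /* inlined, no pointer hop */
    }

    static uint64_t
    popcount_buf_sketch(const char *buf, size_t bytes)
    {
        uint64_t    popcnt = 0;

        for (; bytes >= 8; buf += 8, bytes -= 8)
        {
            uint64_t    word;

            memcpy(&word, buf, sizeof(word));
            popcnt += (uint64_t) popcount64_inline(word);
        }
        for (; bytes > 0; buf++, bytes--)
            popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf);
        return popcnt;
    }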
* Update copyright for 2024 (Bruce Momjian, 2024-01-03)

  Reported-by: Michael Paquier
  Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
  Backpatch-through: 12
* Update copyright for 2023 (Bruce Momjian, 2023-01-02)

  Backpatch-through: 11
* Update copyright for 2022 (Bruce Momjian, 2022-01-07)

  Backpatch-through: 10
* Use direct function calls for pg_popcount{32,64} on non-x86 platforms (John Naylor, 2021-08-16)

  Previously, all pg_popcount{32,64} calls were indirected through a function
  pointer, even though we had no fast implementation for non-x86 platforms.
  Instead, for those platforms use wrappers around the pg_popcount{32,64}_slow
  functions.

  Review and additional hacking by David Rowley
  Reviewed by Álvaro Herrera
  Discussion: https://www.postgresql.org/message-id/flat/CAFBsxsE7otwnfA36Ly44zZO%2Bb7AEWHRFANxR1h1kxveEV%3DghLQ%40mail.gmail.com
* Add POPCNT support for MSVC x86_64 builds (David Rowley, 2021-08-09)

  02a6a54ec added code to make use of the POPCNT instruction when available
  for many of our common platforms.  Here we do the same for MSVC on x86_64
  machines.

  MSVC's intrinsic functions for popcnt differ from GCC's in that they always
  appear to emit popcnt instructions, whereas with GCC the behavior depends
  on whether the source file was compiled with -mpopcnt.  For this reason,
  the MSVC intrinsic has been lumped in with the pg_popcount*_asm functions;
  since that rather invalidates the name of those functions, let's rename
  them to pg_popcount*_fast().

  Author: David Rowley
  Reviewed-by: John Naylor
  Discussion: https://postgr.es/m/CAApHDvqL3cbbK%3DGzNcwzsNR9Gi%2BaUvTudKkC4XgnQfXirJ_oRQ%40mail.gmail.com
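  A sketch of how the MSVC intrinsic can share the "fast" slot with the
  GCC/Clang assembly version; names are illustrative, and in PostgreSQL this
  variant is only invoked after the runtime POPCNT check succeeds:

    #include <stdint.h>

    #if defined(_MSC_VER) && defined(_M_X64)
    #include <intrin.h>

    static int
    popcount64_fast(uint64_t word)
    {
        /* MSVC's intrinsic always emits the POPCNT instruction. */
        return (int) __popcnt64(word);
    }
    #elif defined(__x86_64__)

    static int
    popcount64_fast(uint64_t word)
    {
        uint64_t    res;

        /* GCC/Clang: emit POPCNT via inline assembly. */
        __asm__ __volatile__("popcntq %1, %0" : "=r"(res) : "r"(word) : "cc");
        return (int) res;
    }
    #else

    static int
    popcount64_fast(uint64_t word)
    {
        int         n = 0;

        /* Portable fallback: clear the lowest set bit until none remain. */
        for (; word != 0; word &= word - 1)
            n++;
        return n;
    }
    #endif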
* Update copyright for 2021 (Bruce Momjian, 2021-01-02)

  Backpatch-through: 9.5
* Update copyrights for 2020 (Bruce Momjian, 2020-01-01)

  Backpatch-through: update all files in master, backpatch legal files through 9.4
* Fix typos and inconsistencies in code comments (Michael Paquier, 2019-06-14)

  Author: Alexander Lakhin
  Discussion: https://postgr.es/m/dec6aae8-2d63-639f-4d50-20e229fb83e3@gmail.com
* Initial pgindent run for v12. (Tom Lane, 2019-05-22)

  This is still using the 2.0 version of pg_bsd_indent.  I thought it would
  be good to commit this separately, so as to document the differences
  between 2.0 and 2.1 behavior.

  Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT. (Tom Lane, 2019-02-15)

  Test for the compiler builtins __builtin_clz, __builtin_ctz, and
  __builtin_popcount, and make use of these in preference to handwritten C
  code if they're available.  Create src/port infrastructure for "leftmost
  one", "rightmost one", and "popcount" so as to centralize these decisions.

  On x86_64, __builtin_popcount generally won't make use of the POPCNT
  opcode because that's not universally supported yet.  Provide code that
  checks CPUID and then calls POPCNT via asm() if available.  This requires
  indirecting through a function pointer, which is an annoying amount of
  overhead for a one-instruction operation, but it's probably not worth
  working harder than this for our current use-cases.

  I'm not sure we've found all the existing places that could profit from
  this new infrastructure; but we at least touched all the ones that used
  copied-and-pasted versions of the bitmapset.c code, and got rid of
  multiple copies of the associated constant arrays.

  While at it, replace c-compiler.m4's one-per-builtin-function macros with
  a single one that can handle all the cases we need to worry about so far.
  Also, because I'm paranoid, make those checks into AC_LINK checks rather
  than just AC_COMPILE; the former coding failed to verify that libgcc has
  support for the builtin, in cases where it's not inline code.

  David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane
  Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
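  A hedged sketch of the two x86_64-specific pieces this commit describes,
  using GCC/Clang facilities; names are approximate, and the pointer-swapping
  "choose" machinery they plug into is sketched under the 2024-04-02 entry
  above:

    #include <cpuid.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Probe CPUID leaf 1: ECX bit 23 advertises the POPCNT instruction. */
    static bool
    popcnt_available(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return false;
        return (ecx & (1U << 23)) != 0;
    }

    /* Emit POPCNT directly; only called once the probe above has succeeded. */
    static int
    popcount64_asm(uint64_t word)
    {
        uint64_t    res;

        __asm__ __volatile__("popcntq %1, %0" : "=r"(res) : "r"(word) : "cc");
        return (int) res;
    }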
* Revert attempts to use POPCNT etc instructions (Alvaro Herrera, 2019-02-15)

  This reverts commits fc6c72747ae6, 109de05cbb03, d0b4663c23b7 and
  711bab1e4d19.

  Somebody will have to try harder before submitting this patch again.  I've
  spent entirely too much time on it already, and the #ifdef maze yet to be
  written in order for it to build at all got on my nerves.  The amount of
  work needed to get a platform-specific performance improvement that's
  barely above the noise level is not worth it.
* Fix compiler builtin usage in new pg_bitutils.c (Alvaro Herrera, 2019-02-15)

  Split out these new functions in three parts: one in a new file that uses
  the compiler builtin and gets compiled with the -mpopcnt compiler option if
  it exists; another one that uses the compiler builtin but not the compiler
  option; and finally the fallback with open-coded algorithms.

  Split out the configure logic: in the original commit, it was selecting to
  use the -mpopcnt compiler switch together with deciding whether to use the
  compiler builtin, but those two things are really separate.  Split them
  out.  Also, expose whether the builtin exists to Makefile.global, so that
  src/port's Makefile can decide whether to compile the hw-optimized file.

  Remove CPUID test for CTZ/CLZ.  Make pg_{right,left}most_ones use either
  the compiler intrinsic or open-coded algo; trying to use the HW-optimized
  version is a waste of time.  Make them static inline functions.

  Discussion: https://postgr.es/m/20190213221719.GA15976@alvherre.pgsql
* Fix portability issues in pg_bitutils (Alvaro Herrera, 2019-02-13)

  We were using uint64 function arguments as "long int" arguments to compiler
  builtins, which fails on machines where long ints are 32 bits: the upper
  half of the uint64 was being ignored.  Fix by using the "ll" builtin
  variants instead, which on those machines take 64 bit arguments.

  Also, remove configure tests for __builtin_popcountl() (as well as "long"
  variants for ctz and clz): the theory here is that any compiler version
  will provide all widths or none, so one test suffices.  Were this theory to
  be wrong, we'd have to add tests for __builtin_popcountll() and friends,
  which would be tedious.

  Per failures in buildfarm member lapwing and ensuing discussion.
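  A toy example of the trap being fixed, assuming GCC or Clang; on a platform
  where "long" is 32 bits, the non-"ll" builtin would silently see only the
  low half of the value:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint64_t    x = UINT64_C(0xFFFFFFFF00000000);

        /* The "ll" variant takes unsigned long long, so all 64 bits survive. */
        printf("popcount = %d\n", __builtin_popcountll(x));    /* prints 32 */

        return 0;
    }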
* Add basic support for using the POPCNT and SSE4.2's LZCNT opcodes (Alvaro Herrera, 2019-02-13)

  These opcodes have been around in the AMD world since 2007, and 2008 in the
  case of Intel.  They're supported in GCC and Clang via some __builtin
  macros.  The opcodes may be unavailable during runtime, in which case we
  fall back on a C-based implementation of the code.  In order to get the
  POPCNT instruction we must pass the -mpopcnt option to the compiler.  We do
  this only for the pg_bitutils.c file.

  David Rowley (with fragments taken from a patch by Thomas Munro)
  Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com