| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
First pass of modifying various places that obtain the next power of 2 of
a number and make them use the new functions added in pg_bitutils.h
instead.
This also removes the _hash_log2() function. There are no longer any
callers in core. Other users can swap their _hash_log2(n) call to make use
of pg_ceil_log2_32(n).
Author: David Fetter, with some minor adjustments by me
Reviewed-by: John Naylor, Jesse Zhang
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are many areas in the code where we need to determine the next
highest power of 2 of a given number. We tend to always do that in an
ad-hoc way each time, generally with some tight for loop which performs a
bitshift left once per loop and goes until it finds a number above the
given number.
Here we add two generic functions which make use of the existing
pg_leftmost_one_pos* functions which, when available, will allow us to
calculate the next power of 2 without any looping.
Here we don't add any code which uses these new functions. That will be
done in follow-up commits.
Author: David Fetter, with some minor adjustments by me
Reviewed-by: John Naylor, Jesse Zhang
Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
|
|
|
|
| |
Backpatch-through: update all files in master, backpatch legal files through 9.4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our algorithm for choosing batch numbers turned out not to work
effectively for multi-billion key inner relations. We would use
more hash bits than we have, and effectively concentrate all tuples
into a smaller number of batches than we intended. While ideally
we should switch to wider hashes, for now, change the algorithm to
one that effectively gives up bits from the bucket number when we
don't have enough bits. That means we'll finish up with longer
bucket chains than would be ideal, but that's better than having
batches that don't fit in work_mem and can't be divided.
Batch-patch to all supported releases.
Author: Thomas Munro
Reviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussion
Reported-by: James Coleman
Discussion: https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
|
|
|
|
|
| |
Author: Alexander Lakhin
Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test for the compiler builtins __builtin_clz, __builtin_ctz, and
__builtin_popcount, and make use of these in preference to
handwritten C code if they're available. Create src/port
infrastructure for "leftmost one", "rightmost one", and "popcount"
so as to centralize these decisions.
On x86_64, __builtin_popcount generally won't make use of the POPCNT
opcode because that's not universally supported yet. Provide code
that checks CPUID and then calls POPCNT via asm() if available.
This requires indirecting through a function pointer, which is
an annoying amount of overhead for a one-instruction operation,
but it's probably not worth working harder than this for our
current use-cases.
I'm not sure we've found all the existing places that could profit
from this new infrastructure; but we at least touched all the
ones that used copied-and-pasted versions of the bitmapset.c code,
and got rid of multiple copies of the associated constant arrays.
While at it, replace c-compiler.m4's one-per-builtin-function
macros with a single one that can handle all the cases we need
to worry about so far. Also, because I'm paranoid, make those
checks into AC_LINK checks rather than just AC_COMPILE; the
former coding failed to verify that libgcc has support for the
builtin, in cases where it's not inline code.
David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commits fc6c72747ae6, 109de05cbb03, d0b4663c23b7 and
711bab1e4d19.
Somebody will have to try harder before submitting this patch again.
I've spent entirely too much time on it already, and the #ifdef maze yet
to be written in order for it to build at all got on my nerves. The
amount of work needed to get a platform-specific performance improvement
that's barely above the noise level is not worth it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split out these new functions in three parts: one in a new file that
uses the compiler builtin and gets compiled with the -mpopcnt compiler
option if it exists; another one that uses the compiler builtin but not
the compiler option; and finally the fallback with open-coded
algorithms.
Split out the configure logic: in the original commit, it was selecting
to use the -mpopcnt compiler switch together with deciding whether to
use the compiler builtin, but those two things are really separate.
Split them out. Also, expose whether the builtin exists to
Makefile.global, so that src/port's Makefile can decide whether to
compile the hw-optimized file.
Remove CPUID test for CTZ/CLZ. Make pg_{right,left}most_ones use either
the compiler intrinsic or open-coded algo; trying to use the
HW-optimized version is a waste of time. Make them static inline
functions.
Discussion: https://postgr.es/m/20190213221719.GA15976@alvherre.pgsql
|
|
These opcodes have been around in the AMD world since 2007, and 2008 in
the case of intel. They're supported in GCC and Clang via some __builtin
macros. The opcodes may be unavailable during runtime, in which case we
fall back on a C-based implementation of the code. In order to get the
POPCNT instruction we must pass the -mpopcnt option to the compiler. We
do this only for the pg_bitutils.c file.
David Rowley (with fragments taken from a patch by Thomas Munro)
Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
|