diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2020-09-06 21:40:39 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2020-09-06 21:40:39 -0400 |
commit | 88709176236caf3cb9655acda6bad2df0323ac8f (patch) | |
tree | d9b871a8d92bf1caf6d4a00ba51344f9d2665b4e /src/backend/utils/adt/numeric.c | |
parent | 695de5d1eda6382b20fadb0a2b2d286b57a6a61c (diff) | |
download | postgresql-88709176236caf3cb9655acda6bad2df0323ac8f.tar.gz postgresql-88709176236caf3cb9655acda6bad2df0323ac8f.zip |
Apply auto-vectorization to the inner loop of numeric multiplication.
Compile numeric.c with -ftree-vectorize where available, and adjust
the innermost loop of mul_var() so that it is amenable to being
auto-vectorized. (Mainly, that involves making it process the arrays
left-to-right not right-to-left.)
Applying -ftree-vectorize actually makes numeric.o smaller, at least
with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too.
Independently of that, fixing the inner loop to be vectorizable also
makes things a bit faster. But doing both is a huge win for
multiplications with lots of digits. For me, the numeric regression
test is the same speed to within measurement noise, but numeric_big
is a full 45% faster.
We also looked into applying -funroll-loops, but that makes numeric.o
bloat quite a bit, and the additional speed improvement is very
marginal.
Amit Khandekar, reviewed and edited a little by me
Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com
Diffstat (limited to 'src/backend/utils/adt/numeric.c')
-rw-r--r-- | src/backend/utils/adt/numeric.c | 15 |
1 files changed, 12 insertions, 3 deletions
diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c index ed825a1fddf..d2a42b811da 100644 --- a/src/backend/utils/adt/numeric.c +++ b/src/backend/utils/adt/numeric.c @@ -8191,6 +8191,7 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result, int res_weight; int maxdigits; int *dig; + int *dig_i1_2; int carry; int maxdig; int newdig; @@ -8327,10 +8328,18 @@ mul_var(const NumericVar *var1, const NumericVar *var2, NumericVar *result, * * As above, digits of var2 can be ignored if they don't contribute, * so we only include digits for which i1+i2+2 <= res_ndigits - 1. + * + * This inner loop is the performance bottleneck for multiplication, + * so we want to keep it simple enough so that it can be + * auto-vectorized. Accordingly, process the digits left-to-right + * even though schoolbook multiplication would suggest right-to-left. + * Since we aren't propagating carries in this loop, the order does + * not matter. */ - for (i2 = Min(var2ndigits - 1, res_ndigits - i1 - 3), i = i1 + i2 + 2; - i2 >= 0; i2--) - dig[i--] += var1digit * var2digits[i2]; + i = Min(var2ndigits - 1, res_ndigits - i1 - 3); + dig_i1_2 = &dig[i1 + 2]; + for (i2 = 0; i2 <= i; i2++) + dig_i1_2[i2] += var1digit * var2digits[i2]; } /* |