| Commit message | Author | Age |
| |
While SH_STAT() is only used for debugging, the allocated array can be large
and therefore should be freed.
It's unclear why Coverity only started warning about this now.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Coverity
Discussion: https://postgr.es/m/3005248.1712538233@sss.pgh.pa.us
Backpatch: 12-
| |
Also add a comment to make the reasoning behind the Assert() more explicit
(per Tom).
Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAocXNJ6s1VLz+hMamLAQAiewRoW17OJ6-+9GACKfj6iPQ@mail.gmail.com
Backpatch: 11-
| |
This fixes a bug in simplehash.h which caused an incorrect size mask to be
used when the hash table grew to SH_MAX_SIZE (2^32). The code was
incorrectly setting the size mask to 0 when the hash table reached the
maximum possible number of buckets. This resulted in always trying to
use the 0th bucket, causing an infinite loop of trying to grow the hash
table due to there being too many collisions.
Seemingly it's not that common for simplehash tables to ever grow this big,
as this bug dates back to v10 and nobody seems to have noticed it before.
The most likely place people would notice it would be a large in-memory
Hash Aggregate with something close to at least 2^31 groups.
After this fix, the code works correctly up to about 98% of 2^32 groups
and will fail with the following error when trying to insert any more
items into the hash table:
ERROR: hash table size exceeded
However, the work_mem (or hash_mem_multiplier in newer versions) settings
will generally cause Hash Aggregates to spill to disk long before reaching
that many groups. The minimal test case I did took a work_mem setting of
over 192GB to hit the bug.
simplehash hash tables are used in a few other places, such as Bitmap
Index Scans; however, the size the hash table can reach there is likewise
limited by work_mem, and it would take a relation of around 16TB (2^31
pages) and a very large work_mem setting to hit this. With smaller
work_mem values the table would become lossy and never grow large enough
to hit the problem.
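A minimal sketch of the mask computation at issue (variable names invented;
the real logic lives in simplehash.h's sizing code):

    #include <stdint.h>

    #define SH_MAX_SIZE ((uint64_t) UINT32_MAX + 1)    /* 2^32 buckets */

    static uint32_t
    compute_sizemask(uint64_t size)
    {
        /*
         * Buggy variant: special-casing the maximum size to a mask of 0
         * maps every hash value to bucket 0, so all insertions collide
         * and the table tries to grow forever.
         */
        /* return (size == SH_MAX_SIZE) ? 0 : (uint32_t) (size - 1); */

        /*
         * Fixed variant: size - 1 is 0xFFFFFFFF when size == 2^32, which
         * is exactly the mask that keeps all 32 hash bits.
         */
        return (uint32_t) (size - 1);
    }

With this, bucket = hash & sizemask stays well distributed even at the
maximum table size.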
Author: Yura Sokolov
Reviewed-by: David Rowley, Ranier Vilela
Discussion: https://postgr.es/m/b1f7f32737c3438136f64b26f4852b96@postgrespro.ru
Backpatch-through: 10, where simplehash.h was added
| |
Switch to 2.1 version of pg_bsd_indent. This formats
multiline function declarations "correctly", that is with
additional lines of parameter declarations indented to match
where the first line's left parenthesis is.
Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
| |
The set is implemented as a B-tree, with a compact representation at the
leaf level using the Simple-8b algorithm, so that clusters of nearby values
use less memory.
The IntegerSet isn't used for anything yet, aside from the test code, but
we have two patches in the works that would benefit from this: A patch to
allow GiST vacuum to delete empty pages, and a patch to reduce heap
VACUUM's memory usage, by storing the list of dead TIDs more efficiently
and lifting the 1 GB limit on its size.
This includes a unit test module, in src/test/modules/test_integerset.
It can be used as a regression test to verify correctness, but if you run
it manually, it can also print memory usage and execution time for some of
the tests.
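For illustration, a much-simplified sketch of the Simple-8b idea (this is
not the integerset.c code; the codeword layout here is invented for the
example): each 64-bit codeword carries a 4-bit selector plus as many
equal-width values as fit in the remaining 60 bits.

    #include <stdint.h>

    /*
     * Pack twelve deltas of at most 5 bits each into one 64-bit codeword:
     * the selector occupies the top 4 bits, then 12 x 5-bit slots.  The
     * real algorithm picks among a table of such modes per codeword, so
     * runs of small deltas compress well.
     */
    static uint64_t
    pack_12x5(const uint64_t deltas[12], unsigned selector)
    {
        uint64_t    codeword = (uint64_t) selector << 60;
        int         i;

        for (i = 0; i < 12; i++)
            codeword |= (deltas[i] & 0x1F) << (5 * i);
        return codeword;
    }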
Author: Heikki Linnakangas, Andrey Borodin
Reviewed-by: Julien Rouhaud
Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb729f@iki.fi
| |
A hashtable reset just reset the hashtable entries, but did not free
memory.
Author: Andres Freund
Discussion: https://postgr.es/m/20190114180423.ywhdg2iagzvh43we@alap3.anarazel.de
Bug: #15592 #15486
Backpatch: 11, this is a prerequisite for other fixes
| |
Backpatch-through: certain files through 9.4
| |
The "rb" prefix is used by Ruby, so that our existing code results
in name collisions that break plruby. We discussed ways to prevent
that by adjusting dynamic linker options, but it seems that at best
we'd move the pain to other cases. Renaming to avoid the collision
is the only portable fix anyway. Fortunately, our rbtree code is
not (yet?) widely used --- in core, there's only a single usage
in GIN --- so it seems likely that we can get away with a rename.
I chose to do this basically as s/rb/rbt/g, except for places where
there already was a "t" after "rb". The patch could have been made
smaller by only touching linker-visible symbols, but it would have
resulted in oddly inconsistent-looking code. Better to make it look
like "rbt" was the plan all along.
Back-patch to v10. The rbtree.c code exists back to 9.5, but
rb_iterate() which is the actual immediate source of pain was added
in v10, so it seems like changing the names before that would have
more risk than benefit.
Per report from Pavel Raiskup.
Discussion: https://postgr.es/m/4738198.8KVIIDhgEB@nb.usersys.redhat.com
| |
Fix reference to non-existent file in comment.
Add SH_ prefix to the EMPTY and IN_USE tokens, to reduce likelihood of
collisions with unrelated macros.
Add include guards around the function definitions that are not
"parameterized", so the header can be used again in the same translation
unit.
Undefine SH_EQUAL macro where other "parameter" macros are undefined, for
the same reason.
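A hedged sketch of the guard structure this adds (abbreviated; the actual
guard and token names in simplehash.h may differ in detail):

    /* Fixed, non-parameterized parts are now guarded, so a second
     * #include "lib/simplehash.h" in the same .c file doesn't redefine
     * them: */
    #ifndef SIMPLEHASH_H
    #define SIMPLEHASH_H
    /* ... SH_EMPTY / SH_IN_USE status tokens, generic helpers ... */
    #endif

    /* ... and at the end of the header, SH_EQUAL is now cleaned up
     * together with the other "parameter" macros: */
    #undef SH_EQUAL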
Author: Thomas Munro
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAEepm%3D1LdXZ3mMTM8tHt_b%3DK1kREit%3Dp8sikesak%3DkzHHM07Nw%40mail.gmail.com
| |
Justin Pryzby
Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
| |
A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership. Callers will sometimes incur false
positives, but never false negatives. The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.
Two classic applications of Bloom filters are cache filtering and data
synchronization testing. Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.
This commit adds a test harness extension module, test_bloomfilter. It
can be used to get a sense of how the Bloom filter implementation
performs under varying conditions.
This is infrastructure for the upcoming "heapallindexed" amcheck patch,
which verifies the consistency of a heap relation against one of its
indexes.
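For scale, the standard false-positive estimate with m bits, k hash
functions and n elements is roughly (1 - e^(-kn/m))^k. A hedged usage
sketch follows; the function names are assumed from this commit's
lib/bloomfilter.h interface, and the working-memory unit is an assumption:

    #include "postgres.h"
    #include "lib/bloomfilter.h"

    static void
    bloom_demo(void)
    {
        /* sized for ~1 million elements with 4MB (4096kB) of memory */
        bloom_filter *filter = bloom_create(1000000, 4096, 0);
        int64       key = 42;

        bloom_add_element(filter, (unsigned char *) &key, sizeof(key));

        /* No false negatives: an added element is never reported absent. */
        Assert(!bloom_lacks_element(filter, (unsigned char *) &key,
                                    sizeof(key)));

        bloom_free(filter);
    }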
Author: Peter Geoghegan
Reviewed-By: Andrey Borodin, Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g@mail.gmail.com
| |
For consistency with other code that deals in numbers of buckets, the
macro BUCKETS_PER_PARTITION should produce a value of type size_t.
Also, fix a mention of an obsolete proposed name for dshash.c that
appeared in a comment.
Author: Thomas Munro, based on an observation from Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1%2BBOp5aaW3aHEkg5Bptf8Ga_BkBnmA-%3DXcAXShs0yCiYQ%40mail.gmail.com
| |
Other header files should never #include postgres.h (nor postgres_fe.h,
nor c.h), per project policy. Also, there's no need for any backend .c
file to explicitly include elog.h or palloc.h, because postgres.h pulls
those in already.
Extracted from a larger patch by Kyotaro Horiguchi. The rest of the
removals he suggests require more study, but these are no-brainers.
Discussion: https://postgr.es/m/20180215.200447.209320006.horiguchi.kyotaro@lab.ntt.co.jp
| |
In cases where simplehash tables were filled with either a lot of
conflicting hash-values, or values that hash to consecutive
values (i.e. build "chains"), the growth heuristics in
d4c62a6b623d6eef88218158e9fa3cf974c6c7e5 could trigger rather
explosively.
To fix that, address some of the reasons (see previous commit) why the
growth heuristics were needed, and only allow growth when the
table isn't too empty. While that means a few cases of bad
input can be slower, that seems a lot better than very quickly
running out of memory.
Author: Tomas Vondra and Andres Freund, with additional input by
Thomas Munro, Tom Lane, Todd A. Cook
Reported-By: Todd A. Cook, Tomas Vondra, Thomas Munro
Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org
Backpatch: 10, where simplehash was introduced
| |
Backpatch-through: certain files through 9.3
| |
Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
| |
In a lot of places, having appendBinaryStringInfo() maintain a
trailing NUL byte wasn't actually meaningful, e.g. when appending an
integer which can contain 0 in one of its bytes.
Removing this yields a small speedup, but more importantly will be
more consistent when providing faster variants of pq_sendint etc.
Author: Andres Freund
Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
| |
This code isn't used, and there's no clear reason why anybody would ever
want to use it. These traversal mechanisms don't yield a visitation order
that is semantically meaningful for any external purpose, nor are they
any faster or simpler than the left-to-right or right-to-left traversals.
(In fact, some rough testing suggests they are slower :-(.) Moreover,
these mechanisms are impossible to test in any arm's-length fashion; doing
so requires knowledge of the red-black tree's internal implementation.
Hence, let's just jettison them.
Discussion: https://postgr.es/m/17735.1505003111@sss.pgh.pa.us
| |
Some compilers complain, not unreasonably, about left-shifting an
int32 "1" and then assigning the result to an int64. In practice
I sure hope that this data structure never gets large enough that
an overflow would actually occur; but let's cast the constant to
the right type to avoid the hazard.
In passing, fix a typo in dshash.h.
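A minimal illustration of the hazard (function name invented):

    #include <stdint.h>

    static int64_t
    bitmap_bit(int bits)
    {
        /* Buggy: the constant 1 is an int, so the shift is evaluated in
         * 32-bit arithmetic and overflows for bits >= 31, before the
         * result is ever widened to int64_t: */
        /* return 1 << bits; */

        /* Fixed: cast the constant first, so the shift is done in 64 bits. */
        return ((int64_t) 1) << bits;
    }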
Amit Kapila, adjusted as per comment from Thomas Munro.
Discussion: https://postgr.es/m/CAA4eK1+5vfVMYtjK_NX8O3-42yM3o80qdqWnQzGquPrbq6mb+A@mail.gmail.com
| |
Commit 8c0d7bafad36434cb08ac2c78e69ae72c194ca20 introduced dshash with hash
and compare functions like DynaHash's, and also variants that take a user
data pointer instead of size. Simplify the interface by merging them into
a single pair of function pointer types that take both size and a user data
pointer.
Since it is anticipated that memcmp and tag_hash behavior will be a common
requirement, provide wrapper functions dshash_memcmp and dshash_memhash that
conform to the new function types.
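A hedged sketch of what a caller might now write (struct field order and
the tranche id are assumptions for the example):

    #include "postgres.h"
    #include "lib/dshash.h"

    typedef struct DemoEntry
    {
        uint32      key;        /* hashed/compared as raw bytes */
        int         value;
    } DemoEntry;

    static const dshash_parameters demo_params = {
        sizeof(uint32),         /* key size, passed to both callbacks */
        sizeof(DemoEntry),      /* entry size */
        dshash_memcmp,          /* memcmp behavior via the new wrapper */
        dshash_memhash,         /* tag_hash behavior via the new wrapper */
        0                       /* LWLock tranche id (illustrative) */
    };

    /* Both callbacks now receive the size and a user data pointer, so
     * plain raw-byte keys need no custom functions:
     *
     *     dshash_table *ht = dshash_create(area, &demo_params, NULL);
     */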
Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de
| |
Add general purpose chaining hash tables for DSA memory. Unlike
DynaHash in shared memory mode, these hash tables can grow as
required, and cope with being mapped into different addresses in
different backends.
There is a wide range of potential users for such a hash table, though
it's very likely the interface will need to evolve as we come to
understand the needs of different kinds of users. E.g. support for
iterators and incremental resizing is planned for later commits, and
the details of the callback signatures are likely to change.
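A hedged sketch of the cross-backend property (API names assumed from
dshash.h; details simplified, error handling omitted):

    #include "postgres.h"
    #include "lib/dshash.h"

    /* Backend A: create the table and publish its handle somewhere both
     * backends can see, e.g. a shm_toc entry. */
    static dshash_table_handle
    publish_table(dsa_area *area, const dshash_parameters *params)
    {
        dshash_table *ht = dshash_create(area, params, NULL);

        return dshash_get_hash_table_handle(ht);
    }

    /* Backend B: the same dsa_area may be mapped at a different address
     * here, but attaching by handle still works. */
    static dshash_table *
    join_table(dsa_area *area, const dshash_parameters *params,
               dshash_table_handle handle)
    {
        return dshash_attach(area, params, handle, NULL);
    }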
Author: Thomas Munro
Reviewed-By: John Gorman, Andres Freund, Dilip Kumar, Robert Haas
Discussion:
https://postgr.es/m/CAEepm=3d8o8XdVwYT6O=bHKsKAM2pu2D6sV1S_=4d+jStVCE7w@mail.gmail.com
https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
| |
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
| |
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
| |
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
| |
Author: Daniel Gustafsson <daniel@yesql.se>
| |
Jeff Janes and me.
Discussion: https://www.postgresql.org/message-id/CAMkU=1zYnniLYg+W9itL93DXebCjx6Uk6m_=Xa8p_zM65X3S0Q@mail.gmail.com
| |
perltidy run not included.
| |
This reverts commits fa2fa9955280 and 42f50cb8fa98.
While the functionality that was intended to be provided by these
commits is desired, the patch didn't actually solve as many of the
problematic situations as we hoped, and it created a bunch of its own
problems. Since we're going to require more extensive changes soon for
other reasons and users have been working around these problems for a
long time already, there is no point in spending effort in fixing this
halfway measure.
Per complaint from Tom Lane.
Discussion: https://postgr.es/m/21407.1484606922@sss.pgh.pa.us
(Commit fa2fa9955280 had already been reverted in branches 9.5 as
f858524ee4f and 9.6 as e9e44a0953, so this touches master only.
Commit 42f50cb8fa98 was not present in the older branches.)
| |
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
| |
Increase the size when either the distance between actual and optimal
slot grows too large, or when too many subsequent entries would have
to be moved.
This addresses reports that the simplehash performed, sometimes
considerably, worse than dynahash.
The reason turned out to be that insertions into the hashtable were,
due to the use of parallel query, in effect done from another
hashtable, in hash-value order. If the target hashtable, due to
mis-estimation, was sized a lot smaller than the source table(s),
that led to very imbalanced tables; a lot of entries in many close-by
buckets from the source tables were inserted into a single, wider,
bucket on the target table. As the growth factor was solely computed
based on the fillfactor, the performance of the table decreased
further and further.
b81b5a96f424531b was an attempt to address this problem for hash
aggregates (but not for bitmap scans), but it turns out that the
current method of mixing hash values often actually leaves neighboring
hash-values close to each other, just in a different value range. It
might be worth revisiting that independently of the performance issues
addressed in this patch.
To address that problem, resize tables in two additional cases: firstly
when the optimal position for an entry would be far from the actual
position, and secondly when many entries would have to be moved to make
space for the new entry (while satisfying the robin hood property).
Due to the additional resizing thresholds it seems possible, and
testing so far confirms, that a higher fillfactor doesn't hurt
performance and saves a bit of memory. It seems better to increase it
now, before a release containing any of this code, rather than wonder
in some later release.
The various boundaries aren't determined in a particularly scientific
manner; they might need some fine-tuning.
In all my tests the new code, even with parallelism, performs at
least as well as the old code, and in several scenarios significantly
better.
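A sketch of the two extra triggers (constants and helper names are
invented; the real thresholds live in simplehash.h's insertion path):

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_DISPLACEMENT 128    /* hypothetical threshold */
    #define MAX_MOVES        512    /* hypothetical threshold */

    static bool
    should_grow(uint32_t bucket, uint32_t optimal_bucket, uint32_t sizemask,
                uint32_t entries_to_move, double fill)
    {
        /* distance between the insertion point and the optimal bucket */
        uint32_t    dist = (bucket - optimal_bucket) & sizemask;

        /* grow on long displacement or heavy shifting, but never while
         * the table is still mostly empty (the anti-explosion check) */
        return (dist > MAX_DISPLACEMENT || entries_to_move > MAX_MOVES) &&
            fill > 0.1;
    }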
Reported-By: Dilip Kumar, Robert Haas, Kuntal Ghosh
Discussion:
https://postgr.es/m/CAFiTN-vagvuAydKG9VnWcoK=ADAhxmOa4ZTrmNsViBBooTnriQ@mail.gmail.com
https://postgr.es/m/CAGz5QC+=fNTYgzMLTBUNeKt6uaWZFXJbkB5+7oWm-n9DwVxcLA@mail.gmail.com
| |
This could lead to problems when simplehash.h is used to define two
different types of hashtable visible in the same translation unit.
Reported-By: Josh Soref
Discussion: https://postgr.es/m/CACZqfqCC7WdBAY=rQePb9-qW1rjdaTdHsV5KoVejHkDb6qrtOg@mail.gmail.com
| |
Even if we don't emit definitions for SH_ALLOCATE and SH_FREE, we
still need prototypes. The user can't define them before including
simplehash.h because SH_TYPE isn't available yet.
For the allocator to be able to access private_data, it needs to
become an argument to SH_CREATE. Previously we relied on callers
to set that after returning from SH_CREATE, but SH_CREATE calls
SH_ALLOCATE before returning.
Dilip Kumar, reviewed by me.
| |
This method is more elegant and more efficient.
Per a suggestion from Andres Freund, who also briefly reviewed
the patch.
| |
There's no generic guard against multiple inclusion in this file,
for good reason. But these typedefs need one, as per a report
from Jeff Janes.
| |
This is infrastructure for a pending patch to allow parallel bitmap
heap scans.
Dilip Kumar, reviewed (in earlier versions) by Andres Freund and
(more recently) by me. Some further renaming by me, also.
| |
Backpatch to all supported versions, where applicable, to make backpatching
of future fixes go more smoothly.
Josh Soref
Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
| |
Our documentation states that our maximum field size is 1 GB, and that
our maximum row size is 1.6 TB. However, while this might be attainable
in theory with enough contortions, it is not workable in practice; for
starters, pg_dump fails to dump tables containing rows larger than 1 GB,
even if individual columns are well below the limit; and even if one
does manage to manufacture a dump file containing a row that large, the
server refuses to load it anyway.
This commit enables dumping and reloading of such tuples, provided two
conditions are met:
1. no single column is larger than 1 GB (in output size -- for bytea
this includes the formatting overhead)
2. the whole row is not larger than 2 GB
There are three related changes to enable this:
a. StringInfo's API now has two additional functions that allow creating
a string that grows beyond the typical 1GB limit (a "long" string).
ABI compatibility is maintained. We still limit these strings to 2 GB,
though, for reasons explained below.
b. COPY now uses long StringInfos, so that pg_dump doesn't choke
trying to emit rows longer than 1GB.
c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation
for the input tuple, which means that large tuples are accepted on
input. Note that at this point we do not apply any further limit to the
input tuple size.
The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit
length words to describe each row; and because the documentation is
ambiguous on its signedness and libpq does consider it signed, we cannot
use the highest-order bit. Additionally, the StringInfo API uses "int"
(which is 4 bytes wide in most platforms) in many places, so we'd need
to change that API too in order to improve, which has lots of fallout.
Backpatch to 9.5, which is the oldest that has
MemoryContextAllocExtended, a necessary piece of infrastructure. We
could apply to 9.4 with very minimal additional effort, but any further
than that would require backpatching "huge" allocations too.
This is the largest set of changes we could find that can be
back-patched without breaking compatibility with existing systems.
Fixing a bigger set of problems (for example, dumping tuples bigger than
2GB, or dumping fields bigger than 1GB) would require changing the FE/BE
protocol and/or changing the StringInfo API in an ABI-incompatible way,
neither of which would be back-patchable.
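A hedged sketch of using the new entry points (the initLongStringInfo()
name is taken to be one of the two additional functions mentioned above;
the surrounding usage is invented):

    #include "postgres.h"
    #include "lib/stringinfo.h"

    static void
    emit_long_row(const char *data, int len)
    {
        StringInfoData buf;

        /* Marks the buffer as allowed to grow past the usual 1 GB cap
         * (up to 2 GB); plain initStringInfo() keeps the old limit, so
         * existing callers are unaffected. */
        initLongStringInfo(&buf);

        appendBinaryStringInfo(&buf, data, len);
        /* ... write out buf.data / buf.len ... */
        pfree(buf.data);
    }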
Authors: Daniel Vérité, Álvaro Herrera
Reviewed by: Tomas Vondra
Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql
| |
per cpluspluscheck
| |
Author: Erik Rijkers
Discussion: <274e4c8ac545d6622735f97c1f6c354b@xs4all.nl>
| |
dynahash.c hash tables aren't quite fast enough for some
use-cases. There are several reasons for the lacking performance:
- the use of chaining for collision handling makes them cache
inefficient; that's especially an issue when the tables get bigger.
- as the element sizes for dynahash are only determined at runtime,
offset computations are somewhat expensive
- hash and element comparisons are indirect function calls, causing
unnecessary pipeline stalls
- its two-level structure has some benefits (somewhat natural
partitioning), but increases the number of indirections
To fix several of these, the hash tables have to be adjusted to the
individual use-case at compile time. C unfortunately doesn't provide a
good way to do compile-time code generation (like e.g. C++'s templates,
for all their weaknesses, do). Thus the somewhat ugly approach taken here
is to allow for code generation using a macro-templatized header file,
which generates functions and types based on a prefix and other
parameters.
Later patches use this infrastructure to use such hash tables for
tidbitmap.c (bitmap scans) and execGrouping.c (hash aggregation,
...). In queries where these use up a large fraction of the time, this
has been measured to lead to performance improvements of over 100%.
There are other cases where this could be useful (e.g. catcache.c).
The hash table design chosen is a variant of linear open-addressing. The
biggest disadvantages of simple linear addressing schemes are highly
variable lookup times due to clustering, and deletions leaving a lot of
tombstones around. To address these issues a variant of "robin hood"
hashing is employed. Robin hood hashing optimizes chaining lengths by
moving elements close to their optimal bucket ("rich" elements) out of
the way if a to-be-inserted element is further away from its optimal
position (i.e. it's "poor"). While that can make insertions slower, the
average lookup performance is a lot better, and higher fill factors can
be used in a still performant manner. To avoid tombstones - which
normally solve the issue that a deleted node's presence is relevant to
determine whether a lookup needs to continue looking or is done -
buckets following a deleted element are shifted backwards, unless
they're empty or already at their optimal position.
There are further possible improvements that can be made to this
implementation. Amongst others:
- Use distance as a termination criterion during searches. This is
generally a good idea, but I've been able to see the overhead of
distance calculations in some cases.
- Consider combining the 'empty' status into the hashvalue, and enforce
storing the hashvalue. That could, in some cases, increase memory
density and remove a few instructions.
- Experiment further with the, very conservatively chosen, fillfactor.
- Make the maximum size of the hashtable configurable, to allow storing
very large tables. That'd require 64-bit hash values to be more common
than now, though.
- some smaller memcpy calls could be optimized to copy larger chunks
But since the new implementation is already considerably faster than
dynahash it seems sensible to start using it.
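A hedged sketch of the instantiation pattern (prefix, element type, and
hash function are invented for the example; simplehash.h itself documents
the authoritative parameter list):

    #include "postgres.h"

    typedef struct DemoEntry
    {
        uint32      key;
        int         count;
        char        status;     /* simplehash bookkeeping field */
    } DemoEntry;

    #define SH_PREFIX demo
    #define SH_ELEMENT_TYPE DemoEntry
    #define SH_KEY_TYPE uint32
    #define SH_KEY key
    #define SH_HASH_KEY(tb, k) ((k) * 0x9E3779B1)   /* toy hash for the sketch */
    #define SH_EQUAL(tb, a, b) ((a) == (b))
    #define SH_SCOPE static inline
    #define SH_DECLARE
    #define SH_DEFINE
    #include "lib/simplehash.h"

    /* The include expands to a demo_hash type plus demo_create(),
     * demo_insert(), demo_lookup(), demo_delete(), demo_destroy() and
     * friends, all specialized for DemoEntry at compile time:
     *
     *     demo_hash *ht = demo_create(CurrentMemoryContext, 256);
     *     bool found;
     *     DemoEntry *e = demo_insert(ht, key, &found);
     */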
Reviewed-By: Tomas Vondra
Discussion: <20160727004333.r3e2k2y6fvk2ntup@alap3.anarazel.de>
| |
While we don't need multiple iterators at the moment, the interface is
nicer and less dangerous this way.
Aleksander Alekseev, with some changes by me.
| |
It's buggy. If somebody needs this later, they'll need to put back
a non-buggy version of it.
Discussion: CAM3SWZT-i6R9JU5YXa8MJUou2_r3LfGJZpQ9tYa1BYxfkj0=cQ@mail.gmail.com
Discussion: CAM3SWZRUOLsYoTT83QgdUy9D8ehYWm_nvbrrfcOOzikiRfFY7g@mail.gmail.com
Peter Geoghegan
| |
New functions initHyperLogLogError() and freeHyperLogLog() simplify
using this module from elsewhere.
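A hedged usage sketch (signatures assumed from lib/hyperloglog.h; callers
are assumed to supply already-hashed values):

    #include "postgres.h"
    #include "lib/hyperloglog.h"

    static double
    estimate_distinct(const uint32 *hashes, int n)
    {
        hyperLogLogState hll;
        double      result;
        int         i;

        /* size the register array for roughly 1% relative error */
        initHyperLogLogError(&hll, 0.01);

        for (i = 0; i < n; i++)
            addHyperLogLog(&hll, hashes[i]);

        result = estimateHyperLogLog(&hll);
        freeHyperLogLog(&hll);      /* releases the register array */
        return result;
    }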
Author: Tomáš Vondra
Review: Peter Geoghegan
| |
Backpatch certain files through 9.1
| |
Since the distances used in this algorithm are small integers (not more
than the size of the U set, in fact), there is no good reason to use float
arithmetic for them. Use short ints instead: they're smaller, faster, and
require no special portability assumptions.
Per testing by Greg Stark, which disclosed that the code got into an
infinite loop on VAX for lack of IEEE-style float infinities. We don't
really care all that much whether Postgres can run on a VAX anymore,
but there seems sufficient reason to change this code anyway.
In passing, make a few other small adjustments to make the code match
usual Postgres coding style a bit better.
| |
So far we have worked around the fact that some very old compilers do
not support 'inline' functions by only using inline functions
conditionally (or not at all). Since such compilers are very rare by
now, we have decided to rely on inline functions from 9.6 onwards.
To avoid breaking these old compilers, inline is defined away when not
supported. That'll cause "function x defined but not used" type
warnings, but since nobody develops on such compilers anymore, that's
ok.
This change in policy will allow us to more easily employ inline
functions.
I chose to remove code previously conditional on PG_USE_INLINE as it
seemed confusing to have code dependent on a define that's always
defined.
Blacklisting of compilers, like in c53f73879f, now has to be done
differently. A platform template can define PG_FORCE_DISABLE_INLINE to
force inline to be defined empty.
Discussion: 20150701161447.GB30708@awork2.anarazel.de
| |
This SQL standard functionality allows aggregating data by different
GROUP BY clauses at once. Each grouping set returns rows with the
columns grouped by in other sets set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg nodes to describe each
phase, but they're not part of the plan tree; they're instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows
skipping the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
| |
This improves on commit bbfd7edae5aa5ad5553d3c7e102f2e450d4380d4 by
making two simple changes:
* pg_attribute_noreturn now takes parentheses, i.e. pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces
pgindent's tendency to misformat declarations involving them.
* attributes are now always attached to function declarations, not
definitions. Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway. (It does little good to put a
noreturn annotation where callers can't see it.) In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.
I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
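A small illustration of the convention (proc_exit() is a real example of
such a declaration; take the exact form as a sketch):

    /* Attribute on the declaration, where callers can see it, and
     * written with parentheses so pgindent formats it sanely: */
    extern void proc_exit(int code) pg_attribute_noreturn();

    /* Putting the annotation only on the definition does callers no
     * good, since they compile against the header's declaration. */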