aboutsummaryrefslogtreecommitdiff
path: root/src/include/lib
Commit message (Collapse)AuthorAge
* Fix incorrect hash table resizing code in simplehash.hDavid Rowley2021-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a bug in simplehash.h which caused an incorrect size mask to be used when the hash table grew to SH_MAX_SIZE (2^32). The code was incorrectly setting the size mask to 0 when the hash tables reached the maximum possible number of buckets. This would result always trying to use the 0th bucket causing an infinite loop of trying to grow the hash table due to there being too many collisions. Seemingly it's not that common for simplehash tables to ever grow this big as this bug dates back to v10 and nobody seems to have noticed it before. However, probably the most likely place that people would notice it would be doing a large in-memory Hash Aggregate with something close to at least 2^31 groups. After this fix, the code now works correctly with up to within 98% of 2^32 groups and will fail with the following error when trying to insert any more items into the hash table: ERROR: hash table size exceeded However, the work_mem (or hash_mem_multiplier in newer versions) settings will generally cause Hash Aggregates to spill to disk long before reaching that many groups. The minimal test case I did took a work_mem setting of over 192GB to hit the bug. simplehash hash tables are used in a few other places such as Bitmap Index Scans, however, again the size that the hash table can become there is also limited to work_mem and it would take a relation of around 16TB (2^31) pages and a very large work_mem setting to hit this. With smaller work_mem values the table would become lossy and never grow large enough to hit the problem. Author: Yura Sokolov Reviewed-by: David Rowley, Ranier Vilela Discussion: https://postgr.es/m/b1f7f32737c3438136f64b26f4852b96@postgrespro.ru Backpatch-through: 10, where simplehash.h was added
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Fix typos and grammar in documentation and code commentsMichael Paquier2021-04-09
| | | | | | | | | Comment fixes are applied on HEAD, and documentation improvements are applied on back-branches where needed. Author: Justin Pryzby Discussion: https://postgr.es/m/20210408164008.GJ6592@telsasoft.com Backpatch-through: 9.6
* Add Result Cache executor node (take 2)David Rowley2021-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
* Revert b6002a796David Rowley2021-04-01
| | | | | | | | | | | | | This removes "Add Result Cache executor node". It seems that something weird is going on with the tracking of cache hits and misses as highlighted by many buildfarm animals. It's not yet clear what the problem is as other parts of the plan indicate that the cache did work correctly, it's just the hits and misses that were being reported as 0. This is especially a bad time to have the buildfarm so broken, so reverting before too many more animals go red. Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
* Add Result Cache executor nodeDavid Rowley2021-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
* Allow users of simplehash.h to perform direct deletionsDavid Rowley2021-03-30
| | | | | | | | | | | | | | | | | Previously simplehash.h only exposed a method to perform a hash table delete using the hash table key. This meant that the delete function had to perform a hash lookup in order to find the entry to delete. Here we add a new function so that users of simplehash.h can perform a hash delete directly using the entry pointer, thus saving the hash lookup. An upcoming patch that uses simplehash.h already has performed the hash lookup so already has the entry pointer. This change will allow the code in that patch to perform the hash delete without the code in simplehash.h having to perform an additional hash lookup. Author: David Rowley Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAApHDvqFLXXge153WmPsjke5VGOSt7Ez0yD0c7eBXLfmWxs3Kw@mail.gmail.com
* Add sort_template.h for making sort functions.Thomas Munro2021-03-03
| | | | | | | | Move our qsort implementation into a header that can be used to define specialized functions for better performance and reduced duplication. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CA%2BhUKGJ2-eaDqAum5bxhpMNhvuJmRDZxB_Tow0n-gse%2BHG0Yig%40mail.gmail.com
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Correct comment in simplehash.h.Thomas Munro2020-08-03
| | | | | | | Post-commit review for commit 84c0e4b9. Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvptBx_%2BUPAzY0uXzopbvPVGKPeZ6Hoy8rnPcWz20Cr0Bw%40mail.gmail.com
* Improve programmer docs for simplehash and dynahash.Thomas Munro2020-08-01
| | | | | | | | | | | When reading the code it's not obvious when one should prefer dynahash over simplehash and vice-versa, so, for programmer-friendliness, add comments to inform that decision. Show sample simplehash method signatures. Author: James Coleman <jtc331@gmail.com> Discussion: https://postgr.es/m/CAAaqYe_dOF39gAJ8rL-a3YO3Qo96MHMRQ2whFjK5ZcU6YvMQSA%40mail.gmail.com
* Initial pgindent and pgperltidy run for v13.Tom Lane2020-05-14
| | | | | | | | | | | Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
* Fix collection of typos and grammar mistakes in the treeMichael Paquier2020-04-10
| | | | | | | This fixes some comments and documentation new as of Postgres 13. Author: Justin Pryzby Discussion: https://postgr.es/m/20200408165653.GF2228@telsasoft.com
* Modify various power 2 calculations to use new helper functionsDavid Rowley2020-04-08
| | | | | | | | | | | | | | First pass of modifying various places that obtain the next power of 2 of a number and make them use the new functions added in pg_bitutils.h instead. This also removes the _hash_log2() function. There are no longer any callers in core. Other users can swap their _hash_log2(n) call to make use of pg_ceil_log2_32(n). Author: David Fetter, with some minor adjustments by me Reviewed-by: John Naylor, Jesse Zhang Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
* Skip memcpy(x, x) in qunique().Noah Misch2020-01-04
| | | | | | | | It has undefined behavior. Follow the precedent of commit 9a9473f3cce1a21c25d6cc7569710e832d2b180b. No back-patch, since the master branch alone has this function. Discussion: https://postgr.es/m/20191229070221.GA13873@gust.leadboat.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* simplehash: Allow for use in frontend code.Robert Haas2019-12-17
| | | | | | | | | | | | Commit 48995040d5e7b1e9bac35d72aff326cae002219d removed the largest barrier to use of simplehash in frontend code, but there's one more problem: it uses elog(ERROR, ...) or elog(LOG, ...) in a couple of places. Work around that by changing those to pg_log_error() and pg_log_info() when FRONTEND is defined. Patch by me, reviewed by Andres Freund. Discussion: http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@mail.gmail.com
* simplehash: Allow use of simplehash without MemoryContext.Robert Haas2019-12-17
| | | | | | | | | | | If the SH_RAW_ALLOCATOR is defined, it will be used to allocate bytes for the hash table, and no dependencies on MemoryContext will exist. This means, in particular, that the SH_CREATE function will not take a MemoryContext argument. Patch by me, reviewed by Andres Freund. Discussion: http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@mail.gmail.com
* Add reusable routine for making arrays unique.Thomas Munro2019-11-07
| | | | | | | | | | Introduce qunique() and qunique_arg(), which can be used after qsort() and qsort_arg() respectively to remove duplicate values. Use it where appropriate. Author: Thomas Munro Reviewed-by: Tom Lane (in an earlier version) Discussion: https://postgr.es/m/CAEepm%3D2vmFTNpAmwbGGD2WaryM6T3hSDVKQPfUwjdD_5XY6vAA%40mail.gmail.com
* Make StringInfo available to frontend code.Andres Freund2019-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's plenty places in frontend code that could benefit from a string buffer implementation. Some because it yields simpler and faster code, and some others because of the desire to share code between backend and frontend. While there is a string buffer implementation available to frontend code, libpq's PQExpBuffer, it is clunkier than stringinfo, it introduces a libpq dependency, doesn't allow for sharing between frontend and backend code, and has a higher API/ABI stability requirement due to being exposed via libpq. Therefore it seems best to just making StringInfo being usable by frontend code. There's not much to do for that, except for rewriting two subsequent elog/ereport calls into others types of error reporting, and deciding on a maximum string length. For the maximum string size I decided to privately define MaxAllocSize to the same value as used in the backend. It seems likely that we'll want to reconsider this for both backend and frontend code in the not too far away future. For now I've left stringinfo.h in lib/, rather than common/, to reduce the likelihood of unnecessary breakage. We could alternatively decide to provide a redirecting stringinfo.h in lib/, or just not provide compatibility. Author: Andres Freund Reviewed-By: Kyotaro Horiguchi, Daniel Gustafsson Discussion: https://postgr.es/m/20190920051857.2fhnvhvx4qdddviz@alap3.anarazel.de
* Allow simplehash to use already-calculated hash values.Jeff Davis2019-08-01
| | | | | | | | | | | | | | | Add _lookup_hash and _insert_hash functions for callers that have already calculated the hash value of the key. The immediate use case is for hash algorithms that write to disk in partitions. The hash value can be calculated once, used to perform a lookup, used to select the partition, then written to the partition along with the tuple. When the tuple is read back, the hash value does not need to be recalculated. Author: Jeff Davis Reviewed-by: Andres Freund Discussion: https://postgr.es/m/48abe675e1330f0c264ab2fe0d4ff23eb244f9ef.camel%40j-davis.com
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Add IntegerSet, to hold large sets of 64-bit ints efficiently.Heikki Linnakangas2019-03-22
| | | | | | | | | | | | | | | | | | | | | The set is implemented as a B-tree, with a compact representation at leaf items, using Simple-8b algorithm, so that clusters of nearby values use less memory. The IntegerSet isn't used for anything yet, aside from the test code, but we have two patches in the works that would benefit from this: A patch to allow GiST vacuum to delete empty pages, and a patch to reduce heap VACUUM's memory usage, by storing the list of dead TIDs more efficiently and lifting the 1 GB limit on its size. This includes a unit test module, in src/test/modules/test_integerset. It can be used to verify correctness, as a regression test, but if you run it manully, it can also print memory usage and execution time of some of the tests. Author: Heikki Linnakangas, Andrey Borodin Reviewed-by: Julien Rouhaud Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb729f@iki.fi
* simplehash: Add support for resetting a hashtable's contents.Andres Freund2019-02-09
| | | | | | | | | | A hashtable reset just reset the hashtable entries, but does not free memory. Author: Andres Freund Discussion: https://postgr.es/m/20190114180423.ywhdg2iagzvh43we@alap3.anarazel.de Bug: #15592 #15486 Backpatch: 11, this is a prerequisite for other fixes
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Rename rbtree.c functions to use "rbt" prefix not "rb" prefix.Tom Lane2018-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | The "rb" prefix is used by Ruby, so that our existing code results in name collisions that break plruby. We discussed ways to prevent that by adjusting dynamic linker options, but it seems that at best we'd move the pain to other cases. Renaming to avoid the collision is the only portable fix anyway. Fortunately, our rbtree code is not (yet?) widely used --- in core, there's only a single usage in GIN --- so it seems likely that we can get away with a rename. I chose to do this basically as s/rb/rbt/g, except for places where there already was a "t" after "rb". The patch could have been made smaller by only touching linker-visible symbols, but it would have resulted in oddly inconsistent-looking code. Better to make it look like "rbt" was the plan all along. Back-patch to v10. The rbtree.c code exists back to 9.5, but rb_iterate() which is the actual immediate source of pain was added in v10, so it seems like changing the names before that would have more risk than benefit. Per report from Pavel Raiskup. Discussion: https://postgr.es/m/4738198.8KVIIDhgEB@nb.usersys.redhat.com
* Code review for simplehash.h.Thomas Munro2018-08-28
| | | | | | | | | | | | | | | | | | Fix reference to non-existent file in comment. Add SH_ prefix to the EMPTY and IN_USE tokens, to reduce likelihood of collisions with unrelated macros. Add include guards around the function definitions that are not "parameterized", so the header can be used again in the same translation unit. Undefine SH_EQUAL macro where other "parameter" macros are undefined, for the same reason. Author: Thomas Munro Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CAEepm%3D1LdXZ3mMTM8tHt_b%3DK1kREit%3Dp8sikesak%3DkzHHM07Nw%40mail.gmail.com
* Fix a boatload of typos in C comments.Tom Lane2018-04-01
| | | | | | Justin Pryzby Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
* Add Bloom filter implementation.Andres Freund2018-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | A Bloom filter is a space-efficient, probabilistic data structure that can be used to test set membership. Callers will sometimes incur false positives, but never false negatives. The rate of false positives is a function of the total number of elements and the amount of memory available for the Bloom filter. Two classic applications of Bloom filters are cache filtering, and data synchronization testing. Any user of Bloom filters must accept the possibility of false positives as a cost worth paying for the benefit in space efficiency. This commit adds a test harness extension module, test_bloomfilter. It can be used to get a sense of how the Bloom filter implementation performs under varying conditions. This is infrastructure for the upcoming "heapallindexed" amcheck patch, which verifies the consistency of a heap relation against one of its indexes. Author: Peter Geoghegan Reviewed-By: Andrey Borodin, Michael Paquier, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g@mail.gmail.com
* Minor clean-up in dshash.{c,h}.Andres Freund2018-03-01
| | | | | | | | | | For consistency with other code that deals in numbers of buckets, the macro BUCKETS_PER_PARTITION should produce a value of type size_t. Also, fix a mention of an obsolete proposed name for dshash.c that appeared in a comment. Author: Thomas Munro, based on an observation from Amit Kapila Discussion: https://postgr.es/m/CAA4eK1%2BBOp5aaW3aHEkg5Bptf8Ga_BkBnmA-%3DXcAXShs0yCiYQ%40mail.gmail.com
* Remove some inappropriate #includes.Tom Lane2018-02-16
| | | | | | | | | | | | Other header files should never #include postgres.h (nor postgres_fe.h, nor c.h), per project policy. Also, there's no need for any backend .c file to explicitly include elog.h or palloc.h, because postgres.h pulls those in already. Extracted from a larger patch by Kyotaro Horiguchi. The rest of the removals he suggests require more study, but these are no-brainers. Discussion: https://postgr.es/m/20180215.200447.209320006.horiguchi.kyotaro@lab.ntt.co.jp
* Prevent growth of simplehash tables when they're "too empty".Andres Freund2018-01-29
| | | | | | | | | | | | | | | | | | | | In cases where simplehash tables where filled with either a lot of conflicting hash-values, or values that hash to consecutive values (i.e. build "chains") the growth heuristics in d4c62a6b623d6eef88218158e9fa3cf974c6c7e5 could trigger rather explosively. To fix that, address some of the reasons (see previous commit) of why the growth heuristics where needed, and only allow growth when the table isn't too empty. While that means there's a few cases of bad input that can be slower, that seems a lot better than running very quickly out of memory. Author: Tomas Vondra and Andres Freund, with additional input by Thomas Munro, Tom Lane Todd A. Cook Reported-By: Todd A. Cook, Tomas Vondra, Thomas Munro Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org Backpatch: 10, where simplehash was introduced
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Update typedefs.list and re-run pgindentRobert Haas2017-11-29
| | | | Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
* Allow to avoid NUL-byte management for stringinfos and use in format.c.Andres Freund2017-10-11
| | | | | | | | | | | | In a lot of the places having appendBinaryStringInfo() maintain a trailing NUL byte wasn't actually meaningful, e.g. when appending an integer which can contain 0 in one of its bytes. Removing this yields some small speedup, but more importantly will be more consistent when providing faster variants of pq_sendint etc. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
* Remove pre-order and post-order traversal logic for red-black trees.Tom Lane2017-09-10
| | | | | | | | | | | | | This code isn't used, and there's no clear reason why anybody would ever want to use it. These traversal mechanisms don't yield a visitation order that is semantically meaningful for any external purpose, nor are they any faster or simpler than the left-to-right or right-to-left traversals. (In fact, some rough testing suggests they are slower :-(.) Moreover, these mechanisms are impossible to test in any arm's-length fashion; doing so requires knowledge of the red-black tree's internal implementation. Hence, let's just jettison them. Discussion: https://postgr.es/m/17735.1505003111@sss.pgh.pa.us
* Suppress compiler warnings in dshash.c.Tom Lane2017-09-03
| | | | | | | | | | | | | | Some compilers complain, not unreasonably, about left-shifting an int32 "1" and then assigning the result to an int64. In practice I sure hope that this data structure never gets large enough that an overflow would actually occur; but let's cast the constant to the right type to avoid the hazard. In passing, fix a typo in dshash.h. Amit Kapila, adjusted as per comment from Thomas Munro. Discussion: https://postgr.es/m/CAA4eK1+5vfVMYtjK_NX8O3-42yM3o80qdqWnQzGquPrbq6mb+A@mail.gmail.com
* Consolidate the function pointer types used by dshash.c.Andres Freund2017-08-24
| | | | | | | | | | | | | | | | Commit 8c0d7bafad36434cb08ac2c78e69ae72c194ca20 introduced dshash with hash and compare functions like DynaHash's, and also variants that take a user data pointer instead of size. Simplify the interface by merging them into a single pair of function pointer types that take both size and a user data pointer. Since it is anticipated that memcmp and tag_hash behavior will be a common requirement, provide wrapper functions dshash_memcmp and dshash_memhash that conform to the new function types. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/20170823054644.efuzftxjpfi6wwqs%40alap3.anarazel.de
* Hash tables backed by DSA shared memory.Andres Freund2017-08-22
| | | | | | | | | | | | | | | | | | | Add general purpose chaining hash tables for DSA memory. Unlike DynaHash in shared memory mode, these hash tables can grow as required, and cope with being mapped into different addresses in different backends. There is a wide range of potential users for such a hash table, though it's very likely the interface will need to evolve as we come to understand the needs of different kinds of users. E.g support for iterators and incremental resizing is planned for later commits and the details of the callback signatures are likely to change. Author: Thomas Munro Reviewed-By: John Gorman, Andres Freund, Dilip Kumar, Robert Haas Discussion: https://postgr.es/m/CAEepm=3d8o8XdVwYT6O=bHKsKAM2pu2D6sV1S_=4d+jStVCE7w@mail.gmail.com https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Phase 2 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Fix typos in commentsMagnus Hagander2017-06-17
| | | | Author: Daniel Gustafsson <daniel@yesql.se>
* Fix comments in simplehash.h.Heikki Linnakangas2017-06-05
| | | | | | Jeff Janes and me. Discussion: https://www.postgresql.org/message-id/CAMkU=1zYnniLYg+W9itL93DXebCjx6Uk6m_=Xa8p_zM65X3S0Q@mail.gmail.com
* Post-PG 10 beta1 pgindent runBruce Momjian2017-05-17
| | | | perltidy run not included.
* Revert "Permit dump/reload of not-too-large >1GB tuples"Alvaro Herrera2017-05-10
| | | | | | | | | | | | | | | | | | | This reverts commits fa2fa9955280 and 42f50cb8fa98. While the functionality that was intended to be provided by these commits is desired, the patch didn't actually solve as many of the problematic situations as we hoped, and it created a bunch of its own problems. Since we're going to require more extensive changes soon for other reasons and users have been working around these problems for a long time already, there is no point in spending effort in fixing this halfway measure. Per complaint from Tom Lane. Discussion: https://postgr.es/m/21407.1484606922@sss.pgh.pa.us (Commit fa2fa9955280 had already been reverted in branches 9.5 as f858524ee4f and 9.6 as e9e44a0953, so this touches master only. Commit 42f50cb8fa98 was not present in the older branches.)
* Support hashed aggregation with grouping sets.Andrew Gierth2017-03-27
| | | | | | | | | | | | | | | | | | This extends the Aggregate node with two new features: HashAggregate can now run multiple hashtables concurrently, and a new strategy MixedAggregate populates hashtables while doing sorted grouping. The planner will now attempt to save as many sorts as possible when planning grouping sets queries, while not exceeding work_mem for the estimated combined sizes of all hashtables used. No SQL-level changes are required. There should be no user-visible impact other than the new EXPLAIN output and possible changes to result ordering when ORDER BY was not used (which affected a few regression tests). The enable_hashagg option is respected. Author: Andrew Gierth Reviewers: Mark Dilger, Andres Freund Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
* Make simplehash.h grow hashtable in additional cases.Andres Freund2017-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increase the size when either the distance between actual and optimal slot grows too large, or when too many subsequent entries would have to be moved. This addresses reports that the simplehash performed, sometimes considerably, worse than dynahash. The reason turned out to be that insertions into the hashtable where, due to the use of parallel query, in effect done from another hashtable, in hash-value order. If the target hashtable, due to mis-estimation, was sized a lot smaller than the source table(s) that lead to very imbalanced tables; a lot of entries in many close-by buckets from the source tables were inserted into a single, wider, bucket on the target table. As the growth factor was solely computed based on the fillfactor, the performance of the table decreased further and further. b81b5a96f424531b was an attempt to address this problem for hash aggregates (but not for bitmap scans), but it turns out that the current method of mixing hash values often actually leaves neighboring hash-values close to each other, just in different value range. It might be worth revisiting that independently of the performance issues addressed in this patch.. To address that problem resize tables in two additional cases: Firstly when the optimal position for an entry would be far from the actual position, secondly when many entries would have to be moved to make space for the new entry (while satisfying the robin hood property). Due to the additional resizing threshold it seems possible, and testing confirms that so far, that a higher fillfactor doesn't hurt performance and saves a bit of memory. It seems better to increase it now, before a release containing any of this code, rather than wonder in some later release. The various boundaries aren't determined in a particularly scientific manner, they might need some fine-tuning. In all my tests the new code now, even with parallelism, performs at least as good as the old code, in several scenarios significantly better. Reported-By: Dilip Kumar, Robert Haas, Kuntal Ghosh Discussion: https://postgr.es/m/CAFiTN-vagvuAydKG9VnWcoK=ADAhxmOa4ZTrmNsViBBooTnriQ@mail.gmail.com https://postgr.es/m/CAGz5QC+=fNTYgzMLTBUNeKt6uaWZFXJbkB5+7oWm-n9DwVxcLA@mail.gmail.com
* Fix s/ITERTOR/ITERATOR/ typo in simplehash.h.Andres Freund2017-03-01
| | | | | | | | This could lead to problem when simplehash.h is used to define two different types of hashtable visible in the same translation unit. Reported-By: Josh Soref Discussion: https://postgr.es/m/CACZqfqCC7WdBAY=rQePb9-qW1rjdaTdHsV5KoVejHkDb6qrtOg@mail.gmail.com
* simplehash: Additional tweaks to make specifying an allocator work.Robert Haas2017-02-09
| | | | | | | | | | | | | Even if we don't emit definitions for SH_ALLOCATE and SH_FREE, we still need prototypes. The user can't define them before including simplehash.h because SH_TYPE isn't available yet. For the allocator to be able to access private_data, it needs to become an argument to SH_CREATE. Previously we relied on callers to set that after returning from SH_CREATE, but SH_CREATE calls SH_ALLOCATE before returning. Dilip Kumar, reviewed by me.