aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/hash/hashutil.c
Commit message (Collapse)AuthorAge
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Initial pgindent and pgperltidy run for v13.Tom Lane2020-05-14
| | | | | | | | | | | Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
* Modify various power 2 calculations to use new helper functionsDavid Rowley2020-04-08
| | | | | | | | | | | | | | First pass of modifying various places that obtain the next power of 2 of a number and make them use the new functions added in pg_bitutils.h instead. This also removes the _hash_log2() function. There are no longer any callers in core. Other users can swap their _hash_log2(n) call to make use of pg_ceil_log2_32(n). Author: David Fetter, with some minor adjustments by me Reviewed-by: John Naylor, Jesse Zhang Discussion: https://postgr.es/m/20200114173553.GE32763%40fetter.org
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Refactor reloption handling for index AMs in-coreMichael Paquier2019-11-25
| | | | | | | | | | | | | | | | | | This reworks the reloption parsing and build of a couple of index AMs by creating new structures for each index AM's options. This split was already done for BRIN, GIN and GiST (which actually has a fillfactor parameter), but not for hash, B-tree and SPGiST which relied on StdRdOptions due to an overlap with the default option set. This saves a couple of bytes for rd_options in each relcache entry with indexes making use of relation options, and brings more consistency between all index AMs. While on it, add a couple of AssertMacro() calls to make sure that utility macros to grab values of reloptions are used with the expected index AM. Author: Nikolay Shaplov Reviewed-by: Amit Langote, Michael Paquier, Álvaro Herrera, Dent John Discussion: https://postgr.es/m/4127670.gFlpRb6XCm@x200m
* Make the order of the header file includes consistent in backend modules.Amit Kapila2019-11-12
| | | | | | | | | | | Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order of header file inclusion consistent for backend modules. In the passing, removed a couple of duplicate inclusions. Author: Vignesh C Reviewed-by: Kuntal Ghosh and Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* doc: Update uses of the word "procedure"Peter Eisentraut2018-08-22
| | | | | | | | | | | | | | | | | | | Historically, the term procedure was used as a synonym for function in Postgres/PostgreSQL. Now we have procedures as separate objects from functions, so we need to clean up the documentation to not mix those terms. In particular, mentions of "trigger procedures" are changed to "trigger functions", and access method "support procedures" are changed to "support functions". (The latter already used FUNCTION in the SQL syntax anyway.) Also, the terminology in the SPI chapter has been cleaned up. A few tests, examples, and code comments are also adjusted to be consistent with documentation changes, but not everything. Reported-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Remove lsn from HashScanPosData.Robert Haas2017-09-26
| | | | | | | | | | | | | | | | This was intended as infrastructure for weakening VACUUM's locking requirements, similar to what was done for btree indexes in commit 2ed5b87f96d473962ec5230fd820abfeaccb2069. However, for hash indexes, it seems that the improvements which are possible are actually extremely marginal. Furthermore, performing the LSN cross-check will end up skipping cleanup far more often than is necessary; we only care about page modifications due to a VACUUM, but the LSN check will fail if ANY modification has occurred. So, rather than pressing forward with that "optimization", just rip the LSN field out. Patch by me, reviewed by Ashutosh Sharma and Amit Kapila Discussion: http://postgr.es/m/CAA4eK1JxqqcuC5Un7YLQVhOYSZBS+t=3xqZuEkt5RyquyuxpwQ@mail.gmail.com
* hash: Implement page-at-a-time scan.Robert Haas2017-09-22
| | | | | | | | | | | | | | | | Commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272 added a similar optimization to btree back in 2006, but nobody bothered to implement the same thing for hash indexes, probably because they weren't WAL-logged and had lots of other performance problems as well. As with the corresponding btree case, this eliminates the problem of potentially needing to refind our position within the page, and cuts down on pin/unpin traffic as well. Ashutosh Sharma, reviewed by Alexander Korotkov, Jesper Pedersen, Amit Kapila, and me. Some final edits to comments and README by me. Discussion: http://postgr.es/m/CAE9k0Pm3KTx93K8_5j6VMzG4h5F+SyknxUwXrN-zqSZ9X8ZS3w@mail.gmail.com
* Introduce 64-bit hash functions with a 64-bit seed.Robert Haas2017-08-31
| | | | | | | | | | | | | | | | This will be useful for hash partitioning, which needs a way to seed the hash functions to avoid problems such as a hash index on a hash partitioned table clumping all values into a small portion of the bucket space; it's also useful for anything that wants a 64-bit hash value rather than a 32-bit hash value. Just in case somebody wants a 64-bit hash value that is compatible with the existing 32-bit hash values, make the low 32-bits of the 64-bit hash value match the 32-bit hash value when the seed is 0. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Phase 2 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Post-PG 10 beta1 pgindent runBruce Momjian2017-05-17
| | | | perltidy run not included.
* Clean up manipulations of hash indexes' hasho_flag field.Tom Lane2017-04-14
| | | | | | | | | | | | | | | | | | | | | | | | Standardize on testing a hash index page's type by doing (opaque->hasho_flag & LH_PAGE_TYPE) == LH_xxx_PAGE Various places were taking shortcuts like opaque->hasho_flag & LH_BUCKET_PAGE which while not actually wrong, is still bad practice because it encourages use of opaque->hasho_flag & LH_UNUSED_PAGE which *is* wrong (LH_UNUSED_PAGE == 0, so the above is constant false). hash_xlog.c's hash_mask() contained such an incorrect test. This also ensures that we mask out the additional flag bits that hasho_flag has accreted since 9.6. pgstattuple's pgstat_hash_page(), for one, was failing to do that and was thus actively broken. Also fix assorted comments that hadn't been updated to reflect the extended usage of hasho_flag, and fix some macros that were testing just "(hasho_flag & bit)" to use the less dangerous, project-approved form "((hasho_flag & bit) != 0)". Coverity found the bug in hash_mask(); I noted the one in pgstat_hash_page() through code reading.
* Fix formula in _hash_spareindex.Robert Haas2017-04-04
| | | | | | | | | | This was correct in earlier versions of the patch that lead to commit ea69a0dead5128c421140dc53fac165ba4af8520, but somehow got broken in the last version which I actually committed. Mithun Cy, per an off-list report from Ashutosh Sharma Discussion: http://postgr.es/m/CAD__OujbAwNU71v1y-RoQxZ8LZ6-V2UFTkex3v34MK6uZ3Xb5w@mail.gmail.com
* Expand hash indexes more gradually.Robert Haas2017-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since hash indexes typically have very few overflow pages, adding a new splitpoint essentially doubles the on-disk size of the index, which can lead to large and abrupt increases in disk usage (and perhaps long delays on occasion). To mitigate this problem to some degree, divide larger splitpoints into four equal phases. This means that, for example, instead of growing from 4GB to 8GB all at once, a hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which is perhaps still not as smooth as we'd like but certainly an improvement. This changes the on-disk format of the metapage, so bump HASH_VERSION from 2 to 3. This will force a REINDEX of all existing hash indexes, but that's probably a good idea anyway. First, hash indexes from pre-10 versions of PostgreSQL could easily be corrupted, and we don't want to confuse corruption carried over from an older release with any corruption caused despite the new write-ahead logging in v10. Second, it will let us remove some backward-compatibility code added by commit 293e24e507838733aba4748b514536af2d39d7f2. Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me. Regression test outputs updated by me. Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com
* Port single-page btree vacuum logic to hash indexes.Robert Haas2017-03-15
| | | | | | | | | | This is advantageous for hash indexes for the same reasons it's good for btrees: it accelerates space recycling, reducing bloat. Ashutosh Sharma, reviewed by Amit Kapila and by me. A bit of additional hacking by me. Discussion: http://postgr.es/m/CAE9k0PkRSyzx8dOnokEpUi2A-RFZK72WN0h9DEMv_ut9q6bPRw@mail.gmail.com
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Improve hash index bucket split behavior.Robert Haas2016-11-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the right to split a bucket was represented by a heavyweight lock on the page number of the primary bucket page. Unfortunately, this meant that every scan needed to take a heavyweight lock on that bucket also, which was bad for concurrency. Instead, use a cleanup lock on the primary bucket page to indicate the right to begin a split, so that scans only need to retain a pin on that page, which is they would have to acquire anyway, and which is also much cheaper. In addition to reducing the locking cost, this also avoids locking out scans and inserts for the entire lifetime of the split: while the new bucket is being populated with copies of the appropriate tuples from the old bucket, scans and inserts can happen in parallel. There are minor concurrency improvements for vacuum operations as well, though the situation there is still far from ideal. This patch also removes the unworldly assumption that a split will never be interrupted. With the new code, a split is done in a series of small steps and the system can pick up where it left off if it is interrupted prior to completion. While this patch does not itself add write-ahead logging for hash indexes, it is clearly a necessary first step, since one of the things that could interrupt a split is the removal of electrical power from the machine performing it. Amit Kapila. I wrote the original design on which this patch is based, and did a good bit of work on the comments and README through multiple rounds of review, but all of the code is Amit's. Also reviewed by Jesper Pedersen, Jeff Janes, and others. Discussion: http://postgr.es/m/CAA4eK1LfzcZYxLoXS874Ad0+S-ZM60U9bwcyiUZx9mHZ-KCWhw@mail.gmail.com
* Fix building of large (bigger than shared_buffers) hash indexes.Tom Lane2016-06-24
| | | | | | | | | | | | | | | | When the index is predicted to need more than NBuffers buckets, CREATE INDEX attempts to sort the index entries by hash key before insertion, so as to reduce thrashing. This code path got broken by commit 9f03ca915196dfc8, which overlooked that _hash_form_tuple() is not just an alias for index_form_tuple(). The index got built anyway, but with garbage data, so that searches for pre-existing tuples always failed. Fix by refactoring to separate construction of the indexable data from calling index_form_tuple(). Per bug #14210 from Daniel Newman. Back-patch to 9.5 where the bug was introduced. Report: <20160623162507.17237.39471@wrigleys.postgresql.org>
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Restructure index access method API to hide most of it at the C level.Tom Lane2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.hAlvaro Herrera2011-07-04
| | | | | This lets us stop including rel.h into execnodes.h, which is a widely used header.
* Hash indexes had better pass the index collation to support functions, too.Tom Lane2011-04-23
| | | | | Per experimentation with contrib/citext, whose hash function assumes that it'll be passed a collation.
* Pass collations to functions in FunctionCallInfoData, not FmgrInfo.Tom Lane2011-04-12
| | | | | | | | | | | Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Remove very ancient tuple-counting infrastructure (IncrRetrieved() andTom Lane2009-10-08
| | | | | | | | | friends). This code has all been ifdef'd out for many years, and doesn't seem to have any prospect of becoming any more useful in the future. EXPLAIN ANALYZE is what people use in practice, and I think if we did want process-wide counters we'd be more likely to put in dtrace events for that than try to resurrect this code. Get rid of it so as to have one less detail to worry about while refactoring execMain.c.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Change the reloptions machinery to use a table-based parser, and provideAlvaro Herrera2009-01-05
| | | | | | | | a more complete framework for writing custom option processing routines by user-defined access methods. Catalog version bumped due to the general API changes, which are going to affect user-defined "amoptions" routines.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Change hash indexes to store only the hash code rather than the whole indexedTom Lane2008-09-15
| | | | | | | | | | | | | | | | value. This means that hash index lookups are always lossy and have to be rechecked when the heap is visited; however, the gain in index compactness outweighs this when the indexed values are wide. Also, we only need to perform datatype comparisons when the hash codes match exactly, rather than for every entry in the hash bucket; so it could also win for datatypes that have expensive comparison functions. A small additional win is gained by keeping hash index pages sorted by hash code and using binary search to reduce the number of index tuples we have to look at. Xiao Meng This commit also incorporates Zdenek Kotala's patch to isolate hash metapages and hash bitmaps a bit better from the page header datastructures.
* Clean up the use of some page-header-access macros: principally, useTom Lane2008-07-13
| | | | | | | | | | SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.
* Improve our #include situation by moving pointer types away from theAlvaro Herrera2008-06-19
| | | | | | | corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.
* Restructure some header files a bit, in particular heapam.h, by removing someAlvaro Herrera2008-05-12
| | | | | | | | | | | | unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Tweak hash index AM to use the new ReadOrZeroBuffer bufmgr API when fetchingTom Lane2007-05-03
| | | | | | | pages it intends to zero immediately. Just to show there is some use for that function besides WAL recovery :-). Along the way, fold _hash_checkpage and _hash_pageinit calls into _hash_getbuf and friends, instead of expecting callers to do that separately.
* Add support for cross-type hashing in hash index searches and hash joins.Tom Lane2007-01-30
| | | | | | Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.