aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor/nodeHash.c
Commit message (Collapse)AuthorAge
* ExecHashRemoveNextSkewBucket must physically copy tuples to main hashtable.Tom Lane2016-02-07
| | | | | | | | | | | | | | | | | | | | | | Commit 45f6240a8fa9d355 added an assumption in ExecHashIncreaseNumBatches and ExecHashIncreaseNumBuckets that they could find all tuples in the main hash table by iterating over the "dense storage" introduced by that patch. However, ExecHashRemoveNextSkewBucket continued its old practice of simply re-linking deleted skew tuples into the main table's hashchains. Hence, such tuples got lost during any subsequent increase in nbatch or nbuckets, and would never get joined, as reported in bug #13908 from Seth P. I (tgl) think that the aforesaid commit has got multiple design issues and should be reworked rather completely; but there is no time for that right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket physically copy deleted skew tuples into the "dense storage" arena. The added test case is able to exhibit the problem by means of fooling the planner with a WHERE condition that it will underestimate the selectivity of, causing the initial nbatch estimate to be too small. Tomas Vondra and Tom Lane. Thanks to David Johnston for initial investigation into the bug report.
* Fix comment block trashed by pgindent.Tom Lane2016-02-06
| | | | Looks like I put the protective dashes in the wrong place in f4e4b32743.
* Improve HJDEBUG code a bit.Tom Lane2016-02-06
| | | | | | | | | | | | | | | Commit 30d7ae3c76d2de144232ae6ab328ca86b70e72c3 introduced an HJDEBUG stanza that probably didn't compile at the time, and definitely doesn't compile now, because it refers to a nonexistent variable. It doesn't seem terribly useful anyway, so just get rid of it. While I'm fooling with it, use %z modifier instead of the obsolete hack of casting size_t to unsigned long, and include the HashJoinTable's address in each printout so that it's possible to distinguish the activities of multiple hashjoins occurring in one query. Noted while trying to use HJDEBUG to investigate bug #13908. Back-patch to 9.5, because code that doesn't compile is certainly not very helpful.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Further twiddling of nodeHash.c hashtable sizing calculation.Tom Lane2015-10-04
| | | | | | | | | | | On reflection, the submitted patch didn't really work to prevent the request size from exceeding MaxAllocSize, because of the fact that we'd happily round nbuckets up to the next power of 2 after we'd limited it to max_pointers. The simplest way to enforce the limit correctly is to round max_pointers down to a power of 2 when it isn't one already. (Note that the constraint to INT_MAX / 2, if it were doing anything useful at all, is properly applied after that.)
* Fix some issues in new hashtable size calculations in nodeHash.c.Tom Lane2015-10-04
| | | | | | | | | | | | | | | | Limit the size of the hashtable pointer array to not more than MaxAllocSize, per reports from Kouhei Kaigai and others of "invalid memory alloc request size" failures. There was discussion of allowing the array to get larger than that by using the "huge" palloc API, but so far no proof that that is actually a good idea, and at this point in the 9.5 cycle major changes from old behavior don't seem like the way to go. Fix a rather serious secondary bug in the new code, which was that it didn't ensure nbuckets remained a power of 2 when recomputing it for the multiple-batch case. Clean up sloppy division of labor between ExecHashIncreaseNumBuckets and its sole call site.
* Fix bug in calculations of hash join buckets.Kevin Grittner2015-08-19
| | | | | | | | | | | | | | Commit 8cce08f168481c5fc5be4e7e29b968e314f1b41e used a left-shift on a literal of 1 that could (in large allocations) be shifted by 31 or more bits. This was assigned to a local variable that was already declared to be a long to protect against overruns of int, but the literal in this shift needs to be declared long to allow it to work correctly in some compilers. Backpatch to 9.5, where the bug was introduced. Report and patch by KaiGai Kohei, slighly modified based on discussion.
* Avoid some zero-divide hazards in the planner.Tom Lane2015-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | Although I think on all modern machines floating division by zero results in Infinity not SIGFPE, we still don't want infinities running around in the planner's costing estimates; too much risk of that leading to insane behavior. grouping_planner() failed to consider the possibility that final_rel might be known dummy and hence have zero rowcount. (I wonder if it would be better to set a rows estimate of 1 for dummy relations? But at least in the back branches, changing this convention seems like a bad idea, so I'll leave that for another day.) Make certain that get_variable_numdistinct() produces a nonzero result. The case that can be shown to be broken is with stadistinct < 0.0 and small ntuples; we did not prevent the result from rounding to zero. For good luck I applied clamp_row_est() to all the nonconstant return values. In ExecChooseHashTableSize(), Assert that we compute positive nbuckets and nbatch. I know of no reason to think this isn't the case, but it seems like a good safety check. Per reports from Piotr Stefaniak. Back-patch to all active branches.
* Manual cleanup of pgindent results.Tom Lane2015-05-24
| | | | | | Fix some places where pgindent did silly stuff, often because project style wasn't followed to begin with. (I've not touched the atomics headers, though.)
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].Tom Lane2015-02-21
| | | | | | | This requires changing quite a few places that were depending on sizeof(HeapTupleHeaderData), but it seems for the best. Michael Paquier, some adjustments by me
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Increase number of hash join buckets for underestimate.Kevin Grittner2014-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | If we expect batching at the very beginning, we size nbuckets for "full work_mem" (see how many tuples we can get into work_mem, while not breaking NTUP_PER_BUCKET threshold). If we expect to be fine without batching, we start with the 'right' nbuckets and track the optimal nbuckets as we go (without actually resizing the hash table). Once we hit work_mem (considering the optimal nbuckets value), we keep the value. At the end of the first batch, we check whether (nbuckets != nbuckets_optimal) and resize the hash table if needed. Also, we keep this value for all batches (it's OK because it assumes full work_mem, and it makes the batchno evaluation trivial). So the resize happens only once. There could be cases where it would improve performance to allow the NTUP_PER_BUCKET threshold to be exceeded to keep everything in one batch rather than spilling to a second batch, but attempts to generate such a case have so far been unsuccessful; that issue may be addressed with a follow-on patch after further investigation. Tomas Vondra with minor format and comment cleanup by me Reviewed by Robert Haas, Heikki Linnakangas, and Kevin Grittner
* Fix pointer type in size passed to memset.Heikki Linnakangas2014-09-14
| | | | | | | Pointers are all the same size, so it makes no practical difference, but let's be tidy. Found by Coverity, noted off-list by Tom Lane.
* Change NTUP_PER_BUCKET to 1 to improve hash join lookup speed.Robert Haas2014-09-12
| | | | | | | | | | | | | | | | Since this makes the bucket headers use ~10x as much memory, properly account for that memory when we figure out whether everything fits in work_mem. This might result in some cases that previously used only a single batch getting split into multiple batches, but it's unclear as yet whether we need defenses against that case, and if so, what the shape of those defenses should be. It's worth noting that even in these edge cases, users should still be no worse off than they would have been last week, because commit 45f6240a8fa9d35548eb2ef23dba2c11540aa02a saved a big pile of memory on exactly the same workloads. Tomas Vondra, reviewed and somewhat revised by me.
* Pack tuples in a hash join batch densely, to save memory.Heikki Linnakangas2014-09-10
| | | | | | | | | | | | | Instead of palloc'ing each HashJoinTuple individually, allocate 32kB chunks and pack the tuples densely in the chunks. This avoids the AllocChunk header overhead, and the space wasted by standard allocator's habit of rounding sizes up to the nearest power of two. This doesn't contain any planner changes, because the planner's estimate of memory usage ignores the palloc overhead. Now that the overhead is smaller, the planner's estimates are in fact more accurate. Tomas Vondra, reviewed by Robert Haas.
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Add defenses against integer overflow in dynahash numbuckets calculations.Tom Lane2012-12-11
| | | | | | | | | | | | | | | The dynahash code requires the number of buckets in a hash table to fit in an int; but since we calculate the desired hash table size dynamically, there are various scenarios where we might calculate too large a value. The resulting overflow can lead to infinite loops, division-by-zero crashes, etc. I (tgl) had previously installed some defenses against that in commit 299d1716525c659f0e02840e31fbe4dea3, but that covered only one call path. Moreover it worked by limiting the request size to work_mem, but in a 64-bit machine it's possible to set work_mem high enough that the problem appears anyway. So let's fix the problem at the root by installing limits in the dynahash.c functions themselves. Trouble report and patch by Jeff Davis.
* Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera2012-08-30
| | | | | | | | | | | | This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Rearrange the implementation of index-only scans.Tom Lane2011-10-11
| | | | | | | | | | | | | | | | | | | | | | This commit changes index-only scans so that data is read directly from the index tuple without first generating a faux heap tuple. The only immediate benefit is that indexes on system columns (such as OID) can be used in index-only scans, but this is necessary infrastructure if we are ever to support index-only scans on expression indexes. The executor is now ready for that, though the planner still needs substantial work to recognize the possibility. To do this, Vars in index-only plan nodes have to refer to index columns not heap columns. I introduced a new special varno, INDEX_VAR, to mark such Vars to avoid confusion. (In passing, this commit renames the two existing special varnos to OUTER_VAR and INNER_VAR.) This allows ruleutils.c to handle them with logic similar to what we use for subplan reference Vars. Since index-only scans are now fundamentally different from regular indexscans so far as their expression subtrees are concerned, I also chose to change them to have their own plan node type (and hence, their own executor source file).
* Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.Tom Lane2011-09-22
| | | | | | | | | | | | | | | This provides information about the numbers of tuples that were visited but not returned by table scans, as well as the numbers of join tuples that were considered and discarded within a join plan node. There is still some discussion going on about the best way to report counts for outer-join situations, but I think most of what's in the patch would not change if we revise that, so I'm going to go ahead and commit it as-is. Documentation changes to follow (they weren't in the submitted patch either). Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Support RIGHT and FULL OUTER JOIN in hash joins.Tom Lane2010-12-30
| | | | | | | | | | | | | | | | | | | | | | This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Make NestLoop plan nodes pass outer-relation variables into their innerTom Lane2010-07-12
| | | | | | | | | | | | relation using the general PARAM_EXEC executor parameter mechanism, rather than the ad-hoc kluge of passing the outer tuple down through ExecReScan. The previous method was hard to understand and could never be extended to handle parameters coming from multiple join levels. This patch doesn't change the set of possible plans nor have any significant performance effect, but it's necessary infrastructure for future generalization of the concept of an inner indexscan plan. ExecReScan's second parameter is now unused, so it's removed.
* pgindent run for 9.0Bruce Momjian2010-02-26
|
* Wrap calls to SearchSysCache and related functions using macros.Robert Haas2010-02-14
| | | | | | | | | | | | The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.
* Augment EXPLAIN output with more details on Hash nodes.Robert Haas2010-02-01
| | | | | | We show the number of buckets, the number of batches (and also the original number if it has changed), and the peak space used by the hash table. Minor executor changes to track peak space used.
* When estimating the selectivity of an inequality "column > constant" orTom Lane2010-01-04
| | | | | | | | | | | | | | | "column < constant", and the comparison value is in the first or last histogram bin or outside the histogram entirely, try to fetch the actual column min or max value using an index scan (if there is an index on the column). If successful, replace the lower or upper histogram bound with that value before carrying on with the estimate. This limits the estimation error caused by moving min/max values when the comparison value is close to the min or max. Per a complaint from Josh Berkus. It is tempting to consider using this mechanism for mergejoinscansel as well, but that would inject index fetches into main-line join estimation not just endpoint cases. I'm refraining from that until we can get a better handle on the costs of doing this type of lookup.
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Add the ability to store inheritance-tree statistics in pg_statistic,Tom Lane2009-12-29
| | | | | | | | and teach ANALYZE to compute such stats for tables that have subclasses. Per my proposal of yesterday. autovacuum still needs to be taught about running ANALYZE on parent tables when their subclasses change, but the feature is useful even without that.
* Make the overflow guards in ExecChooseHashTableSize be more protective.Tom Lane2009-10-30
| | | | | | | | | | | | | | | The original coding ensured nbuckets and nbatch didn't exceed INT_MAX, which while not insane on its own terms did nothing to protect subsequent code like "palloc(nbatch * sizeof(BufFile *))". Since enormous join size estimates might well be planner error rather than reality, it seems best to constrain the initial sizes to be not more than work_mem/sizeof(pointer), thus ensuring the allocated arrays don't exceed work_mem. We will allow nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches calls, but we should still guard against integer overflow in those palloc requests. Per bug #5145 from Bernt Marius Johnsen. Although the given test case only seems to fail back to 8.2, previous releases have variants of this issue, so patch all supported branches.
* Remove no-longer-needed ExecCountSlots infrastructure.Tom Lane2009-09-27
|
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Revert DTrace patch from Robert LorBruce Momjian2009-04-02
|
* Add support for additional DTrace probes.Bruce Momjian2009-04-02
| | | | Robert Lor
* Optimize multi-batch hash joins when the outer relation has a nonuniformTom Lane2009-03-21
| | | | | | | | | distribution, by creating a special fast path for the (first few) most common values of the outer relation. Tuples having hashvalues matching the MCVs are effectively forced to be in the first batch, so that we never write them out to the batch temp files. Bryce Cutt and Ramon Lawrence, with some editorialization by me.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Rework temp_tablespaces patch so that temp tablespaces are assigned separatelyTom Lane2007-06-07
| | | | | | | | | for each temp file, rather than once per sort or hashjoin; this allows spreading the data of a large sort or join across multiple tablespaces. (I remain dubious that this will make any difference in practice, but certain people insisted.) Arrange to cache the results of parsing the GUC variable instead of recomputing from scratch on every demand, and push usage of the cache down to the bottommost fd.c level.
* Create a GUC parameter temp_tablespaces that allows selection of theTom Lane2007-06-03
| | | | | | | | | | tablespace(s) in which to store temp tables and temporary files. This is a list to allow spreading the load across multiple tablespaces (a random list element is chosen each time a temp object is to be created). Temp files are not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace directories. Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
* Buy back some of the cycles spent in more-expensive hash functions byTom Lane2007-06-01
| | | | | | | selecting power-of-2, rather than prime, numbers of buckets in hash joins. If the hash functions are doing their jobs properly by making all hash bits equally random, this is good enough, and it saves expensive integer division and modulus operations.
* Fix bug I introduced in recent patch to make hash joins discard null tuplesTom Lane2007-02-22
| | | | | immediately: ExecHashGetHashValue failed to restore the caller's memory context before taking the failure exit.
* Add support for cross-type hashing in hash index searches and hash joins.Tom Lane2007-01-30
| | | | | | Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.