aboutsummaryrefslogtreecommitdiff
path: root/src/include/optimizer/paths.h
Commit message (Collapse)AuthorAge
* Fix planner failures with overlapping mergejoin clauses in an outer join.Tom Lane2018-02-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given overlapping or partially redundant join clauses, for example t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x the planner's EquivalenceClass machinery will ordinarily refactor the clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't see multiple references to the same EquivalenceClass in a list of join equality clauses. However, if the join is outer, it's incorrect to derive a restriction clause on the outer side from the join conditions, so the clause refactoring does not happen and we end up with overlapping join conditions. The code that attempted to deal with such cases had several subtle bugs, which could result in "left and right pathkeys do not match in mergejoin" or "outer pathkeys do not match mergeclauses" planner errors, if the selected join plan type was a mergejoin. (It does not appear that any actually incorrect plan could have been emitted.) The core of the problem really was failure to recognize that the outer and inner relations' pathkeys have different relationships to the mergeclause list. A join's mergeclause list is constructed by reference to the outer pathkeys, so it will always be ordered the same as the outer pathkeys, but this cannot be presumed true for the inner pathkeys. If the inner sides of the mergeclauses contain multiple references to the same EquivalenceClass ({t2.x} in the above example) then a simplistic rendering of the required inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery recognizes that the second sort column is redundant and throws it away. The mergejoin planning code failed to account for that behavior properly. One error was to try to generate cut-down versions of the mergeclause list from cut-down versions of the inner pathkeys in the same way as the initial construction of the mergeclause list from the outer pathkeys was done; this could lead to choosing a mergeclause list that fails to match the outer pathkeys. The other problem was that the pathkey cross-checking code in create_mergejoin_plan treated the inner and outer pathkey lists identically, whereas actually the expectations for them must be different. That led to false "pathkeys do not match" failures in some cases, and in principle could have led to failure to detect bogus plans in other cases, though there is no indication that such bogus plans could be generated. Reported by Alexander Kuzmenkov, who also reviewed this patch. This has been broken for years (back to around 8.3 according to my testing), so back-patch to all supported branches. Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru
* Fix mishandling of equivalence-class tests in parameterized plans.Tom Lane2016-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z, it was possible for the planner to omit one of the quals needed to enforce that all members of the equivalence class are actually equal. This only happened in the case of a parameterized join node for two of the relations, that is a plan tree like Nested Loop -> Scan X -> Nested Loop -> Scan Y -> Scan Z Filter: Z.Z = X.X The eclass machinery normally expects to apply X.X = Y.Y when those two relations are joined, but in this shape of plan tree they aren't joined until the top node --- and, if the lower nested loop is marked as parameterized by X, the top node will assume that the relevant eclass condition(s) got pushed down into the lower node. On the other hand, the scan of Z assumes that it's only responsible for constraining Z.Z to match any one of the other eclass members. So one or another of the required quals sometimes fell between the cracks, depending on whether consideration of the eclass in get_joinrel_parampathinfo() for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z as the appropriate constraint there. If it generated the latter, it'd erroneously suppose that the Z scan would take care of matters. To fix, force X.X = Y.Y to be generated and applied at that join node when this case occurs. This is *extremely* hard to hit in practice, because various planner behaviors conspire to mask the problem; starting with the fact that the planner doesn't really like to generate a parameterized plan of the above shape. (It might have been impossible to hit it before we tweaked things to allow this plan shape for star-schema cases.) Many thanks to Alexander Kirkouski for submitting a reproducible test case. The bug can be demonstrated in all branches back to 9.2 where parameterized paths were introduced, so back-patch that far.
* Still more fixes for planner's handling of LATERAL references.Tom Lane2015-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More fuzz testing by Andreas Seltenreich exposed that the planner did not cope well with chains of lateral references. If relation X references Y laterally, and Y references Z laterally, then we will have to scan X on the inside of a nestloop with Z, so for all intents and purposes X is laterally dependent on Z too. The planner did not understand this and would generate intermediate joins that could not be used. While that was usually harmless except for wasting some planning cycles, under the right circumstances it would lead to "failed to build any N-way joins" or "could not devise a query plan" planner failures. To fix that, convert the existing per-relation lateral_relids and lateral_referencers relid sets into their transitive closures; that is, they now show all relations on which a rel is directly or indirectly laterally dependent. This not only fixes the chained-reference problem but allows some of the relevant tests to be made substantially simpler and faster, since they can be reduced to simple bitmap manipulations instead of searches of the LateralJoinInfo list. Also, when a PlaceHolderVar that is due to be evaluated at a join contains lateral references, we should treat those references as indirect lateral dependencies of each of the join's base relations. This prevents us from trying to join any individual base relations to the lateral reference source before the join is formed, which again cannot work. Andreas' testing also exposed another oversight in the "dangerous PlaceHolderVar" test added in commit 85e5e222b1dd02f1. Simply rejecting unsafe join paths in joinpath.c is insufficient, because in some cases we will end up rejecting *all* possible paths for a particular join, again leading to "could not devise a query plan" failures. The restriction has to be known also to join_is_legal and its cohort functions, so that they will not select a join for which that will happen. I chose to move the supporting logic into joinrels.c where the latter functions are. Back-patch to 9.3 where LATERAL support was introduced.
* Fix eclass_useful_for_merging to give valid results for appendrel children.Tom Lane2015-08-06
| | | | | | | | | | | | | | | | | | | | Formerly, this function would always return "true" for an appendrel child relation, because it would think that the appendrel parent was a potential join target for the child. In principle that should only lead to some inefficiency in planning, but fuzz testing by Andreas Seltenreich disclosed that it could lead to "could not find pathkey item to sort" planner errors in odd corner cases. Specifically, we would think that all columns of a child table's multicolumn index were interesting pathkeys, causing us to generate a MergeAppend path that sorts by all the columns. However, if any of those columns weren't actually used above the level of the appendrel, they would not get added to that rel's targetlist, which would result in being unable to resolve the MergeAppend's sort keys against its targetlist during createplan.c. Backpatch to 9.3. In older versions, columns of an appendrel get added to its targetlist even if they're not mentioned above the scan level, so that the failure doesn't occur. It might be worth back-patching this fix to older versions anyway, but I'll refrain for the moment.
* Code review for foreign/custom join pushdown patch.Tom Lane2015-05-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e7cb7ee14555cc9c5773e2c102efd6371f6f2005 included some design decisions that seem pretty questionable to me, and there was quite a lot of stuff not to like about the documentation and comments. Clean up as follows: * Consider foreign joins only between foreign tables on the same server, rather than between any two foreign tables with the same underlying FDW handler function. In most if not all cases, the FDW would simply have had to apply the same-server restriction itself (far more expensively, both for lack of caching and because it would be repeated for each combination of input sub-joins), or else risk nasty bugs. Anyone who's really intent on doing something outside this restriction can always use the set_join_pathlist_hook. * Rename fdw_ps_tlist/custom_ps_tlist to fdw_scan_tlist/custom_scan_tlist to better reflect what they're for, and allow these custom scan tlists to be used even for base relations. * Change make_foreignscan() API to include passing the fdw_scan_tlist value, since the FDW is required to set that. Backwards compatibility doesn't seem like an adequate reason to expect FDWs to set it in some ad-hoc extra step, and anyway existing FDWs can just pass NIL. * Change the API of path-generating subroutines of add_paths_to_joinrel, and in particular that of GetForeignJoinPaths and set_join_pathlist_hook, so that various less-used parameters are passed in a struct rather than as separate parameter-list entries. The objective here is to reduce the probability that future additions to those parameter lists will result in source-level API breaks for users of these hooks. It's possible that this is even a small win for the core code, since most CPU architectures can't pass more than half a dozen parameters efficiently anyway. I kept root, joinrel, outerrel, innerrel, and jointype as separate parameters to reduce code churn in joinpath.c --- in particular, putting jointype into the struct would have been problematic because of the subroutines' habit of changing their local copies of that variable. * Avoid ad-hocery in ExecAssignScanProjectionInfo. It was probably all right for it to know about IndexOnlyScan, but if the list is to grow we should refactor the knowledge out to the callers. * Restore nodeForeignscan.c's previous use of the relcache to avoid extra GetFdwRoutine lookups for base-relation scans. * Lots of cleanup of documentation and missed comments. Re-order some code additions into more logical places.
* Allow FDWs and custom scan providers to replace joins with scans.Robert Haas2015-05-01
| | | | | | | | | | | | | | | | | Foreign data wrappers can use this capability for so-called "join pushdown"; that is, instead of executing two separate foreign scans and then joining the results locally, they can generate a path which performs the join on the remote server and then is scanned locally. This commit does not extend postgres_fdw to take advantage of this capability; it just provides the infrastructure. Custom scan providers can use this in a similar way. Previously, it was only possible for a custom scan provider to scan a single relation. Now, it can scan an entire join tree, provided of course that it knows how to produce the same results that the join would have produced if executed normally. KaiGai Kohei, reviewed by Shigeru Hanada, Ashutosh Bapat, and me.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Simplify API for initially hooking custom-path providers into the planner.Tom Lane2014-11-21
| | | | | | | | | | Instead of register_custom_path_provider and a CreateCustomScanPath callback, let's just provide a standard function hook in set_rel_pathlist. This is more flexible than what was previously committed, is more like the usual conventions for planner hooks, and requires less support code in the core. We had discussed this design (including centralizing the set_cheapest() calls) back in March or so, so I'm not sure why it wasn't done like this already.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Remove dead code now that orindxpath.c is history.Tom Lane2013-12-30
| | | | | | We don't need make_restrictinfo_from_bitmapqual() anymore at all. generate_bitmap_or_paths() doesn't need to be exported, and we can drop its rather klugy restriction_only flag.
* Extract restriction OR clauses whether or not they are indexable.Tom Lane2013-12-30
| | | | | | | | | | | | | | | | | | | | | | | | | It's possible to extract a restriction OR clause from a join clause that has the form of an OR-of-ANDs, if each sub-AND includes a clause that mentions only one specific relation. While PG has been aware of that idea for many years, the code previously only did it if it could extract an indexable OR clause. On reflection, though, that seems a silly limitation: adding a restriction clause can be a win by reducing the number of rows that have to be filtered at the join step, even if we have to test the clause as a plain filter clause during the scan. This should be especially useful for foreign tables, where the change can cut the number of rows that have to be retrieved from the foreign server; but testing shows it can win even on local tables. Per a suggestion from Robert Haas. As a heuristic, I made the code accept an extracted restriction clause if its estimated selectivity is less than 0.9, which will probably result in accepting extracted clauses just about always. We might need to tweak that later based on experience. Since the code no longer has even a weak connection to Path creation, remove orindxpath.c and create a new file optimizer/util/orclauses.c. There's some additional janitorial cleanup of now-dead code that needs to happen, but it seems like that's a fit subject for a separate commit.
* Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.Tom Lane2013-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ability to write TABLE( function1(), function2(), ...) as a single FROM-clause entry. The result is the concatenation of the first row from each function, followed by the second row from each function, etc; with NULLs inserted if any function produces fewer rows than others. This is believed to be a much more useful behavior than what Postgres currently does with multiple SRFs in a SELECT list. This syntax also provides a reasonable way to combine use of column definition lists with WITH ORDINALITY: put the column definition list inside TABLE(), where it's clear that it doesn't control the ordinality column as well. Also implement SQL-compliant multiple-argument UNNEST(), by turning UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)). The SQL standard specifies TABLE() with only a single function, not multiple functions, and it seems to require an implicit UNNEST() which is not what this patch does. There may be something wrong with that reading of the spec, though, because if it's right then the spec's TABLE() is just a pointless alternative spelling of UNNEST(). After further review of that, we might choose to adopt a different syntax for what this patch does, but in any case this functionality seems clearly worthwhile. Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and significantly revised by me
* Compute correct em_nullable_relids in get_eclass_for_sort_expr().Tom Lane2013-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr must be able to compute valid em_nullable_relids for any new equivalence class members it creates. I'd worried about this in the commit message for db9f0e1d9a4a0842c814a464cdc9758c3f20b96c, but claimed that it wasn't a problem because multi-member ECs should already exist when it runs. That is transparently wrong, though, because this function is also called by initialize_mergeclause_eclasses, which runs during deconstruct_jointree. The example given in the bug report (which the new regression test item is based upon) fails because the COALESCE() expression is first seen by initialize_mergeclause_eclasses rather than process_equivalence. Fixing this requires passing the appropriate nullable_relids set to get_eclass_for_sort_expr, and it requires new code to compute that set for top-level expressions such as ORDER BY, GROUP BY, etc. We store the top-level nullable_relids in a new field in PlannerInfo to avoid computing it many times. In the back branches, I've added the new field at the end of the struct to minimize ABI breakage for planner plugins. There doesn't seem to be a good alternative to changing get_eclass_for_sort_expr's API signature, though. There probably aren't any third-party extensions calling that function directly; moreover, if there are, they probably need to think about what to pass for nullable_relids anyway. Back-patch to 9.2, like the previous patch in this area.
* Postpone creation of pathkeys lists to fix bug #8049.Tom Lane2013-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch gets rid of the concept of, and infrastructure for, non-canonical PathKeys; we now only ever create canonical pathkey lists. The need for non-canonical pathkeys came from the desire to have grouping_planner initialize query_pathkeys and related pathkey lists before calling query_planner. However, since query_planner didn't actually *do* anything with those lists before they'd been made canonical, we can get rid of the whole mess by just not creating the lists at all until the point where we formerly canonicalized them. There are several ways in which we could implement that without making query_planner itself deal with grouping/sorting features (which are supposed to be the province of grouping_planner). I chose to add a callback function to query_planner's API; other alternatives would have required adding more fields to PlannerInfo, which while not bad in itself would create an ABI break for planner-related plugins in the 9.2 release series. This still breaks ABI for anything that calls query_planner directly, but it seems somewhat unlikely that there are any such plugins. I had originally conceived of this change as merely a step on the way to fixing bug #8049 from Teun Hoogendoorn; but it turns out that this fixes that bug all by itself, as per the added regression test. The reason is that now get_eclass_for_sort_expr is adding the ORDER BY expression at the end of EquivalenceClass creation not the start, and so anything that is in a multi-member EquivalenceClass has already been created with correct em_nullable_relids. I am suspicious that there are related scenarios in which we still need to teach get_eclass_for_sort_expr to compute correct nullable_relids, but am not eager to risk destabilizing either 9.2 or 9.3 to fix bugs that are only hypothetical. So for the moment, do this and stop here. Back-patch to 9.2 but not to earlier branches, since they don't exhibit this bug for lack of join-clause-movement logic that depends on em_nullable_relids being correct. (We might have to revisit that choice if any related bugs turn up.) In 9.2, don't change the signature of make_pathkeys_for_sortclauses nor remove canonicalize_pathkeys, so as not to risk more plugin breakage than we have to.
* Redo postgres_fdw's planner code so it can handle parameterized paths.Tom Lane2013-03-21
| | | | | | | | | | | | I wasn't going to ship this without having at least some example of how to do that. This version isn't terribly bright; in particular it won't consider any combinations of multiple join clauses. Given the cost of executing a remote EXPLAIN, I'm not sure we want to be very aggressive about doing that, anyway. In support of this, refactor generate_implied_equalities_for_indexcol so that it can be used to extract equivalence clauses that aren't necessarily tied to an index.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance trees.Tom Lane2012-11-26
| | | | | | | | | | | | | | | | | | | | | In a query such as "SELECT DISTINCT min(x) FROM tab", the DISTINCT is pretty useless (there being only one output row), but nonetheless it shouldn't fail. But it could fail if "tab" is an inheritance parent, because planagg.c's code for fixing up equivalence classes after making the index-optimized MIN/MAX transformation wasn't prepared to find child-table versions of the aggregate expression. The least ugly fix seems to be to add an option to mutate_eclass_expressions() to skip child-table equivalence class members, which aren't used anymore at this stage of planning so it's not really necessary to fix them. Since child members are ignored in many cases already, it seems plausible for mutate_eclass_expressions() to have an option to ignore them too. Per bug #7703 from Maxim Boguk. Back-patch to 9.1. Although the same code exists before that, it cannot encounter child-table aggregates AFAICS, because the index optimization transformation cannot succeed on inheritance trees before 9.1 (for lack of MergeAppend).
* Suppress creation of backwardly-indexed paths for LATERAL join clauses.Tom Lane2012-08-30
| | | | | | | | | | | | | Given a query such as SELECT * FROM foo JOIN LATERAL (SELECT foo.var1) ss(x) ON ss.x = foo.var2 the existence of the join clause "ss.x = foo.var2" encourages indxpath.c to build a parameterized path for foo using any index available for foo.var2. This is completely useless activity, though, since foo has got to be on the outside not the inside of any nestloop join with ss. It's reasonably inexpensive to add tests that prevent creation of such paths, so let's do that.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Revise parameterized-path mechanism to fix assorted issues.Tom Lane2012-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adjusts the treatment of parameterized paths so that all paths with the same parameterization (same set of required outer rels) for the same relation will have the same rowcount estimate. We cache the rowcount estimates to ensure that property, and hopefully save a few cycles too. Doing this makes it practical for add_path_precheck to operate without a rowcount estimate: it need only assume that paths with different parameterizations never dominate each other, which is close enough to true anyway for coarse filtering, because normally a more-parameterized path should yield fewer rows thanks to having more join clauses to apply. In add_path, we do the full nine yards of comparing rowcount estimates along with everything else, so that we can discard parameterized paths that don't actually have an advantage. This fixes some issues I'd found with add_path rejecting parameterized paths on the grounds that they were more expensive than not-parameterized ones, even though they yielded many fewer rows and hence would be cheaper once subsequent joining was considered. To make the same-rowcounts assumption valid, we have to require that any parameterized path enforce *all* join clauses that could be obtained from the particular set of outer rels, even if not all of them are useful for indexing. This is required at both base scans and joins. It's a good thing anyway since the net impact is that join quals are checked at the lowest practical level in the join tree. Hence, discard the original rather ad-hoc mechanism for choosing parameterization joinquals, and build a better one that has a more principled rule for when clauses can be moved. The original rule was actually buggy anyway for lack of knowledge about which relations are part of an outer join's outer side; getting this right requires adding an outer_relids field to RestrictInfo.
* Revisit handling of UNION ALL subqueries with non-Var output columns.Tom Lane2012-03-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug reported by Teodor Sigaev by making non-simple-Var output columns distinct (by wrapping their expressions with dummy PlaceHolderVar nodes). This did not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed some ensuing problems with matching to child indexes, but per a recent report from Claus Stadler, constraint exclusion of UNION ALL subqueries was still broken, because constant-simplification didn't handle the injected PlaceHolderVars well either. On reflection, the original patch was quite misguided: there is no reason to expect that EquivalenceClass child members will be distinct. So instead of trying to make them so, we should ensure that we can cope with the situation when they're not. Accordingly, this patch reverts the code changes in the above-mentioned commits (though the regression test cases they added stay). Instead, I've added assorted defenses to make sure that duplicate EC child members don't cause any problems. Teodor's original problem ("MergeAppend child's targetlist doesn't match MergeAppend") is addressed more directly by revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort list guide creation of each child's sort list. In passing, get rid of add_sort_column; as far as I can tell, testing for duplicate sort keys at this stage is dead code. Certainly it doesn't trigger often enough to be worth expending cycles on in ordinary queries. And keeping the test would've greatly complicated the new logic in prepare_sort_from_pathkeys, because comparing pathkey list entries against a previous output array requires that we not skip any entries in the list. Back-patch to 9.1, like the previous patches. The only known issue in this area that wasn't caused by the ill-advised previous patches was the MergeAppend planning failure, which of course is not relevant before 9.1. It's possible that we need some of the new defenses against duplicate child EC entries in older branches, but until there's some clear evidence of that I'm going to refrain from back-patching further.
* Use parameterized paths to generate inner indexscans more flexibly.Tom Lane2012-01-27
| | | | | | | | | | | | | | | | | | | This patch fixes the planner so that it can generate nestloop-with- inner-indexscan plans even with one or more levels of joining between the indexscan and the nestloop join that is supplying the parameter. The executor was fixed to handle such cases some time ago, but the planner was not ready. This should improve our plans in many situations where join ordering restrictions formerly forced complete table scans. There is probably a fair amount of tuning work yet to be done, because of various heuristics that have been added to limit the number of parameterized paths considered. However, we are not going to find out what needs to be adjusted until the code gets some real-world use, so it's time to get it in there where it can be tested easily. Note API change for index AM amcostestimate functions. I'm not aware of any non-core index AMs, but if there are any, they will need minor adjustments.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Rethink representation of index clauses' mapping to index columns.Tom Lane2011-12-24
| | | | | | | | | | | | | | | | | | | | | In commit e2c2c2e8b1df7dfdb01e7e6f6191a569ce3c3195 I made use of nested list structures to show which clauses went with which index columns, but on reflection that's a data structure that only an old-line Lisp hacker could love. Worse, it adds unnecessary complication to the many places that don't much care which clauses go with which index columns. Revert to the previous arrangement of flat lists of clauses, and instead add a parallel integer list of column numbers. The places that care about the pairing can chase both lists with forboth(), while the places that don't care just examine one list the same as before. The only real downside to this is that there are now two more lists that need to be passed to amcostestimate functions in case they care about column matching (which btcostestimate does, so not passing the info is not an option). Rather than deal with 11-argument amcostestimate functions, pass just the IndexPath and expect the functions to extract fields from it. That gets us down to 7 arguments which is better than 11, and it seems more future-proof against likely additions to the information we keep about an index path.
* Improve planner's handling of duplicated index column expressions.Tom Lane2011-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It's potentially useful for an index to repeat the same indexable column or expression in multiple index columns, if the columns have different opclasses. (If they share opclasses too, the duplicate column is pretty useless, but nonetheless we've allowed such cases since 9.0.) However, the planner failed to cope with this, because createplan.c was relying on simple equal() matching to figure out which index column each index qual is intended for. We do have that information available upstream in indxpath.c, though, so the fix is to not flatten the multi-level indexquals list when putting it into an IndexPath. Then we can rely on the sublist structure to identify target index columns in createplan.c. There's a similar issue for index ORDER BYs (the KNNGIST feature), so introduce a multi-level-list representation for that too. This adds a bit more representational overhead, but we might more or less buy that back by not having to search for matching index columns anymore in createplan.c; likewise btcostestimate saves some cycles. Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree index keys must be ordered by attribute" failure shown there, as well as "operator MMMM is not a member of opfamily NNNN". Although this is a pre-existing problem that can be demonstrated in 9.0 and 9.1, I'm not going to back-patch it, because the API changes in the planner seem likely to break things such as index plugins. The corner cases where this matters seem too narrow to justify possibly breaking things in a minor release.
* Improve planner's ability to recognize cases where an IN's RHS is unique.Tom Lane2011-10-26
| | | | | | | | | | | | If the right-hand side of a semijoin is unique, then we can treat it like a normal join (or another way to say that is: we don't need to explicitly unique-ify the data before doing it as a normal join). We were recognizing such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP BY decoration, but there's another way: if the RHS is a plain relation with unique indexes, we can check if any of the indexes prove the output is unique. Most of the infrastructure for that was there already in the join removal code, though I had to rearrange it a bit. Per reflection about a recent example in pgsql-performance.
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Reimplement planner's handling of MIN/MAX aggregate optimization (again).Tom Lane2011-03-22
| | | | | | | | | | | | | | Instead of playing cute games with pathkeys, just build a direct representation of the intended sub-select, and feed it through query_planner to get a Path for the index access. This is a bit slower than 9.1's previous method, since we'll duplicate most of the overhead of query_planner; but since the whole optimization only applies to rather simple single-table queries, that probably won't be much of a problem in practice. The advantage is that we get to do the right thing when there's a partial index that needs the implicit IS NOT NULL clause to be usable. Also, although this makes planagg.c be a bit more closely tied to the ordering of operations in grouping_planner, we can get rid of some coupling to lower-level parts of the planner. Per complaint from Marti Raudsepp.
* Revise collation derivation method and expression-tree representation.Tom Lane2011-03-19
| | | | | | | | | | | | | | | | | | | All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Reimplement planner's handling of MIN/MAX aggregate optimization.Tom Lane2010-11-04
| | | | | | | | | | | | | | | | | | | | | | Per my recent proposal, get rid of all the direct inspection of indexes and manual generation of paths in planagg.c. Instead, set up EquivalenceClasses for the aggregate argument expressions, and let the regular path generation logic deal with creating paths that can satisfy those sort orders. This makes planagg.c a bit more visible to the rest of the planner than it was originally, but the approach is basically a lot cleaner than before. A major advantage of doing it this way is that we get MIN/MAX optimization on inheritance trees (using MergeAppend of indexscans) practically for free, whereas in the old way we'd have had to add a whole lot more duplicative logic. One small disadvantage of this approach is that MIN/MAX aggregates can no longer exploit partial indexes having an "x IS NOT NULL" predicate, unless that restriction or something that implies it is specified in the query. The previous implementation was able to use the added "x IS NOT NULL" condition as an extra predicate proof condition, but in this version we rely entirely on indexes that are considered usable by the main planning process. That seems a fair tradeoff for the simplicity and functionality gained.
* Avoid creation of useless EquivalenceClasses during planning.Tom Lane2010-10-29
| | | | | | | | | | | | | | | | | | | | | | Zoltan Boszormenyi exhibited a test case in which planning time was dominated by construction of EquivalenceClasses and PathKeys that had no actual relevance to the query (and in fact got discarded immediately). This happened because we generated PathKeys describing the sort ordering of every index on every table in the query, and only after that checked to see if the sort ordering was relevant. The EC/PK construction code is O(N^2) in the number of ECs, which is all right for the intended number of such objects, but it gets out of hand if there are ECs for lots of irrelevant indexes. To fix, twiddle the handling of mergeclauses a little bit to ensure that every interesting EC is created before we begin path generation. (This doesn't cost anything --- in fact I think it's a bit cheaper than before --- since we always eventually created those ECs anyway.) Then, if an index column can't be found in any pre-existing EC, we know that that sort ordering is irrelevant for the query. Instead of creating a useless EC, we can just not build a pathkey for the index column in the first place. The index will still be considered if it's useful for non-order-related reasons, but we will think of its output as unsorted.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Eliminate a lot of list-management overhead within join_search_one_levelTom Lane2009-11-28
| | | | | | | | | | | | | | by adding a requirement that build_join_rel add new join RelOptInfos to the appropriate list immediately at creation. Per report from Robert Haas, the list_concat_unique_ptr() calls that this change eliminates were taking the lion's share of the runtime in larger join problems. This doesn't do anything to fix the fundamental combinatorial explosion in large join problems, but it should push out the threshold of pain a bit further. Note: because this changes the order in which joinrel lists are built, it might result in changes in selected plans in cases where different alternatives have exactly the same costs. There is one example in the regression tests.
* Implement "join removal" for cases where the inner side of a left joinTom Lane2009-09-17
| | | | | | | | | | | | | | is unique and is not referenced above the join. In this case the inner side doesn't affect the query result and can be thrown away entirely. Although perhaps nobody would ever write such a thing by hand, it's a reasonably common case in machine-generated SQL. The current implementation only recognizes the case where the inner side is a simple relation with a unique index matching the query conditions. This is enough for the use-cases that have been shown so far, but we might want to try to handle other cases later. Robert Haas, somewhat rewritten by Tom
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Implement SEMI and ANTI joins in the planner and executor. (Semijoins replaceTom Lane2008-08-14
| | | | | | | | | | | | | | the old JOIN_IN code, but antijoins are new functionality.) Teach the planner to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL filters are recognized as being anti joins. Unify the InClauseInfo and OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change, it becomes possible to associate a SpecialJoinInfo with every join attempt, which permits some cleanup of join selectivity estimation. That needs to be taken much further than this patch does, but the next step is to change the API for oprjoin selectivity functions, which seems like material for a separate patch. So for the moment the output size estimates for semi and especially anti joins are quite bogus.
* Apply my original fix for Taiki Yamaguchi's bug report about DISTINCT MAX().Tom Lane2008-03-31
| | | | Add some regression tests for plausible failures in this area.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* Re-run pgindent with updated list of typedefs. (Updated README shouldBruce Momjian2007-11-15
| | | | avoid this problem in the future.)
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Fix EquivalenceClass code to handle volatile sort expressions in a moreTom Lane2007-11-08
| | | | | | | | | | | | | | predictable manner; in particular that if you say ORDER BY output-column-ref, it will in fact sort by that specific column even if there are multiple syntactic matches. An example is SELECT random() AS a, random() AS b FROM ... ORDER BY b, a; While the use-case for this might be a bit debatable, it worked as expected in earlier releases, so we should preserve the behavior for 8.3. Per my recent proposal. While at it, fix convert_subquery_pathkeys() to handle RelabelType stripping in both directions; it needs this for the same reasons make_sort_from_pathkeys does.
* Create a function variable "join_search_hook" to let plugins override theTom Lane2007-09-26
| | | | | | | | join search order portion of the planner; this is specifically intended to simplify developing a replacement for GEQO planning. Patch by Julius Stroffek, editorialized on by me. I renamed make_one_rel_by_joins to standard_join_search and make_rels_by_joins to join_search_one_level to better reflect their place within this scheme.
* Fix best_inner_indexscan to return both the cheapest-total-cost andTom Lane2007-05-22
| | | | | | | | | | | | cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider both of these (when different) as the inside of a nestloop join. The original design was based on the assumption that indexscan paths always have negligible startup cost, and so total cost is the only important figure of merit; an assumption that's obviously broken by bitmap indexscans. This oversight could lead to choosing poor plans in cases where fast-start behavior is more important than total cost, such as LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from Chuck D.
* Avoid running build_index_pathkeys() in situations where there cannotTom Lane2007-04-15
| | | | | | | possibly be any useful pathkeys --- to wit, queries with neither any join clauses nor any ORDER BY request. It's nearly free to check for this case and it saves a useful fraction of the planning time for simple queries.
* Restructure code that is responsible for ensuring that clauseless joins areTom Lane2007-02-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | considered when it is necessary to do so because of a join-order restriction (that is, an outer-join or IN-subselect construct). The former coding was a bit ad-hoc and inconsistent, and it missed some cases, as exposed by Mario Weilguni's recent bug report. His specific problem was that an IN could be turned into a "clauseless" join due to constant-propagation removing the IN's joinclause, and if the IN's subselect involved more than one relation and there was more than one such IN linking to the same upper relation, then the only valid join orders involve "bushy" plans but we would fail to consider the specific paths needed to get there. (See the example case added to the join regression test.) On examining the code I wonder if there weren't some other problem cases too; in particular it seems that GEQO was defending against a different set of corner cases than the main planner was. There was also an efficiency problem, in that when we did realize we needed a clauseless join because of an IN, we'd consider clauseless joins against every other relation whether this was sensible or not. It seems a better design is to use the outer-join and in-clause lists as a backup heuristic, just as the rule of joining only where there are joinclauses is a heuristic: we'll join two relations if they have a usable joinclause *or* this might be necessary to satisfy an outer-join or IN-clause join order restriction. I refactored the code to have just one place considering this instead of three, and made sure that it covered all the cases that any of them had been considering. Backpatch as far as 8.1 (which has only the IN-clause form of the disease). By rights 8.0 and 7.4 should have the bug too, but they accidentally fail to fail, because the joininfo structure used in those releases preserves some memory of there having once been a joinclause between the inner and outer sides of an IN, and so it leads the code in the right direction anyway. I'll be conservative and not touch them.
* Refactor planner's pathkeys data structure to create a separate, explicitTom Lane2007-01-20
| | | | | | | | | | | | | | representation of equivalence classes of variables. This is an extensive rewrite, but it brings a number of benefits: * planner no longer fails in the presence of "incomplete" operator families that don't offer operators for every possible combination of datatypes. * avoid generating and then discarding redundant equality clauses. * remove bogus assumption that derived equalities always use operators named "=". * mergejoins can work with a variety of sort orders (e.g., descending) now, instead of tying each mergejoinable operator to exactly one sort order. * better recognition of redundant sort columns. * can make use of equalities appearing underneath an outer join.
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.