aboutsummaryrefslogtreecommitdiff
path: root/src/backend/optimizer/util
Commit message (Collapse)AuthorAge
* Avoid inserting Result nodes that only compute identity projections.Tom Lane2013-03-14
| | | | | | | | | | | | | | | | | | | | The planner sometimes inserts Result nodes to perform column projections (ie, arbitrary scalar calculations) above plan nodes that lack projection logic of their own. However, we did that even if the lower plan node was in fact producing the required column set already; which is a pretty common case given the popularity of "SELECT * FROM ...". Measurements show that the useless plan node adds non-negligible overhead, especially when there are many columns in the result. So add a check to avoid inserting a Result node unless there's something useful for it to do. There are a couple of remaining places where unnecessary Result nodes could get inserted, but they are (a) much less performance-critical, and (b) coded in such a way that it's hard to avoid inserting a Result, because the desired tlist is changed on-the-fly in subsequent logic. We'll leave those alone for now. Kyotaro Horiguchi; reviewed and further hacked on by Amit Kapila and Tom Lane.
* Arrange to cache FdwRoutine structs in foreign tables' relcache entries.Tom Lane2013-03-06
| | | | | | | This saves several catalog lookups per reference. It's not all that exciting right now, because we'd managed to minimize the number of places that need to fetch the data; but the upcoming writable-foreign-tables patch needs this info in a lot more places.
* Add a materialized view relations.Kevin Grittner2013-03-03
| | | | | | | | | | | | | | | | | | | | | | A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.
* Add infrastructure for storing a VARIADIC ANY function's VARIADIC flag.Tom Lane2013-01-21
| | | | | | | | | | | | | | | | | Originally we didn't bother to mark FuncExprs with any indication whether VARIADIC had been given in the source text, because there didn't seem to be any need for it at runtime. However, because we cannot fold a VARIADIC ANY function's arguments into an array (since they're not necessarily all the same type), we do actually need that information at runtime if VARIADIC ANY functions are to respond unsurprisingly to use of the VARIADIC keyword. Add the missing field, and also fix ruleutils.c so that VARIADIC ANY function calls are dumped properly. Extracted from a larger patch that also fixes concat() and format() (the only two extant VARIADIC ANY functions) to behave properly when VARIADIC is specified. This portion seems appropriate to review and commit separately. Pavel Stehule
* Redesign the planner's handling of index-descent cost estimation.Tom Lane2013-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically we've used a couple of very ad-hoc fudge factors to try to get the right results when indexes of different sizes would satisfy a query with the same number of index leaf tuples being visited. In commit 21a39de5809cd3050a37d2554323cc1d0cbeed9d I tweaked one of these fudge factors, with results that proved disastrous for larger indexes. Commit bf01e34b556ff37982ba2d882db424aa484c0d07 fudged it some more, but still with not a lot of principle behind it. What seems like a better way to address these issues is to explicitly model index-descent costs, since that's what's really at stake when considering diferent indexes with similar leaf-page-level costs. We tried that once long ago, and found that charging random_page_cost per page descended through was way too much, because upper btree levels tend to stay in cache in real-world workloads. However, there's still CPU costs to think about, and the previous fudge factors can be seen as a crude attempt to account for those costs. So this patch replaces those fudge factors with explicit charges for the number of tuple comparisons needed to descend the index tree, plus a small charge per page touched in the descent. The cost multipliers are chosen so that the resulting charges are in the vicinity of the historical (pre-9.2) fudge factors for indexes of up to about a million tuples, while not ballooning unreasonably beyond that, as the old fudge factor did (even more so in 9.2). To make this work accurately for btree indexes, add some code that allows extraction of the known root-page height from a btree. There's no equivalent number readily available for other index types, but we can use the log of the number of index pages as an approximate substitute. This seems like too much of a behavioral change to risk back-patching, but it should improve matters going forward. In 9.2 I'll just revert the fudge-factor change.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY.Tom Lane2012-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a poor choice of catalog state representation. The pg_index state for an index that's reached the final pre-drop stage was the same as the state for an index just created by CREATE INDEX CONCURRENTLY. This meant that the (necessary) change to make RelationGetIndexList ignore about-to-die indexes also made it ignore freshly-created indexes; which is catastrophic because the latter do need to be considered in HOT-safety decisions. Failure to do so leads to incorrect index entries and subsequently wrong results from queries depending on the concurrently-created index. To fix, add an additional boolean column "indislive" to pg_index, so that the freshly-created and about-to-die states can be distinguished. (This change obviously is only possible in HEAD. This patch will need to be back-patched, but in 9.2 we'll use a kluge consisting of overloading the formerly-impossible state of indisvalid = true and indisready = false.) In addition, change CREATE/DROP INDEX CONCURRENTLY so that the pg_index flag changes they make without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption. This is a pre-existing bug in CREATE INDEX CONCURRENTLY, which was copied into the DROP code. In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index. Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen. In addition, do some code and docs review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but mostly addition and revision of comments. This will need to be back-patched, but in a noticeably different form, so I'm committing it to HEAD before working on the back-patch. Problem reported by Amit Kapila, diagnosis by Pavan Deolassee, fix by Tom Lane and Andres Freund.
* Get rid of COERCE_DONTCARE.Tom Lane2012-10-12
| | | | We don't need this hack any more.
* Make equal() ignore CoercionForm fields for better planning with casts.Tom Lane2012-10-12
| | | | | | | | | | | | | | | | | | | | | | | This change ensures that the planner will see implicit and explicit casts as equivalent for all purposes, except in the minority of cases where there's actually a semantic difference (as reflected by having a 3-argument cast function). In particular, this fixes cases where the EquivalenceClass machinery failed to consider two references to a varchar column as equivalent if one was implicitly cast to text but the other was explicitly cast to text, as seen in bug #7598 from Vaclav Juza. We have had similar bugs before in other parts of the planner, so I think it's time to fix this problem at the core instead of continuing to band-aid around it. Remove set_coercionform_dontcare(), which represents the band-aid previously in use for allowing matching of index and constraint expressions with inconsistent cast labeling. (We can probably get rid of COERCE_DONTCARE altogether, but I don't think removing that enum value in back branches would be wise; it's possible there's third party code referring to it.) Back-patch to 9.2. We could go back further, and might want to once this has been tested more; but for the moment I won't risk destabilizing plan choices in long-since-stable branches.
* Fix PARAM_EXEC assignment mechanism to be safe in the presence of WITH.Tom Lane2012-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The planner previously assumed that parameter Vars having the same absolute query level, varno, and varattno could safely be assigned the same runtime PARAM_EXEC slot, even though they might be different Vars appearing in different subqueries. This was (probably) safe before the introduction of CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can be executed during execution of some other subquery, causing the lifespan of Params at the same syntactic nesting level as the CTE to overlap with use of the same slots inside the CTE. In 9.1 we created additional hazards by using the same parameter-assignment technology for nestloop inner scan parameters, but it was broken before that, as illustrated by the added regression test. To fix, restructure the planner's management of PlannerParamItems so that items having different semantic lifespans are kept rigorously separated. This will probably result in complex queries using more runtime PARAM_EXEC slots than before, but the slots are cheap enough that this hardly matters. Also, stop generating PlannerParamItems containing Params for subquery outputs: all we really need to do is reserve the PARAM_EXEC slot number, and that now only takes incrementing a counter. The planning code is simpler and probably faster than before, as well as being more correct. Per report from Vik Reykja. These changes will mostly also need to be made in the back branches, but I'm going to hold off on that until after 9.2.0 wraps.
* Drop cheap-startup-cost paths during add_path() if we don't need them.Tom Lane2012-09-01
| | | | | | | | | | | | | | | | | | We can detect whether the planner top level is going to care at all about cheap startup cost (it will only do so if query_planner's tuple_fraction argument is greater than zero). If it isn't, we might as well discard paths immediately whose only advantage over others is cheap startup cost. This turns out to get rid of quite a lot of paths in complex queries --- I saw planner runtime reduction of more than a third on one large query. Since add_path isn't currently passed the PlannerInfo "root", the easiest way to tell it whether to do this was to add a bool flag to RelOptInfo. That's a bit redundant, since all relations in a given query level will have the same setting. But in the future it's possible that we'd refine the control decision to work on a per-relation basis, so this seems like a good arrangement anyway. Per my suggestion of a few months ago.
* Fix mark_placeholder_maybe_needed to handle LATERAL references.Tom Lane2012-09-01
| | | | | | | | | | | | | | | If a PlaceHolderVar contains a pulled-up LATERAL reference, its minimum possible evaluation level might be higher in the join tree than its original syntactic location. That in turn affects the ph_needed level for any contained PlaceHolderVars (that is, those PHVs had better propagate up the join tree at least to the evaluation level of the outer PHV). We got this mostly right, but mark_placeholder_maybe_needed() failed to account for the effect, and in consequence could leave the inner PHVs with ph_may_need less than what their ultimate ph_needed value will be. That's bad because it could lead to failure to select a join order that will allow evaluation of the inner PHV at a valid location. Fix that, and add an Assert that checks that we don't ever set ph_needed to more than ph_may_need.
* Fix LATERAL references to join alias variables.Tom Lane2012-08-31
| | | | | I had thought this case worked already, but perhaps I didn't re-test it after adding extract_lateral_references() ...
* Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera2012-08-30
| | | | | | | | | | | | This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
* Adjust definition of cheapest_total_path to work better with LATERAL.Tom Lane2012-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the initial cut at LATERAL, I kept the rule that cheapest_total_path was always unparameterized, which meant it had to be NULL if the relation has no unparameterized paths. It turns out to work much more nicely if we always have *some* path nominated as cheapest-total for each relation. In particular, let's still say it's the cheapest unparameterized path if there is one; if not, take the cheapest-total-cost path among those of the minimum available parameterization. (The first rule is actually a special case of the second.) This allows reversion of some temporary lobotomizations I'd put in place. In particular, the planner can now consider hash and merge joins for joins below a parameter-supplying nestloop, even if there aren't any unparameterized paths available. This should bring planning of LATERAL-containing queries to the same level as queries not using that feature. Along the way, simplify management of parameterized paths in add_path() and friends. In the original coding for parameterized paths in 9.2, I tried to minimize the logic changes in add_path(), so it just treated parameterization as yet another dimension of comparison for paths. We later made it ignore pathkeys (sort ordering) of parameterized paths, on the grounds that ordering isn't a useful property for the path on the inside of a nestloop, so we might as well get rid of useless parameterized paths as quickly as possible. But we didn't take that reasoning as far as we should have. Startup cost isn't a useful property inside a nestloop either, so add_path() ought to discount startup cost of parameterized paths as well. Having done that, the secondary sorting I'd implemented (in add_parameterized_path) is no longer needed --- any parameterized path that survives add_path() at all is worth considering at higher levels. So this should be a bit faster as well as simpler.
* Split heapam_xlog.h from heapam.hAlvaro Herrera2012-08-28
| | | | | | | | | | | | The heapam XLog functions are used by other modules, not all of which are interested in the rest of the heapam API. With this, we let them get just the XLog stuff in which they are interested and not pollute them with unrelated includes. Also, since heapam.h no longer requires xlog.h, many files that do include heapam.h no longer get xlog.h automatically, including a few headers. This is useful because heapam.h is getting pulled in by execnodes.h, which is in turn included by a lot of files.
* Fix up planner infrastructure to support LATERAL properly.Tom Lane2012-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch takes care of a number of problems having to do with failure to choose valid join orders and incorrect handling of lateral references pulled up from subqueries. Notable changes: * Add a LateralJoinInfo data structure similar to SpecialJoinInfo, to represent join ordering constraints created by lateral references. (I first considered extending the SpecialJoinInfo structure, but the semantics are different enough that a separate data structure seems better.) Extend join_is_legal() and related functions to prevent trying to form unworkable joins, and to ensure that we will consider joins that satisfy lateral references even if the joins would be clauseless. * Fill in the infrastructure needed for the last few types of relation scan paths to support parameterization. We'd have wanted this eventually anyway, but it is necessary now because a relation that gets pulled up out of a UNION ALL subquery may acquire a reltargetlist containing lateral references, meaning that its paths *have* to be parameterized whether or not we have any code that can push join quals down into the scan. * Compute data about lateral references early in query_planner(), and save in RelOptInfo nodes, to avoid repetitive calculations later. * Assorted corner-case bug fixes. There's probably still some bugs left, but this is a lot closer to being real than it was before.
* Another round of planner fixes for LATERAL.Tom Lane2012-08-18
| | | | | | | | | | | Formerly, subquery pullup had no need to examine other entries in the range table, since they could not contain any references to the subquery being pulled up. That's no longer true with LATERAL, so now we need to be able to visit rangetable subexpressions to replace Vars referencing the pulled-up subquery. Also, this means that extract_lateral_references must be unsurprised at encountering lateral PlaceHolderVars, since such might be created when pulling up a subquery that's underneath an outer join with respect to the lateral reference.
* More fixes for planner's handling of LATERAL.Tom Lane2012-08-12
| | | | | | | | | | | | | | | | | | | | | | | | Re-allow subquery pullup for LATERAL subqueries, except when the subquery is below an outer join and contains lateral references to relations outside that outer join. If we pull up in such a case, we risk introducing lateral cross-references into outer joins' ON quals, which is something the code is entirely unprepared to cope with right now; and I'm not sure it'll ever be worth coping with. Support lateral refs in VALUES (this seems to be the only additional path type that needs such support as a consequence of re-allowing subquery pullup). Put in a slightly hacky fix for joinpath.c's refusal to consider parameterized join paths even when there cannot be any unparameterized ones. This was causing "could not devise a query plan for the given query" failures in queries involving more than two FROM items. Put in an even more hacky fix for distribute_qual_to_rels() being unhappy with join quals that contain references to rels outside their syntactic scope; which is to say, disable that test altogether. Need to think about how to preserve some sort of debugging cross-check here, while not expending more cycles than befits a debugging cross-check.
* Centralize the logic for detecting misplaced aggregates, window funcs, etc.Tom Lane2012-08-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Formerly we relied on checking after-the-fact to see if an expression contained aggregates, window functions, or sub-selects when it shouldn't. This is grotty, easily forgotten (indeed, we had forgotten to teach DefineIndex about rejecting window functions), and none too efficient since it requires extra traversals of the parse tree. To improve matters, define an enum type that classifies all SQL sub-expressions, store it in ParseState to show what kind of expression we are currently parsing, and make transformAggregateCall, transformWindowFuncCall, and transformSubLink check the expression type and throw error if the type indicates the construct is disallowed. This allows removal of a large number of ad-hoc checks scattered around the code base. The enum type is sufficiently fine-grained that we can still produce error messages of at least the same specificity as before. Bringing these error checks together revealed that we'd been none too consistent about phrasing of the error messages, so standardize the wording a bit. Also, rewrite checking of aggregate arguments so that it requires only one traversal of the arguments, rather than up to three as before. In passing, clean up some more comments left over from add_missing_from support, and annotate some tests that I think are dead code now that that's gone. (I didn't risk actually removing said dead code, though.)
* Implement SQL-standard LATERAL subqueries.Tom Lane2012-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the standard syntax of LATERAL attached to a sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM, since set-returning function calls are expected to be one of the principal use-cases. The main change here is a rewrite of the mechanism for keeping track of which relations are visible for column references while the FROM clause is being scanned. The parser "namespace" lists are no longer lists of bare RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE pointer as well as some visibility-controlling flags. Aside from supporting LATERAL correctly, this lets us get rid of the ancient hacks that required rechecking subqueries and JOIN/ON and function-in-FROM expressions for invalid references after they were initially parsed. Invalid column references are now always correctly detected on sight. In passing, remove assorted parser error checks that are now dead code by virtue of our having gotten rid of add_missing_from, as well as some comments that are obsolete for the same reason. (It was mainly add_missing_from that caused so much fudging here in the first place.) The planner support for this feature is very minimal, and will be improved in future patches. It works well enough for testing purposes, though. catversion bump forced due to new field in RangeTblEntry.
* Account for SRFs in targetlists in planner rowcount estimates.Tom Lane2012-07-21
| | | | | | | | | | | | | We made use of the ROWS estimate for set-returning functions used in FROM, but not for those used in SELECT targetlists; which is a bit of an oversight considering there are common usages that require the latter approach. Improve that. (I had initially thought it might be worth folding this into cost_qual_eval, but after investigation concluded that that wouldn't be very helpful, so just do it separately.) Per complaint from David Johnston. Back-patch to 9.2, but not further, for fear of destabilizing plan choices in existing releases.
* Fix planner to pass correct collation to operator selectivity estimators.Tom Lane2012-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | We can do this without creating an API break for estimation functions by passing the collation using the existing fmgr functionality for passing an input collation as a hidden parameter. The need for this was foreseen at the outset, but we didn't get around to making it happen in 9.1 because of the decision to sort all pg_statistic histograms according to the database's default collation. That meant that selectivity estimators generally need to use the default collation too, even if they're estimating for an operator that will do something different. The reason it's suddenly become more interesting is that regexp interpretation also uses a collation (for its LC_TYPE not LC_COLLATE property), and we no longer want to use the wrong collation when examining regexps during planning. It's not that the selectivity estimate is likely to change much from this; rather that we are thinking of caching compiled regexps during planner estimation, and we won't get the intended benefit if we cache them with a different collation than the executor will use. Back-patch to 9.1, both because the regexp change is likely to get back-patched and because we might as well get this right in all collation-supporting branches, in case any third-party code wants to rely on getting the collation. The patch turns out to be minuscule now that I've done it ...
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Use fuzzy not exact cost comparison for the final tie-breaker in add_path.Tom Lane2012-04-21
| | | | | | | | Instead of an exact cost comparison, use a fuzzy comparison with 1e-10 delta after all other path metrics have proved equal. This is to avoid having platform-specific roundoff behaviors determine the choice when two paths are really the same to our cost estimators. Adjust the recently-added test case that made it obvious we had a problem here.
* Revise parameterized-path mechanism to fix assorted issues.Tom Lane2012-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adjusts the treatment of parameterized paths so that all paths with the same parameterization (same set of required outer rels) for the same relation will have the same rowcount estimate. We cache the rowcount estimates to ensure that property, and hopefully save a few cycles too. Doing this makes it practical for add_path_precheck to operate without a rowcount estimate: it need only assume that paths with different parameterizations never dominate each other, which is close enough to true anyway for coarse filtering, because normally a more-parameterized path should yield fewer rows thanks to having more join clauses to apply. In add_path, we do the full nine yards of comparing rowcount estimates along with everything else, so that we can discard parameterized paths that don't actually have an advantage. This fixes some issues I'd found with add_path rejecting parameterized paths on the grounds that they were more expensive than not-parameterized ones, even though they yielded many fewer rows and hence would be cheaper once subsequent joining was considered. To make the same-rowcounts assumption valid, we have to require that any parameterized path enforce *all* join clauses that could be obtained from the particular set of outer rels, even if not all of them are useful for indexing. This is required at both base scans and joins. It's a good thing anyway since the net impact is that join quals are checked at the lowest practical level in the join tree. Hence, discard the original rather ad-hoc mechanism for choosing parameterization joinquals, and build a better one that has a more principled rule for when clauses can be moved. The original rule was actually buggy anyway for lack of knowledge about which relations are part of an outer join's outer side; getting this right requires adding an outer_relids field to RestrictInfo.
* Weaken the planner's tests for relevant joinclauses.Tom Lane2012-04-13
| | | | | | | | | | | | | | | | | | | | | | We should be willing to cross-join two small relations if that allows us to use an inner indexscan on a large relation (that is, the potential indexqual for the large table requires both smaller relations). This worked in simple cases but fell apart as soon as there was a join clause to a fourth relation, because the existence of any two-relation join clause caused the planner to not consider clauseless joins between other base relations. The added regression test shows an example case adapted from a recent complaint from Benoit Delbosc. Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and has_relevant_eclass_joinclause to consider that a join clause mentioning three or more relations is sufficient grounds for joining any subset of those relations, even if we have to do so via a cartesian join. Since such clauses are relatively uncommon, this shouldn't affect planning speed on typical queries; in fact it should help a bit, because the latter two functions in particular get significantly simpler. Although this is arguably a bug fix, I'm not going to risk back-patching it, since it might have currently-unforeseen consequences.
* Refactor simplify_function et al to centralize argument simplification.Tom Lane2012-03-23
| | | | | | | | | | We were doing the recursive simplification of function/operator arguments in half a dozen different places, with rather baroque logic to ensure it didn't get done multiple times on some arguments. This patch improves that by postponing argument simplification until after we've dealt with named parameters and added any needed default expressions. Marti Raudsepp, somewhat hacked on by me
* Code review for protransform patches.Tom Lane2012-03-23
| | | | | | | | | | | | | | | | | Fix loss of previous expression-simplification work when a transform function fires: we must not simply revert to untransformed input tree. Instead build a dummy FuncExpr node to pass to the transform function. This has the additional advantage of providing a simpler, more uniform API for transform functions. Move documentation to a somewhat less buried spot, relocate some poorly-placed code, be more wary of null constants and invalid typmod values, add an opr_sanity check on protransform function signatures, and some other minor cosmetic adjustments. Note: although this patch touches pg_proc.h, no need for catversion bump, because the changes are cosmetic and don't actually change the intended catalog contents.
* Restructure SELECT INTO's parsetree representation into CreateTableAsStmt.Tom Lane2012-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Making this operation look like a utility statement seems generally a good idea, and particularly so in light of the desire to provide command triggers for utility statements. The original choice of representing it as SELECT with an IntoClause appendage had metastasized into rather a lot of places, unfortunately, so that this patch is a great deal more complicated than one might at first expect. In particular, keeping EXPLAIN working for SELECT INTO and CREATE TABLE AS subcommands required restructuring some EXPLAIN-related APIs. Add-on code that calls ExplainOnePlan or ExplainOneUtility, or uses ExplainOneQuery_hook, will need adjustment. Also, the cases PREPARE ... SELECT INTO and CREATE RULE ... SELECT INTO, which formerly were accepted though undocumented, are no longer accepted. The PREPARE case can be replaced with use of CREATE TABLE AS EXECUTE. The CREATE RULE case doesn't seem to have much real-world use (since the rule would work only once before failing with "table already exists"), so we'll not bother with that one. Both SELECT INTO and CREATE TABLE AS still return a command tag of "SELECT nnnn". There was some discussion of returning "CREATE TABLE nnnn", but for the moment backwards compatibility wins the day. Andres Freund and Tom Lane
* Revisit handling of UNION ALL subqueries with non-Var output columns.Tom Lane2012-03-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 57664ed25e5dea117158a2e663c29e60b3546e1c I tried to fix a bug reported by Teodor Sigaev by making non-simple-Var output columns distinct (by wrapping their expressions with dummy PlaceHolderVar nodes). This did not work too well. Commit b28ffd0fcc583c1811e5295279e7d4366c3cae6c fixed some ensuing problems with matching to child indexes, but per a recent report from Claus Stadler, constraint exclusion of UNION ALL subqueries was still broken, because constant-simplification didn't handle the injected PlaceHolderVars well either. On reflection, the original patch was quite misguided: there is no reason to expect that EquivalenceClass child members will be distinct. So instead of trying to make them so, we should ensure that we can cope with the situation when they're not. Accordingly, this patch reverts the code changes in the above-mentioned commits (though the regression test cases they added stay). Instead, I've added assorted defenses to make sure that duplicate EC child members don't cause any problems. Teodor's original problem ("MergeAppend child's targetlist doesn't match MergeAppend") is addressed more directly by revising prepare_sort_from_pathkeys to let the parent MergeAppend's sort list guide creation of each child's sort list. In passing, get rid of add_sort_column; as far as I can tell, testing for duplicate sort keys at this stage is dead code. Certainly it doesn't trigger often enough to be worth expending cycles on in ordinary queries. And keeping the test would've greatly complicated the new logic in prepare_sort_from_pathkeys, because comparing pathkey list entries against a previous output array requires that we not skip any entries in the list. Back-patch to 9.1, like the previous patches. The only known issue in this area that wasn't caused by the ill-advised previous patches was the MergeAppend planning failure, which of course is not relevant before 9.1. It's possible that we need some of the new defenses against duplicate child EC entries in older branches, but until there's some clear evidence of that I'm going to refrain from back-patching further.
* Revise FDW planning API, again.Tom Lane2012-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Further reflection shows that a single callback isn't very workable if we desire to let FDWs generate multiple Paths, because that forces the FDW to do all work necessary to generate a valid Plan node for each Path. Instead split the former PlanForeignScan API into three steps: GetForeignRelSize, GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain, and it's substantially more flexible for complex FDWs. Add an fdw_private field to RelOptInfo so that the new functions can save state there rather than possibly having to recalculate information two or three times. In addition, we'd not thought through what would be needed to allow an FDW to set up subexpressions of its choice for runtime execution. We could treat ForeignScan.fdw_private as an executable expression but that seems likely to break existing FDWs unnecessarily (in particular, it would restrict the set of node types allowable in fdw_private to those supported by expression_tree_walker). Instead, invent a separate field fdw_exprs which will receive the postprocessing appropriate for expression trees. (One field is enough since it can be a list of expressions; also, we assume the corresponding expression state tree(s) will be held within fdw_state, so we don't need to add anything to ForeignScanState.) Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this further as we continue to work on that patch, but to me it feels a lot closer to being right now.
* Redesign PlanForeignScan API to allow multiple paths for a foreign table.Tom Lane2012-03-05
| | | | | | | | | | | The original API specification only allowed an FDW to create a single access path, which doesn't seem like a terribly good idea in hindsight. Instead, move the responsibility for building the Path node and calling add_path() into the FDW's PlanForeignScan function. Now, it can do that more than once if appropriate. There is no longer any need for the transient FdwPlan struct, so get rid of that. Etsuro Fujita, Shigeru Hanada, Tom Lane
* Preserve column names in the execution-time tupledesc for a RowExpr.Tom Lane2012-02-14
| | | | | | | | | | | | | | | The hstore and json datatypes both have record-conversion functions that pay attention to column names in the composite values they're handed. We used to not worry about inserting correct field names into tuple descriptors generated at runtime, but given these examples it seems useful to do so. Observe the nicer-looking results in the regression tests whose results changed. catversion bump because there is a subtle change in requirements for stored rule parsetrees: RowExprs from ROW() constructs now have to include field names. Andrew Dunstan and Tom Lane
* Allow LEAKPROOF functions for better performance of security views.Robert Haas2012-02-13
| | | | | | | | | | | | | | | | We don't normally allow quals to be pushed down into a view created with the security_barrier option, but functions without side effects are an exception: they're OK. This allows much better performance in common cases, such as when using an equality operator (that might even be indexable). There is an outstanding issue here with the CREATE FUNCTION / ALTER FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the leakproof flag. But I'm committing this as-is so that it doesn't have to be rebased again; we can fix up the grammar in a future commit. KaiGai Kohei, with some wordsmithing by me.
* Use parameterized paths to generate inner indexscans more flexibly.Tom Lane2012-01-27
| | | | | | | | | | | | | | | | | | | This patch fixes the planner so that it can generate nestloop-with- inner-indexscan plans even with one or more levels of joining between the indexscan and the nestloop join that is supplying the parameter. The executor was fixed to handle such cases some time ago, but the planner was not ready. This should improve our plans in many situations where join ordering restrictions formerly forced complete table scans. There is probably a fair amount of tuning work yet to be done, because of various heuristics that have been added to limit the number of parameterized paths considered. However, we are not going to find out what needs to be adjusted until the code gets some real-world use, so it's time to get it in there where it can be tested easily. Note API change for index AM amcostestimate functions. I'm not aware of any non-core index AMs, but if there are any, they will need minor adjustments.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Rethink representation of index clauses' mapping to index columns.Tom Lane2011-12-24
| | | | | | | | | | | | | | | | | | | | | In commit e2c2c2e8b1df7dfdb01e7e6f6191a569ce3c3195 I made use of nested list structures to show which clauses went with which index columns, but on reflection that's a data structure that only an old-line Lisp hacker could love. Worse, it adds unnecessary complication to the many places that don't much care which clauses go with which index columns. Revert to the previous arrangement of flat lists of clauses, and instead add a parallel integer list of column numbers. The places that care about the pairing can chase both lists with forboth(), while the places that don't care just examine one list the same as before. The only real downside to this is that there are now two more lists that need to be passed to amcostestimate functions in case they care about column matching (which btcostestimate does, so not passing the info is not an option). Rather than deal with 11-argument amcostestimate functions, pass just the IndexPath and expect the functions to extract fields from it. That gets us down to 7 arguments which is better than 11, and it seems more future-proof against likely additions to the information we keep about an index path.
* Improve planner's handling of duplicated index column expressions.Tom Lane2011-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It's potentially useful for an index to repeat the same indexable column or expression in multiple index columns, if the columns have different opclasses. (If they share opclasses too, the duplicate column is pretty useless, but nonetheless we've allowed such cases since 9.0.) However, the planner failed to cope with this, because createplan.c was relying on simple equal() matching to figure out which index column each index qual is intended for. We do have that information available upstream in indxpath.c, though, so the fix is to not flatten the multi-level indexquals list when putting it into an IndexPath. Then we can rely on the sublist structure to identify target index columns in createplan.c. There's a similar issue for index ORDER BYs (the KNNGIST feature), so introduce a multi-level-list representation for that too. This adds a bit more representational overhead, but we might more or less buy that back by not having to search for matching index columns anymore in createplan.c; likewise btcostestimate saves some cycles. Per bug #6351 from Christian Rudolph. Likely symptoms include the "btree index keys must be ordered by attribute" failure shown there, as well as "operator MMMM is not a member of opfamily NNNN". Although this is a pre-existing problem that can be demonstrated in 9.0 and 9.1, I'm not going to back-patch it, because the API changes in the planner seem likely to break things such as index plugins. The corner cases where this matters seem too narrow to justify possibly breaking things in a minor release.
* Replace simple constant pg_am.amcanreturn with an AM support function.Tom Lane2011-12-18
| | | | | | | | | The need for this was debated when we put in the index-only-scan feature, but at the time we had no near-term expectation of having AMs that could support such scans for only some indexes; so we kept it simple. However, the SP-GiST AM forces the issue, so let's fix it. This patch only installs the new API; no behavior actually changes.
* Add const qualifiers to node inspection functionsPeter Eisentraut2011-12-07
| | | | Thomas Munro
* Make some minor formatting improvements to what pgindent did.Tom Lane2011-11-28
| | | | | | Moving the code two full tab stops to the right requires rethinking of cosmetic code layout choices, which pgindent isn't really able to do for us. Whitespace and comment adjustments only, no code changes.
* Pgindent clauses.c, per request from Tom.Bruce Momjian2011-11-28
|
* Convert eval_const_expressions's long series of IsA tests into a switch.Tom Lane2011-11-28
| | | | | | | | | This function has now grown enough cases that a switch seems appropriate. This results in a measurable speed improvement on some platforms, and should certainly not hurt. The code's in need of a pgindent run now, though. Andres Freund
* Wrap appendrel member outputs in PlaceHolderVars in additional cases.Tom Lane2011-11-08
| | | | | | | | | | | | | | | | Add PlaceHolderVar wrappers as needed to make UNION ALL sub-select output expressions appear non-constant and distinct from each other. This makes the world safe for add_child_rel_equivalences to do what it does. Before, it was possible for that function to add identical expressions to different EquivalenceClasses, which logically should imply merging such ECs, which would be wrong; or to improperly add a constant to an EquivalenceClass, drastically changing its behavior. Per report from Teodor Sigaev. The only currently known consequence of this bug is "MergeAppend child's targetlist doesn't match MergeAppend" planner failures in 9.1 and later. I am suspicious that there may be other failure modes that could affect older release branches; but in the absence of any hard evidence, I'll refrain from back-patching further than 9.1.
* Fix inline_set_returning_function() to allow multiple OUT parameters.Tom Lane2011-11-03
| | | | | | | | inline_set_returning_function failed to distinguish functions returning generic RECORD (which require a column list in the RTE, as well as run-time type checking) from those with multiple OUT parameters (which do not). This prevented inlining from happening. Per complaint from Jay Levitt. Back-patch to 8.4 where this capability was introduced.
* Preserve Var location information during flatten_join_alias_vars.Tom Lane2011-11-01
| | | | | | | This allows us to give correct syntax error pointers when complaining about ungrouped variables in a join query with aggregates or GROUP BY. It's pretty much irrelevant for the planner's use of the function, though perhaps it might aid debugging sometimes.
* Improve planner's ability to recognize cases where an IN's RHS is unique.Tom Lane2011-10-26
| | | | | | | | | | | | If the right-hand side of a semijoin is unique, then we can treat it like a normal join (or another way to say that is: we don't need to explicitly unique-ify the data before doing it as a normal join). We were recognizing such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP BY decoration, but there's another way: if the RHS is a plain relation with unique indexes, we can check if any of the indexes prove the output is unique. Most of the infrastructure for that was there already in the join removal code, though I had to rearrange it a bit. Per reflection about a recent example in pgsql-performance.
* Don't trust deferred-unique indexes for join removal.Tom Lane2011-10-23
| | | | | | | | | | | The uniqueness condition might fail to hold intra-transaction, and assuming it does can give incorrect query results. Per report from Marti Raudsepp, though this is not his proposed patch. Back-patch to 9.0, where both these features were introduced. In the released branches, add the new IndexOptInfo field to the end of the struct, to try to minimize ABI breakage for third-party code that may be examining that struct.
* Teach btree to handle ScalarArrayOpExpr quals natively.Tom Lane2011-10-16
| | | | | This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain indexscans, and particularly in index-only scans.