path: root/src/backend/utils/adt/selfuncs.c
...
* Support ALTER TABLESPACE name SET/RESET ( tablespace_options ). (Robert Haas, 2010-01-05)
  This patch only supports seq_page_cost and random_page_cost as parameters, but it provides the infrastructure to scalably support many more. In particular, we may want to add support for effective_io_concurrency, but I'm leaving that as future work for now. Thanks to Tom Lane for design help and Alvaro Herrera for the review.
* When estimating the selectivity of an inequality "column > constant" or "column < constant", and the comparison value is in the first or last histogram bin or outside the histogram entirely, try to fetch the actual column min or max value using an index scan (if there is an index on the column). (Tom Lane, 2010-01-04)
  If successful, replace the lower or upper histogram bound with that value before carrying on with the estimate. This limits the estimation error caused by moving min/max values when the comparison value is close to the min or max. Per a complaint from Josh Berkus.
  It is tempting to consider using this mechanism for mergejoinscansel as well, but that would inject index fetches into main-line join estimation, not just endpoint cases. I'm refraining from that until we can get a better handle on the costs of doing this type of lookup.
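  As a hedged illustration of that endpoint substitution (not the actual selfuncs.c code, which works on Datums, operator calls, and a histogram_bounds array): the sketch below uses plain doubles, and the have_actual/actual_min/actual_max arguments stand in for the result of the index endpoint probe.

      #include <stdbool.h>

      /*
       * Illustrative only: fraction of rows with value < constval, interpolated
       * from histogram bin boundaries.  When the true column min/max are known
       * (fetched via a cheap index probe), substitute them for the possibly
       * stale outermost histogram bounds before interpolating.
       */
      double
      ineq_fraction_sketch(double constval, double *bounds, int nbounds,
                           bool have_actual, double actual_min, double actual_max)
      {
          if (have_actual)
          {
              bounds[0] = actual_min;             /* replace lower histogram bound */
              bounds[nbounds - 1] = actual_max;   /* replace upper histogram bound */
          }

          if (constval <= bounds[0])
              return 0.0;
          if (constval >= bounds[nbounds - 1])
              return 1.0;

          for (int i = 1; i < nbounds; i++)
          {
              if (constval < bounds[i])
              {
                  double binfrac = (constval - bounds[i - 1]) /
                                   (bounds[i] - bounds[i - 1]);

                  return ((i - 1) + binfrac) / (nbounds - 1);
              }
          }
          return 1.0;                             /* not reached */
      }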
* Update copyright for the year 2010. (Bruce Momjian, 2010-01-02)
* Support "x IS NOT NULL" clauses as indexscan conditions. (Tom Lane, 2010-01-01)
  This turns out to be just a minor extension of the previous patch that made "x IS NULL" indexable, because we can treat the IS NOT NULL condition as if it were "x < NULL" or "x > NULL" (depending on the index's NULLS FIRST/LAST option), just like IS NULL is treated like "x = NULL". Aside from any possible usefulness in its own right, this is an important improvement for index-optimized MAX/MIN aggregates: it is now reliably possible to get a column's min or max value cheaply, even when there are a lot of nulls cluttering the interesting end of the index.
* Add the ability to store inheritance-tree statistics in pg_statistic, and teach ANALYZE to compute such stats for tables that have subclasses. (Tom Lane, 2009-12-29)
  Per my proposal of yesterday. autovacuum still needs to be taught about running ANALYZE on parent tables when their subclasses change, but the feature is useful even without that.
* Remove regex_flavor GUC, so that regular expressions are always "advanced" style by default. (Tom Lane, 2009-10-21)
  Per discussion, there seems to be hardly anything that really relies on being able to change the regex flavor, so the ability to select it via embedded options ought to be enough for any stragglers. Also, if we didn't remove the GUC, we'd really be morally obligated to mark the regex functions non-immutable, which'd possibly create performance issues.
* Support hex-string input and output for type BYTEA. (Tom Lane, 2009-08-04)
  Both hex format and the traditional "escape" format are automatically handled on input. The output format is selected by the new GUC variable bytea_output. As committed, bytea_output defaults to HEX, which is an *incompatible change*. We will keep it this way for awhile for testing purposes, but should consider whether to switch to the more backwards-compatible default of ESCAPE before 8.5 is released.
  Peter Eisentraut
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew. (Bruce Momjian, 2009-06-11)
* Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from Stefan Kaltenbrunner. (Tom Lane, 2009-04-19)
  The most reasonable behavior (at least for the near term) seems to be to ignore the PlaceHolderVar and examine its argument instead. In support of this, change the API of pull_var_clause() to allow callers to request recursion into PlaceHolderVars. Currently estimate_num_groups() is the only customer for that behavior, but where there's one there may be others.
* Teach the planner to treat a partial unique index as proving a variable is unique for a particular query, if the index predicate is satisfied. (Tom Lane, 2009-02-15)
  This requires a bit of reordering of operations so that we check the predicates before doing any selectivity estimates, but shouldn't really cause any noticeable slowdown. Per a comment from Michal Politowski.
* Update copyright for 2009. (Bruce Momjian, 2009-01-01)
* When estimating without benefit of MCV lists (suggesting that one or both inputs is unique or nearly so), make eqjoinsel() clamp the ndistinct estimates to be not more than the estimated number of rows coming from the input relations. (Tom Lane, 2008-10-23)
  This allows the estimate to change in response to the selectivity of restriction conditions on the inputs. This is a pretty narrow patch and maybe we should be more aggressive about similarly clamping ndistinct in other cases; but I'm worried about double-counting the effects of the restriction conditions. However, it seems to help for the case exhibited by Grzegorz Jaskiewicz (antijoin against a small subset of a relation), so let's try this for awhile.
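  A minimal sketch of the clamp (names are illustrative; the real eqjoinsel() also handles MCV matching, null fractions, and semi/anti cases): without MCV lists the classic equijoin selectivity is 1/max(nd1, nd2), and this commit simply caps each ndistinct at the row count estimated to arrive from that side after restrictions.

      /* Illustrative only: no-MCV equijoin selectivity with the new clamps. */
      double
      eqjoinsel_sketch(double nd1, double nd2, double rows1, double rows2)
      {
          if (nd1 > rows1)
              nd1 = rows1;        /* cannot see more distinct values than rows */
          if (nd2 > rows2)
              nd2 = rows2;

          if (nd1 <= 0.0 || nd2 <= 0.0)
              return 0.0;         /* avoid division by zero on empty inputs */

          return 1.0 / (nd1 > nd2 ? nd1 : nd2);
      }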
* Add a concept of "placeholder" variables to the planner. (Tom Lane, 2008-10-21)
  These are variables that represent some expression that we desire to compute below the top level of the plan, and then let that value "bubble up" as though it were a plain Var (ie, a column value). The immediate application is to allow sub-selects to be flattened even when they are below an outer join and have non-nullable output expressions. Formerly we couldn't flatten because such an expression wouldn't properly go to NULL when evaluated above the outer join. Now, we wrap it in a PlaceHolderVar and arrange for the actual evaluation to occur below the outer join. When the resulting Var bubbles up through the join, it will be set to NULL if necessary, yielding the correct results. This fixes a planner limitation that's existed since 7.1. In future we might want to use this mechanism to re-introduce some form of Hellerstein's "expensive functions" optimization, ie place the evaluation of an expensive function at the most suitable point in the plan tree.
* Dept of second thoughts: let's make sure that get_index_stats_hook is only applied to expression indexes, not to plain relations. (Tom Lane, 2008-09-28)
  The original coding in btcostestimate conflated the two cases, but it's not hard to use get_relation_stats_hook instead when we're looking to the underlying relation.
* Add hooks to let plugins override the planner's lookups in pg_statistic. (Tom Lane, 2008-09-28)
  Simon Riggs, with some editorialization by me.
* Move exprType(), exprTypmod(), expression_tree_walker(), and related routines into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside the backend. (Tom Lane, 2008-08-25)
  There's probably more that should be done along this line, but this is a start anyway.
* Clean up the loose ends in selectivity estimation left by my patch for semi and anti joins. (Tom Lane, 2008-08-16)
  To do this, pass the SpecialJoinInfo struct for the current join as an additional optional argument to operator join selectivity estimation functions. This allows the estimator to tell not only what kind of join is being formed, but which variable is on which side of the join; a requirement long recognized but not dealt with till now. This also leaves the door open for future improvements in the estimators, such as accounting for the null-insertion effects of lower outer joins. I didn't do anything about that in the current patch but the information is in principle deducible from what's passed.
  The patch also clarifies the definition of join selectivity for semi/anti joins: it's the fraction of the left input that has (at least one) match in the right input. This allows getting rid of some very fuzzy thinking that I had committed in the original 7.4-era IN-optimization patch. There's probably room to estimate this better than the present patch does, but at least we know what to estimate.
  Since I had to touch CREATE OPERATOR anyway to allow a variant signature for join estimator functions, I took the opportunity to add a couple of additional checks that were missing, per my recent message to -hackers:
    * Check that estimator functions return float8;
    * Require execute permission at the time of CREATE OPERATOR on the operator's function as well as the estimator functions;
    * Require ownership of any pre-existing operator that's modified by the command.
  I also moved the lookup of the functions out of OperatorCreate() and into operatorcmds.c, since that seemed more consistent with most of the other catalog object creation processes, eg CREATE TYPE.
* Implement SEMI and ANTI joins in the planner and executor. (Tom Lane, 2008-08-14)
  (Semijoins replace the old JOIN_IN code, but antijoins are new functionality.) Teach the planner to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL filters are recognized as being anti joins. Unify the InClauseInfo and OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change, it becomes possible to associate a SpecialJoinInfo with every join attempt, which permits some cleanup of join selectivity estimation. That needs to be taken much further than this patch does, but the next step is to change the API for oprjoin selectivity functions, which seems like material for a separate patch. So for the moment the output size estimates for semi and especially anti joins are quite bogus.
* Fix estimate_num_groups() to assume that GROUP BY expressions yielding boolean results always contribute two groups, regardless of the expression contents. (Tom Lane, 2008-07-07)
  This is very substantially more accurate than the regular heuristic for certain boolean tests like "col IS NULL". Per gripe from Sam Mason. Back-patch to all supported releases, since the behavior of estimate_num_groups() hasn't changed all that much since 7.4.
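  A hedged sketch of the rule (the helper and its arguments are illustrative, not the estimate_num_groups() code): expressions of boolean type are simply credited with two possible groups, while everything else keeps the usual per-expression ndistinct heuristic.

      #include <stdbool.h>

      /* Illustrative only: per-expression group contribution. */
      double
      groups_for_expr_sketch(bool expr_yields_boolean, double ndistinct_heuristic)
      {
          if (expr_yields_boolean)
              return 2.0;                 /* true/false: at most two groups */
          return ndistinct_heuristic;     /* normal estimate */
      }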
* Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. (Alvaro Herrera, 2008-05-12)
  Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Since createplan.c no longer cares whether index operators are lossy, it has no particular need to do get_op_opfamily_properties() while building an indexscan plan. (Tom Lane, 2008-04-13)
  Postpone that lookup until executor start. This simplifies createplan.c a lot more than it complicates nodeIndexscan.c, and makes things more uniform since we already had to do it that way for RowCompare expressions. Should be a bit faster too, at least for plans that aren't re-used many times, since we avoid palloc'ing and perhaps copying the intermediate list data structure.
* Simplify and standardize conversions between TEXT datums and ordinary C strings. (Tom Lane, 2008-03-25)
  This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done.
  Brendan Jurd and Tom Lane
* Revert thinko introduced into prefix_selectivity() by my recent patch: make_greater_string needs the < procedure not the >= one. Spotted by Peter. (Tom Lane, 2008-03-17)
* Change patternsel() so that instead of switching from a pure pattern-examination heuristic method to purely histogram-driven selectivity at histogram size 100, we compute both estimates and use a weighted average. (Tom Lane, 2008-03-09)
  The weight put on the heuristic estimate decreases linearly with histogram size, dropping to zero for 100 or more histogram entries. Likewise in ltreeparentsel(). After a patch by Greg Stark, though I reorganized the logic a bit to give the caller of histogram_selectivity() more control.
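  A minimal sketch of the blend described above (the hard-wired 100 is from the commit text; the function and parameter names are illustrative): the heuristic estimate's weight falls linearly from 1 at an empty histogram to 0 at 100 or more entries.

      /* Illustrative only: weighted average of the two pattern-selectivity estimates. */
      double
      blended_pattern_sel_sketch(double heuristic_sel, double hist_sel, int hist_size)
      {
          double hist_weight = hist_size / 100.0;

          if (hist_weight > 1.0)
              hist_weight = 1.0;          /* 100+ entries: trust the histogram fully */

          return (1.0 - hist_weight) * heuristic_sel + hist_weight * hist_sel;
      }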
* Modify prefix_selectivity() so that it will never estimate the selectivity of the generated range condition var >= 'foo' AND var < 'fop' as being less than what eqsel() would estimate for var = 'foo'. (Tom Lane, 2008-03-08)
  This is intuitively reasonable and it gets rid of the need for some entirely ad-hoc coding we formerly used to reject bogus estimates. The basic problem here is that if the prefix is more than a few characters long, the two boundary values are too close together to be distinguishable by comparison to the column histogram, resulting in a selectivity estimate of zero, which is often not very sane. Change motivated by an example from Peter Eisentraut. Arguably this is a bug fix, but I'll refrain from back-patching it for the moment.
* Update copyrights in source tree to 2008. (Bruce Momjian, 2008-01-01)
* Fix mergejoin cost estimation so that we consider the statistical ranges of the two join variables at both ends: not only trailing rows that need not be scanned because there cannot be a match on the other side, but initial rows that will be scanned without possibly having a match. (Tom Lane, 2007-12-08)
  This allows a more realistic estimate of startup cost to be made, per recent pgsql-performance discussion. In passing, fix a couple of bugs that had crept into mergejoinscansel: it was not quite up to speed for the task of estimating descending-order scans, which is a new requirement in 8.3.
* Re-run pgindent with updated list of typedefs. (Updated README should avoid this problem in the future.) (Bruce Momjian, 2007-11-15)
* pgindent run for 8.3. (Bruce Momjian, 2007-11-15)
* Second pass at improving LIKE/regex estimation in non-C locales. (Tom Lane, 2007-11-09)
  It turns out that it's actually quite likely that a string that is an extension of the given prefix will sort as larger than the "greater" string our previous code created. To provide some defense against that, do the comparisons against a modified string instead of just the bare prefix. We tack on "Z", "z", "y", or "9", whichever is seen as largest in the current locale. Testing suggests that this is sufficient at least for cases involving ASCII data.
* Improve the performance of LIKE/regex estimation in non-C locales, by making make_greater_string() try harder to generate a string that's actually greater than its input string. (Tom Lane, 2007-11-07)
  Before we just assumed that making a string that was memcmp-greater was enough, but it is easy to generate examples where this is not so when the locale is not C. Instead, loop until the relevant comparison function agrees that the generated string is greater than the input. Unfortunately this is probably not enough to guarantee that the generated string is greater than all extensions of the input, so we cannot relax the restriction to C locale for the LIKE/regex index optimization. But it should at least improve the odds of getting a useful selectivity estimate in prefix_selectivity(). Per example from Guillaume Smet. Backpatch to 8.1, mainly because that's what the complainant is using...
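  A hedged sketch of that retry loop for plain NUL-terminated strings (the real make_greater_string() works on text/bytea values and calls the column type's comparison proc; strcoll() stands in for that here, and the byte-at-a-time bumping ignores multibyte encodings).

      #include <stdlib.h>
      #include <string.h>

      /*
       * Illustrative only: bump the last byte until the locale-aware comparison
       * says the result sorts greater than the input; on overflow, truncate and
       * retry with the previous byte.  Returns a malloc'd string or NULL.
       */
      char *
      greater_string_sketch(const char *prefix)
      {
          size_t  len = strlen(prefix);
          char   *work = strdup(prefix);

          if (work == NULL)
              return NULL;

          while (len > 0)
          {
              unsigned char *lastp = (unsigned char *) &work[len - 1];

              while (*lastp < 0xFF)
              {
                  (*lastp)++;
                  if (strcoll(work, prefix) > 0)
                      return work;        /* comparison agrees it is greater */
              }

              work[--len] = '\0';         /* give up on this byte; shorten and retry */
          }

          free(work);
          return NULL;                    /* no greater string could be generated */
      }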
* Fix patternsel() and callers to do the right thing for NOT LIKE and the other negated-match operators. (Tom Lane, 2007-11-07)
  patternsel had been using the supplied operator as though it were a positive-match operator, and thus obtaining a wrong result, which was even more wrong after the caller subtracted it from 1. Seems cleanest to give patternsel an explicit "negate" argument so that it knows what's going on. Also install the same factorization scheme for pattern join selectivity estimators; even though they are just stubs at the moment, this may keep someone from making the same type of mistake when they get filled out. Per report from Greg Mullane. Backpatch to 8.2 --- previous releases do not show the problem because patternsel() doesn't actually use the operator directly.
* Apply a band-aid fix for the problem that 8.2 and up completely misestimate the number of rows likely to be produced by a query such as "SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL". (Tom Lane, 2007-08-31)
  What this is doing is selecting for t1 rows with no match in t2, and thus it may produce a significant number of rows even if the t2.key table column contains no nulls at all. 8.2 thinks the table column's null fraction is relevant and thus may estimate no rows out, which results in terrible plans if there are more joins above this one. A proper fix for this will involve passing much more information about the context of a clause to the selectivity estimator functions than we ever have. There's no time left to write such a patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead, put in an ad-hoc test to defeat the normal table-stats-based estimation when an IS NULL test is evaluated at an outer join, and just use a constant estimate instead --- I went with 0.5 for lack of a better idea. This won't catch every case but it will catch the typical ways of writing such queries, and it seems unlikely to make things worse for other queries.
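  A hedged sketch of the band-aid (structure and names are illustrative; the real check lives in the IS NULL selectivity path and inspects the join context): when the tested variable can be nulled by an outer join below the clause, the column's stats-derived null fraction is ignored in favor of the fixed 0.5 guess from the commit.

      #include <stdbool.h>

      /* Illustrative only: selectivity of "var IS NULL" evaluated at an outer join. */
      double
      nulltest_sel_sketch(bool var_nullable_by_outer_join, double stats_nullfrac)
      {
          if (var_nullable_by_outer_join)
              return 0.5;             /* ad-hoc constant; table stats are not meaningful here */
          return stats_nullfrac;      /* normal table-stats-based estimate */
      }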
* Tsearch2 functionality migrates to core. (Tom Lane, 2007-08-21)
  The bulk of this work is by Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing, so anything that's broken is probably my fault. Documentation is nonexistent as yet, but let's land the patch so we can get some portability testing done.
* Check return code from strxfrm on Windows since it has a non-standard way of indicating errors, so we don't try to allocate INT_MAX bytes to store a result in. (Magnus Hagander, 2007-05-05)
* Some further performance tweaks for planning large inheritance trees that are mostly excluded by constraints: do the CE test a bit earlier to save some adjust_appendrel_attrs() work on excluded children, and arrange to use array indexing rather than rt_fetch() to fetch RTEs in the main body of the planner. (Tom Lane, 2007-04-21)
  The latter is something I'd wanted to do for awhile anyway, but seeing list_nth_cell() as 35% of the runtime gets one's attention.
* Make 'col IS NULL' clauses be indexable conditions. (Tom Lane, 2007-04-06)
  Teodor Sigaev, with some kibitzing from Tom Lane.
* Fix array coercion expressions to ensure that the correct volatility is seen by code inspecting the expression. (Tom Lane, 2007-03-27)
  The best way to do this seems to be to drop the original representation as a function invocation, and instead make a special expression node type that represents applying the element-type coercion function to each array element. In this way the element function is exposed and will be checked for volatility. Per report from Guillaume Smet.
* Fix some problems with selectivity estimation for partial indexes. (Tom Lane, 2007-03-21)
  First, genericcostestimate() was being way too liberal about including partial-index conditions in its selectivity estimate, resulting in substantial underestimates for situations such as an indexqual "x = 42" used with an index on x "WHERE x >= 40 AND x < 50". While the code is intentionally set up to favor selecting partial indexes when available, this was too much...
  Second, choose_bitmap_and() was likewise easily fooled by cases of this type, since it would similarly think that the partial index had selectivity independent of the indexqual. Fixed by using predicate_implied_by() rather than simple equality checks to determine redundancy. This is a good deal more expensive but I don't see much alternative. At least the extra cost is only paid when there's actually a partial index under consideration.
  Per report from Jeff Davis. I'm not going to risk back-patching this, though.
* Fix up the remaining places where the expression node structure would lose available information about the typmod of an expression; namely, Const, ArrayRef, ArrayExpr, and EXPR and ARRAY SubLinks. (Tom Lane, 2007-03-17)
  In the ArrayExpr and SubLink cases it wasn't really the data structure's fault, but exprTypmod() being lazy. This seems like a good idea in view of the expected increase in typmod usage from Teodor's work to allow user-defined types to have typmods. In particular this responds to the concerns we had about eliminating the special-purpose hack that exprTypmod() used to have for BPCHAR Consts. We can now tell whether or not such a Const has been cast to a specific length, and report or display properly if so. initdb forced due to changes in stored rules.
* Replace direct assignments to VARATT_SIZEP(x) with SET_VARSIZE(x, len). (Tom Lane, 2007-02-27)
  Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with VARSIZE and VARDATA, and as a consequence almost no code was using the longer names. Rename the length fields of struct varlena and various derived structures to catch anyplace that was accessing them directly; and clean up various places so caught. In itself this patch doesn't change any behavior at all, but it is necessary infrastructure if we hope to play any games with the representation of varlena headers.
  Greg Stark and Tom Lane
* Turn the rangetable used by the executor into a flat list, and avoid storing useless substructure for its RangeTblEntry nodes. (Tom Lane, 2007-02-22)
  (I chose to keep using the same struct node type and just zero out the link fields for unneeded info, rather than making a separate ExecRangeTblEntry type --- it seemed too fragile to have two different rangetable representations.) Along the way, put subplans into a list in the toplevel PlannedStmt node, and have SubPlan nodes refer to them by list index instead of direct pointers. Vadim wanted to do that years ago, but I never understood what he was on about until now. It makes things a *whole* lot more robust, because we can stop worrying about duplicate processing of subplans during expression tree traversals. That's been a constant source of bugs, and it's finally gone. There are some consequent simplifications yet to be made, like not using a separate EState for subplans in the executor, but I'll tackle that later.
* Get rid of some old and crufty global variables in the planner. (Tom Lane, 2007-02-19)
  When this code was last gone over, there wasn't really any alternative to globals because we didn't have the PlannerInfo struct being passed all through the planner code. Now that we do, we can restructure things to avoid non-reentrancy. I'm fooling with this because otherwise I'd have had to add another global variable for the planned compact range table list.
* Revert gincostestimate changes. (Teodor Sigaev, 2007-01-31)
* Allow GIN's extractQuery method to signal that nothing can satisfy the query. (Teodor Sigaev, 2007-01-31)
  In this case extractQuery should return -1 as nentries. This changes the prototype of the extractQuery method to use int32* instead of uint32* for the nentries argument. Based on that, gincostestimate can detect two corner cases: nothing will be found, or a seqscan should be used. Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php
  PS: the tsearch_core patch will need to be slightly modified to support these changes, but I'm waiting for a verdict on the review of the tsearch_core patch.
* Dept of second thoughts: the IQ of estimate_array_length() needs to be kept on par with that of scalararraysel(), else estimates that should track might not. (Tom Lane, 2007-01-28)
  Hence teach it about binary-compatible cases, too.
* Fix scalararraysel() to cope with binary-compatible cases, such as text[] versus varchar[]. (Tom Lane, 2007-01-28)
  This oversight probably explains Ryan Holmes' recent complaint --- he was getting a generic selectivity estimate instead of anything intelligent.
* Put back planner's ability to cache the results of mergejoinscansel(), which I had removed in the first cut of the EquivalenceClass rewrite to simplify that patch a little. (Tom Lane, 2007-01-22)
  But it's still important --- in a four-way join problem mergejoinscansel() was eating about 40% of the planning time according to gprof. Also, improve the EquivalenceClass code to re-use join RestrictInfos rather than generating fresh ones for each join considered. This saves some memory space but more importantly improves the effectiveness of caching planning info in RestrictInfos.
* Refactor planner's pathkeys data structure to create a separate, explicit representation of equivalence classes of variables. (Tom Lane, 2007-01-20)
  This is an extensive rewrite, but it brings a number of benefits:
    * planner no longer fails in the presence of "incomplete" operator families that don't offer operators for every possible combination of datatypes.
    * avoid generating and then discarding redundant equality clauses.
    * remove bogus assumption that derived equalities always use operators named "=".
    * mergejoins can work with a variety of sort orders (e.g., descending) now, instead of tying each mergejoinable operator to exactly one sort order.
    * better recognition of redundant sort columns.
    * can make use of equalities appearing underneath an outer join.
* Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LAST per-column options for btree indexes. (Tom Lane, 2007-01-09)
  The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.