aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/selfuncs.c
Commit message (Collapse)AuthorAge
...
* Fix BRIN cost estimationAlvaro Herrera2017-04-06
| | | | | | | | | | | The original code was overly optimistic about the cost of scanning a BRIN index, leading to BRIN indexes being selected when they'd be a worse choice than some other index. This complete rewrite should be more accurate. Author: David Rowley, based on an earlier patch by Emre Hasegeli Reviewed-by: Emre Hasegeli Discussion: https://postgr.es/m/CAKJS1f9n-Wapop5Xz1dtGdpdqmzeGqQK4sV2MK-zZugfC14Xtw@mail.gmail.com
* Collect and use multi-column dependency statsSimon Riggs2017-04-05
| | | | | | | | | | | | | | | | | | Follow on patch in the multi-variate statistics patch series. CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t; ANALYZE; will collect dependency stats on (a, b) and then use the measured dependency in subsequent query planning. Commit 7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b added CREATE STATISTICS with n-distinct coefficients. These are now specified using the mutually exclusive option WITH (ndistinct). Author: Tomas Vondra, David Rowley Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas and many other comments and contributions Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
* Abstract logic to allow for multiple kinds of child rels.Robert Haas2017-04-03
| | | | | | | | | | | | | | | | | | | | | | Currently, the only type of child relation is an "other member rel", which is the child of a baserel, but in the future joins and even upper relations may have child rels. To facilitate that, introduce macros that test to test for particular RelOptKind values, and use them in various places where they help to clarify the sense of a test. (For example, a test may allow RELOPT_OTHER_MEMBER_REL either because it intends to allow child rels, or because it intends to allow simple rels.) Also, remove find_childrel_top_parent, which will not work for a child rel that is not a baserel. Instead, add a new RelOptInfo member top_parent_relids to track the same kind of information in a more generic manner. Ashutosh Bapat, slightly tweaked by me. Review and testing of the patch set from which this was taken by Rajkumar Raghuwanshi and Rafia Sabih. Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
* Fix thinko in estimate_num_groupsAlvaro Herrera2017-03-27
| | | | | | | | | | | | | | | The code for the reworked n-distinct estimation on commit 7b504eb282 was written differently in a previous version of the patch, prior to commit; on rewriting it, we missed updating an initializer. This caused the code to (mistakenly) apply a fudge factor even in the case where a single value is applied, leading to incorrect results. This means that the 'relvarcount' variable name is now wrong. Add a comment to try and make the situation clearer, and remove an incorrect comment I added. Problem noticed, and code patch, by Tomas Vondra. Additional commentary by Álvaro.
* Implement multivariate n-distinct coefficientsAlvaro Herrera2017-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for explicitly declared statistic objects (CREATE STATISTICS), allowing collection of statistics on more complex combinations that individual table columns. Companion commands DROP STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are added too. All this DDL has been designed so that more statistic types can be added later on, such as multivariate most-common-values and multivariate histograms between columns of a single table, leaving room for permitting columns on multiple tables, too, as well as expressions. This commit only adds support for collection of n-distinct coefficient on user-specified sets of columns in a single table. This is useful to estimate number of distinct groups in GROUP BY and DISTINCT clauses; estimation errors there can cause over-allocation of memory in hashed aggregates, for instance, so it's a worthwhile problem to solve. A new special pseudo-type pg_ndistinct is used. (num-distinct estimation was deemed sufficiently useful by itself that this is worthwhile even if no further statistic types are added immediately; so much so that another version of essentially the same functionality was submitted by Kyotaro Horiguchi: https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp though this commit does not use that code.) Author: Tomas Vondra. Some code rework by Álvaro. Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes, Ideriha Takeshi Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
* ICU supportPeter Eisentraut2017-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a column collprovider to pg_collation that determines which library provides the collation data. The existing choices are default and libc, and this adds an icu choice, which uses the ICU4C library. The pg_locale_t type is changed to a union that contains the provider-specific locale handles. Users of locale information are changed to look into that struct for the appropriate handle to use. Also add a collversion column that records the version of the collation when it is created, and check at run time whether it is still the same. This detects potentially incompatible library upgrades that can corrupt indexes and other structures. This is currently only supported by ICU-provided collations. initdb initializes the default collation set as before from the `locale -a` output but also adds all available ICU locales with a "-x-icu" appended. Currently, ICU-provided collations can only be explicitly named collations. The global database locales are still always libc-provided. ICU support is enabled by configure --with-icu. Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se>
* Add support for EUI-64 MAC addresses as macaddr8Stephen Frost2017-03-15
| | | | | | | | | | | | | | | | | | | | | This adds in support for EUI-64 MAC addresses by adding a new data type called 'macaddr8' (using our usual convention of indicating the number of bytes stored). This was largely a copy-and-paste from the macaddr data type, with appropriate adjustments for having 8 bytes instead of 6 and adding support for converting a provided EUI-48 (6 byte format) to the EUI-64 format. Conversion from EUI-48 to EUI-64 inserts FFFE as the 4th and 5th bytes but does not perform the IPv6 modified EUI-64 action of flipping the 7th bit, but we add a function to perform that specific action for the user as it may be commonly done by users who wish to calculate their IPv6 address based on their network prefix and 48-bit MAC address. Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me. Reviewed by: Vitaly Burovoy, Kuntal Ghosh Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=TctNQ+EfkeCEHtMU_yo1mvX8hsk_ghNQ@mail.gmail.com
* Spelling fixes in code commentsPeter Eisentraut2017-03-14
| | | | From: Josh Soref <jsoref@gmail.com>
* Use wrappers of PG_DETOAST_DATUM_PACKED() more.Noah Misch2017-03-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | This makes almost all core code follow the policy introduced in the previous commit. Specific decisions: - Text search support functions with char* and length arguments, such as prsstart and lexize, may receive unaligned strings. I doubt maintainers of non-core text search code will notice. - Use plain VARDATA() on values detoasted or synthesized earlier in the same function. Use VARDATA_ANY() on varlenas sourced outside the function, even if they happen to always have four-byte headers. As an exception, retain the universal practice of using VARDATA() on return values of SendFunctionCall(). - Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large for a one-byte header, so this misses no optimization.) Sites that do not call get_page_from_raw() typically need the four-byte alignment. - For now, do not change btree_gist. Its use of four-byte headers in memory is partly entangled with storage of 4-byte headers inside GBT_VARKEY, on disk. - For now, do not change gtrgm_consistent() or gtrgm_distance(). They incorporate the varlena header into a cache, and there are multiple credible implementation strategies to consider.
* Put back <float.h> in a few files that need it for _isnan().Tom Lane2017-03-08
| | | | | | | | | | | Further fallout from commit c29aff959: there are some files that need <float.h>, and were getting it from datatype/timestamp.h, but it was not apparent in my (tgl's) testing because the requirement for <float.h> exists only on certain Windows toolchains. Report and patch by David Rowley. Discussion: https://postgr.es/m/CAKJS1f-BHceaFzZScFapDV48gUVM2CAOBfhkgffdqXzFb+kwew@mail.gmail.com
* Remove now-dead code for !HAVE_INT64_TIMESTAMP.Tom Lane2017-02-23
| | | | | | | This is a basically mechanical removal of #ifdef HAVE_INT64_TIMESTAMP tests and the negative-case controlled code. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us
* Make more use of castNode()Peter Eisentraut2017-02-21
|
* Add optimizer and executor support for parallel index scans.Robert Haas2017-02-15
| | | | | | | | | | | | In combination with 569174f1be92be93f5366212cc46960d28a5c5cd, which taught the btree AM how to perform parallel index scans, this allows parallel index scan plans on btree indexes. This infrastructure should be general enough to support parallel index scans for other index AMs as well, if someone updates them to support parallel scans. Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi, and me.
* Move some things from builtins.h to new header filesPeter Eisentraut2017-01-20
| | | | This avoids that builtins.h has to include additional header files.
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Improve eqjoinsel_semi's behavior for small inner relations with no stats.Tom Lane2016-11-29
| | | | | | | | | | | | | | | | | | | | If we don't have any MCV statistics for the inner relation, and we don't trust its numdistinct estimate either, eqjoinsel_semi falls back to a very conservative estimate (that 50% of the outer rows have matches). This is particularly problematic if the inner relation is completely empty, since then even an explicit ANALYZE won't produce any pg_statistic entries, so there's no way to budge the planner off the bad estimate. We'd produce a better estimate in such cases if we used the nd2/nd1 selectivity heuristic, so an easy fix is to treat the nd2 estimate as non-default if we derive it from clamping to the inner rel's rowcount estimate. This won't fix every related case (mainly because the rowcount estimate might be larger than DEFAULT_NUM_DISTINCT), but it seems like a sane extension of the existing logic, so let's apply the change in HEAD and see if anyone complains. Per bug #14438 from Nikolay Nikitin. Report: https://postgr.es/m/20161128182113.6527.58926@wrigleys.postgresql.org Discussion: https://postgr.es/m/31089.1480384713@sss.pgh.pa.us
* Fix misestimation of n_distinct for a nearly-unique column with many nulls.Tom Lane2016-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ANALYZE found no repeated non-null entries in its sample, it set the column's stadistinct value to -1.0, intending to indicate that the entries are all distinct. But what this value actually means is that the number of distinct values is 100% of the table's rowcount, and thus it was overestimating the number of distinct values by however many nulls there are. This could lead to very poor selectivity estimates, as for example in a recent report from Andreas Joseph Krogh. We should discount the stadistinct value by whatever we've estimated the nulls fraction to be. (That is what will happen if we choose to use a negative stadistinct for a column that does have repeated entries, so this code path was just inconsistent.) In addition to fixing the stadistinct entries stored by several different ANALYZE code paths, adjust the logic where get_variable_numdistinct() forces an "all distinct" estimate on the basis of finding a relevant unique index. Unique indexes don't reject nulls, so there's no reason to assume that the null fraction doesn't apply. Back-patch to all supported branches. Back-patching is a bit of a judgment call, but this problem seems to affect only a few users (else we'd have identified it long ago), and it's bad enough when it does happen that destabilizing plan choices in a worse direction seems unlikely. Patch by me, with documentation wording suggested by Dean Rasheed Report: <VisenaEmail.26.df42f82acae38a58.156463942b8@tc7-visena> Discussion: <16143.1470350371@sss.pgh.pa.us>
* Revert CREATE INDEX ... INCLUDING ...Teodor Sigaev2016-04-08
| | | | | | It's not ready yet, revert two commits 690c543550b0d2852060c18d270cdb534d339d9a - unstable test output 386e3d7609c49505e079c40c65919d99feb82505 - patch itself
* CREATE INDEX ... INCLUDING (column[, ...])Teodor Sigaev2016-04-08
| | | | | | | | | | Now indexes (but only B-tree for now) can contain "extra" column(s) which doesn't participate in index structure, they are just stored in leaf tuples. It allows to use index only scan by using single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
* Improve estimate of distinct values in estimate_num_groups().Dean Rasheed2016-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When adjusting the estimate for the number of distinct values from a rel in a grouped query to take into account the selectivity of the rel's restrictions, use a formula that is less likely to produce under-estimates. The old formula simply multiplied the number of distinct values in the rel by the restriction selectivity, which would be correct if the restrictions were fully correlated with the grouping expressions, but can produce significant under-estimates in cases where they are not well correlated. The new formula is based on the random selection probability, and so assumes that the restrictions are not correlated with the grouping expressions. This is guaranteed to produce larger estimates, and of course risks over-estimating in cases where the restrictions are correlated, but that has less severe consequences than under-estimating, which might lead to a HashAgg that consumes an excessive amount of memory. This could possibly be improved upon in the future by identifying correlated restrictions and using a hybrid of the old and new formulae. Author: Tomas Vondra, with some hacking be me Reviewed-by: Mark Dilger, Alexander Korotkov, Dean Rasheed and Tom Lane Discussion: http://www.postgresql.org/message-id/flat/56CD0381.5060502@2ndquadrant.com
* Guard against zero vardata.rel->tuples in estimate_hash_bucketsize().Tom Lane2016-03-27
| | | | | | | | | | If the referenced rel was proven empty, we'd compute 0/0 here, which results in the function returning NaN. That's a bit more serious than the other zero-divide case. Still, it only seems to be possible in HEAD, so no back-patch. Per report from Piotr Stefaniak. I looked through the rest of selfuncs.c and found no other likely trouble spots.
* Clamp adjusted ndistinct to positive integer in estimate_hash_bucketsize().Tom Lane2016-03-27
| | | | | | | | | | | | This avoids a possible divide-by-zero in the following calculation, and rounding the number to an integer seems like saner behavior anyway. Assuming IEEE math, the division would yield +Infinity which would get replaced by 1.0 at the bottom of the function, so nothing really interesting would ensue; but avoiding divide-by-zero seems like a good idea on general principles. Per report from Piotr Stefaniak. No back-patch since this seems mostly cosmetic.
* Support CREATE ACCESS METHODAlvaro Herrera2016-03-23
| | | | | | | | | | | | | | | This enables external code to create access methods. This is useful so that extensions can add their own access methods which can be formally tracked for dependencies, so that DROP operates correctly. Also, having explicit support makes pg_dump work correctly. Currently only index AMs are supported, but we expect different types to be added in the future. Authors: Alexander Korotkov, Petr Jelínek Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby Commitfest-URL: https://commitfest.postgresql.org/9/353/ Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
* Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too.Tom Lane2016-03-10
| | | | | | | | | | All along, this function should have treated WindowFuncs in a manner similar to Aggrefs, ie with an option whether or not to recurse into them. By not considering the case, it was always recursing, which is OK for most callers (although I suspect that the case in prepare_sort_from_pathkeys might represent a bug). But now we need return-without-recursing behavior as well. There are also more than a few callers that should never see a WindowFunc, and now we'll get some error checking on that.
* Refactor pull_var_clause's API to make it less tedious to extend.Tom Lane2016-03-10
| | | | | | | | | | | | | | | | | | | | In commit 1d97c19a0f748e94 and later c1d9579dd8bf3c92, we extended pull_var_clause's API by adding enum-type arguments. That's sort of a pain to maintain, though, because it means every time we add a new behavior we must touch every last one of the call sites, even if there's a reasonable default behavior that most of them could use. Let's switch over to using a bitmask of flags, instead; that seems more maintainable and might save a nanosecond or two as well. This commit changes no behavior in itself, though I'm going to follow it up with one that does add a new behavior. In passing, remove flatten_tlist(), which has not been used since 9.1 and would otherwise need the same API changes. Removing these enums means that optimizer/tlist.h no longer needs to depend on optimizer/var.h. Changing that caused a number of C files to need addition of #include "optimizer/var.h" (probably we can thank old runs of pgrminclude for that); but on balance it seems like a good change anyway.
* Restructure index access method API to hide most of it at the C level.Tom Lane2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Add some more defenses against silly estimates to gincostestimate().Tom Lane2016-01-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A report from Andy Colson showed that gincostestimate() was not being nearly paranoid enough about whether to believe the statistics it finds in the index metapage. The problem is that the metapage stats (other than the pending-pages count) are only updated by VACUUM, and in the worst case could still reflect the index's original empty state even when it has grown to many entries. We attempted to deal with that by scaling up the stats to match the current index size, but if nEntries is zero then scaling it up still gives zero. Moreover, the proportion of pages that are entry pages vs. data pages vs. pending pages is unlikely to be estimated very well by scaling if the index is now orders of magnitude larger than before. We can improve matters by expanding the use of the rule-of-thumb estimates I introduced in commit 7fb008c5ee59b040: if the index has grown by more than a cutoff amount (here set at 4X growth) since VACUUM, then use the rule-of-thumb numbers instead of scaling. This might not be exactly right but it seems much less likely to produce insane estimates. I also improved both the scaling estimate and the rule-of-thumb estimate to account for numPendingPages, since it's reasonable to expect that that is accurate in any case, and certainly pages that are in the pending list are not either entry or data pages. As a somewhat separate issue, adjust the estimation equations that are concerned with extra fetches for partial-match searches. These equations suppose that a fraction partialEntries / numEntries of the entry and data pages will be visited as a consequence of a partial-match search. Now, it's physically impossible for that fraction to exceed one, but our estimate of partialEntries is mostly bunk, and our estimate of numEntries isn't exactly gospel either, so we could arrive at a silly value. In the example presented by Andy we were coming out with a value of 100, leading to insane cost estimates. Clamp the fraction to one to avoid that. Like the previous patch, back-patch to all supported branches; this problem can be demonstrated in one form or another in all of them.
* Make gincostestimate() cope with hypothetical GIN indexes.Tom Lane2015-12-01
| | | | | | | | | | | | | | | | | | | | | | | We tried to fetch statistics data from the index metapage, which does not work if the index isn't actually present. If the index is hypothetical, instead extrapolate some plausible internal statistics based on the index page count provided by the index-advisor plugin. There was already some code in gincostestimate() to invent internal stats in this way, but since it was only meant as a stopgap for pre-9.1 GIN indexes that hadn't been vacuumed since upgrading, it was pretty crude. If we want it to support index advisors, we should try a little harder. A small amount of testing says that it's better to estimate the entry pages as 90% of the index, not 100%. Also, estimating the number of entries (keys) as equal to the heap tuple count could be wildly wrong in either direction. Instead, let's estimate 100 entries per entry page. Perhaps someday somebody will want the index advisor to be able to provide these numbers more directly, but for the moment this should serve. Problem report and initial patch by Julien Rouhaud; modified by me to invent less-bogus internal statistics. Back-patch to all supported branches, since we've supported index advisors since 9.0.
* Allow planner to use expression-index stats for function calls in WHERE.Tom Lane2015-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, a function call appearing at the top level of WHERE had a hard-wired selectivity estimate of 0.3333333, a kludge conveniently dated in the source code itself to July 1992. The expectation at the time was that somebody would soon implement estimator support functions analogous to those for operators; but no such code has appeared, nor does it seem likely to in the near future. We do have an alternative solution though, at least for immutable functions on single relations: creating an expression index on the function call will allow ANALYZE to gather stats about the function's selectivity. But the code in clause_selectivity() failed to make use of such data even if it exists. Refactor so that that will happen. I chose to make it try this technique for any clause type for which clause_selectivity() doesn't have a special case, not just functions. To avoid adding unnecessary overhead in the common case where we don't learn anything new, make selfuncs.c provide an API that hooks directly to examine_variable() and then var_eq_const(), rather than the previous coding which laboriously constructed an OpExpr only so that it could be expensively deconstructed again. I preserved the behavior that the default estimate for a function call is 0.3333333. (For any other expression node type, it's 0.5, as before.) I had originally thought to make the default be 0.5 across the board, but changing a default estimate that's survived for twenty-three years seems like something not to do without a lot more testing than I care to put into it right now. Per a complaint from Jehan-Guillaume de Rorthais. Back-patch into 9.5, but not further, at least for the moment.
* Reduce number of bytes examined by convert_one_string_to_scalar().Tom Lane2015-08-23
| | | | | | | | | | | | | | | | | | | | | Previously, convert_one_string_to_scalar() would examine up to 20 bytes of the input string, producing a scalar conversion with theoretical precision far greater than is of any possible use considering the other limitations on the accuracy of the resulting selectivity estimate. (I think this choice might pre-date the caller-level logic that strips any common prefix of the strings; before that, there could have been value in scanning the strings far enough to use all the precision available in a double.) Aside from wasting cycles to little purpose, this choice meant that the "denom" variable could grow to as much as 256^21 = 3.74e50, which could overflow in some non-IEEE float arithmetics. While we don't really support any machines with non-IEEE arithmetic anymore, this still seems like quite an unnecessary platform dependency. Limit the scan to 12 bytes instead, thus limiting "denom" to 256^13 = 2.03e31, a value more likely to be computable everywhere. Per testing by Greg Stark, which showed overflow failures in our standard regression tests on VAX.
* Avoid some zero-divide hazards in the planner.Tom Lane2015-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | Although I think on all modern machines floating division by zero results in Infinity not SIGFPE, we still don't want infinities running around in the planner's costing estimates; too much risk of that leading to insane behavior. grouping_planner() failed to consider the possibility that final_rel might be known dummy and hence have zero rowcount. (I wonder if it would be better to set a rows estimate of 1 for dummy relations? But at least in the back branches, changing this convention seems like a bad idea, so I'll leave that for another day.) Make certain that get_variable_numdistinct() produces a nonzero result. The case that can be shown to be broken is with stadistinct < 0.0 and small ntuples; we did not prevent the result from rounding to zero. For good luck I applied clamp_row_est() to all the nonconstant return values. In ExecChooseHashTableSize(), Assert that we compute positive nbuckets and nbatch. I know of no reason to think this isn't the case, but it seems like a good safety check. Per reports from Piotr Stefaniak. Back-patch to all active branches.
* Revoke support for strxfrm() that write past the specified array length.Noah Misch2015-07-08
| | | | | | | This formalizes a decision implicit in commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23 and adds clean detection of affected systems. Vendor updates are available for each such known bug. Back-patch to 9.5, where the aforementioned commit first appeared.
* Support GROUPING SETS, CUBE and ROLLUP.Andres Freund2015-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This SQL standard functionality allows to aggregate data by different GROUP BY clauses at once. Each grouping set returns rows with columns grouped by in other sets set to NULL. This could previously be achieved by doing each grouping as a separate query, conjoined by UNION ALLs. Besides being considerably more concise, grouping sets will in many cases be faster, requiring only one scan over the underlying data. The current implementation of grouping sets only supports using sorting for input. Individual sets that share a sort order are computed in one pass. If there are sets that don't share a sort order, additional sort & aggregation steps are performed. These additional passes are sourced by the previous sort step; thus avoiding repeated scans of the source data. The code is structured in a way that adding support for purely using hash aggregation or a mix of hashing and sorting is possible. Sorting was chosen to be supported first, as it is the most generic method of implementation. Instead of, as in an earlier versions of the patch, representing the chain of sort and aggregation steps as full blown planner and executor nodes, all but the first sort are performed inside the aggregation node itself. This avoids the need to do some unusual gymnastics to handle having to return aggregated and non-aggregated tuples from underlying nodes, as well as having to shut down underlying nodes early to limit memory usage. The optimizer still builds Sort/Agg node to describe each phase, but they're not part of the plan tree, but instead additional data for the aggregation node. They're a convenient and preexisting way to describe aggregation and sorting. The first (and possibly only) sort step is still performed as a separate execution step. That retains similarity with existing group by plans, makes rescans fairly simple, avoids very deep plans (leading to slow explains) and easily allows to avoid the sorting step if the underlying data is sorted by other means. A somewhat ugly side of this patch is having to deal with a grammar ambiguity between the new CUBE keyword and the cube extension/functions named cube (and rollup). To avoid breaking existing deployments of the cube extension it has not been renamed, neither has cube been made a reserved keyword. Instead precedence hacking is used to make GROUP BY cube(..) refer to the CUBE grouping sets feature, and not the function cube(). To actually group by a function cube(), unlikely as that might be, the function name has to be quoted. Needs a catversion bump because stored rules may change. Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
* Add new OID alias type regnamespaceAndrew Dunstan2015-05-09
| | | | | | Catalog version bumped Kyotaro HORIGUCHI
* Add new OID alias type regroleAndrew Dunstan2015-05-09
| | | | | | | | | | | | | | The new type has the scope of whole the database cluster so it doesn't behave the same as the existing OID alias types which have database scope, concerning object dependency. To avoid confusion constants of the new type are prohibited from appearing where dependencies are made involving it. Also, add a note to the docs about possible MVCC violation and optimization issues, which are general over the all reg* types. Kyotaro Horiguchi
* Avoid unused-variable warning in non-assert builds.Tom Lane2015-03-04
| | | | Oversight in my commit b9896198cfbc1b0cd0c631d2af72ffe34bd4c7e5.
* Fix cost estimation for indexscans on expensive indexed expressions.Tom Lane2015-03-03
| | | | | | | | | | | | | | | | | | | | | | genericcostestimate() and friends used the cost of the entire indexqual expressions as the charge for initial evaluation of indexscan arguments. But of course the index column is not evaluated, only the other side of the qual expression, so this was a bad overestimate if the index column was an expensive expression. To fix, refactor the logic in this area so that there's a single routine charged with deconstructing index quals and figuring out what is the index column and what is the comparison expression. This is more or less free in the case of btree indexes, since btcostestimate() was doing equivalent deconstruction already. It probably adds a bit of new overhead in the cases of other index types, but not a lot. (In the case of GIN I think I saved something by getting rid of code that wasn't aware that the index column associations were already available "for free".) Per recent gripe from Jeff Janes. Arguably this is a bug fix, but I'm hesitant to back-patch because of the possibility of destabilizing plan choices that people may be happy with.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* BRIN: Block Range IndexesAlvaro Herrera2014-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* C comments: remove odd blank lines after #ifdef WIN32 linesBruce Momjian2014-03-13
|
* Items on GIN data pages are no longer always 6 bytes; update gincostestimate.Heikki Linnakangas2014-03-12
| | | | Also improve the comments a bit.
* Use SnapshotDirty rather than an active snapshot to probe index endpoints.Tom Lane2014-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are lots of uncommitted tuples at the end of the index range, get_actual_variable_range() ends up fetching each one and doing an MVCC visibility check on it, until it finally hits a visible tuple. This is bad enough in isolation, considering that we don't need an exact answer only an approximate one. But because the tuples are not yet committed, each visibility check does a TransactionIdIsInProgress() test, which involves scanning the ProcArray. When multiple sessions do this concurrently, the ensuing contention results in horrid performance loss. 20X overall throughput loss on not-too-complicated queries is easy to demonstrate in the back branches (though someone's made it noticeably less bad in HEAD). We can dodge the problem fairly effectively by using SnapshotDirty rather than a normal MVCC snapshot. This will cause the index probe to take uncommitted tuples as good, so that we incur only one tuple fetch and test even if there are many such tuples. The extent to which this degrades the estimate is debatable: it's possible the result is actually a more accurate prediction than before, if the endmost tuple has become committed by the time we actually execute the query being planned. In any case, it's not very likely that it makes the estimate a lot worse. SnapshotDirty will still reject tuples that are known committed dead, so we won't give bogus answers if an invalid outlier has been deleted but not yet vacuumed from the index. (Because btrees know how to mark such tuples dead in the index, we shouldn't have a big performance problem in the case that there are many of them at the end of the range.) This consideration motivates not using SnapshotAny, which was also considered as a fix. Note: the back branches were using SnapshotNow instead of an MVCC snapshot, but the problem and solution are the same. Per performance complaints from Bartlomiej Romanski, Josh Berkus, and others. Back-patch to 9.0, where the issue was introduced (by commit 40608e7f949fb7e4025c0ddd5be01939adc79eec).
* Do ScalarArrayOp estimation correctly when array is a stable expression.Tom Lane2014-02-21
| | | | | | | | | | | | | | Most estimation functions apply estimate_expression_value to see if they can reduce an expression to a constant; the key difference is that it allows evaluation of stable as well as immutable functions in hopes of ending up with a simple Const node. scalararraysel didn't get the memo though, and neither did gincost_opexpr/gincost_scalararrayopexpr. Fix that, and remove a now-unnecessary estimate_expression_value step in the subsidiary function scalararraysel_containment. Per complaint from Alexey Klyukin. Back-patch to 9.3. The problem goes back further, but I'm hesitant to change estimation behavior in long-stable release branches.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Fix failure with whole-row reference to a subquery.Tom Lane2013-11-11
| | | | | | Simple oversight in commit 1cb108efb0e60d87e4adec38e7636b6e8efbeb57 --- recursively examining a subquery output column is only sane if the original Var refers to a single output column. Found by Kevin Grittner.
* Don't use SnapshotNow in get_actual_variable_range.Robert Haas2013-07-25
| | | | | | | | Instead, use the active snapshot. Per Tom Lane, this function is most interested in knowing the range of tuples our scan will actually see. This is another step towards full removal of SnapshotNow.
* Fix booltestsel() for case where we have NULL stats but not MCV stats.Tom Lane2013-07-24
| | | | | | | | | | | | In a boolean column that contains mostly nulls, ANALYZE might not find enough non-null values to populate the most-common-values stats, but it would still create a pg_statistic entry with stanullfrac set. The logic in booltestsel() for this situation did the wrong thing for "col IS NOT TRUE" and "col IS NOT FALSE" tests, forgetting that null values would satisfy these tests (so that the true selectivity would be close to one, not close to zero). Per bug #8274. Fix by Andrew Gierth, some comment-smithing by me.
* pgindent run for release 9.3Bruce Momjian2013-05-29
| | | | | This is the first run of the Perl-based pgindent script. Also update pgindent instructions.