aboutsummaryrefslogtreecommitdiff
path: root/src/include/statistics/statistics.h
Commit message (Collapse)AuthorAge
* Initial pgindent and pgperltidy run for v13.Tom Lane2020-05-14
| | | | | | | | | | | Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Fix choose_best_statistics to check clauses individuallyTomas Vondra2019-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When picking the best extended statistics object for a list of clauses, it's not enough to look at attnums extracted from the clause list as a whole. Consider for example this query with OR clauses: SELECT * FROM t WHERE (t.a = 1) OR (t.b = 1) OR (t.c = 1) with a statistics defined on columns (a,b). Relying on attnums extracted from the whole OR clause, we'd consider the statistics usable. That does not work, as we see the conditions as a single OR-clause, referencing an attribute not covered by the statistic, leading to empty list of clauses to be estimated using the statistics and an assert failure. This changes choose_best_statistics to check which clauses are actually covered, and only using attributes from the fully covered ones. For the previous example this means the statistics object will not be considered as compatible with the OR-clause. Backpatch to 12, where MCVs were introduced. The issue does not affect older versions because functional dependencies don't handle OR clauses. Author: Tomas Vondra Reviewed-by: Dean Rasheed Reported-By: Manuel Rigger Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com Backpatch-through: 12
* Allow setting statistics target for extended statisticsTomas Vondra2019-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building statistics, we need to decide how many rows to sample and how accurate the resulting statistics should be. Until now, it was not possible to explicitly define statistics target for extended statistics objects, the value was always computed from the per-attribute targets with a fallback to the system-wide default statistics target. That's a bit inconvenient, as it ties together the statistics target set for per-column and extended statistics. In some cases it may be useful to require larger sample / higher accuracy for extended statics (or the other way around), but with this approach that's not possible. So this commit introduces a new command, allowing to specify statistics target for individual extended statistics objects, overriding the value derived from per-attribute targets (and the system default). ALTER STATISTICS stat_name SET STATISTICS target_value; When determining statistics target for an extended statistics object we first look at this explicitly set value. When this value is -1, we fall back to the old formula, looking at the per-attribute targets first and then the system default. This means the behavior is backwards compatible with older PostgreSQL releases. Author: Tomas Vondra Discussion: https://postgr.es/m/20190618213357.vli3i23vpkset2xd@development Reviewed-by: Kirk Jamison, Dean Rasheed
* Fix inconsistencies and typos in the treeMichael Paquier2019-07-29
| | | | | | | | This is numbered take 8, and addresses again a set of issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/b137b5eb-9c95-9c2f-586e-38aba7d59788@gmail.com
* Fix typos.Amit Kapila2019-05-26
| | | | | | | Reported-by: Alexander Lakhin Author: Alexander Lakhin Reviewed-by: Amit Kapila and Tom Lane Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Initial pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Fix mvdistinct and dependencies size calculationsTomas Vondra2019-04-21
| | | | | | | | | | | | | | | | | | | | | | The formulas used to calculate size while (de)serializing mvndistinct and functional dependencies were based on offset() of the structs. But that is incorrect, because the structures are not copied directly, we we copy the individual fields directly. At the moment this works fine, because there is no alignment padding on any platform we support. But it might break if we ever added some fields into any of the structs, for example. It's also confusing. Fixed by reworking the macros to directly sum sizes of serialized fields. The macros are now useful only for serialiation, so there is no point in keeping them in the public header file. So make them private by moving them to the .c files. Also adds a couple more asserts to check the serialization, and fixes an incorrect allocation of MVDependency instead of (MVDependency *). Reported-By: Tom Lane Discussion: https://postgr.es/m/29785.1555365602@sss.pgh.pa.us
* Fix deserialization of pg_mcv_list valuesTomas Vondra2019-03-28
| | | | | | | | | | | | | | | | | | There were multiple issues in deserialization of pg_mcv_list values. Firstly, the data is loaded from syscache, but the deserialization was performed after ReleaseSysCache(), at which point the data might have already disappeared. Fixed by moving the calls in statext_mcv_load, and using the same NULL-handling code as existing stats. Secondly, the deserialized representation used pointers into the serialized representation. But that is also unsafe, because the data may disappear at any time. Fixed by reworking and simplifying the deserialization code to always copy all the data. And thirdly, when deserializing values for types passed by value, the code simply did memcpy(d,s,typlen) which however does not work on bigendian machines. Fixed by using fetch_att/store_att_byval.
* Minor improvements for the multivariate MCV listsTomas Vondra2019-03-27
| | | | | | | | | | | | | | The MCV build should always call get_mincount_for_mcv_list(), as the there is no other logic to decide whether the MCV list represents all the data. So just remove the (ngroups > nitems) condition. Also, when building MCV lists, the number of items was limited by the statistics target (i.e. up to 10000). But when deserializing the MCV list, a different value (8192) was used to check the input, causing an error. Simply ensure that the same value is used in both places. This should have been included in 7300a69950, but I forgot to include it in that commit.
* Add support for multivariate MCV listsTomas Vondra2019-03-27
| | | | | | | | | | | | | | | | Introduce a third extended statistic type, supported by the CREATE STATISTICS command - MCV lists, a generalization of the statistic already built and used for individual columns. Compared to the already supported types (n-distinct coefficients and functional dependencies), MCV lists are more complex, include column values and allow estimation of much wider range of common clauses (equality and inequality conditions, IS NULL, IS NOT NULL etc.). Similarly to the other types, a new pseudo-type (pg_mcv_list) is used. Author: Tomas Vondra Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
* Rename nodes/relation.h to nodes/pathnodes.h.Tom Lane2019-01-29
| | | | | | | | | | | | | The old name of this file was never a very good indication of what it was for. Now that there's also access/relation.h, we have a potential confusion hazard as well, so let's rename it to something more apropos. Per discussion, "pathnodes.h" is reasonable, since a good fraction of the file is Path node definitions. While at it, tweak a couple of other headers that were gratuitously importing relation.h into modules that don't need it. Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Phase 2 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Rename function for consistencyAlvaro Herrera2017-06-15
| | | | | | | Avoid using prefix "staext" when everything else uses "statext". Author: Kyotaro HORIGUCHI Discussion: https://postgr.es/m/20170615.140041.165731947.horiguchi.kyotaro@lab.ntt.co.jp
* Collect and use multi-column dependency statsSimon Riggs2017-04-05
| | | | | | | | | | | | | | | | | | Follow on patch in the multi-variate statistics patch series. CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t; ANALYZE; will collect dependency stats on (a, b) and then use the measured dependency in subsequent query planning. Commit 7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b added CREATE STATISTICS with n-distinct coefficients. These are now specified using the mutually exclusive option WITH (ndistinct). Author: Tomas Vondra, David Rowley Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas and many other comments and contributions Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
* Fix uninitialized memory propagation mistakesAlvaro Herrera2017-03-27
| | | | | | | | | | | | | Valgrind complains that some uninitialized bytes are being passed around by the extended statistics code since commit 7b504eb282ca2f, as reported by Andres Freund. Silence it. Tomas Vondra submitted a patch which he verified to fix the complaints in his machine; however I messed with it a bit before pushing, so any remaining problems are likely my (Álvaro's) fault. Author: Tomas Vondra Discussion: https://postgr.es/m/20170325211031.4xxoptigqxm2emn2@alap3.anarazel.de
* Implement multivariate n-distinct coefficientsAlvaro Herrera2017-03-24
Add support for explicitly declared statistic objects (CREATE STATISTICS), allowing collection of statistics on more complex combinations that individual table columns. Companion commands DROP STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are added too. All this DDL has been designed so that more statistic types can be added later on, such as multivariate most-common-values and multivariate histograms between columns of a single table, leaving room for permitting columns on multiple tables, too, as well as expressions. This commit only adds support for collection of n-distinct coefficient on user-specified sets of columns in a single table. This is useful to estimate number of distinct groups in GROUP BY and DISTINCT clauses; estimation errors there can cause over-allocation of memory in hashed aggregates, for instance, so it's a worthwhile problem to solve. A new special pseudo-type pg_ndistinct is used. (num-distinct estimation was deemed sufficiently useful by itself that this is worthwhile even if no further statistic types are added immediately; so much so that another version of essentially the same functionality was submitted by Kyotaro Horiguchi: https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp though this commit does not use that code.) Author: Tomas Vondra. Some code rework by Álvaro. Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes, Ideriha Takeshi Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql