author | Tom Lane <tgl@sss.pgh.pa.us> | 2018-12-14 12:52:49 -0500 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2018-12-14 12:52:49 -0500 |
commit | 5e09280057a4c3f5db297348ea3e044c9c5f4ef8 (patch) | |
tree | a153ceede13d3b807d48d420896b6763d44c9086 /src/include/commands | |
parent | 8fb569e978af3995f0dd6b0033758ec571aab0c1 (diff) | |
Make pg_statistic and related code account more honestly for collations.

When we first put in collations support, we basically punted on teaching
pg_statistic, ANALYZE, and the planner selectivity functions about that.
They've just used DEFAULT_COLLATION_OID independently of the actual
collation of the data. It's time to improve that, so:

* Add columns to pg_statistic that record the specific collation associated
with each statistics slot.

* Teach ANALYZE to use the column's actual collation when comparing values
for statistical purposes, and record this in the appropriate slot. (Note
that type-specific typanalyze functions are now expected to fill
stats->stacoll with the appropriate collation, too.)

* Teach assorted selectivity functions to use the actual collation of
the stats they are looking at, instead of just assuming it's
DEFAULT_COLLATION_OID.

This should give noticeably better results in selectivity estimates for
columns with nondefault collations, at least for query clauses that use
that same collation (which would be the default behavior in most cases).
It's still true that comparisons with explicit COLLATE clauses different
from the stored data's collation won't be well-estimated, but that's no
worse than before. Also, this patch does make the first step towards
doing better with that, which is that it's now theoretically possible to
collect stats for a collation other than the column's own collation.

Patch by me; thanks to Peter Eisentraut for review.

Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
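
As an illustration of the third bullet in the message above, here is a minimal, self-contained sketch of why a selectivity estimate should compare values under the collation stored with the statistics slot rather than a hard-coded default. Nothing here is PostgreSQL code: `MockStatsSlot`, `mock_compare`, `mcv_eq_selectivity`, and the two `MOCK_COLLATION_*` values are hypothetical stand-ins, and "collation" is reduced to a choice between byte-wise and ASCII case-insensitive comparison.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid */

/* Hypothetical collation identifiers, invented for this sketch only. */
#define MOCK_COLLATION_EXACT	((Oid) 1)	/* byte-wise comparison */
#define MOCK_COLLATION_NOCASE	((Oid) 2)	/* ASCII case-insensitive */

/* Hypothetical, simplified view of one most-common-values slot. */
typedef struct MockStatsSlot
{
	Oid			stacoll;		/* collation the stats were gathered with */
	size_t		nvalues;		/* number of most-common values */
	const char **values;		/* the values themselves */
	const double *freqs;		/* frequency of each value */
} MockStatsSlot;

/* Compare two strings under the given mock collation: returns -1, 0, or 1. */
int
mock_compare(Oid collation, const char *a, const char *b)
{
	int			r;

	if (collation == MOCK_COLLATION_NOCASE)
	{
		for (;; a++, b++)
		{
			int			ca = tolower((unsigned char) *a);
			int			cb = tolower((unsigned char) *b);

			if (ca != cb)
				return (ca < cb) ? -1 : 1;
			if (ca == '\0')
				return 0;
		}
	}
	r = strcmp(a, b);
	return (r > 0) - (r < 0);
}

/*
 * Estimate the selectivity of "column = constant" from the slot, comparing
 * under the slot's own collation instead of an assumed default.
 */
double
mcv_eq_selectivity(const MockStatsSlot *slot, const char *constant)
{
	assert(slot != NULL);
	for (size_t i = 0; i < slot->nvalues; i++)
	{
		if (mock_compare(slot->stacoll, slot->values[i], constant) == 0)
			return slot->freqs[i];	/* entries are distinct: first match wins */
	}
	return 0.0;					/* constant is not among the common values */
}
```

With a slot holding `"Alice"` at frequency 0.5 under the case-insensitive mock collation, looking up `"alice"` finds the entry. The same lookup under the byte-wise mock collation misses it, which is the kind of mismatch the commit aims to avoid.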
Diffstat (limited to 'src/include/commands')
-rw-r--r-- | src/include/commands/vacuum.h | 11 |
1 files changed, 8 insertions, 3 deletions
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2f4303e40d8..dfff23ac55b 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -52,9 +52,11 @@
  * careful to allocate any pointed-to data in anl_context, which will NOT
  * be CurrentMemoryContext when compute_stats is called.
  *
- * Note: for the moment, all comparisons done for statistical purposes
- * should use the database's default collation (DEFAULT_COLLATION_OID).
- * This might change in some future release.
+ * Note: all comparisons done for statistical purposes should use the
+ * underlying column's collation (attcollation), except in situations
+ * where a noncollatable container type contains a collatable type;
+ * in that case use the type's default collation. Be sure to record
+ * the appropriate collation in stacoll.
  *----------
  */
 typedef struct VacAttrStats *VacAttrStatsP;
@@ -78,11 +80,13 @@ typedef struct VacAttrStats
      * because some index opclasses store a different type than the underlying
      * column/expression. Instead use attrtypid, attrtypmod, and attrtype for
      * information about the datatype being fed to the typanalyze function.
+     * Likewise, use attrcollid not attr->attcollation.
      */
     Form_pg_attribute attr; /* copy of pg_attribute row for column */
     Oid attrtypid; /* type of data being analyzed */
     int32 attrtypmod; /* typmod of data being analyzed */
     Form_pg_type attrtype; /* copy of pg_type row for attrtypid */
+    Oid attrcollid; /* collation of data being analyzed */
     MemoryContext anl_context; /* where to save long-lived data */
 
     /*
@@ -103,6 +107,7 @@ typedef struct VacAttrStats
     float4 stadistinct; /* # distinct values */
     int16 stakind[STATISTIC_NUM_SLOTS];
     Oid staop[STATISTIC_NUM_SLOTS];
+    Oid stacoll[STATISTIC_NUM_SLOTS];
     int numnumbers[STATISTIC_NUM_SLOTS];
     float4 *stanumbers[STATISTIC_NUM_SLOTS];
     int numvalues[STATISTIC_NUM_SLOTS];
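
The revised header comment asks typanalyze code to do two things: choose the right collation for comparisons, and record it in `stacoll`. The sketch below shows one way that could look, using a cut-down mock of the struct so that it compiles without the PostgreSQL headers. `MockVacAttrStats`, `choose_stats_collation`, `record_slot`, and the OID values used with them are assumptions made for illustration. Only the field names `attrcollid`, `stakind`, `staop`, and `stacoll` and the constant `STATISTIC_NUM_SLOTS` come from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL definitions, so the sketch compiles on its own. */
typedef uint32_t Oid;
#define STATISTIC_NUM_SLOTS 5
#define STATISTIC_KIND_MCV	1

/* Hypothetical mock holding only the VacAttrStats fields used here. */
typedef struct MockVacAttrStats
{
	Oid			attrcollid;		/* collation of data being analyzed */
	int16_t		stakind[STATISTIC_NUM_SLOTS];
	Oid			staop[STATISTIC_NUM_SLOTS];
	Oid			stacoll[STATISTIC_NUM_SLOTS];
} MockVacAttrStats;

/*
 * Hypothetical helper following the rule in the header comment: normally
 * use the collation of the data being analyzed (attrcollid), but when a
 * noncollatable container type holds a collatable type, use that type's
 * default collation, which the caller supplies.
 */
Oid
choose_stats_collation(const MockVacAttrStats *stats,
					   bool noncollatable_container,
					   Oid contained_type_default_collation)
{
	if (noncollatable_container)
		return contained_type_default_collation;
	return stats->attrcollid;
}

/*
 * Hypothetical helper: fill one statistics slot, recording the collation
 * that was actually used for the comparisons alongside kind and operator.
 */
void
record_slot(MockVacAttrStats *stats, int slot, int16_t kind,
			Oid comparison_operator, Oid collation)
{
	assert(slot >= 0 && slot < STATISTIC_NUM_SLOTS);
	stats->stakind[slot] = kind;
	stats->staop[slot] = comparison_operator;
	stats->stacoll[slot] = collation;
}
```

A caller would compute the collation once, use it for every comparison while gathering the statistics, and then pass the same value to `record_slot`, so that the stored `stacoll` matches how the values were actually compared.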