aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt/selfuncs.c
Commit message (Collapse)AuthorAge
...
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.
* Fix regex_fixed_prefix() to cope reasonably well with regex patterns of theTom Lane2007-01-03
| | | | | | | | | | form '^(foo)$'. Before, these could never be optimized into indexscans. The recent changes to make psql and pg_dump generate such patterns (for \d commands and -t and related switches, respectively) therefore represented a big performance hit for people with large pg_class catalogs, as seen in recent gripe from Erik Jones. While at it, be more paranoid about case-sensitivity checking in multibyte encodings, and fix some other corner cases in which a regex might be interpreted too liberally.
* Restructure operator classes to allow improved handling of cross-data-typeTom Lane2006-12-23
| | | | | | | | | | | | | | | | cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.
* Fix some planner bugs exposed by reports from Arjen van der Meijden. TheseTom Lane2006-12-15
| | | | | | | | | | | | | | | | | | | | | | | | are all in new-in-8.2 logic associated with indexability of ScalarArrayOpExpr (IN-clauses) or amortization of indexscan costs across repeated indexscans on the inside of a nestloop. In particular: Fix some logic errors in the estimation for multiple scans induced by a ScalarArrayOpExpr indexqual. Include a small cost component in bitmap index scans to reflect the costs of manipulating the bitmap itself; this is mainly to prevent a bitmap scan from appearing to have the same cost as a plain indexscan for fetching a single tuple. Also add a per-index-scan-startup CPU cost component; while prior releases were clearly too pessimistic about the cost of repeated indexscans, the original 8.2 coding allowed the cost of an indexscan to effectively go to zero if repeated often enough, which is overly optimistic. Pay some attention to index correlation when estimating costs for a nestloop inner indexscan: this is significant when the plan fetches multiple heap tuples per iteration, since high correlation means those tuples are probably on the same or adjacent heap pages.
* pgindent run for 8.2.Bruce Momjian2006-10-04
|
* Change patternsel (LIKE/regex selectivity estimation) so that if thereTom Lane2006-09-20
| | | | | | | | | is a large enough histogram, it will use the number of matches in the histogram to derive a selectivity estimate, rather than the admittedly pretty bogus heuristics involving examining the pattern contents. I set 'large enough' at 100, but perhaps we should change that later. Also apply the same technique in contrib/ltree's <@ and @> estimator. Per discussion with Stefan Kaltenbrunner and Matteo Beccati.
* Improve usage of effective_cache_size parameter by assuming that all theTom Lane2006-09-19
| | | | | | | | | | | tables in the query compete for cache space, not just the one we are currently costing an indexscan for. This seems more realistic, and it definitely will help in examples recently exhibited by Stefan Kaltenbrunner. To get the total size of all the tables involved, we must tweak the handling of 'append relations' a bit --- formerly we looked up information about the child tables on-the-fly during set_append_rel_pathlist, but it needs to be done before we start doing any cost estimation, so push it into the add_base_rels_to_query scan.
* Work around bug in strxfmt() but in MS VS2005.Bruce Momjian2006-07-26
| | | | William ZHANG
* Add a fudge factor to genericcostestimate() to prevent the planner fromTom Lane2006-07-24
| | | | | | | thinking that indexes of different sizes are equally attractive. Per gripe from Jim Nasby. (I remain unconvinced that there's such a problem in existing releases, but CVS HEAD definitely has got a problem because of its new count-only-leaf-pages approach to indexscan costing.)
* Remove 576 references of include files that were not needed.Bruce Momjian2006-07-14
|
* Fix oversight in planning for multiple indexscans driven byTom Lane2006-07-01
| | | | | | | | ScalarArrayOpExpr index quals: we were estimating the right total number of rows returned, but treating the index-access part of the cost as if a single scan were fetching that many consecutive index tuples. Actually we should treat it as a multiple indexscan, and if there are enough of 'em the Mackert-Lohman discount should kick in.
* Make the planner estimate costs for nestloop inner indexscans on the basisTom Lane2006-06-06
| | | | | | | | | | | | | | | | | | | | | that the Mackert-Lohmann formula applies across all the repetitions of the nestloop, not just each scan independently. We use the M-L formula to estimate the number of pages fetched from the index as well as from the table; that isn't what it was designed for, but it seems reasonably applicable anyway. This makes large numbers of repetitions look much cheaper than before, which accords with many reports we've received of overestimation of the cost of a nestloop. Also, change the index access cost model to charge random_page_cost per index leaf page touched, while explicitly not counting anything for access to metapage or upper tree pages. This may all need tweaking after we get some field experience, but in simple tests it seems to be giving saner results than before. The main thing is to get the infrastructure in place to let cost_index() and amcostestimate functions take repeated scans into account at all. Per my recent proposal. Note: this patch changes pg_proc.h, but I did not force initdb because the changes are basically cosmetic --- the system does not look into pg_proc to decide how to call an index amcostestimate function, and there's no way to call such a function from SQL at all.
* Add a GUC parameter seq_page_cost, and use that everywhere we formerlyTom Lane2006-06-05
| | | | | | | | assumed that a sequential page fetch has cost 1.0. This patch doesn't in itself change the system's behavior at all, but it opens the door to people adopting other units of measurement for EXPLAIN costs. Also, if we ever decide it's worth inventing per-tablespace access cost settings, this change provides a workable intellectual framework for that.
* GIN: Generalized Inverted iNdex.Teodor Sigaev2006-05-02
| | | | text[], int4[], Tsearch2 support for GIN.
* Avoid assuming that statistics for a parent relation reflect the properties ofTom Lane2006-05-02
| | | | | | | | | | | | | the union of its child relations as well. This might have been a good idea when it was originally coded, but it's a fatally bad idea when inheritance is being used for partitioning. It's better to have no stats at all than completely misleading stats. Per report from Mark Liberman. The bug arguably exists all the way back, but I've only patched HEAD and 8.1 because we weren't particularly trying to support partitioning before 8.1. Eventually we ought to look at deriving union statistics instead of just punting, but for now the drop kick looks good.
* Generalize mcv_selectivity() to support both VAR OP CONST and CONST OP VARTom Lane2006-04-27
| | | | | | cases. This was not needed in the existing uses within selfuncs.c, but if we're gonna export it for general use, the extra generality seems helpful. Motivated by looking at ltree example.
* If we're going to expose VariableStatData for contrib modules to use,Tom Lane2006-04-27
| | | | then we should export a reasonable set of the supporting routines too.
* Move ltree parentsel() selectivity function into /contrib/ltree.Bruce Momjian2006-04-26
|
* Enhanced containment selectivity function for /contrib/ltreeBruce Momjian2006-04-26
| | | | Matteo Beccati
* Eliminate some no-longer-needed workarounds for palloc's old behaviorTom Lane2006-04-20
| | | | | | | | of rejecting palloc(0). Also, tweak like_selectivity() to avoid assuming the presented pattern is nonempty; although that assumption is valid, it doesn't really help much, and the new coding is more correct anyway since it properly handles redundant wildcards. In combination these changes should eliminate a Coverity warning noted by Martijn.
* Update copyright for 2006. Update scripts.Bruce Momjian2006-03-05
|
* Allow row comparisons to be used as indexscan qualifications.Tom Lane2006-01-25
| | | | This completes the project to upgrade our handling of row comparisons.
* Add selectivity-calculation code for RowCompareExpr nodes. Simplistic,Tom Lane2006-01-14
| | | | but a lot better than nothing at all ...
* Improve patternsel() by applying the operator itself to each valueTom Lane2006-01-10
| | | | | | | | | listed in the column's most-common-values statistics entry. This gives us an exact selectivity result for the portion of the column population represented by the MCV list, which can be a big leg up in accuracy if that's a large fraction of the population. The heuristics involving pattern contents and prefix are applied only to the part of the population not included in the MCV list.
* Teach planner and executor to handle ScalarArrayOpExpr as an indexableTom Lane2005-11-25
| | | | | | | | | | | qualification when the underlying operator is indexable and useOr is true. That is, indexkey op ANY (ARRAY[...]) is effectively translated into an OR combination of one indexscan for each array element. This only works for bitmap index scans, of course, since regular indexscans no longer support OR'ing of scans. There are still some loose ends to clean up before changing 'x IN (list)' to translate as a ScalarArrayOpExpr; for instance predtest.c ought to be taught about it. But this gets the basic functionality in place.
* Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian2005-11-22
| | | | | | | | | comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
* R-tree is dead ... long live GiST.Tom Lane2005-11-07
|
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Document that get_attstatsslot/free_attstatsslot only need to be passedTom Lane2005-10-11
| | | | | | | | valid type information if they are asked to fetch the values part of a pg_statistic slot; these arguments are unneeded if fetching only the numbers part. Use this to save a catcache lookup in btcostestimate, which is looking like a bit of a hotspot in recent profiling. Not a big savings, but since it's essentially free, might as well do it.
* Clean up possibly-uninitialized-variable warnings reported by gcc 4.x.Tom Lane2005-09-24
|
* Suppress signed-vs-unsigned-char warnings.Tom Lane2005-09-24
|
* Remove unnecessary parentheses in assignments.Bruce Momjian2005-07-21
| | | | | Add spaces where needed. Reference time interval variables as tinterval.
* Add time/date macros for code clarity:Bruce Momjian2005-07-21
| | | | | | | #define DAYS_PER_YEAR 365.25 #define MONTHS_PER_YEAR 12 #define DAYS_PER_MONTH 30 #define HOURS_PER_DAY 24
* Add 'day' field to INTERVAL so 1 day interval can be distinguished fromBruce Momjian2005-07-20
| | | | | | | | | | | | | | | | 24 hours. This is very helpful for daylight savings time: select '2005-05-03 00:00:00 EST'::timestamp with time zone + '24 hours'; ?column? ---------------------- 2005-05-04 01:00:00-04 select '2005-05-03 00:00:00 EST'::timestamp with time zone + '1 day'; ?column? ---------------------- 2005-05-04 01:00:00-04 Michael Glaesemann
* Improve comments for AdjustIntervalForTypmod.Bruce Momjian2005-07-12
| | | | Blank line adjustments.
* Clean up the rather historically encumbered interface to now() andTom Lane2005-06-29
| | | | | | | | current time: provide a GetCurrentTimestamp() function that returns current time in the form of a TimestampTz, instead of separate time_t and microseconds fields. This is what all the callers really want anyway, and it eliminates low-level dependencies on AbsoluteTime, which is a deprecated datatype that will have to disappear eventually.
* Change the planner to allow indexscan qualification clauses to useTom Lane2005-06-13
| | | | | | | | | nonconsecutive columns of a multicolumn index, as per discussion around mid-May (pghackers thread "Best way to scan on-disk bitmaps"). This turns out to require only minimal changes in btree, and so far as I can see none at all in GiST. btcostestimate did need some work, but its original assumption that index selectivity == heap selectivity was quite bogus even before this.
* Separate predicate-testing code out of indxpath.c, making it a moduleTom Lane2005-06-10
| | | | | in its own right. As proposed by Simon Riggs, but with some editorializing of my own.
* Remove planner's private fields from Query struct, and put them intoTom Lane2005-06-05
| | | | | | | | a new PlannerInfo struct, which is passed around instead of the bare Query in all the planning code. This commit is essentially just a code-beautification exercise, but it does open the door to making larger changes to the planner data structures without having to muck with the widely-known Query struct.
* patternsel() was improperly stripping RelabelType from the derivedTom Lane2005-06-01
| | | | | | expressions it constructed, causing scalarineqsel to become confused if the underlying variable was of a domain type. Per report from Kevin Grittner.
* Remove support for OR'd indexscans internal to a single IndexScan planTom Lane2005-04-25
| | | | | | | | node, as this behavior is now better done as a bitmap OR indexscan. This allows considerable simplification in nodeIndexscan.c itself as well as several planner modules concerned with indexscan plan generation. Also we can improve the sharing of code between regular and bitmap indexscans, since they are now working with nigh-identical Plan nodes.
* Completion of project to use fixed OIDs for all system catalogs andTom Lane2005-04-14
| | | | | | | indexes. Replace all heap_openr and index_openr calls by heap_open and index_open. Remove runtime lookups of catalog OID numbers in various places. Remove relcache's support for looking up system catalogs by name. Bulky but mostly very boring patch ...
* Second try at making examine_variable and friends behave sanely inTom Lane2005-04-01
| | | | | | | cases with binary-compatible relabeling. My first try was implicitly assuming that all operators scalarineqsel is used for have binary- compatible datatypes on both sides ... which is very wrong of course. Per report from Michael Fuhr.
* First steps towards index scans with heap access decoupled from indexTom Lane2005-03-27
| | | | | | | | | | access: define new index access method functions 'amgetmulti' that can fetch multiple TIDs per call. (The functions exist but are totally untested as yet.) Since I was modifying pg_am anyway, remove the no-longer-needed 'rel' parameter from amcostestimate functions, and also remove the vestigial amowner column that was creating useless work for Alvaro's shared-object-dependencies project. Initdb forced due to changes in pg_am.
* Fix a pair of related issues with estimation of inequalities that involveTom Lane2005-03-26
| | | | | | | | binary-compatible relabeling of one or both operands. examine_variable should avoid stripping RelabelType from non-variable expressions, so that they will continue to have the correct type; and convert_to_scalar should just use that type and ignore the other input type. This isn't perfect but it beats failing entirely. Per example from Michael Fuhr.
* Rename canonical encodings, per Peter:Bruce Momjian2005-03-07
| | | | | | | | | UNICODE => UTF8 ALT => WIN866 WIN => WIN1251 TCVN => WIN1258 The old codes continue to work.
* Revise hash join code so that we can increase the number of batchesTom Lane2005-03-06
| | | | | | | on-the-fly, and thereby avoid blowing out memory when the planner has underestimated the hash table size. Hash join will now obey the work_mem limit with some faithfulness. Per my recent proposal (hash aggregate part isn't done yet though).
* Adjust estimate_num_groups() to not clamp per-relation group countTom Lane2005-02-01
| | | | | | | estimate to less than the number of values estimated for any one grouping Var, as suggested by Manfred. This is intuitively right, and what's more it puts the plan choices in the subselect regression test back the way they were before ...
* When dealing with multiple grouping columns coming from the same table,Tom Lane2005-01-28
| | | | | | | | | clamp the estimated number of groups to table row count over 10, instead of table row count; this reflects a heuristic that people probably won't group over a near-unique set of columns, and the knowledge that we don't currently have any way to estimate the correlation of the columns better than guessing. This change creates a trivial plan change in one of the regression tests.
* Tag appropriate files for rc3PostgreSQL Daemon2004-12-31
| | | | | | | | Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...