path: root/src/backend
...
* Improve pruning of a default partition (Alvaro Herrera, 2019-08-04)

  When querying a partitioned table containing a default partition, we were
  wrongly deciding to include it in the scan too early in the process,
  failing to exclude it in some cases. If we reinterpret the
  PruneStepResult.scan_default flag slightly, we can do a better job at
  detecting that it can be excluded. The change is that we avoid setting the
  flag for that pruning step unless the step absolutely requires the default
  partition to be scanned (in contrast with the previous arrangement, which
  was to set it unless the step was able to prune it). So
  get_matching_partitions() must explicitly check the partition that each
  returned bound value corresponds to in order to determine whether the
  default one needs to be included, rather than relying on the flag from the
  final step result.

  Author: Yuzuko Hosoya <hosoya.yuzuko@lab.ntt.co.jp>
  Reviewed-by: Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
  Discussion: https://postgr.es/m/00e601d4ca86$932b8bc0$b982a340$@lab.ntt.co.jp
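  A hedged sketch of the new decision rule in get_matching_partitions()
  (variable names are illustrative, not the exact code in partprune.c;
  assume final_result is the last PruneStepResult and negative entries in
  boundinfo->indexes[] denote ranges falling through to the default
  partition):

      bool        scan_default = final_result->scan_default;
      Bitmapset  *partindexes = NULL;
      int         i = -1;

      while ((i = bms_next_member(final_result->bound_offsets, i)) >= 0)
      {
          int     partidx = boundinfo->indexes[i];

          if (partidx < 0)
              scan_default = true;    /* this bound falls to the default */
          else
              partindexes = bms_add_member(partindexes, partidx);
      }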
* Fix representation of hash keys in Hash/HashJoin nodes. (Andres Freund, 2019-08-02)

  In 5f32b29c1819 I changed the creation of HashState.hashkeys to actually
  use HashState as the parent (instead of HashJoinState, which was
  incorrect, as they were executed below HashState), to fix the problem of
  hashkeys expressions otherwise relying on slot types appropriate for
  HashJoinState, rather than HashState as would be correct. That reliance
  was only introduced in 12, which is why it previously worked to use
  HashJoinState as the parent (although I'd be unsurprised if there were
  problematic cases).

  Unfortunately that's not a sufficient solution, because before this
  commit, the to-be-hashed expressions referenced inner/outer as
  appropriate for the HashJoin, not Hash. That didn't have obvious bad
  consequences, because the slots containing the tuples were put into
  ecxt_innertuple when hashing a tuple for HashState (even though Hash
  doesn't have an inner plan).

  There are less common cases where this can cause visible problems,
  however (rather than just confusion when inspecting such executor trees).
  E.g. "ERROR: bogus varno: 65000", when explaining queries containing a
  HashJoin where the subsidiary Hash node's hash keys reference a subplan.
  While normally hashkeys aren't displayed by EXPLAIN, if one of those
  expressions references a subplan, that subplan may be printed as part of
  the Hash node - which then failed because an inner plan was referenced,
  and Hash doesn't have that. It seems quite possible that there are other
  broken cases, too.

  Fix the problem by properly splitting the expression for the HashJoin and
  Hash nodes at plan time, and have them reference the proper subsidiary
  node. While other workarounds are possible, fixing this correctly seems
  easy enough. It was a pretty ugly hack to have ExecInitHashJoin put the
  expression into the already initialized HashState in the first place.

  I decided to not just split inner/outer hashkeys inside make_hashjoin(),
  but also to separate out hashoperators and hashcollations at plan time.
  Otherwise we would have ended up having two very similar loops, one at
  plan time and the other during executor startup. The work seems to more
  appropriately belong to plan time, anyway.

  Reported-By: Nikita Glukhov, Alexander Korotkov
  Author: Andres Freund
  Reviewed-By: Tom Lane, in an earlier version
  Discussion: https://postgr.es/m/CAPpHfdvGVegF_TKKRiBrSmatJL2dR9uwFCuR+teQ_8tEXU8mxg@mail.gmail.com
  Backpatch: 12-
* Remove superfluous newlines in function prototypes. (Andres Freund, 2019-07-31)

  These were introduced by pgindent due to a fix for broken indentation
  (cf. 8255c7a5eeba8). Previously the mis-indentation of function
  prototypes was creatively used to reduce indentation in a few places.

  As that formatting only exists in master and REL_12_STABLE, it seems
  better to fix it in both, rather than having some odd indentation in v12
  that somebody might copy for future patches or such.

  Author: Andres Freund
  Discussion: https://postgr.es/m/20190728013754.jwcbe5nfyt3533vx@alap3.anarazel.de
  Backpatch: 12-
* Allow table AMs to use rd_amcache, too. (Heikki Linnakangas, 2019-07-30)

  The rd_amcache field allows an index AM to cache arbitrary information in
  a relcache entry. This commit moves the cleanup of rd_amcache so that it
  can also be used by table AMs. Nothing takes advantage of that yet, but
  I'm sure it'll come in handy for anyone writing new table AMs.

  Backpatch to v12, where the table AM interface was introduced.

  Reviewed-by: Julien Rouhaud
* Don't build extended statistics on inheritance trees (Tomas Vondra, 2019-07-30)

  When performing ANALYZE on inheritance trees, we collect two samples for
  each relation - one for the relation alone, and one for the inheritance
  subtree (the relation and its child relations). And then we build
  statistics on each sample, so for each relation we get two sets of
  statistics.

  For regular (per-column) statistics this works fine, because the catalog
  includes a flag differentiating statistics built from those two samples.
  But we don't have such a flag in the extended statistics catalogs, and we
  ended up updating the same row twice, triggering this error:

      ERROR: tuple already updated by self

  The simplest solution is to disable extended statistics on inheritance
  trees, which is what this commit is doing. In the future we may need to
  do something similar to per-column statistics, but that requires adding a
  flag to the catalog - and that's not backpatchable. Moreover, the current
  selectivity estimation code only works with individual relations, so
  building statistics on inheritance trees would be pointless anyway.

  Author: Tomas Vondra
  Backpatch-to: 10-
  Discussion: https://postgr.es/m/20190618231233.GA27470@telsasoft.com
  Reported-by: Justin Pryzby
* Fix busted logic for parallel lock grouping in TopoSort(). (Tom Lane, 2019-07-29)

  A "break" statement erroneously left behind by commit a1c1af2a1 caused
  TopoSort to do the wrong thing if a lock's wait list contained multiple
  members of the same locking group.

  Because parallel workers don't normally need any locks not already taken
  by their leader, this is very hard --- maybe impossible --- to hit in
  production. Still, if it did happen, the queries involved in an
  otherwise-resolvable deadlock would block until canceled.

  In addition to removing the bogus "break", add an Assert showing that the
  conflicting uses of the beforeConstraints[] array (for both counts and
  flags) don't overlap, and add some commentary explaining why not, because
  it's not obvious without explanation, IMHO.

  Original report and patch from Rui Hai Jiang; additional assert and
  commentary by me.

  Back-patch to 9.6 where the bug came in.

  Discussion: https://postgr.es/m/CAEri+mLd3bpHLyW+a9pSe1y=aEkeuJpwBSwvo-+m4n7-ceRmXw@mail.gmail.com
* Fix handling of expressions and predicates in REINDEX CONCURRENTLY (Michael Paquier, 2019-07-29)

  When copying the definition of an index rebuilt concurrently for the new
  entry, the index information was taken directly from the old index using
  the relation cache. In this case, predicates and expressions have some
  post-processing to prepare things for the planner, which loses some
  information, including the collations added in any of them.

  This inconsistency can cause issues when attempting, for example, a table
  rewrite, and it makes the new indexes rebuilt concurrently inconsistent
  with the old entries.

  In order to fix the problem, fetch expressions and predicates directly
  from the catalog of the old entry, and fill in IndexInfo for the new
  index with that. This makes the process more consistent with
  DefineIndex(), and the code is refactored with the addition of a routine
  to create an IndexInfo node.

  Reported-by: Manuel Rigger
  Author: Michael Paquier
  Discussion: https://postgr.es/m/CA+u7OA5Hp0ra235F3czPom_FyAd-3+XwSJmX95r1+sRPOJc9VQ@mail.gmail.com
  Backpatch-through: 12
* Avoid macro clash with LLVM 9. (Thomas Munro, 2019-07-29)

  Early previews of LLVM 9 reveal that our Min() macro causes compiler
  errors in LLVM headers reached by the #include directives in
  llvmjit_inline.cpp. Let's just undefine it. Per buildfarm animal seawasp.
  Back-patch to 11.

  Reviewed-by: Fabien Coelho, Tom Lane
  Discussion: https://postgr.es/m/20190606173216.GA6306%40alvherre.pgsql
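  A minimal self-contained illustration of the clash (not the actual
  llvmjit_inline.cpp code): a function-like macro named Min mangles any
  later header that uses "Min" as an ordinary identifier, and #undef is the
  cheapest cure.

      #include <stdio.h>

      #define Min(x, y) ((x) < (y) ? (x) : (y))  /* as in PostgreSQL's c.h */

      /* Any subsequent header using "Min" as a plain identifier would now
       * be rewritten by the preprocessor, which is what happened with the
       * LLVM 9 headers.  So drop the macro before including them: */
      #undef Min

      int
      main(void)
      {
          int Min = 3;        /* compiles only because the macro is gone */

          printf("%d\n", Min);
          return 0;
      }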
* Tweak our special-case logic for the IANA "Factory" timezone. (Tom Lane, 2019-07-26)

  pg_timezone_names() tries to avoid showing the "Factory" zone in the
  view, mainly because that has traditionally had a very long
  "abbreviation" such as "Local time zone must be set--see zic manual
  page", so that showing it messes up psql's formatting of the whole view.

  Since tzdb version 2016g, IANA instead uses the abbreviation "-00", which
  is sane enough that there's no reason to discriminate against it. On the
  other hand, it emerges that FreeBSD and possibly other packagers are so
  wedded to backwards compatibility that they hack the IANA data to keep
  the old spelling --- and not just that old spelling, but even older
  spellings that IANA used back in the stone age. This caused the filter
  logic to fail to suppress "Factory" at all on such platforms, though the
  formatting problem is definitely real in that case.

  To solve both problems, get rid of the hard-wired assumption about
  exactly what Factory's abbreviation is, and instead reject abbreviations
  exceeding 31 characters. This will allow Factory to appear in the view if
  and only if it's using the modern abbreviation.

  In passing, simplify the code we add to zic.c to support "zic -P" to
  remove its now-obsolete hacks to not print the Factory zone's
  abbreviation. Unlike pg_timezone_names(), there's no reason for that code
  to support old/nonstandard timezone data.

  Since we generally prefer to keep timezone-related behavior the same in
  all branches, and since this is arguably a bug fix, back-patch to all
  supported branches.

  Discussion: https://postgr.es/m/3961.1564086915@sss.pgh.pa.us
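  A hedged sketch of the new filter's shape (the constant name here is
  hypothetical; the commit simply rejects abbreviations longer than 31
  characters):

      #define MAX_SANE_ABBREV_LEN 31      /* hypothetical name */

      if (strlen(abbrev) > MAX_SANE_ABBREV_LEN)
          continue;   /* hide zones whose "abbreviation" is an essay,
                       * regardless of its exact platform-hacked wording */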
* Fix loss of fractional digits for large values in cash_numeric(). (Tom Lane, 2019-07-26)

  Money values exceeding about 18 digits (depending on lc_monetary) could
  be inaccurately converted to numeric, due to select_div_scale() deciding
  it didn't need to compute any fractional digits. Force its hand by
  setting the dscale of one division input to equal the number of
  fractional digits we need.

  In passing, rearrange the logic to not do useless work in locales where
  money values are considered integral.

  Per bug #15925 from Slawomir Chodnicki. Back-patch to all supported
  branches.

  Discussion: https://postgr.es/m/15925-da9953e2674bb5c8@postgresql.org
* Fix slot type handling for Agg nodes performing internal sorts. (Andres Freund, 2019-07-25)

  Since 15d8f8312 we assert that - and since 7ef04e4d2cb2, 4da597edf1 rely
  on - the slot type for an expression's ecxt_{outer,inner,scan}tuple not
  changing, unless explicitly flagged as such. That allows us to either
  skip deforming (for a virtual tuple slot) or optimize the code for JIT
  accelerated deforming appropriately (for other known slot types).

  This assumption was sometimes violated for grouping sets, when nodeAgg.c
  internally uses tuplesorts, and the child node doesn't return a
  TTSOpsMinimalTuple type slot. Detect that case, and flag that the outer
  slot might not be "fixed".

  It's probably worthwhile to optimize this further in the future, and more
  granularly determine whether the slot is fixed. As we already instantiate
  per-phase transition and equal expressions, we could cheaply set the slot
  type appropriately for each phase. But that's a separate change from this
  bugfix.

  This commit does include a very minor optimization by avoiding the
  creation of a slot for handling tuplesorts, if no such sorts are
  performed. Previously we created that slot unnecessarily in the common
  case of computing all grouping sets via hashing. The code looked too
  confusing without that, as the conditions for needing a sort slot and
  flagging that the slot type isn't fixed are the same.

  Reported-By: Ashutosh Sharma
  Author: Andres Freund
  Discussion: https://postgr.es/m/CAE9k0PmNaMD2oHTEAhRyxnxpaDaYkuBYkLa1dpOpn=RS0iS2AQ@mail.gmail.com
  Backpatch: 12-, where the bug was introduced in 15d8f8312
* Fix failures to ignore \r when reading Windows-style newlines. (Tom Lane, 2019-07-25)

  libpq failed to ignore Windows-style newlines in connection service
  files. This normally wasn't a problem on Windows itself, because fgets()
  would convert \r\n to just \n. But if libpq were running inside a program
  that changes the default fopen mode to binary, it would see the \r's and
  think they were data.

  In any case, it's project policy to ignore \r in text files
  unconditionally, because people sometimes try to use files with DOS-style
  newlines on Unix machines, where the C library won't hide that from us.
  Hence, adjust parseServiceFile() to ignore \r as well as \n at the end of
  the line. In HEAD, go a little further and make it ignore all trailing
  whitespace, to match what it's always done with leading whitespace.

  In HEAD, also run around and fix up everyplace where we have
  newline-chomping code to make all those places look consistent and
  uniformly drop \r. It is not clear whether any of those changes are
  fixing live bugs. Most of the non-cosmetic changes are in places that are
  reading popen output, and the jury is still out as to whether popen on
  Windows can return \r\n. (The Windows-specific code in pipe_read_line
  seems to think so, but our lack of support for this elsewhere suggests
  maybe it's not a problem in practice.) Hence, I desisted from applying
  those changes to back branches, except in run_ssl_passphrase_command()
  which is new enough and little-tested enough that we'd probably not have
  heard about any problems there.

  Tom Lane and Michael Paquier, per bug #15827 from Jorge Gustavo Rocha.
  Back-patch the parseServiceFile() change to all supported branches, and
  the run_ssl_passphrase_command() change to v11 where that was added.

  Discussion: https://postgr.es/m/15827-e6ba53a3a7ed543c@postgresql.org
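  A minimal self-contained sketch of the chomping rule the commit
  standardizes (names are illustrative, not libpq's actual code): strip
  trailing \n and \r from each line read from a text file.

      #include <stdio.h>
      #include <string.h>

      static void
      chomp(char *line)
      {
          size_t len = strlen(line);

          /* drop trailing \n and \r so DOS-style files behave on Unix */
          while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
              line[--len] = '\0';
      }

      int
      main(void)
      {
          char buf[] = "host=localhost\r\n";

          chomp(buf);
          printf("[%s]\n", buf);      /* prints [host=localhost] */
          return 0;
      }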
* Fix system column accesses in ON CONFLICT ... RETURNING. (Andres Freund, 2019-07-24)

  After 277cb789836 ON CONFLICT ... SET ... RETURNING failed with

      ERROR: virtual tuple table slot does not have system attributes

  when taking the update path, as the slot used to insert into the table
  (and then process RETURNING) was defined to be a virtual slot in that
  commit. Virtual slots don't support system columns except for tableoid
  and ctid, as the other system columns are AM dependent.

  Fix that by using a slot of the table's type. Add tests for system column
  accesses in ON CONFLICT ... RETURNING.

  Reported-By: Roby, bisected to the relevant commit by Jeff Janes
  Author: Andres Freund
  Discussion: https://postgr.es/m/73436355-6432-49B1-92ED-1FE4F7E7E100@finefun.com.au
  Backpatch: 12-, where the bug was introduced in 277cb789836
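  A hedged sketch of the fix's shape (a paraphrase, not the exact hunk):
  build the slot with the table AM's own slot callbacks rather than
  TTSOpsVirtual, so AM-dependent system columns stay readable in RETURNING.

      /* was: a virtual slot, which lacks most system columns */
      slot = table_slot_create(resultRelInfo->ri_RelationDesc, NULL);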
* Use full 64-bit XID for checking if a deleted GiST page is old enough. (Heikki Linnakangas, 2019-07-24)

  Otherwise, after a deleted page gets even older, it becomes unrecyclable
  again. B-tree has the same problem, and has had since time immemorial,
  but let's at least fix this in GiST, where this is new.

  Backpatch to v12, where GiST page deletion was introduced.

  Reviewed-by: Andrey Borodin
  Discussion: https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru
* Refactor checks for deleted GiST pages. (Heikki Linnakangas, 2019-07-24)

  The explicit check in gistScanPage() isn't currently really necessary, as
  a deleted page is always empty, so the loop would fall through without
  doing anything, anyway. But it's a marginal optimization, and it gives a
  nice place to attach a comment to explain how it works.

  Backpatch to v12, where GiST page deletion was introduced.

  Reviewed-by: Andrey Borodin
  Discussion: https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru
* Check that partitions are not in use when dropping constraints (Alvaro Herrera, 2019-07-23)

  If the user creates a deferred constraint in a partition, and in a
  transaction they cause the constraint's trigger execution to be deferred
  until commit time *and* drop the constraint, then when commit time comes
  the queued trigger will fail to run because the trigger object will have
  been dropped.

  This is explained because when a constraint gets dropped in a partitioned
  table, the recursion to drop the ones in partitions is done by the
  dependency mechanism, not by ALTER TABLE traversing the recursion tree as
  in all other cases. In the non-partitioned case, this problem is avoided
  by checking that the table is not "in use" by alter-table; other
  alter-table subcommands that recurse to partitions do that check for each
  partition. But the dependency mechanism doesn't have a way to do that.
  Fix the problem by applying the same check to all partitions during ALTER
  TABLE's "prep" phase, which correctly raises the necessary error.

  Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
  Discussion: https://postgr.es/m/CAKcux6nZiO9-eEpr1ZD84bT1mBoVmeZkfont8iSpcmYrjhGWgA@mail.gmail.com
* Install dependencies to prevent dropping partition key columns. (Tom Lane, 2019-07-22)

  The logic in ATExecDropColumn that rejects dropping partition key columns
  is quite an inadequate defense, because it doesn't execute in cases where
  a column needs to be dropped due to cascade from something that only the
  column, not the whole partitioned table, depends on. That leaves us with
  a badly broken partitioned table; even an attempt to load its relcache
  entry will fail. We really need to have explicit pg_depend entries that
  show that the column can't be dropped without dropping the whole table.
  Hence, add those entries.

  In v12 and HEAD, bump catversion to ensure that partitioned tables will
  have such entries. We can't do that in released branches of course, so in
  v10 and v11 this patch affords protection only to partitioned tables
  created after the patch is installed. Given the lack of field complaints
  (this bug was found by fuzz-testing not by end users), that's probably
  good enough.

  In passing, fix ATExecDropColumn and ATPrepAlterColumnType messages to be
  more specific about which partition key column they're complaining about.

  Per report from Manuel Rigger. Back-patch to v10 where partitioned tables
  were added.

  Discussion: https://postgr.es/m/CA+u7OA4JKCPFrdrAbOs7XBiCyD61XJxeNav4LefkSmBLQ-Vobg@mail.gmail.com
  Discussion: https://postgr.es/m/31920.1562526703@sss.pgh.pa.us
* Use column collation for extended statistics (Tomas Vondra, 2019-07-20)

  The current extended statistics code was a bit confused about which
  collation to use. When building the statistics, the collations defined as
  default for the data types were used (since commit 5e0928005). The MCV
  code was however using the column collations for MCV serialization, and
  then DEFAULT_COLLATION_OID when computing estimates. So overall the code
  was using all three possible options, inconsistently.

  This commit uses the column collation everywhere - this makes it
  consistent with what 5e0928005 did for regular stats. We however do not
  track the collations in a catalog, because we can derive them from
  column-level information. This may need to change in the future, e.g.
  after allowing statistics on expressions.

  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
  Backpatch-to: 12
* Rework examine_opclause_expression to use varonleft (Tomas Vondra, 2019-07-20)

  The examine_opclause_expression function needs to return information on
  which side of the operator we found the Var, but the variable was called
  "isgt", which is rather misleading (it assumes the operator is either
  less-than or greater-than, but it may be equality or something else).
  Other places in the planner use a variable called "varonleft" for this
  purpose, so just adopt the same convention here.

  The code also assumed we don't care about this flag for equality, as
  (Var = Const) and (Const = Var) should be the same thing. But that does
  not work for cross-type operators, in which case we need to pass the
  parameters to the procedure in the right order. So just use the same code
  for all types of expressions.

  This means we don't need to care about the selectivity estimation
  function anymore, at least not in this code. We should only get the
  supported cases here (thanks to statext_is_compatible_clause).

  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
  Backpatch-to: 12
* Fix error in commit e6feef57. (Jeff Davis, 2019-07-18)

  I was careless, passing a datum directly to DATE_NOT_FINITE without
  calling DatumGetDateADT() first.

  Backpatch-through: 9.4
* Fix daterange canonicalization for +/- infinity. (Jeff Davis, 2019-07-18)

  The values 'infinity' and '-infinity' are a part of the DATE type itself,
  so a bound of the date 'infinity' is not the same as an
  unbounded/infinite range. However, it is still wrong to try to
  canonicalize such values, because adding or subtracting one has no
  effect. Fix by treating 'infinity' and '-infinity' the same as unbounded
  ranges for the purposes of canonicalization (but not other purposes).

  Backpatch to all versions because it is inconsistent with the documented
  behavior. Note that this could be an incompatibility for applications
  relying on the behavior contrary to the documentation.

  Author: Laurenz Albe
  Reviewed-by: Thomas Munro
  Discussion: https://postgr.es/m/77f24ea19ab802bc9bc60ddbb8977ee2d646aec1.camel%40cybertec.at
  Backpatch-through: 9.4
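  A hedged sketch combining this fix and the follow-up above (the
  surrounding canonicalization code is paraphrased; RangeBound, DateADT and
  the DATE_NOT_FINITE macro are PostgreSQL's): fetch the DateADT via
  DatumGetDateADT() before testing finiteness, and leave infinite bounds
  untouched.

      RangeBound  lower;      /* assume: the range's deserialized bound */
      DateADT     d = DatumGetDateADT(lower.val);   /* not the raw Datum */

      if (!lower.infinite && !DATE_NOT_FINITE(d) && !lower.inclusive)
      {
          /* usual [lo, hi) canonical form: bump exclusive bound by a day */
          lower.val = DateADTGetDatum(d + 1);
          lower.inclusive = true;
      }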
* Fix nbtree metapage cache upgrade bug. (Peter Geoghegan, 2019-07-18)

  Commit 857f9c36cda, which taught nbtree VACUUM to avoid unnecessary index
  scans, bumped the nbtree version number from 2 to 3, while adding the
  ability for nbtree indexes to be upgraded on-the-fly. Various assertions
  that assumed that an nbtree index was always on version 2 had to be
  changed to accept any supported version (version 2 or 3 on Postgres 11).

  However, a few assertions were missed in the initial commit, all of which
  were in code paths that cache a local copy of the metapage metadata,
  where the index had been expected to be on the current version (no longer
  version 2) as a generic sanity check. Rather than simply update the
  assertions, follow-up commit 0a64b45152b intentionally made the metapage
  caching code update the per-backend cached metadata version without
  changing the on-disk version at the same time. This could even happen
  when the planner needed to determine the height of a B-Tree for costing
  purposes. The assertions only fail on Postgres v12 when upgrading from
  v10, because they were adjusted to use the authoritative shared memory
  metapage by v12's commit dd299df8.

  To fix, remove the cache-only upgrade mechanism entirely, and update the
  assertions themselves to accept any supported version (go back to using
  the cached version in v12). The fix is almost a full revert of commit
  0a64b45152b on the v11 branch.

  VACUUM only considers the authoritative metapage, and never bothers with
  a locally cached version, whereas everywhere else isn't interested in the
  metapage fields that were added by commit 857f9c36cda. It seems unlikely
  that this bug has affected any user on v11.

  Reported-By: Christoph Berg
  Bug: #15896
  Discussion: https://postgr.es/m/15896-5b25e260fdb0b081%40postgresql.org
  Backpatch: 11-, where VACUUM was taught to avoid unnecessary index scans.
* Simplify bitmap updates in multivariate MCV code (Tomas Vondra, 2019-07-18)

  When evaluating clauses on a multivariate MCV list, we build a bitmap
  tracking how the clauses match each item of the MCV list. When updating
  the bitmap we need to consider the current value (tracking how the item
  matches preceding clauses), the match for the current clause, and whether
  the clauses are connected by AND or OR.

  Until now the logic was copied in every place updating the bitmap, which
  was not quite readable. So just move it to a separate function and call
  it where needed.

  Backpatch to 12, where the code was introduced. While not a bugfix, this
  should make maintenance and future backpatches easier.

  Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
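  A minimal self-contained sketch of the factored-out logic (the function
  name is illustrative, not the one in mcv.c): combine the match so far
  with the current clause's match, honoring AND vs. OR semantics.

      #include <stdbool.h>
      #include <stdio.h>

      static bool
      mcv_update_match(bool prev_match, bool clause_match, bool is_or)
      {
          return is_or ? (prev_match || clause_match)
                       : (prev_match && clause_match);
      }

      int
      main(void)
      {
          /* item matched earlier clauses but fails this one:
           * AND kills it, OR keeps it */
          printf("%d %d\n",
                 mcv_update_match(true, false, false),   /* 0 */
                 mcv_update_match(true, false, true));   /* 1 */
          return 0;
      }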
* Fix handling of NULLs in MCV items and constants (Tomas Vondra, 2019-07-18)

  There were two issues in how the extended statistics handled NULL values
  in opclauses. Firstly, the code was oblivious to the possibility that
  Const may be NULL (constisnull=true), in which case the constvalue is
  undefined. We need to treat this as a mismatch, and not call the proc.

  Secondly, the MCV item itself may contain NULL values too - the code
  already did check that, and updated the match bitmap accordingly, but
  failed to ensure we won't call the operator procedure anyway. It did work
  for AND-clauses, because in that case false in the bitmap stops
  evaluation of further clauses. But for OR-clauses it was possible to get
  incorrect estimates or even trigger a crash.

  This fixes both issues by extending the existing check so that it looks
  at constisnull too, and making sure it skips calling the procedure.

  Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
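  A hedged sketch of the added guard (field names follow PostgreSQL's Const
  node; item, idx, opproc and collation are assumed from context): a NULL
  constant or a NULL value in the MCV item is a mismatch, and the
  comparison procedure must not be called, since constvalue is undefined
  when constisnull is true.

      if (cst->constisnull || item->isnull[idx])
          match = false;      /* mismatch; do NOT call the operator proc */
      else
          match = DatumGetBool(FunctionCall2Coll(&opproc, collation,
                                                 item->values[idx],
                                                 cst->constvalue));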
* Fix handling of opclauses in extended statistics (Tomas Vondra, 2019-07-18)

  We expect opclauses to have exactly one Var and one Const, but the code
  was checking the Const by calling is_pseudo_constant_clause(), which is
  incorrect - we need a proper constant.

  Fixed by using a plain IsA(x, Const) to check the type of the node. We
  need to do these checks in two places, so move it into a separate
  function that can be called in both places.

  Reported by Andreas Seltenreich, based on a crash reported by sqlsmith.

  Backpatch to v12, where this code was introduced.

  Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
  Backpatch-to: 12
* Remove unnecessary TYPECACHE_GT_OPR lookup (Tomas Vondra, 2019-07-18)

  The TYPECACHE_GT_OPR lookup is not needed (it used to be, in older
  versions of the MCV code), but the compiler failed to detect this, as the
  result was used in a fmgr_info() call, populating a FmgrInfo entry.

  Backpatch to v12, where this code was introduced.

  Discussion: https://postgr.es/m/8736jdhbhc.fsf%40ansel.ydns.eu
  Backpatch-to: 12
* Fix thinko in construction of old_conpfeqop list. (Tom Lane, 2019-07-16)

  This should lappend the OIDs, not lcons them; the existing code produced
  a list in reversed order. This is harmless for single-key FKs or FKs
  where all the key columns are of the same type, which probably explains
  how it went unnoticed. But if those conditions are not met,
  ATAddForeignKeyConstraint would make the wrong decision about whether an
  existing FK needs to be revalidated. I think it would almost always err
  in the safe direction by revalidating a constraint that didn't need it.
  You could imagine scenarios where the pfeqop check was fooled by swapping
  the types of two FK columns in one ALTER TABLE, but that case would
  probably be rejected by other tests, so it might be impossible to get to
  the worst-case scenario where an FK should be revalidated and isn't. (And
  even then, it's likely to be fine, unless there are weird inconsistencies
  in the equality behavior of the replacement types.) However, this is a
  performance bug at least.

  Noted while poking around to see whether lcons calls could be converted
  to lappend.

  This bug is old, dating to commit cb3a7c2b9, so back-patch to all
  supported branches.
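  A minimal sketch of the difference, using PostgreSQL's pg_list.h API:
  lcons_oid() pushes onto the head, so repeated use reverses element order,
  while lappend_oid() preserves it (source_oids is an assumed input list).

      List       *ops = NIL;
      ListCell   *lc;

      foreach(lc, source_oids)        /* builds {a, b, c}, in order */
          ops = lappend_oid(ops, lfirst_oid(lc));

      /* the buggy variant, ops = lcons_oid(lfirst_oid(lc), ops),
       * would have built {c, b, a} instead */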
* Correct nbtsplitloc.c comment. (Peter Geoghegan, 2019-07-15)

  The logic just added by commit e3899ffd falls back on a 50:50 page split
  in the event of a new item that's just to the right of our provisional
  "many duplicates" split point. Fix a comment that incorrectly claimed
  that the new item had to be just to the left of our provisional split
  point.

  Backpatch: 12-, just like commit e3899ffd.
* Fix pathological nbtree split point choice issue. (Peter Geoghegan, 2019-07-15)

  Specific ever-decreasing insertion patterns could cause successive
  unbalanced nbtree page splits. Problem cases involve a large group of
  duplicates to the left, and ever-decreasing insertions to the right. To
  fix, detect the situation by considering the newitem offset before
  performing a split using nbtsplitloc.c's "many duplicates" strategy. If
  the new item was inserted just to the right of our provisional "many
  duplicates" split point, infer ever-decreasing insertions and fall back
  on a 50:50 (space delta optimal) split. This seems to barely affect cases
  that already had acceptable space utilization.

  An alternative fix also seems possible. Instead of changing nbtsplitloc.c
  split choice logic, we could instead teach _bt_truncate() to generate a
  new value for new high keys by interpolating from the lastleft and
  firstright key values. That would certainly be a more elegant fix, but it
  isn't suitable for backpatching.

  Discussion: https://postgr.es/m/CAH2-WznCNvhZpxa__GqAa1fgQ9uYdVc=_apArkW2nc-K3O7_NA@mail.gmail.com
  Backpatch: 12-, where the nbtree page split enhancements were introduced.
* Fix get_actual_variable_range() to cope with broken HOT chains. (Tom Lane, 2019-07-12)

  Commit 3ca930fc3 modified get_actual_variable_range() to use a new
  "SnapshotNonVacuumable" snapshot type for selecting tuples that it would
  consider valid. However, because that snapshot type can accept
  recently-dead tuples, this caused a bug when using a recently-created
  index: we might accept a recently-dead tuple that is an early member of a
  broken HOT chain and does not actually match the index entry. Then, the
  data extracted from the heap tuple would not necessarily be an endpoint
  value of the column; it could even be NULL, leading to
  get_actual_variable_range() itself reporting "found unexpected null value
  in index". Even without an error, this could lead to poor plan choices
  due to an erroneous notion of the endpoint value.

  We can improve matters by changing the code to use the index-only scan
  technique (which didn't exist when get_actual_variable_range was
  originally written). If any of the tuples in a HOT chain are live enough
  to satisfy SnapshotNonVacuumable, we take the data from the index entry,
  ignoring what is in the heap. This fixes the problem without changing the
  live-vs-dead-tuple behavior from what was intended by commit 3ca930fc3.

  A side benefit is that for static tables we might not have to touch the
  heap at all (when the extremal value is in an all-visible page). In
  addition, we can save some overhead by not having to create a complete
  ExecutorState, and we don't need to run FormIndexDatum, avoiding more
  cycles as well as the possibility of failure for indexes on expressions.
  (I'm not sure that this code would ever be used to determine the extreme
  value of an expression, in the current state of the planner; but it's
  definitely possible that lower-order columns of the selected index could
  be expressions. So one could construct perhaps-artificial examples in
  which the old code unexpectedly failed due to trying to compute an
  expression's value for a now-dead row.)

  Per report from Manuel Rigger. Back-patch to v11 where commit 3ca930fc3
  came in.

  Discussion: https://postgr.es/m/CA+u7OA7W4NWEhCvftdV6_8bbm2vgypi5nuxfnSEJQqVKFSUoMg@mail.gmail.com
* Fix RANGE partition pruning with multiple boolean partition keys (David Rowley, 2019-07-12)

  match_clause_to_partition_key would incorrectly return
  PARTCLAUSE_UNSUPPORTED if a bool qual could not be matched to the current
  partition key. This was a problem, as it caused the calling function to
  discard the qual and not try to match it to any other partition key. If
  there was another partition key which did match this qual, then the qual
  would not be checked again and we could fail to prune some partitions.

  The worst this could do was to cause partitions not to be pruned when
  they could have been, so there was no danger of incorrect query results
  here.

  Fix this by changing match_boolean_partition_clause to have it return a
  PartClauseMatchStatus rather than a boolean value. This allows it to
  communicate if the qual is unsupported or if it just does not match this
  particular partition key; previously these two cases were treated the
  same. Now, if match_clause_to_partition_key is unable to match the qual
  to any other qual type, then we can simply return the value from the
  match_boolean_partition_clause call so that the calling function properly
  treats the qual as either unmatched or unsupported.

  Reported-by: Rares Salcudean
  Reviewed-by: Amit Langote
  Backpatch-through: 11 where partition pruning was introduced
  Discussion: https://postgr.es/m/CAHp_FN2xwEznH6oyS0hNTuUUZKp5PvegcVv=Co6nBXJ+mC7Y5w@mail.gmail.com
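  A sketch of why the tri-state matters (an abbreviated, hedged rendering
  of partprune.c's PartClauseMatchStatus; the real enum has more members):

      typedef enum
      {
          PARTCLAUSE_NOMATCH,        /* not this key: try the next one */
          PARTCLAUSE_MATCH_CLAUSE,   /* matched this partition key */
          PARTCLAUSE_UNSUPPORTED     /* unusable for pruning: discard */
      } PartClauseMatchStatus;

      /* A plain boolean return conflated NOMATCH with UNSUPPORTED, so a
       * bool qual that failed to match key 0 was never retried against
       * key 1. */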
* Fix variable initialization when using buffering build with GiST (Michael Paquier, 2019-07-10)

  This could cause valgrind to complain, as the flag marking a buffer as a
  temporary copy was not getting initialized.

  While at it, fill newly-created buffer pages with zeros. This does not
  matter when loading a block from a temporary file, but it makes the push
  of an index tuple into a new buffer page safer.

  This has been introduced by 1d27dcf, so backpatch all the way down to
  9.4.

  Author: Alexander Lakhin
  Discussion: https://postgr.es/m/15899-0d24fb273b3dd90c@postgresql.org
  Backpatch-through: 9.4
* Fix missing calls to table_finish_bulk_insert during COPY, take 2 (David Rowley, 2019-07-10)

  86b85044e abstracted calls to heap functions in COPY FROM to support a
  generic table AM. However, when performing a copy into a partitioned
  table, this commit neglected to call table_finish_bulk_insert for each
  partition. Before 86b85044e, when we always called the heap functions,
  there was no need to call heapam_finish_bulk_insert for partitions since
  it only did any work when performing a copy without WAL. For partitioned
  tables, this was unsupported anyway, so there was no issue. With
  pluggable storage, we can't make any assumptions about what the table AM
  might want to do in its equivalent function, so we'd better ensure we
  always call table_finish_bulk_insert for each partition that's received a
  row.

  For now, we make the table_finish_bulk_insert call whenever we evict a
  CopyMultiInsertBuffer out of the CopyMultiInsertInfo. This does mean that
  it's possible that we call table_finish_bulk_insert multiple times per
  partition, which is not a problem other than being an inefficiency.
  Improving this requires a more invasive patch, so let's leave that for
  another day.

  This also changes things so that we no longer needlessly call
  table_finish_bulk_insert when performing a COPY FROM for a
  non-partitioned table when not using multi-inserts.

  Reported-by: Robert Haas
  Backpatch-through: 12
  Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
* Pass QueryEnvironment down to EvalPlanQual's EState. (Thomas Munro, 2019-07-10)

  Otherwise the executor can't see trigger transition tables during EPQ
  evaluation. Fixes bug #15900 and almost certainly also #15720. Back-patch
  to 10, where trigger transition tables landed.

  Author: Alex Aktsipetrov
  Reviewed-by: Thomas Munro, Tom Lane
  Discussion: https://postgr.es/m/15900-bc482754fe8d7415%40postgresql.org
  Discussion: https://postgr.es/m/15720-38c2b29e5d720187%40postgresql.org
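  A hedged sketch of the fix's shape in the EPQ setup path (field names per
  PostgreSQL's EState; this paraphrases rather than quotes the hunk):

      /* carry the caller's query environment - which holds ephemeral named
       * relations such as trigger transition tables - into the EPQ estate */
      estate->es_queryEnv = parentestate->es_queryEnv;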
* Propagate trigger arguments to partitions (Alvaro Herrera, 2019-07-09)

  We were creating the cloned triggers with an empty list of arguments,
  losing the ones that had been specified by the user when creating the
  trigger in the partitioned table. Repair.

  This was forgotten in commit 86f575948c77.

  Author: Patrick McHardy
  Reviewed-by: Tomas Vondra
  Discussion: https://postgr.es/m/20190709130027.amr2cavjvo7rdvac@access1.trash.net
  Discussion: https://postgr.es/m/15752-123bc90287986de4@postgresql.org
* Add missing assertions for required table am callbacks. (Amit Kapila, 2019-07-06)

  Reported-by: Ashwin Agrawal
  Author: Ashwin Agrawal
  Reviewed-by: Amit Kapila
  Backpatch-through: 12, where it was introduced
  Discussion: https://postgr.es/m/CALfoeisgdZhYDrJOukaBzvXfJOK2FQ0szVMK7dzmcy6w93iDUA@mail.gmail.com
* Remove unused variable in statext_mcv_serialize() (Tomas Vondra, 2019-07-05)

  The itemlen variable used to be referenced in multiple places, but since
  reworking the serialization code it's used only in one assert. Fixed by
  removing the variable and calling the macro from the assert directly.

  Backpatch to 12, where this code was introduced.

  Reported-by: Jeff Janes
  Discussion: https://postgr.es/m/CAMkU=1zc_ovH9NZd_9ovuiEWkF9yX06URUDdXCmgDydf-bqB5A@mail.gmail.com
* Simplify pg_mcv_list (de)serialization (Tomas Vondra, 2019-07-05)

  The serialization format of multivariate MCV lists included alignment in
  order to allow direct access to part of the serialized data, but despite
  multiple fixes (see for example commits d85e0f366a and ea4e1c0e8f) this
  proved to be problematic.

  This commit abandons alignment in the serialized format, and just copies
  everything during deserialization. We now also track the amount of memory
  needed after deserialization (including alignment), which allows us to
  deserialize the MCV list in a single pass.

  Bump catversion, as this affects contents of pg_statistic_ext_data.

  Backpatch to 12, where multi-column MCV lists were introduced.

  Author: Tomas Vondra
  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/2201.1561521148@sss.pgh.pa.us
* Fix pg_mcv_list_items() to produce text[] (Tomas Vondra, 2019-07-05)

  The function pg_mcv_list_items() returns values stored in MCV items. The
  items may contain columns with different data types, so the function was
  generating a text array-like representation, but in an ad-hoc way without
  properly escaping various characters etc. Fixed by simply building a
  text[] array, which also makes it easier to use from queries etc.

  Requires changes to the pg_proc entry, so bump catversion.

  Backpatch to 12, where multi-column MCV lists were introduced.

  Author: Tomas Vondra
  Reviewed-by: Dean Rasheed
  Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development
* Speed up build of MCV lists with many distinct values (Tomas Vondra, 2019-07-05)

  When building multi-column MCV lists, we compute a base frequency for
  each item, i.e. a product of per-column frequencies for values from the
  item. As a value may be in multiple groups, the code was scanning the
  whole array of groups while adding items to the MCV list. This works fine
  as long as the number of distinct groups is small, but it's easy to
  trigger O(N^2) behavior, especially after increasing the statistics
  target.

  This commit precomputes frequencies for values in all columns, so that
  when computing the base frequency it's enough to make a simple bsearch
  lookup in the array.

  Backpatch to 12, where multi-column MCV lists were introduced.

  Discussion: https://postgr.es/m/20190618205920.qtlzcu73whfpfqne@development
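  A self-contained toy sketch of the speed-up (not the actual mcv.c code):
  precompute (value, frequency) pairs per column, sort them once, then find
  each value's frequency with bsearch() instead of rescanning all groups,
  turning the O(N^2) scan into O(N log N).

      #include <stdio.h>
      #include <stdlib.h>

      typedef struct { int value; double freq; } ValueFreq;

      static int
      cmp_valuefreq(const void *a, const void *b)
      {
          int av = ((const ValueFreq *) a)->value;
          int bv = ((const ValueFreq *) b)->value;

          return (av > bv) - (av < bv);
      }

      int
      main(void)
      {
          ValueFreq   col[] = {{7, 0.25}, {1, 0.50}, {3, 0.25}};
          ValueFreq   key = {3, 0.0};
          ValueFreq  *hit;

          qsort(col, 3, sizeof(ValueFreq), cmp_valuefreq);
          hit = bsearch(&key, col, 3, sizeof(ValueFreq), cmp_valuefreq);
          if (hit)
              printf("freq(3) = %.2f\n", hit->freq);   /* 0.25 */
          return 0;
      }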
* Don't remove surplus columns from GROUP BY for inheritance parents (David Rowley, 2019-07-03)

  d4c3a156c added code to remove columns that were not part of a table's
  PRIMARY KEY constraint from the GROUP BY clause when all the primary key
  columns were present in the GROUP BY. This is fine to do since we know
  that there will only be one row per group coming from this relation.

  However, the logic failed to consider inheritance parent relations. These
  can have child relations without a primary key, but even if they did,
  they could duplicate one of the parent's rows or one from another child
  relation. In this case, those additional GROUP BY columns are required.

  Fix this by disabling the optimization for inheritance parent tables. In
  v11 and beyond, partitioned tables are fine since partitions cannot
  overlap, and before v11 partitioned tables could not have a primary key.

  Reported-by: Manuel Rigger
  Discussion: http://postgr.es/m/CA+u7OA7VLKf_vEr6kLF3MnWSA9LToJYncgpNX2tQ-oWzYCBQAw@mail.gmail.com
  Backpatch-through: 9.6
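  A hedged sketch of the guard's shape (paraphrasing the check added to the
  surplus-column removal logic; the exact condition in the commit may
  differ):

      /*
       * Skip inheritance parents: child tables may lack the PK entirely,
       * and even when present it can't prevent duplicates across tables.
       */
      if (rte->inh)
          continue;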
* pgindent run prior to branching v12. (Tom Lane, 2019-07-01)

  pgperltidy and reformat-dat-files too, though the latter didn't find
  anything to change.
* Revert "Fix missing call to table_finish_bulk_insert during COPY" (David Rowley, 2019-07-02)

  This reverts commits 4de60244e and b2d69806d. Further thought is required
  to make this work properly.
* Remove surplus call to table_finish_bulk_insert (David Rowley, 2019-07-02)

  4de60244e added the call to table_finish_bulk_insert to the
  CopyMultiInsertBufferCleanup function. We use a CopyMultiInsertBuffer
  even for non-partitioned tables, so having the cleanup do that meant we
  would call table_finish_bulk_insert twice when performing COPY FROM with
  a non-partitioned table.

  Here we can just remove the direct call in CopyFrom and let
  CopyMultiInsertBufferCleanup handle the call instead.
* Fix missing call to table_finish_bulk_insert during COPY (David Rowley, 2019-07-02)

  86b85044e abstracted calls to heap functions in COPY FROM to support a
  generic table AM. However, when performing a copy into a partitioned
  table, this commit neglected to call table_finish_bulk_insert for each
  partition. Before 86b85044e, when we always called the heap functions,
  there was no need to call heapam_finish_bulk_insert for partitions since
  it only did any work when performing a copy without WAL. For partitioned
  tables, this was unsupported anyway, so there was no issue. With
  pluggable storage, we can't make any assumptions about what the table AM
  might want to do in its equivalent function, so we'd better ensure we
  always call table_finish_bulk_insert for each partition that's received a
  row.

  For now, we make the table_finish_bulk_insert call whenever we evict a
  CopyMultiInsertBuffer out of the CopyMultiInsertInfo. This does mean that
  it's possible that we call table_finish_bulk_insert multiple times per
  partition, which is not a problem other than being an inefficiency.
  Improving this requires a more invasive patch, so let's leave that for
  another day.

  In passing, move the table_finish_bulk_insert for the target of the COPY
  command so that it's only called when we're actually performing bulk
  inserts. We don't need to call this when inserting 1 row at a time.

  Reported-by: Robert Haas
  Discussion: https://postgr.es/m/CA+TgmoYK=6BpxiJ0tN-p9wtH0BTAfbdxzHhwou0mdud4+BkYuQ@mail.gmail.com
* Add missing serial commas (Peter Eisentraut, 2019-07-01)
* Fix many typos and inconsistencies (Michael Paquier, 2019-07-01)

  Author: Alexander Lakhin
  Discussion: https://postgr.es/m/af27d1b3-a128-9d62-46e0-88f424397f44@gmail.com
* Don't read fields of a misaligned ExpandedObjectHeader or AnyArrayType. (Noah Misch, 2019-06-30)

  UBSan complains about this. Instead, cast to a suitable type requiring
  only 4-byte alignment. DatumGetAnyArrayP() already assumes one can cast
  between AnyArrayType and ArrayType, so this doesn't introduce a new
  assumption.

  Back-patch to 9.5, where AnyArrayType was introduced.

  Reviewed by Tom Lane.
  Discussion: https://postgr.es/m/20190629210334.GA1244217@rfd.leadboat.com
* Repair logic for reordering grouping sets optimization. (Andrew Gierth, 2019-06-30)

  The logic in reorder_grouping_sets to order grouping set elements to
  match a pre-specified sort ordering was defective, resulting in
  unnecessary sort nodes (though the query output would still be correct).
  Repair, simplifying the code a little, and add a test.

  Per report from Richard Guo, though I didn't use their patch. The
  original bug seems to have been my fault.

  Backpatch back to 9.5 where grouping sets were introduced.

  Discussion: https://postgr.es/m/CAN_9JTzyjGcUjiBHxLsgqfk7PkdLGXiM=pwM+=ph2LsWw0WO1A@mail.gmail.com
* Fix breakage introduced in pg_lsn_in() (Peter Eisentraut, 2019-06-30)

  Using PG_RETURN_LSN() from the non-fmgr function pg_lsn_in_internal()
  happened to work on some platforms, but should just be a plain "return".
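  A hedged sketch of the breakage: PG_RETURN_LSN() is an fmgr macro that
  wraps its argument in a Datum, which only fmgr-convention callers expect;
  a plain function must return the XLogRecPtr directly (the parsing body is
  elided here).

      XLogRecPtr
      pg_lsn_in_internal(const char *str, bool *have_error)
      {
          XLogRecPtr  result = 0;

          /* ... parse str into result, setting *have_error on failure ... */

          return result;      /* was: PG_RETURN_LSN(result) */
      }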