path: root/src
Commit message (Author, Age)
...
* Remove obsolete replication settings within TAP tests. (Tom Lane, 2020-09-29)
  PostgresNode.pm set "max_wal_senders = 5" for replication testing, but this seems to be slightly too low for our current test suite. Slower buildfarm members frequently report "number of requested standby connections exceeds max_wal_senders" failures, due to old walsenders not exiting instantaneously. Usually, the test does not fail overall because of automatic walreceiver restart, but sometimes the failure becomes visible; and in any case such retries slow down the test. That value came in with commit 89ac7004d, but was soon obsoleted by f6d6d2920, which raised the built-in default from zero to 10; so that PostgresNode.pm is actually setting it to less than the conservative built-in default. That seems pretty pointless, so let's remove the special setting and let the default prevail, in hopes of making the TAP tests more robust. Likewise, the setting "max_replication_slots = 5" is obsolete and can be removed. While here, reverse-engineer a comment about why we're choosing less-than-default values for some other settings. (Note: before v12, max_wal_senders counted against max_connections so that the latter setting also needs some fiddling with.) Back-patch to v10 where the subscription tests were added. It's likely that the older branches aren't pushing the boundaries of max_wal_senders, but I'm disinclined to spend time trying to figure out exactly when it started to be a problem. Discussion: https://postgr.es/m/723911.1601417626@sss.pgh.pa.us
* Fix memory leak in plpgsql's CALL processing. (Tom Lane, 2020-09-29)
  When executing a CALL or DO in a non-atomic context (i.e., not inside a function or query), plpgsql creates a new plan each time through, as a rather hacky solution to some resource management issues. But it failed to free this plan until exit of the current procedure or DO block, resulting in serious memory bloat in procedures that called other procedures many times. Fix by remembering to free the plan, and by being more honest about restoring the previous state (otherwise, recursive procedure calls have a problem). There was also a smaller leak associated with recalculation of the "target" list of output variables. Fix that by using the statement-lifespan context to hold non-permanent values. Back-patch to v11 where procedures were introduced. Pavel Stehule and Tom Lane Discussion: https://postgr.es/m/CAFj8pRDiiU1dqym+_P4_GuTWm76knJu7z9opWayBJTC0nQGUUA@mail.gmail.com
* Archive timeline history files in standby if archive_mode is set to "always". (Fujii Masao, 2020-09-29)
  Previously the standby server didn't archive timeline history files streamed from the primary even when archive_mode is set to "always", while it archives the streamed WAL files. This could cause the PITR to fail because there was no required timeline history file in the archive. The cause of this issue was that walreceiver didn't mark those files as ready for archiving. This commit makes walreceiver mark those streamed timeline history files as ready for archiving if archive_mode=always. Then the archiver process archives the marked timeline history files. Back-patch to all supported versions. Reported-by: Grigory Smolkin Author: Grigory Smolkin, Fujii Masao Reviewed-by: David Zhang, Anastasia Lubennikova Discussion: https://postgr.es/m/54b059d4-2b48-13a4-6f43-95a087c92367@postgrespro.ru
* Fix progress reporting of REINDEX CONCURRENTLY (Michael Paquier, 2020-09-29)
  This addresses a couple of issues with the so-said subject:
  - Report the correct parent relation with the index actually being rebuilt or validated. Previously, the command status remained set to the last index created for the progress of the index build and validation, which would be incorrect when working on a table that has more than one index.
  - Use the correct phase when waiting before the drop of the old indexes. Previously, this was reported with the same status as when waiting before the old indexes are marked as dead.
  Author: Matthias van de Meent, Michael Paquier Discussion: https://postgr.es/m/CAEze2WhqFgcwe1_tv=sFYhLWV2AdpfukumotJ6JNcAOQs3jufg@mail.gmail.com Backpatch-through: 12
* Assign collations in partition bound expressions. (Tom Lane, 2020-09-28)
  Failure to do this can result in errors during evaluation of the bound expression, as illustrated by the new regression test. Back-patch to v12 where the ability for partition bounds to be expressions was added. Discussion: https://postgr.es/m/CAJV4CdrZ5mKuaEsRSbLf2URQ3h6iMtKD=hik8MaF5WwdmC9uZw@mail.gmail.com
* Revise RelationBuildRowSecurity() to avoid memory leaks. (Tom Lane, 2020-09-26)
  This function leaked some memory while loading qual clauses for an RLS policy. While ordinarily negligible, that could build up in some repeated-reload cases, as reported by Konstantin Knizhnik. We can improve matters by borrowing the coding long used in RelationBuildRuleLock: build stringToNode's result directly in the target context, and remember to explicitly pfree the input string. This patch by no means completely guarantees zero leaks within this function, since we have no real guarantee that the catalog-reading subroutines it calls don't leak anything. However, practical tests suggest that this is enough to resolve the issue. In any case, any remaining leaks are similar to those risked by RelationBuildRuleLock and other relcache-loading subroutines. If we need to fix them, we should adopt a more global approach such as that used by the RECOVER_RELATION_BUILD_MEMORY hack. While here, let's remove the need for an expensive PG_TRY block by using MemoryContextSetParent to reparent an initially-short-lived context for the RLS data. Back-patch to all supported branches. Discussion: https://postgr.es/m/21356c12-8917-8249-b35f-1c447231922b@postgrespro.ru
* Fix handling of -d "connection string" in pg_dump/pg_restore. (Tom Lane, 2020-09-24)
  Parallel pg_dump failed if its -d parameter was a connection string containing any essential information other than host, port, or username. The same was true for pg_restore with --create. The reason is that these scenarios failed to preserve the connection string from the command line; the code felt free to replace that with just the database name when reconnecting from a pg_dump parallel worker or after creating the target database. By chance, parallel pg_restore did not suffer this defect, as long as you didn't say --create. In practice it seems that the error would be obvious only if the connstring included essential, non-default SSL or GSS parameters. This may explain why it took us so long to notice. (It also makes it very difficult to craft a regression test case illustrating the problem, since the test would fail in builds without those options.) Fix by refactoring so that ConnectDatabase always receives all the relevant options directly from the command line, rather than reconstructed values. Inject a different database name, when necessary, by relying on libpq's rules for handling multiple "dbname" parameters. While here, let's get rid of the essentially duplicate _connectDB function, as well as some obsolete nearby cruft. Per bug #16604 from Zsolt Ero. Back-patch to all supported branches. Discussion: https://postgr.es/m/16604-933f4b8791227b15@postgresql.org
* Fix missing fsync of SLRU directories. (Thomas Munro, 2020-09-24)
  Harmonize behavior by moving responsibility for fsyncing directories down into slru.c. In 10 and later, only the multixact directories were missed (see commit 1b02be21), and in older branches all SLRUs were missed. Back-patch to all supported releases. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CA%2BhUKGLtsTUOScnNoSMZ-2ZLv%2BwGh01J6kAo_DM8mTRq1sKdSQ%40mail.gmail.com
* Avoid possible dangling-pointer access in tsearch_readline_callback. (Tom Lane, 2020-09-23)
  tsearch_readline() saves the string pointer it returns to the caller for possible use in the associated error context callback. However, the caller will usually pfree that string sometime before it next calls tsearch_readline(), so that there is a window where an ereport will try to print an already-freed string. The built-in users of tsearch_readline() happen to all do that pfree at the bottoms of their loops, so that the window is effectively empty for them. However, this is not documented as a requirement, and contrib/dict_xsyn doesn't do it like that, so it seems likely that third-party dictionaries might have live bugs here. The practical consequences of this seem pretty limited in any case, since production builds wouldn't clobber the freed string immediately, besides which you'd not expect syntax errors in dictionary files being used in production. Still, it's clearly a bug waiting to bite somebody. Fix by pstrdup'ing the string to be saved for the error callback, and then pfree'ing it next time through. It's been like this for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/48A4FA71-524E-41B9-953A-FD04EF36E2E7@yesql.se
* Fix whitespace (Peter Eisentraut, 2020-09-20)
* Use factorial rather than numeric_fac in create_operator.sql. (Tom Lane, 2020-09-18)
  These two SQL functions are aliases for the same C function, so this change has no semantic effect. However, because we dropped the numeric_fac alias in HEAD (commit 76f412ab3), operator definitions based on that one don't port forward, causing problems for cross-version upgrade tests based on the regression database. Patch all active back branches to dodge the problem. Discussion: https://postgr.es/m/449144.1600439950@sss.pgh.pa.us
* Update parallel BTree scan state when the scan keys can't be satisfied. (Amit Kapila, 2020-09-17)
  For a parallel btree scan to work for an array of scan keys, it should reach the BTPARALLEL_DONE state once for every distinct combination of array keys. This is required to ensure that the parallel workers don't try to seize blocks at the same time for different scan keys. We failed to update this state when we discovered that the scan keys can't be satisfied. Author: James Hunter Reviewed-by: Amit Kapila Tested-by: Justin Pryzby Backpatch-through: 10, where it was introduced Discussion: https://postgr.es/m/4248CABC-25E3-4809-B4D0-128E1BAABC3C@amazon.com
* Avoid unnecessary recursion to child tables in ALTER TABLE SET NOT NULL. (Tom Lane, 2020-09-16)
  If a partitioned table's column is already marked NOT NULL, there is no need to examine its partitions, because we can rely on previous DDL to have enforced that the child columns are NOT NULL as well. (Unfortunately, the same cannot be said for traditional inheritance, so for now we have to restrict the optimization to partitioned tables.) Hence, we may skip recursing to child tables in this situation. The reason this case is worth worrying about is that when pg_dump dumps a partitioned table having a primary key, it will include the requisite NOT NULL markings in the CREATE TABLE commands, and then add the primary key as a separate step. The primary key addition generates a SET NOT NULL as a subcommand, just to be sure. So the situation where a SET NOT NULL is redundant does arise in the real world. Skipping the recursion does more than just save a few cycles: it means that a command such as "ALTER TABLE ONLY partition_parent ADD PRIMARY KEY" will take locks only on the partition parent table, not on the partitions. It turns out that parallel pg_restore is effectively assuming that that's true, and has little choice but to do so because the dependencies listed for such a TOC entry don't include the partitions. pg_restore could thus issue this ALTER while data restores on the partitions are still in progress. Taking unnecessary locks on the partitions not only hurts concurrency, but can lead to actual deadlock failures, as reported by Domagoj Smoljanovic. (A contributing factor in the deadlock is that TRUNCATE on a child partition wants a non-exclusive lock on the parent. This seems likewise unnecessary, but the fix for it is more invasive so we won't consider back-patching it. Fortunately, getting rid of one of these two poor behaviors is enough to remove the deadlock.)
  Although support for partitioned primary keys came in with v11, this patch is dependent on the SET NOT NULL refactoring done by commit f4a3fdfbd, so we can only patch back to v12. Patch by me; thanks to Alvaro Herrera and Amit Langote for review. Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
* Fix bogus cache-invalidation logic in logical replication worker. (Tom Lane, 2020-09-16)
  The code recorded cache invalidation events by zeroing the "localreloid" field of affected cache entries. However, it's possible for an inval event to occur even while we have the entry open and locked. So an ill-timed inval could result in "cache lookup failed for relation 0" errors, if the worker's code tried to use the cleared field. We can fix that by creating a separate bool field to record whether the entry needs to be revalidated. (In the back branches, cram the bool into what had been padding space, to avoid an ABI break in the somewhat unlikely event that any extension is looking at this struct.) Also, rearrange the logic in logicalrep_rel_open so that it does the right thing in cases where table_open would fail. We should retry the lookup by name in that case, but we didn't. The real-world impact of this is probably small. In the first place, the error conditions are very low probability, and in the second place, the worker would just exit and get restarted. We only noticed because in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly, preventing the worker from making progress. Nonetheless, it's clearly a bug, and it impedes a useful type of testing; so back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us
* Fix interpolation in test name. (Noah Misch, 2020-09-13)
  A pre-commit review had reported the problem, but the fix reached only v10 and earlier. Back-patch to v11. Discussion: https://postgr.es/m/20200423.140546.1055476118690602079.horikyota.ntt@gmail.com
* Use the properly transformed RangeVar for expandTableLikeClause(). (Tom Lane, 2020-09-13)
  transformCreateStmt() adjusts the transformed statement's RangeVar to specify the target schema explicitly, for the express reason of making sure that auxiliary statements derived by parse transformation operate on the right table. But the refactoring I did in commit 502898192 got this wrong and passed the untransformed RangeVar to expandTableLikeClause(). This could lead to assertion failures or weird misbehavior if the wrong table was accessed. Per report from Alexander Lakhin. Like the previous patch, back-patch to all supported branches. Discussion: https://postgr.es/m/05051f9d-b32b-cb35-6735-0e9f2ab86b5f@gmail.com
* Use _exit(2) for SIGQUIT during ProcessStartupPacket, too. (Tom Lane, 2020-09-10)
  Bring the signal handling for startup-packet collection into line with the policy established in commits bedadc732 and 8e19a8264, namely don't risk running atexit callbacks when handling SIGQUIT. Ideally, we'd not do so for SIGTERM or timeout interrupts either, but that change seems a bit too risky for the back branches. For now, just improve the comments in this area to describe the risk. Also relocate where BackendInitialize re-disables these interrupts, to minimize the code span where they're active. This doesn't buy a whole lot of safety, but it can't hurt. In passing, rename startup_die() to remove confusion about whether it is for the startup process. Like the previous commits, back-patch to all supported branches. Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
* doc: Fix some grammar and inconsistencies (Michael Paquier, 2020-09-10)
  Some comments are fixed while on it. Author: Justin Pryzby Discussion: https://postgr.es/m/20200818171702.GK17022@telsasoft.com Backpatch-through: 9.6
* Make archiver's SIGQUIT handler exit via _exit(). (Tom Lane, 2020-09-09)
  Commit 8e19a8264 changed the SIGQUIT handlers of almost all server processes not to run atexit callbacks. The archiver process was skipped, perhaps because it's not connected to shared memory; but it's just as true here that running atexit callbacks in a signal handler is unsafe. So let's make it work like the rest. In HEAD and v13, we can use the common SignalHandlerForCrashExit handler. Before that, just tweak pgarch_exit to use _exit(2) explicitly. Like the previous commit, back-patch to all supported branches. Kyotaro Horiguchi, back-patching by me Discussion: https://postgr.es/m/1850884.1599601164@sss.pgh.pa.us
* Check default partitions constraints while descending (Alvaro Herrera, 2020-09-08)
  Partitioning tuple route code assumes that the partition chosen while descending the partition hierarchy is always the correct one. This is true except when the partition is the default partition and another partition has been added concurrently: the partition constraint changes and we don't recheck it. This can lead to tuples mistakenly being added to the default partition that should have been rejected. Fix by rechecking the default partition constraint while descending the hierarchy. An isolation test based on the reproduction steps described by Hao Wu (with tweaks for extra coverage) is included. Backpatch to 12, where this bug came in with 898e5e3290a7. Reported by: Hao Wu <hawu@vmware.com> Author: Amit Langote <amitlangote09@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CA+HiwqFqBmcSSap4sFnCBUEL_VfOMmEKaQ3gwUhyfa4c7J_-nA@mail.gmail.com Discussion: https://postgr.es/m/DM5PR0501MB3910E97A9EDFB4C775CF3D75A42F0@DM5PR0501MB3910.namprd05.prod.outlook.com
* Fix misleading error message about inconsistent moving-aggregate types. (Tom Lane, 2020-09-06)
  We reported the wrong types when complaining that an aggregate's moving-aggregate implementation is inconsistent with its regular implementation. This was wrong since the feature was introduced, so back-patch to all supported branches. Jeff Janes Discussion: https://postgr.es/m/CAMkU=1x808LH=LPhZp9mNSP0Xd1xDqEd+XeGcvEe48dfE6xV=A@mail.gmail.com
* Remove useless lstat() call in pg_rewind. (Tom Lane, 2020-09-06)
  This is duplicative of an lstat that was just done by the calling function (traverse_datadir), besides which we weren't really doing anything with the results. There's not much point in checking to see if someone removed the file since the previous lstat, since the FILE_ACTION_REMOVE code would have to deal with missing-file cases anyway. Moreover, the "exists = false" assignment was a dead store; nothing was done with that value later. A syscall saved is a syscall earned, so back-patch to 9.5 where this code was introduced. Discussion: https://postgr.es/m/1221796.1599329320@sss.pgh.pa.us
* Make new authentication test case more robust. (Tom Lane, 2020-09-04)
  I happened to notice that the new test case I added in b55b4dad9 falls over if one runs "make check" repeatedly; though not in branches after v10. That's because it was assuming that tmp_check/pgpass wouldn't exist already. However, it's only been since v11 that the Makefiles forcibly remove all of tmp_check/ before starting a TAP run. This fix to unlink the file is therefore strictly necessary only in v10 ... but it seems wisest to do it across the board, rather than let the test rely on external logic to get the conditions right.
* Fix over-eager ping'ing in logical replication receiver. (Tom Lane, 2020-09-04)
  Commit 3f60f690f only partially fixed the broken-status-tracking issue in LogicalRepApplyLoop: we need ping_sent to have the same lifetime as last_recv_timestamp. The effects are much less serious than what that commit fixed, though. AFAICS this would just lead to extra ping requests being sent, once per second until the sender responds. Still, it's a bug, so backpatch to v10 as before. Discussion: https://postgr.es/m/959627.1599248476@sss.pgh.pa.us
* Collect attribute data on extension owned tables being dumped (Andrew Dunstan, 2020-09-04)
  If this data is not collected, pg_dump segfaults if asked for column inserts. Fix by Fabrízio de Royes Mello Backpatch to release 12 where the bug was introduced.
* C comment: correct use of 64-"byte" cache line size (Bruce Momjian, 2020-09-04)
  Reported-by: Kelly Min Discussion: https://postgr.es/m/CAPSbxatOiQO90LYpSC3+svAU9-sHgDfEP4oFhcEUt_X=DqFA9g@mail.gmail.com Backpatch-through: 9.5
* Fix rare deadlock failure in create_am regression test. (Tom Lane, 2020-09-04)
  The "DROP ACCESS METHOD gist2" test will require locking the index to be dropped and then its table; while most ordinary operations lock a table first then its index. While no concurrent test scripts should be touching fast_emp4000, autovacuum might chance to be processing that table when the DROP runs, resulting in a deadlock failure. This is pretty rare but we see it in the buildfarm from time to time. To fix, acquire a lock on fast_emp4000 before issuing the DROP. Since the point of the exercise is mostly to prevent buildfarm failures, back-patch to 9.6 where this test was introduced. Discussion: https://postgr.es/m/839004.1599185607@sss.pgh.pa.us
* Avoid lockup of a parallel worker when reporting a long error message. (Tom Lane, 2020-09-03)
  Because sigsetjmp() will restore the initial state with signals blocked, the code path in bgworker.c for reporting an error and exiting would execute that way. Usually this is fairly harmless; but if a parallel worker had an error message exceeding the shared-memory communication buffer size (16K) it would lock up, because it would wait for a resume-sending signal from its parallel leader which it would never detect. To fix, just unblock signals at the appropriate point. This can be shown to fail back to 9.6. The lack of parallel query infrastructure makes it difficult to provide a simple test case for 9.5; but I'm pretty sure the issue exists in some form there as well, so apply the code change there too. Vignesh C, reviewed by Bharath Rupireddy, Robert Haas, and myself Discussion: https://postgr.es/m/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA+-2cJcLpw-cePZ=GgDVfA@mail.gmail.com
* Fix typo in comment (Alvaro Herrera, 2020-09-01)
  Introduced by 8b08f7d4820f; backpatch to 11. Discussion: https://postgr.es/m/20200812214918.GA30353@alvherre.pgsql
* Raise error on concurrent drop of partitioned index (Alvaro Herrera, 2020-09-01)
  We were already raising an error for DROP INDEX CONCURRENTLY on a partitioned table, albeit a different and confusing one:
    ERROR: DROP INDEX CONCURRENTLY must be first action in transaction
  Change that to throw a more comprehensible error:
    ERROR: cannot drop partitioned index "%s" concurrently
  Michael Paquier authored the test case for indexes on temporary partitioned tables. Backpatch to 11, where indexes on partitioned tables were added. Reported-by: Jan Mussler <jan.mussler@zalando.de> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/16594-d2956ca909585067@postgresql.org
* Teach libpq to handle arbitrary-length lines in .pgpass files. (Tom Lane, 2020-09-01)
  Historically there's been a hard-wired assumption here that no line of a .pgpass file could be as long as NAMEDATALEN*5 bytes. That's a bit shaky to start off with, because (a) there's no reason to suppose that host names fit in NAMEDATALEN, and (b) this figure fails to allow for backslash escape characters. However, it fails completely if someone wants to use a very long password, and we're now hearing reports of people wanting to use "security tokens" that can run up to several hundred bytes. Another angle is that the file is specified to allow comment lines, but there's no reason to assume that long comment lines aren't possible. Rather than guessing at what might be a more suitable limit, let's replace the fixed-size buffer with an expansible PQExpBuffer. That adds one malloc/free cycle to the typical use-case, but that's surely pretty cheap relative to the I/O this code has to do. Also, add TAP test cases to exercise this code, because there was no test coverage before. This reverts most of commit 2eb3bc588, as there's no longer a need for a warning message about overlength .pgpass lines. (I kept the explicit check for comment lines, though.) In HEAD and v13, this also fixes an oversight in 74a308cf5: there's not much point in explicit_bzero'ing the line buffer if we only do so in two of the three exit paths. Back-patch to all supported branches, except that the test case only goes back to v10 where src/test/authentication/ was added. Discussion: https://postgr.es/m/4187382.1598909041@sss.pgh.pa.us
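  The growing-buffer idea in the .pgpass commit above can be sketched in a few lines. This is an illustration in Python, not libpq's C code: the names read_long_lines and split_fields and the chunk_size parameter are hypothetical, and the field-splitting rule (colon separators, backslash escaping the next character) is the documented password-file format as understood here. Each read is capped the way a fixed fgets() buffer would be, but the pieces are accumulated until a newline arrives, so no line is truncated.

```python
import io


def read_long_lines(stream, chunk_size=128):
    """Yield complete lines, accumulating bounded reads instead of truncating."""
    pieces = []
    while True:
        # readline(n) returns at most n characters, stopping early at a newline.
        chunk = stream.readline(chunk_size)
        if not chunk:
            break
        pieces.append(chunk)
        if chunk.endswith("\n"):
            yield "".join(pieces)
            pieces = []
    if pieces:
        # Last line of the file had no trailing newline.
        yield "".join(pieces)


def split_fields(line):
    """Split on unescaped ':'; a backslash makes the next character literal."""
    fields, current, escaped = [], [], False
    for ch in line.rstrip("\n"):
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields
```

  With this shape, a password or comment many times longer than the read size is returned whole, which is the property the commit is after.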
* C comment: remove mention of use of t_hoff WAL structure member (Bruce Momjian, 2020-08-31)
  Reported-by: Antonin Houska Discussion: https://postgr.es/m/21643.1595353537@antos Backpatch-through: 9.5
* Fix code for re-finding scan position in a multicolumn GIN index. (Tom Lane, 2020-08-27)
  collectMatchBitmap() needs to re-find the index tuple it was previously looking at, after transiently dropping lock on the index page it's on. The tuple should still exist and be at its prior position or somewhere to the right of that, since ginvacuum never removes tuples but concurrent insertions could add one. However, there was a thinko in that logic, to the effect of expecting any inserted tuples to have the same index "attnum" as what we'd been scanning. Since there's no physical separation of tuples with different attnums, it's not terribly hard to devise scenarios where this fails, leading to transient "lost saved point in index" errors. (While I've duplicated this with manual testing, it seems impossible to make a reproducible test case with our available testing technology.) Fix by just continuing the scan when the attnum doesn't match. While here, improve the error message used if we do fail, so that it matches the wording used in btree for a similar case. collectMatchBitmap()'s posting-tree code path was previously not exercised at all by our regression tests. While I can't make a regression test that exhibits the bug, I can at least improve the code coverage here, so do that. The test case I made for this is an extension of one added by 4b754d6c1, so it only works in HEAD and v13; didn't seem worth trying hard to back-patch it. Per bug #16595 from Jesse Kinkead. This has been broken since multicolumn capability was added to GIN (commit 27cb66fdf), so back-patch to all supported branches. Discussion: https://postgr.es/m/16595-633118be8eef9ce2@postgresql.org
* Avoid pushing quals down into sub-queries that have grouping sets. (Tom Lane, 2020-08-22)
  The trouble with doing this is that an apparently-constant subquery output column isn't really constant if it is a grouping column that appears in only some of the grouping sets. A qual using such a column would be subject to incorrect const-folding after push-down, as seen in bug #16585 from Paul Sivash. To fix, just disable qual pushdown altogether if the sub-query has nonempty groupingSets. While we could imagine far less restrictive solutions, there is not much point in working harder right now, because subquery_planner() won't move HAVING clauses to WHERE within such a subquery. If the qual stays in HAVING it's not going to be a lot more useful than if we'd kept it at the outer level. Having said that, this restriction could be removed if we used a parsetree representation that distinguished such outputs from actual constants, which is something I hope to do in future. Hence, make the patch a minimal addition rather than integrating it more tightly (e.g. by renumbering the existing items in subquery_is_pushdown_safe's comment). Back-patch to 9.5 where grouping sets were introduced. Discussion: https://postgr.es/m/16585-9d8c340d23ade8c1@postgresql.org
* Fix handling of CREATE TABLE LIKE with inheritance. (Tom Lane, 2020-08-21)
  If a CREATE TABLE command uses both LIKE and traditional inheritance, Vars in CHECK constraints and expression indexes that are absorbed from a LIKE parent table tended to get mis-numbered, resulting in wrong answers and/or bizarre error messages (though probably not any actual crashes, thanks to validation occurring in the executor). In v12 and up, the same could happen to Vars in GENERATED expressions, even in cases with no LIKE clause but multiple traditional-inheritance parents. The cause of the problem for LIKE is that parse_utilcmd.c supposed it could renumber such Vars correctly during transformCreateStmt(), which it cannot since we have not yet accounted for columns added via inheritance. Fix that by postponing processing of LIKE INCLUDING CONSTRAINTS, DEFAULTS, GENERATED, INDEXES till after we've performed DefineRelation(). The error with GENERATED and multiple inheritance is a simple oversight in MergeAttributes(); it knows it has to renumber Vars in inherited CHECK constraints, but forgot to apply the same processing to inherited GENERATED expressions (a/k/a defaults). Per bug #16272 from Tom Gottfried. The non-GENERATED variants of the issue are ancient, presumably dating right back to the addition of CREATE TABLE LIKE; hence back-patch to all supported branches. Discussion: https://postgr.es/m/16272-6e32da020e9a9381@postgresql.org
* Fix a few typos in JIT comments and README (David Rowley, 2020-08-21)
  Reviewed-by: Abhijit Menon-Sen Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAApHDvobgmCs6CohqhKTUf7D8vffoZXQTCBTERo9gbOeZmvLTw%40mail.gmail.com Backpatch-through: 11, where JIT was added
* Suppress unnecessary RelabelType nodes in yet more cases. (Tom Lane, 2020-08-19)
  Commit a477bfc1d fixed eval_const_expressions() to ensure that it didn't generate unnecessary RelabelType nodes, but I failed to notice that some other places in the planner had the same issue. Really noplace in the planner should be using plain makeRelabelType(), for fear of generating expressions that should be equal() to semantically equivalent trees, but aren't. An example is that because canonicalize_ec_expression() failed to be careful about this, we could end up with an equivalence class containing both a plain Const, and a Const-with-RelabelType representing exactly the same value. So far as I can tell this led to no visible misbehavior, but we did waste a bunch of cycles generating and evaluating "Const = Const-with-RelabelType" to prove such entries are redundant. Hence, move the support function added by a477bfc1d to where it can be more generally useful, and use it in the places where planner code previously used makeRelabelType. Back-patch to v12, like the previous patch. While I have no concrete evidence of any real misbehavior here, it's certainly possible that I overlooked a case where equivalent expressions that aren't equal() could cause a user-visible problem. In any case carrying extra RelabelType nodes through planning to execution isn't very desirable. Discussion: https://postgr.es/m/1311836.1597781384@sss.pgh.pa.us
* Avoid non-constant format string argument to fprintf(). (Heikki Linnakangas, 2020-08-18)
| As Tom Lane pointed out, it could defeat the compiler's printf() format string verification.
|
| Backpatch to v12, like the patch that introduced it.
|
| Discussion: https://www.postgresql.org/message-id/1069283.1597672779%40sss.pgh.pa.us
* Disable autovacuum for BRIN test table (Alvaro Herrera, 2020-08-17)
| This should improve stability in the tests.
|
| Per buildfarm member hyrax (CLOBBER_CACHE_ALWAYS) via Tom Lane.
|
| Discussion: https://postgr.es/m/871534.1597503261@sss.pgh.pa.us
* Fix printing last progress report line in client programs. (Heikki Linnakangas, 2020-08-17)
| A number of client programs have a "--progress" option that, when printing to a TTY, updates the current line by printing a '\r' and overwriting it. After the last line, '\n' needs to be printed to move the cursor to the next line. pg_basebackup and pgbench got this right, but pg_rewind and pg_checksums were slightly wrong. pg_rewind printed the newline to stdout instead of stderr, and pg_checksums printed the newline even when not printing to a TTY. Fix them, and also add a 'finished' argument to pg_basebackup's progress_report() function, to keep it consistent with the other programs.
|
| Backpatch to v12. pg_rewind's newline was broken with the logging changes in commit cc8d415117 in v12, and pg_checksums was introduced in v12.
|
| Discussion: https://www.postgresql.org/message-id/82b539e5-ae33-34b0-1aee-22b3379fd3eb@iki.fi
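The line-termination rule described above can be sketched in Python. This is an illustration under assumptions, not the C code from any of the client programs: the function below only borrows the name `progress_report` and the `finished` argument from the message, while its other parameters, the output format, and the `FakeTerminal` test double are hypothetical.

```python
import io
import sys


def progress_report(done_kb, total_kb, finished, stream=sys.stderr):
    """Write one progress line to stream (stderr by default)."""
    percent = 100 * done_kb // total_kb if total_kb else 0
    stream.write(f"{done_kb}/{total_kb} kB ({percent}%)")
    # Stay on the same line only when writing to a terminal and not done yet;
    # otherwise end the line so the next output starts on a fresh one.
    stream.write("\r" if stream.isatty() and not finished else "\n")


class FakeTerminal(io.StringIO):
    """Hypothetical test double: an in-memory stream that claims to be a TTY."""

    def isatty(self):
        return True


tty, pipe = FakeTerminal(), io.StringIO()
for out in (tty, pipe):
    progress_report(512, 1024, False, stream=out)
    progress_report(1024, 1024, True, stream=out)
```

On the fake terminal the first report ends in `'\r'` and the last in `'\n'`; on the non-TTY stream every report ends in `'\n'`, so redirected output stays readable line by line.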
* Move new LOCKTAG_DATABASE_FROZEN_IDS to end of enum LockTagType. (Noah Misch, 2020-08-15)
| Several PGXN modules reference LockTagType values; renumbering would force a recompile of those modules. Oversight in back-patch of today's commit 566372b3d6435639e4cc4476d79b8505a0297c87. Back-patch to released branches, v12 through 9.5.
|
| Reported by Tom Lane.
|
| Discussion: https://postgr.es/m/921383.1597523945@sss.pgh.pa.us
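Why the position of a new enum member matters can be shown with a small Python sketch. The member names below are illustrative placeholders, not the real LockTagType members, and `shifted_members` is a hypothetical helper. It reports which existing members would change numeric value, which is what a module compiled against the old numbering would silently get wrong.

```python
from enum import IntEnum


class OldTags(IntEnum):
    """Numbering that already-compiled modules were built against."""
    RELATION = 0
    PAGE = 1
    TUPLE = 2


class InsertedInMiddle(IntEnum):
    """New member added mid-enum: later members are renumbered."""
    RELATION = 0
    FROZEN_IDS = 1
    PAGE = 2
    TUPLE = 3


class AppendedAtEnd(IntEnum):
    """New member added last: existing numbering is preserved."""
    RELATION = 0
    PAGE = 1
    TUPLE = 2
    FROZEN_IDS = 3


def shifted_members(old, new):
    """Names of members whose numeric value differs between old and new."""
    return sorted(m.name for m in old if new[m.name].value != m.value)


broken = shifted_members(OldTags, InsertedInMiddle)
safe = shifted_members(OldTags, AppendedAtEnd)
```

Here `broken` lists the members that moved, while `safe` is empty, matching the rationale for appending the new value in released branches.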
* Prevent concurrent SimpleLruTruncate() for any given SLRU. (Noah Misch, 2020-08-15)
| The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit. If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database. Back-patch to 9.5 (all supported versions).
|
| Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
* Be more careful about the shape of hashable subplan clauses. (Tom Lane, 2020-08-14)
| nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan has the form of one or more OpExprs whose LHS is an expression of the outer query's, while the RHS is an expression over Params representing output columns of the subquery. However, the planner only went as far as verifying that the clauses were all binary OpExprs. This works 99.99% of the time, because the clauses have the right shape when emitted by the parser --- but it's possible for function inlining to break that, as reported by PegoraroF10.
|
| To fix, teach the planner to check that the LHS and RHS contain the right things, or more accurately don't contain the wrong things. Given that this has been broken for years without anyone noticing, it seems sufficient to just give up hashing when it happens, rather than go to the trouble of commuting the clauses back again (which wouldn't necessarily work anyway).
|
| While poking at that, I also noticed that nodeSubplan.c had a baked-in assumption that the number of hash clauses is identical to the number of subquery output columns. Again, that's fine as far as parser output goes, but it's not hard to break it via function inlining. There seems little reason for that assumption though --- AFAICS, the only thing it's buying us is not having to store the number of hash clauses explicitly. Adding code to the planner to reject such cases would take more code than getting nodeSubplan.c to cope, so I fixed it that way.
|
| This has been broken for as long as we've had hashable SubPlans, so back-patch to all supported branches.
|
| Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
* pg_dump: fix dependencies on FKs to partitioned tables (Alvaro Herrera, 2020-08-14)
| Parallel-restoring a foreign key that references a partitioned table with several levels of partitions can fail:
|
|   pg_restore: while PROCESSING TOC:
|   pg_restore: from TOC entry 6684; 2606 29166 FK CONSTRAINT fk fk_a_fkey postgres
|   pg_restore: error: could not execute query: ERROR: there is no unique constraint matching given keys for referenced table "pk"
|   Command was: ALTER TABLE fkpart3.fk
|       ADD CONSTRAINT fk_a_fkey FOREIGN KEY (a) REFERENCES fkpart3.pk(a);
|
| This happens in parallel restore mode because some index partitions aren't yet attached to the topmost partitioned index that the FK uses, and so the index is still invalid. The current code marks the FK as dependent on the first level of index-attach dump objects; the bug is fixed by recursively marking the FK on their children.
|
| Backpatch to 12, where FKs to partitioned tables were introduced.
|
| Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
| Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
| Discussion: https://postgr.es/m/3170626.1594842723@sss.pgh.pa.us
| Backpatch: 12-master
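The difference between first-level and recursive dependency marking can be sketched in Python. This is a toy model, not pg_dump code: the index names, the `children` mapping, and the helpers `attach_steps_below` and `fk_dependencies` are hypothetical, and each "attach" step is represented as a string.

```python
def attach_steps_below(index, children):
    """Yield every index-attach step beneath index, at any depth."""
    for child in children.get(index, []):
        yield f"attach {child} to {index}"
        # Recurse: the parent index is only valid once grandchildren are attached too.
        yield from attach_steps_below(child, children)


def fk_dependencies(referenced_index, children):
    """All attach steps the FK must wait for before it can be restored."""
    return list(attach_steps_below(referenced_index, children))


# Hypothetical two-level partitioned index hierarchy.
children = {
    "pk_pkey": ["pk1_pkey", "pk2_pkey"],
    "pk1_pkey": ["pk11_pkey"],
}

first_level_only = [f"attach {c} to pk_pkey" for c in children["pk_pkey"]]
all_levels = fk_dependencies("pk_pkey", children)
```

`first_level_only` omits the step attaching `pk11_pkey`, so a parallel restore could attempt the FK while `pk1_pkey` (and therefore `pk_pkey`) is still invalid; `all_levels` includes it.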
* Fix postmaster's behavior during smart shutdown. (Tom Lane, 2020-08-14)
| Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the postmaster has immediately killed all "optional" background processes, and subsequently refused to launch new ones while it's waiting for foreground client processes to exit. No doubt this seemed like an OK policy at some point; but it's a pretty bad one now, because it makes for a seriously degraded environment for the remaining clients:
|
| * Parallel queries are killed, and new ones fail to launch. (And our parallel-query infrastructure utterly fails to deal with the case in a reasonable way --- it just hangs waiting for workers that are not going to arrive. There is more work needed in that area IMO.)
|
| * Autovacuum ceases to function. We can tolerate that for a while, but if bulk-update queries continue to run in the surviving client sessions, there's eventually going to be a mess. In the worst case the system could reach a forced shutdown to prevent XID wraparound.
|
| * The bgwriter and walwriter are also stopped immediately, likely resulting in performance degradation.
|
| Hence, let's rearrange things so that the only immediate change in behavior is refusing to let in new normal connections. Once the last normal connection is gone, shut everything down as though we'd received a "fast" shutdown.
|
| To implement this, remove the PM_WAIT_BACKUP and PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY while normal connections remain. A subsidiary state variable tracks whether or not we're letting in new connections in those states.
|
| This also allows having just one copy of the logic for killing child processes in smart and fast shutdown modes. I moved that logic into PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
|
| Back-patch to 9.6 where parallel query was added. In principle this'd be a good idea in 9.5 as well, but the risk/reward ratio is not as good there, since lack of autovacuum is not a problem during typical uses of smart shutdown.
|
| Per report from Bharath Rupireddy.
|
| Patch by me, reviewed by Thomas Munro
|
| Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com
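The revised shutdown flow can be sketched as a toy state machine in Python. This is a heavily simplified model under assumptions, not the postmaster: `RUN` and `STOP_BACKENDS` echo the PM_RUN and PM_STOP_BACKENDS states named in the message, while `STOPPED`, the `Postmaster` class, and all of its attributes and methods are hypothetical.

```python
from enum import Enum, auto


class PMState(Enum):
    RUN = auto()
    STOP_BACKENDS = auto()
    STOPPED = auto()  # hypothetical: stands in for all later shutdown states


class Postmaster:
    def __init__(self):
        self.state = PMState.RUN
        self.conns_allowed = True  # the "subsidiary state variable"
        self.normal_conns = 0
        self.background_running = True  # autovacuum, bgwriter, parallel workers...

    def connect(self):
        if self.state is not PMState.RUN or not self.conns_allowed:
            return False
        self.normal_conns += 1
        return True

    def disconnect(self):
        self.normal_conns -= 1
        self._advance()

    def smart_shutdown(self):
        # Only immediate effect: refuse new normal connections.
        self.conns_allowed = False
        self._advance()

    def fast_shutdown(self):
        self.conns_allowed = False
        self.state = PMState.STOP_BACKENDS
        self._advance()

    def _advance(self):
        # Smart shutdown turns into the fast path once the last client is gone.
        if (self.state is PMState.RUN and not self.conns_allowed
                and self.normal_conns == 0):
            self.state = PMState.STOP_BACKENDS
        # Single copy of the "kill children" logic for both modes.
        if self.state is PMState.STOP_BACKENDS:
            self.background_running = False
            self.normal_conns = 0
            self.state = PMState.STOPPED


pm = Postmaster()
pm.connect()
pm.smart_shutdown()
during = (pm.state, pm.background_running, pm.connect())
pm.disconnect()
after = (pm.state, pm.background_running)
```

While the existing client remains, the model stays in `RUN` with background work still available and new connections refused; once that client disconnects, it passes through `STOP_BACKENDS` just as a fast shutdown would.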
* Fix typo in test comment. (Heikki Linnakangas, 2020-08-14)
|
* Handle new HOT chains in index-build table scans (Alvaro Herrera, 2020-08-13)
| When a table is scanned by heapam_index_build_range_scan (née IndexBuildHeapScan) and the table lock being held allows concurrent data changes, it is possible for new HOT chains to sprout in a page that were unknown when the scan of a page happened. This leads to an error such as
|
|   ERROR: failed to find parent tuple for heap-only tuple at (X,Y) in table "tbl"
|
| because the root tuple was not present when we first obtained the list of the page's root tuples. This can be fixed by re-obtaining the list of root tuples, if we see that a heap-only tuple appears to point to a non-existing root.
|
| This was reported by Anastasia as occurring for BRIN summarization (which has existed since 9.5), but I think it could theoretically also happen with CREATE INDEX CONCURRENTLY (much older) or REINDEX CONCURRENTLY (very recent). It seems a happy coincidence that BRIN forces us to backpatch this all the way to 9.5.
|
| Reported-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
| Diagnosed-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
| Co-authored-by: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
| Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
| Discussion: https://postgr.es/m/602d8487-f0b2-5486-0088-0f372b2549fa@postgrespro.ru
| Backpatch: 9.5 - master
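The "re-obtain the list of root tuples" fix can be sketched in Python. This is a toy model, not heapam code: a page is a dict whose hypothetical `"hot_chains"` entry maps each heap-only tuple's offset to its root offset, and `get_root_offsets` and `find_root` are illustrative helpers.

```python
def get_root_offsets(page):
    """Snapshot the page: heap-only tuple offset -> root tuple offset."""
    return dict(page["hot_chains"])


def find_root(page, cached_roots, offset):
    """Look up a heap-only tuple's root, refreshing a stale snapshot once."""
    root = cached_roots.get(offset)
    if root is None:
        # The chain may have sprouted after the snapshot was taken;
        # re-read the page before concluding that the root is missing.
        cached_roots.clear()
        cached_roots.update(get_root_offsets(page))
        root = cached_roots.get(offset)
    if root is None:
        raise LookupError(
            f"failed to find parent tuple for heap-only tuple at offset {offset}"
        )
    return root


page = {"hot_chains": {2: 1}}
cached = get_root_offsets(page)

# A concurrent update creates a new HOT chain after the snapshot.
page["hot_chains"][4] = 3

root_of_new_chain = find_root(page, cached, 4)
```

Without the refresh, the lookup for offset 4 would fail against the stale snapshot even though the page now holds a valid root. The error is still raised when the root is absent after re-reading.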
* BRIN: Handle concurrent desummarization properly (Alvaro Herrera, 2020-08-12)
| If a page range is desummarized at just the right time concurrently with an index walk, BRIN would raise an error indicating index corruption. This is scary and unhelpful; silently returning that the page range is not summarized is sufficient reaction.
|
| This bug was introduced by commit 975ad4e602ff as additional protection against a bug whose actual fix was elsewhere. Backpatch equally.
|
| Reported-By: Anastasia Lubennikova <a.lubennikova@postgrespro.ru>
| Diagnosed-By: Alexander Lakhin <exclusion@gmail.com>
| Discussion: https://postgr.es/m/2588667e-d07d-7e10-74e2-7e1e46194491@postgrespro.ru
| Backpatch: 9.5 - master
* Stamp 12.4. (tag: REL_12_4) (Tom Lane, 2020-08-10)
|
* Empty search_path in logical replication apply worker and walsender. (Noah Misch, 2020-08-10)
| This is like CVE-2018-1058 commit 582edc369cdbd348d68441fc50fa26a84afd0c1a. Today, a malicious user of a publisher or subscriber database can invoke arbitrary SQL functions under an identity running replication, often a superuser. This fix may cause "does not exist" or "no schema has been selected to create in" errors in a replication process. After upgrading, consider watching server logs for these errors. Objects accruing schema qualification in the wake of the earlier commit are unlikely to need further correction. Back-patch to v10, which introduced logical replication.
|
| Security: CVE-2020-14349