aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Fix misbehavior with expression indexes on ON COMMIT DELETE ROWS tables.Tom Lane2019-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We implement ON COMMIT DELETE ROWS by truncating tables marked that way, which requires also truncating/rebuilding their indexes. But RelationTruncateIndexes asks the relcache for up-to-date copies of any index expressions, which may cause execution of eval_const_expressions on them, which can result in actual execution of subexpressions. This is a bad thing to have happening during ON COMMIT. Manuel Rigger reported that use of a SQL function resulted in crashes due to expectations that ActiveSnapshot would be set, which it isn't. The most obvious fix perhaps would be to push a snapshot during PreCommit_on_commit_actions, but I think that would just open the door to more problems: CommitTransaction explicitly expects that no user-defined code can be running at this point. Fortunately, since we know that no tuples exist to be indexed, there seems no need to use the real index expressions or predicates during RelationTruncateIndexes. We can set up dummy index expressions instead (we do need something that will expose the right data type, as there are places that build index tupdescs based on this), and just ignore predicates and exclusion constraints. In a green field it'd likely be better to reimplement ON COMMIT DELETE ROWS using the same "init fork" infrastructure used for unlogged relations. That seems impractical without catalog changes though, and even without that it'd be too big a change to back-patch. So for now do it like this. Per private report from Manuel Rigger. This has been broken forever, so back-patch to all supported branches.
* Small code simplificationPeter Eisentraut2019-11-29
| | | | FLOAT8PASSBYVAL can be used instead of USE_FLOAT8_BYVAL here.
* Make allow_system_table_mods settable at run timePeter Eisentraut2019-11-29
| | | | | | | | | | | | | Make allow_system_table_mods settable at run time by superusers. It was previously postmaster start only. We don't want to make system catalog DDL wide-open, but there are occasionally useful things to do like setting reloptions or statistics on a busy system table, and blocking those doesn't help anyone. Also, this enables the possibility of writing a test suite for this setting. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8b00ea5e-28a7-88ba-e848-21528b632354%402ndquadrant.com
* Remove any-user DML capability from allow_system_table_modsPeter Eisentraut2019-11-29
| | | | | | | | | | | | Previously, allow_system_table_mods allowed a non-superuser to do DML on a system table without further permission checks. This has been removed, as it was quite inconsistent with the rest of the meaning of this setting. (Since allow_system_table_mods was previously only accessible with a server restart, it is unlikely that anyone was using this possibility.) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8b00ea5e-28a7-88ba-e848-21528b632354%402ndquadrant.com
* Add error position to an error messagePeter Eisentraut2019-11-29
| | | | | Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/6e7aa4a1-be6a-1a75-b1f9-83a678e5184a@2ndquadrant.com
* Remove unnecessary clauses_attnums variableTomas Vondra2019-11-28
| | | | | | | | | | Commit c676e659b2 reworked how choose_best_statistics() picks the best extended statistics, but failed to remove clauses_attnums which is now unnecessary. So get rid of it and backpatch to 12, same as c676e659b2. Author: Tomas Vondra Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com Backpatch-through: 12
* Fix choose_best_statistics to check clauses individuallyTomas Vondra2019-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When picking the best extended statistics object for a list of clauses, it's not enough to look at attnums extracted from the clause list as a whole. Consider for example this query with OR clauses: SELECT * FROM t WHERE (t.a = 1) OR (t.b = 1) OR (t.c = 1) with a statistics defined on columns (a,b). Relying on attnums extracted from the whole OR clause, we'd consider the statistics usable. That does not work, as we see the conditions as a single OR-clause, referencing an attribute not covered by the statistic, leading to empty list of clauses to be estimated using the statistics and an assert failure. This changes choose_best_statistics to check which clauses are actually covered, and only using attributes from the fully covered ones. For the previous example this means the statistics object will not be considered as compatible with the OR-clause. Backpatch to 12, where MCVs were introduced. The issue does not affect older versions because functional dependencies don't handle OR clauses. Author: Tomas Vondra Reviewed-by: Dean Rasheed Reported-By: Manuel Rigger Discussion: https://postgr.es/m/CA+u7OA7H5rcE2=8f263w4NZD6ipO_XOrYB816nuLXbmSTH9pQQ@mail.gmail.com Backpatch-through: 12
* Remove useless "return;" linesAlvaro Herrera2019-11-28
| | | | Discussion: https://postgr.es/m/20191128144653.GA27883@alvherre.pgsql
* Fix typo in comment.Etsuro Fujita2019-11-27
|
* Allow access to child table statistics if user can read parent table.Tom Lane2019-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fix for CVE-2017-7484 disallowed use of pg_statistic data for planning purposes if the user would not be able to select the associated column and a non-leakproof function is to be applied to the statistics values. That turns out to disable use of pg_statistic data in some common cases involving inheritance/partitioning, where the user does have permission to select from the parent table that was actually named in the query, but not from a child table whose stats are needed. Since, in non-corner cases, the user *can* select the child table's data via the parent, this restriction is not actually useful from a security standpoint. Improve the logic so that we also check the permissions of the originally-named table, and allow access if select permission exists for that. When checking access to stats for a simple child column, we can map the child column number back to the parent, and perform this test exactly (including not allowing access if the child column isn't exposed by the parent). For expression indexes, the current logic just insists on whole-table select access, and this patch allows access if the user can select the whole parent table. In principle, if the child table has extra columns, this might allow access to stats on columns the user can't read. In practice, it's unlikely that the planner is going to do any stats calculations involving expressions that are not visible to the query, so we'll ignore that fine point for now. Perhaps someday we'll improve that logic to detect exactly which columns are used by an expression index ... but today is not that day. Back-patch to v11. The issue was created in 9.2 and up by the CVE-2017-7484 fix, but this patch depends on the append_rel_array[] planner data structure which only exists in v11 and up. In practice the issue is most urgent with partitioned tables, so fixing v11 and later should satisfy much of the practical need. Dilip Kumar and Amit Langote, with some kibitzing by me Discussion: https://postgr.es/m/3876.1531261875@sss.pgh.pa.us
* Add safeguards for pg_fsync() called with incorrectly-opened fdsMichael Paquier2019-11-26
| | | | | | | | | | | | | | | | | | | | | | | | On some platforms, fsync() returns EBADFD when opening a file descriptor with O_RDONLY (read-only), leading ultimately now to a PANIC to prevent data corruption. This commit adds a new sanity check in pg_fsync() based on fcntl() to make sure that we don't repeat again mistakes with incorrectly-set file descriptors so as problems are detected at an early stage. Without that, such errors could only be detected after running Postgres on a specific supported platform for the culprit code path, which could take some time before being found. b8e19b93 was a fix for such a problem, which got undetected for more than 5 years, and a586cc4b fixed another similar issue. Note that the new check added works as well when fsync=off is configured, so as all regression tests would detect problems as long as assertions are enabled. fcntl() being not available on Windows, the new checks do not happen there. Author: Michael Paquier Reviewed-by: Mark Dilger Discussion: https://postgr.es/m/20191009062640.GB21379@paquier.xyz
* Don't shut down Gather[Merge] early under Limit.Amit Kapila2019-11-26
| | | | | | | | | | | | | | | | | | | Revert part of commit 19df1702f5. Early shutdown was added by that commit so that we could collect statistics from workers, but unfortunately, it interacted badly with rescans. The problem is that we ended up destroying the parallel context which is required for rescans. This leads to rescans of a Limit node over a Gather node to produce unpredictable results as it tries to access destroyed parallel context. By reverting the early shutdown code, we might lose statistics in some cases of Limit over Gather [Merge], but that will require further study to fix. Reported-by: Jerry Sievers Diagnosed-by: Thomas Munro Author: Amit Kapila, testcase by Vignesh C Backpatch-through: 9.6 Discussion: https://postgr.es/m/87ims2amh6.fsf@jsievers.enova.com
* Use procsignal_sigusr1_handler for auxiliary processes.Robert Haas2019-11-25
| | | | | | | | | | | | | | | AuxiliaryProcessMain does ProcSignalInit, so one might expect that auxiliary processes would need to respond to SendProcSignal, but none of the auxiliary processes do that. Change them to use procsignal_sigusr1_handler instead of their own private handlers so that they do. Besides seeming more correct, this is also less code. It shouldn't make any functional difference right now because, as far as we know, there are no current cases where SendProcSignal targets an auxiliary process, but there are plans to change that in the future. Andres Freund Discussion: http://postgr.es/m/20181030051643.elbxjww5jjgnjaxg@alap3.anarazel.de
* Refactor WAL file-reading code into WALRead()Alvaro Herrera2019-11-25
| | | | | | | | | | | | | | | | | | | | | XLogReader, walsender and pg_waldump all had their own routines to read data from WAL files to memory, with slightly different approaches according to the particular conditions of each environment. There's a lot of commonality, so we can refactor that into a single routine WALRead in XLogReader, and move the differences to a separate (simpler) callback that just opens the next WAL-segment. This results in a clearer (ahem) code flow. The error reporting needs are covered by filling in a new error-info struct, WALReadError, and it's the caller's responsibility to act on it. The backend has WALReadRaiseError() to do so. We no longer ever need to seek in this interface; switch to using pg_pread(). Author: Antonin Houska, with contributions from Álvaro Herrera Reviewed-by: Michaël Paquier, Kyotaro Horiguchi Discussion: https://postgr.es/m/14984.1554998742@spoje.net
* Fix unportable printf format introduced in commit 9290ad198.Tom Lane2019-11-25
| | | | | | | "%ld" is not an acceptable format spec for int64 variables, though it accidentally works on most non-Windows 64-bit platforms. Follow the lead of commit 6a1cd8b92, and use "%lld" with an explicit cast to long long. Per buildfarm.
* Make the order of the header file includes consistent.Amit Kapila2019-11-25
| | | | | | | | | Similar to commits 14aec03502, 7e735035f2 and dddf4cdc33, this commit makes the order of header file inclusion consistent in more places. Author: Vignesh C Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
* Fix inconsistent variable name in static function of mac8.cMichael Paquier2019-11-25
| | | | | | | Both argument names were reversed in the declaration of the function. Author: Ranier Vilela Discussion: https://postgr.es/m/MN2PR18MB292755AEFF9A9144B220ABEEE34B0@MN2PR18MB2927.namprd18.prod.outlook.com
* Refactor reloption handling for index AMs in-coreMichael Paquier2019-11-25
| | | | | | | | | | | | | | | | | | This reworks the reloption parsing and build of a couple of index AMs by creating new structures for each index AM's options. This split was already done for BRIN, GIN and GiST (which actually has a fillfactor parameter), but not for hash, B-tree and SPGiST which relied on StdRdOptions due to an overlap with the default option set. This saves a couple of bytes for rd_options in each relcache entry with indexes making use of relation options, and brings more consistency between all index AMs. While on it, add a couple of AssertMacro() calls to make sure that utility macros to grab values of reloptions are used with the expected index AM. Author: Nikolay Shaplov Reviewed-by: Amit Langote, Michael Paquier, Álvaro Herrera, Dent John Discussion: https://postgr.es/m/4127670.gFlpRb6XCm@x200m
* Doc: improve discussion of race conditions involved in LISTEN.Tom Lane2019-11-24
| | | | | | | | The user docs didn't really explain how to use LISTEN safely, so clarify that. Also clean up some fuzzy-headed explanations in comments. No code changes. Discussion: https://postgr.es/m/3ac7f397-4d5f-be8e-f354-440020675694@gmail.com
* Avoid assertion failure with LISTEN in a serializable transaction.Tom Lane2019-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If LISTEN is the only action in a serializable-mode transaction, and the session was not previously listening, and the notify queue is not empty, predicate.c reported an assertion failure. That happened because we'd acquire the transaction's initial snapshot during PreCommit_Notify, which was called *after* predicate.c expects any such snapshot to have been established. To fix, just swap the order of the PreCommit_Notify and PreCommit_CheckForSerializationFailure calls during CommitTransaction. This will imply holding the notify-insertion lock slightly longer, but the difference could only be meaningful in serializable mode, which is an expensive option anyway. It appears that this is just an assertion failure, with no consequences in non-assert builds. A snapshot used only to scan the notify queue could not have been involved in any serialization conflicts, so there would be nothing for PreCommit_CheckForSerializationFailure to do except assign it a prepareSeqNo and set the SXACT_FLAG_PREPARED flag. And given no conflicts, neither of those omissions affect the behavior of ReleasePredicateLocks. This admittedly once-over-lightly analysis is backed up by the lack of field reports of trouble. Per report from Mark Dilger. The bug is old, so back-patch to all supported branches; but the new test case only goes back to 9.6, for lack of adequate isolationtester infrastructure before that. Discussion: https://postgr.es/m/3ac7f397-4d5f-be8e-f354-440020675694@gmail.com Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
* Stabilize NOTIFY behavior by transmitting notifies before ReadyForQuery.Tom Lane2019-11-24
| | | | | | | | | | | | | | | | | | | | | | This patch ensures that, if any notify messages were received during a just-finished transaction, they get sent to the frontend just before not just after the ReadyForQuery message. With libpq and other client libraries that act similarly, this guarantees that the client will see the notify messages as available as soon as it thinks the transaction is done. This probably makes no difference in practice, since in realistic use-cases the application would have to cope with asynchronous arrival of notify events anyhow. However, it makes it a lot easier to build cross-session-notify test cases with stable behavior. I'm a bit surprised now that we've not seen any buildfarm instability with the test cases added by commit b10f40bf0. Tests that I intend to add in an upcoming bug fix are definitely unstable without this. Back-patch to 9.6, which is as far back as we can do NOTIFY testing with the isolationtester infrastructure. Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
* Stabilize the results of pg_notification_queue_usage().Tom Lane2019-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function wasn't touched in commit 51004c717, but that turns out to be a bad idea, because its results now include any dead space that exists in the NOTIFY queue on account of our being lazy about advancing the queue tail. Notably, the isolation tests now fail if run twice without a server restart between, because async-notify's first test of the function will already show a positive value. It seems likely that end users would be equally unhappy about the result's instability. To fix, just make the function call asyncQueueAdvanceTail before computing its result. That should end in producing the same value as before, and it's hard to believe that there's any practical use-case where pg_notification_queue_usage() is called so often as to create a performance degradation, especially compared to what we did before. Out of paranoia, also mark this function parallel-restricted (it was volatile, but parallel-safe by default, before). Although the code seems to work fine when run in a parallel worker, that's outside the design scope of async.c, and it's a bit scary to have intentional side-effects happening in a parallel worker. There seems no plausible use-case where it'd be important to try to parallelize this, so let's not take any risk of introducing new bugs. In passing, re-pgindent async.c and run reformat-dat-files on pg_proc.dat, just because I'm a neatnik. Discussion: https://postgr.es/m/13881.1574557302@sss.pgh.pa.us
* Remove debugging aidAlvaro Herrera2019-11-23
| | | | | | | This Assert(false) was not supposed to be in the committed copy. Reported by: Tom Lane Discussion: https://postgr.es/m/26476.1574525468@sss.pgh.pa.us
* Add object TRUNCATE hookJoe Conway2019-11-23
| | | | | | | | | | | | | All operations with acl permissions checks should have a corresponding hook so that, for example, mandatory access control (MAC) may be enforced by an extension. The command TRUNCATE is missing this hook, so add it. Patch by Yuli Khodorkovskiy with some editorialization by me. Based on the discussion not back-patched. A separate patch will exercise the hook in the sepgsql extension. Author: Yuli Khodorkovskiy Reviewed-by: Joe Conway Discussion: https://postgr.es/m/CAFL5wJcomybj1Xdw7qWmPJRpGuFukKgNrDb6uVBaCMgYS9dkaA%40mail.gmail.com
* Fix bogus tuple-slot management in logical replication UPDATE handling.Tom Lane2019-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | slot_modify_cstrings seriously abused the TupleTableSlot API by relying on a slot's underlying data to stay valid across ExecClearTuple. Since this abuse was also quite undocumented, it's little surprise that the case got broken during the v12 slot rewrites. As reported in bug #16129 from Ondřej Jirman, this could lead to crashes or data corruption when a logical replication subscriber processes a row update. Problems would only arise if the subscriber's table contained columns of pass-by-ref types that were not being copied from the publisher. Fix by explicitly copying the datum/isnull arrays from the source slot that the old row was in already. This ends up being about the same thing that happened pre-v12, but hopefully in a less opaque and fragile way. We might've caught the problem sooner if there were any test cases dealing with updates involving non-replicated or dropped columns. Now there are. Back-patch to v10 where this code came in. Even though the failure does not manifest before v12, IMO this code is too fragile to leave as-is. In any case we certainly want the additional test coverage. Patch by me; thanks to Tomas Vondra for initial investigation. Discussion: https://postgr.es/m/16129-a0c0f48e71741e5f@postgresql.org
* Defend against self-referential views in relation_is_updatable().Tom Lane2019-11-21
| | | | | | | | | | | | | | | | | | | | | While a self-referential view doesn't actually work, it's possible to create one, and it turns out that this breaks some of the information_schema views. Those views call relation_is_updatable(), which neglected to consider the hazards of being recursive. In older PG versions you get a "stack depth limit exceeded" error, but since v10 it'd recurse to the point of stack overrun and crash, because commit a4c35ea1c took out the expression_returns_set() call that was incidentally checking the stack depth. Since this function is only used by information_schema views, it seems like it'd be better to return "not updatable" than suffer an error. Hence, add tracking of what views we're examining, in just the same way that the nearby fireRIRrules() code detects self-referential views. I added a check_stack_depth() call too, just to be defensive. Per private report from Manuel Rigger. Back-patch to all supported versions.
* Remove configure --disable-float4-byvalPeter Eisentraut2019-11-21
| | | | | | | | | | | This build option was only useful to maintain compatibility for version-0 functions, but those are no longer supported, so this option can be removed. float4 is now always pass-by-value; the pass-by-reference code path is completely removed. Discussion: https://www.postgresql.org/message-id/flat/f3e1e576-2749-bbd7-2d57-3f9dcf75255a@2ndquadrant.com
* Make DROP DATABASE command generate less WAL records.Fujii Masao2019-11-21
| | | | | | | | | | | | | | | | | | Previously DROP DATABASE generated as many XLOG_DBASE_DROP WAL records as the number of tablespaces that the database to drop uses. This caused the scans of shared_buffers as many times as the number of the tablespaces during recovery because WAL replay of one XLOG_DBASE_DROP record needs that full scan. This could make the recovery time longer especially when shared_buffers is large. This commit changes DROP DATABASE so that it generates only one XLOG_DBASE_DROP record, and registers the information of all the tablespaces into it. Then, WAL replay of XLOG_DBASE_DROP record needs full scan of shared_buffers only once, and which may improve the recovery performance. Author: Fujii Masao Reviewed-by: Kirk Jamison, Simon Riggs Discussion: https://postgr.es/m/CAHGQGwF8YwNH0ZaL+2wjZPkj+ji9UhC+Z4ScnG97WKtVY5L9iw@mail.gmail.com
* Allow ALTER VIEW command to rename the column in the view.Fujii Masao2019-11-21
| | | | | | | | | ALTER TABLE RENAME COLUMN command always can be used to rename the column in the view, but it's reasonable to add that syntax to ALTER VIEW too. Author: Fujii Masao Reviewed-by: Ibrar Ahmed, Yu Kimura Discussion: https://postgr.es/m/CAHGQGwHoQMD3b-MqTLcp1MgdhCpOKU7QNRwjFooT4_d+ti5v6g@mail.gmail.com
* Track statistics for spilling of changes from ReorderBuffer.Amit Kapila2019-11-21
| | | | | | | | | | This adds the statistics about transactions spilled to disk from ReorderBuffer. Users can query the pg_stat_replication view to check these stats. Author: Tomas Vondra, with bug-fixes and minor changes by Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Provide statistics for hypothetical BRIN indexesMichael Paquier2019-11-21
| | | | | | | | | | | | | | | | | | | | | | Trying to use hypothetical indexes with BRIN currently fails when trying to access a relation that does not exist when looking for the statistics. With the current API, it is not possible to easily pass a value for pages_per_range down to the hypothetical index, so this makes use of the default value of BRIN_DEFAULT_PAGES_PER_RANGE, which should be fine enough in most cases. Being able to refine or enforce the hypothetical costs in more optimistic ways would require more refactoring by filling in the statistics when building IndexOptInfo in plancat.c. This would involve ABI breakages around the costing routines, something not fit for stable branches. This is broken since 7e534ad, so backpatch down to v10. Author: Julien Rouhaud, Heikki Linnakangas Reviewed-by: Álvaro Herrera, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAOBaU_ZH0LKEA8VFCocr6Lpte1ab0b6FpvgS0y4way+RPSXfYg@mail.gmail.com Backpatch-through: 10
* Sync patternsel_common's operator selection logic with pattern_prefix's.Tom Lane2019-11-20
| | | | | | | | | | | | | | | | | | | | | Make patternsel_common() select the comparison operators to use with hardwired logic that matches pattern_prefix()'s new logic, eliminating its dependencies on particular index opfamilies. This shouldn't change any behavior, as it's just replacing runtime operator lookups with the same values hard-wired. But it makes these closely-related functions look more alike, and saving some runtime syscache lookups is worth something. Actually, it's not quite true that this is zero behavioral change: when estimating for a column of type "name", the comparison constant will be kept as "text" not coerced to "name". But that's more correct anyway, and it allows additional simplification of the coercion logic, again syncing this more closely with pattern_prefix(). Per consideration of a report from Manuel Rigger. Discussion: https://postgr.es/m/CA+u7OA7nnGYy8rY0vdTe811NuA+Frr9nbcBO9u2Z+JxqNaud+g@mail.gmail.com
* Fix HeapTupleSatisfiesNonVacuumable() comment.Peter Geoghegan2019-11-20
| | | | Oversight in commit 63746189b23.
* Reduce match_pattern_prefix()'s dependencies on index opfamilies.Tom Lane2019-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically, the planner's LIKE/regex index optimizations were only carried out for specific index opfamilies. That's never been a great idea from the standpoint of extensibility, but it didn't matter so much as long as we had no practical way to extend such behaviors anyway. With the addition of planner support functions, and in view of ongoing work to support additional table and index AMs, it seems like a good time to relax this. Hence, recast the decisions in match_pattern_prefix() so that rather than decide which operators to generate by looking at what the index opfamily contains, we decide which operators to generate a-priori and then see if the opfamily supports them. This is much more defensible from a semantic standpoint anyway, since we know the semantics of the chosen operators precisely, and we only need to assume that the opfamily correctly implements operators it claims to support. The existing "pattern" opfamilies put a crimp in this approach, since we need to select the pattern operators if we want those to work. So we still have to special-case those opfamilies. But that seems all right, since in view of the addition of collations, the pattern opfamilies seem like a legacy hack that nobody will be building on. The only immediate effect of this change, so far as the core code is concerned, is that anchored LIKE/regex patterns can be mapped onto BRIN index searches, and exact-match patterns can be mapped onto hash indexes, not only btree and spgist indexes as before. That's not a terribly exciting result, but it does fix an omission mentioned in the ancient comments here. Note: no catversion bump, even though this touches pg_operator.dat, because it's only adding OID macros not changing the contents of postgres.bki. Per consideration of a report from Manuel Rigger. Discussion: https://postgr.es/m/CA+u7OA7nnGYy8rY0vdTe811NuA+Frr9nbcBO9u2Z+JxqNaud+g@mail.gmail.com
* Fix corner-case failure in match_pattern_prefix().Tom Lane2019-11-19
| | | | | | | | | | | | | | | | | | | | | | | | The planner's optimization code for LIKE and regex operators could error out with a complaint like "no = operator for opfamily NNN" if someone created a binary-compatible index (for example, a bpchar_ops index on a text column) on the LIKE's left argument. This is a consequence of careless refactoring in commit 74dfe58a5. The old code in match_special_index_operator only accepted specific combinations of the pattern operator and the index opclass, thereby indirectly guaranteeing that the opclass would have a comparison operator with the same LHS input type as the pattern operator. While moving the logic out to a planner support function, I simplified that test in a way that no longer guarantees that. Really though we'd like an altogether weaker dependency on the opclass, so rather than put back exactly the old code, just allow lookup failure. I have in mind now to rewrite this logic completely, but this is the minimum change needed to fix the bug in v12. Per report from Manuel Rigger. Back-patch to v12 where the mistake came in. Discussion: https://postgr.es/m/CA+u7OA7nnGYy8rY0vdTe811NuA+Frr9nbcBO9u2Z+JxqNaud+g@mail.gmail.com
* Fix page modification outside of critical section in GINAlexander Korotkov2019-11-20
| | | | | | | | By oversight 52ac6cd2d0 makes ginDeletePage() sets pd_prune_xid of page to be deleted before entering the critical section. It appears that only versions 11 and later were affected by this oversight. Backpatch-through: 11
* Revise GIN READMEAlexander Korotkov2019-11-20
| | | | | | | | | | | | | | | | | We find GIN concurrency bugs from time to time. One of the problems here is that concurrency of GIN isn't well-documented in README. So, it might be even hard to distinguish design bugs from implementation bugs. This commit revised concurrency section in GIN README providing more details. Some examples are illustrated in ASCII art. Also, this commit add the explanation of how is tuple layout in internal GIN B-tree page different in comparison with nbtree. Discussion: https://postgr.es/m/CAPpHfduXR_ywyaVN4%2BOYEGaw%3DcPLzWX6RxYLBncKw8de9vOkqw%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Peter Geoghegan Backpatch-through: 9.4
* Fix traversing to the deleted GIN page via downlinkAlexander Korotkov2019-11-20
| | | | | | | | | | | | | | | Current GIN code appears to don't handle traversing to the deleted page via downlink. This commit fixes that by stepping right from the delete page like we do in nbtree. This commit also fixes setting 'deleted' flag to the GIN pages. Now other page flags are not erased once page is deleted. That helps to keep our assertions true if we arrive deleted page via downlink. Discussion: https://postgr.es/m/CAPpHfdvMvsw-NcE5bRS7R1BbvA4BxoDnVVjkXC5W0Czvy9LVrg%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Peter Geoghegan Backpatch-through: 9.4
* Fix deadlock between ginDeletePage() and ginStepRight()Alexander Korotkov2019-11-20
| | | | | | | | | | | | | | | | | When ginDeletePage() is about to delete page it locks its left sibling to revise the rightlink. So, it locks pages in right to left manner. Int he same time ginStepRight() locks pages in left to right manner, and that could cause a deadlock. This commit makes ginScanToDelete() keep exclusive lock on left siblings of currently investigated path. That elimites need to relock left sibling in ginDeletePage(). Thus, deadlock with ginStepRight() can't happen anymore. Reported-by: Chen Huajun Discussion: https://postgr.es/m/5c332bd1.87b6.16d7c17aa98.Coremail.chjischj%40163.com Author: Alexander Korotkov Reviewed-by: Peter Geoghegan Backpatch-through: 10
* Add logical_decoding_work_mem to limit ReorderBuffer memory usage.Amit Kapila2019-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of deciding to serialize a transaction merely based on the number of changes in that xact (toplevel or subxact), this makes the decisions based on amount of memory consumed by the changes. The memory limit is defined by a new logical_decoding_work_mem GUC, so for example we can do this SET logical_decoding_work_mem = '128kB' to reduce the memory usage of walsenders or set the higher value to reduce disk writes. The minimum value is 64kB. When adding a change to a transaction, we account for the size in two places. Firstly, in the ReorderBuffer, which is then used to decide if we reached the total memory limit. And secondly in the transaction the change belongs to, so that we can pick the largest transaction to evict (and serialize to disk). We still use max_changes_in_memory when loading changes serialized to disk. The trouble is we can't use the memory limit directly as there might be multiple subxact serialized, we need to read all of them but we don't know how many are there (and which subxact to read first). We do not serialize the ReorderBufferTXN entries, so if there is a transaction with many subxacts, most memory may be in this type of objects. Those records are not included in the memory accounting. We also do not account for INTERNAL_TUPLECID changes, which are kept in a separate list and not evicted from memory. Transactions with many CTID changes may consume significant amounts of memory, but we can't really do much about that. The current eviction algorithm is very simple - the transaction is picked merely by size, while it might be useful to also consider age (LSN) of the changes for example. With the new Generational memory allocator, evicting the oldest changes would make it more likely the memory gets actually pfreed. The logical_decoding_work_mem can be set in postgresql.conf, in which case it serves as the default for all publishers on that instance. Author: Tomas Vondra, with changes by Dilip Kumar and Amit Kapila Reviewed-by: Dilip Kumar and Amit Kapila Tested-By: Vignesh C Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* nbtree: Tweak _bt_pgaddtup() comments.Peter Geoghegan2019-11-18
| | | | | | Make it clear that _bt_pgaddtup() truncates the first data item on an internal page because its key is supposed to be treated as minus infinity within _bt_compare().
* Further fix dumping of views that contain just VALUES(...).Tom Lane2019-11-16
| | | | | | | | | | | | | | | | | | | | | | | It turns out that commit e9f1c01b7 missed a case: we must print a VALUES clause in long format if get_query_def is given a resultDesc that would require the query's output column name(s) to be different from what the bare VALUES clause would produce. This applies in case an ALTER ... RENAME COLUMN has been done to a view that formerly could be printed in simple format, as shown in the added regression test case. It also explains bug #16119 from Dmitry Telpt, because it turns out that (unlike CREATE VIEW) CREATE MATERIALIZED VIEW fails to apply any column aliases it's given to the stored ON SELECT rule. So to get them to be printed, we have to account for the resultDesc renaming. It might be worth changing the matview code so that it creates the ON SELECT rule with the correct aliases; but we'd still need these messy checks in get_simple_values_rte to handle the case of a subsequent column rename, so any such change would be just neatnik-ism not a bug fix. Like the previous patch, back-patch to all supported branches. Discussion: https://postgr.es/m/16119-e64823f30a45a754@postgresql.org
* Properly determine length for on-disk TOAST valuesTomas Vondra2019-11-16
| | | | | | | | | | | | | In detoast_attr_slice, VARSIZE_ANY was used to compute compressed length of on-disk TOAST values. That's incorrect, because the varlena value may be just a TOAST pointer, producing either bogus value or crashing. This is likely why the code was crashing on big-endian machines before 540f31680913 replaced the VARSIZE with VARSIZE_ANY, which however only masked the issue. Reported-by: Rushabh Lathia Discussion: https://postgr.es/m/CAL-OGkthU9Gs7TZchf5OWaL-Gsi=hXqufTxKv9qpNG73d5na_g@mail.gmail.com
* Skip system attributes when applying mvdistinct statsTomas Vondra2019-11-16
| | | | | | | | | | | | | | | | | When estimating number of distinct groups, we failed to ignore system attributes when matching the group expressions to mvdistinct stats, causing failures like ERROR: negative bitmapset member not allowed Fix that by simply skipping anything that is not a regular attribute. Backpatch to PostgreSQL 10, where the extended stats were introduced. Bug: #16111 Reported-by: Tuomas Leikola Author: Tomas Vondra Backpatch-through: 10 Discussion: https://postgr.es/m/16111-687799584c3a7e73@postgresql.org
* Always call ExecShutdownNode() if appropriate.Thomas Munro2019-11-16
| | | | | | | | | | | | | | Call ExecShutdownNode() after ExecutePlan()'s loop, rather than at each break. We had forgotten to do that in one case. The omission caused intermittent "temporary file leak" warnings from multi-batch parallel hash joins with a LIMIT clause. Back-patch to 11. Though the problem exists in theory in earlier parallel query releases, nothing really depended on it. Author: Kyotaro Horiguchi Reviewed-by: Thomas Munro, Amit Kapila Discussion: https://postgr.es/m/20191111.212418.2222262873417235945.horikyota.ntt%40gmail.com
* Cleanup code in reloptions.h regarding reloption handlingMichael Paquier2019-11-14
| | | | | | | | | | | | reloptions.h includes since ba748f7 a set of macros to handle reloption types in a way similar to how parseRelOptions() works. They have never been used in the core code, and we have more simple methods now to parse and fill in rd_options for a given relation depending on its relkind, so remove this interface to simplify things. Per discussion between Amit Langote, Álvaro Herrera and me. Discussion: https://postgr.es/m/CA+HiwqE6zbNO92az6pp5GiTw4tr-9rfCE0t84whQSP+YwSKjMQ@mail.gmail.com
* Split handling of reloptions for partitioned tablesMichael Paquier2019-11-14
| | | | | | | | | | | | | | Partitioned tables do not have relation options yet, but, similarly to what's done for views which have their own parsing table, it could make sense to introduce new parameters for some of the existing default ones like fillfactor, autovacuum, etc. Splitting things has the advantage to make the information stored in rd_options include only the necessary information, reducing the amount of memory used for a relcache entry with partitioned tables if new reloptions are introduced at this level. Author: Nikolay Shaplov Reviewed-by: Amit Langote, Michael Paquier Discussion: https://postgr.es/m/1627387.Qykg9O6zpu@x200m
* Remove unused code from tuplesort.Andres Freund2019-11-13
| | | | | | | | | copytup_index() is unused, as tuplesort_putindextuplevalues() doesn't use COPYTUP(). Replace function body with an elog(ERROR), as already done e.g. for copytup_datum(). Author: Andres Freund Discussion: https://postgr.es/m/20191013144153.ooxrfglvnaocsrx2@alap3.anarazel.de
* Add missing check_collation_set call to bpcharne().Tom Lane2019-11-13
| | | | | | | | | | | | We should throw an error for indeterminate collation, but bpcharne() was missing that logic, resulting in a much less user-friendly error (either an assertion failure or "cache lookup failed for collation 0"). Per report from Manuel Rigger. Back-patch to v12 where the mistake came in, evidently in commit 5e1963fb7. (Before non-deterministic collations, this function wasn't collation sensitive.) Discussion: https://postgr.es/m/CA+u7OA4HOjtymxAbuGNh4-X_2R0Lw5n01tzvP8E5-i-2gQXYWA@mail.gmail.com
* Fix silly initializations (cosmetic only).Tom Lane2019-11-13
| | | | | | | | | | | | | Initializing a pointer to "false" isn't per project style, and reportedly some compilers warn about it (though I've not seen any such warnings in the buildfarm). Seems to have come in with commit ff11e7f4b, so back-patch to v12 where that was added. Didier Gautheron Discussion: https://postgr.es/m/CAJRYxu+XQuM0qnSqt1Ujztu6fBPzMMAT3VEn6W32rgKG6A2Fsw@mail.gmail.com