aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
* Put back mistakenly removed #include.Tom Lane2020-04-08
| | | | | | | | In commit 4dbcb3f84 I removed some code from parse_coerce.c, and also removed some apparently-no-longer-needed #includes. But removing datum.h broke some not-compiled-by-default code. Discussion: https://postgr.es/m/20200407205436.pyjhddw5bn5upvsu@development
* Remove testing for precise LSN/reserved bytes in new TAP testAlvaro Herrera2020-04-07
| | | | | | | | Trying to ensure that a slot's restart_lsn or amount of reserved bytes exactly match some specific values seems unnecessary, and fragile as shown by failures in multiple buildfarm members. Discussion: https://postgr.es/m/20200407232602.GA21559@alvherre.pgsql
* Support PrefetchBuffer() in recovery.Thomas Munro2020-04-08
| | | | | | | | | | | | | | | Provide PrefetchSharedBuffer(), a variant that takes SMgrRelation, for use in recovery. Rename LocalPrefetchBuffer() to PrefetchLocalBuffer() for consistency. Add a return value to all of these. In recovery, tolerate and report missing files, so we can handle relations unlinked before crash recovery began. Also report cache hits and misses, so that callers can do faster buffer lookups and better I/O accounting. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
* Allow partitionwise join to handle nested FULL JOIN USING cases.Tom Lane2020-04-07
| | | | | | | | | | | | | | | | This case didn't work because columns merged by FULL JOIN USING are represented in the parse tree by COALESCE expressions, and the logic for recognizing a partitionable join failed to match upper-level join clauses to such expressions. To fix, synthesize suitable COALESCE expressions and add them to the nullable_partexprs lists. This is pretty ugly and brute-force, but it gets the job done. (I have ambitions of rethinking the way outer-join output Vars are represented, so maybe that will provide a cleaner solution someday. For now, do this.) Amit Langote, reviewed by Justin Pryzby, Richard Guo, and myself Discussion: https://postgr.es/m/CA+HiwqG2WVUGmLJqtR0tPFhniO=H=9qQ+Z3L_ZC+Y3-EVQHFGg@mail.gmail.com
* Allow partitionwise joins in more cases.Etsuro Fujita2020-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, the partitionwise join technique only allowed partitionwise join when input partitioned tables had exactly the same partition bounds. This commit extends the technique to some cases when the tables have different partition bounds, by using an advanced partition-matching algorithm introduced by this commit. For both the input partitioned tables, the algorithm checks whether every partition of one input partitioned table only matches one partition of the other input partitioned table at most, and vice versa. In such a case the join between the tables can be broken down into joins between the matching partitions, so the algorithm produces the pairs of the matching partitions, plus the partition bounds for the join relation, to allow partitionwise join for computing the join. Currently, the algorithm works for list-partitioned and range-partitioned tables, but not hash-partitioned tables. See comments in partition_bounds_merge(). Ashutosh Bapat and Etsuro Fujita, most of regression tests by Rajkumar Raghuwanshi, some of the tests by Mark Dilger and Amul Sul, reviewed by Dmitry Dolgov and Amul Sul, with additional review at various points by Ashutosh Bapat, Mark Dilger, Robert Haas, Antonin Houska, Amit Langote, Justin Pryzby, and Tomas Vondra Discussion: https://postgr.es/m/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com
* Fix circle_in to accept "(x,y),r" as it's advertised to do.Tom Lane2020-04-07
| | | | | | | | | | | Our documentation describes four allowed input syntaxes for circles, but the regression tests tried only three ... with predictable consequences. Remarkably, this has been wrong since the circle datatype was added in 1997, but nobody noticed till now. David Zhang, with some help from me Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
* snapshot scalability: Move delayChkpt from PGXACT to PGPROC.Andres Freund2020-04-07
| | | | | | | | | | | | | | | | | | | | | The goal of separating hotly accessed per-backend data from PGPROC into PGXACT is to make accesses fast (GetSnapshotData() in particular). But delayChkpt is not actually accessed frequently; only when starting a checkpoint. As it is frequently modified (multiple times in the course of a single transaction), storing it in the same cacheline as hotly accessed data unnecessarily dirties a contended cacheline. Therefore move delayChkpt to PGPROC. This is part of a larger series of patches intending to improve GetSnapshotData() scalability. It is committed and pushed separately, as it is independently beneficial (small but measurable win, limited by the other frequent modifications of PGXACT). Author: Andres Freund Reviewed-By: Robert Haas, Thomas Munro, David Rowley Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
* Track SLRU page hits in SimpleLruReadPage_ReadOnlyTomas Vondra2020-04-08
| | | | | | | | | SLRU page hits were tracked only in SimpleLruReadPage, but that's not enough because we may hit the page in SimpleLruReadPage_ReadOnly in which case we don't call SimpleLruReadPage at all. Reported-by: Kuntal Ghosh Discussion: https://postgr.es/m/20200119143707.gyinppnigokesjok@development
* Fix XLogReader FD leak that makes backends unusable after 2PC usage.Andres Freund2020-04-07
| | | | | | | | | | | | | | | | | | | | | | | | Before the fix every 2PC commit/abort leaked a file descriptor. As the files are opened using BasicOpenFile(), that quickly leads to the backend running out of file descriptors. Once enough 2PC abort/commit have caused enough FDs to leak, any IO in the backend will fail with "Too many open files", as BasicOpenFilePerm() will have triggered all open files known to fd.c to be closed. The leak causing the problem at hand is a consequence of 0dc8ead46, but is only exascerbated by it. Previously most XLogPageReadCB callbacks used static variables to cache one open file, but after the commit the cache is private to each XLogReader instance. There never was infrastructure to close FDs at the time of XLogReaderFree, but the way XLogReader was used limited the leak to one FD. This commit just closes the during XLogReaderFree() if the FD is stored in XLogReaderState.seg.ws_segno. This may not be the way to solve this medium/long term, but at least unbreaks 2PC. Discussion: https://postgr.es/m/20200406025651.fpzdb5yyb7qyhqko@alap3.anarazel.de
* Appease perlcriticAlvaro Herrera2020-04-07
| | | | Food for the gods must always be found somehow, even when the land starves.
* Remove nbtree BTreeTupleSetAltHeapTID() function.Peter Geoghegan2020-04-07
| | | | | | | Since heap TID is supposed to be just another key attribute to the implementation, it doesn't make much sense to have separate BTreeTupleSetNAtts() and BTreeTupleSetAltHeapTID() functions. Merge the two functions together. This slightly simplifies _bt_truncate().
* Allow users to limit storage reserved by replication slotsAlvaro Herrera2020-04-07
| | | | | | | | | | | | | | | | Replication slots are useful to retain data that may be needed by a replication system. But experience has shown that allowing them to retain excessive data can lead to the primary failing because of running out of space. This new feature allows the user to configure a maximum amount of space to be reserved using the new option max_slot_wal_keep_size. Slots that overrun that space are invalidated at checkpoint time, enabling the storage to be released. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20170228.122736.123383594.horiguchi.kyotaro@lab.ntt.co.jp
* Allow psql's \g and \gx commands to transiently change \pset options.Tom Lane2020-04-07
| | | | | | | | | | | | | | | | | We invented \gx to allow the "\pset expanded" flag to be forced on for the duration of one command output, but that turns out to not be nearly enough to satisfy the demand for variant output formats. Hence, make it possible to change any pset option(s) for the duration of a single command output, by writing "option=value ..." inside parentheses, for example \g (format=csv csv_fieldsep='\t') somefile \gx can now be understood as a shorthand for including expanded=on inside the parentheses. Patch by me, expanding on a proposal by Pavel Stehule Discussion: https://postgr.es/m/CAFj8pRBx9OnBPRJVtfA5ycUpySge-XootAXAsv_4rrkHxJ8eRg@mail.gmail.com
* Implement waiting for given lsn at transaction startAlexander Korotkov2020-04-07
| | | | | | | | | | | | | | | | | | | | This commit adds following optional clause to BEGIN and START TRANSACTION commands. WAIT FOR LSN lsn [ TIMEOUT timeout ] New clause pospones transaction start till given lsn is applied on standby. This clause allows user be sure, that changes previously made on primary would be visible on standby. New shared memory struct is used to track awaited lsn per backend. Recovery process wakes up backend once required lsn is applied. Author: Ivan Kartyshov, Anna Akenteva Reviewed-by: Craig Ringer, Thomas Munro, Robert Haas, Kyotaro Horiguchi Reviewed-by: Masahiko Sawada, Ants Aasma, Dmitry Ivanov, Simon Riggs Reviewed-by: Amit Kapila, Alexander Korotkov Discussion: https://postgr.es/m/0240c26c-9f84-30ea-fca9-93ab2df5f305%40postgrespro.ru
* Support FETCH FIRST WITH TIESAlvaro Herrera2020-04-07
| | | | | | | | | | | | | | | | | | WITH TIES is an option to the FETCH FIRST N ROWS clause (the SQL standard's spelling of LIMIT), where you additionally get rows that compare equal to the last of those N rows by the columns in the mandatory ORDER BY clause. There was a proposal by Andrew Gierth to implement this functionality in a more powerful way that would yield more features, but the other patch had not been finished at this time, so we decided to use this one for now in the spirit of incremental development. Author: Surafel Temesgen <surafel3000@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Discussion: https://postgr.es/m/CALAY4q9ky7rD_A4vf=FVQvCGngm3LOes-ky0J6euMrg=_Se+ag@mail.gmail.com Discussion: https://postgr.es/m/87o8wvz253.fsf@news-spur.riddles.org.uk
* Adjust bytea get_bit/set_bit to use int8 not int4 for bit numbering.Tom Lane2020-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. Lift that restriction by redeclaring the bit number arguments as int8 (which requires a catversion bump, hence is not back-patchable). The similarly-named functions for bit/varbit don't really have a problem because we restrict those types to at most VARBITMAXLEN bits; hence leave them alone. While here, extend the encode/decode functions in utils/adt/encode.c to allow dealing with values wider than 1GB. This is not a live bug or restriction in current usage, because no input could be more than 1GB, and since none of the encoders can expand a string more than 4X, the result size couldn't overflow uint32. But it might be desirable to support more in future, so make the input length values size_t and the potential-output-length values uint64. Also add some test cases to improve the miserable code coverage of these functions. Movead Li, editorialized some by me; also reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
* Remove debugging elog from pgstat_recv_resetslrucounterTomas Vondra2020-04-07
| | | | Reported-by: Thomas Munro
* Minor improvements in Incremental Sort explainTomas Vondra2020-04-07
| | | | | | | | | Some places still used "Maximum" instead of "Peak" when displaying info about sort space, so fix that. Also, add a comment clarifying why it's correct to check the number of full/prefix sort groups. Author: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
* Prevent archive recovery from scanning non-existent WAL files.Fujii Masao2020-04-08
| | | | | | | | | | | | | | | | | | | Previously when there were multiple timelines listed in the history file of the recovery target timeline, archive recovery searched all of them, starting from the newest timeline to the oldest one, to find the segment to read. That is, archive recovery had to continuously fail scanning the segment until it reached the timeline that the segment belonged to. These scans for non-existent segment could be harmful on the recovery performance especially when archival area was located on the remote storage and each scan could take a long time. To address the issue, this commit changes archive recovery so that it skips scanning the timeline that the segment to read doesn't belong to. Author: Kyotaro Horiguchi, tweaked a bit by Fujii Masao Reviewed-by: David Steele, Pavel Suderevsky, Grigory Smolkin Discussion: https://postgr.es/m/16159-f5a34a3a04dc67e0@postgresql.org Discussion: https://postgr.es/m/20200129.120222.1476610231001551715.horikyota.ntt@gmail.com
* Consider Incremental Sort paths at additional placesTomas Vondra2020-04-07
| | | | | | | | | | | | | | | | | | | Commit d2d8a229bc introduced Incremental Sort, but it was considered only in create_ordered_paths() as an alternative to regular Sort. There are many other places that require sorted input and might benefit from considering Incremental Sort too. This patch modifies a number of those places, but not all. The concern is that just adding Incremental Sort to any place that already adds Sort may increase the number of paths considered, negatively affecting planning time, without any benefit. So we've taken a more conservative approach, based on analysis of which places do affect a set of queries that did seem practical. This means some less common queries may not benefit from Incremental Sort yet. Author: Tomas Vondra Reviewed-by: James Coleman Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
* Fix representation of SORT_TYPE_STILL_IN_PROGRESS.Tom Lane2020-04-06
| | | | | | | | | | | | | | | | | | | It turns out that the code did indeed rely on a zeroed TuplesortInstrumentation.sortMethod field to indicate "this worker never did anything", although it seems the issue only comes up during certain race-condition-y cases. Hence, rearrange the TuplesortMethod enum to restore SORT_TYPE_STILL_IN_PROGRESS to having the value zero, and add some comments reinforcing that that isn't optional. Also future-proof a loop over the possible values of the enum. sizeof(bits32) happened to be the correct limit value, but only by purest coincidence. Per buildfarm and local investigation. Discussion: https://postgr.es/m/12222.1586223974@sss.pgh.pa.us
* Introduce xid8-based functions to replace txid_XXX.Thomas Munro2020-04-07
| | | | | | | | | | | | | | | | | | | | | The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to users as int8. Now that we have an SQL type xid8 for FullTransactionId, define a new set of functions including pg_current_xact_id() and pg_current_snapshot() based on that. Keep the old functions around too, for now. It's a bit sneaky to use the same C functions for both, but since the binary representation is identical except for the signedness of the type, and since older functions are the ones using the wrong signedness, and since we'll presumably drop the older ones after a reasonable period of time, it seems reasonable to switch to FullTransactionId internally and share the code for both. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
* Add SQL type xid8 to expose FullTransactionId to users.Thomas Munro2020-04-07
| | | | | | | | | | | Similar to xid, but 64 bits wide. This new type is suitable for use in various system views and administration functions. Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com> Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com> Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
* Use INT64_FORMAT when formatting int64 values in explainTomas Vondra2020-04-07
| | | | Per report from lapwing.
* Fix failures in incremental_sort due to number of workersTomas Vondra2020-04-07
| | | | | | | | The last test in incremental_sort suite prints a parallel plan, but some of the buildfarm animals have custom max_parallel_workers_per_gather values, causing failures. Fixed by setting the GUC to an explicit value. Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
* Fix nbtree kill_prior_tuple posting list assert.Peter Geoghegan2020-04-06
| | | | | | | | | | | | | | | | | | | An assertion added by commit 0d861bbb checked that _bt_killitems() only processes a BTScanPosItem whose heap TID is contained in a posting list tuple when its page offset number still matches what is on the page (i.e. when it matches the posting list tuple's current offset number). This was only correct in the common case where the page can't have changed since we first read it. It was not correct in cases where we don't drop the buffer pin (and don't need to verify the page hasn't changed using its LSN). The latter category includes scans involving unlogged tables, and scans that use a non-MVCC snapshot, per the logic originally introduced by commit 2ed5b87f. The assertion still seems helpful. Fix it by taking cases where the page may have been concurrently modified into account. Reported-By: Anastasia Lubennikova, Alexander Lakhin Discussion: https://postgr.es/m/c4e38e9a-0f9c-8e53-e639-adf343f94472@postgrespro.ru
* Fix show_incremental_sort_info with force_parallel_modeTomas Vondra2020-04-06
| | | | | | | | When executed with force_parallel_mode=regress, the function was exiting too early and thus failed to print the worker stats. Fixed by making it more like show_sort_info. Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
* Implement Incremental SortTomas Vondra2020-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Incremental Sort is an optimized variant of multikey sort for cases when the input is already sorted by a prefix of the requested sort keys. For example when the relation is already sorted by (key1, key2) and we need to sort it by (key1, key2, key3) we can simply split the input rows into groups having equal values in (key1, key2), and only sort/compare the remaining column key3. This has a number of benefits: - Reduced memory consumption, because only a single group (determined by values in the sorted prefix) needs to be kept in memory. This may also eliminate the need to spill to disk. - Lower startup cost, because Incremental Sort produce results after each prefix group, which is beneficial for plans where startup cost matters (like for example queries with LIMIT clause). We consider both Sort and Incremental Sort, and decide based on costing. The implemented algorithm operates in two different modes: - Fetching a minimum number of tuples without check of equality on the prefix keys, and sorting on all columns when safe. - Fetching all tuples for a single prefix group and then sorting by comparing only the remaining (non-prefix) keys. We always start in the first mode, and employ a heuristic to switch into the second mode if we believe it's beneficial - the goal is to minimize the number of unnecessary comparions while keeping memory consumption below work_mem. This is a very old patch series. The idea was originally proposed by Alexander Korotkov back in 2013, and then revived in 2017. In 2018 the patch was taken over by James Coleman, who wrote and rewrote most of the current code. There were many reviewers/contributors since 2013 - I've done my best to pick the most active ones, and listed them in this commit message. Author: James Coleman, Alexander Korotkov Reviewed-by: Tomas Vondra, Andreas Karlsson, Marti Raudsepp, Peter Geoghegan, Robert Haas, Thomas Munro, Antonin Houska, Andres Freund, Alexander Kuzmenkov Discussion: https://postgr.es/m/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
* Re-stabilize infinite_recurse() test case.Tom Lane2020-04-06
| | | | | | | | | | | | | | | | | Since commit 8f59f6b9c0, CLOBBER_CACHE_ALWAYS buildfarm members have been failing this test case because the error message now sometimes includes an error cursor position. It seems largely just luck that that never happened before, and there are likely to be more ways it could happen in future. Hence, rather than trying to prevent it, adjust the test script to suppress that component of the report. At some point we might need to back-patch this, but refrain until there's a demonstrated need. (We'd need a different fix before v12, anyway, since VERBOSITY=sqlstate is a recent thing.) Tom Lane and Andres Freund Discussion: https://postgr.es/m/30675.1586111599@sss.pgh.pa.us
* Add logical replication support to replicate into partitioned tablesPeter Eisentraut2020-04-06
| | | | | | | | | | | | Mainly, this adds support code in logical/worker.c for applying replicated operations whose target is a partitioned table to its relevant partitions. Author: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/CA+HiwqH=Y85vRK3mOdjEkqFK+E=ST=eQiHdpj43L=_eJMOOznQ@mail.gmail.com
* Allow autovacuum to log WAL usage statistics.Amit Kapila2020-04-06
| | | | | | | | | This commit allows autovacuum to log WAL usage statistics added by commit df3b181499. Author: Julien Rouhaud Reviewed-by: Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
* Refactor cluster.c to use new routine get_index_isclustered()Michael Paquier2020-04-06
| | | | | | | | | | | | | | | | This new cache lookup routine has been introduced in a40caf5, and more code paths can directly use it. Note that in cluster_rel(), the code was returning immediately if the tuple's entry in pg_index for the clustered index was not valid. This commit changes the code so as a lookup error is raised instead, something that could not happen from the start as we check for the existence of the index beforehand, while holding an exclusive lock on the parent table. Author: Justin Pryzby Reviewed-by: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
* Add the option to report WAL usage in EXPLAIN and auto_explain.Amit Kapila2020-04-06
| | | | | | | | | | | | | | | This commit adds a new option WAL similar to existing option BUFFERS in the EXPLAIN command. This option allows to include information on WAL record generation added by commit df3b181499 in EXPLAIN output. This also allows the WAL usage information to be displayed via the auto_explain module. A new parameter auto_explain.log_wal controls whether WAL usage statistics are printed when an execution plan is logged. This parameter has no effect unless auto_explain.log_analyze is enabled. Author: Julien Rouhaud Reviewed-by: Dilip Kumar and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
* Preserve clustered index after rewrites with ALTER TABLEMichael Paquier2020-04-06
| | | | | | | | | | | | | A table rewritten by ALTER TABLE would lose tracking of an index usable for CLUSTER. This setting is tracked by pg_index.indisclustered and is controlled by ALTER TABLE, so some extra work was needed to restore it properly. Note that ALTER TABLE only marks the index that can be used for clustering, and does not do the actual operation. Author: Amit Langote, Justin Pryzby Reviewed-by: Ibrar Ahmed, Michael Paquier Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com Backpatch-through: 9.5
* Recompute stack base in forked postmaster children.Andres Freund2020-04-05
| | | | | | | | | | | This is for the benefit of running postgres under the rr debugger. When using rr signal handlers running while a syscall is active use an alternative stack. As e.g. bgworkers are started from within signal handlers, the forked backend then has a different stack base than postmaster. Previously that subsequently lead to those processes triggering spurious "stack depth limit exceeded" errors. Discussion: https://postgr.es/m/20200327182217.ubrrl32lyfhxfwk5@alap3.anarazel.de
* Use TransactionXmin instead of RecentGlobalXmin in heap_abort_speculative().Andres Freund2020-04-05
| | | | | | | | | | | | | | | There's a very low risk that RecentGlobalXmin could be far enough in the past to be older than relfrozenxid, or even wrapped around. Luckily the consequences of that having happened wouldn't be too bad - the page wouldn't be pruned for a while. Avoid that risk by using TransactionXmin instead. As that's announced via MyPgXact->xmin, it is protected against wrapping around (see code comments for details around relfrozenxid). Author: Andres Freund Discussion: https://postgr.es/m/20200328213023.s4eyijhdosuc4vcj@alap3.anarazel.de Backpatch: 9.5-
* Fix recently introduced typo.Andres Freund2020-04-05
| | | | Reported-By: David Rowley
* Save errno across LWLockRelease() callsPeter Eisentraut2020-04-05
| | | | | | Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()" Reported-by: Michael Paquier <michael@paquier.xyz>
* Further improve stability fix for partition_aggregate test.Tom Lane2020-04-05
| | | | | | | | | | | | | | | | | | Commit 7cb0a423f overlooked that the multi-level partition test table pagg_tab_ml still had an exactly even row split at its upper level of partitioning, so that some of the sub-aggregation plan steps still had exactly equal costs, leading to plan instability. Tweak the partition boundaries some more to make the row distribution unequal at both levels. This leads to more changes in the "expected" plan order than the previous round, but it seems fine. (Actually, I'm surprised that this didn't affect even more plans in this test: looking at the underlying costs shows that some of the parallel plan groups are *not* getting sorted by cost. Bug?) Per buildfarm member lousyjack, https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-04-04%2021%3A03%3A04 Discussion: https://postgr.es/m/24467.1585838693@sss.pgh.pa.us
* Add perl2host call missing from a new test file.Noah Misch2020-04-04
| | | | | | | Oversight in today's commit c6b92041d38512a4176ed76ad06f713d2e6c01a8. Per buildfarm member jacana. Discussion: http://postgr.es/m/20200404223212.GC3442685@rfd.leadboat.com
* Remove bogus Assert, add some regression test cases showing why.Tom Lane2020-04-04
| | | | | | | | | | | | | Commit 77ec5affb added an assertion to enforce_generic_type_consistency that boils down to "if the function result is polymorphic, there must be at least one polymorphic argument". This should be true for user-created functions, but there are built-in functions for which it's not true, as pointed out by Jaime Casanova. Hence, go back to the old behavior of leaving the return type alone. There's only a limited amount of stuff you can do with such a function result, but it does work to some extent; add some regression test cases to ensure we don't break that again. Discussion: https://postgr.es/m/CAJGNTeMbhtsCUZgJJ8h8XxAJbK7U2ipsX8wkHRtZRz-NieT8RA@mail.gmail.com
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN. Future servers accept older WAL, so this bump is discretionary. Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* Revert "Improve handling of parameter differences in physical replication"Peter Eisentraut2020-04-04
| | | | | | | | This reverts commit 246f136e76ecd26844840f2b2057e2c87ec9868d. That patch wasn't quite complete enough. Discussion: https://www.postgresql.org/message-id/flat/E1jIpJu-0007Ql-CL%40gemulon.postgresql.org
* Add infrastructure to track WAL usage.Amit Kapila2020-04-04
| | | | | | | | | | | | | | | | | | | | | | This allows gathering the WAL generation statistics for each statement execution. The three statistics that we collect are the number of WAL records, the number of full page writes and the amount of WAL bytes generated. This helps the users who have write-intensive workload to see the impact of I/O due to WAL. This further enables us to see approximately what percentage of overall WAL is due to full page writes. In the future, we can extend this functionality to allow us to compute the the exact amount of WAL data due to full page writes. This patch in itself is just an infrastructure to compute WAL usage data. The upcoming patches will expose this data via explain, auto_explain, pg_stat_statements and verbose (auto)vacuum output. Author: Kirill Bychik, Julien Rouhaud Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
* Include chunk overhead in hash table entry size estimate.Jeff Davis2020-04-03
| | | | | | | | Don't try to be precise about it, just use a constant 16 bytes of chunk overhead. Being smarter would require knowing the memory context where the chunk will be allocated, which is not known by all callers. Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
* Fix resource management bug with replication=database.Robert Haas2020-04-03
| | | | | | | | | | | | | | | Commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661 allowed BASE_BACKUP to acquire a ResourceOwner without a transaction so that the backup manifest functionality could use a BufFile, but it overlooked the fact that when a walsender is used with replication=database, it might have a transaction in progress, because in that mode, SQL and replication commands can be mixed. Try to fix things up so that the two cleanup mechanisms don't conflict. Per buildfarm member serinus, which triggered the problem when CREATE_REPLICATION_SLOT failed from inside a transaction. It passed on the subsequent run, so evidently the failure doesn't happen every time.
* Be more careful about time_t vs. pg_time_t in basebackup.c.Robert Haas2020-04-03
| | | | | | | | | | | | | | | lapwing is complaining that about a call to pg_gmtime, saying that it "expected 'const pg_time_t *' but argument is of type 'time_t *'". I at first thought that the problem had someting to do with const, but Thomas Munro suggested that it might be just because time_t and pg_time_t are different identifers. lapwing is i686 rather than x86_64, and pg_time_t is always int64, so that seems like a good guess. There is other code that just casts time_t to pg_time_t without any conversion function, so try that approach here. Introduced in commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661.
* pg_validatebackup: Fix 'make clean' to remove tmp_check.Robert Haas2020-04-03
| | | | | | Report by Tom Lane. Discussion: http://postgr.es/m/22394.1585951968@sss.pgh.pa.us
* pg_validatebackup: Adjust TAP tests to undo permissions change.Robert Haas2020-04-03
| | | | | | | It may be necessary to go further and remove this test altogether, but I'm going to try this fix first. It's not clear, at least to me, exactly how this is breaking buildfarm members, but it appears to be doing so.
* pg_validatebackup: Also use perl2host in TAP tests.Robert Haas2020-04-03
| | | | | | | | | Second try at getting the buildfarm to be happy with 003_corrution.pl as added by commit 0d8c9c1210c44b36ec2efcb223a1dfbe897a3661. Per suggestion from Álvaro Herrera. Discussion: http://postgr.es/m/20200403205412.GA8279@alvherre.pgsql