postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	Reduce excessive dereferencing of function pointers	Peter Eisentraut	2017-09-07
\| \| \| \| \| \| \| \| \| \| \| \|	It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
*	Even if some partitions are foreign, allow tuple routing.	Robert Haas	2017-09-07
\| \| \| \| \| \| \| \| \| \|	This doesn't allow routing tuple to the foreign partitions themselves, but it permits tuples to be routed to regular partitions despite the presence of foreign partitions in the same inheritance hierarchy. Etsuro Fujita, reviewed by Amit Langote and by me. Discussion: http://postgr.es/m/bc3db4c1-1693-3b8a-559f-33ad2b50b7ad@lab.ntt.co.jp
*	Fix handling of savepoint commands within multi-statement Query strings.	Tom Lane	2017-09-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issuing a savepoint-related command in a Query message that contains multiple SQL statements led to a FATAL exit with a complaint about "unexpected state STARTED". This is a shortcoming of commit 4f896dac1, which attempted to prevent such misbehaviors in multi-statement strings; its quick hack of marking the individual statements as "not top-level" does the wrong thing in this case, and isn't a very accurate description of the situation anyway. To fix, let's introduce into xact.c an explicit model of what happens for multi-statement Query strings. This is an "implicit transaction block in progress" state, which for many purposes works like the normal TBLOCK_INPROGRESS state --- in particular, IsTransactionBlock returns true, causing the desired result that PreventTransactionChain will throw error. But in case of error abort it works like TBLOCK_STARTED, allowing the transaction to be cancelled without need for an explicit ROLLBACK command. Commit 4f896dac1 is reverted in toto, so that we go back to treating the individual statements as "top level". We could have left it as-is, but this allows sharpening the error message for PreventTransactionChain calls inside functions. Except for getting a normal error instead of a FATAL exit for savepoint commands, this patch should result in no user-visible behavioral change (other than that one error message rewording). There are some things we might want to do in the line of changing the appearance or wording of error and warning messages around this behavior, which would be much simpler to do now that it's an explicitly modeled state. But I haven't done them here. Although this fixes a long-standing bug, no backpatch. The consequences of the bug don't seem severe enough to justify the risk that this commit itself creates some new issue. Patch by me, but it owes something to previous investigation by Takayuki Tsunakawa, who also reported the bug in the first place. Also thanks to Michael Paquier for reviewing. Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05
*	Further marginal hacking on generic atomic ops.	Tom Lane	2017-09-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the generic atomic ops that rely on a loop around a CAS primitive, there's no need to force the initial read of the "old" value to be atomic. In the typically-rare case that we get a torn value, that simply means that the first CAS attempt will fail; but it will update "old" to the atomically-read value, so the next attempt has a chance of succeeding. It was already being done that way in pg_atomic_exchange_u64_impl(), but let's duplicate the approach in the rest. (Given the current coding of the pg_atomic_read functions, this change is a no-op anyway on popular platforms; it only makes a difference where pg_atomic_read_u64_impl() is implemented as a CAS.) In passing, also remove unnecessary take-a-pointer-and-dereference-it coding in the pg_atomic_read functions. That seems to have been based on a misunderstanding of what the C standard requires. What actually matters is that the pointer be declared as pointing to volatile, which it is. I don't believe this will change the assembly code at all on x86 platforms (even ignoring the likelihood that these implementations get overridden by others); but it may help on less-mainstream CPUs. Discussion: https://postgr.es/m/13707.1504718238@sss.pgh.pa.us
*	Exclude special values in recovery_target_time	Simon Riggs	2017-09-07
\| \| \| \| \| \| \| \| \| \|	recovery_target_time accepts timestamp input, though does not allow use of special values, e.g. “today”. Report a useful error message for these cases. Reported-by: Piotr Stefaniak Author: Simon Riggs Discussion: https://postgr.es/m/CANP8+jJdKA+BkkYLWz9zAm16Y0s2ExBv0WfpAwXdTpPfWnA9Bg@mail.gmail.com
*	Merge duplicative code for \sf/\sv, \ef/\ev in psql/command.c.	Tom Lane	2017-09-06
\| \| \| \| \| \| \| \|	Saves ~150 lines, costs little. Fabien Coelho, reviewed by Victor Drobny Discussion: https://postgr.es/m/alpine.DEB.2.20.1703311958001.14355@lancre
*	Allow SET STATISTICS on expression indexes	Simon Riggs	2017-09-06
\| \| \| \| \| \| \| \| \| \| \| \| \|	Index columns are referenced by ordinal number rather than name, e.g. CREATE INDEX coord_idx ON measured (x, y, (z + t)); ALTER INDEX coord_idx ALTER COLUMN 3 SET STATISTICS 1000; Incompatibility note for release notes: \d+ for indexes now also displays Stats Target Authors: Alexander Korotkov, with contribution by Adrien NAYRAT Review: Adrien NAYRAT, Simon Riggs Wordsmith: Simon Riggs
*	Use more of gcc's __sync_fetch_and_xxx builtin functions for atomic ops.	Tom Lane	2017-09-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In addition to __sync_fetch_and_add, gcc offers __sync_fetch_and_sub, __sync_fetch_and_and, and __sync_fetch_and_or, which correspond directly to primitive atomic ops that we want. Testing shows that in some cases they generate better code than our generic implementations, so use them. We've assumed that configure's test for __sync_val_compare_and_swap is sufficient to allow assuming that __sync_fetch_and_add is available, so make the same assumption for these functions. Should that prove to be wrong, we can add more configure tests. Yura Sokolov, reviewed by Jesper Pedersen and myself Discussion: https://postgr.es/m/7f65886daca545067f82bf2b463b218d@postgrespro.ru
*	Remove duplicate reads from the inner loops in generic atomic ops.	Tom Lane	2017-09-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The pg_atomic_compare_exchange_xxx functions are defined to update *expected to whatever they read from the target variable. Therefore, there's no need to do additional explicit reads after we've initialized the "old" variable. The actual benefit of this is somewhat debatable, but it seems fairly unlikely to hurt anything, especially since we will override the generic implementations in most performance-sensitive cases. Yura Sokolov, reviewed by Jesper Pedersen and myself Discussion: https://postgr.es/m/7f65886daca545067f82bf2b463b218d@postgrespro.ru
*	Clean up handling of dropped columns in NAMEDTUPLESTORE RTEs.	Tom Lane	2017-09-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The NAMEDTUPLESTORE patch piggybacked on the infrastructure for TABLEFUNC/VALUES/CTE RTEs, none of which can ever have dropped columns, so the possibility was ignored most places. Fix that, including adding a specification to parsenodes.h about what it's supposed to look like. In passing, clean up assorted comments that hadn't been maintained properly by said patch. Per bug #14799 from Philippe Beaudoin. Back-patch to v10. Discussion: https://postgr.es/m/20170906120005.25630.84360@wrigleys.postgresql.org
*	Add \gdesc psql command.	Tom Lane	2017-09-05
\| \| \| \| \| \| \| \| \| \| \| \| \|	This command acts somewhat like \g, but instead of executing the query buffer, it merely prints a description of the columns that the query result would have. (Of course, this still requires parsing the query; if parse analysis fails, you get an error anyway.) We accomplish this using an unnamed prepared statement, which should be invisible to psql users. Pavel Stehule, reviewed by Fabien Coelho Discussion: https://postgr.es/m/CAFj8pRBhYVvO34FU=EKb=nAF5t3b++krKt1FneCmR0kuF5m-QA@mail.gmail.com
*	Use lfirst_node() and linitial_node() where appropriate in planner.c.	Tom Lane	2017-09-05
\| \| \| \| \| \| \| \| \|	There's no particular reason to target this module for the first wholesale application of these macros; but we gotta start somewhere. Ashutosh Bapat and Jeevan Chalke Discussion: https://postgr.es/m/CAFjFpRcNr3r=u0ni=7A4GD9NnHQVq+dkFafzqo2rS6zy=dt1eg@mail.gmail.com
*	Remove endof macro	Peter Eisentraut	2017-09-05
\| \| \| \| \| \| \| \|	It has not been used in a long time, and it doesn't seem safe anyway, so drop it. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
*	Remove unnecessary parentheses in return statements	Peter Eisentraut	2017-09-05
\| \| \| \| \| \| \| \|	The parenthesized style has only been used in a few modules. Change that to use the style that is predominant across the whole tree. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
*	Remove our own definition of NULL	Peter Eisentraut	2017-09-05
\| \| \| \| \| \| \|	Surely everyone has that by now. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
*	Support retaining data dirs on successful TAP tests	Peter Eisentraut	2017-09-05
\| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the data directories from using temporary directories with randomness in the directory name to a static name, to make it easier to debug. The data directory will be retained if tests fail or the test code dies/exits with failure, and is automatically removed on the next make check. If the environment variable PG_TEST_NOCLEAN is defined, the data directories will be retained regardless of test or exit status. Author: Daniel Gustafsson <daniel@yesql.se>
*	In psql, use PSQL_PAGER in preference to PAGER, if it's set.	Tom Lane	2017-09-05
\| \| \| \| \| \| \| \| \| \|	This allows the user's environment to set up a psql-specific choice of pager, in much the same way that we provide PSQL_EDITOR to allow a psql-specific override of the more widely known EDITOR variable. Pavel Stehule, reviewed by Thomas Munro Discussion: https://postgr.es/m/CAFj8pRD3RRk9S1eRbnGm_T6brc3Ss5mohraNzTSJquzx+pmtKA@mail.gmail.com
*	Correct base backup throttling	Alvaro Herrera	2017-09-05
\| \| \| \| \| \| \| \| \| \| \| \| \|	Throttling for sending a base backup in walsender is broken for the case where there is a lot of WAL traffic, because the latch used to put the walsender to sleep is also signalled by regular WAL traffic (and each signal causes an additional batch of data to be sent); the net effect is that there is no or little actual throttling. This is undesirable, so rewrite the sleep into a loop to achieve the desired effeect. Author: Jeff Janes, small tweaks by me Reviewed-by: Antonin Houska Discussion: https://postgr.es/m/CAMkU=1xH6mde-yL-Eo1TKBGNd0PB1-TMxvrNvqcAkN-qr2E9mw@mail.gmail.com
*	Add psql variables showing server version and psql version.	Tom Lane	2017-09-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already had a psql variable VERSION that shows the verbose form of psql's own version. Add VERSION_NAME to show the short form (e.g., "11devel") and VERSION_NUM to show the numeric form (e.g., 110000). Also add SERVER_VERSION_NAME and SERVER_VERSION_NUM to show the short and numeric forms of the server's version. (We'd probably add SERVER_VERSION with the verbose string if it were readily available; but adding another network round trip to get it seems too expensive.) The numeric forms, in particular, are expected to be useful for scripting purposes, now that psql can do conditional tests. Fabien Coelho, reviewed by Pavel Stehule Discussion: https://postgr.es/m/alpine.DEB.2.20.1704020917220.4632@lancre
*	Reformat psql's --help=variables output.	Tom Lane	2017-09-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous format with variable names and descriptions in separate columns was extremely constraining about the length of the descriptions. We'd dealt with that in several inconsistent ways over the years, including letting the lines run over 80 characters, breaking descriptions into multiple lines, or shoving the description onto a separate line. But it's been a long time since the output could realistically fit onto a single screen vertically, so let's just rely even more heavily on the pager to deal with the vertical distance, and split each entry into two (or more) lines, in the format variable-name variable description goes here Each variable name + description remains a single translatable string, in hopes of reducing translator confusion; we're just changing the embedded whitespace. I failed to resist the temptation to copy-edit one or two of the descriptions while at it. Discussion: https://postgr.es/m/2947.1504542679@sss.pgh.pa.us
*	Be more careful about newline-chomping in pgbench.	Tom Lane	2017-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	process_backslash_command would drop the last character of the input command on the assumption that it was a newline. Given a non newline terminated input file, this could result in dropping the last character of the command. Fix that by doing an actual test that we're removing a newline. While at it, allow for Windows newlines (\r\n), and suppress multiple newlines if any. I do not think either of those cases really occur, since (a) we read script files in text mode and (b) the lexer stops when it hits a newline. But it's cheap enough and it provides a stronger guarantee about what the result string looks like. This is just cosmetic, I think, since the possibly-overly-chomped line was only used for display not for further processing. So it doesn't seem necessary to back-patch. Fabien Coelho, reviewed by Nikolay Shaplov, whacked around a bit by me Discussion: https://postgr.es/m/alpine.DEB.2.20.1704171422500.4025@lancre
*	Fix some subtle problems in pgbench transaction stats counting.	Tom Lane	2017-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With --latency-limit, transactions might get skipped even beyond the transaction count limit specified by -t, throwing off the expected number of transactions and thus the denominator for later stats. Be sure to stop skipping transactions once -t is reached. Also, include skipped transactions in the "cnt" fields; this requires discounting them again in a couple of places, but most places are better off with this definition. In particular this is needed to get correct overall stats for the combination of -R/-L/-t options. Merge some more processing into processXactStats() to simplify this. In passing, add a check that --progress-timestamp is specified only when --progress is. We might consider back-patching this, but given that it only matters for a combination of options, and given the lack of field complaints, consensus seems to be not to bother. Fabien Coelho, reviewed by Nikolay Shaplov Discussion: https://postgr.es/m/alpine.DEB.2.20.1704171422500.4025@lancre
*	Adjust pgbench to allow non-ASCII characters in variable names.	Tom Lane	2017-09-04
\| \| \| \| \| \| \| \| \| \|	This puts it in sync with psql's notion of what is a valid variable name. Like psql, we document that "non-Latin letters" are allowed, but actually any non-ASCII character is accepted. Fabien Coelho Discussion: https://postgr.es/m/20170405.094548.1184280384967203518.t-ishii@sraoss.co.jp
*	Fix translatable string	Alvaro Herrera	2017-09-04
\| \| \| \|	Discussion: https://postgr.es/m/20170828130545.sdajqlpr37hmmd6a@alvherre.pgsql
*	Suppress compiler warnings in dshash.c.	Tom Lane	2017-09-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some compilers complain, not unreasonably, about left-shifting an int32 "1" and then assigning the result to an int64. In practice I sure hope that this data structure never gets large enough that an overflow would actually occur; but let's cast the constant to the right type to avoid the hazard. In passing, fix a typo in dshash.h. Amit Kapila, adjusted as per comment from Thomas Munro. Discussion: https://postgr.es/m/CAA4eK1+5vfVMYtjK_NX8O3-42yM3o80qdqWnQzGquPrbq6mb+A@mail.gmail.com
*	Fix macro-redefinition warning on MSVC.	Tom Lane	2017-09-03
\| \| \| \| \| \| \| \| \| \| \|	In commit 9d6b160d7, I tweaked pg_config.h.win32 to use "#define HAVE_LONG_LONG_INT_64 1" rather than defining it as empty, for consistency with what happens in an autoconf'd build. But Solution.pm injects another definition of that macro into ecpg_config.h, leading to justifiable (though harmless) compiler whining. Make that one consistent too. Back-patch, like the previous patch. Discussion: https://postgr.es/m/CAEepm=1dWsXROuSbRg8PbKLh0S=8Ou-V8sr05DxmJOF5chBxqQ@mail.gmail.com
*	Improve division of labor between execParallel.c and nodeGather[Merge].c.	Tom Lane	2017-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the responsibility for creating/destroying TupleQueueReaders into execParallel.c, to avoid duplicative coding in nodeGather.c and nodeGatherMerge.c. Also, instead of having DestroyTupleQueueReader do shm_mq_detach, do it in the caller (which is now only ExecParallelFinish). This means execParallel.c does both the attaching and detaching of the tuple-queue-reader shm_mqs, which seems less weird than the previous arrangement. These changes also eliminate a vestigial memory leak (of the pei->tqueue array). It's now demonstrable that rescans of Gather or GatherMerge don't leak memory. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
*	Add memory info to getrusage output	Peter Eisentraut	2017-09-01
\| \| \| \| \| \| \| \|	Add the maxrss field to the getrusage output (log_*_stats). This was previously omitted because of portability concerns, but we feel this might not be a concern anymore. based on patch by Justin Pryzby <pryzby@telsasoft.com>
*	Tighten up some code in RelationBuildPartitionDesc.	Robert Haas	2017-09-01
\| \| \| \| \| \| \| \| \| \|	This probably doesn't save anything meaningful in terms of performance, but making the code simpler is a good idea anyway. Code by Beena Emerson, extracted from a larger patch by Jeevan Ladhe, slightly adjusted by me. Discussion: http://postgr.es/m/CAOgcT0ONgwajdtkoq+AuYkdTPY9cLWWLjxt_k4SXue3eieAr+g@mail.gmail.com
*	Make [U]INT64CONST safe for use in #if conditions.	Tom Lane	2017-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of using a cast to force the constant to be the right width, assume we can plaster on an L, UL, LL, or ULL suffix as appropriate. The old approach to this is very hoary, dating from before we were willing to require compilers to have working int64 types. This fix makes the PG_INT64_MIN, PG_INT64_MAX, and PG_UINT64_MAX constants safe to use in preprocessor conditions, where a cast doesn't work. Other symbolic constants that might be defined using [U]INT64CONST are likewise safer than before. Also fix the SIZE_MAX macro to be similarly safe, if we are forced to provide a definition for that. The test added in commit 2e70d6b5e happens to do what we want even with the hack "(size_t) -1" definition, but we could easily get burnt on other tests in future. Back-patch to all supported branches, like the previous commits. Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us
*	Ensure SIZE_MAX can be used throughout our code.	Tom Lane	2017-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pre-C99 platforms may lack <stdint.h> and thereby SIZE_MAX. We have a couple of places using the hack "(size_t) -1" as a fallback, but it wasn't universally available; which means the code added in commit 2e70d6b5e fails to compile everywhere. Move that hack to c.h so that we can rely on having SIZE_MAX everywhere. Per discussion, it'd be a good idea to make the macro's value safe for use in #if-tests, but that will take a bit more work. This is just a quick expedient to get the buildfarm green again. Back-patch to all supported branches, like the previous commit. Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us
*	pg_dumpall: Add a -E flag to set the encoding, like pg_dump has.	Robert Haas	2017-09-01
\| \| \| \| \| \|	Michael Paquier, reviewed by Fabien Coelho Discussion: http://postgr.es/m/CAB7nPqQcYWmrm2n-dVaMUhYPKFU_DxQwPuUGuC4ZF+8B=dS5xQ@mail.gmail.com
*	Use group updates when setting transaction status in clog.	Robert Haas	2017-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 0e141c0fbb211bdd23783afa731e3eef95c9ad7a introduced a mechanism to reduce contention on ProcArrayLock by having a single process clear XIDs in the procArray on behalf of multiple processes, reducing the need to hand the lock around. A previous attempt to introduce a similar mechanism for CLogControlLock in ccce90b398673d55b0387b3de66639b1b30d451b crashed and burned, but the design problem which resulted in those failures is believed to have been corrected in this version. Amit Kapila, with some cosmetic changes by me. See the previous commit message for additional credits. Discussion: http://postgr.es/m/CAA4eK1KudxzgWhuywY_X=yeSAhJMT4DwCjroV5Ay60xaeB2Eew@mail.gmail.com
*	Fix two-phase commit test for recovery mode	Alvaro Herrera	2017-09-01
\| \| \| \| \| \| \| \| \|	The original code had a race condition because it never ensured the standby was caught up before proceeding; add a wait similar to every other place that does this. Author: Michaël Paquier Discussion: https://postgr.es/m/CAB7nPqTm9p+LCm1mVJYvgpwagRK+uibT-pKq0O2-paOWxT62jw@mail.gmail.com
*	Restore behavior for replication origin drop	Alvaro Herrera	2017-09-01
\| \| \| \| \| \| \| \| \| \|	Do for replication origins what the previous commit did for replication slots: restore the original behavior of replication origin drop to raise an error rather than blocking, because users might be depending on the original behavior. Maintain the blocking behavior when invoked internally from logical replication subscription handling. Discussion: https://postgr.es/m/20170830133922.tlpo3lgfejm4n2cs@alvherre.pgsql
*	Avoid race condition in logical replication test	Simon Riggs	2017-09-01
\| \| \| \| \| \|	Wait for slot to become inactive before continuing. Author: Petr Jelinek
*	Add a WAIT option to DROP_REPLICATION_SLOT	Alvaro Herrera	2017-09-01
\| \| \| \| \| \| \| \| \| \| \|	Commit 9915de6c1cb2 changed the default behavior of DROP_REPLICATION_SLOT so that it would wait until any session holding the slot active would release it, instead of raising an error. But users are already depending on the original behavior, so revert to it by default and add a WAIT option to invoke the new behavior. Per complaint from Simone Gotti, in Discussion: https://postgr.es/m/CAEvsy6Wgdf90O6pUvg2wSVXL2omH5OPC-38OD4Zzgk-FXavj3Q@mail.gmail.com
*	Fix assorted carelessness about Datum vs. int64 vs. uint64	Robert Haas	2017-09-01
\| \| \| \|	Bugs introduced by commit 81c5e46c490e2426db243eada186995da5bb0ba7
*	Try to repair poorly-considered code in previous commit.	Robert Haas	2017-08-31
\|
*	Introduce 64-bit hash functions with a 64-bit seed.	Robert Haas	2017-08-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This will be useful for hash partitioning, which needs a way to seed the hash functions to avoid problems such as a hash index on a hash partitioned table clumping all values into a small portion of the bucket space; it's also useful for anything that wants a 64-bit hash value rather than a 32-bit hash value. Just in case somebody wants a 64-bit hash value that is compatible with the existing 32-bit hash values, make the low 32-bits of the 64-bit hash value match the 32-bit hash value when the seed is 0. Robert Haas and Amul Sul Discussion: http://postgr.es/m/CA+Tgmoafx2yoJuhCQQOL5CocEi-w_uG4S2xT0EtgiJnPGcHW3g@mail.gmail.com
*	Avoid memory leaks when a GatherMerge node is rescanned.	Tom Lane	2017-08-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rescanning a GatherMerge led to leaking some memory in the executor's query-lifespan context, because most of the node's working data structures were simply abandoned and rebuilt from scratch. In practice, this might never amount to much, given the cost of relaunching worker processes --- but it's still pretty messy, so let's fix it. We can rearrange things so that the tuple arrays are simply cleared and reused, and we don't need to rebuild the TupleTableSlots either, just clear them. One small complication is that because we might get a different number of workers on each iteration, we can't keep the old convention that the leader's gm_slots[] entry is the last one; the leader might clobber a TupleTableSlot that we need for a worker in a future iteration. Hence, adjust the logic so that the leader has slot 0 always, while the active workers have slots 1..n. Back-patch to v10 to keep all the existing versions of nodeGatherMerge.c in sync --- because of the renumbering of the slots, there would otherwise be a very large risk that any future backpatches in this module would introduce bugs. Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
*	Expand partitioned tables in PartDesc order.	Robert Haas	2017-08-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we expanded the inheritance hierarchy in the order in which find_all_inheritors had locked the tables, but that turns out to block quite a bit of useful optimization. For example, a partition-wise join can't count on two tables with matching bounds to get expanded in the same order. Where possible, this change results in expanding partitioned tables in bound order. Bound order isn't well-defined for a list-partitioned table with a null-accepting partition or for a list-partitioned table where the bounds for a single partition are interleaved with other partitions. However, when expansion in bound order is possible, it opens up further opportunities for optimization, such as strength-reducing MergeAppend to Append when the expansion order matches the desired sort order. Patch by me, with cosmetic revisions by Ashutosh Bapat. Discussion: http://postgr.es/m/CA+TgmoZrKj7kEzcMSum3aXV4eyvvbh9WD=c6m=002WMheDyE3A@mail.gmail.com
*	Clean up shm_mq cleanup.	Tom Lane	2017-08-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The logic around shm_mq_detach was a few bricks shy of a load, because (contrary to the comments for shm_mq_attach) all it did was update the shared shm_mq state. That left us leaking a bit of process-local memory, but much worse, the on_dsm_detach callback for shm_mq_detach was still armed. That means that whenever we ultimately detach from the DSM segment, we'd run shm_mq_detach again for already-detached, possibly long-dead queues. This accidentally fails to fail today, because we only ever re-use a shm_mq's memory for another shm_mq, and multiple detach attempts on the last such shm_mq are fairly harmless. But it's gonna bite us someday, so let's clean it up. To do that, change shm_mq_detach's API so it takes a shm_mq_handle not the underlying shm_mq. This makes the callers simpler in most cases anyway. Also fix a few places in parallel.c that were just pfree'ing the handle structs rather than doing proper cleanup. Back-patch to v10 because of the risk that the revenant shm_mq_detach callbacks would cause a live bug sometime. Since this is an API change, it's too late to do it in 9.6. (We could make a variant patch that preserves API, but I'm not excited enough to do that.) Discussion: https://postgr.es/m/8670.1504192177@sss.pgh.pa.us
*	Improve code coverage of select_parallel test.	Tom Lane	2017-08-31
\| \| \| \| \|	Make sure that rescans of parallel indexscans are tested. Per code coverage report.
*	Remove to pre-8.2 coding convention for PG_MODULE_MAGIC	Peter Eisentraut	2017-08-30
\| \| \| \| \| \| \| \|	PG_MODULE_MAGIC has been around since 8.2, with 8.1 long since in EOL, so remove the mention of #ifdef guards for compiling against pre-8.2 sources from the documentation. Author: Daniel Gustafsson <daniel@yesql.se>
*	Code review for nodeGatherMerge.c.	Tom Lane	2017-08-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Comment the fields of GatherMergeState, and organize them a bit more sensibly. Comment GMReaderTupleBuffer more usefully too. Improve assorted other comments that were obsolete or just not very good English. Get rid of the use of a GMReaderTupleBuffer for the leader process; that was confusing, since only the "done" field was used, and that in a way redundant with need_to_scan_locally. In gather_merge_init, avoid calling load_tuple_array for already-known-exhausted workers. I'm not sure if there's a live bug there, but the case is unlikely to be well tested due to timing considerations. Remove some useless code, such as duplicating the tts_isempty test done by TupIsNull. Remove useless initialization of ps.qual, replacing that with an assertion that we have no qual to check. (If we did, the code would fail to check it.) Avoid applying heap_copytuple to a null tuple. While that fails to crash, it's confusing and it makes the code less legible not more so IMO. Propagate a couple of these changes into nodeGather.c, as well. Back-patch to v10, partly because of the possibility that the gather_merge_init change is fixing a live bug, but mostly to keep the branches in sync to ease future bug fixes.
*	Separate reinitialization of shared parallel-scan state from ExecReScan.	Tom Lane	2017-08-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the parallel executor logic did reinitialization of shared state within the ExecReScan code for parallel-aware scan nodes. This is problematic, because it means that the ExecReScan call has to occur synchronously (ie, during the parent Gather node's ReScan call). That is swimming very much against the tide so far as the ExecReScan machinery is concerned; the fact that it works at all today depends on a lot of fragile assumptions, such as that no plan node between Gather and a parallel-aware scan node is parameterized. Another objection is that because ExecReScan might be called in workers as well as the leader, hacky extra tests are needed in some places to prevent unwanted shared-state resets. Hence, let's separate this code into two functions, a ReInitializeDSM call and the ReScan call proper. ReInitializeDSM is called only in the leader and is guaranteed to run before we start new workers. ReScan is returned to its traditional function of resetting only local state, which means that ExecReScan's usual habits of delaying or eliminating child rescan calls are safe again. As with the preceding commit 7df2c1f8d, it doesn't seem to be necessary to make these changes in 9.6, which is a good thing because the FDW and CustomScan APIs are impacted. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
*	Restore test case from a2b70c89ca1a5fcf6181d3c777d82e7b83d2de1b.	Tom Lane	2017-08-30
\| \| \| \| \| \| \| \|	Revert the reversion commits a20aac890 and 9b644745c. In the wake of commit 7df2c1f8d, we should get stable buildfarm results from this test; if not, I'd like to know sooner not later. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
*	Force rescanning of parallel-aware scan nodes below a Gather[Merge].	Tom Lane	2017-08-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ExecReScan machinery contains various optimizations for postponing or skipping rescans of plan subtrees; for example a HashAgg node may conclude that it can re-use the table it built before, instead of re-reading its input subtree. But that is wrong if the input contains a parallel-aware table scan node, since the portion of the table scanned by the leader process is likely to vary from one rescan to the next. This explains the timing-dependent buildfarm failures we saw after commit a2b70c89c. The established mechanism for showing that a plan node's output is potentially variable is to mark it as depending on some runtime Param. Hence, to fix this, invent a dummy Param (one that has a PARAM_EXEC parameter number, but carries no actual value) associated with each Gather or GatherMerge node, mark parallel-aware nodes below that node as dependent on that Param, and arrange for ExecReScanGather[Merge] to flag that Param as changed whenever the Gather[Merge] node is rescanned. This solution breaks an undocumented assumption made by the parallel executor logic, namely that all rescans of nodes below a Gather[Merge] will happen synchronously during the ReScan of the top node itself. But that's fundamentally contrary to the design of the ExecReScan code, and so was doomed to fail someday anyway (even if you want to argue that the bug being fixed here wasn't a failure of that assumption). A follow-on patch will address that issue. In the meantime, the worst that's expected to happen is that given very bad timing luck, the leader might have to do all the work during a rescan, because workers think they have nothing to do, if they are able to start up before the eventual ReScan of the leader's parallel-aware table scan node has reset the shared scan state. Although this problem exists in 9.6, there does not seem to be any way for it to manifest there. Without GatherMerge, it seems that a plan tree that has a rescan-short-circuiting node below Gather will always also have one above it that will short-circuit in the same cases, preventing the Gather from being rescanned. Hence we won't take the risk of back-patching this change into 9.6. But v10 needs it. Discussion: https://postgr.es/m/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ@mail.gmail.com
*	Teach libpq to detect integer overflow in the row count of a PGresult.	Tom Lane	2017-08-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adding more than 1 billion rows to a PGresult would overflow its ntups and tupArrSize fields, leading to client crashes. It'd be desirable to use wider fields on 64-bit machines, but because all of libpq's external APIs use plain "int" for row counters, that's going to be hard to accomplish without an ABI break. Given the lack of complaints so far, and the general pain that would be involved in using such huge PGresults, let's settle for just preventing the overflow and reporting a useful error message if it does happen. Also, for a couple more lines of code we can increase the threshold of trouble from INT_MAX/2 to INT_MAX rows. To do that, refactor pqAddTuple() to allow returning an error message that replaces the default assumption that it failed because of out-of-memory. Along the way, fix PQsetvalue() so that it reports all failures via pqInternalNotice(). It already did so in the case of bad field number, but neglected to report anything for other error causes. Because of the potential for crashes, this seems like a back-patchable bug fix, despite the lack of field reports. Michael Paquier, per a complaint from Igor Korot. Discussion: https://postgr.es/m/CA+FnnTxyLWyjY1goewmJNxC==HQCCF4fKkoCTa9qR36oRAHDPw@mail.gmail.com