postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
...
*	Fix off-by-one in memory allocation for quote_literal_cstr().	Heikki Linnakangas	2016-12-16
\| \| \| \| \| \| \| \| \| \|	The calculation didn't take into account the NULL terminator. That lead to overwriting the palloc'd buffer by one byte, if the input consists entirely of backslashes. For example "format('%L', E'\\')". Fixes bug #14468. Backpatch to all supported versions. Report: https://www.postgresql.org/message-id/20161216105001.13334.42819%40wrigleys.postgresql.org
*	Back-patch fcff8a575198478023ada8a48e13b50f70054766 as a bug fix.	Kevin Grittner	2016-12-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there is both a serialization failure and a unique violation, throw the former rather than the latter. When initially pushed, this was viewed as a feature to assist application framework developers, so that they could more accurately determine when to retry a failed transaction, but a test case presented by Ian Jackson has shown that this patch can prevent serialization anomalies in some cases where a unique violation is caught within a subtransaction, the work of that subtransaction is discarded, and no error is thrown. That makes this a bug fix, so it is being back-patched to all supported branches where it is not already present (i.e., 9.2 to 9.5). Discussion: https://postgr.es/m/1481307991-16971-1-git-send-email-ian.jackson@eu.citrix.com Discussion: https://postgr.es/m/22607.56276.807567.924144@mariner.uk.xensource.com
*	Build backend/parser/scan.l and interfaces/ecpg/preproc/pgc.l standalone.	Tom Lane	2016-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Back-patch commit 72b1e3a21 into the pre-9.6 branches. As noted in the original commit, this has some extra benefits: we can narrow the scope of the -Wno-error flag that's forced on scan.c. Also, since these grammar and lexer files are so large, splitting them into separate build targets should have some advantages in build speed, particularly in parallel or ccache'd builds. However, the real reason for doing this now is that it avoids symbol- redefinition warnings (or worse) with the latest version of flex. It's not unreasonable that people would want to compile our old branches with recent tools. Per report from Дилян Палаузов. Discussion: https://postgr.es/m/d845c1af-e18d-6651-178f-9f08cdf37e10@aegee.org
*	Prevent crash when ts_rewrite() replaces a non-top-level subtree with null.	Tom Lane	2016-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When ts_rewrite()'s replacement argument is an empty tsquery, it's supposed to simplify any operator nodes whose operand(s) become NULL; but it failed to do that reliably, because dropvoidsubtree() only examined the top level of the result tree. Rather than make a second recursive pass, let's just give the responsibility to dofindsubquery() to simplify while it's doing the main replacement pass. Per report from Andreas Seltenreich. Artur Zakirov, with some cosmetic changes by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/8737i01dew.fsf@credativ.de
*	Fix reporting of column typmods for multi-row VALUES constructs.	Tom Lane	2016-12-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	expandRTE() and get_rte_attribute_type() reported the exprType() and exprTypmod() values of the expressions in the first row of the VALUES as being the column type/typmod returned by the VALUES RTE. That's fine for the data type, since we coerce all expressions in a column to have the same common type. But we don't coerce them to have a common typmod, so it was possible for rows after the first one to return values that violate the claimed column typmod. This leads to the incorrect result seen in bug #14448 from Hassan Mahmood, as well as some other corner-case misbehaviors. The desired behavior is the same as we use in other type-unification cases: report the common typmod if there is one, but otherwise return -1 indicating no particular constraint. We fixed this in HEAD by deriving the typmods during transformValuesClause and storing them in the RTE, but that's not a feasible solution in the back branches. Instead, just use a brute-force approach of determining the correct common typmod during expandRTE() and get_rte_attribute_type(). Simple testing says that that doesn't really cost much, at least not in common cases where expandRTE() is only used once per query. It turns out that get_rte_attribute_type() is typically never used at all on VALUES RTEs, so the inefficiency there is of no great concern. Report: https://postgr.es/m/20161205143037.4377.60754@wrigleys.postgresql.org Discussion: https://postgr.es/m/27429.1480968538@sss.pgh.pa.us
*	Log the creation of an init fork unconditionally.	Robert Haas	2016-12-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, it was thought that this only needed to be done for the benefit of possible standbys, so wal_level = minimal skipped it. But that's not safe, because during crash recovery we might replay XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE record which recursively removes the directory that contains the new init fork. So log it always. The user-visible effect of this bug is that if you create a database or tablespace, then create an unlogged table, then crash without checkpointing, then restart, accessing the table will fail, because the it won't have been properly reset. This commit fixes that. Michael Paquier, per a report from Konstantin Knizhnik. Wording of the comments per a suggestion from me.
*	Fix test about ignoring extension dependencies during extension scripts.	Tom Lane	2016-11-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 08dd23cec introduced an exception to the rule that extension member objects can only be dropped as part of dropping the whole extension, intending to allow such drops while running the extension's own creation or update scripts. However, the exception was only applied at the outermost recursion level, because it was modeled on a pre-existing check to ignore dependencies on objects listed in pendingObjects. Bug #14434 from Philippe Beaudoin shows that this is inadequate: in some cases we can reach an extension member object by recursion from another one. (The bug concerns the serial-sequence case; I'm not sure if there are other cases, but there might well be.) To fix, revert 08dd23cec's changes to findDependentObjects() and instead apply the creating_extension exception regardless of stack level. Having seen this example, I'm a bit suspicious that the pendingObjects logic is also wrong and such cases should likewise be allowed at any recursion level. However, changing that would interact in subtle ways with the recursion logic (at least it would need to be moved to after the recursing-from check). Given that the code's been like that a long time, I'll refrain from touching it without a clear example showing it's wrong. Back-patch to all active branches. In HEAD and 9.6, where suitable test infrastructure exists, add a regression test case based on the bug report. Report: <20161125151448.6529.33039@wrigleys.postgresql.org> Discussion: <13224.1480177514@sss.pgh.pa.us>
*	Check for pending trigger events on far end when dropping an FK constraint.	Tom Lane	2016-11-25
\| \| \| \| \| \| \| \| \| \| \| \| \|	When dropping a foreign key constraint with ALTER TABLE DROP CONSTRAINT, we refuse the drop if there are any pending trigger events on the named table; this ensures that we won't remove the pg_trigger row that will be consulted by those events. But we should make the same check for the referenced relation, else we might remove a due-to-be-referenced pg_trigger row for that relation too, resulting in "could not find trigger NNN" or "relation NNN has no triggers" errors at commit. Per bug #14431 from Benjie Gillam. Back-patch to all supported branches. Report: <20161124114911.6530.31200@wrigleys.postgresql.org>
*	Make sure ALTER TABLE preserves index tablespaces.	Tom Lane	2016-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When rebuilding an existing index, ALTER TABLE correctly kept the physical file in the same tablespace, but it messed up the pg_class entry if the index had been in the database's default tablespace and "default_tablespace" was set to some non-default tablespace. This led to an inaccessible index. Fix by fixing pg_get_indexdef_string() to always include a tablespace clause, whether or not the index is in the default tablespace. The previous behavior was installed in commit 537e92e41, and I think it just wasn't thought through very clearly; certainly the possible effect of default_tablespace wasn't considered. There's some risk in changing the behavior of this function, but there are no other call sites in the core code. Even if it's being used by some third party extension, it's fairly hard to envision a usage that is okay with a tablespace clause being appended some of the time but can't handle it being appended all the time. Back-patch to all supported versions. Code fix by me, investigation and test cases by Michael Paquier. Discussion: <1479294998857-5930602.post@n3.nabble.com>
*	Fix PGLC_localeconv() to handle errors better.	Tom Lane	2016-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code was intentionally not very careful about leaking strdup'd strings in case of an error. That was forgivable probably, but it also failed to notice strdup() failures, which could lead to subsequent null-pointer-dereference crashes, since many callers unsurprisingly didn't check for null pointers in the struct lconv fields. An even worse problem is that it could throw error while we were setlocale'd to a non-C locale, causing unwanted behavior in subsequent libc calls. Rewrite to ensure that we cannot throw elog(ERROR) until after we've restored the previous locale settings, or at least attempted to. (I'm sorely tempted to make restore failure be a FATAL error, but will refrain for the moment.) Having done that, it's not much more work to ensure that we clean up strdup'd storage on the way out, too. This code is substantially the same in all supported branches, so back-patch all the way. Michael Paquier and Tom Lane Discussion: <CAB7nPqRMbGqa_mesopcn4MPyTs34eqtVEK7ELYxvvV=oqS00YA@mail.gmail.com>
*	Prevent multicolumn expansion of "foo.*" in an UPDATE source expression.	Tom Lane	2016-11-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because we use transformTargetList() for UPDATE as well as SELECT tlists, the code accidentally tried to expand a "" reference into several columns. This is nonsensical, because the UPDATE syntax provides exactly one target column to put the value into. The immediate result was that transformUpdateTargetList() got confused and reported "UPDATE target count mismatch --- internal error". It seems better to treat such a reference as a plain whole-row variable, as it would be in other contexts. (This could produce useful results when the target column is of composite type.) Fix by tweaking transformTargetList() to perform -expansion only conditionally, depending on its exprKind parameter. Back-patch to 9.3. The problem exists further back, but a fix would be much more invasive before that, because transformTargetList() wasn't told what kind of list it was working on. Doesn't seem worth the trouble given the lack of field reports. (I only noticed it because I was checking the code while trying to improve the documentation about how we handle "foo.*".) Discussion: <4308.1479595330@sss.pgh.pa.us>
*	Avoid pin scan for replay of XLOG_BTREE_VACUUM in all cases	Alvaro Herrera	2016-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Replay of XLOG_BTREE_VACUUM during Hot Standby was previously thought to require complex interlocking that matched the requirements on the master. This required an O(N) operation that became a significant problem with large indexes, causing replication delays of seconds or in some cases minutes while the XLOG_BTREE_VACUUM was replayed. This commit skips the “pin scan” that was previously required, by observing in detail when and how it is safe to do so, with full documentation. The pin scan is skipped only in replay; the VACUUM code path on master is not touched here. No tests included. Manual tests using an additional patch to view WAL records and their timing have shown the change in WAL records and their handling has successfully reduced replication delay. This is a back-patch of commits 687f2cd7a015, 3e4b7d87988f, b60284261375 by Simon Riggs, to branches 9.4 and 9.5. No further backpatch is possible because this depends on catalog scans being MVCC. I (Álvaro) additionally updated a slight problem in the README, which explains why this touches the 9.6 and master branches.
*	Account for catalog snapshot in PGXACT->xmin updates.	Tom Lane	2016-11-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting for whether MyPgXact->xmin could be cleared or advanced. In normal transactions this was masked by the fact that the transaction snapshot would be older, but during backend startup and certain utility commands it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had been cleared, meaning that recently-deleted rows could be pruned even though this snapshot could still see them, causing unexpected catalog lookup failures. This effect appears to be the explanation for a recent failure on buildfarm member piculet. To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever it is valid. In the previous logic, it was possible for the CatalogSnapshot to remain valid across waits for client input, but with this change that would mean it delays advance of global xmin in cases where it did not before. To avoid possibly causing new table-bloat problems with clients that sit idle for long intervals, add code to invalidate the CatalogSnapshot before waiting for client input. (When the backend is busy, it's unlikely that the CatalogSnapshot would be the oldest snap for very long, so we don't worry about forcing early invalidation of it otherwise.) In passing, remove the CatalogSnapshotStale flag in favor of using "CatalogSnapshot != NULL" to represent validity, as we do for the other special snapshots in snapmgr.c. And improve some obsolete comments. No regression test because I don't know a deterministic way to cause this failure. But the stress test shown in the original discussion provokes "cache lookup failed for relation 1255" within a few dozen seconds for me. Back-patch to 9.4 where MVCC catalog scans were introduced. (Note: it's quite easy to produce similar failures with the same test case in branches before 9.4. But MVCC catalog scans were supposed to fix that.) Discussion: <16447.1478818294@sss.pgh.pa.us>
*	Fix handling of symlinked pg_stat_tmp and pg_replslot	Magnus Hagander	2016-11-07
\| \| \| \| \| \| \| \| \| \| \|	This was already fixed in HEAD as part of 6ad8ac60 but was not backpatched. Also change the way pg_xlog is handled to be the same as the other directories. Patch from me with pg_xlog addition from Michael Paquier, test updates from David Steele.
*	Fix nasty performance problem in tsquery_rewrite().	Tom Lane	2016-10-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tsquery_rewrite() tries to find matches to subsets of AND/OR conditions; for example, in the query 'a \| b \| c' the substitution subquery 'a \| c' should match and lead to replacement of the first and third items. That's fine, but the matching algorithm apparently takes about O(2^N) for an N-clause query (I say "apparently" because the code is also both unintelligible and uncommented). We could probably do better than that even without any extra assumptions --- but actually, we know that the subclauses are sorted, indeed are depending on that elsewhere in this very same function. So we can just scan the two lists a single time to detect matches, as though we were doing a merge join. Also do a re-flattening call (QTNTernary()) in tsquery_rewrite_query, just to make sure that the tree fits the expectations of the next search cycle. I didn't try to devise a test case for this, but I'm pretty sure that the oversight could have led to failure to match in some cases where a match would be expected. Improve comments, and also stick a CHECK_FOR_INTERRUPTS into dofindsubquery, just in case it's still too slow for somebody. Per report from Andreas Seltenreich. Back-patch to all supported branches. Discussion: <8760oasf2y.fsf@credativ.de>
*	Fix bogus tree-flattening logic in QTNTernary().	Tom Lane	2016-10-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	QTNTernary() contains logic to flatten, eg, '(a & b) & c' into 'a & b & c', which is all well and good, but it tries to do that to NOT nodes as well, so that '!!a' gets changed to '!a'. Explicitly restrict the conversion to be done only on AND and OR nodes, and add a test case illustrating the bug. In passing, provide some comments for the sadly naked functions in tsquery_util.c, and simplify some baroque logic in QTNFree(), which I think may have been leaking some items it intended to free. Noted while investigating a complaint from Andreas Seltenreich. Back-patch to all supported versions.
*	If the stats collector dies during Hot Standby, restart it.	Robert Haas	2016-10-27
\| \| \| \| \| \| \| \|	This bug exists as far back as 9.0, when Hot Standby was introduced, so back-patch to all supported branches. Report and patch by Takayuki Tsunakawa, reviewed by Michael Paquier and Kuntal Ghosh.
*	Fix possible pg_basebackup failure on standby with "include WAL".	Robert Haas	2016-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a restartpoint flushed no dirty buffers, it could fail to update the minimum recovery point, leading to a minimum recovery point prior to the starting REDO location. perform_base_backup() would interpret that as meaning that no WAL files at all needed to be included in the backup, failing an internal sanity check. To fix, have restartpoints always update the minimum recovery point to just after the checkpoint record itself, so that the file (or files) containing the checkpoint record will always be included in the backup. Code by Amit Kapila, per a design suggestion by me, with some additional work on the code comment by me. Test case by Michael Paquier. Report by Kyotaro Horiguchi.
*	Fix incorrect trigger-property updating in ALTER CONSTRAINT.	Tom Lane	2016-10-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code to change the deferrability properties of a foreign-key constraint updated all the associated triggers to match; but a moment's examination of the code that creates those triggers in the first place shows that only some of them should track the constraint's deferrability properties. This leads to odd failures in subsequent exercise of the foreign key, as the triggers are fired at the wrong times. Fix that, and add a regression test comparing the trigger properties produced by ALTER CONSTRAINT with those you get by creating the constraint as-intended to begin with. Per report from James Parks. Back-patch to 9.4 where this ALTER functionality was introduced. Report: <CAJ3Xv+jzJ8iNNUcp4RKW8b6Qp1xVAxHwSXVpjBNygjKxcVuE9w@mail.gmail.com>
*	Translation updates	Peter Eisentraut	2016-10-24
\| \| \| \| \|	Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 475f2bcc7c56f293db4e62d31f85b3bfc0f9f279
*	Avoid testing tuple visibility without buffer lock in RI_FKey_check().	Tom Lane	2016-10-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Despite the argumentation I wrote in commit 7a2fe85b0, it's unsafe to do this, because in corner cases it's possible for HeapTupleSatisfiesSelf to try to set hint bits on the target tuple; and at least since 8.2 we have required the buffer content lock to be held while setting hint bits. The added regression test exercises one such corner case. Unpatched, it causes an assertion failure in assert-enabled builds, or otherwise would cause a hint bit change in a buffer we don't hold lock on, which given the right race condition could result in checksum failures or other data consistency problems. The odds of a problem in the field are probably pretty small, but nonetheless back-patch to all supported branches. Report: <19391.1477244876@sss.pgh.pa.us>
*	Fix EXPLAIN so that it doesn't emit invalid XML in corner cases.	Tom Lane	2016-10-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With track_io_timing = on, EXPLAIN (ANALYZE, BUFFERS) will emit fields named like "I/O Read Time". The slash makes that invalid as an XML element name, so that adding FORMAT XML would produce invalid XML. We already have code in there to translate spaces to dashes, so let's generalize that to convert anything that isn't a valid XML name character, viz letters, digits, hyphens, underscores, and periods. We could just reject slashes, which would run a bit faster. But the fact that this went unnoticed for so long doesn't give me a warm feeling that we'd notice the next creative violation, so let's make it a permanent fix. Reported by Markus Winand, though this isn't his initial patch proposal. Back-patch to 9.2 where track_io_timing was added. The problem is only latent in 9.1, so I don't feel a need to fix it there. Discussion: <E0BF6A45-68E8-45E6-918F-741FB332C6BB@winand.at>
*	Suppress "Factory" zone in pg_timezone_names view for tzdata >= 2016g.	Tom Lane	2016-10-19
\| \| \| \| \| \|	IANA got rid of the really silly "abbreviation" and replaced it with one that's only moderately silly. But it's still pointless, so keep on not showing it.
*	Fix WAL-logging of FSM and VM truncation.	Heikki Linnakangas	2016-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a relation is truncated, it is important that the FSM is truncated as well. Otherwise, after recovery, the FSM can return a page that has been truncated away, leading to errors like: ERROR: could not read block 28991 in file "base/16390/572026": read only 0 of 8192 bytes We were using MarkBufferDirtyHint() to dirty the buffer holding the last remaining page of the FSM, but during recovery, that might in fact not dirty the page, and the FSM update might be lost. To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty() requires us to do WAL-logging ourselves, to protect from a torn page, if checksumming is enabled. Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log when checksumming is enabled. Analysis by Pavan Deolasee. Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com> Backpatch to 9.3, where we got data checksums.
*	Fix cidin() to handle values above 2^31 platform-independently.	Tom Lane	2016-10-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CommandId is declared as uint32, and values up to 4G are indeed legal. cidout() handles them properly by treating the value as unsigned int. But cidin() was just using atoi(), which has platform-dependent behavior for values outside the range of signed int, as reported by Bart Lengkeek in bug #14379. Use strtoul() instead, as xidin() does. In passing, make some purely cosmetic changes to make xidin/xidout look more like cidin/cidout; the former didn't have a monopoly on best practice IMO. Neither xidin nor cidin make any attempt to throw error for invalid input. I didn't change that here, and am not sure it's worth worrying about since neither is really a user-facing type. The point is just to ensure that indubitably-valid inputs work as expected. It's been like this for a long time, so back-patch to all supported branches. Report: <20161018152550.1413.6439@wrigleys.postgresql.org>
*	Fix assorted integer-overflow hazards in varbit.c.	Tom Lane	2016-10-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	bitshiftright() and bitshiftleft() would recursively call each other infinitely if the user passed INT_MIN for the shift amount, due to integer overflow in negating the shift amount. To fix, clamp to -VARBITMAXLEN. That doesn't change the results since any shift distance larger than the input bit string's length produces an all-zeroes result. Also fix some places that seemed inadequately paranoid about input typmods exceeding VARBITMAXLEN. While a typmod accepted by anybit_typmodin() will certainly be much less than that, at least some of these spots are reachable with user-chosen integer values. Andreas Seltenreich and Tom Lane Discussion: <87d1j2zqtz.fsf@credativ.de>
*	Fix another bug in merging of inherited CHECK constraints.	Tom Lane	2016-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's not good for an inherited child constraint to be marked connoinherit; that would result in the constraint not propagating to grandchild tables, if any are created later. The code mostly prevented this from happening but there was one case that was missed. This is somewhat related to commit e55a946a8, which also tightened checks on constraint merging. Hence, back-patch to 9.2 like that one. This isn't so much because there's a concrete feature-related reason to stop there, as to avoid having more distinct behaviors than we have to in this area. Amit Langote Discussion: <b28ee774-7009-313d-dd55-5bdd81242c41@lab.ntt.co.jp>
*	Try to find out the actual hugepage size when making a MAP_HUGETLB request.	Tom Lane	2016-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Even if Linux's mmap() is okay with a partial-hugepage request, munmap() is not, as reported by Chris Richards. Therefore it behooves us to try a bit harder to find out the actual hugepage size, instead of assuming that we can skate by with a guess. For the moment, just look into /proc/meminfo to find out the default hugepage size, and use that. Later, on kernels that support requests for nondefault sizes, we might try to consider other alternatives. But that smells more like a new feature than a bug fix, especially if we want to provide any way for the DBA to control it, so leave it for another day. I set this up to allow easy addition of platform-specific code for non-Linux platforms, if needed; but right now there are no reports suggesting that we need to work harder on other platforms. Back-patch to 9.4 where hugepage support was introduced. Discussion: <31056.1476303954@sss.pgh.pa.us>
*	Clean up handling of anonymous mmap'd shared-memory segment.	Tom Lane	2016-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix detaching of the mmap'd segment to have its own on_shmem_exit callback, rather than piggybacking on the one for detaching from the SysV segment. That was confusing, and given the distance between the two attach calls, it was trouble waiting to happen. Make the detaching calls idempotent by clearing AnonymousShmem to show we've already unmapped. I spent quite a bit of time yesterday trying to find a path that would allow the munmap()'s to be done twice, and while I did not succeed, it seems silly that there's even a question. Make the #ifdef logic less confusing by separating "do we want to use anonymous shmem" from EXEC_BACKEND. Even though there's no current scenario where those conditions are different, it is not helpful for different places in the same file to be testing EXEC_BACKEND for what are fundamentally different reasons. Don't do on_exit_reset() in StartBackgroundWorker(). At best that's useless (InitPostmasterChild would have done it already) and at worst it could zap some callback that's unrelated to shared memory. Improve comments, and simplify the huge_pages enablement logic slightly. Back-patch to 9.4 where hugepage support was introduced. Arguably this should go into 9.3 as well, but the code looks significantly different there, and I doubt it's worth the trouble of adapting the patch given I can't show a live bug.
*	Fix two bugs in merging of inherited CHECK constraints.	Tom Lane	2016-10-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Historically, we've allowed users to add a CHECK constraint to a child table and then add an identical CHECK constraint to the parent. This results in "merging" the two constraints so that the pre-existing child constraint ends up with both conislocal = true and coninhcount > 0. However, if you tried to do it in the other order, you got a duplicate constraint error. This is problematic for pg_dump, which needs to issue separated ADD CONSTRAINT commands in some cases, but has no good way to ensure that the constraints will be added in the required order. And it's more than a bit arbitrary, too. The goal of complaining about duplicated ADD CONSTRAINT commands can be served if we reject the case of adding a constraint when the existing one already has conislocal = true; but if it has conislocal = false, let's just make the ADD CONSTRAINT set conislocal = true. In this way, either order of adding the constraints has the same end result. Another problem was that the code allowed creation of a parent constraint marked convalidated that is merged with a child constraint that is !convalidated. In this case, an inheritance scan of the parent table could emit some rows violating the constraint condition, which would be an unexpected result given the marking of the parent constraint as validated. Hence, forbid merging of constraints in this case. (Note: valid child and not-valid parent seems fine, so continue to allow that.) Per report from Benedikt Grundmann. Back-patch to 9.2 where we introduced possibly-not-valid check constraints. The second bug obviously doesn't apply before that, and I think the first doesn't either, because pg_dump only gets into this situation when dealing with not-valid constraints. Report: <CADbMkNPT-Jz5PRSQ4RbUASYAjocV_KHUWapR%2Bg8fNvhUAyRpxA%40mail.gmail.com> Discussion: <22108.1475874586@sss.pgh.pa.us>
*	Correct logical decoding restore behaviour for subtransactions.	Andres Freund	2016-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before initializing iteration over a subtransaction's changes, the last few changes were not spilled to disk. That's correct if the transaction didn't spill to disk, but otherwise... This bug can lead to missed or misorderd subtransaction contents when they were spilled to disk. Move spilling of the remaining in-memory changes to ReorderBufferIterTXNInit(), where it can easily be applied to the top transaction and, if present, subtransactions. Since this code had too many bugs already, noticeably increase test coverage. Fixes: #14319 Reported-By: Huan Ruan Discussion: <20160909012610.20024.58169@wrigleys.postgresql.org> Backport: 9,4-, where logical decoding was added
*	Do ClosePostmasterPorts() earlier in SubPostmasterMain().	Tom Lane	2016-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In standard Unix builds, postmaster child processes do ClosePostmasterPorts immediately after InitPostmasterChild, that is almost immediately after being spawned. This is important because we don't want children holding open the postmaster's end of the postmaster death watch pipe. However, in EXEC_BACKEND builds, SubPostmasterMain was postponing this responsibility significantly, in order to make it slightly more convenient to pass the right flag value to ClosePostmasterPorts. This is bad, particularly seeing that process_shared_preload_libraries() might invoke nearly-arbitrary code. Rearrange so that we do it as soon as we've fetched the socket FDs via read_backend_variables(). Also move the comment explaining about randomize_va_space to before the call of PGSharedMemoryReAttach, which is where it's relevant. The old placement was appropriate when the reattach happened inside CreateSharedMemoryAndSemaphores, but that was a long time ago. Back-patch to 9.3; the patch doesn't apply cleanly before that, and it doesn't seem worth a lot of effort given that we've had no actual field complaints traceable to this. Discussion: <4157.1475178360@sss.pgh.pa.us>
*	Include <sys/select.h> where needed	Alvaro Herrera	2016-09-27
\| \| \| \| \| \| \| \| \| \| \| \|	<sys/select.h> is required by POSIX.1-2001 to get the prototype of select(2), but nearly no systems enforce that because older standards let you get away with including some other headers. Recent OpenBSD hacking has removed that frail touch of friendliness, however, which broke some compiles; fix all the way back to 9.1 by adding the required standard. Only vacuumdb.c was reported to fail, but it seems easier to fix the whole lot in a fell swoop. Per bug #14334 by Sean Farrell.
*	Don't trust CreateFileMapping() to clear the error code on success.	Tom Lane	2016-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \|	We must test GetLastError() even when CreateFileMapping() returns a non-null handle. If that value were left over from some previous system call, we might be fooled into thinking the segment already existed. Experimentation on Windows 7 suggests that CreateFileMapping() clears the error code on success, but it is not documented to do so, so let's not rely on that happening in all Windows releases. Amit Kapila Discussion: <20811.1474390987@sss.pgh.pa.us>
*	Avoid using PostmasterRandom() for DSM control segment ID.	Tom Lane	2016-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commits 470d886c3 et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom. Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom. Discussion: <20789.1474390434@sss.pgh.pa.us>
*	Be sure to rewind the tuplestore read pointer in non-leader CTEScan nodes.	Tom Lane	2016-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ExecInitCteScan supposed that it didn't have to do anything to the extra tuplestore read pointer it gets from tuplestore_alloc_read_pointer. However, it needs this read pointer to be positioned at the start of the tuplestore, while tuplestore_alloc_read_pointer is actually defined as cloning the current position of read pointer 0. In normal situations that accidentally works because we initialize the whole plan tree at once, before anything gets read. But it fails in an EvalPlanQual recheck, as illustrated in bug #14328 from Dima Pavlov. To fix, just forcibly rewind the pointer after tuplestore_alloc_read_pointer. The cost of doing so is negligible unless the tuplestore is already in TSS_READFILE state, which wouldn't happen in normal cases. We could consider altering tuplestore's API to make that case cheaper, but that would make for a more invasive back-patch and it doesn't seem worth it. This has been broken probably for as long as we've had CTEs, so back-patch to all supported branches. Discussion: <32468.1474548308@sss.pgh.pa.us>
*	Use PostmasterRandom(), not random(), for DSM control segment ID.	Robert Haas	2016-09-20
\| \| \| \| \|	Otherwise, every startup gets the same "random" value, which is definitely not what was intended.
*	Retry DSM control segment creation if Windows indicates access denied.	Robert Haas	2016-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, attempts to run multiple postmasters running on the same machine may fail, because Windows sometimes returns ERROR_ACCESS_DENIED rather than ERROR_ALREADY_EXISTS when there is an existing segment. Hitting this bug is much more likely because of another defect not fixed by this patch, namely that dsm_postmaster_startup() uses random() which returns the same value every time. But that's not a reason not to fix this. Kyotaro Horiguchi and Amit Kapila, reviewed by Michael Paquier Discussion: <CAA4eK1JyNdMeF-dgrpHozDecpDfsRZUtpCi+1AbtuEkfG3YooQ@mail.gmail.com>
*	Fix copy/pasto in file identification	Simon Riggs	2016-09-12
\| \| \| \|	Daniel Gustafsson
*	Fix miserable coding in pg_stat_get_activity().	Tom Lane	2016-09-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit dd1a3bccc replaced a test on whether a subroutine returned a null pointer with a test on whether &pointer->backendStatus was null. This accidentally failed to fail, at least on common compilers, because backendStatus is the first field in the struct; but it was surely trouble waiting to happen. Commit f91feba87 then messed things up further, changing the logic to local_beentry = pgstat_fetch_stat_local_beentry(curr_backend); if (!local_beentry) continue; beentry = &local_beentry->backendStatus; if (!beentry) { where the second "if" is now dead code, so that the intended behavior of printing a row with "<backend information not available>" cannot occur. I suspect this is all moot because pgstat_fetch_stat_local_beentry will never actually return null in this function's usage, but it's still very poor coding. Repair back to 9.4 where the original problem was introduced.
*	Fix locking a tuple updated by an aborted (sub)transaction	Alvaro Herrera	2016-09-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When heap_lock_tuple decides to follow the update chain, it tried to also lock any version of the tuple that was created by an update that was subsequently rolled back. This is pointless, since for all intents and purposes that tuple exists no more; and moreover it causes misbehavior, as reported independently by Marko Tiikkaja and Marti Raudsepp: some SELECT FOR UPDATE/SHARE queries may fail to return the tuples, and assertion-enabled builds crash. Fix by having heap_lock_updated_tuple test the xmin and return success immediately if the tuple was created by an aborted transaction. The condition where tuples become invisible occurs when an updated tuple chain is followed by heap_lock_updated_tuple, which reports the problem as HeapTupleSelfUpdated to its caller heap_lock_tuple, which in turn propagates that code outwards possibly leading the calling code (ExecLockRows) to believe that the tuple exists no longer. Backpatch to 9.3. Only on 9.5 and newer this leads to a visible failure, because of commit 27846f02c176; before that, heap_lock_tuple skips the whole dance when the tuple is already locked by the same transaction, because of the ancient HeapTupleSatisfiesUpdate behavior. Still, the buggy condition may also exist in more convoluted scenarios involving concurrent transactions, so it seems safer to fix the bug in the old branches too. Discussion: https://www.postgresql.org/message-id/CABRT9RC81YUf1=jsmWopcKJEro=VoeG2ou6sPwyOUTx_qteRsg@mail.gmail.com https://www.postgresql.org/message-id/48d3eade-98d3-8b9a-477e-1a8dc32a724d@joh.to
*	Fix VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL	Simon Riggs	2016-09-09
\| \| \| \| \| \| \| \| \| \|	lazy_truncate_heap() was waiting for VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL, but in microseconds not milliseconds as originally intended. Found by code inspection. Simon Riggs
*	Fix mdtruncate() to close fd.c handle of deleted segments.	Andres Freund	2016-09-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mdtruncate() forgot to FileClose() a segment's mdfd_vfd, when deleting it. That lead to a fd.c handle to a truncated file being kept open until backend exit. The issue appears to have been introduced way back in 1a5c450f3024ac5, before that the handle was closed inside FileUnlink(). The impact of this bug is limited - only VACUUM and ON COMMIT TRUNCATE for temporary tables, truncate files in place (i.e. TRUNCATE itself is not affected), and the relation has to be bigger than 1GB. The consequences of a leaked fd.c handle aren't severe either. Discussion: <20160908220748.oqh37ukwqqncbl3n@alap3.anarazel.de> Backpatch: all supported releases
*	Fix corrupt GIN_SEGMENT_ADDITEMS WAL records on big-endian hardware.	Tom Lane	2016-09-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	computeLeafRecompressWALData() tried to produce a uint16 WAL log field by memcpy'ing the first two bytes of an int-sized variable. That accidentally works on little-endian hardware, but not at all on big-endian. Replay then thinks it's looking at an ADDITEMS action with zero entries, and reads the first two bytes of the first TID therein as the next segno/action, typically leading to "unexpected GIN leaf action" errors during replay. Even if replay failed to crash, the resulting GIN index page would surely be incorrect. To fix, just declare the variable as uint16 instead. Per bug #14295 from Spencer Thomason (much thanks to Spencer for turning his problem into a self-contained test case). This likely also explains a previous report of the same symptom from Bernd Helmle. Back-patch to 9.4 where the problem was introduced (by commit 14d02f0bb). Discussion: <20160826072658.15676.7628@wrigleys.postgresql.org> Possible-Report: <2DA7350F7296B2A142272901@eje.land.credativ.lan>
*	Don't require dynamic timezone abbreviations to match underlying time zone.	Tom Lane	2016-09-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we threw an error if a dynamic timezone abbreviation did not match any abbreviation recorded in the referenced IANA time zone entry. That seemed like a good consistency check at the time, but it turns out that a number of the abbreviations in the IANA database are things that Olson and crew made up out of whole cloth. Their current policy is to remove such names in favor of using simple numeric offsets. Perhaps unsurprisingly, a lot of these made-up abbreviations have varied in meaning over time, which meant that our commit b2cbced9e and later changes made them into dynamic abbreviations. So with newer IANA database versions that don't mention these abbreviations at all, we fail, as reported in bug #14307 from Neil Anderson. It's worse than just a few unused-in-the-wild abbreviations not working, because the pg_timezone_abbrevs view stops working altogether (since its underlying function tries to compute the whole view result in one call). We considered deleting these abbreviations from our abbreviations list, but the problem with that is that we can't stay ahead of possible future IANA changes. Instead, let's leave the abbreviations list alone, and treat any "orphaned" dynamic abbreviation as just meaning the referenced time zone. It will behave a bit differently than it used to, in that you can't any longer override the zone's standard vs. daylight rule by using the "wrong" abbreviation of a pair, but that's better than failing entirely. (Also, this solution can be interpreted as adding a small new feature, which is that any abbreviation a user wants can be defined as referencing a time zone name.) Back-patch to all supported branches, since this problem affects all of them when using tzdata 2016f or newer. Report: <20160902031551.15674.67337@wrigleys.postgresql.org> Discussion: <6189.1472820913@sss.pgh.pa.us>
*	Fix minor 9.4-only bug in logical decoding.	Tom Lane	2016-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 9.4 version of log_heap_update() stores the full length of the old_key_tuple in xlhdr.t_len, rather than the length less offsetof(HeapTupleHeaderData, t_bits) as is customary elsewhere. DecodeUpdate() was expecting the usual definition, which caused it to copy 23 bytes too much out of the WAL-file buffer. Most of the time that's harmless, although Valgrind on buildfarm member skink has been complaining about it, and there was one actual SIGSEGV on curculio that might have been caused by this. The reconstructed key tuple would also have an over-length t_len, though offhand I do not see how that could cause any visible problem. The problem seems to have disappeared in the 9.5 WAL format changes (commit 2c03216d8), so we only need to fix 9.4. Discussion: <11035.1469058247@sss.pgh.pa.us>
*	Prevent starting a standalone backend with standby_mode on.	Tom Lane	2016-08-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This can't really work because standby_mode expects there to be more WAL arriving, which there will not ever be because there's no WAL receiver process to fetch it. Moreover, if standby_mode is on then hot standby might also be turned on, causing even more strangeness because that expects read-only sessions to be executing in parallel. Bernd Helmle reported a case where btree_xlog_delete_get_latestRemovedXid got confused, but rather than band-aiding individual problems it seems best to prevent getting anywhere near this state in the first place. Back-patch to all supported branches. In passing, also fix some omissions of errcodes in other ereport's in readRecoveryCommandFile(). Michael Paquier (errcode hacking by me) Discussion: <00F0B2CEF6D0CEF8A90119D4@eje.credativ.lan>
*	Fix pg_xlogdump so that it handles cross-page XLP_FIRST_IS_CONTRECORD record.	Fujii Masao	2016-08-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously pg_xlogdump failed to dump the contents of the WAL file if the file starts with the continuation WAL record which spans more than one pages. Since pg_xlogdump assumed that the continuation record always fits on a page, it could not find the valid WAL record to start reading from in that case. This patch changes pg_xlogdump so that it can handle a continuation WAL record which crosses a page boundary and find the valid record to start reading from. Back-patch to 9.3 where pg_xlogdump was introduced. Author: Pavan Deolasee Reviewed-By: Michael Paquier and Craig Ringer Discussion: CABOikdPsPByMiG6J01DKq6om2+BNkxHTPkOyqHM2a4oYwGKsqQ@mail.gmail.com
*	Fix small query-lifespan memory leak in bulk updates.	Tom Lane	2016-08-24
\| \| \| \| \| \| \| \| \| \| \| \| \|	When there is an identifiable REPLICA IDENTITY index on the target table, heap_update leaks the id_attrs bitmapset. That's not many bytes, but it adds up over enough rows, since the code typically runs in a query-lifespan context. Bug introduced in commit e55704d8b, which did a rather poor job of cloning the existing use-pattern for RelationGetIndexAttrBitmap(). Per bug #14293 from Zhou Digoal. Back-patch to 9.4 where the bug was introduced. Report: <20160824114320.15676.45171@wrigleys.postgresql.org>
*	Fix improper repetition of previous results from a hashed aggregate.	Tom Lane	2016-08-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ExecReScanAgg's check for whether it could re-use a previously calculated hashtable neglected the possibility that the Agg node might reference PARAM_EXEC Params that are not referenced by its input plan node. That's okay if the Params are in upper tlist or qual expressions; but if one appears in aggregate input expressions, then the hashtable contents need to be recomputed when the Param's value changes. To avoid unnecessary performance degradation in the case of a Param that isn't within an aggregate input, add logic to the planner to determine which Params are within aggregate inputs. This requires a new field in struct Agg, but fortunately we never write plans to disk, so this isn't an initdb-forcing change. Per report from Jeevan Chalke. This has been broken since forever, so back-patch to all supported branches. Andrew Gierth, with minor adjustments by me Report: <CAM2+6=VY8ykfLT5Q8vb9B6EbeBk-NGuLbT6seaQ+Fq4zXvrDcA@mail.gmail.com>