aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Fix precision and rounding issues in money multiplication and division.Tom Lane2017-05-21
| | | | | | | | | | | | | | | | | | | | | | | The cash_div_intX functions applied rint() to the result of the division. That's not merely useless (because the result is already an integer) but it causes precision loss for values larger than 2^52 or so, because of the forced conversion to float8. On the other hand, the cash_mul_fltX functions neglected to apply rint() to their multiplication results, thus possibly causing off-by-one outputs. Per C standard, arithmetic between any integral value and a float value is performed in float format. Thus, cash_mul_flt4 and cash_div_flt4 produced answers good to only about six digits, even when the float value is exact. We can improve matters noticeably by widening the float inputs to double. (It's tempting to consider using "long double" arithmetic if available, but that's probably too much of a stretch for a back-patched fix.) Also, document that cash_div_intX operators truncate rather than round. Per bug #14663 from Richard Pistole. Back-patch to all supported branches. Discussion: https://postgr.es/m/22403.1495223615@sss.pgh.pa.us
* Fix typo in comment.Heikki Linnakangas2017-05-18
| | | | Daniel Gustafsson
* Fix new warnings from GCC 7Peter Eisentraut2017-05-16
| | | | | This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
* Fix unsafe reference into relcache in constructed CommentStmt.Tom Lane2017-05-15
| | | | | | | | | | | | | | The CommentStmt made by RebuildConstraintComment() has to pstrdup the relation name, else it will contain a dangling pointer after that relcache entry is flushed. (I'm less sure that pstrdup'ing conname is necessary, but let's be safe.) Failure to do this leads to weird errors or crashes, as reported by Marko Elezovic. Bug introduced by commit e42375fc8, so back-patch to 9.5 as that was. Fix by David Rowley, regression test by Michael Paquier Discussion: https://postgr.es/m/DB6PR03MB30775D58E732D4EB0C13725B9AE00@DB6PR03MB3077.eurprd03.prod.outlook.com
* Avoid superfluous work for commits during logical slot creation.Andres Freund2017-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Before 955a684e0401 logical decoding snapshot maintenance needed to cope with transactions it might not have seen in their entirety. For such transactions we'd to assume they modified the catalog (could have happened before we were watching), and thus a new snapshot had to be built, and distributed to concurrently running transactions. That's problematic because building a new snapshot isn't that cheap , especially as the the array of committed transactions needs to be sorted. When creating a slot on a server with a lot of transactions, this could make logical slot creation infeasibly expensive. After 955a684e0401 there's no need to deal with transaction that aren't guaranteed to be fully observable. That allows to avoid building snapshots for transactions that haven't modified catalog, even before reaching consistency. While this isn't necessarily a bugfix, slot creation being impossible in some production workloads, is severe enough to warrant backpatching. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced
* Fix race condition leading to hanging logical slot creation.Andres Freund2017-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The snapshot assembly during the creation of logical slots relied waiting for transactions in xl_running_xacts to end, by checking for their commit/abort records. Unfortunately, despite locking, it is possible to see an xl_running_xact record listing transactions as ready, that have already WAL-logged an commit/abort record, as the locking just prevents the ProcArray to be adjusted, and the commit record has to be logged first. That lead to either delayed or hanging snapshot creation, because snapbuild.c would wait "forever" to see commit/abort records for some transactions. That hang resolved only if a xl_running_xacts record without any running transactions happened to be logged, far from certain on a busy server. It's impractical to prevent that via more heavyweight locking, the likelihood of deadlocks and significantly increased contention would be too big. Instead change the initial snapshot creation to be solely based on tracking the oldest running transaction via xl_running_xacts->oldestRunningXid - that actually ends up significantly simplifying the code. That has two disadvantages: 1) Because we cannot fully "trust" the contents of xl_running_xacts, we cannot use it to build the initial snapshot. Instead we have to wait twice for all running transactions to finish. 2) Previously a slot, unless the race occurred, could be created when the all transaction perceived as running based on commit/abort records, now we have to wait for the next xl_running_xacts record. To address that, trigger logging new xl_running_xacts record from within snapbuild.c exactly when necessary. Unfortunately snabuild.c's SnapBuild is stored on disk, one of the stupider ideas of a certain Mr Freund, so we can't change it in a minor release. As this is going to be backpatched, we have to hack around a bit to keep on-disk compatibility. A later commit will rejigger that on master. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced
* Avoid searching for callback functions in CallSyscacheCallbacks().Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | We have now grown enough registerable syscache-invalidation callback functions that the original assumption that there would be few of them is causing performance problems. In particular, let's fix things so that CallSyscacheCallbacks doesn't have to search the whole array to find which callback(s) to invoke for a given cache ID. Preserve the original behavior that callbacks are called in order of registration, just in case there's someplace that depends on that (which I doubt). In support of this, export the number of syscaches from syscache.h. People could have found that out anyway from the enum, but adding a #define makes that much safer. This provides a useful additional speedup in Mathieu Fenniak's logical-decoding test case, although we're reaching the point of diminishing returns there. I think any further improvement will have to come from reducing the number of cache invalidations that are triggered in the first place. Still, we can hope that this change gives some incremental benefit for all invalidation scenarios. Back-patch to 9.4 where logical decoding was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Reduce initial size of RelfilenodeMapHash.Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | A test case provided by Mathieu Fenniak shows that hash_seq_search'ing this hashtable can consume a very significant amount of overhead during logical decoding, which triggers frequent cache invalidation. Testing suggests that the actual population of the hashtable is often no more than a few dozen entries, so we can cut the overhead just by dropping the initial number of buckets down from 1024 --- I chose to cut it to 64. (In situations where we do have a significant number of entries, we shouldn't get any real penalty from doing this, as the dynahash.c code will resize the hashtable automatically.) This gives a further factor-of-two savings in Mathieu's test case. That may be overly optimistic for real-world benefit, as real cases may have larger average table populations, but it's hard to see it turning into a net negative for any workload. Back-patch to 9.4 where relfilenodemap.c was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Avoid searching for the target catcache in CatalogCacheIdInvalidate.Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | A test case provided by Mathieu Fenniak shows that the initial search for the target catcache in CatalogCacheIdInvalidate consumes a very significant amount of overhead in cases where cache invalidation is triggered but has little useful work to do. There is no good reason for that search to exist at all, as the index array maintained by syscache.c allows direct lookup of the catcache from its ID. We just need a frontend function in syscache.c, matching the division of labor for most other cache-accessing operations. While there's more that can be done in this area, this patch alone reduces the runtime of Mathieu's example by 2X. We can hope that it offers some useful benefit in other cases too, although usually cache invalidation overhead is not such a striking fraction of the total runtime. Back-patch to 9.4 where logical decoding was introduced. It might be worth going further back, but presently the only case we know of where cache invalidation is really a significant burden is in logical decoding. Also, older branches have fewer catcaches, reducing the possible benefit. (Note: although this nominally changes catcache's API, we have always documented CatalogCacheIdInvalidate as a private function, so I would have little sympathy for an external module calling it directly. So backpatching should be fine.) Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Increase MAX_SYSCACHE_CALLBACKS to provide more room for extensions.Tom Lane2017-05-11
| | | | | | | | | | | | | | | | | | Increase from the historical value of 32 to 64. We are up to 31 callers of CacheRegisterSyscacheCallback() in HEAD, so if they were all to be exercised in one process that would leave only one slot for add-on modules. It's probably not possible for that to happen, but still we clearly need more daylight here. (At some point it might be worth making the array dynamically resizable; but since we've never heard a complaint of "out of syscache_callback_list slots" happening in the field, I doubt it's worth it yet.) Back-patch as far as 9.4, which is where we increased the companion limit MAX_RELCACHE_CALLBACKS (cf commit f01d1ae3a). It's not as urgent in released branches, which have only a couple dozen call sites in core, but it still seems that somebody might hit the limit before these branches die. Discussion: https://postgr.es/m/12184.1494450131@sss.pgh.pa.us
* Further patch rangetypes_selfuncs.c's statistics slot management.Tom Lane2017-05-08
| | | | | | | | | | | | | | | Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8, not of the type of the column the statistics are for. This bug is at least partly the fault of sloppy specification comments for get_attstatsslot()/free_attstatsslot(): the type OID they want is that of the stavalues entries, not of the underlying column. (I double-checked other callers and they seem to get this right.) Adjust the comments to be more correct. Per buildfarm. Security: CVE-2017-7484
* Fix possibly-uninitialized variable.Tom Lane2017-05-08
| | | | | | Oversight in e2d4ef8de et al (my fault not Peter's). Per buildfarm. Security: CVE-2017-7484
* Match pg_user_mappings limits to information_schema.user_mapping_options.Noah Misch2017-05-08
| | | | | | | | | | | | | | | Both views replace the umoptions field with NULL when the user does not meet qualifications to see it. They used different qualifications, and pg_user_mappings documented qualifications did not match its implemented qualifications. Make its documentation and implementation match those of user_mapping_options. One might argue for stronger qualifications, but these have long, documented tenure. pg_user_mappings has always exhibited this problem, so back-patch to 9.2 (all supported versions). Michael Paquier and Feike Steenbergen. Reviewed by Jeff Janes. Reported by Andrew Wheelwright. Security: CVE-2017-7486
* Translation updatesPeter Eisentraut2017-05-08
| | | | | Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: dd6f9ed0d9d7b33d328761dcdd3a70f44aaff6ff
* Add security checks to selectivity estimation functionsPeter Eisentraut2017-05-08
| | | | | | | | | | | | | | | | | | | | | Some selectivity estimation functions run user-supplied operators over data obtained from pg_statistic without security checks, which allows those operators to leak pg_statistic data without having privileges on the underlying tables. Fix by checking that one of the following is satisfied: (1) the user has table or column privileges on the table underlying the pg_statistic data, or (2) the function implementing the user-supplied operator is leak-proof. If neither is satisfied, planning will proceed as if there are no statistics available. At least one of these is satisfied in most cases in practice. The only situations that are negatively impacted are user-defined or not-leak-proof operators on a security-barrier view. Reported-by: Robert Haas <robertmhaas@gmail.com> Author: Peter Eisentraut <peter_e@gmx.net> Author: Tom Lane <tgl@sss.pgh.pa.us> Security: CVE-2017-7484
* RLS: Fix ALL vs. SELECT+UPDATE policy usageStephen Frost2017-05-06
| | | | | | | | | | | | | | | | | | | | When we add the SELECT-privilege based policies to the RLS with check options (such as for an UPDATE statement, or when we have INSERT ... RETURNING), we need to be sure and use the 'USING' case if the policy is actually an 'ALL' policy (which could have both a USING clause and an independent WITH CHECK clause). This could result in policies acting differently when built using ALL (when the ALL had both USING and WITH CHECK clauses) and when building the policies independently as SELECT and UPDATE policies. Fix this by adding an explicit boolean to add_with_check_options() to indicate when the USING policy should be used, even if the policy has both USING and WITH CHECK policies on it. Reported by: Rod Taylor Back-patch to 9.5 where RLS was introduced.
* Fix cursor_to_xml in tableforest false modePeter Eisentraut2017-05-04
| | | | | | | | | | | | | It only produced <row> elements but no wrapping <table> element. By contrast, cursor_to_xmlschema produced a schema that is now correct but did not previously match the XML data produced by cursor_to_xml. In passing, also fix a minor misunderstanding about moving cursors in the tests related to this. Reported-by: filip@jirsak.org Based-on-patch-by: Thomas Munro <thomas.munro@enterprisedb.com>
* Fix pfree-of-already-freed-tuple when rescanning a GiST index-only scan.Tom Lane2017-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | GiST's getNextNearest() function attempts to pfree the previously-returned tuple if any (that is, scan->xs_hitup in HEAD, or scan->xs_itup in older branches). However, if we are rescanning a plan node after ending a previous scan early, those tuple pointers could be pointing to garbage, because they would be pointing into the scan's pageDataCxt or queueCxt which has been reset. In a debug build this reliably results in a crash, although I think it might sometimes accidentally fail to fail in production builds. To fix, clear the pointer field anyplace we reset a context it might be pointing into. This may be overkill --- I think probably only the queueCxt case is involved in this bug, so that resetting in gistrescan() would be sufficient --- but dangling pointers are generally bad news, so let's avoid them. Another plausible answer might be to just not bother with the pfree in getNextNearest(). The reconstructed tuples would go away anyway in the context resets, and I'm far from convinced that freeing them a bit earlier really saves anything meaningful. I'll stick with the original logic in this patch, but if we find more problems in the same area we should consider that approach. Per bug #14641 from Denis Smirnov. Back-patch to 9.5 where this logic was introduced. Discussion: https://postgr.es/m/20170504072034.24366.57688@wrigleys.postgresql.org
* Ensure commands in extension scripts see the results of preceding DDL.Tom Lane2017-05-02
| | | | | | | | | | | | | Due to a missing CommandCounterIncrement() call, parsing of a non-utility command in an extension script would not see the effects of the immediately preceding DDL command, unless that command's execution ends with CommandCounterIncrement() internally ... which some do but many don't. Report by Philippe Beaudoin, diagnosis by Julien Rouhaud. Rather remarkably, this bug has evaded detection since extensions were invented, so back-patch to all supported branches. Discussion: https://postgr.es/m/2cf7941e-4e41-7714-3de8-37b1a8f74dff@free.fr
* Fix VALIDATE CONSTRAINT to consider NO INHERIT attribute.Robert Haas2017-04-28
| | | | | | | | | | Currently, trying to validate a NO INHERIT constraint on the parent will search for the constraint in child tables (where it is not supposed to exist), wrongly causing a "constraint does not exist" error. Amit Langote, per a report from Hans Buschmann. Discussion: http://postgr.es/m/20170421184012.24362.19@wrigleys.postgresql.org
* Don't use on-disk snapshots for exported logical decoding snapshot.Andres Freund2017-04-27
| | | | | | | | | | | | | | | | | | | | | Logical decoding stores historical snapshots on disk, so that logical decoding can restart without having to reconstruct a snapshot from scratch (for which the resources are not guaranteed to be present anymore). These serialized snapshots were also used when creating a new slot via the walsender interface, which can export a "full" snapshot (i.e. one that can read all tables, not just catalog ones). The problem is that the serialized snapshots are only useful for catalogs and not for normal user tables. Thus the use of such a serialized snapshot could result in an inconsistent snapshot being exported, which could lead to queries returning wrong data. This would only happen if logical slots are created while another logical slot already exists. Author: Petr Jelinek Reviewed-By: Andres Freund Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.
* Cope with glibc too old to have epoll_create1().Tom Lane2017-04-27
| | | | | | | | | Commit fa31b6f4e supposed that we didn't have to worry about that anymore, but it seems that RHEL5 is like that, and that's still a supported platform. Put back the prior coding under an #ifdef, adding an explicit fcntl() to retain the desired CLOEXEC property. Discussion: https://postgr.es/m/12307.1493325329@sss.pgh.pa.us
* Preserve required !catalog tuples while computing initial decoding snapshot.Andres Freund2017-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logical decoding machinery already preserved all the required catalog tuples, which is sufficient in the course of normal logical decoding, but did not guarantee that non-catalog tuples were preserved during computation of the initial snapshot when creating a slot over the replication protocol. This could cause a corrupted initial snapshot being exported. The time window for issues is usually not terribly large, but on a busy server it's perfectly possible to it hit it. Ongoing decoding is not affected by this bug. To avoid increased overhead for the SQL API, only retain additional tuples when a logical slot is being created over the replication protocol. To do so this commit changes the signature of CreateInitDecodingContext(), but it seems unlikely that it's being used in an extension, so that's probably ok. In a drive-by fix, fix handling of ReplicationSlotsComputeRequiredXmin's already_locked argument, which should only apply to ProcArrayLock, not ReplicationSlotControlLock. Reported-By: Erik Rijkers Analyzed-By: Petr Jelinek Author: Petr Jelinek, heavily editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.
* Make latch.c more paranoid about child-process cases.Tom Lane2017-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | Although the postmaster doesn't currently create a self-pipe or any latches, there's discussion of it doing so in future. It's also conceivable that a shared_preload_libraries extension would try to create such a thing in the postmaster process today. In that case the self-pipe FDs would be inherited by forked child processes. latch.c was entirely unprepared for such a case and could suffer an assertion failure, or worse try to use the inherited pipe if somebody called WaitLatch without having called InitializeLatchSupport in that process. Make it keep track of whether InitializeLatchSupport has been called in the *current* process, and do the right thing if state has been inherited from a parent. Apply FD_CLOEXEC to file descriptors created in latch.c (the self-pipe, as well as epoll event sets). This ensures that child processes spawned in backends, the archiver, etc cannot accidentally or intentionally mess with these FDs. It also ensures that we end up with the right state for the self-pipe in EXEC_BACKEND processes, which otherwise wouldn't know to close the postmaster's self-pipe FDs. Back-patch to 9.6, mainly to keep latch.c looking similar in all branches it exists in. Discussion: https://postgr.es/m/8322.1493240739@sss.pgh.pa.us
* Allow multiple bgworkers to be launched per postmaster iteration.Tom Lane2017-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, maybe_start_bgworker() would launch at most one bgworker process per call, on the grounds that the postmaster might otherwise neglect its other duties for too long. However, that seems overly conservative, especially since bad effects only become obvious when many hundreds of bgworkers need to be launched at once. On the other side of the coin is that the existing logic could result in substantial delay of bgworker launches, because ServerLoop isn't guaranteed to iterate immediately after a signal arrives. (My attempt to fix that by using pselect(2) encountered too many portability question marks, and in any case could not help on platforms without pselect().) One could also question the wisdom of using an O(N^2) processing method if the system is intended to support so many bgworkers. As a compromise, allow that function to launch up to 100 bgworkers per call (and in consequence, rename it to maybe_start_bgworkers). This will allow any normal parallel-query request for workers to be satisfied immediately during sigusr1_handler, avoiding the question of whether ServerLoop will be able to launch more promptly. There is talk of rewriting the postmaster to use a WaitEventSet to avoid the signal-response-delay problem, but I'd argue that this change should be kept even after that happens (if it ever does). Backpatch to 9.6 where parallel query was added. The issue exists before that, but previous uses of bgworkers typically aren't as sensitive to how quickly they get launched. Discussion: https://postgr.es/m/4707.1493221358@sss.pgh.pa.us
* Revert "Use pselect(2) not select(2), if available, to wait in postmaster's ↵Tom Lane2017-04-24
| | | | | | | | | | | | | | loop." This reverts commit b9515b62879722e3497236f6e0d6783c3fc059a2. Buildfarm results suggest that some platforms have versions of pselect(2) that are not merely non-atomic, but flat out non-functional. Revert the use-pselect patch to confirm this diagnosis (and exclude the no-SA_RESTART patch as the source of trouble). If it's so, we should probably look into blacklisting specific platforms that have broken pselect. Discussion: https://postgr.es/m/9696.1493072081@sss.pgh.pa.us
* Use pselect(2) not select(2), if available, to wait in postmaster's loop.Tom Lane2017-04-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally we've unblocked signals, called select(2), and then blocked signals again. The code expects that the select() will be cancelled with EINTR if an interrupt occurs; but there's a race condition, which is that an already-pending signal will be delivered as soon as we unblock, and then when we reach select() there will be nothing preventing it from waiting. This can result in a long delay before we perform any action that ServerLoop was supposed to have taken in response to the signal. As with the somewhat-similar symptoms fixed by commit 893902085, the main practical problem is slow launching of parallel workers. The window for trouble is usually pretty short, corresponding to one iteration of ServerLoop; but it's not negligible. To fix, use pselect(2) in place of select(2) where available, as that's designed to solve exactly this problem. Where not available, we continue to use the old way, and are no worse off than before. pselect(2) has been required by POSIX since about 2001, so most modern platforms should have it. A bigger portability issue is that some implementations are said to be non-atomic, ie pselect() isn't really any different from unblock/select/reblock. Still, we're no worse off than before on such a platform. There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point this could be reverted, since we'd be using a self-pipe to solve the race condition. But that's not happening before v11 at the earliest. Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches. Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
* Run the postmaster's signal handlers without SA_RESTART.Tom Lane2017-04-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The postmaster keeps signals blocked everywhere except while waiting for something to happen in ServerLoop(). The code expects that the select(2) will be cancelled with EINTR if an interrupt occurs; without that, followup actions that should be performed by ServerLoop() itself will be delayed. However, some platforms interpret the SA_RESTART signal flag as meaning that they should restart rather than cancel the select(2). Worse yet, some of them restart it with the original timeout delay, meaning that a steady stream of signal interrupts can prevent ServerLoop() from iterating at all if there are no incoming connection requests. Observable symptoms of this, on an affected platform such as HPUX 10, include extremely slow parallel query startup (possibly as much as 30 seconds) and failure to update timestamps on the postmaster's sockets and lockfiles when no new connections arrive for a long time. We can fix this by running the postmaster's signal handlers without SA_RESTART. That would be quite a scary change if the range of code where signals are accepted weren't so tiny, but as it is, it seems safe enough. (Note that postmaster children do, and must, reset all the handlers before unblocking signals; so this change should not affect any child process.) There is talk of rewriting the postmaster to use a WaitEventSet and not do signal response work in signal handlers, at which point it might be appropriate to revert this patch. But that's not happening before v11 at the earliest. Back-patch to 9.6. The problem exists much further back, but the worst symptom arises only in connection with parallel query, so it does not seem worth taking any portability risks in older branches. Discussion: https://postgr.es/m/9205.1492833041@sss.pgh.pa.us
* Fix postmaster's handling of fork failure for a bgworker process.Tom Lane2017-04-24
| | | | | | | | | | | | | | | | | | | | | This corner case didn't behave nicely at all: the postmaster would (partially) update its state as though the process had started successfully, and be quite confused thereafter. Fix it to act like the worker had crashed, instead. In passing, refactor so that do_start_bgworker contains all the state-change logic for bgworker launch, rather than just some of it. Back-patch as far as 9.4. 9.3 contains similar logic, but it's just enough different that I don't feel comfortable applying the patch without more study; and the use of bgworkers in 9.3 was so small that it doesn't seem worth the extra work. transam/parallel.c is still entirely unprepared for the possibility of bgworker startup failure, but that seems like material for a separate patch. Discussion: https://postgr.es/m/4905.1492813727@sss.pgh.pa.us
* Zero padding in replication origin's checkpointed on disk-state.Andres Freund2017-04-23
| | | | | | | | | | | | | | | | This seems to be largely cosmetic, avoiding valgrind bleats and the like. The uninitialized padding influences the CRC of the on-disk entry, but because it's also used when verifying the CRC, that doesn't cause spurious failures. Backpatch nonetheless. It's a bit unfortunate that contrib/test_decoding/sql/replorigin.sql doesn't exercise the checkpoint path, but checkpoints are fairly expensive on weaker machines, and we'd have to stop/start for that to be meaningful. Author: Andres Freund Discussion: https://postgr.es/m/20170422183123.w2jgiuxtts7qrqaq@alap3.anarazel.de Backpatch: 9.5, where replication origins were introduced
* Fix order of arguments to SubTransSetParent().Tom Lane2017-04-23
| | | | | | | | | | | | | | ProcessTwoPhaseBuffer (formerly StandbyRecoverPreparedTransactions) mixed up the parent and child XIDs when calling SubTransSetParent to record the transactions' relationship in pg_subtrans. Remarkably, analysis by Simon Riggs suggests that this doesn't lead to visible problems (at least, not in non-Assert builds). That might explain why we'd not noticed it before. Nonetheless, it's surely wrong. This code was born broken, so back-patch to all supported branches. Discussion: https://postgr.es/m/20110.1492905318@sss.pgh.pa.us
* Avoid depending on non-POSIX behavior of fcntl(2).Tom Lane2017-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The POSIX standard does not say that the success return value for fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1. We had several calls that were making the stronger assumption. Adjust them to test specifically for -1 for strict spec compliance. The standard further leaves open the possibility that the O_NONBLOCK flag bit is not the only active one in F_SETFL's argument. Formally, therefore, one ought to get the current flags with F_GETFL and store them back with only the O_NONBLOCK bit changed when trying to change the nonblock state. In port/noblock.c, we were doing the full pushup in pg_set_block but not in pg_set_noblock, which is just weird. Make both of them do it properly, since they have little business making any assumptions about the socket they're handed. The other places where we're issuing F_SETFL are working with FDs we just got from pipe(2), so it's reasonable to assume the FDs' properties are all default, so I didn't bother adding F_GETFL steps there. Also, while pg_set_block deserves some points for trying to do things right, somebody had decided that it'd be even better to cast fcntl's third argument to "long". Which is completely loony, because POSIX clearly says the third argument for an F_SETFL call is "int". Given the lack of field complaints, these missteps apparently are not of significance on any common platforms. But they're still wrong, so back-patch to all supported branches. Discussion: https://postgr.es/m/30882.1492800880@sss.pgh.pa.us
* Always build a custom plan node's targetlist from the path's pathtarget.Tom Lane2017-04-17
| | | | | | | | | | | | | | | | | | | | | | | We were applying the use_physical_tlist optimization to all relation scan plans, even those implemented by custom scan providers. However, that's a bad idea for a couple of reasons. The custom provider might be unable to provide columns that it hadn't expected to be asked for (for example, the custom scan might depend on an index-only scan). Even more to the point, there's no good reason to suppose that this "optimization" is a win for a custom scan; whatever the custom provider is doing is likely not based on simply returning physical heap tuples. (As a counterexample, if the custom scan is an interface to a column store, demanding all columns would be a huge loss.) If it is a win, the custom provider could make that decision for itself and insert a suitable pathtarget into the path, anyway. Per discussion with Dmitry Ivanov. Back-patch to 9.5 where custom scan support was introduced. The argument that the custom provider can adjust the behavior by changing the pathtarget only applies to 9.6+, but on balance it seems more likely that use_physical_tlist will hurt custom scans than help them. Discussion: https://postgr.es/m/e29ddd30-8ef9-4da5-a50b-2bb7b8c7198d@postgrespro.ru
* Fix compiler warningPeter Eisentraut2017-04-16
| | | | | Introduced by 41306a511c01dd299115cf447858a00e34aebbf6, happens with gcc 4.7.2.
* Provide a way to control SysV shmem attach address in EXEC_BACKEND builds.Tom Lane2017-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | In standard non-Windows builds, there's no particular reason to care what address the kernel chooses to map the shared memory segment at. However, when building with EXEC_BACKEND, there's a risk that the chosen address won't be available in all child processes. Linux with ASLR enabled (which it is by default) seems particularly at risk because it puts shmem segments into the same area where it maps shared libraries. We can work around that by specifying a mapping address that's outside the range where shared libraries could get mapped. On x86_64 Linux, 0x7e0000000000 seems to work well. This is only meant for testing/debugging purposes, so it doesn't seem necessary to go as far as providing a GUC (or any user-visible documentation, though we might change that later). Instead, it's just controlled by setting an environment variable PG_SHMEM_ADDR to the desired attach address. Back-patch to all supported branches, since the point here is to remove intermittent buildfarm failures on EXEC_BACKEND animals. Owners of affected animals will need to add a suitable setting of PG_SHMEM_ADDR to their build_env configuration. Discussion: https://postgr.es/m/7036.1492231361@sss.pgh.pa.us
* Avoid passing function pointers across process boundaries.Tom Lane2017-04-15
| | | | | | | | | | | | This back-patches commit 32470825d36d99a81347ee36c181d609c952c061 into 9.6, primarily to make buildfarm member culicidae happy. Unlike the HEAD patch, avoid changing the existing API of CreateParallelContext; instead we just switch to using CreateParallelContextForExternalFunction, even for core functions. Petr Jelinek, with a bunch of basically-cosmetic adjustments by me Discussion: https://postgr.es/m/548f9c1d-eafa-e3fa-9da8-f0cc2f654e60@2ndquadrant.com
* Fix regexport.c to behave sanely with lookaround constraints.Tom Lane2017-04-13
| | | | | | | | | | | | | | | | | | | | regexport.c thought it could just ignore LACON arcs, but the correct behavior is to treat them as satisfiable while consuming zero input (rather reminiscently of commit 9f1e642d5). Otherwise, the emitted simplified-NFA representation may contain no paths leading from initial to final state, which unsurprisingly confuses pg_trgm, as seen in bug #14623 from Jeff Janes. Since regexport's output representation has no concept of an arc that consumes zero input, recurse internally to find the next normal arc(s) after any LACON transitions. We'd be forced into changing that representation if a LACON could be the last arc reaching the final state, but fortunately the regex library never builds NFAs with such a configuration, so there always is a next normal arc. Back-patch to 9.3 where this logic was introduced. Discussion: https://postgr.es/m/20170413180503.25948.94871@wrigleys.postgresql.org
* Fix planner error (or assert trap) with nested set operations.Tom Lane2017-04-07
| | | | | | | | | | | | | | | | | | | | | As reported by Sean Johnston in bug #14614, since 9.6 the planner can fail due to trying to look up the referent of a Var with varno 0. This happens because we generate such Vars in generate_append_tlist, for lack of any better way to describe the output of a SetOp node. In typical situations nothing really cares about that, but given nested set-operation queries we will call estimate_num_groups on the output of the subquery, and that wants to know what a Var actually refers to. That logic used to look at subquery->targetList, but in commit 3fc6e2d7f I'd switched it to look at subroot->processed_tlist, ie the actual output of the subquery plan not the parser's idea of the result. It seemed like a good idea at the time :-(. As a band-aid fix, change it back. Really we ought to have an honest way of naming the outputs of SetOp steps, which suggests that it'd be a good idea for the parser to emit an RTE corresponding to each one. But that's a task for another day, and it certainly wouldn't yield a back-patchable fix. Report: https://postgr.es/m/20170407115808.25934.51866@wrigleys.postgresql.org
* Remove dead code and fix comments in fast-path function handling.Heikki Linnakangas2017-04-06
| | | | | | | | | | | | | | | | HandleFunctionRequest() is no longer responsible for reading the protocol message from the client, since commit 2b3a8b20c2. Fix the outdated comments. HandleFunctionRequest() now always returns 0, because the code that used to return EOF was moved in 2b3a8b20c2. Therefore, the caller no longer needs to check the return value. Reported by Andres Freund. Backpatch to all supported versions, even though this doesn't have any user-visible effect, to make backporting future patches in this area easier. Discussion: https://www.postgresql.org/message-id/20170405010525.rt5azbya5fkbhvrx@alap3.anarazel.de
* Fix integer-overflow problems in interval comparison.Tom Lane2017-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using integer timestamps, the interval-comparison functions tried to compute the overall magnitude of an interval as an int64 number of microseconds. As reported by Frazer McLean, this overflows for intervals exceeding about 296000 years, which is bad since we nominally allow intervals many times larger than that. That results in wrong comparison results, and possibly in corrupted btree indexes for columns containing such large interval values. To fix, compute the magnitude as int128 instead. Although some compilers have native support for int128 calculations, many don't, so create our own support functions that can do 128-bit addition and multiplication if the compiler support isn't there. These support functions are designed with an eye to allowing the int128 code paths in numeric.c to be rewritten for use on all platforms, although this patch doesn't do that, or even provide all the int128 primitives that will be needed for it. Back-patch as far as 9.4. Earlier releases did not guard against overflow of interval values at all (commit 146604ec4 fixed that), so it seems not very exciting to worry about overly-large intervals for them. Before 9.6, we did not assume that unreferenced "static inline" functions would not draw compiler warnings, so omit functions not directly referenced by timestamp.c, the only present consumer of int128.h. (We could have omitted these functions in HEAD too, but since they were written and debugged on the way to the present patch, and they look likely to be needed by numeric.c, let's keep them in HEAD.) I did not bother to try to prevent such warnings in a --disable-integer-datetimes build, though. Before 9.5, configure will never define HAVE_INT128, so the part of int128.h that exploits a native int128 implementation is dead code in the 9.4 branch. I didn't bother to remove it, thinking that keeping the file looking similar in different branches is more useful. In HEAD only, add a simple test harness for int128.h in src/tools/. In back branches, this does not change the float-timestamps code path. That's not subject to the same kind of overflow risk, since it computes the interval magnitude as float8. (No doubt, when this code was originally written, overflow was disregarded for exactly that reason.) There is a precision hazard instead :-(, but we'll avert our eyes from that question, since no complaints have been reported and that code's deprecated anyway. Kyotaro Horiguchi and Tom Lane Discussion: https://postgr.es/m/1490104629.422698.918452336.26FA96B7@webmail.messagingengine.com
* Fix parallel query so it doesn't spoil row estimates above Gather.Robert Haas2017-03-31
| | | | | | | | | | | | | | | | | Commit 45be99f8cd5d606086e0a458c9c72910ba8a613d removed GatherPath's num_workers field, but this is entirely bogus. Normally, a path's parallel_workers flag is supposed to indicate the number of workers that it wants, and should be 0 for a non-partial path. In that commit, I mistakenly thought that GatherPath could also use that field to indicate the number of workers that it would try to start, but that's disastrous, because then it can propagate up to higher nodes in the plan tree, which will then get incorrect rowcounts because the parallel_workers flag is involved in computing those values. Repair by putting the separate field back. Report by Tomas Vondra. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
* Don't use bgw_main even to specify in-core bgworker entrypoints.Robert Haas2017-03-31
| | | | | | | | | | | | | | | On EXEC_BACKEND builds, this can fail if ASLR is in use. Backpatch to 9.5. On master, completely remove the bgw_main field completely, since there is no situation in which it is safe for an EXEC_BACKEND build. On 9.6 and 9.5, leave the field intact to avoid breaking things for third-party code that doesn't care about working under EXEC_BACKEND. Prior to 9.5, there are no in-core bgworker entrypoints. Petr Jelinek, reviewed by me. Discussion: http://postgr.es/m/09d8ad33-4287-a09b-a77f-77f8761adb5e@2ndquadrant.com
* Suppress implicit-conversion warnings seen with newer clang versions.Tom Lane2017-03-28
| | | | | | | | | | | | | | | We were assigning values near 255 through "char *" pointers. On machines where char is signed, that's not entirely kosher, and it's reasonable for compilers to warn about it. A better solution would be to change the pointer type to "unsigned char *", but that would be vastly more invasive. For the moment, let's just apply this simple backpatchable solution. Aleksander Alekseev Discussion: https://postgr.es/m/20170220141239.GD12278@e733.localdomain Discussion: https://postgr.es/m/2839.1490714708@sss.pgh.pa.us
* Fix unportable disregard of alignment requirements in RADIUS code.Tom Lane2017-03-26
| | | | | | | | | | | | | | | The compiler is entitled to store a char[] local variable with no particular alignment requirement. Our RADIUS code cavalierly took such a local variable and cast its address to a struct type that does have alignment requirements. On an alignment-picky machine this would lead to bus errors. To fix, declare the local variable honestly, and then cast its address to char * for use in the I/O calls. Given the lack of field complaints, there must be very few if any people affected; but nonetheless this is a clear portability issue, so back-patch to all supported branches. Noted while looking at a Coverity complaint in the same code.
* Fix backup cancelingTeodor Sigaev2017-03-24
| | | | | | | | | | | | | | | | | Assert-enabled build crashes but without asserts it works by wrong way: it may not reset forcing full page write and preventing from starting exclusive backup with the same name as cancelled. Patch replaces pair of booleans nonexclusive_backup_running/exclusive_backup_running to single enum to correctly describe backup state. Backpatch to 9.6 where bug was introduced Reported-by: David Steele Authors: Michael Paquier, David Steele Reviewed-by: Anastasia Lubennikova https://commitfest.postgresql.org/13/1068/
* Fix support for some operators (&<, &>, $<|, |&>) in box operator classTeodor Sigaev2017-03-21
| | | | | | | | | | | | of SP-GiST. Bug exists since initial commit of box opclass for SP-GiST, so backpath to 9.6 Author: Nikita Glukhov with minor editorization of tests by me Reviewed-by: Kyotaro Horiguchi, Anastasia Lubennikova https://commitfest.postgresql.org/13/981/
* Revert unintentional change in increasing usage count during pin of buffers,Teodor Sigaev2017-03-20
| | | | | | | | | | | | this makes buffer access strategy have no effect. Change was a part of commit 48354581a49c30f5757c203415aa8412d85b0f70 during 9.6 release cycle, so backpath to 9.6 Reported-by: Jim Nasby Author: Alexander Korotkov Reviewed-by: Jim Nasby, Andres Freund https://commitfest.postgresql.org/13/1029/
* Fix WaitEventSetWait() to handle write-ready waits properly on Windows.Tom Lane2017-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Windows apparently will not detect socket write-ready events unless a preceding send attempt returned WSAEWOULDBLOCK. In many usage patterns that's satisfied by the caller of WaitEvenSetWait(), but not always. Apply the same solution that we already had in pgwin32_select(), namely to perform a dummy WSASend() call with len=0. This will return WSAEWOULDBLOCK if there's no buffer space (even though it could legitimately do nothing and report success, which makes me a bit nervous about this solution; but since it's been working fine in libpq, let's roll with it). In passing, improve the comments about this in pgwin32_select(), and remove duplicated code there. Back-patch to 9.6 where WaitEventSetWait() was introduced. We might need to back-patch something similar into predecessor code. But given the lack of complaints so far, it's not clear that the case ever gets exercised in the back branches, so I'm not going to expend effort on it right now. This should resolve recurring failures on buildfarm member bowerbird, which has been failing since 1e8a85009 went in. Diagnosis and patch by Petr Jelinek, cosmetic adjustments by me. Discussion: https://postgr.es/m/5b6a6d6d-fb45-0afb-2e95-5600063c3dbd@2ndquadrant.com
* Avoid having vacuum set reltuples to 0 on non-empty relations in theAndrew Gierth2017-03-16
| | | | | | | | | | | | | | | | | | presence of page pins, which leads to serious estimation errors in the planner. This particularly affects small heavily-accessed tables, especially where locking (e.g. from FK constraints) forces frequent vacuums for mxid cleanup. Fix by keeping separate track of pages whose live tuples were actually counted vs. pages that were only scanned for freezing purposes. Thus, reltuples can only be set to 0 if all pages of the relation were actually counted. Backpatch to all supported versions. Per bug #14057 from Nicolas Baccelli, analyzed by me. Discussion: https://postgr.es/m/20160331103739.8956.94469@wrigleys.postgresql.org
* Fix ancient get_object_address_opf_member bugAlvaro Herrera2017-03-16
| | | | | | | | | | | The original coding was trying to use a TypeName as a string Value, which doesn't work; an oversight in my commit a61fd533. Repair. Also, make sure we cover the broken case in the relevant test script. Backpatch to 9.5. Discussion: https://postgr.es/m/20170315151829.bhxsvrp75xdxhm3n@alvherre.pgsql