aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Don't lose walreceiver start requests due to race condition in postmaster.Tom Lane2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a walreceiver dies, the startup process will notice that and send a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new walreceiver to be launched. There's a race condition, which at least in HEAD is very easy to hit, whereby the postmaster might see that signal before it processes the SIGCHLD from the walreceiver process. In that situation, sigusr1_handler() just dropped the start request on the floor, reasoning that it must be redundant. Eventually, after 10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a fresh request --- but that's a long time if the connection could have been re-established almost immediately. Fix it by setting a state flag inside the postmaster that we won't clear until we do launch a walreceiver. In cases where that results in an extra walreceiver launch, it's up to the walreceiver to realize it's unwanted and go away --- but we have, and need, that logic anyway for the opposite race case. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there in test cases where a master server is stopped and restarted while leaving streaming slaves active. This logic has been broken all along, so back-patch to all supported branches. Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us
* Ignore old stats file timestamps when starting the stats collector.Tom Lane2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The stats collector disregards inquiry messages that bear a cutoff_time before when it last wrote the relevant stats file. That's fine, but at startup when it reads the "permanent" stats files, it absorbed their timestamps as if they were the times at which the corresponding temporary stats files had been written. In reality, of course, there's no data out there at all. This led to disregarding inquiry messages soon after startup if the postmaster had been shut down and restarted within less than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for testing and in the field. Requesting backends would hang for 10 seconds and then report failure to read statistics, unless they got bailed out by some other backend coming along and making a newer request within that interval. I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there because the autovacuum launcher hangs for 10 seconds when it can't get statistics at startup, thus preventing a second shutdown from occurring promptly. We might want to do some things in the autovac code to make it less prone to getting stuck that way, but this change is a good bug fix regardless. In passing, also fix pgstat_read_statsfiles() to ensure that it re-zeroes its global stats variables if they are corrupted by a short read from the stats file. (Other reads in that function go into temp variables, so that the issue doesn't arise.) This has been broken since we created the separation between permanent and temporary stats files in 8.4, so back-patch to all supported branches. Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
* Fix typo in commentAlvaro Herrera2017-06-22
| | | | | | | | Once upon a time, WAL pointers could be NULL, but no longer. We talk about "valid" now. Reported-by: Amit Langote Discussion: https://postgr.es/m/33e9617d-27f1-eee8-3311-e27af98eaf2b@lab.ntt.co.jp
* Fix possibility of creating a "phantom" segment after promotion.Andres Freund2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When promoting a standby just after a XLOG_SWITCH record was replayed, and next segment(s) are already are locally available (via walsender, restore_command + trigger/recovery target), that segment could accidentally be recycled onto the past of the new timeline. Later checkpointer would create a .ready file for it, assuming there was an error during creation, and it would get archived. That causes trouble if another standby is later brought up from a basebackup from before the timeline creation, because it would try to read the segment, because XLogFileReadAnyTLI just tries all possible timelines, which doesn't have valid contents. Thus replay would fail. The problem, if already occurred, can be fixed by removing the segment and/or having restore_command filter it out. The reason for the creation of such "phantom" segments was, that after an XLOG_SWITCH record the EndOfLog variable points to the beginning of the next segment, and RemoveXlogFile() used XLByteToPrevSeg(). Normally RemoveXlogFile() doing so is harmless, because the last segment will still exist preventing InstallXLogFileSegment() from causing harm, but just after promotion there's no previous segment on the new timeline. Fix that by using XLByteToSeg() instead of XLByteToPrevSeg(). Author: Andres Freund Reported-By: Greg Burek Discussion: https://postgr.es/m/20170619073026.zcwpe6mydsaz5ygd@alap3.anarazel.de Backpatch: 9.2-, bug older than all supported versions
* Fix typo in comment.Heikki Linnakangas2017-06-21
| | | | Etsuro Fujita
* Fix leaking of small spilled subtransactions during logical decoding.Andres Freund2017-06-18
| | | | | | | | | | | | | | | | | | When, during logical decoding, a transaction gets too big, it's contents get spilled to disk. Not just the top-transaction gets spilled, but *also* all of its subtransactions, even if they're not that large themselves. Unfortunately we didn't clean up such small spilled subtransactions from disk. Fix that, by keeping better track of whether a transaction has been spilled to disk. Author: Andres Freund Reported-By: Dmitriy Sarafannikov, Fabrízio de Royes Mello Discussion: https://postgr.es/m/1457621358.355011041@f382.i.mail.ru https://postgr.es/m/CAFcNs+qNMhNYii4nxpO6gqsndiyxNDYV0S=JNq0v_sEE+9PHXg@mail.gmail.com Backpatch: 9.4-, where logical decoding was introduced
* Fix dependency, when changing a function's argument/return type.Heikki Linnakangas2017-06-16
| | | | | | | | | | | | When a new base type is created using the old-style procedure of first creating the input/output functions with "opaque" in place of the base type, the "opaque" argument/return type is changed to the final base type, on CREATE TYPE. However, we did not create a pg_depend record when doing that, so the functions were left not depending on the type. Fixes bug #14706, reported by Karen Huddleston. Discussion: https://www.postgresql.org/message-id/20170614232259.1424.82774@wrigleys.postgresql.org
* Fix low-probability leaks of PGresult objects in the backend.Tom Lane2017-06-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had three occurrences of essentially the same coding pattern wherein we tried to retrieve a query result from a libpq connection without blocking. In the case where PQconsumeInput failed (typically indicating a lost connection), all three loops simply gave up and returned, forgetting to clear any previously-collected PGresult object. Since those are malloc'd not palloc'd, the oversight results in a process-lifespan memory leak. One instance, in libpqwalreceiver, is of little significance because the walreceiver process would just quit anyway if its connection fails. But we might as well fix it. The other two instances, in postgres_fdw, are somewhat more worrisome because at least in principle the scenario could be repeated, allowing the amount of memory leaked to build up to something worth worrying about. Moreover, in these cases the loops contain CHECK_FOR_INTERRUPTS calls, as well as other calls that could potentially elog(ERROR), providing another way to exit without having cleared the PGresult. Here we need to add PG_TRY logic similar to what exists in quite a few other places in postgres_fdw. Coverity noted the libpqwalreceiver bug; I found the other two cases by checking all calls of PQconsumeInput. Back-patch to all supported versions as appropriate (9.2 lacks postgres_fdw, so this is really quite unexciting for that branch). Discussion: https://postgr.es/m/22620.1497486981@sss.pgh.pa.us
* Assert that we don't invent relfilenodes or type OIDs in binary upgrade.Tom Lane2017-06-12
| | | | | | | | | | | | | | | | | | | | | | | | | | During pg_upgrade's restore run, all relfilenode choices should be overridden by commands in the dump script. If we ever find ourselves choosing a relfilenode in the ordinary way, someone blew it. Likewise for pg_type OIDs. Since pg_upgrade might well succeed anyway, if there happens not to be a conflict during the regression test run, we need assertions here to keep us on the straight and narrow. We might someday be able to remove the assertion in GetNewRelFileNode, if pg_upgrade is rewritten to remove its assumption that old and new relfilenodes always match. But it's hard to see how to get rid of the pg_type OID constraint, since those OIDs are embedded in user tables in some cases. Back-patch as far as 9.5, because of the risk of back-patches breaking something here even if it works in HEAD. I'd prefer to go back further, but 9.4 fails both assertions due to get_rel_infos()'s use of a temporary table. We can't use the later-branch solution of a CTE for compatibility reasons (cf commit 5d16332e9), and it doesn't seem worth inventing some other way to do the query. (I did check, by dint of changing the Asserts to elog(WARNING), that there are no other cases of unwanted OID assignments during 9.4's regression test run.) Discussion: https://postgr.es/m/19785.1497215827@sss.pgh.pa.us
* Unify SIGHUP handling between normal and walsender backends.Andres Freund2017-06-05
| | | | | | | | | | | | | | | | | | | | Because walsender and normal backends share the same main loop it's problematic to have two different flag variables, set in signal handlers, indicating a pending configuration reload. Only certain walsender commands reach code paths checking for the variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT ... LOGICAL, notably not base backups). This is a bug present since the introduction of walsender, but has gotten worse in releases since then which allow walsender to do more. A later patch, not slated for v10, will similarly unify SIGHUP handling in other types of processes as well. Author: Petr Jelinek, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de Backpatch: 9.2-, bug is present since 9.0
* Prevent possibility of panics during shutdown checkpoint.Andres Freund2017-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written. To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query this causes the backend to shutdown, if logical replication is in progress all existing WAL records are processed followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit. Author: Andres Freund, based on an earlier patch by Michael Paquier Reported-By: Fujii Masao, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de Backpatch: 9.4, where logical decoding was introduced
* Have walsenders participate in procsignal infrastructure.Andres Freund2017-06-05
| | | | | | | | | | | | | | | | | The non-participation in procsignal was a problem for both changes in master, e.g. parallelism not working for normal statements run in walsender backends, and older branches, e.g. recovery conflicts and catchup interrupts not working for logical decoding walsenders. This commit thus replaces the previous WalSndXLogSendHandler with procsignal_sigusr1_handler. In branches since db0f6cad48 that can lead to additional SetLatch calls, but that only rarely seems to make a difference. Author: Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
* Assorted translatable string fixesAlvaro Herrera2017-06-04
| | | | | Mark our rusage reportage string translatable; remove quotes from type names; unify formatting of very similar messages.
* Try to ensure that stats collector's receive buffer size is at least 100KB.Tom Lane2017-05-29
| | | | | | Back-patch of commit 8b0b6303e991079726e83d17401405e94da11564. Discussion: https://postgr.es/m/22173.1494788088@sss.pgh.pa.us
* Allow NumericOnly to be "+ FCONST".Tom Lane2017-05-29
| | | | | | | | | | | | | | | | | | | | | The NumericOnly grammar production accepted ICONST, + ICONST, - ICONST, FCONST, and - FCONST, but for some reason not + FCONST. This led to strange inconsistencies like regression=# set random_page_cost = +4; SET regression=# set random_page_cost = 4000000000; SET regression=# set random_page_cost = +4000000000; ERROR: syntax error at or near "4000000000" (because 4000000000 is too large to be an ICONST). While there's no actual functional reason to need to write a "+", if we allow it for integers it seems like we should allow it for numerics too. It's been like that forever, so back-patch to all supported branches. Discussion: https://postgr.es/m/30908.1496006184@sss.pgh.pa.us
* Move autogenerated array types out of the way during ALTER ... RENAME.Tom Lane2017-05-26
| | | | | | | | | | | | Commit 9aa3c782c added code to allow CREATE TABLE/CREATE TYPE to not fail when the desired type name conflicts with an autogenerated array type, by dint of renaming the array type out of the way. But I (tgl) overlooked that the same case arises in ALTER TABLE/TYPE RENAME. Fix that too. Back-patch to all supported branches. Report and patch by Vik Fearing, modified a bit by me Discussion: https://postgr.es/m/0f4ade49-4f0b-a9a3-c120-7589f01d1eb8@2ndquadrant.com
* Tighten checks for whitespace in functions that parse identifiers etc.Tom Lane2017-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces isspace() calls with scanner_isspace() in functions that are likely to be presented with non-ASCII input. isspace() has the small advantage that it will correctly recognize no-break space in single-byte encodings (such as LATIN1); but it cannot work successfully for any multibyte character, and depending on platform it might return false positive results for some fragments of multibyte characters. That's disastrous for functions that are trying to discard whitespace between valid strings, as noted in bug #14662 from Justin Muise. Even treating no-break space as whitespace is pretty questionable for the usages touched here, because the core scanner would think it is an identifier character. Affected functions are parse_ident(), parseNameAndArgTypes (underlying regprocedurein() and siblings), SplitIdentifierString (used for parsing GUCs and options that are qualified names or lists of names), and SplitDirectoriesString (used for parsing GUCs that are lists of directories). All the functions adjusted here are parsing SQL identifiers and similar constructs, so it's reasonable to insist that their definition of whitespace match the core scanner. So we can hope that this won't cause many backwards-compatibility problems. I've left alone isspace() calls in places that aren't really expecting any non-ASCII input characters, such as float8in(). Back-patch to all supported branches. Discussion: https://postgr.es/m/10129.1495302480@sss.pgh.pa.us
* Fix precision and rounding issues in money multiplication and division.Tom Lane2017-05-21
| | | | | | | | | | | | | | | | | | | | | | | The cash_div_intX functions applied rint() to the result of the division. That's not merely useless (because the result is already an integer) but it causes precision loss for values larger than 2^52 or so, because of the forced conversion to float8. On the other hand, the cash_mul_fltX functions neglected to apply rint() to their multiplication results, thus possibly causing off-by-one outputs. Per C standard, arithmetic between any integral value and a float value is performed in float format. Thus, cash_mul_flt4 and cash_div_flt4 produced answers good to only about six digits, even when the float value is exact. We can improve matters noticeably by widening the float inputs to double. (It's tempting to consider using "long double" arithmetic if available, but that's probably too much of a stretch for a back-patched fix.) Also, document that cash_div_intX operators truncate rather than round. Per bug #14663 from Richard Pistole. Back-patch to all supported branches. Discussion: https://postgr.es/m/22403.1495223615@sss.pgh.pa.us
* Fix typo in comment.Heikki Linnakangas2017-05-18
| | | | Daniel Gustafsson
* Fix new warnings from GCC 7Peter Eisentraut2017-05-16
| | | | | This addresses the new warning types -Wformat-truncation -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
* Fix unsafe reference into relcache in constructed CommentStmt.Tom Lane2017-05-15
| | | | | | | | | | | | | | The CommentStmt made by RebuildConstraintComment() has to pstrdup the relation name, else it will contain a dangling pointer after that relcache entry is flushed. (I'm less sure that pstrdup'ing conname is necessary, but let's be safe.) Failure to do this leads to weird errors or crashes, as reported by Marko Elezovic. Bug introduced by commit e42375fc8, so back-patch to 9.5 as that was. Fix by David Rowley, regression test by Michael Paquier Discussion: https://postgr.es/m/DB6PR03MB30775D58E732D4EB0C13725B9AE00@DB6PR03MB3077.eurprd03.prod.outlook.com
* Avoid superfluous work for commits during logical slot creation.Andres Freund2017-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Before 955a684e0401 logical decoding snapshot maintenance needed to cope with transactions it might not have seen in their entirety. For such transactions we'd to assume they modified the catalog (could have happened before we were watching), and thus a new snapshot had to be built, and distributed to concurrently running transactions. That's problematic because building a new snapshot isn't that cheap , especially as the the array of committed transactions needs to be sorted. When creating a slot on a server with a lot of transactions, this could make logical slot creation infeasibly expensive. After 955a684e0401 there's no need to deal with transaction that aren't guaranteed to be fully observable. That allows to avoid building snapshots for transactions that haven't modified catalog, even before reaching consistency. While this isn't necessarily a bugfix, slot creation being impossible in some production workloads, is severe enough to warrant backpatching. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced
* Fix race condition leading to hanging logical slot creation.Andres Freund2017-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The snapshot assembly during the creation of logical slots relied waiting for transactions in xl_running_xacts to end, by checking for their commit/abort records. Unfortunately, despite locking, it is possible to see an xl_running_xact record listing transactions as ready, that have already WAL-logged an commit/abort record, as the locking just prevents the ProcArray to be adjusted, and the commit record has to be logged first. That lead to either delayed or hanging snapshot creation, because snapbuild.c would wait "forever" to see commit/abort records for some transactions. That hang resolved only if a xl_running_xacts record without any running transactions happened to be logged, far from certain on a busy server. It's impractical to prevent that via more heavyweight locking, the likelihood of deadlocks and significantly increased contention would be too big. Instead change the initial snapshot creation to be solely based on tracking the oldest running transaction via xl_running_xacts->oldestRunningXid - that actually ends up significantly simplifying the code. That has two disadvantages: 1) Because we cannot fully "trust" the contents of xl_running_xacts, we cannot use it to build the initial snapshot. Instead we have to wait twice for all running transactions to finish. 2) Previously a slot, unless the race occurred, could be created when the all transaction perceived as running based on commit/abort records, now we have to wait for the next xl_running_xacts record. To address that, trigger logging new xl_running_xacts record from within snapbuild.c exactly when necessary. Unfortunately snabuild.c's SnapBuild is stored on disk, one of the stupider ideas of a certain Mr Freund, so we can't change it in a minor release. As this is going to be backpatched, we have to hack around a bit to keep on-disk compatibility. A later commit will rejigger that on master. Author: Andres Freund, based on a quite different patch from Petr Jelinek Analyzed-By: Petr Jelinek Reviewed-By: Petr Jelinek Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backpatch: 9.4-, where logical decoding has been introduced
* Avoid searching for callback functions in CallSyscacheCallbacks().Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | We have now grown enough registerable syscache-invalidation callback functions that the original assumption that there would be few of them is causing performance problems. In particular, let's fix things so that CallSyscacheCallbacks doesn't have to search the whole array to find which callback(s) to invoke for a given cache ID. Preserve the original behavior that callbacks are called in order of registration, just in case there's someplace that depends on that (which I doubt). In support of this, export the number of syscaches from syscache.h. People could have found that out anyway from the enum, but adding a #define makes that much safer. This provides a useful additional speedup in Mathieu Fenniak's logical-decoding test case, although we're reaching the point of diminishing returns there. I think any further improvement will have to come from reducing the number of cache invalidations that are triggered in the first place. Still, we can hope that this change gives some incremental benefit for all invalidation scenarios. Back-patch to 9.4 where logical decoding was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Reduce initial size of RelfilenodeMapHash.Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | A test case provided by Mathieu Fenniak shows that hash_seq_search'ing this hashtable can consume a very significant amount of overhead during logical decoding, which triggers frequent cache invalidation. Testing suggests that the actual population of the hashtable is often no more than a few dozen entries, so we can cut the overhead just by dropping the initial number of buckets down from 1024 --- I chose to cut it to 64. (In situations where we do have a significant number of entries, we shouldn't get any real penalty from doing this, as the dynahash.c code will resize the hashtable automatically.) This gives a further factor-of-two savings in Mathieu's test case. That may be overly optimistic for real-world benefit, as real cases may have larger average table populations, but it's hard to see it turning into a net negative for any workload. Back-patch to 9.4 where relfilenodemap.c was introduced. Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Avoid searching for the target catcache in CatalogCacheIdInvalidate.Tom Lane2017-05-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | A test case provided by Mathieu Fenniak shows that the initial search for the target catcache in CatalogCacheIdInvalidate consumes a very significant amount of overhead in cases where cache invalidation is triggered but has little useful work to do. There is no good reason for that search to exist at all, as the index array maintained by syscache.c allows direct lookup of the catcache from its ID. We just need a frontend function in syscache.c, matching the division of labor for most other cache-accessing operations. While there's more that can be done in this area, this patch alone reduces the runtime of Mathieu's example by 2X. We can hope that it offers some useful benefit in other cases too, although usually cache invalidation overhead is not such a striking fraction of the total runtime. Back-patch to 9.4 where logical decoding was introduced. It might be worth going further back, but presently the only case we know of where cache invalidation is really a significant burden is in logical decoding. Also, older branches have fewer catcaches, reducing the possible benefit. (Note: although this nominally changes catcache's API, we have always documented CatalogCacheIdInvalidate as a private function, so I would have little sympathy for an external module calling it directly. So backpatching should be fine.) Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
* Increase MAX_SYSCACHE_CALLBACKS to provide more room for extensions.Tom Lane2017-05-11
| | | | | | | | | | | | | | | | | | Increase from the historical value of 32 to 64. We are up to 31 callers of CacheRegisterSyscacheCallback() in HEAD, so if they were all to be exercised in one process that would leave only one slot for add-on modules. It's probably not possible for that to happen, but still we clearly need more daylight here. (At some point it might be worth making the array dynamically resizable; but since we've never heard a complaint of "out of syscache_callback_list slots" happening in the field, I doubt it's worth it yet.) Back-patch as far as 9.4, which is where we increased the companion limit MAX_RELCACHE_CALLBACKS (cf commit f01d1ae3a). It's not as urgent in released branches, which have only a couple dozen call sites in core, but it still seems that somebody might hit the limit before these branches die. Discussion: https://postgr.es/m/12184.1494450131@sss.pgh.pa.us
* Further patch rangetypes_selfuncs.c's statistics slot management.Tom Lane2017-05-08
| | | | | | | | | | | | | | | Values in a STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM slot are float8, not of the type of the column the statistics are for. This bug is at least partly the fault of sloppy specification comments for get_attstatsslot()/free_attstatsslot(): the type OID they want is that of the stavalues entries, not of the underlying column. (I double-checked other callers and they seem to get this right.) Adjust the comments to be more correct. Per buildfarm. Security: CVE-2017-7484
* Fix possibly-uninitialized variable.Tom Lane2017-05-08
| | | | | | Oversight in e2d4ef8de et al (my fault not Peter's). Per buildfarm. Security: CVE-2017-7484
* Match pg_user_mappings limits to information_schema.user_mapping_options.Noah Misch2017-05-08
| | | | | | | | | | | | | | | Both views replace the umoptions field with NULL when the user does not meet qualifications to see it. They used different qualifications, and pg_user_mappings documented qualifications did not match its implemented qualifications. Make its documentation and implementation match those of user_mapping_options. One might argue for stronger qualifications, but these have long, documented tenure. pg_user_mappings has always exhibited this problem, so back-patch to 9.2 (all supported versions). Michael Paquier and Feike Steenbergen. Reviewed by Jeff Janes. Reported by Andrew Wheelwright. Security: CVE-2017-7486
* Translation updatesPeter Eisentraut2017-05-08
| | | | | Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: 14c4b5cb0f9330a9397159979c48e7076fa856d8
* Add security checks to selectivity estimation functionsPeter Eisentraut2017-05-08
| | | | | | | | | | | | | | | | | | | | | Some selectivity estimation functions run user-supplied operators over data obtained from pg_statistic without security checks, which allows those operators to leak pg_statistic data without having privileges on the underlying tables. Fix by checking that one of the following is satisfied: (1) the user has table or column privileges on the table underlying the pg_statistic data, or (2) the function implementing the user-supplied operator is leak-proof. If neither is satisfied, planning will proceed as if there are no statistics available. At least one of these is satisfied in most cases in practice. The only situations that are negatively impacted are user-defined or not-leak-proof operators on a security-barrier view. Reported-by: Robert Haas <robertmhaas@gmail.com> Author: Peter Eisentraut <peter_e@gmx.net> Author: Tom Lane <tgl@sss.pgh.pa.us> Security: CVE-2017-7484
* RLS: Fix ALL vs. SELECT+UPDATE policy usageStephen Frost2017-05-06
| | | | | | | | | | | | | | | | | | | | When we add the SELECT-privilege based policies to the RLS with check options (such as for an UPDATE statement, or when we have INSERT ... RETURNING), we need to be sure and use the 'USING' case if the policy is actually an 'ALL' policy (which could have both a USING clause and an independent WITH CHECK clause). This could result in policies acting differently when built using ALL (when the ALL had both USING and WITH CHECK clauses) and when building the policies independently as SELECT and UPDATE policies. Fix this by adding an explicit boolean to add_with_check_options() to indicate when the USING policy should be used, even if the policy has both USING and WITH CHECK policies on it. Reported by: Rod Taylor Back-patch to 9.5 where RLS was introduced.
* Fix cursor_to_xml in tableforest false modePeter Eisentraut2017-05-04
| | | | | | | | | | | | | It only produced <row> elements but no wrapping <table> element. By contrast, cursor_to_xmlschema produced a schema that is now correct but did not previously match the XML data produced by cursor_to_xml. In passing, also fix a minor misunderstanding about moving cursors in the tests related to this. Reported-by: filip@jirsak.org Based-on-patch-by: Thomas Munro <thomas.munro@enterprisedb.com>
* Fix pfree-of-already-freed-tuple when rescanning a GiST index-only scan.Tom Lane2017-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | GiST's getNextNearest() function attempts to pfree the previously-returned tuple if any (that is, scan->xs_hitup in HEAD, or scan->xs_itup in older branches). However, if we are rescanning a plan node after ending a previous scan early, those tuple pointers could be pointing to garbage, because they would be pointing into the scan's pageDataCxt or queueCxt which has been reset. In a debug build this reliably results in a crash, although I think it might sometimes accidentally fail to fail in production builds. To fix, clear the pointer field anyplace we reset a context it might be pointing into. This may be overkill --- I think probably only the queueCxt case is involved in this bug, so that resetting in gistrescan() would be sufficient --- but dangling pointers are generally bad news, so let's avoid them. Another plausible answer might be to just not bother with the pfree in getNextNearest(). The reconstructed tuples would go away anyway in the context resets, and I'm far from convinced that freeing them a bit earlier really saves anything meaningful. I'll stick with the original logic in this patch, but if we find more problems in the same area we should consider that approach. Per bug #14641 from Denis Smirnov. Back-patch to 9.5 where this logic was introduced. Discussion: https://postgr.es/m/20170504072034.24366.57688@wrigleys.postgresql.org
* Ensure commands in extension scripts see the results of preceding DDL.Tom Lane2017-05-02
| | | | | | | | | | | | | Due to a missing CommandCounterIncrement() call, parsing of a non-utility command in an extension script would not see the effects of the immediately preceding DDL command, unless that command's execution ends with CommandCounterIncrement() internally ... which some do but many don't. Report by Philippe Beaudoin, diagnosis by Julien Rouhaud. Rather remarkably, this bug has evaded detection since extensions were invented, so back-patch to all supported branches. Discussion: https://postgr.es/m/2cf7941e-4e41-7714-3de8-37b1a8f74dff@free.fr
* Fix VALIDATE CONSTRAINT to consider NO INHERIT attribute.Robert Haas2017-04-28
| | | | | | | | | | Currently, trying to validate a NO INHERIT constraint on the parent will search for the constraint in child tables (where it is not supposed to exist), wrongly causing a "constraint does not exist" error. Amit Langote, per a report from Hans Buschmann. Discussion: http://postgr.es/m/20170421184012.24362.19@wrigleys.postgresql.org
* Don't use on-disk snapshots for exported logical decoding snapshot.Andres Freund2017-04-27
| | | | | | | | | | | | | | | | | | | | | Logical decoding stores historical snapshots on disk, so that logical decoding can restart without having to reconstruct a snapshot from scratch (for which the resources are not guaranteed to be present anymore). These serialized snapshots were also used when creating a new slot via the walsender interface, which can export a "full" snapshot (i.e. one that can read all tables, not just catalog ones). The problem is that the serialized snapshots are only useful for catalogs and not for normal user tables. Thus the use of such a serialized snapshot could result in an inconsistent snapshot being exported, which could lead to queries returning wrong data. This would only happen if logical slots are created while another logical slot already exists. Author: Petr Jelinek Reviewed-By: Andres Freund Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb214@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.
* Preserve required !catalog tuples while computing initial decoding snapshot.Andres Freund2017-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logical decoding machinery already preserved all the required catalog tuples, which is sufficient in the course of normal logical decoding, but did not guarantee that non-catalog tuples were preserved during computation of the initial snapshot when creating a slot over the replication protocol. This could cause a corrupted initial snapshot being exported. The time window for issues is usually not terribly large, but on a busy server it's perfectly possible to it hit it. Ongoing decoding is not affected by this bug. To avoid increased overhead for the SQL API, only retain additional tuples when a logical slot is being created over the replication protocol. To do so this commit changes the signature of CreateInitDecodingContext(), but it seems unlikely that it's being used in an extension, so that's probably ok. In a drive-by fix, fix handling of ReplicationSlotsComputeRequiredXmin's already_locked argument, which should only apply to ProcArrayLock, not ReplicationSlotControlLock. Reported-By: Erik Rijkers Analyzed-By: Petr Jelinek Author: Petr Jelinek, heavily editorialized by Andres Freund Reviewed-By: Andres Freund Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com Backport: 9.4, where logical decoding was introduced.
* Fix postmaster's handling of fork failure for a bgworker process.Tom Lane2017-04-24
| | | | | | | | | | | | | | | | | | | | | This corner case didn't behave nicely at all: the postmaster would (partially) update its state as though the process had started successfully, and be quite confused thereafter. Fix it to act like the worker had crashed, instead. In passing, refactor so that do_start_bgworker contains all the state-change logic for bgworker launch, rather than just some of it. Back-patch as far as 9.4. 9.3 contains similar logic, but it's just enough different that I don't feel comfortable applying the patch without more study; and the use of bgworkers in 9.3 was so small that it doesn't seem worth the extra work. transam/parallel.c is still entirely unprepared for the possibility of bgworker startup failure, but that seems like material for a separate patch. Discussion: https://postgr.es/m/4905.1492813727@sss.pgh.pa.us
* Repair crash with unsortable data in grouping sets.Andrew Gierth2017-04-24
| | | | | | | | | | | Previously the code would generate incorrect results, assertion failures, or crashes if given unsortable (but hashable) columns in grouping sets. Handle by throwing an error instead. Report and patch by Pavan Deolasee (though I changed the error wording slightly); regression test by me. (This affects 9.5 only since the planner was refactored in 9.6.)
* Zero padding in replication origin's checkpointed on disk-state.Andres Freund2017-04-23
| | | | | | | | | | | | | | | | This seems to be largely cosmetic, avoiding valgrind bleats and the like. The uninitialized padding influences the CRC of the on-disk entry, but because it's also used when verifying the CRC, that doesn't cause spurious failures. Backpatch nonetheless. It's a bit unfortunate that contrib/test_decoding/sql/replorigin.sql doesn't exercise the checkpoint path, but checkpoints are fairly expensive on weaker machines, and we'd have to stop/start for that to be meaningful. Author: Andres Freund Discussion: https://postgr.es/m/20170422183123.w2jgiuxtts7qrqaq@alap3.anarazel.de Backpatch: 9.5, where replication origins were introduced
* Fix order of arguments to SubTransSetParent().Tom Lane2017-04-23
| | | | | | | | | | | | | | ProcessTwoPhaseBuffer (formerly StandbyRecoverPreparedTransactions) mixed up the parent and child XIDs when calling SubTransSetParent to record the transactions' relationship in pg_subtrans. Remarkably, analysis by Simon Riggs suggests that this doesn't lead to visible problems (at least, not in non-Assert builds). That might explain why we'd not noticed it before. Nonetheless, it's surely wrong. This code was born broken, so back-patch to all supported branches. Discussion: https://postgr.es/m/20110.1492905318@sss.pgh.pa.us
* Avoid depending on non-POSIX behavior of fcntl(2).Tom Lane2017-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The POSIX standard does not say that the success return value for fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1. We had several calls that were making the stronger assumption. Adjust them to test specifically for -1 for strict spec compliance. The standard further leaves open the possibility that the O_NONBLOCK flag bit is not the only active one in F_SETFL's argument. Formally, therefore, one ought to get the current flags with F_GETFL and store them back with only the O_NONBLOCK bit changed when trying to change the nonblock state. In port/noblock.c, we were doing the full pushup in pg_set_block but not in pg_set_noblock, which is just weird. Make both of them do it properly, since they have little business making any assumptions about the socket they're handed. The other places where we're issuing F_SETFL are working with FDs we just got from pipe(2), so it's reasonable to assume the FDs' properties are all default, so I didn't bother adding F_GETFL steps there. Also, while pg_set_block deserves some points for trying to do things right, somebody had decided that it'd be even better to cast fcntl's third argument to "long". Which is completely loony, because POSIX clearly says the third argument for an F_SETFL call is "int". Given the lack of field complaints, these missteps apparently are not of significance on any common platforms. But they're still wrong, so back-patch to all supported branches. Discussion: https://postgr.es/m/30882.1492800880@sss.pgh.pa.us
* Always build a custom plan node's targetlist from the path's pathtarget.Tom Lane2017-04-17
| | | | | | | | | | | | | | | | | | | | | | | We were applying the use_physical_tlist optimization to all relation scan plans, even those implemented by custom scan providers. However, that's a bad idea for a couple of reasons. The custom provider might be unable to provide columns that it hadn't expected to be asked for (for example, the custom scan might depend on an index-only scan). Even more to the point, there's no good reason to suppose that this "optimization" is a win for a custom scan; whatever the custom provider is doing is likely not based on simply returning physical heap tuples. (As a counterexample, if the custom scan is an interface to a column store, demanding all columns would be a huge loss.) If it is a win, the custom provider could make that decision for itself and insert a suitable pathtarget into the path, anyway. Per discussion with Dmitry Ivanov. Back-patch to 9.5 where custom scan support was introduced. The argument that the custom provider can adjust the behavior by changing the pathtarget only applies to 9.6+, but on balance it seems more likely that use_physical_tlist will hurt custom scans than help them. Discussion: https://postgr.es/m/e29ddd30-8ef9-4da5-a50b-2bb7b8c7198d@postgrespro.ru
* Fix compiler warningPeter Eisentraut2017-04-16
| | | | | Introduced by 087e696f066d31cb2f1269a1296a13dfe0bf7a11, happens with gcc 4.7.2.
* Provide a way to control SysV shmem attach address in EXEC_BACKEND builds.Tom Lane2017-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | In standard non-Windows builds, there's no particular reason to care what address the kernel chooses to map the shared memory segment at. However, when building with EXEC_BACKEND, there's a risk that the chosen address won't be available in all child processes. Linux with ASLR enabled (which it is by default) seems particularly at risk because it puts shmem segments into the same area where it maps shared libraries. We can work around that by specifying a mapping address that's outside the range where shared libraries could get mapped. On x86_64 Linux, 0x7e0000000000 seems to work well. This is only meant for testing/debugging purposes, so it doesn't seem necessary to go as far as providing a GUC (or any user-visible documentation, though we might change that later). Instead, it's just controlled by setting an environment variable PG_SHMEM_ADDR to the desired attach address. Back-patch to all supported branches, since the point here is to remove intermittent buildfarm failures on EXEC_BACKEND animals. Owners of affected animals will need to add a suitable setting of PG_SHMEM_ADDR to their build_env configuration. Discussion: https://postgr.es/m/7036.1492231361@sss.pgh.pa.us
* Fix regexport.c to behave sanely with lookaround constraints.Tom Lane2017-04-13
| | | | | | | | | | | | | | | | | | | | regexport.c thought it could just ignore LACON arcs, but the correct behavior is to treat them as satisfiable while consuming zero input (rather reminiscently of commit 9f1e642d5). Otherwise, the emitted simplified-NFA representation may contain no paths leading from initial to final state, which unsurprisingly confuses pg_trgm, as seen in bug #14623 from Jeff Janes. Since regexport's output representation has no concept of an arc that consumes zero input, recurse internally to find the next normal arc(s) after any LACON transitions. We'd be forced into changing that representation if a LACON could be the last arc reaching the final state, but fortunately the regex library never builds NFAs with such a configuration, so there always is a next normal arc. Back-patch to 9.3 where this logic was introduced. Discussion: https://postgr.es/m/20170413180503.25948.94871@wrigleys.postgresql.org
* Remove dead code and fix comments in fast-path function handling.Heikki Linnakangas2017-04-06
| | | | | | | | | | | | | | | | HandleFunctionRequest() is no longer responsible for reading the protocol message from the client, since commit 2b3a8b20c2. Fix the outdated comments. HandleFunctionRequest() now always returns 0, because the code that used to return EOF was moved in 2b3a8b20c2. Therefore, the caller no longer needs to check the return value. Reported by Andres Freund. Backpatch to all supported versions, even though this doesn't have any user-visible effect, to make backporting future patches in this area easier. Discussion: https://www.postgresql.org/message-id/20170405010525.rt5azbya5fkbhvrx@alap3.anarazel.de
* Fix integer-overflow problems in interval comparison.Tom Lane2017-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using integer timestamps, the interval-comparison functions tried to compute the overall magnitude of an interval as an int64 number of microseconds. As reported by Frazer McLean, this overflows for intervals exceeding about 296000 years, which is bad since we nominally allow intervals many times larger than that. That results in wrong comparison results, and possibly in corrupted btree indexes for columns containing such large interval values. To fix, compute the magnitude as int128 instead. Although some compilers have native support for int128 calculations, many don't, so create our own support functions that can do 128-bit addition and multiplication if the compiler support isn't there. These support functions are designed with an eye to allowing the int128 code paths in numeric.c to be rewritten for use on all platforms, although this patch doesn't do that, or even provide all the int128 primitives that will be needed for it. Back-patch as far as 9.4. Earlier releases did not guard against overflow of interval values at all (commit 146604ec4 fixed that), so it seems not very exciting to worry about overly-large intervals for them. Before 9.6, we did not assume that unreferenced "static inline" functions would not draw compiler warnings, so omit functions not directly referenced by timestamp.c, the only present consumer of int128.h. (We could have omitted these functions in HEAD too, but since they were written and debugged on the way to the present patch, and they look likely to be needed by numeric.c, let's keep them in HEAD.) I did not bother to try to prevent such warnings in a --disable-integer-datetimes build, though. Before 9.5, configure will never define HAVE_INT128, so the part of int128.h that exploits a native int128 implementation is dead code in the 9.4 branch. I didn't bother to remove it, thinking that keeping the file looking similar in different branches is more useful. In HEAD only, add a simple test harness for int128.h in src/tools/. In back branches, this does not change the float-timestamps code path. That's not subject to the same kind of overflow risk, since it computes the interval magnitude as float8. (No doubt, when this code was originally written, overflow was disregarded for exactly that reason.) There is a precision hazard instead :-(, but we'll avert our eyes from that question, since no complaints have been reported and that code's deprecated anyway. Kyotaro Horiguchi and Tom Lane Discussion: https://postgr.es/m/1490104629.422698.918452336.26FA96B7@webmail.messagingengine.com