path: root/src
Commit message (Author, Age)
* Update code comments (Peter Eisentraut, 2017-06-09)
| | | | Author: Neha Khatri <nehakhatri5@gmail.com>
* Fix typo (Peter Eisentraut, 2017-06-09)
| | | | Author: Masahiko Sawada <sawada.mshk@gmail.com>
* psql: Update tab completion for ALTER SUBSCRIPTION (Peter Eisentraut, 2017-06-09)
| | | | Author: Masahiko Sawada <sawada.mshk@gmail.com>
* Improve tablesync behavior with concurrent changes (Peter Eisentraut, 2017-06-09)
| | | | | | | | When a table is removed from a subscription before the tablesync worker could start, this would previously result in an error when reading pg_subscription_rel. Now we just ignore this. Author: Masahiko Sawada <sawada.mshk@gmail.com>
* Give a better error message on invalid hostaddr option. (Heikki Linnakangas, 2017-06-09)
| If you accidentally pass a host name in the hostaddr option, e.g. hostaddr=localhost, you get an error like:
|     psql: could not translate host name "localhost" to address: Name or service not known
| That's a bit confusing, because it implies that we tried to look up "localhost" in DNS, but it failed. To make it more clear that we tried to parse "localhost" as a numeric network address, change the message to:
|     psql: could not parse network address "localhost": Name or service not known
| Discussion: https://www.postgresql.org/message-id/10badbc6-4d5a-a769-623a-f7ada43e14dd@iki.fi
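The distinction the new message draws can be illustrated outside libpq. The following is a minimal Python sketch, not the libpq implementation: `parse_hostaddr` is a hypothetical helper that, like the hostaddr option, accepts only a numeric address and never consults DNS, so a host name fails to parse rather than failing to resolve. The detail text after the colon comes from Python's ipaddress module and differs from what libpq reports.

```python
import ipaddress


def parse_hostaddr(value):
    """Parse value strictly as a numeric IP address; DNS is never consulted.

    Hypothetical helper for illustration only; it is not part of libpq.
    """
    try:
        return ipaddress.ip_address(value)
    except ValueError as exc:
        # Word the failure as a parse error, not as a failed name lookup.
        raise ValueError(
            f'could not parse network address "{value}": {exc}'
        ) from None
```

Calling `parse_hostaddr("127.0.0.1")` returns an address object, while `parse_hostaddr("localhost")` raises an error whose wording makes clear that no lookup was attempted.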
* Fix script name in README. (Heikki Linnakangas, 2017-06-09)
| | | | | The script was rewritten in Perl, and renamed from regress.sh to regress.pl, back in 2012.
* Use standard interrupt handling in logical replication launcher. (Andres Freund, 2017-06-08)
| | | | | | | | | | | | | | | Previously the exit handling was only able to exit from within the main loop, and not from within the backend code it calls. Fix that by using the standard die() SIGTERM handler, and adding the necessary CHECK_FOR_INTERRUPTS() call. This requires adding yet another process-type-specific branch to ProcessInterrupts(), which hints that we probably should generalize that handling. But that's work for another day. Author: Petr Jelinek Reviewed-By: Andres Freund Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
* Again report a useful error message when walreceiver's connection closes. (Andres Freund, 2017-06-08)
| Since 7c4f52409a8c (merged in v10), a shutdown master is reported as
|     FATAL: unexpected result after CommandComplete: server closed the connection unexpectedly
| by walsender. It used to be
|     LOG: replication terminated by primary server
|     FATAL: could not send end-of-streaming message to primary: no COPY in progress
| While the old message clearly is not perfect, it's definitely better than what's reported now.
| The change comes from the attempt to handle finished COPYs without erroring out, needed for the new logical replication, which wasn't needed before. There are probably better ways to handle this, but for now just explicitly check for a closed connection.
| Author: Petr Jelinek
| Reviewed-By: Andres Freund
| Discussion: https://postgr.es/m/f7c7dd08-855c-e4ed-41f4-d064a6c0665a@2ndquadrant.com
| Backpatch: -
* Mark to_tsvector(regconfig,json[b]) functions immutable (Andrew Dunstan, 2017-06-08)
| This makes them consistent with the text function and means they can be used in functional indexes.
| Catalog version bumped.
| Per gripe from Josh Berkus.
* Fix bit-rot in pg_upgrade's test.sh, and improve documentation. (Tom Lane, 2017-06-08)
| | | | | | | | | | | | | | | | | | | | | | | | | Doing a cross-version upgrade test with test.sh evidently hasn't been tested since circa 9.2, because the script lacked case branches for old-version servers newer than 9.1. Future-proof that a bit, and clean up breakage induced by our recent drop of V0 function call protocol (namely that oldstyle_length() isn't in the regression suite anymore). (This isn't enough to make the test work perfectly cleanly across versions, but at least it finishes and provides dump files that you can diff manually. One issue I didn't touch is that we might want to execute the "reindex_hash.sql" file in the new DB before dumping it, so that the hash indexes don't vanish from the dump.) Improve the TESTING doc file: put the tl;dr version at the top not the bottom, and bring its explanation of how to run a cross-version test up to speed, since the installcheck target isn't there and won't be resurrected. Improve the comment in the Makefile about why not. In passing, teach .gitignore and "make clean" about a couple more junk output files. Discussion: https://postgr.es/m/14058.1496892482@sss.pgh.pa.us
* Improve authentication error messages. (Heikki Linnakangas, 2017-06-08)
| Most of the improvements were in the new SCRAM code:
| * In SCRAM protocol violation messages, use errdetail to provide the details.
| * If pg_backend_random() fails, throw an ERROR rather than just LOG. We shouldn't continue authentication if we can't generate a random nonce.
| * Use ereport() rather than elog() for the "invalid SCRAM verifier" messages. They shouldn't happen, if everything works, but it's not inconceivable that someone would have invalid scram verifiers in pg_authid, e.g. if a broken client application was used to generate the verifier.
| But this change applied to old code:
| * Use ERROR rather than COMMERROR for protocol violation errors. There's no reason to not tell the client what they did wrong. The client might be confused already, so that it cannot read and display the error correctly, but let's at least try. In the "invalid password packet size" case, we used to actually continue with authentication anyway, but that is now a hard error.
| Patch by Michael Paquier and me. Thanks to Daniele Varrazzo for spotting the typo in one of the messages that spurred the discussion and these larger changes.
| Discussion: https://www.postgresql.org/message-id/CA%2Bmi_8aZYLhuyQi1Jo0hO19opNZ2OEATEOM5fKApH7P6zTOZGg%40mail.gmail.com
* Put new command-line options in alphabetical order (Peter Eisentraut, 2017-06-08)
|
* Add statistics subdirectory to Makefile. (Robert Haas, 2017-06-08)
| | | | | | | | Commit 7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b overlooked this. Report and patch by Kyotaro Horiguchi Discussion: http://postgr.es/m/20170608.145852.54673832.horiguchi.kyotaro@lab.ntt.co.jp
* Fix updating of pg_subscription_rel from workers (Peter Eisentraut, 2017-06-07)
| | | | | | | | | | A logical replication worker should not insert new rows into pg_subscription_rel, only update existing rows, so that there are no races if a concurrent refresh removes rows. Adjust the API to be able to choose that behavior. Author: Masahiko Sawada <sawada.mshk@gmail.com> Reported-by: tushar <tushar.ahuja@enterprisedb.com>
* Prevent BEFORE triggers from violating partitioning constraints. (Robert Haas, 2017-06-07)
| | | | | | | | | | | | | | | | | | | | | Since tuple-routing implicitly checks the partitioning constraints at least for the levels of the partitioning hierarchy it traverses, there's normally no need to revalidate the partitioning constraint after performing tuple routing. However, if there's a BEFORE trigger on the target partition, it could modify the tuple, causing the partitioning constraint to be violated. Catch that case. Also, instead of checking the root table's partition constraint after tuple-routing, check it beforehand. Otherwise, the rules for when the partitioning constraint gets checked get too complicated, because you sometimes have to check part of the constraint but not all of it. This effectively reverts commit 39162b2030fb0a35a6bb28dc636b5a71b8df8d1c in favor of a different approach altogether. Report by me. Initial debugging by Jeevan Ladhe. Patch by Amit Langote, reviewed by me. Discussion: http://postgr.es/m/CA+Tgmoa9DTgeVOqopieV8d1QRpddmP65aCdxyjdYDoEO5pS5KA@mail.gmail.com
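The reasoning in this commit can be sketched as follows. This is a simplified Python model under stated assumptions, not the executor code: partitions are hypothetical `(name, lower, upper)` half-open ranges on a single `key` column, and `route`, `insert_row`, and the error text are invented for illustration. Routing alone validates the row, but once a BEFORE trigger has had a chance to modify it, the target partition's constraint has to be checked again.

```python
def route(partitions, key):
    """Return the (name, lower, upper) partition whose half-open range holds key."""
    for part in partitions:
        _, lower, upper = part
        if lower <= key < upper:
            return part
    raise ValueError(f"no partition found for key {key}")


def insert_row(partitions, row, before_trigger=None):
    # Tuple routing: choosing the partition implicitly checks its constraint.
    name, lower, upper = route(partitions, row["key"])
    if before_trigger is not None:
        # A BEFORE trigger on the target partition may rewrite the row ...
        row = before_trigger(dict(row))
        # ... so the partition constraint has to be checked again afterwards.
        if not lower <= row["key"] < upper:
            raise ValueError(f"row violates partition constraint of {name}")
    return name, row
```

With no trigger, the second check is skipped entirely, which matches the observation that revalidation is normally unnecessary after routing.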
* Clear auth context correctly when re-connecting after failed auth attempt. (Heikki Linnakangas, 2017-06-07)
| If authentication over an SSL connection fails, with sslmode=prefer, libpq will reconnect without SSL and retry. However, we did not clear the variables related to GSS, SSPI, and SASL authentication state, when reconnecting. Because of that, the second authentication attempt would always fail with a "duplicate GSS/SASL authentication request" error. pg_SSPI_startup did not check for duplicate authentication requests like the corresponding GSS and SASL functions, so with SSPI, you would leak some memory instead.
| Another way this could manifest itself, on version 10, is if you list multiple hostnames in the "host" parameter. If the first server requests Kerberos or SCRAM authentication, but it fails, the attempts to connect to the other servers will also fail with "duplicate authentication request" errors.
| To fix, move the clearing of authentication state from closePGconn to pqDropConnection, so that it is cleared also when re-connecting.
| Patch by Michael Paquier, with some kibitzing by me.
| Backpatch down to 9.3. 9.2 has the same bug, but the code around closing the connection is somewhat different, so that this patch doesn't apply. To fix this in 9.2, I think we would need to back-port commit 210eb9b743 first, and then apply this patch. However, given that we only bumped into this in our own testing, we haven't heard any reports from users about this, and that 9.2 will be end-of-lifed in a couple of months anyway, it doesn't seem worth the risk and trouble.
| Discussion: https://www.postgresql.org/message-id/CAB7nPqRuOUm0MyJaUy9L3eXYJU3AKCZ-0-03=-aDTZJGV4GyWw@mail.gmail.com
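A toy model of the failure mode, assuming a much simplified connection object: the `Connection` class, its `sasl_state` field, and the `clear_auth_state` flag are hypothetical and exist only to contrast the two behaviors. If the per-attempt authentication state survives the drop of the first connection, the retry trips the duplicate-request check; clearing it during the drop lets the retry proceed.

```python
class Connection:
    """Toy stand-in for a client connection; not the libpq PGconn struct."""

    def __init__(self):
        self.sasl_state = None  # per-attempt authentication state

    def start_sasl(self):
        if self.sasl_state is not None:
            raise RuntimeError("duplicate SASL authentication request")
        self.sasl_state = "in progress"

    def drop_connection(self, clear_auth_state):
        # Socket teardown would happen here. The flag models whether the
        # per-attempt authentication state is cleared at the same time.
        if clear_auth_state:
            self.sasl_state = None


def connect_with_retry(conn, clear_auth_state):
    conn.start_sasl()                       # first attempt, which then fails
    conn.drop_connection(clear_auth_state)  # connection dropped before retry
    conn.start_sasl()                       # retry on a fresh connection
    return "retry reached authentication"
```

`connect_with_retry(Connection(), clear_auth_state=False)` raises the duplicate-request error, while passing `True` lets the second attempt start cleanly.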
* Fix double-free bug in GSS authentication. (Heikki Linnakangas, 2017-06-07)
| | | | | | | | | | | | | | | | | | | | | | | The logic to free the buffer after the gss_init_sec_context() call was always a bit wonky. Because gss_init_sec_context() sets the GSS context variable, conn->gctx, we would in fact always attempt to free the buffer. That only works, because previously conn->ginbuf.value was initialized to NULL, and free(NULL) is a no-op. Commit 61bf96cab0 refactored things so that the GSS input token buffer is allocated locally in pg_GSS_continue, and not held in the PGconn object. After that, the now-local ginbuf.value variable isn't initialized when it's not used, so we pass a bogus pointer to free(). To fix, only try to free the input buffer if we allocated it. That was the intention, certainly after the refactoring, and probably even before that. But because there's no live bug before the refactoring, I refrained from backpatching this. The bug was also independently reported by Graham Dutton, as bug #14690. Patch reviewed by Michael Paquier. Discussion: https://www.postgresql.org/message-id/6288d80e-a0bf-d4d3-4e12-7b79c77f1771%40iki.fi Discussion: https://www.postgresql.org/message-id/20170605130954.1438.90535%40wrigleys.postgresql.org
* Consistently use subscription name as application name (Peter Eisentraut, 2017-06-06)
| | | | | | | The logical replication apply worker uses the subscription name as application name, except for table sync. This was incorrectly set to use the replication slot name, which might be different, in one case. Also add a comment why the other case is different.
* Clean up latch related code. (Andres Freund, 2017-06-06)
| | | | | | | | | | | | | | | | | | | | | | The larger part of this patch replaces usages of MyProc->procLatch with MyLatch. The latter works even early during backend startup, where MyProc->procLatch doesn't yet. While the affected code shouldn't run in cases where it's not initialized, it might get copied into places where it might. Using MyLatch is simpler and a bit faster to boot, so there's little point to stick with the previous coding. While doing so I noticed some weaknesses around newly introduced uses of latches that could lead to missed events, and an omitted CHECK_FOR_INTERRUPTS() call in worker_spi. As all the actual bugs are in v10 code, there doesn't seem to be sufficient reason to backpatch this. Author: Andres Freund Discussion: https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.de https://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de Backpatch: -
* Improve handover logic between sync and apply workers (Peter Eisentraut, 2017-06-06)
| Make apply busy wait check the catalog instead of shmem state to ensure that the next transaction will see the expected table synchronization state.
| Also make the handover always go through the same set of steps to make the overall process easier to understand and debug.
| Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
| Tested-by: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
| Tested-by: Erik Rijkers <er@xs4all.nl>
* Fix some cases of "the the" split across two lines. (Robert Haas, 2017-06-06)
| Kevin Grittner observed that 2186b608b3cb859fe0ec04015a5c4e4cbf69caed introduced a new occurrence of this by copying existing text, and I found a few more cases using grep.
| Discussion: http://postgr.es/m/CADAecHWfG-K+YvocHCkrXV-ycm+eUOaaUVfYZNOnwf0pSmuQCw@mail.gmail.com
* Use NIL rather than NULL to represent an empty list. (Robert Haas, 2017-06-06)
| | | | | | | | Just to be tidy. Amit Langote Discussion: http://postgr.es/m/9297f80f-e4ab-7dda-33d4-8580bab6d634@lab.ntt.co.jp
* Clean up partcollation handling for OID 0. (Robert Haas, 2017-06-06)
| | | | | | | | | | | | Consistent with what we do for indexes, we shouldn't try to record dependencies on collation OID 0 or the default collation OID (which is pinned). Also, the fact that indcollation and partcollation can contain zero OIDs when the data type is not collatable should be documented. Amit Langote, per a complaint from me. Discussion: http://postgr.es/m/CA+Tgmoba5mtPgM3NKfG06vv8na5gGbVOj0h4zvivXQwLw8wXXQ@mail.gmail.com
* Wire up query cancel interrupt for walsender backends. (Andres Freund, 2017-06-05)
| This allows cancelling commands run over replication connections. While it might have some use before v10, it has become important now that normal SQL commands are allowed in database connected walsender connections.
| Author: Petr Jelinek
| Reviewed-By: Andres Freund, Michael Paquier
| Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8c97@2ndquadrant.com
* Unify SIGHUP handling between normal and walsender backends. (Andres Freund, 2017-06-05)
| | | | | | | | | | | | | | | | | | | | Because walsender and normal backends share the same main loop it's problematic to have two different flag variables, set in signal handlers, indicating a pending configuration reload. Only certain walsender commands reach code paths checking for the variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT ... LOGICAL, notably not base backups). This is a bug present since the introduction of walsender, but has gotten worse in releases since then which allow walsender to do more. A later patch, not slated for v10, will similarly unify SIGHUP handling in other types of processes as well. Author: Petr Jelinek, Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de Backpatch: 9.2-, bug is present since 9.0
* Prevent possibility of panics during shutdown checkpoint. (Andres Freund, 2017-06-05)
| When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written.
| To fix this, divide the walsender shutdown into two phases. First, checkpointer, itself triggered by postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or running an SQL query, this causes the backend to shut down; if logical replication is in progress, all existing WAL records are processed, followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit.
| Author: Andres Freund, based on an earlier patch by Michael Paquier
| Reported-By: Fujii Masao, Andres Freund
| Reviewed-By: Michael Paquier
| Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
| Backpatch: 9.4, where logical decoding was introduced
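The two-phase ordering described above can be sketched as a small state machine. This is a deliberately reduced Python model, not the walsender code: the `WalSender` class, its method names, and the `shutdown` driver are hypothetical, signals are replaced by direct method calls, and the cases where a backend exits immediately in phase one (idle or running SQL) are left out. The point it illustrates is that the checkpoint is written only after every walsender has stopped accepting commands that could generate WAL.

```python
from enum import Enum


class State(Enum):
    STREAMING = "streaming"
    STOPPING = "stopping"
    EXITED = "exited"


class WalSender:
    """Toy walsender; signals are modeled as plain method calls."""

    def __init__(self, name):
        self.name = name
        self.state = State.STREAMING

    def init_stopping(self):
        # Phase 1: stands in for the "init stopping" signal.
        self.state = State.STOPPING

    def run_command(self, command):
        if self.state is not State.STREAMING:
            raise RuntimeError(f"{self.name}: rejecting {command} while stopping")
        return f"{self.name}: ran {command}"

    def finish(self, outstanding_wal):
        # Phase 2: send what is left, including the checkpoint record, then exit.
        sent = list(outstanding_wal)
        self.state = State.EXITED
        return sent


def shutdown(walsenders, write_shutdown_checkpoint):
    for ws in walsenders:
        ws.init_stopping()
    # The checkpoint may only begin once every walsender is confirmed stopping.
    if not all(ws.state is State.STOPPING for ws in walsenders):
        raise RuntimeError("walsender still able to generate WAL")
    record = write_shutdown_checkpoint()
    return {ws.name: ws.finish([record]) for ws in walsenders}
```

Because `run_command` refuses work once a sender is stopping, nothing in this model can write WAL between the start of the checkpoint and the final send.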
* Have walsenders participate in procsignal infrastructure. (Andres Freund, 2017-06-05)
| | | | | | | | | | | | | | | | | The non-participation in procsignal was a problem for both changes in master, e.g. parallelism not working for normal statements run in walsender backends, and older branches, e.g. recovery conflicts and catchup interrupts not working for logical decoding walsenders. This commit thus replaces the previous WalSndXLogSendHandler with procsignal_sigusr1_handler. In branches since db0f6cad48 that can lead to additional SetLatch calls, but that only rarely seems to make a difference. Author: Andres Freund Reviewed-By: Michael Paquier Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
* Revert "Prevent panic during shutdown checkpoint" (Andres Freund, 2017-06-05)
| | | | | | | | | | | This reverts commit 086221cf6b1727c2baed4703c582f657b7c5350e, which was made to master only. The approach implemented in the above commit has some issues. While those could easily be fixed incrementally, doing so would make backpatching considerably harder, so instead first revert this patch. Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
* Don't set application_name in logical replication workers (Peter Eisentraut, 2017-06-05)
| | | | | | This was bothering some people because it's not the intended use of application_name and it makes the default view of pg_stat_activity bulky.
* Fix ALTER SUBSCRIPTION grammar ambiguity (Peter Eisentraut, 2017-06-05)
| | | | | | | | | There was a grammar ambiguity between SET PUBLICATION name REFRESH and SET PUBLICATION SKIP REFRESH, because SKIP is not a reserved word. To resolve that, fold the refresh choice into the WITH options. Refreshing is the default now. Reported-by: tushar <tushar.ahuja@enterprisedb.com>
* Ignore WL_POSTMASTER_DEATH latch event in single user mode (Peter Eisentraut, 2017-06-05)
| | | | | | | Otherwise code that uses this will abort with an assertion failure, because postmaster_alive_fds are not initialized. Reported-by: tushar <tushar.ahuja@enterprisedb.com>
* Fix thinko in previous openssl change (Andrew Dunstan, 2017-06-05)
|
* Fix record length computation in pg_waldump/xlogdump. (Andres Freund, 2017-06-05)
| The current method of computing the record length (excluding the length of full-page images) has been wrong since the WAL format was revamped in 2c03216d831160bedd72d45f712601b6f7d03f1c. Only the main record's length was counted, but that can be significantly too little if there's data associated with further blocks.
| Fix by computing the record length as total_length - fpi_length.
| Reported-By: Chen Huajun
| Bug: #14687
| Reviewed-By: Heikki Linnakangas
| Discussion: https://postgr.es/m/20170603165939.1436.58887@wrigleys.postgresql.org
| Backpatch: 9.5-
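A small worked example of the arithmetic, using an assumed and much simplified record layout: the `Record` and `Block` classes and all byte counts below are invented for illustration and do not mirror the real WAL structures. It contrasts counting only the main data with subtracting the full-page-image bytes from the total.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    data_len: int  # bytes of block-specific data
    fpi_len: int   # bytes of full-page image, 0 if none


@dataclass
class Record:
    header_len: int
    main_data_len: int
    blocks: list = field(default_factory=list)

    @property
    def total_length(self):
        return (self.header_len + self.main_data_len
                + sum(b.data_len + b.fpi_len for b in self.blocks))

    @property
    def fpi_length(self):
        return sum(b.fpi_len for b in self.blocks)


def old_record_length(rec):
    # Previous behavior: only the main data was counted.
    return rec.main_data_len


def new_record_length(rec):
    # Fixed behavior: everything except the full-page images.
    return rec.total_length - rec.fpi_length
```

For a record with a 24-byte header, 8 bytes of main data, one block carrying 100 bytes of data plus an 8192-byte image, and a second block with 40 bytes of data, the old computation yields 8 while the new one yields 172, since the per-block data is no longer dropped.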
* Code review for shm_toc.h/.c. (Tom Lane, 2017-06-05)
| | | | | | | | | | | | | | | | | | | | Declare the toc_nentry field as uint32 not Size. Since shm_toc_lookup() reads the field without any lock, it has to be atomically readable, and we do not assume that for fields wider than 32 bits. Performance would be impossibly bad for entry counts approaching 2^32 anyway, so there is no need to try to preserve maximum width here. This is probably an academic issue, because even if reading int64 isn't atomic, the high order half would never change in practice. Still, it's a coding rule violation, so let's fix it. Adjust some other not-terribly-well-chosen data types too, and copy-edit some comments. Make shm_toc_attach's Asserts consistent with shm_toc_create's. None of this looks to be a live bug, so no need for back-patch. Discussion: https://postgr.es/m/16984.1496679541@sss.pgh.pa.us
* Find openssl lib files in right directory for MSVC (Andrew Dunstan, 2017-06-05)
| | | | | | | | | Some openssl builds put their lib files in a VC subdirectory, others do not. Cater for both cases. Backpatch to all live branches. From an offline discussion with Leonardo Cecchi.
* Don't be so trusting that shm_toc_lookup() will always succeed. (Tom Lane, 2017-06-05)
| | | | | | | | | | | | | | | | | | Given the possibility of race conditions and so on, it seems entirely unsafe to just assume that shm_toc_lookup() always finds the key it's looking for --- but that was exactly what all but one call site were doing. To fix, add a "bool noError" argument, similarly to what we have in many other functions, and throw an error on an unexpected lookup failure. Remove now-redundant Asserts that a rather random subset of call sites had. I doubt this will throw any light on buildfarm member lorikeet's recent failures, because if an unnoticed lookup failure were involved, you'd kind of expect a null-pointer-dereference crash rather than the observed symptom. But you never know ... and this is better coding practice even if it never catches anything. Discussion: https://postgr.es/m/9697.1496675981@sss.pgh.pa.us
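The calling convention introduced here can be sketched in Python. This is an analogy only: `toc_lookup`, `TocLookupError`, and the dictionary standing in for the table of contents are hypothetical, and the error text is invented. By default a missing key is an error raised at the lookup site; callers that can tolerate a miss opt in with the flag and must then handle the empty result themselves.

```python
class TocLookupError(LookupError):
    """Raised when a required key is missing from the table of contents."""


def toc_lookup(toc, key, no_error=False):
    """Look up key in toc.

    With no_error=False (the default) a missing key raises TocLookupError,
    so callers never silently receive an unusable result. With no_error=True
    the function returns None and the caller must check for it.
    """
    if key in toc:
        return toc[key]
    if no_error:
        return None
    raise TocLookupError(f"could not find key {key} in table of contents")
```

Making the strict behavior the default means a call site that forgets to think about failure fails loudly instead of dereferencing a missing entry later.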
* Fix typo in error message. (Heikki Linnakangas, 2017-06-05)
| | | | | | Daniele Varrazzo Discussion: https://www.postgresql.org/message-id/CA+mi_8bqY5THP8hLKKSdMEr5GCz6M=hD6_uLbvFeyEBfwqUxeA@mail.gmail.com
* Fix comments in simplehash.h. (Heikki Linnakangas, 2017-06-05)
| | | | | | Jeff Janes and me. Discussion: https://www.postgresql.org/message-id/CAMkU=1zYnniLYg+W9itL93DXebCjx6Uk6m_=Xa8p_zM65X3S0Q@mail.gmail.com
* Replace over-optimistic Assert in partitioning code with a runtime test. (Tom Lane, 2017-06-04)
| | | | | | | | | | | | | | | | get_partition_parent felt that it could simply Assert that systable_getnext found a tuple. This is unlike any other caller of that function, and it's unsafe IMO --- in fact, the reason I noticed it was that the Assert failed. (OK, I was working with known-inconsistent catalog contents, but I wasn't expecting the DB to fall over quite that violently. The behavior in a non-assert-enabled build wouldn't be very nice, either.) Fix it to do what other callers do, namely an actual runtime-test-and-elog. Also, standardize the wording of elog messages that are complaining about unexpected failure of systable_getnext. 90% of them say "could not find tuple for <object>", so make the remainder do likewise. Many of the holdouts were using the phrasing "cache lookup failed", which is outright misleading since no catcache search is involved.
* #ifdef out assorted unused GEQO code. (Tom Lane, 2017-06-04)
| | | | | | | | | | | | | | I'd always assumed that backend/optimizer/geqo/'s remarkably poor showing on code coverage metrics was because we weren't exercising it much in the regression tests. But it turns out that a good chunk of the problem is that there's a bunch of code that is physically unreachable (because the calls to it are #ifdef'd out in geqo_main.c) but is being built anyway. Making the called code have #if guards similar to the calling code saves a couple of kilobytes of executable size and should make the coverage numbers more reflective of reality. It's arguable that we should just delete all the unused recombination mechanisms altogether, but I didn't feel a need to go that far today.
* Disallow CREATE INDEX if table is already in use in current session. (Tom Lane, 2017-06-04)
| | | | | | | | | | | | | | If we allow this, whatever outer command has the table open will not know about the new index and may fail to update it as needed, as shown in a report from Laurenz Albe. We already had such a prohibition in place for ALTER TABLE, but the CREATE INDEX syntax missed the check. Fixing it requires an API change for DefineIndex(), which conceivably would break third-party extensions if we were to back-patch it. Given how long this problem has existed without being noticed, fixing it in the back branches doesn't seem worth that risk. Discussion: https://postgr.es/m/A737B7A37273E048B164557ADEF4A58B53A4DC9A@ntex2010i.host.magwien.gv.at
* Assorted translatable string fixes (Alvaro Herrera, 2017-06-04)
| | | | | Mark our rusage reportage string translatable; remove quotes from type names; unify formatting of very similar messages.
* Remove dead variables. (Tom Lane, 2017-06-03)
| | | | | Commit 512c7356b left a couple of variables unused except for being set. My compiler didn't whine about this, but some buildfarm members did.
* Add some missing backslash commands to psql's tab-completion knowledge. (Tom Lane, 2017-06-03)
| | | | | | | | | | | | | \if and related commands were overlooked here, as were \dRp and \dRs from the logical-replication patch, as was \?. While here, reformat the list to put each new first command letter on a separate line; perhaps that will limit the need to reflow the whole list when we add more commands in future. Masahiko Sawada (reformatting by me) Discussion: https://postgr.es/m/CAD21AoDW1QHtBsM33hV+Fg2mYEs+FWj4qtoCU72AwHAXQ3U6ZQ@mail.gmail.com
* Fix <> and pattern-NOT-match estimators to handle nulls correctly. (Tom Lane, 2017-06-03)
| | | | | | | | | | | | | | | | | | | | | | | | These estimators returned 1 minus the corresponding equality/match estimate, which is incorrect: we need to subtract off the fraction of nulls in the column, since those are neither equal nor not equal to the comparison value. The error only becomes obvious if the nullfrac is large, but it could be very bad in a mostly-nulls column, as reported in bug #14676 from Marko Tiikkaja. To fix the <> case, refactor eqsel() and neqsel() to call a common support routine, which can be made to account for nullfrac correctly. The pattern-match cases were already factored that way, and it was simply an oversight that patternsel() wasn't subtracting off nullfrac. neqjoinsel() has a similar problem, but since we're elsewhere discussing changing its behavior entirely, I left it alone for now. This is a very longstanding bug, but I'm hesitant to back-patch a fix for it. Given the lack of prior complaints, such cases must not come up often, so it's probably not worth the risk of destabilizing plans in stable branches. Discussion: https://postgr.es/m/20170529153847.4275.95416@wrigleys.postgresql.org
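The arithmetic behind this fix is easy to check on a concrete column. The sketch below is plain Python over a list of values with None standing in for NULL; the function names are hypothetical and it computes exact fractions rather than planner estimates. Because a NULL is neither equal nor unequal to the comparison value, the fraction of rows satisfying `<>` is one minus the equality fraction minus the null fraction.

```python
def eq_selectivity(values, target):
    """Fraction of all rows whose value equals target (NULLs never match)."""
    return sum(1 for v in values if v is not None and v == target) / len(values)


def null_fraction(values):
    return sum(1 for v in values if v is None) / len(values)


def neq_selectivity_old(values, target):
    # Previous behavior: ignores NULLs, so it overestimates.
    return 1.0 - eq_selectivity(values, target)


def neq_selectivity_fixed(values, target):
    # Subtract the null fraction as well, then clamp against rounding error.
    sel = 1.0 - eq_selectivity(values, target) - null_fraction(values)
    return min(1.0, max(0.0, sel))


def neq_true_fraction(values, target):
    """Ground truth: rows that are non-NULL and differ from target."""
    return sum(1 for v in values if v is not None and v != target) / len(values)
```

On a column of 100 rows with 90 NULLs, five 1s, and five 2s, the old formula claims that about 95% of rows satisfy `col <> 1`, whereas only 5% actually do; the fixed formula agrees with the true fraction up to floating-point rounding.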
* Fix old corner-case logic error in final_cost_nestloop(). (Tom Lane, 2017-06-03)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When costing a nestloop with stop-at-first-inner-match semantics, and a non-indexscan inner path, final_cost_nestloop() wants to charge the full scan cost of the inner rel at least once, with additional scans charged at inner_rescan_run_cost which might be less. However the logic for doing this effectively assumed that outer_matched_rows is at least 1. If it's zero, which is not unlikely for a small outer rel, we ended up charging inner_run_cost plus N times inner_rescan_run_cost, as much as double the correct charge for an outer rel with only one row that we're betting won't be matched. (Unless the inner rel is materialized, in which case it has very small inner_rescan_run_cost and the cost is not so far off what it should have been.) The upshot of this was that the planner had a tendency to select plans that failed to make effective use of the stop-at-first-inner-match semantics, and that might have Materialize nodes in them even when the predicted number of executions of the Materialize subplan was only 1. This was not so obvious before commit 9c7f5229a, because the case only arose in connection with semi/anti joins where there's not freedom to reverse the join order. But with the addition of unique-inner joins, it could result in some fairly bad planning choices, as reported by Teodor Sigaev. Indeed, some of the test cases added by that commit have plans that look dubious on closer inspection, and are changed by this patch. Fix the logic to ensure that we don't charge for too many inner scans. I chose to adjust it so that the full-freight scan cost is associated with an unmatched outer row if possible, not a matched one, since that seems like a better model of what would happen at runtime. This is a longstanding bug, but given the lesser impact in back branches, and the lack of field complaints, I won't risk a back-patch. 
Discussion: https://postgr.es/m/CAKJS1f-LzkUsFxdJ_-Luy38orQ+AdEXM5o+vANR+-pHAWPSecg@mail.gmail.com
* Receive invalidation messages correctly in tablesync worker (Peter Eisentraut, 2017-06-03)
| We didn't accept any invalidation messages until the whole sync process had finished (because it flattens all the remote transactions into a single one). So the sync worker didn't learn about subscription changes/drop until it had finished. This could lead to "orphaned" sync workers.
| Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
| Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
* Make tablesync worker exit when apply dies while it was waiting for it (Peter Eisentraut, 2017-06-03)
| | | | | | | | | This avoids "orphaned" sync workers. This was caused by a thinko in wait_for_sync_status_change. Author: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
* Allow parallelism in COPY (query) TO ...; (Andres Freund, 2017-06-02)
| | | | | | | | | | | | | Previously this was not allowed, as copy.c didn't set the CURSOR_OPT_PARALLEL_OK flag when planning the query. Set it. While the lack of parallel query for COPY isn't strictly speaking a bug, it does prevent parallelism from being used in a facility commonly used to run long running queries. Thus backpatch to 9.6. Author: Andres Freund Discussion: https://postgr.es/m/20170531231958.ihanapplorptykzm@alap3.anarazel.de Backpatch: 9.6, where parallelism was introduced.
* Remove replication slot name check from ReplicationSlotAcquire() (Peter Eisentraut, 2017-06-02)
| | | | | | | When trying to access a replication slot that is supposed to already exist, we don't need to check the naming rules again. If the slot does not exist, we will then get a "does not exist" error message, which is generally more useful from the perspective of an end user.