aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Add missing_ok option to the SQL functions for reading files.Heikki Linnakangas2015-06-28
| | | | | | | | | | | | | | | This makes it possible to use the functions without getting errors, if there is a chance that the file might be removed or renamed concurrently. pg_rewind needs to do just that, although this could be useful for other purposes too. (The changes to pg_rewind to use these functions will come in a separate commit.) The read_binary_file() function isn't very well-suited for extensions.c's purposes anymore, if it ever was. So bite the bullet and make a copy of it in extension.c, tailored for that use case. This seems better than the accidental code reuse, even if it's a some more lines of code. Michael Paquier, with plenty of kibitzing by me.
* Fix comment for GetCurrentIntegerTimestamp().Kevin Grittner2015-06-28
| | | | | | The unit of measure is microseconds, not milliseconds. Backpatch to 9.3 where the function and its comment were added.
* Avoid passing NULL to memcmp() in lookups of zero-argument functions.Tom Lane2015-06-27
| | | | | | | | | | | | | | | | | | | | A few places assumed they could pass NULL for the argtypes array when looking up functions known to have zero arguments. At first glance it seems that this should be safe enough, since memcmp() is surely not allowed to fetch any bytes if its count argument is zero. However, close reading of the C standard says that such calls have undefined behavior, so we'd probably best avoid it. Since the number of places doing this is quite small, and some other places looking up zero-argument functions were already passing dummy arrays, let's standardize on the latter solution rather than hacking the function lookup code to avoid calling memcmp() in these cases. I also added Asserts to catch any future violations of the new rule. Given the utter lack of any evidence that this actually causes any problems in the field, I don't feel a need to back-patch this change. Per report from Piotr Stefaniak, though this is not his patch.
* Fix typo in commentHeikki Linnakangas2015-06-27
| | | | Etsuro Fujita
* Avoid hot standby cancels from VAC FREEZESimon Riggs2015-06-27
| | | | | | | | | | | | VACUUM FREEZE generated false cancelations of standby queries on an otherwise idle master. Caused by an off-by-one error on cutoff_xid which goes back to original commit. Backpatch to all versions 9.0+ Analysis and report by Marco Nenciarini Bug fix by Simon Riggs
* Fix DDL command collection for TRANSFORMAlvaro Herrera2015-06-26
| | | | | | | | Commit b488c580ae, which added the DDL command collection feature, neglected to update the code that commit cac76582053e had previously added two weeks earlier for the TRANSFORM feature. Reported by Michael Paquier.
* Fix BRIN xlog replayAlvaro Herrera2015-06-26
| | | | | | | | There was a confusion about which block number to use when storing an item's pointer in the revmap -- the revmap page's blkno was being used, not the data page's blkno. Spotted-by: Jeff Janes
* Be more conservative about removing tablespace "symlinks".Robert Haas2015-06-26
| | | | | | | | | | Don't apply rmtree(), which will gleefully remove an entire subtree, and don't even apply unlink() unless it's symlink or a directory, the only things that we expect to find. Amit Kapila, with minor tweaks by me, per extensive discussions involving Andrew Dunstan, Fujii Masao, and Heikki Linnakangas, at least some of whom also reviewed the code.
* Don't warn about creating temporary or unlogged hash indexes.Robert Haas2015-06-26
| | | | | | | Warning people that no WAL-logging will be done doesn't make sense in this case. Michael Paquier
* Reduce log level for background worker events from LOG to DEBUG1.Robert Haas2015-06-26
| | | | | | | Per discussion, LOG is just too chatty for something that will happen as routinely as this. Pavel Stehule
* Fix the fallback memory barrier implementation to be reentrant.Andres Freund2015-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | This was essentially "broken" since 0c8eda62; but until more recently (14e8803f) barriers usage in signal handlers was infrequent. The failure to be reentrant was noticed because the test_shm_mq, which uses memory barriers at a high frequency, occasionally got stuck on some solaris buildfarm animals. Turns out, those machines use sun studio 12.1, which doesn't yet have efficient memory barrier support. A machine with a newer sun studio did not fail. Forcing the barrier fallback to be used on x86 allows to reproduce the problem. The new fallback is to use kill(PostmasterPid, 0) based on the theory that that'll always imply a barrier due to checking the liveliness of PostmasterPid on systems old enough to need fallback support. It's hard to come up with a good and performant fallback. I'm not backpatching this for now - the problem isn't active in the back branches, and we haven't backpatched barrier changes for now. Additionally master looks entirely different than the back branches due to the new atomics abstraction. It seems better to let this rest in master, where the non-reentrancy actively causes a problem, and then consider backpatching. Found-By: Robert Haas Discussion: 55626265.3060800@dunslane.net
* Improve handling of CustomPath/CustomPlan(State) children.Robert Haas2015-06-26
| | | | | | | | | | Allow CustomPath to have a list of paths, CustomPlan a list of plans, and CustomPlanState a list of planstates known to the core system, so that custom path/plan providers can more reasonably use this infrastructure for nodes with multiple children. KaiGai Kohei, per a design suggestion from Tom Lane, with some further kibitzing by me.
* Fix a couple of bugs with wal_log_hints.Heikki Linnakangas2015-06-26
| | | | | | | | | | | | | | 1. Replay of the WAL record for setting a bit in the visibility map contained an assertion that a full-page image of that record type can only occur with checksums enabled. But it can also happen with wal_log_hints, so remove the assertion. Unlike checksums, wal_log_hints can be changed on the fly, so it would be complicated to figure out if it was enabled at the time that the WAL record was generated. 2. wal_log_hints has the same effect on the locking needed to read the LSN of a page as data checksums. BufferGetLSNAtomic() didn't get the memo. Backpatch to 9.4, where wal_log_hints was added.
* Allow background workers to connect to no particular database.Robert Haas2015-06-25
| | | | | | | The documentation claims that this is supported, but it didn't actually work. Fix that. Reported by Pavel Stehule; patch by me.
* Fix the logic for putting relations into the relcache init file.Tom Lane2015-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit f3b5565dd4e59576be4c772da364704863e6a835 was a couple of bricks shy of a load; specifically, it missed putting pg_trigger_tgrelid_tgname_index into the relcache init file, because that index is not used by any syscache. However, we have historically nailed that index into cache for performance reasons. The upshot was that load_relcache_init_file always decided that the init file was busted and silently ignored it, resulting in a significant hit to backend startup speed. To fix, reinstantiate RelationIdIsInInitFile() as a wrapper around RelationSupportsSysCache(), which can know about additional relations that should be in the init file despite being unknown to syscache.c. Also install some guards against future mistakes of this type: make write_relcache_init_file Assert that all nailed relations get written to the init file, and make load_relcache_init_file emit a WARNING if it takes the "wrong number of nailed relations" exit path. Now that we remove the init files during postmaster startup, that case should never occur in the field, even if we are starting a minor-version update that added or removed rels from the nailed set. So the warning shouldn't ever be seen by end users, but it will show up in the regression tests if somebody breaks this logic. Back-patch to all supported branches, like the previous commit.
* Update get_relation_info comment.Robert Haas2015-06-23
| | | | Thomas Munro
* Improve inheritance_planner()'s performance for large inheritance sets.Tom Lane2015-06-22
| | | | | | | | | | | | | | | | | Commit c03ad5602f529787968fa3201b35c119bbc6d782 introduced a planner performance regression for UPDATE/DELETE on large inheritance sets. It required copying the append_rel_list (which is of size proportional to the number of inherited tables) once for each inherited table, thus resulting in O(N^2) time and memory consumption. While it's difficult to avoid that in general, the extra work only has to be done for append_rel_list entries that actually reference subquery RTEs, which inheritance-set entries will not. So we can buy back essentially all of the loss in cases without subqueries in FROM; and even for those, the added work is mainly proportional to the number of UNION ALL subqueries. Back-patch to 9.2, like the previous commit. Tom Lane and Dean Rasheed, per a complaint from Thomas Munro.
* Add transforms to pg_get_object_address and friendsAlvaro Herrera2015-06-21
| | | | | | | This was missed when transforms were added by commit cac76582053ef8e. Extracted from a larger patch Author: Michael Paquier
* Improve multixact emergency autovacuum logic.Andres Freund2015-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Previously autovacuum was not necessarily triggered if space in the members slru got tight. The first problem was that the signalling was tied to values in the offsets slru, but members can advance much faster. Thats especially a problem if old sessions had been around that previously prevented the multixact horizon to increase. Secondly the skipping logic doesn't work if the database was restarted after autovacuum was triggered - that knowledge is not preserved across restart. This is especially a problem because it's a common panic-reaction to restart the database if it gets slow to anti-wraparound vacuums. Fix the first problem by separating the logic for members from offsets. Trigger autovacuum whenever a multixact crosses a segment boundary, as the current member offset increases in irregular values, so we can't use a simple modulo logic as for offsets. Add a stopgap for the second problem, by signalling autovacuum whenver ERRORing out because of boundaries. Discussion: 20150608163707.GD20772@alap3.anarazel.de Backpatch into 9.3, where it became more likely that multixacts wrap around.
* Add missing check for wal_debug GUC.Andres Freund2015-06-21
| | | | | | | | | | 9a20a9b2 added a new elog(), enabled when WAL_DEBUG is defined. The other WAL_DEBUG dependant messages check for the wal_debug GUC, but this one did not. While at it replace 'upto' with 'up to'. Discussion: 20150610110253.GF3832@alap3.anarazel.de Backpatch to 9.4, the first release containing 9a20a9b2.
* Fix failure to copy setlocale() return value.Noah Misch2015-06-20
| | | | | | | | | | | | | | | | | | | | | | | | | | POSIX permits setlocale() calls to invalidate any previous setlocale() return values, but commit 5f538ad004aa00cf0881f179f0cde789aad4f47e neglected to account for setlocale(LC_CTYPE, NULL) doing so. The effect was to set the LC_CTYPE environment variable to an unintended value. pg_perm_setlocale() sets this variable to assist PL/Perl; without it, Perl would undo PostgreSQL's locale settings. The known-affected configurations are 32-bit, release builds using Visual Studio 2012 or Visual Studio 2013. Visual Studio 2010 is unaffected, as were all buildfarm-attested configurations. In principle, this bug could leave the wrong LC_CTYPE in effect after PL/Perl use, which could in turn facilitate problems like corrupt tsvector datums. No known platform experiences that consequence, because PL/Perl on Windows does not use this environment variable. The bug has been user-visible, as early postmaster failure, on systems with Windows ANSI code page set to CP936 for "Chinese (Simplified, PRC)" and probably on systems using other multibyte code pages. (SetEnvironmentVariable() rejects values containing character data not valid under the Windows ANSI code page.) Back-patch to 9.4, where the faulty commit first appeared. Reported by Didi Hu and 林鹏程. Reviewed by Tom Lane, though this fix strategy was not his first choice.
* Revert "Detect setlocale(LC_CTYPE, NULL) clobbering previous return values."Noah Misch2015-06-20
| | | | | This reverts commit b76e76be460a240e99c33f6fb470dd1d5fe01a2a. The buildfarm yielded no related failures.
* Fix thinko in comment (launcher -> worker)Alvaro Herrera2015-06-20
|
* In immediate shutdown, postmaster should not exit till children are gone.Tom Lane2015-06-19
| | | | | | | | | | | | | | | | | | | | This adjusts commit 82233ce7ea42d6ba519aaec63008aff49da6c7af so that the postmaster does not exit until all its child processes have exited, even if the 5-second timeout elapses and we have to send SIGKILL. There is no great value in having the postmaster process quit sooner, and doing so can mislead onlookers into thinking that the cluster is fully terminated when actually some child processes still survive. This effect might explain recent test failures on buildfarm member hamster, wherein we failed to restart a cluster just after shutting it down with "pg_ctl stop -m immediate". I also did a bit of code review/beautification, including fixing a faulty use of the Max() macro on a volatile expression. Back-patch to 9.4. In older branches, the postmaster never waited for children to exit during immediate shutdowns, and changing that would be too much of a behavioral change.
* Clamp autovacuum launcher sleep time to 5 minutesAlvaro Herrera2015-06-19
| | | | | | | | | | | | | | | This avoids the problem that it might go to sleep for an unreasonable amount of time in unusual conditions like the server clock moving backwards an unreasonable amount of time. (Simply moving the server clock forward again doesn't solve the problem unless you wake up the autovacuum launcher manually, say by sending it SIGHUP). Per trouble report from Prakash Itnal in https://www.postgresql.org/message-id/CAHC5u79-UqbapAABH2t4Rh2eYdyge0Zid-X=Xz-ZWZCBK42S0Q@mail.gmail.com Analyzed independently by Haribabu Kommi and Tom Lane.
* Fix bogus range_table_mutator() logic for RangeTblEntry.tablesample.Tom Lane2015-06-19
| | | | | | | Must make a copy of the TableSampleClause node; the previous coding modified the input data structure in-place. Petr Jelinek
* Fix corner case in autovacuum-forcing logic for multixact wraparound.Robert Haas2015-06-19
| | | | | | | | | | | | | | | Since find_multixact_start() relies on SimpleLruDoesPhysicalPageExist(), and that function looks only at the on-disk state, it's possible for it to fail to find a page that exists in the in-memory SLRU that has not been written yet. If that happens, SetOffsetVacuumLimit() will erroneously decide to force emergency autovacuuming immediately. We should probably fix find_multixact_start() to consider the data cached in memory as well as on the on-disk state, but that's no excuse for SetOffsetVacuumLimit() to be stupid about the case where it can no longer read the value after having previously succeeded in doing so. Report by Andres Freund.
* Detect setlocale(LC_CTYPE, NULL) clobbering previous return values.Noah Misch2015-06-17
| | | | | | | | POSIX permits setlocale() calls to invalidate any previous setlocale() return values. Commit 5f538ad004aa00cf0881f179f0cde789aad4f47e neglected to account for that. In advance of fixing that bug, switch to failing hard on affected configurations. This is a planned temporary commit to assay buildfarm-represented configurations.
* Fix "path" infrastructure bug affecting jsonb_set()Andrew Dunstan2015-06-12
| | | | | | | | | | | | | | jsonb_set() and other clients of the setPathArray() utility function could get spurious results when an array integer subscript is provided that is not within the range of int. To fix, ensure that the value returned by strtol() within setPathArray() is within the range of int; when it isn't, assume an invalid input in line with existing, similar cases. The path-orientated operators that appeared in PostgreSQL 9.3 and 9.4 do not call setPathArray(), and already independently take this precaution, so no change there. Peter Geoghegan
* Improve error message and hint for ALTER COLUMN TYPE can't-cast failure.Tom Lane2015-06-12
| | | | | | | | | | We already tried to improve this once, but the "improved" text was rather off-target if you had provided a USING clause. Also, it seems helpful to provide the exact text of a suggested USING clause, so users can just copy-and-paste it when needed. Per complaint from Keith Rarick and a suggestion from Merlin Moncure. Back-patch to 9.2 where the current wording was adopted.
* Make postmaster restart archiver soon after it dies, even during recovery.Fujii Masao2015-06-12
| | | | | | | | | | | | | | After the archiver dies, postmaster tries to start a new one immediately. But previously this could happen only while server was running normally even though archiving was enabled always (i.e., archive_mode was set to always). So the archiver running during recovery could not restart soon after it died. This is an oversight in commit ffd3774. This commit changes reaper(), postmaster's signal handler to cleanup after a child process dies, so that it tries to a new archiver even during recovery if necessary. Patch by me. Review by Alvaro Herrera.
* Fix alphabetization in catalogs.sgml.Fujii Masao2015-06-12
| | | | | | | System catalogs and views should be listed alphabetically in catalog.sgml, but only pg_file_settings view not. This patch also fixes typos in pg_file_settings comments.
* Fix typoPeter Eisentraut2015-06-10
|
* Report more information if pg_perm_setlocale() fails at startup.Tom Lane2015-06-09
| | | | | | We don't know why a few Windows users have seen this fail, but the taciturnity of the error message certainly isn't helping debug it. Let's at least find out which LC category isn't working.
* Fix typosAlvaro Herrera2015-06-08
| | | | | | | tablesapce -> tablespace there -> their These were introduced in 72d422a52, so no need to backpatch.
* Refactor WAL segment copying code.Fujii Masao2015-06-09
| | | | | | | | | | | | | | | | | | | | * Remove unused argument "dstfname" and related code from XLogFileCopy(). * Previously XLogFileCopy() returned a pstrdup'd string so that InstallXLogFileSegment() used it later. Since the pstrdup'd string was never free'd, there could be a risk of memory leak. It was almost harmless because the startup process exited just after calling XLogFileCopy(), it existed. This commit changes XLogFileCopy() so that it directly calls InstallXLogFileSegment() and doesn't call pstrdup() at all. Which fixes that memory leak problem. * Extend InstallXLogFileSegment() so that the caller can specify the log level. Which allows us to emit an error when InstallXLogFileSegment() fails a disk file access like link() and rename(). Previously it was always logged with LOG level and additionally needed to be logged with ERROR when we wanted to treat it as an error. Michael Paquier
* Allow HotStandbyActiveInReplay() to be called in single user mode.Andres Freund2015-06-08
| | | | | | | | | | | | HotStandbyActiveInReplay, introduced in 061b079f, only allowed WAL replay to happen in the startup process, missing the single user case. This buglet is fairly harmless as it only causes problems when single user mode in an assertion enabled build is used to replay a btree vacuum record. Backpatch to 9.2. 061b079f was backpatched further, but the assertion was not.
* Desupport jsonb subscript deletion on objectsAndrew Dunstan2015-06-07
| | | | | | | | | | | | | | Supporting deletion of JSON pairs within jsonb objects using an array-style integer subscript allowed for surprising outcomes. This was mostly due to the implementation-defined ordering of pairs within objects for jsonb. It also seems desirable to make jsonb integer subscript deletion consistent with the 9.4 era general purpose integer subscripting operator for jsonb (although that operator returns NULL when an object is encountered, while we prefer here to throw an error). Peter Geoghegan, following discussion on -hackers.
* Use a safer method for determining whether relcache init file is stale.Tom Lane2015-06-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we invalidate the relcache entry for a system catalog or index, we must also delete the relcache "init file" if the init file contains a copy of that rel's entry. The old way of doing this relied on a specially maintained list of the OIDs of relations present in the init file: we made the list either when reading the file in, or when writing the file out. The problem is that when writing the file out, we included only rels present in our local relcache, which might have already suffered some deletions due to relcache inval events. In such cases we correctly decided not to overwrite the real init file with incomplete data --- but we still used the incomplete initFileRelationIds list for the rest of the current session. This could result in wrong decisions about whether the session's own actions require deletion of the init file, potentially allowing an init file created by some other concurrent session to be left around even though it's been made stale. Since we don't support changing the schema of a system catalog at runtime, the only likely scenario in which this would cause a problem in the field involves a "vacuum full" on a catalog concurrently with other activity, and even then it's far from easy to provoke. Remarkably, this has been broken since 2002 (in commit 786340441706ac1957a031f11ad1c2e5b6e18314), but we had never seen a reproducible test case until recently. If it did happen in the field, the symptoms would probably involve unexpected "cache lookup failed" errors to begin with, then "could not open file" failures after the next checkpoint, as all accesses to the affected catalog stopped working. Recovery would require manually removing the stale "pg_internal.init" file. To fix, get rid of the initFileRelationIds list, and instead consult syscache.c's list of relations used in catalog caches to decide whether a relation is included in the init file. This should be a tad more efficient anyway, since we're replacing linear search of a list with ~100 entries with a binary search. It's a bit ugly that the init file contents are now so directly tied to the catalog caches, but in practice that won't make much difference. Back-patch to all supported branches.
* Fix incorrect order of database-locking operations in InitPostgres().Tom Lane2015-06-05
| | | | | | | | | | | | | | | | | | | | | | | | We should set MyProc->databaseId after acquiring the per-database lock, not beforehand. The old way risked deadlock against processes trying to copy or delete the target database, since they would first acquire the lock and then wait for processes with matching databaseId to exit; that left a window wherein an incoming process could set its databaseId and then block on the lock, while the other process had the lock and waited in vain for the incoming process to exit. CountOtherDBBackends() would time out and fail after 5 seconds, so this just resulted in an unexpected failure not a permanent lockup, but it's still annoying when it happens. A real-world example of a use-case is that short-duration connections to a template database should not cause CREATE DATABASE to fail. Doing it in the other order should be fine since the contract has always been that processes searching the ProcArray for a database ID must hold the relevant per-database lock while searching. Thus, this actually removes the former race condition that required an assumption that storing to MyProc->databaseId is atomic. It's been like this for a long time, so back-patch to all active branches.
* Cope with possible failure of the oldest MultiXact to exist.Robert Haas2015-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent commits, mainly b69bf30b9bfacafc733a9ba77c9587cf54d06c0c and 53bb309d2d5a9432d2602c93ed18e58bd2924e15, introduced mechanisms to protect against wraparound of the MultiXact member space: the number of multixacts that can exist at one time is limited to 2^32, but the total number of members in those multixacts is also limited to 2^32, and older code did not take care to enforce the second limit, potentially allowing old data to be overwritten while it was still needed. Unfortunately, these new mechanisms failed to account for the fact that the code paths in which they run might be executed during recovery or while the cluster was in an inconsistent state. Also, they failed to account for the fact that users who used pg_upgrade to upgrade a PostgreSQL version between 9.3.0 and 9.3.4 might have might oldestMultiXid = 1 in the control file despite the true value being larger. To fix these problems, first, avoid unnecessarily examining the mmembers of MultiXacts when the cluster is not known to be consistent. TruncateMultiXact has done this for a long time, and this patch does not fix that. But the new calls used to prevent member wraparound are not needed until we reach normal running, so avoid calling them earlier. (SetMultiXactIdLimit is actually called before InRecovery is set, so we can't rely on that; we invent our own multixact-specific flag instead.) Second, make failure to look up the members of a MultiXact a non-fatal error. Instead, if we're unable to determine the member offset at which wraparound would occur, postpone arming the member wraparound defenses until we are able to do so. If we're unable to determine the member offset that should force autovacuum, force it continuously until we are able to do so. If we're unable to deterine the member offset at which we should truncate the members SLRU, log a message and skip truncation. An important consequence of these changes is that anyone who does have a bogus oldestMultiXid = 1 value in pg_control will experience immediate emergency autovacuuming when upgrading to a release that contains this fix. The release notes should highlight this fact. If a user has no pg_multixact/offsets/0000 file, but has oldestMultiXid = 1 in the control file, they may wish to vacuum any tables with relminmxid = 1 prior to upgrading in order to avoid an immediate emergency autovacuum after the upgrade. This must be done with a PostgreSQL version 9.3.5 or newer and with vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age set to 0. This patch also adds an additional log message at each database server startup, indicating either that protections against member wraparound have been engaged, or that they have not. In the latter case, once autovacuum has advanced oldestMultiXid to a sane value, the message indicating that the guards have been engaged will appear at the next checkpoint. A few additional messages have also been added at the DEBUG1 level so that the correct operation of this code can be properly audited. Along the way, this patch fixes another, related bug in TruncateMultiXact that has existed since PostgreSQL 9.3.0: when no MultiXacts exist at all, the truncation code looks up NextMultiXactId, which doesn't exist yet. This can lead to TruncateMultiXact removing every file in pg_multixact/offsets instead of keeping one around, as it should. This in turn will cause the database server to refuse to start afterwards. Patch by me. Review by Álvaro Herrera, Andres Freund, Noah Misch, and Thomas Munro.
* Fix some questionable edge-case behaviors in add_path() and friends.Tom Lane2015-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | add_path_precheck was doing exact comparisons of path costs, but it really needs to do them fuzzily to be sure it won't reject paths that could survive add_path's comparisons. (This can only matter if the initial cost estimate is very close to the final one, but that turns out to often be true.) Also, it should ignore startup cost for this purpose if and only if compare_path_costs_fuzzily would do so. The previous coding always ignored startup cost for parameterized paths, which is wrong as of commit 3f59be836c555fa6; it could result in improper early rejection of paths that we care about for SEMI/ANTI joins. It also always considered startup cost for unparameterized paths, which is just as wrong though the only effect is to waste planner cycles on paths that can't survive. Instead, it should consider startup cost only when directed to by the consider_startup/ consider_param_startup relation flags. Likewise, compare_path_costs_fuzzily should have symmetrical behavior for parameterized and unparameterized paths. In this case, the best answer seems to be that after establishing that total costs are fuzzily equal, we should compare startup costs whether or not the consider_xxx flags are on. That is what it's always done for unparameterized paths, so let's make the behavior for parameterized paths match. These issues were noted while developing the SEMI/ANTI join costing fix of commit 3f59be836c555fa6, but we chose not to back-patch these fixes, because they can cause changes in the planner's choices among nearly-same-cost plans. (There is in fact one minor change in plan choice within the core regression tests.) Destabilizing plan choices in back branches without very clear improvements is frowned on, so we'll just fix this in HEAD.
* Fix planner's cost estimation for SEMI/ANTI joins with inner indexscans.Tom Lane2015-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the inner side of a nestloop SEMI or ANTI join is an indexscan that uses all the join clauses as indexquals, it can be presumed that both matched and unmatched outer rows will be processed very quickly: for matched rows, we'll stop after fetching one row from the indexscan, while for unmatched rows we'll have an indexscan that finds no matching index entries, which should also be quick. The planner already knew about this, but it was nonetheless charging for at least one full run of the inner indexscan, as a consequence of concerns about the behavior of materialized inner scans --- but those concerns don't apply in the fast case. If the inner side has low cardinality (many matching rows) this could make an indexscan plan look far more expensive than it actually is. To fix, rearrange the work in initial_cost_nestloop/final_cost_nestloop so that we don't add the inner scan cost until we've inspected the indexquals, and then we can add either the full-run cost or just the first tuple's cost as appropriate. Experimentation with this fix uncovered another problem: add_path and friends were coded to disregard cheap startup cost when considering parameterized paths. That's usually okay (and desirable, because it thins the path herd faster); but in this fast case for SEMI/ANTI joins, it could result in throwing away the desired plain indexscan path in favor of a bitmap scan path before we ever get to the join costing logic. In the many-matching-rows cases of interest here, a bitmap scan will do a lot more work than required, so this is a problem. To fix, add a per-relation flag consider_param_startup that works like the existing consider_startup flag, but applies to parameterized paths, and set it for relations that are the inside of a SEMI or ANTI join. To make this patch reasonably safe to back-patch, care has been taken to avoid changing the planner's behavior except in the very narrow case of SEMI/ANTI joins with inner indexscans. There are places in compare_path_costs_fuzzily and add_path_precheck that are not terribly consistent with the new approach, but changing them will affect planner decisions at the margins in other cases, so we'll leave that for a HEAD-only fix. Back-patch to 9.3; before that, the consider_startup flag didn't exist, meaning that the second aspect of the patch would be too invasive. Per a complaint from Peter Holzer and analysis by Tomas Vondra.
* Avoid naming a variable "new", and remove bogus initializer.Andrew Dunstan2015-05-31
| | | | Per gripe from Tom Lane.
* Add a couple of missing JsonbValue type initialisers.Andrew Dunstan2015-05-31
|
* Rename jsonb_replace to jsonb_set and allow it to add new valuesAndrew Dunstan2015-05-31
| | | | | | | | | | | | The function is given a fourth parameter, which defaults to true. When this parameter is true, if the last element of the path is missing in the original json, jsonb_set creates it in the result and assigns it the new value. If it is false then the function does nothing unless all elements of the path are present, including the last. Based on some original code from Dmitry Dolgov, heavily modified by me. Catalog version bumped.
* Remove special cases for ETXTBSY from new fsync'ing logic.Tom Lane2015-05-29
| | | | | | | | | | The argument that this is a sufficiently-expected case to be silently ignored seems pretty thin. Andres had brought it up back when we were still considering that most fsync failures should be hard errors, and it probably would be legit not to fail hard for ETXTBSY --- but the same is true for EROFS and other cases, which is why we gave up on hard failures. ETXTBSY is surely not a normal case, so logging the failure seems fine from here.
* Revert exporting of internal GUC variable "data_directory".Tom Lane2015-05-29
| | | | | | | | | | | | | | | | This undoes a poorly-thought-out choice in commit 970a18687f9b3058, namely to export guc.c's internal variable data_directory. The authoritative variable so far as C code is concerned is DataDir; there is no reason for anything except specific bits of GUC code to look at the GUC variable. After yesterday's commits fixing the fsync-on-restart patch, the only remaining misuse of data_directory was in AlterSystemSetConfigFile(), which would be much better off just using a relative path anyhow: it's less code and it doesn't break if the DBA moves the data directory of a running system, which is a case we've taken some pains over in the past. This is mostly cosmetic, so no need for a back-patch (and I'd be hesitant to remove a global variable in stable branches anyway).
* Fix fsync-at-startup code to not treat errors as fatal.Tom Lane2015-05-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2ce439f3379aed857517c8ce207485655000fc8e introduced a rather serious regression, namely that if its scan of the data directory came across any un-fsync-able files, it would fail and thereby prevent database startup. Worse yet, symlinks to such files also caused the problem, which meant that crash restart was guaranteed to fail on certain common installations such as older Debian. After discussion, we agreed that (1) failure to start is worse than any consequence of not fsync'ing is likely to be, therefore treat all errors in this code as nonfatal; (2) we should not chase symlinks other than those that are expected to exist, namely pg_xlog/ and tablespace links under pg_tblspc/. The latter restriction avoids possibly fsync'ing a much larger part of the filesystem than intended, if the user has left random symlinks hanging about in the data directory. This commit takes care of that and also does some code beautification, mainly moving the relevant code into fd.c, which seems a much better place for it than xlog.c, and making sure that the conditional compilation for the pre_sync_fname pass has something to do with whether pg_flush_data works. I also relocated the call site in xlog.c down a few lines; it seems a bit silly to be doing this before ValidateXLOGDirectoryStructure(). The similar logic in initdb.c ought to be made to match this, but that change is noncritical and will be dealt with separately. Back-patch to all active branches, like the prior commit. Abhijit Menon-Sen and Tom Lane
* Fix assorted inconsistencies in our calls of readlink().Tom Lane2015-05-28
| | | | | | | | | | | | | | Ensure that we null-terminate the result string (one place in pg_rewind). Be paranoid about out-of-range results from readlink() (should not happen, but there is no good reason for some call sites to be careful about it and others not). Consistently use the whole buffer, not sometimes one byte less. Ensure we emit an appropriate errcode() in all cases. Spell the error messages the same way. The only serious bug here is the missing null-termination in pg_rewind, which is new code, so no need for a back-patch. Abhijit Menon-Sen and Tom Lane