aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Fix thinko in re-setting wal_log_hints flag from a parameter-change record.Heikki Linnakangas2015-01-15
| | | | | | | | The flag is supposed to be copied from the record. Same issue with track_commit_timestamps, but that's master-only. Report and fix by Petr Jalinek. Backpatch to 9.4, where wal_log_hints was added.
* Improve performance of EXPLAIN with large range tables.Tom Lane2015-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | As of 9.3, ruleutils.c goes to some lengths to ensure that table and column aliases used in its output are unique. Of course this takes more time than was required before, which in itself isn't fatal. However, EXPLAIN was set up so that recalculation of the unique aliases was repeated for each subexpression printed in a plan. That results in O(N^2) time and memory consumption for large plan trees, which did not happen in older branches. Fortunately, the expensive work is the same across a whole plan tree, so there is no need to repeat it; we can do most of the initialization just once per query and re-use it for each subexpression. This buys back most (not all) of the performance loss since 9.2. We need an extra ExplainState field to hold the precalculated deparse context. That's no problem in HEAD, but in the back branches, expanding sizeof(ExplainState) seems risky because third-party extensions might have local variables of that struct type. So, in 9.4 and 9.3, introduce an auxiliary struct to keep sizeof(ExplainState) the same. We should refactor the APIs to avoid such local variables in future, but that's material for a separate HEAD-only commit. Per gripe from Alexey Bashtanov. Back-patch to 9.3 where the issue was introduced.
* Make logging_collector=on work with non-windows EXEC_BACKEND again.Andres Freund2015-01-14
| | | | | | | | | | | | | | | | | | | | Commit b94ce6e80 reordered postmaster's startup sequence so that the tempfile directory is only cleaned up after all the necessary state for pg_ctl is collected. Unfortunately the chosen location is after the syslogger has been started; which normally is fine, except for !WIN32 EXEC_BACKEND builds, which pass information to children via files in the temp directory. Move the call to RemovePgTempFiles() to just before the syslogger has started. That's the first child we fork. Luckily EXEC_BACKEND is pretty much only used by endusers on windows, which has a separate method to pass information to children. That means the real world impact of this bug is very small. Discussion: 20150113182344.GF12272@alap3.anarazel.de Backpatch to 9.1, just as the previous commit was.
* Use correct text domain for errcontext() appearing within ereport().Tom Lane2015-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The mechanism added in commit dbdf9679d7d61b03a3bf73af9b095831b7010eb5 for associating the correct translation domain with errcontext strings potentially fails in cases where errcontext() is used within an ereport() macro. Such usage was not originally envisioned for errcontext(), but we do have a few places that do it. In this situation, the intended comma expression becomes just a couple of arguments to errfinish(), which the compiler might choose to evaluate right-to-left. Fortunately, in such cases the textdomain for the errcontext string must be the same as for the surrounding ereport. So we can fix this by letting errstart initialize context_domain along with domain; then it will have the correct value no matter which order the calls occur in. (Note that error stack callback functions are not invoked until errfinish, so normal usage of errcontext won't affect what happens for errcontext calls within the ereport macro.) In passing, make sure that errcontext calls within the main backend set context_domain to something non-NULL. This isn't a live bug because NULL would select the current textdomain() setting which should be the right thing anyway --- but it seems better to handle this completely consistently with the regular domain field. Per report from Dmitry Voronin. Backpatch to 9.3; before that, there wasn't any attempt to ensure that errcontext strings were translated in an appropriate domain.
* Skip dead backends in MinimumActiveBackendsStephen Frost2015-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in ed0b409, PGPROC was split and moved to static variables in procarray.c, with procs in ProcArrayStruct replaced by an array of integers representing process numbers (pgprocnos), with -1 indicating a dead process which has yet to be removed. Access to procArray is generally done under ProcArrayLock and therefore most code does not have to concern itself with -1 entries. However, MinimumActiveBackends intentionally does not take ProcArrayLock, which means it has to be extra careful when accessing procArray. Prior to ed0b409, this was handled by checking for a NULL in the pointer array, but that check was no longer valid after the split. Coverity pointed out that the check could never happen and so it was removed in 5592eba. That didn't make anything worse, but it didn't fix the issue either. The correct fix is to check for pgprocno == -1 and skip over that entry if it is encountered. Back-patch to 9.2, since there can be attempts to access the arrays prior to their start otherwise. Note that the changes prior to 9.4 will look a bit different due to the change in 5592eba. Note that MinimumActiveBackends only returns a bool for heuristic purposes and any pre-array accesses are strictly read-only and so there is no security implication and the lack of fields complaints indicates it's very unlikely to run into issues due to this. Pointed out by Noah.
* xlogreader.c: Fix report_invalid_record translatability flagAlvaro Herrera2015-01-09
| | | | | | | | | For some reason I overlooked in GETTEXT_TRIGGERS that the right argument be read by gettext in 7fcbf6a405ffc12a4546a25b98592ee6733783fc. This will drop the translation percentages for the backend all the way back to 9.3 ... Problem reported by Heikki.
* Protect against XLogReaderAllocate() failing to allocate memory.Andres Freund2015-01-08
| | | | | | | | | | | | | logical.c's StartupDecodingContext() forgot to check whether XLogReaderAllocate() returns NULL indicating a memory allocation failure. This could lead, although quite unlikely, lead to a NULL pointer dereference. This only applies to 9.4 as earlier versions don't do logical decoding, and later versions don't return NULL after allocation failures in XLogReaderAllocate(). Michael Paquier, with minor changes by me.
* On Darwin, detect and report a multithreaded postmaster.Noah Misch2015-01-07
| | | | | | | | | Darwin --enable-nls builds use a substitute setlocale() that may start a thread. Buildfarm member orangutan experienced BackendList corruption on account of different postmaster threads executing signal handlers simultaneously. Furthermore, a multithreaded postmaster risks undefined behavior from sigprocmask() and fork(). Emit LOG messages about the problem and its workaround. Back-patch to 9.0 (all supported versions).
* Always set the six locale category environment variables in main().Noah Misch2015-01-07
| | | | | | | | | | | | | Typical server invocations already achieved that. Invalid locale settings in the initial postmaster environment interfered, as could malloc() failure. Setting "LC_MESSAGES=pt_BR.utf8 LC_ALL=invalid" in the postmaster environment will now choose C-locale messages, not Brazilian Portuguese messages. Most localized programs, including all PostgreSQL frontend executables, do likewise. Users are unlikely to observe changes involving locale categories other than LC_MESSAGES. CheckMyDatabase() ensures that we successfully set LC_COLLATE and LC_CTYPE; main() sets the remaining three categories to locale "C", which almost cannot fail. Back-patch to 9.0 (all supported versions).
* Reject ANALYZE commands during VACUUM FULL or another ANALYZE.Noah Misch2015-01-07
| | | | | | vacuum()'s static variable handling makes it non-reentrant; an ensuing null pointer deference crashed the backend. Back-patch to 9.0 (all supported versions).
* Correctly handle relcache invalidation corner case during logical decoding.Andres Freund2015-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using a historic snapshot for logical decoding it can validly happen that a relation that's in the relcache isn't visible to that historic snapshot. E.g. if a newly created relation is referenced in the query that uses the SQL interface for logical decoding and a sinval reset occurs. The earlier commit that fixed the error handling for that corner case already improves the situation as a ERROR is better than hitting an assertion... But it's obviously not good enough. So additionally allow that case without an error if a historic snapshot is set up - that won't allow an invalid entry to stay in the cache because it's a) already marked invalid and will thus be rebuilt during the next access b) the syscaches will be reset at the end of decoding. There might be prettier solutions to handle this case, but all that we could think of so far end up being much more complex than this quite simple fix. This fixes the assertion failures reported by the buildfarm (markhor, tick, leech) after the introduction of new regression tests in 89fd41b390a4. The failure there weren't actually directly caused by CLOBBER_CACHE_ALWAYS but the extraordinary long runtimes due to it lead to sinval resets triggering the behaviour. Discussion: 22459.1418656530@sss.pgh.pa.us Backpatch to 9.4 where logical decoding was introduced.
* Improve relcache invalidation handling of currently invisible relations.Andres Freund2015-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | The corner case where a relcache invalidation tried to rebuild the entry for a referenced relation but couldn't find it in the catalog wasn't correct. The code tried to RelationCacheDelete/RelationDestroyRelation the entry. That didn't work when assertions are enabled because the latter contains an assertion ensuring the refcount is zero. It's also more generally a bad idea, because by virtue of being referenced somebody might actually look at the entry, which is possible if the error is trapped and handled via a subtransaction abort. Instead just error out, without deleting the entry. As the entry is marked invalid, the worst that can happen is that the invalid (and at some point unused) entry lingers in the relcache. Discussion: 22459.1418656530@sss.pgh.pa.us There should be no way to hit this case < 9.4 where logical decoding introduced a bug that can hit this. But since the code for handling the corner case is there it should do something halfway sane, so backpatch all the the way back. The logical decoding bug will be handled in a separate commit.
* Fix thinko in lock mode enumAlvaro Herrera2015-01-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0e5680f4737a9c6aa94aa9e77543e5de60411322 contained a thinko mixing LOCKMODE with LockTupleMode. This caused misbehavior in the case where a tuple is marked with a multixact with at most a FOR SHARE lock, and another transaction tries to acquire a FOR NO KEY EXCLUSIVE lock; this case should block but doesn't. Include a new isolation tester spec file to explicitely try all the tuple lock combinations; without the fix it shows the problem: starting permutation: s1_begin s1_lcksvpt s1_tuplock2 s2_tuplock3 s1_commit step s1_begin: BEGIN; step s1_lcksvpt: SELECT * FROM multixact_conflict FOR KEY SHARE; SAVEPOINT foo; a 1 step s1_tuplock2: SELECT * FROM multixact_conflict FOR SHARE; a 1 step s2_tuplock3: SELECT * FROM multixact_conflict FOR NO KEY UPDATE; a 1 step s1_commit: COMMIT; With the fixed code, step s2_tuplock3 blocks until session 1 commits, which is the correct behavior. All other cases behave correctly. Backpatch to 9.3, like the commit that introduced the problem.
* Prevent WAL files created by pg_basebackup -x/X from being archived again.Andres Freund2015-01-03
| | | | | | | | | | | | | | | | | | | | | WAL (and timeline history) files created by pg_basebackup did not maintain the new base backup's archive status. That's currently not a problem if the new node is used as a standby - but if that node is promoted all still existing files can get archived again. With a high wal_keep_segment settings that can happen a significant time later - which is quite confusing. Change both the backend (for the -x/-X fetch case) and pg_basebackup (for -X stream) itself to always mark WAL/timeline files included in the base backup as .done. That's in line with walreceiver.c doing so. The verbosity of the pg_basebackup changes show pretty clearly that it needs some refactoring, but that'd result in not be backpatchable changes. Backpatch to 9.1 where pg_basebackup was introduced. Discussion: 20141205002854.GE21964@awork2.anarazel.de
* Add pg_string_endswith as the start of a string helper library in src/common.Andres Freund2015-01-03
| | | | | | Backpatch to 9.3 where src/common was introduce, because a bugfix that needs to be backpatched, requires the function. Earlier branches will have to duplicate the code.
* Treat negative values of recovery_min_apply_delay as having no effect.Tom Lane2015-01-03
| | | | | | | | | | | | | | | | | | | | | | At one point in the development of this feature, it was claimed that allowing negative values would be useful to compensate for timezone differences between master and slave servers. That was based on a mistaken assumption that commit timestamps are recorded in local time; but of course they're in UTC. Nor is a negative apply delay likely to be a sane way of coping with server clock skew. However, the committed patch still treated negative delays as doing something, and the timezone misapprehension survived in the user documentation as well. If recovery_min_apply_delay were a proper GUC we'd just set the minimum allowed value to be zero; but for the moment it seems better to treat negative settings as if they were zero. In passing do some extra wordsmithing on the parameter's documentation, including correcting a second misstatement that the parameter affects processing of Restore Point records. Issue noted by Michael Paquier, who also provided the code patch; doc changes by me. Back-patch to 9.4 where the feature was introduced.
* Fix trailing whitespace in PO filePeter Eisentraut2014-12-31
|
* Backpatch variable renaming in formatting.cBruce Momjian2014-12-29
| | | | | | | Backpatch a9c22d1480aa8e6d97a000292d05ef2b31bbde4e to make future backpatching easier. Backpatch through 9.0
* Grab heavyweight tuple lock only before sleepingAlvaro Herrera2014-12-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were trying to acquire the lock even when we were subsequently not sleeping in some other transaction, which opens us up unnecessarily to deadlocks. In particular, this is troublesome if an update tries to lock an updated version of a tuple and finds itself doing EvalPlanQual update chain walking; more than two sessions doing this concurrently will find themselves sleeping on each other because the HW tuple lock acquisition in heap_lock_tuple called from EvalPlanQualFetch races with the same tuple lock being acquired in heap_update -- one of these sessions sleeps on the other one to finish while holding the tuple lock, and the other one sleeps on the tuple lock. Per trouble report from Andrew Sackville-West in http://www.postgresql.org/message-id/20140731233051.GN17765@andrew-ThinkPad-X230 His scenario can be simplified down to a relatively simple isolationtester spec file which I don't include in this commit; the reason is that the current isolationtester is not able to deal with more than one blocked session concurrently and it blocks instead of raising the expected deadlock. In the future, if we improve isolationtester, it would be good to include the spec file in the isolation schedule. I posted it in http://www.postgresql.org/message-id/20141212205254.GC1768@alvh.no-ip.org Hat tip to Mark Kirkwood, who helped diagnose the trouble.
* Remove duplicate include of slot.h.Fujii Masao2014-12-25
| | | | Back-patch to 9.4, where this problem was added.
* Fix timestamp in end-of-recovery WAL records.Heikki Linnakangas2014-12-19
| | | | | | | We used time(null) to set a TimestampTz field, which gave bogus results. Noticed while looking at pg_xlogdump output. Backpatch to 9.3 and above, where the fast promotion was introduced.
* Prevent potentially hazardous compiler/cpu reordering during lwlock release.Andres Freund2014-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | In LWLockRelease() (and in 9.4+ LWLockUpdateVar()) we release enqueued waiters using PGSemaphoreUnlock(). As there are other sources of such unlocks backends only wake up if MyProc->lwWaiting is set to false; which is only done in the aforementioned functions. Before this commit there were dangers because the store to lwWaitLink could become visible before the store to lwWaitLink. This could both happen due to compiler reordering (on most compilers) and on some platforms due to the CPU reordering stores. The possible consequence of this is that a backend stops waiting before lwWaitLink is set to NULL. If that backend then tries to acquire another lock and has to wait there the list could become corrupted once the lwWaitLink store is finally performed. Add a write memory barrier to prevent that issue. Unfortunately the barrier support has been only added in 9.2. Given that the issue has not knowingly been observed in praxis it seems sufficient to prohibit compiler reordering using volatile for 9.0 and 9.1. Actual problems due to compiler reordering are more likely anyway. Discussion: 20140210134625.GA15246@awork2.anarazel.de
* Fix (re-)starting from a basebackup taken off a standby after a failure.Andres Freund2014-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | When starting up from a basebackup taken off a standby extra logic has to be applied to compute the point where the data directory is consistent. Normal base backups use a WAL record for that purpose, but that isn't possible on a standby. That logic had a error check ensuring that the cluster's control file indicates being in recovery. Unfortunately that check was too strict, disregarding the fact that the control file could also indicate that the cluster was shut down while in recovery. That's possible when the a cluster starting from a basebackup is shut down before the backup label has been removed. When everything goes well that's a short window, but when either restore_command or primary_conninfo isn't configured correctly the window can get much wider. That's because inbetween reading and unlinking the label we restore the last checkpoint from WAL which can need additional WAL. To fix simply also allow starting when the control file indicates "shutdown in recovery". There's nicer fixes imaginable, but they'd be more invasive. Backpatch to 9.2 where support for taking basebackups from standbys was added.
* Fix poorly worded error message.Tom Lane2014-12-17
| | | | Adam Brightwell, per report from Martín Marqués.
* Fix off-by-one loop count in MapArrayTypeName, and get rid of static array.Tom Lane2014-12-16
| | | | | | | | | | | | | | | | | | | | | MapArrayTypeName would copy up to NAMEDATALEN-1 bytes of the base type name, which of course is wrong: after prepending '_' there is only room for NAMEDATALEN-2 bytes. Aside from being the wrong result, this case would lead to overrunning the statically allocated work buffer. This would be a security bug if the function were ever used outside bootstrap mode, but it isn't, at least not in any currently supported branches. Aside from fixing the off-by-one loop logic, this patch gets rid of the static work buffer by having MapArrayTypeName pstrdup its result; the sole caller was already doing that, so this just requires moving the pstrdup call. This saves a few bytes but mainly it makes the API a lot cleaner. Back-patch on the off chance that there is some third-party code using MapArrayTypeName with less-secure input. Pushing pstrdup into the function should not cause any serious problems for such hypothetical code; at worst there might be a short term memory leak. Per Coverity scanning.
* Misc comment typo fixes.Heikki Linnakangas2014-12-16
| | | | | Backpatch the applicable parts, just to make backpatching future patches easier.
* Translation updatesAlvaro Herrera2014-12-15
|
* Translation updatesPeter Eisentraut2014-12-15
|
* Repair corner-case bug in array version of percentile_cont().Tom Lane2014-12-13
| | | | | | | | The code for advancing through the input rows overlooked the case that we might already be past the first row of the row pair now being considered, in case the previous percentile also fell between the same two input rows. Report and patch by Andrew Gierth; logic rewritten a bit for clarity by me.
* Fix planning of SELECT FOR UPDATE on child table with partial index.Tom Lane2014-12-11
| | | | | | | | | | | | | | | | | | | Ordinarily we can omit checking of a WHERE condition that matches a partial index's condition, when we are using an indexscan on that partial index. However, in SELECT FOR UPDATE we must include the "redundant" filter condition in the plan so that it gets checked properly in an EvalPlanQual recheck. The planner got this mostly right, but improperly omitted the filter condition if the index in question was on an inheritance child table. In READ COMMITTED mode, this could result in incorrectly returning just-updated rows that no longer satisfy the filter condition. The cause of the error is using get_parse_rowmark() when get_plan_rowmark() is what should be used during planning. In 9.3 and up, also fix the same mistake in contrib/postgres_fdw. It's currently harmless there (for lack of inheritance support) but wrong is wrong, and the incorrect code might get copied to someplace where it's more significant. Report and fix by Kyotaro Horiguchi. Back-patch to all supported branches.
* Fix corner case where SELECT FOR UPDATE could return a row twice.Tom Lane2014-12-11
| | | | | | | | | | | | | | | | In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo WHERE-clause checking on rows that have been updated since the SELECT's snapshot, it invokes EvalPlanQual processing to do that. If this first occurs within a non-first child table of an inheritance tree, the previous coding could accidentally re-return a matching row from an earlier, already-scanned child table. (And, to add insult to injury, I think this could make it miss returning a row that should have been returned, if the updated row that this happens on should still have passed the WHERE qual.) Per report from Kyotaro Horiguchi; the added isolation test is based on his test case. This has been broken for quite awhile, so back-patch to all supported branches.
* Fix assorted confusion between Oid and int32.Tom Lane2014-12-11
| | | | | | | | | | | In passing, also make some debugging elog's in pgstat.c a bit more consistently worded. Back-patch as far as applicable (9.3 or 9.4; none of these mistakes are really old). Mark Dilger identified and patched the type violations; the message rewordings are mine.
* Fix minor thinko in convertToJsonb().Tom Lane2014-12-10
| | | | | | | | | | The amount of space to reserve for the value's varlena header is VARHDRSZ, not sizeof(VARHDRSZ). The latter coding accidentally failed to fail because of the way the VARHDRSZ macro is currently defined; but if we ever change it to return size_t (as one might reasonably expect it to do), convertToJsonb() would have failed. Spotted by Mark Dilger.
* Print wal_log_hints in the rm_desc routing of a parameter-change record.Heikki Linnakangas2014-12-05
| | | | | | | | | It was an oversight in the original commit. Also note in the sample config file that changing wal_log_hints requires a restart. Michael Paquier. Backpatch to 9.4, where wal_log_hints was added.
* Fix typosAlvaro Herrera2014-12-03
|
* Improve error messages for malformed array input strings.Tom Lane2014-12-02
| | | | | | | | | | | | | | | Make the error messages issued by array_in() uniformly follow the style ERROR: malformed array literal: "actual input string" DETAIL: specific complaint here and rewrite many of the specific complaints to be clearer. The immediate motivation for doing this is a complaint from Josh Berkus that json_to_record() produced an unintelligible error message when dealing with an array item, because it tries to feed the JSON-format array value to array_in(). Really it ought to be smart enough to perform JSON-to-Postgres array conversion, but that's a future feature not a bug fix. In the meantime, this change is something we agreed we could back-patch into 9.4, and it should help de-confuse things a bit.
* Don't skip SQL backends in logical decoding for visibility computation.Andres Freund2014-12-02
| | | | | | | | | | | | | | | | | | The logical decoding patchset introduced PROC_IN_LOGICAL_DECODING flag PGXACT flag, that allows such backends to be skipped when computing the xmin horizon/snapshots. That's fine and sensible for walsenders streaming out logical changes, but not at all fine for SQL backends doing logical decoding. If the latter set that flag any change they have performed outside of logical decoding will not be regarded as visible - which e.g. can lead to that change being vacuumed away. Note that not setting the flag for SQL backends isn't particularly bothersome - the SQL backend doesn't do streaming, so it only runs for a limited amount of time. Per buildfarm member 'tick' and Alvaro. Backpatch to 9.4, where logical decoding was introduced.
* Fix JSON aggregates to work properly when final function is re-executed.Tom Lane2014-12-02
| | | | | | | | | | | | Davide S. reported that json_agg() sometimes produced multiple trailing right brackets. This turns out to be because json_agg_finalfn() attaches the final right bracket, and was doing so by modifying the aggregate state in-place. That's verboten, though unfortunately it seems there's no way for nodeAgg.c to check for such mistakes. Fix that back to 9.3 where the broken code was introduced. In 9.4 and HEAD, likewise fix json_object_agg(), which had copied the erroneous logic. Make some cosmetic cleanups as well.
* Guard against bad "dscale" values in numeric_recv().Tom Lane2014-12-01
| | | | | | | | | | | | | | | | | | | | | | | | We were not checking to see if the supplied dscale was valid for the given digit array when receiving binary-format numeric values. While dscale can validly be more than the number of nonzero fractional digits, it shouldn't be less; that case causes fractional digits to be hidden on display even though they're there and participate in arithmetic. Bug #12053 from Tommaso Sala indicates that there's at least one broken client library out there that sometimes supplies an incorrect dscale value, leading to strange behavior. This suggests that simply throwing an error might not be the best response; it would lead to failures in applications that might seem to be working fine today. What seems the least risky fix is to truncate away any digits that would be hidden by dscale. This preserves the existing behavior in terms of what will be printed for the transmitted value, while preventing subsequent arithmetic from producing results inconsistent with that. In passing, throw a specific error for the case of dscale being outside the range that will fit into a numeric's header. Before you got "value overflows numeric format", which is a bit misleading. Back-patch to all supported branches.
* Fix hstore_to_json_loose's detection of valid JSON number values.Andrew Dunstan2014-12-01
| | | | | | | | | | We expose a function IsValidJsonNumber that internally calls the lexer for json numbers. That allows us to use the same test everywhere, instead of inventing a broken test for hstore conversions. The new function is also used in datum_to_json, replacing the code that is now moved to the new function. Backpatch to 9.3 where hstore_to_json_loose was introduced.
* Update transaction README for persistent multixactsAlvaro Herrera2014-11-28
| | | | | Multixacts are now maintained during recovery, but the README didn't get the memo. Backpatch to 9.3, where the divergence was introduced.
* Fix mishandling of system columns in FDW queries.Tom Lane2014-11-22
| | | | | | | | | | | | | | | | | | | | postgres_fdw would send query conditions involving system columns to the remote server, even though it makes no effort to ensure that system columns other than CTID match what the remote side thinks. tableoid, in particular, probably won't match and might have some use in queries. Hence, prevent sending conditions that include non-CTID system columns. Also, create_foreignscan_plan neglected to check local restriction conditions while determining whether to set fsSystemCol for a foreign scan plan node. This again would bollix the results for queries that test a foreign table's tableoid. Back-patch the first fix to 9.3 where postgres_fdw was introduced. Back-patch the second to 9.2. The code is probably broken in 9.1 as well, but the patch doesn't apply cleanly there; given the weak state of support for FDWs in 9.1, it doesn't seem worth fixing. Etsuro Fujita, reviewed by Ashutosh Bapat, and somewhat modified by me
* Fix WAL-logging of B-tree "unlink halfdead page" operation.Heikki Linnakangas2014-11-17
| | | | | | | | | | | There was some confusion on how to record the case that the operation unlinks the last non-leaf page in the branch being deleted. _bt_unlink_halfdead_page set the "topdead" field in the WAL record to the leaf page, but the redo routine assumed that it would be an invalid block number in that case. This commit fixes _bt_unlink_halfdead_page to do what the redo routine expected. This code is new in 9.4, so backpatch there.
* Translation updatesPeter Eisentraut2014-11-16
|
* Sync unlogged relations to disk after they have been reset.Andres Freund2014-11-15
| | | | | | | | | | | | | | | | | | | Unlogged relations are only reset when performing a unclean restart. That means they have to be synced to disk during clean shutdowns. During normal processing that's achieved by registering a buffer's file to be fsynced at the next checkpoint when flushed. But ResetUnloggedRelations() doesn't go through the buffer manager, so nothing will force reset relations to disk before the next shutdown checkpoint. So just make ResetUnloggedRelations() fsync the newly created main forks to disk. Discussion: 20140912112246.GA4984@alap3.anarazel.de Backpatch to 9.1 where unlogged tables were introduced. Abhijit Menon-Sen and Andres Freund
* Ensure unlogged tables are reset even if crash recovery errors out.Andres Freund2014-11-15
| | | | | | | | | | | | | | | | | | | | | | | | Unlogged relations are reset at the end of crash recovery as they're only synced to disk during a proper shutdown. Unfortunately that and later steps can fail, e.g. due to running out of space. This reset was, up to now performed after marking the database as having finished crash recovery successfully. As out of space errors trigger a crash restart that could lead to the situation that not all unlogged relations are reset. Once that happend usage of unlogged relations could yield errors like "could not open file "...": No such file or directory". Luckily clusters that show the problem can be fixed by performing a immediate shutdown, and starting the database again. To fix, just call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier, before marking the database as having successfully recovered. Discussion: 20140912112246.GA4984@alap3.anarazel.de Backpatch to 9.1 where unlogged tables were introduced. Abhijit Menon-Sen and Andres Freund
* Allow interrupting GetMultiXactIdMembersAlvaro Herrera2014-11-14
| | | | | | | | | | | This function has a loop which can lead to uninterruptible process "stalls" (actually infinite loops) when some bugs are triggered. Avoid that unpleasant situation by adding a check for interrupts in a place that shouldn't degrade performance in the normal case. Backpatch to 9.3. Older branches have an identical loop here, but the aforementioned bugs are only a problem starting in 9.3 so there doesn't seem to be any point in backpatching any further.
* Improve logical decoding log messagesPeter Eisentraut2014-11-13
| | | | suggestions from Robert Haas
* Fix and improve cache invalidation logic for logical decoding.Andres Freund2014-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | There are basically three situations in which logical decoding needs to perform cache invalidation. During/After replaying a transaction with catalog changes, when skipping a uninteresting transaction that performed catalog changes and when erroring out while replaying a transaction. Unfortunately these three cases were all done slightly differently - partially because 8de3e410fa, which greatly simplifies matters, got committed in the midst of the development of logical decoding. The actually problematic case was when logical decoding skipped transaction commits (and thus processed invalidations). When used via the SQL interface cache invalidation could access the catalog - bad, because we didn't set up enough state to allow that correctly. It'd not be hard to setup sufficient state, but the simpler solution is to always perform cache invalidation outside a valid transaction. Also make the different cache invalidation cases look as similar as possible, to ease code review. This fixes the assertion failure reported by Antonin Houska in 53EE02D9.7040702@gmail.com. The presented testcase has been expanded into a regression test. Backpatch to 9.4, where logical decoding was introduced.
* Fix xmin/xmax horizon computation during logical decoding initialization.Andres Freund2014-11-13
| | | | | | | | | | | | | | | When building the initial historic catalog snapshot there were scenarios where snapbuild.c would use incorrect xmin/xmax values when starting from a xl_running_xacts record. The values used were always a bit suspect, but happened to be correct in the easy to test cases. Notably the values used when the the initial snapshot was computed while no other transactions were running were correct. This is likely to be the cause of the occasional buildfarm failures on animals markhor and tick; but it's quite possible to reproduce problems without CLOBBER_CACHE_ALWAYS. Backpatch to 9.4, where logical decoding was introduced.