path: root/src/backend/storage
* Fix race condition in predicate-lock init code in EXEC_BACKEND builds. (Tom Lane, 2017-07-24)

  Trading a little too heavily on letting the code path be the same whether we were creating shared data structures or only attaching to them, InitPredicateLocks() inserted the "scratch" PredicateLockTargetHash entry unconditionally. This is just wrong if we're in a postmaster child, which would only reach this code in EXEC_BACKEND builds. Most of the time, the hash_search(HASH_ENTER) call would simply report that the entry already existed, causing no visible effect since the code did not bother to check for that possibility. However, if this happened while some other backend had transiently removed the "scratch" entry, then that other backend's eventual RestoreScratchTarget would suffer an assert failure; this appears to be the explanation for a recent failure on buildfarm member culicidae. In non-assert builds, there would be no visible consequences there either. But nonetheless this is a pretty bad bug for EXEC_BACKEND builds, for two reasons:

  1. Each new backend would perform the hash_search(HASH_ENTER) call without holding any lock that would prevent concurrent access to the PredicateLockTargetHash hash table. This creates a low but certainly nonzero risk of corruption of that hash table.

  2. In the event that the race condition occurred, by reinserting the scratch entry too soon, we were defeating the entire purpose of the scratch entry, namely to guarantee that transaction commit could move hash table entries around with no risk of out-of-memory failure. The odds of an actual OOM failure are quite low, but not zero, and if it did happen it would again result in corruption of the hash table.

  The user-visible symptoms of such corruption are a little hard to predict, but would presumably amount to misbehavior of SERIALIZABLE transactions that'd require a crash or postmaster restart to fix.

  To fix, just skip the hash insertion if IsUnderPostmaster. I also inserted a bunch of assertions that the expected things happen depending on whether IsUnderPostmaster is true. That might be overkill, since most comparable code in other functions isn't quite that paranoid, but once burnt twice shy.

  In passing, also move a couple of lines to places where they seemed to make more sense.

  Diagnosis of problem by Thomas Munro, patch by me. Back-patch to all supported branches.

  Discussion: https://postgr.es/m/10593.1500670709@sss.pgh.pa.us
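  A minimal sketch of the shape of the fix described above (illustrative only; ScratchTargetTag, the surrounding declarations, and the locking context are assumed from the commit message rather than quoted from the actual patch):

    /* Only the process that creates the shared hash table inserts the
     * scratch entry; an EXEC_BACKEND child that merely attaches must not. */
    if (!IsUnderPostmaster)
    {
        bool        found;

        hash_search(PredicateLockTargetHash, &ScratchTargetTag,
                    HASH_ENTER, &found);
        Assert(!found);     /* we just created the table, so it cannot exist */
    }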
* Fix race between GetNewTransactionId and GetOldestActiveTransactionId. (Heikki Linnakangas, 2017-07-13)

  The race condition goes like this:

  1. GetNewTransactionId advances nextXid e.g. from 100 to 101
  2. GetOldestActiveTransactionId reads the new nextXid, 101
  3. GetOldestActiveTransactionId loops through the proc array. There are no active XIDs there, so it returns 101 as the oldest active XID.
  4. GetNewTransactionId stores XID 100 to MyPgXact->xid

  So, GetOldestActiveTransactionId returned XID 101, even though 100 only just started and is surely still running.

  This would be hard to hit in practice, and even harder to spot any ill effect if it happens. GetOldestActiveTransactionId is only used when creating a checkpoint in a master server, and the race condition can only happen on an online checkpoint, as there are no backends running during a shutdown checkpoint. The oldestActiveXid value of an online checkpoint is only used when starting up a hot standby server, to determine the starting point where pg_subtrans is initialized from. For the race condition to happen, there must be no other XIDs in the proc array that would hold back the oldest-active XID value, which means that the missed XID must be a top transaction's XID. However, pg_subtrans is not used for top XIDs, so I believe an off-by-one error is in fact inconsequential. Nevertheless, let's fix it, as it's clearly wrong and the fix is simple.

  This has been wrong ever since hot standby was introduced, so backport to all supported versions.

  Discussion: https://www.postgresql.org/message-id/e7258662-82b6-7a45-56d4-99b337a32bf7@iki.fi
* Fix variable and type name in comment. (Heikki Linnakangas, 2017-07-12)

  Kyotaro Horiguchi

  Discussion: https://www.postgresql.org/message-id/20170711.163441.241981736.horiguchi.kyotaro@lab.ntt.co.jp
* Prevent possibility of panics during shutdown checkpoint. (Andres Freund, 2017-06-05)

  When the checkpointer writes the shutdown checkpoint, it checks afterwards whether any WAL has been written since it started and throws a PANIC if so. At that point, only walsenders are still active, so one might think this could not happen, but walsenders can also generate WAL, for instance in BASE_BACKUP and logical decoding related commands (e.g. via hint bits). So they can trigger this panic if such a command is run while the shutdown checkpoint is being written.

  To fix this, divide the walsender shutdown into two phases. First, the checkpointer, itself triggered by the postmaster, sends a PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend is idle or runs an SQL query, this causes the backend to shut down; if logical replication is in progress, all existing WAL records are processed, followed by a shutdown. Otherwise this causes the walsender to switch to the "stopping" state. In this state, the walsender will reject any further replication commands. The checkpointer begins the shutdown checkpoint once all walsenders are confirmed as stopping. When the shutdown checkpoint finishes, the postmaster sends us SIGUSR2. This instructs the walsender to send any outstanding WAL, including the shutdown checkpoint record, wait for it to be replicated to the standby, and then exit.

  Author: Andres Freund, based on an earlier patch by Michael Paquier
  Reported-By: Fujii Masao, Andres Freund
  Reviewed-By: Michael Paquier
  Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
  Backpatch: 9.4, where logical decoding was introduced
* Assorted translatable string fixes (Alvaro Herrera, 2017-06-04)

  Mark our rusage reportage string translatable; remove quotes from type names; unify formatting of very similar messages.
* Fix new warnings from GCC 7 (Peter Eisentraut, 2017-05-16)

  This addresses the new warning types -Wformat-truncation and -Wformat-overflow that are part of -Wall, via -Wformat, in GCC 7.
* Cope with glibc too old to have epoll_create1(). (Tom Lane, 2017-04-27)

  Commit fa31b6f4e supposed that we didn't have to worry about that anymore, but it seems that RHEL5 is like that, and that's still a supported platform. Put back the prior coding under an #ifdef, adding an explicit fcntl() to retain the desired CLOEXEC property.

  Discussion: https://postgr.es/m/12307.1493325329@sss.pgh.pa.us
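  As a rough sketch, the fallback described above amounts to something like the following (a hypothetical shape, not the committed hunk; set->epoll_fd is assumed to be the descriptor field in the wait-event set):

    #ifdef EPOLL_CLOEXEC
        set->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
        if (set->epoll_fd < 0)
            elog(ERROR, "epoll_create1 failed: %m");
    #else
        /* cope with ancient glibc lacking epoll_create1() */
        set->epoll_fd = epoll_create(nevents);
        if (set->epoll_fd < 0)
            elog(ERROR, "epoll_create failed: %m");
        if (fcntl(set->epoll_fd, F_SETFD, FD_CLOEXEC) == -1)
            elog(ERROR, "fcntl(F_SETFD) failed on epoll descriptor: %m");
    #endif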
* Preserve required !catalog tuples while computing initial decoding snapshot. (Andres Freund, 2017-04-27)

  The logical decoding machinery already preserved all the required catalog tuples, which is sufficient in the course of normal logical decoding, but did not guarantee that non-catalog tuples were preserved during computation of the initial snapshot when creating a slot over the replication protocol.

  This could cause a corrupted initial snapshot being exported. The time window for issues is usually not terribly large, but on a busy server it's perfectly possible to hit it. Ongoing decoding is not affected by this bug.

  To avoid increased overhead for the SQL API, only retain additional tuples when a logical slot is being created over the replication protocol. To do so this commit changes the signature of CreateInitDecodingContext(), but it seems unlikely that it's being used in an extension, so that's probably ok.

  In a drive-by fix, fix handling of ReplicationSlotsComputeRequiredXmin's already_locked argument, which should only apply to ProcArrayLock, not ReplicationSlotControlLock.

  Reported-By: Erik Rijkers
  Analyzed-By: Petr Jelinek
  Author: Petr Jelinek, heavily editorialized by Andres Freund
  Reviewed-By: Andres Freund
  Discussion: https://postgr.es/m/9a897b86-46e1-9915-ee4c-da02e4ff6a95@2ndquadrant.com
  Backport: 9.4, where logical decoding was introduced.
* Make latch.c more paranoid about child-process cases. (Tom Lane, 2017-04-27)

  Although the postmaster doesn't currently create a self-pipe or any latches, there's discussion of it doing so in future. It's also conceivable that a shared_preload_libraries extension would try to create such a thing in the postmaster process today. In that case the self-pipe FDs would be inherited by forked child processes. latch.c was entirely unprepared for such a case and could suffer an assertion failure, or worse try to use the inherited pipe if somebody called WaitLatch without having called InitializeLatchSupport in that process. Make it keep track of whether InitializeLatchSupport has been called in the *current* process, and do the right thing if state has been inherited from a parent.

  Apply FD_CLOEXEC to file descriptors created in latch.c (the self-pipe, as well as epoll event sets). This ensures that child processes spawned in backends, the archiver, etc cannot accidentally or intentionally mess with these FDs. It also ensures that we end up with the right state for the self-pipe in EXEC_BACKEND processes, which otherwise wouldn't know to close the postmaster's self-pipe FDs.

  Back-patch to 9.6, mainly to keep latch.c looking similar in all branches it exists in.

  Discussion: https://postgr.es/m/8322.1493240739@sss.pgh.pa.us
* Avoid depending on non-POSIX behavior of fcntl(2). (Tom Lane, 2017-04-21)

  The POSIX standard does not say that the success return value for fcntl(F_SETFD) and fcntl(F_SETFL) is zero; it says only that it's not -1. We had several calls that were making the stronger assumption. Adjust them to test specifically for -1 for strict spec compliance.

  The standard further leaves open the possibility that the O_NONBLOCK flag bit is not the only active one in F_SETFL's argument. Formally, therefore, one ought to get the current flags with F_GETFL and store them back with only the O_NONBLOCK bit changed when trying to change the nonblock state. In port/noblock.c, we were doing the full pushup in pg_set_block but not in pg_set_noblock, which is just weird. Make both of them do it properly, since they have little business making any assumptions about the socket they're handed. The other places where we're issuing F_SETFL are working with FDs we just got from pipe(2), so it's reasonable to assume the FDs' properties are all default, so I didn't bother adding F_GETFL steps there.

  Also, while pg_set_block deserves some points for trying to do things right, somebody had decided that it'd be even better to cast fcntl's third argument to "long". Which is completely loony, because POSIX clearly says the third argument for an F_SETFL call is "int".

  Given the lack of field complaints, these missteps apparently are not of significance on any common platforms. But they're still wrong, so back-patch to all supported branches.

  Discussion: https://postgr.es/m/30882.1492800880@sss.pgh.pa.us
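  The POSIX-clean pattern the commit describes, sketched for a pg_set_noblock-style helper (an assumed shape for illustration, not the exact committed code; requires <fcntl.h>):

    static bool
    set_noblock(int sock)
    {
        int         flags;

        flags = fcntl(sock, F_GETFL);
        if (flags < 0)
            return false;
        /* change only the O_NONBLOCK bit, leave any other flags alone */
        if (fcntl(sock, F_SETFL, flags | O_NONBLOCK) == -1)
            return false;       /* POSIX promises only "not -1" on success */
        return true;
    }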
* Revert unintentional change in increasing usage count during pin of buffers, which made buffer access strategy have no effect. (Teodor Sigaev, 2017-03-20)

  Change was a part of commit 48354581a49c30f5757c203415aa8412d85b0f70 during the 9.6 release cycle, so backpatch to 9.6.

  Reported-by: Jim Nasby
  Author: Alexander Korotkov
  Reviewed-by: Jim Nasby, Andres Freund

  https://commitfest.postgresql.org/13/1029/
* Fix WaitEventSetWait() to handle write-ready waits properly on Windows. (Tom Lane, 2017-03-17)

  Windows apparently will not detect socket write-ready events unless a preceding send attempt returned WSAEWOULDBLOCK. In many usage patterns that's satisfied by the caller of WaitEventSetWait(), but not always. Apply the same solution that we already had in pgwin32_select(), namely to perform a dummy WSASend() call with len=0. This will return WSAEWOULDBLOCK if there's no buffer space (even though it could legitimately do nothing and report success, which makes me a bit nervous about this solution; but since it's been working fine in libpq, let's roll with it).

  In passing, improve the comments about this in pgwin32_select(), and remove duplicated code there.

  Back-patch to 9.6 where WaitEventSetWait() was introduced. We might need to back-patch something similar into predecessor code. But given the lack of complaints so far, it's not clear that the case ever gets exercised in the back branches, so I'm not going to expend effort on it right now.

  This should resolve recurring failures on buildfarm member bowerbird, which has been failing since 1e8a85009 went in.

  Diagnosis and patch by Petr Jelinek, cosmetic adjustments by me.

  Discussion: https://postgr.es/m/5b6a6d6d-fb45-0afb-2e95-5600063c3dbd@2ndquadrant.com
* Fix failure to mark init buffers as BM_PERMANENT. (Robert Haas, 2017-03-14)

  This could result in corruption of the init fork of an unlogged index if the ambuildempty routine for that index used shared buffers to create the init fork, which was true for brin, gin, gist, and hash indexes.

  Patch by me, based on an earlier patch by Michael Paquier, who also reviewed this one. This also incorporates an idea from Artur Zakirov.

  Discussion: http://postgr.es/m/CACYUyc8yccE4xfxhqxfh_Mh38j7dRFuxfaK1p6dSNAEUakxUyQ@mail.gmail.com
* Fix sloppy handling of corner-case errors in fd.c. (Tom Lane, 2017-02-21)

  Several places in fd.c had badly-thought-through handling of error returns from lseek() and close(). The fact that those would seldom fail on valid FDs is probably the reason we've not noticed this up to now; but if they did fail, we'd get quite confused.

  LruDelete and LruInsert actually just Assert'd that lseek never fails, which is pretty awful on its face.

  In LruDelete, we indeed can't throw an error, because that's likely to get called during error abort and so throwing an error would probably just lead to an infinite loop. But by the same token, throwing an error from the close() right after that was ill-advised, not to mention that it would've left the LRU state corrupted since we'd already unlinked the VFD from the list. I also noticed that really, most of the time, we should know the current seek position and it shouldn't be necessary to do an lseek here at all. As patched, if we don't have a seek position and an lseek attempt doesn't give us one, we'll close the file but then subsequent re-open attempts will fail (except in the somewhat-unlikely case that a FileSeek(SEEK_SET) call comes between and allows us to re-establish a known target seek position). This isn't great but it won't result in any state corruption.

  Meanwhile, having an Assert instead of an honest test in LruInsert is really dangerous: if that lseek failed, a subsequent read or write would read or write from the start of the file, not where the caller expected, leading to data corruption.

  In both LruDelete and FileClose, if close() fails, just LOG that and mark the VFD closed anyway. Possibly leaking an FD is preferable to getting into an infinite loop or corrupting the VFD list. Besides, as far as I can tell from the POSIX spec, it's unspecified whether or not the file has been closed, so treating it as still open could be the wrong thing anyhow.

  I also fixed a number of other places that were being sloppy about behaving correctly when the seekPos is unknown. Also, I changed FileSeek to return -1 with EINVAL for the cases where it detects a bad offset, rather than throwing a hard elog(ERROR). It seemed pretty inconsistent that some bad-offset cases would get a failure return while others got elog(ERROR). It was missing an offset validity check for the SEEK_CUR case on a closed file, too.

  Back-patch to all supported branches, since all this code is fundamentally identical in all of them.

  Discussion: https://postgr.es/m/2982.1487617365@sss.pgh.pa.us
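  The "LOG and mark closed anyway" behavior described above boils down to a pattern along these lines (a simplified sketch; the VFD bookkeeping fields and VFD_CLOSED marker are assumed from fd.c):

    /* close the file, but never let a close() failure corrupt our state */
    if (close(vfdP->fd) != 0)
        elog(LOG, "could not close file \"%s\": %m", vfdP->fileName);
    vfdP->fd = VFD_CLOSED;      /* treat the FD as gone regardless */
    --nfile;                    /* one fewer kernel FD in use by fd.c */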
* Fix typos in comments. (Heikki Linnakangas, 2017-02-06)

  Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly.

  Josh Soref

  Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
* Don't count background workers against a user's connection limit. (Andrew Dunstan, 2017-02-01)

  Doing so doesn't seem to be within the purpose of the per user connection limits, and has particularly unfortunate effects in conjunction with parallel queries.

  Backpatch to 9.6 where parallel queries were introduced.

  David Rowley, reviewed by Robert Haas and Albe Laurenz.
* Check interrupts during hot standby waits (Simon Riggs, 2017-01-27)
* Fix comments in StrategyNotifyBgWriter(). (Tatsuo Ishii, 2017-01-24)

  The interface for the function was changed in d72731a70450b5e7084991b9caa15cb58a2820df but the comments of the function were not updated.

  Patch by Yugo Nagata.
* Fix possible leak of semaphore count. (Robert Haas, 2017-01-05)

  Commit 4aec49899e5782247e134f94ce1c6ee926f88e1c reorganized the order of operations here so that we no longer increment the number of "extra waits" before locking the semaphore, but it did not change the starting value of extraWaits from 0 to -1 to compensate. In the worst case, this could leak a semaphore count, but that seems to be unlikely in practice.

  Discussion: http://postgr.es/m/CAA4eK1JyVqXiMba+-a589Rk0pyHsyKkGxeumVKjU6Y74hdrVLQ@mail.gmail.com

  Amit Kapila, per an off-list report by Dilip Kumar. Reviewed by me.
* Silence compiler warnings (Joe Conway, 2017-01-02)

  Rearrange a bit of code to ensure that 'mode' in LWLockRelease is obviously always set, which seems a bit cleaner and avoids a compiler warning (thanks to Robert for the suggestion!). Back-patch to 9.5 where the warning is first seen.

  Author: Stephen Frost
  Discussion: https://postgr.es/m/20161129152102.GR13284%40tamriel.snowman.net
* Fix corner-case bug in WaitEventSetWaitBlock on Windows. (Robert Haas, 2016-12-21)

  If we do not reset the FD_READ event, WaitForMultipleObjects won't return it again unless we've meanwhile read from the socket, which is generally true but not guaranteed. WaitEventSetWaitBlock itself may fail to return the event to the caller if the latch is also set, and even if we changed that, the caller isn't obliged to handle all returned events at once. On non-Windows systems, the socket-read event is purely level-triggered, so this issue does not exist. To fix, make Windows reset the event when needed.

  This bug was introduced by 98a64d0bd713cb89e61bef6432befc4b7b5da59e, and causes hangs when trying to use the pldebugger extension.

  Patch by Amit Kapila. Reported and tested by Ashutosh Sharma, who also provided some analysis. Further analysis by Michael Paquier.
* Fix WAL-logging of FSM and VM truncation. (Heikki Linnakangas, 2016-10-19)

  When a relation is truncated, it is important that the FSM is truncated as well. Otherwise, after recovery, the FSM can return a page that has been truncated away, leading to errors like:

  ERROR: could not read block 28991 in file "base/16390/572026": read only 0 of 8192 bytes

  We were using MarkBufferDirtyHint() to dirty the buffer holding the last remaining page of the FSM, but during recovery, that might in fact not dirty the page, and the FSM update might be lost.

  To fix, use the stronger MarkBufferDirty() function. MarkBufferDirty() requires us to do WAL-logging ourselves, to protect from a torn page, if checksumming is enabled.

  Also fix an oversight in visibilitymap_truncate: it also needs to WAL-log when checksumming is enabled.

  Analysis by Pavan Deolasee.

  Discussion: <CABOikdNr5vKucqyZH9s1Mh0XebLs_jRhKv6eJfNnD2wxTn=_9A@mail.gmail.com>

  Backpatch to 9.3, where we got data checksums.
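  In outline, the fix replaces the hint-style dirtying with a real dirty-plus-WAL-log sequence along these lines (a sketch under the assumption that XLogHintBitIsNeeded() and log_newpage_buffer() are the mechanisms involved; the exact conditions in the committed code may differ):

    /* zero the tail of the last remaining FSM page, then ... */
    MarkBufferDirty(buf);       /* not MarkBufferDirtyHint: must also dirty in recovery */

    /*
     * With checksums (or wal_log_hints) enabled, WAL-log the page ourselves
     * to protect against a torn page if it is flushed before the truncation
     * record is replayed.
     */
    if (!InRecovery && XLogHintBitIsNeeded())
        log_newpage_buffer(buf, false);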
* Fix fallback implementation of pg_atomic_write_u32(). (Andres Freund, 2016-10-07)

  I somehow had assumed that in the spinlock (in turn possibly using semaphores) based fallback atomics implementation 32 bit writes could be done without a lock. As far as the write goes that's correct, since postgres supports only platforms with single-copy atomicity for aligned 32bit writes. But writing without holding the spinlock breaks read-modify-write operations like pg_atomic_compare_exchange_u32(), since they'll potentially "miss" a concurrent write, which can't happen in actual hardware implementations.

  In 9.6+ when using the fallback atomics implementation this could lead to buffer header locks not being properly marked as released, and potentially some related state corruption. I don't see a related danger in 9.5 (earliest release with the API), because pg_atomic_write_u32() wasn't used in a concurrent manner there.

  The state variable of local buffers was, before this change, manipulated using pg_atomic_write_u32() to avoid unnecessary synchronization overhead. As that'd no longer be the case, introduce and use pg_atomic_unlocked_write_u32(), which does not support concurrent RMW operations.

  This bug only caused issues when postgres is compiled on platforms without atomics support (i.e. no common new platform), or when compiled with --disable-atomics, which explains why this wasn't noticed in testing.

  Reported-By: Tom Lane
  Discussion: <14947.1475690465@sss.pgh.pa.us>
  Backpatch: 9.5-, where the atomic operations API was introduced.
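  To see why an unlocked store breaks the fallback, consider a simplified spinlock-based emulation (illustrative types and names only; the real implementation lives in src/backend/port/atomics.c and differs in detail):

    typedef struct
    {
        slock_t         lock;       /* protects every read-modify-write op */
        volatile uint32 value;
    } emulated_atomic_u32;

    static bool
    emulated_compare_exchange(emulated_atomic_u32 *ptr,
                              uint32 *expected, uint32 newval)
    {
        bool        ret;

        SpinLockAcquire(&ptr->lock);
        if (ptr->value == *expected)
        {
            ptr->value = newval;
            ret = true;
        }
        else
        {
            *expected = ptr->value;
            ret = false;
        }
        SpinLockRelease(&ptr->lock);
        return ret;
    }

    /*
     * A plain "ptr->value = val" done elsewhere without taking the same lock
     * can land between the read and the write above and be silently
     * overwritten: exactly the "missed" concurrent write the commit message
     * describes.  Hence the fallback pg_atomic_write_u32() must also take the
     * lock, and the lock-free store survives only as
     * pg_atomic_unlocked_write_u32() for places with no concurrent RMW.
     */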
* Don't trust CreateFileMapping() to clear the error code on success. (Tom Lane, 2016-09-23)

  We must test GetLastError() even when CreateFileMapping() returns a non-null handle. If that value were left over from some previous system call, we might be fooled into thinking the segment already existed. Experimentation on Windows 7 suggests that CreateFileMapping() clears the error code on success, but it is not documented to do so, so let's not rely on that happening in all Windows releases.

  Amit Kapila

  Discussion: <20811.1474390987@sss.pgh.pa.us>
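  The defensive pattern the commit describes looks roughly like this (a sketch; request_size, segment_name, and the report_error()/retry_with_new_name() helpers are hypothetical placeholders, and the real shared-memory code has considerably more error handling):

    HANDLE      hmap;

    /* Clear any stale error code first, since a non-NULL return value does
     * not guarantee that CreateFileMapping() reset it. */
    SetLastError(0);
    hmap = CreateFileMapping(INVALID_HANDLE_VALUE,  /* back with the pagefile */
                             NULL,                  /* default security */
                             PAGE_READWRITE,
                             0, (DWORD) request_size,
                             segment_name);
    if (hmap == NULL)
        report_error();                 /* creation failed outright */
    else if (GetLastError() == ERROR_ALREADY_EXISTS)
    {
        CloseHandle(hmap);              /* someone else got there first */
        retry_with_new_name();
    }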
* Avoid using PostmasterRandom() for DSM control segment ID. (Tom Lane, 2016-09-23)

  Commits 470d886c3 et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom. Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom.

  Discussion: <20789.1474390434@sss.pgh.pa.us>
* Use PostmasterRandom(), not random(), for DSM control segment ID. (Robert Haas, 2016-09-20)

  Otherwise, every startup gets the same "random" value, which is definitely not what was intended.
* Retry DSM control segment creation if Windows indicates access denied. (Robert Haas, 2016-09-20)

  Otherwise, attempts to run multiple postmasters on the same machine may fail, because Windows sometimes returns ERROR_ACCESS_DENIED rather than ERROR_ALREADY_EXISTS when there is an existing segment.

  Hitting this bug is much more likely because of another defect not fixed by this patch, namely that dsm_postmaster_startup() uses random() which returns the same value every time. But that's not a reason not to fix this.

  Kyotaro Horiguchi and Amit Kapila, reviewed by Michael Paquier

  Discussion: <CAA4eK1JyNdMeF-dgrpHozDecpDfsRZUtpCi+1AbtuEkfG3YooQ@mail.gmail.com>
* Fix copy/pasto in file identification (Simon Riggs, 2016-09-12)

  Daniel Gustafsson
* Fix mdtruncate() to close fd.c handle of deleted segments. (Andres Freund, 2016-09-08)

  mdtruncate() forgot to FileClose() a segment's mdfd_vfd, when deleting it. That led to a fd.c handle to a truncated file being kept open until backend exit.

  The issue appears to have been introduced way back in 1a5c450f3024ac5, before that the handle was closed inside FileUnlink().

  The impact of this bug is limited: only VACUUM and ON COMMIT TRUNCATE for temporary tables truncate files in place (i.e. TRUNCATE itself is not affected), and the relation has to be bigger than 1GB. The consequences of a leaked fd.c handle aren't severe either.

  Discussion: <20160908220748.oqh37ukwqqncbl3n@alap3.anarazel.de>
  Backpatch: all supported releases
* Add macros to make AllocSetContextCreate() calls simpler and safer. (Tom Lane, 2016-08-27)

  I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think.

  While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time.

  In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts.

  Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back.

  Discussion: <21072.1472321324@sss.pgh.pa.us>
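  In practice this means call sites shrink from spelling out three size parameters to using a single macro; the shape is roughly as follows (macro names assumed to be the ALLOCSET_*_SIZES family; check memutils.h for the authoritative list):

    /* before: easy to fat-finger one of the three numbers */
    cxt = AllocSetContextCreate(CurrentMemoryContext,
                                "my working data",
                                ALLOCSET_DEFAULT_MINSIZE,
                                ALLOCSET_DEFAULT_INITSIZE,
                                ALLOCSET_DEFAULT_MAXSIZE);

    /* after: one macro expands to the same three parameters */
    cxt = AllocSetContextCreate(CurrentMemoryContext,
                                "my working data",
                                ALLOCSET_DEFAULT_SIZES);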
* Don't CHECK_FOR_INTERRUPTS between WaitLatch and ResetLatch. (Tom Lane, 2016-08-01)

  This coding pattern creates a race condition, because if an interesting interrupt happens after we've checked InterruptPending but before we reset our latch, the latch-setting done by the signal handler would get lost, and then we might block at WaitLatch in the next iteration without ever noticing the interrupt condition. You can put the CHECK_FOR_INTERRUPTS before WaitLatch or after ResetLatch, but not between them.

  Aside from fixing the bugs, add some explanatory comments to latch.h to perhaps forestall the next person from making the same mistake.

  In HEAD, also replace gather_readnext's direct call of HandleParallelMessages with CHECK_FOR_INTERRUPTS. It does not seem clean or useful for this one caller to bypass ProcessInterrupts and go straight to HandleParallelMessages; not least because that fails to consider the InterruptPending flag, resulting in useless work both here (if InterruptPending isn't set) and in the next CHECK_FOR_INTERRUPTS call (if it is).

  This thinko seems to have been introduced in the initial coding of storage/ipc/shm_mq.c (commit ec9037df2), and then blindly copied into all the subsequent parallel-query support logic. Back-patch relevant hunks to 9.4 to extirpate the error everywhere.

  Discussion: <1661.1469996911@sss.pgh.pa.us>
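  The safe ordering described above, as a schematic wait loop (a sketch; the WaitLatch() flags and signature follow the 9.6-era API, and work_is_available()/do_some_work() are hypothetical):

    for (;;)
    {
        /* CHECK_FOR_INTERRUPTS() may go here ... */
        if (work_is_available())
        {
            do_some_work();
            continue;
        }

        WaitLatch(MyLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
        ResetLatch(MyLatch);

        /* ... or here, but never between WaitLatch() and ResetLatch() */
        CHECK_FOR_INTERRUPTS();
    }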
* Adjust spellings of forms of "cancel" (Peter Eisentraut, 2016-07-14)
* Revert "Add some temporary code to record stack usage at server process exit."Tom Lane2016-07-10
| | | | | | | This reverts commit 88cf37d2a86d5b66380003d7c3384530e3f91e40 as well as follow-on commits ea9c4a16d5ad88a1d28d43ef458e3209b53eb106 and c57562725d219c4249b82f4a4fb5aaeee3ae0d53. We've learned about as much as we can from the buildfarm.
* Improve recording of IA64 stack data. (Tom Lane, 2016-07-09)

  Examination of the results from anole and gharial suggests that we're only managing to track the size of one of the two stacks of IA64 machines. Some googling gave the answer: on HPUX11, the register stack is reported as a page type I don't see in pstat.h on my HPUX10 box. Let's try testing for that.
* Add more temporary code to record stack usage at server process exit. (Tom Lane, 2016-07-08)

  After a look at preliminary results from commit 88cf37d2a86d5b66, I realized it'd be a good idea to spew out the maximum depth measurement seen by check_stack_depth. So add some quick-n-dirty code to do that. Like the previous commit, this will be reverted once we've gathered a set of buildfarm runs with it.
* Add some temporary code to record stack usage at server process exit. (Tom Lane, 2016-07-08)

  This patch is meant to gather information from the buildfarm members, and will be reverted in a day or so.

  The idea is to try to find out the high-water stack consumption while running the regression tests, particularly on IA64 which is suspected to use much more stack than other architectures.

  On machines with pmap, we can use that; but the IA64 farm members are running HPUX, so also include some bespoke code for HPUX. (I've tested the latter on HPUX 10/HPPA; not entirely sure it will work on HPUX 11/IA64, but we'll soon find out.)

  Discussion: <CAM-w4HMwwcwaVvYcAH0_FGtG5GeXdYVRfvG81pXnSJWHnCfosQ@mail.gmail.com>
* Improve WritebackContextInit() comment and prototype argument names. (Andres Freund, 2016-07-01)

  Author: Masahiko Sawada
  Discussion: CAD21AoBD=Of1OzL90Xx4Q-3j=-2q7=S87cs75HfutE=eCday2w@mail.gmail.com
* Fix obsolete comment. (Robert Haas, 2016-06-29)

  Commit 3bd261ca18c67eafe18088e58fab511e3b965418 should have updated this, but didn't.

  Extracted from a larger patch by Piotr Stefaniak.
* Finish up XLOG_HINT renaming (Alvaro Herrera, 2016-06-17)

  Commit b8fd1a09f3 renamed XLOG_HINT to XLOG_FPI, but neglected two places.

  Backpatch to 9.3, like that commit.
* Finish pgindent run for 9.6: Perl files. (Noah Misch, 2016-06-12)
* Fix interaction between CREATE INDEX and "snapshot too old". (Kevin Grittner, 2016-06-10)

  Since indexes are created without valid LSNs, an index created while a snapshot older than old_snapshot_threshold existed could cause queries to return incorrect results when those old snapshots were used, if any relevant rows had been subject to early pruning before the index was built. Prevent usage of a newly created index until all such snapshots are released, for relations where this can happen.

  Questions about the interaction of "snapshot too old" with index creation were initially raised by Andres Freund.

  Reviewed by Robert Haas.
* Improve the situation for parallel query versus temp relations. (Tom Lane, 2016-06-09)

  Transmit the leader's temp-namespace state to workers. This is important because without it, the workers do not really have the same search path as the leader. For example, there is no good reason (and no extant code either) to prevent a worker from executing a temp function that the leader created previously; but as things stood it would fail to find the temp function, and then either fail or execute the wrong function entirely. We still prohibit a worker from creating a temp namespace on its own. In effect, a worker can only see the session's temp namespace if the leader had created it before starting the worker, which seems like the right semantics.

  Also, transmit the leader's BackendId to workers, and arrange for workers to use that when determining the physical file path of a temp relation belonging to their session. While the original intent was to prevent such accesses entirely, there were a number of holes in that, notably in places like dbsize.c which assume they can safely access temp rels of other sessions anyway. We might as well get this right, as a small down payment on someday allowing workers to access the leader's temp tables. (With this change, directly using "MyBackendId" as a relation or buffer backend ID is deprecated; you should use BackendIdForTempRelations() instead. I left a couple of such uses alone though, as they're not going to be reachable in parallel workers until we do something about localbuf.c.)

  Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down into localbuf.c, which is where it actually matters, instead of having it in relation_open(). This amounts to recognizing that access to temp tables' catalog entries is perfectly safe in a worker, it's only the data in local buffers that is problematic.

  Having done all that, we can get rid of the test in has_parallel_hazard() that says that use of a temp table's rowtype is unsafe in parallel workers. That test was unduly expensive, and if we really did need such a prohibition, that was not even close to being a bulletproof guard for it. (For example, any user-defined function executed in a parallel worker might have attempted such access.)
* pgindent run for 9.6 (Robert Haas, 2016-06-09)
* Correct phrasing in dsm.c comments (Simon Riggs, 2016-06-07)
* shm_mq: After a send fails with SHM_MQ_DETACHED, later ones should too. (Robert Haas, 2016-06-06)

  Prior to this patch, it was occasionally possible, after shm_mq_sendv had previously returned SHM_MQ_DETACHED, for a later shm_mq_sendv operation to fail an assertion instead of just again returning SHM_MQ_DETACHED. From the shm_mq code's point of view, it was expecting to be called again with the same arguments, since the previous operation had only partially completed. However, a caller who isn't using non-blocking mode won't be prepared to repeat the call with the same arguments, and this code shouldn't expect that they will. Repair in such a way that we'll be OK whether the next call uses the same arguments or not.

  Found by Andreas Seltenreich. Analysis and sketch of fix by Amit Kapila. Patch by me, reviewed by Amit Kapila.
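  From a caller's perspective the guarantee is simply that SHM_MQ_DETACHED is sticky; a non-blocking sender can be written along these lines (a sketch; payload, payload_len, and mqh are hypothetical, and only the result handling matters here):

    shm_mq_iovec iov;
    shm_mq_result res;

    iov.data = payload;
    iov.len = payload_len;

    res = shm_mq_sendv(mqh, &iov, 1, true);     /* nowait = true */
    switch (res)
    {
        case SHM_MQ_SUCCESS:
            break;              /* message fully queued */
        case SHM_MQ_WOULD_BLOCK:
            break;              /* try again later; repeat with the same message */
        case SHM_MQ_DETACHED:
            /* receiver is gone; any later send, with the same arguments or
             * not, will also return SHM_MQ_DETACHED rather than Assert-ing */
            break;
    }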
* Fix whitespace (Peter Eisentraut, 2016-06-05)
* Fix various common mispellings. (Greg Stark, 2016-06-03)

  Mostly these are just comments but there are a few in documentation and a handful in code and tests.

  Hopefully this doesn't cause too much unnecessary pain for backpatching. I relented from some of the most common like "thru" for that reason. The rest don't seem numerous enough to cause problems.

  Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
* Be conservative about alignment requirements of struct epoll_event. (Greg Stark, 2016-06-02)

  Use MAXALIGN size/alignment to guarantee that later uses of memory are aligned correctly. E.g. epoll_event might need 8 byte alignment on some platforms, but earlier allocations like WaitEventSet and WaitEvent might not be sized to guarantee that when purely using sizeof().

  Found by myself while testing on a Sun Ultra 5 (Sparc IIi) with some editorializing by Andres Freund.

  In passing fix a couple of typos in the area.
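  The sizing idea is simply to round every sub-allocation up to MAXALIGN so that whatever follows it is aligned for any type; schematically (a sketch of the approach, not the committed hunk):

    char       *data;
    Size        sz = 0;

    sz += MAXALIGN(sizeof(WaitEventSet));
    sz += MAXALIGN(sizeof(WaitEvent) * nevents);
    #ifdef WAIT_USE_EPOLL
    sz += MAXALIGN(sizeof(struct epoll_event) * nevents);
    #endif

    data = (char *) MemoryContextAllocZero(context, sz);
    /* each section now starts on a MAXALIGN boundary, so epoll_event's
     * 8-byte alignment requirement is satisfied on every platform */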
* Fix PageAddItem BRIN bug (Alvaro Herrera, 2016-05-30)

  BRIN was relying on the ability to remove a tuple from an index page, then putting another tuple in the same line pointer. But PageAddItem refuses to add a tuple beyond the first free item past the last used item, and in particular, it rejects an attempt to add an item to an empty page anywhere other than the first line pointer. PageAddItem issues a WARNING and indicates to the caller that it failed, which in turn causes the BRIN calling code to issue a PANIC, so the whole sequence looks like this:

  WARNING: specified item offset is too large
  PANIC: failed to add BRIN tuple

  To fix, create a new function PageAddItemExtended which is like PageAddItem except that the two boolean arguments become a flags bitmap; the "overwrite" and "is_heap" boolean flags in PageAddItem become PAI_OVERWRITE and PAI_IS_HEAP flags in the new function, and a new flag PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN. PageAddItem() retains its original signature, for compatibility with third-party modules (other callers in core code are not modified, either).

  Also, in the belt-and-suspenders spirit, I added a new sanity check in brinGetTupleForHeapBlock to raise an error if a TID found in the revmap is not marked as live by the page header. This causes it to react with "ERROR: corrupted BRIN index" to the bug at hand, rather than a hard crash.

  Backpatch to 9.5.

  Bug reported by Andreas Seltenreich as detected by his handy sqlsmith fuzzer.

  Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu
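  The resulting interface can be pictured like this (a sketch based on the flag names given above; the bit values and exact declarations are assumptions, so see bufpage.h for the real definitions):

    /* flag bits replacing PageAddItem's two boolean arguments */
    #define PAI_OVERWRITE           (1 << 0)    /* reuse an unused line pointer */
    #define PAI_IS_HEAP             (1 << 1)    /* enforce heap tuple-count limit */
    #define PAI_ALLOW_FAR_OFFSET    (1 << 2)    /* allow offset past last used item */

    extern OffsetNumber PageAddItemExtended(Page page, Item item, Size size,
                                            OffsetNumber offsetNumber, int flags);

    /* e.g. BRIN can now re-use a specific line pointer on a page: */
    off = PageAddItemExtended(page, (Item) tup, itemsz, targetoff,
                              PAI_OVERWRITE | PAI_ALLOW_FAR_OFFSET);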
* Fix range check for effective_io_concurrency (Alvaro Herrera, 2016-05-24)

  Commit 1aba62ec moved the range check of that option from guc.c into bufmgr.c, but introduced a bug by changing a >= 0.0 to > 0.0, which made the value 0 no longer accepted. Put it back.

  Reported by Jeff Janes, diagnosed by Tom Lane