aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Attempt to fix some issues in our Windows socket code.Tom Lane2012-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure WaitLatchOrSocket regards FD_CLOSE as a read-ready condition. We might want to tweak this further, but it was surely wrong as-is. Make pgwin32_waitforsinglesocket detach its private event object from the passed socket before returning. I suspect that failure to do so leads to race conditions when other code (such as WaitLatchOrSocket) attaches a different event object to the same socket. Moreover, the existing coding meant that repeated calls to pgwin32_waitforsinglesocket would perform ResetEvent on an event actively connected to a socket, which is rumored to be an unsafe practice; the WSAEventSelect documentation appears to recommend against this, though it does not say not to do it in so many words. Also, uniformly use the coding pattern "WSAEventSelect(s, NULL, 0)" to detach events from sockets, rather than passing the event in the second parameter. The WSAEventSelect documentation says that the second parameter is ignored if the third is 0, so theoretically this should make no difference. However, elsewhere on the same reference page the use of NULL in this context is recommended, and I have found suggestions on the net that some versions of Windows have bugs with a non-NULL second parameter in this usage. Some other mostly-cosmetic cleanup, such as using the right one of WSAGetLastError and GetLastError for reporting errors from these functions.
* Fix bogus declaration of local variable.Tom Lane2012-05-13
| | | | | | rc should be an int here, not a pgsocket. Fairly harmless as long as pgsocket is an integer type, but nonetheless wrong. Error introduced in commit 87091cb1f1ed914e2ddca424fa28f94fdf8461d2.
* Avoid unnecessary process wakeups in the log collector.Tom Lane2012-05-12
| | | | | | | | syslogger was coded to wake up once per second whether there was anything useful to do or not. As part of our campaign to reduce the server's idle power consumption, change it to use a latch for waiting. Now, in the absence of any data to log or any signals to service, it will only wake up at the programmed logfile rotation times (if any).
* Fix WaitLatchOrSocket to handle EOF on socket correctly.Tom Lane2012-05-12
| | | | | | | | | | When using poll(), EOF on a socket is reported with the POLLHUP not POLLIN flag (at least on Linux). WaitLatchOrSocket failed to check this bit, causing it to go into a busy-wait loop if EOF occurs. We earlier fixed the same mistake in the test for the state of the postmaster_alive socket, but missed it for the caller-supplied socket. Fortunately, this error is new in 9.2, since 9.1 only had a select() based code path not a poll() based one.
* Ensure backwards compatibility for GetStableLatestTransactionId()Simon Riggs2012-05-12
|
* Fix obsolescent C declaration syntaxPeter Eisentraut2012-05-12
| | | | | gcc -Wextra/-Wold-style-declaration thinks that "inline" should go before the function return type.
* Cosmetic adjustments for postmaster's handling of checkpointer.Tom Lane2012-05-11
| | | | | Correct some comments, order some operations a bit more consistently. No functional changes.
* Prevent loss of init fork when truncating an unlogged table.Robert Haas2012-05-11
| | | | Fixes bug #6635, reported by Akira Kurosawa.
* Remove extraneous #include "storage/proc.h"Simon Riggs2012-05-11
|
* Ensure age() returns a stable value rather than the latest valueSimon Riggs2012-05-11
|
* On GiST page split, release the locks on child pages before recursing up.Heikki Linnakangas2012-05-11
| | | | | | | | | | | | | | | | | | When inserting the downlinks for a split gist page, we used hold the locks on the child pages until the insertion into the parent - and recursively its parent if it had to be split too - were all completed. Change that so that the locks on child pages are released after the insertion in the immediate parent is done, before recursing further up the tree. This reduces the number of lwlocks that are held simultaneously. Holding many locks is bad for concurrency, and in extreme cases you can even hit the limit of 100 simultaneously held lwlocks in a backend. If you're really unlucky, you can hit the limit while in a critical section, which brings down the whole system. This fixes bug #6629 reported by Tom Forbes. Backpatch to 9.1. The page splitting code was rewritten in 9.1, and the old code did not have this problem.
* Temporarily revert stats collector latch changes so we can ship beta1.Tom Lane2012-05-10
| | | | | | | | | This patch reverts commit 49340037ee3ab46cb24144a86705e35f272c24d5 and some follow-on tweaking in pgstat.c. While the basic scheme of latch-ifying the stats collector seems sound enough, it's failing on most Windows buildfarm members for unknown reasons, and there's no time left to debug that before 9.2beta1. Better to ship a beta version without this improvement. I hope to re-revert this once beta1 is out, though.
* Make WaitLatch's WL_POSTMASTER_DEATH result trustworthy; simplify callers.Tom Lane2012-05-10
| | | | | | | | Per a suggestion from Peter Geoghegan, make WaitLatch responsible for verifying that the WL_POSTMASTER_DEATH bit it returns is truthful (by testing PostmasterIsAlive). Then simplify its callers, who no longer need to do that for themselves. Remove weasel wording about falsely-set result bits from WaitLatch's API contract.
* Fix Windows implementation of PGSemaphoreLock.Tom Lane2012-05-10
| | | | | | | | | | | | | The original coding failed to reset ImmediateInterruptOK before returning, which would potentially allow a subsequent query-cancel interrupt to be accepted at an unsafe point. This is a really nasty bug since it's so hard to predict the consequences, but they could be unpleasant. Also, ensure that signal handlers are serviced before this function returns, even if the semaphore is already set. This should make the behavior more like Unix. Back-patch to all supported versions.
* Improve Windows implementation of WaitLatch/WaitLatchOrSocket.Tom Lane2012-05-10
| | | | | | | | | | Ensure that signal handlers are serviced before this function returns. This should make the behavior more like Unix. Also, add some more error checking, and make some other cosmetic improvements. No back-patch since it's not clear whether this is fixing any live bug that would affect 9.1. I'm more concerned about 9.2 anyway given our considerable recent expansions in the usage of WaitLatch.
* Improve tests for postmaster death in auxiliary processes.Tom Lane2012-05-10
| | | | | | | | | | | | | In checkpointer and walwriter, avoid calling PostmasterIsAlive unless WaitLatch has reported WL_POSTMASTER_DEATH. This saves a kernel call per iteration of the process's outer loop, which is not all that much, but a cycle shaved is a cycle earned. I had already removed the unconditional PostmasterIsAlive calls in bgwriter and pgstat in previous patches, but forgot that WL_POSTMASTER_DEATH is supposed to be treated as untrustworthy (per comment in unix_latch.c); so adjust those two cases to match. There are a few other places where the same idea might be applied, but only after substantial code rearrangement, so I didn't bother.
* Further tweaking of nomenclature in checkpointer.c.Tom Lane2012-05-10
| | | | | | Get rid of some more naming choices that only make sense if you know that this code used to be in the bgwriter, as well as some stray comments referencing the bgwriter.
* Improve control logic for bgwriter hibernation mode.Tom Lane2012-05-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6d90eaaa89a007e0d365f49d6436f35d2392cfeb added a hibernation mode to the bgwriter to reduce the server's idle-power consumption. However, its interaction with the detailed behavior of BgBufferSync's feedback control loop wasn't very well thought out. That control loop depends primarily on the rate of buffer allocation, not the rate of buffer dirtying, so the hibernation mode has to be designed to operate only when no new buffer allocations are happening. Also, the check for whether the system is effectively idle was not quite right and would fail to detect a constant low level of activity, thus allowing the bgwriter to go into hibernation mode in a way that would let the cycle time vary quite a bit, possibly further confusing the feedback loop. To fix, move the wakeup support from MarkBufferDirty and SetBufferCommitInfoNeedsSave into StrategyGetBuffer, and prevent the bgwriter from entering hibernation mode unless no buffer allocations have happened recently. In addition, fix the delaying logic to remove the problem of possibly not responding to signals promptly, which was basically caused by trying to use the process latch's is_set flag for multiple purposes. I can't prove it but I'm suspicious that that hack was responsible for the intermittent "postmaster does not shut down" failures we've been seeing in the buildfarm lately. In any case it did nothing to improve the readability or robustness of the code. In passing, express the hibernation sleep time as a multiplier on BgWriterDelay, not a constant. I'm not sure whether there's any value in exposing the longer sleep time as an independently configurable setting, but we can at least make it act like this for little extra code.
* Add make dependency so that postgres.bki is rebuilt in major version changePeter Eisentraut2012-05-09
| | | | | | | | | | | | | | | Every time since the current rule for postgres.bki was put in place when we change the major version, people complain that their tests fail in strange ways. This is because the version number in postgres.bki is not updated, because it has no dependency for that. And you can't even force the rebuild manually if you don't happen to know which file has the problem. Fix that now before it will happen again. The only remaining problem with switching major versions, as far as the regression tests are concerned, is that contrib needs to be rebuilt. But that's easily invoked, and in any case the failure modes are more friendly if you forget that.
* Rename BgWriterShmem/Request to CheckpointerShmem/RequestSimon Riggs2012-05-09
|
* Rename BgWriterCommLock to CheckpointerCommLockSimon Riggs2012-05-09
|
* Avoid xid error from age() function when run on Hot StandbySimon Riggs2012-05-09
|
* Fix an issue in recent walwriter hibernation patch.Tom Lane2012-05-08
| | | | | | | | | Users of asynchronous-commit mode expect there to be a guaranteed maximum delay before an async commit's WAL records get flushed to disk. The original version of the walwriter hibernation patch broke that. Add an extra shared-memory flag to allow async commits to kick the walwriter out of hibernation mode, without adding any noticeable overhead in cases where no action is needed.
* Reduce idle power consumption of stats collector process.Tom Lane2012-05-08
| | | | | | | | | | | Latch-ify the stats collector, so that it does not need an arbitrary wakeup cycle to check for postmaster death. The incremental savings in idle power is pretty marginal, since we only had it waking every two seconds; but I believe that this patch may also improve the collector's performance under load, by reducing the number of kernel calls made per message when messages are arriving constantly (we now avoid a select/poll call except when we need to sleep). The change also reduces the time needed for a normal database shutdown on platforms where signals don't interrupt select().
* Reduce idle power consumption of walwriter and checkpointer processes.Tom Lane2012-05-08
| | | | | | | | | | | | | | | | | | | | | | | This patch modifies the walwriter process so that, when it has not found anything useful to do for many consecutive wakeup cycles, it extends its sleep time to reduce the server's idle power consumption. It reverts to normal as soon as it's done any successful flushes. It's still true that during any async commit, backends check for completed, unflushed pages of WAL and signal the walwriter if there are any; so that in practice the walwriter can get awakened and returned to normal operation sooner than the sleep time might suggest. Also, improve the checkpointer so that it uses a latch and a computed delay time to not wake up at all except when it has something to do, replacing a previous hardcoded 0.5 sec wakeup cycle. This also is primarily useful for reducing the server's power consumption when idle. In passing, get rid of the dedicated latch for signaling the walwriter in favor of using its procLatch, since that comports better with possible generic signal handlers using that latch. Also, fix a pre-existing bug with failure to save/restore errno in walwriter's signal handlers. Peter Geoghegan, somewhat simplified by Tom
* Make "unexpected EOF" messages DEBUG1 unless in an open transactionMagnus Hagander2012-05-07
| | | | | | | "Unexpected EOF on client connection" without an open transaction is mostly noise, so turn it into DEBUG1. With an open transaction it's still indicating a problem, so keep those as ERROR, and change the message to indicate that it happened in a transaction.
* Overdue code review for transaction-level advisory locks patch.Tom Lane2012-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 62c7bd31c8878dd45c9b9b2429ab7a12103f3590 had assorted problems, most visibly that it broke PREPARE TRANSACTION in the presence of session-level advisory locks (which should be ignored by PREPARE), as per a recent complaint from Stephen Rees. More abstractly, the patch made the LockMethodData.transactional flag not merely useless but outright dangerous, because in point of fact that flag no longer tells you anything at all about whether a lock is held transactionally. This fix therefore removes that flag altogether. We now rely entirely on the convention already in use in lock.c that transactional lock holds must be owned by some ResourceOwner, while session holds are never so owned. Setting the locallock struct's owner link to NULL thus denotes a session hold, and there is no redundant marker for that. PREPARE TRANSACTION now works again when there are session-level advisory locks, and it is also able to transfer transactional advisory locks to the prepared transaction, but for implementation reasons it throws an error if we hold both types of lock on a single lockable object. Perhaps it will be worth improving that someday. Assorted other minor cleanup and documentation editing, as well. Back-patch to 9.1, except that in the 9.1 branch I did not remove the LockMethodData.transactional flag for fear of causing an ABI break for any external code that might be examining those structs.
* Remove BSD/OS (BSDi) port. There are no known users upgrading toBruce Momjian2012-05-03
| | | | Postgres 9.2, and perhaps no existing users either.
* Even more duplicate word removal, in the spirit of the seasonPeter Eisentraut2012-05-02
|
* Avoid repeated CLOG access from heap_hot_search_buffer.Robert Haas2012-05-02
| | | | | | | | | | | At the time we check whether the tuple is dead to all running transactions, we've already verified that it isn't visible to our scan, setting hint bits if appropriate. So there's no need to recheck CLOG for the all-dead test we do just a moment later. So, add HeapTupleIsSurelyDead() to test the appropriate condition under the assumption that all relevant hit bits are already set. Review by Tom Lane.
* Further corrections from the department of redundancy department.Robert Haas2012-05-02
| | | | Thom Brown
* More duplicate word removal.Robert Haas2012-05-02
|
* Remove duplicate words in comments.Heikki Linnakangas2012-05-02
| | | | Found these with grep -r "for for ".
* Kill some remaining references to SVR4 and univel.Tom Lane2012-05-02
| | | | | Both terms still appear in a few places, but I thought it best to leave those alone in context.
* Remove dead portsPeter Eisentraut2012-05-01
| | | | | | | | | | | | | | Remove the following ports: - dgux - nextstep - sunos4 - svr4 - ultrix4 - univel These are obsolete and not worth rescuing. In most cases, there is circumstantial evidence that they wouldn't work anymore anyway.
* Converge all SQL-level statistics timing values to float8 milliseconds.Tom Lane2012-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | This patch adjusts the core statistics views to match the decision already taken for pg_stat_statements, that values representing elapsed time should be represented as float8 and measured in milliseconds. By using float8, we are no longer tied to a specific maximum precision of timing data. (Internally, it's still microseconds, but we could now change that without needing changes at the SQL level.) The columns affected are pg_stat_bgwriter.checkpoint_write_time pg_stat_bgwriter.checkpoint_sync_time pg_stat_database.blk_read_time pg_stat_database.blk_write_time pg_stat_user_functions.total_time pg_stat_user_functions.self_time pg_stat_xact_user_functions.total_time pg_stat_xact_user_functions.self_time The first four of these are new in 9.2, so there is no compatibility issue from changing them. The others require a release note comment that they are now double precision (and can show a fractional part) rather than bigint as before; also their underlying statistics functions now match the column definitions, instead of returning bigint microseconds.
* Remove duplicate word in comment.Robert Haas2012-04-30
| | | | Noted by Peter Geoghegan.
* Rename I/O timing statistics columns to blk_read_time and blk_write_time.Tom Lane2012-04-29
| | | | | This seems more consistent with the pre-existing choices for names of other statistics columns. Rename assorted internal identifiers to match.
* Rename track_iotiming GUC to track_io_timing.Tom Lane2012-04-29
| | | | This spelling seems significantly more readable to me.
* Change return type of ExceptionalCondition to void and mark it noreturnPeter Eisentraut2012-04-29
| | | | | | In ancient times, it was thought that this wouldn't work because of TrapMacro/AssertMacro, but changing those to use a comma operator appears to work without compiler warnings.
* Clear I/O timing counters after sending them to the stats collector.Tom Lane2012-04-28
| | | | | This oversight caused the reported times to accumulate in an O(N^2) fashion the longer a backend runs.
* Fix printing of whole-row Vars at top level of a SELECT targetlist.Tom Lane2012-04-27
| | | | | | | | | | | | | | | | | | | | | Normally whole-row Vars are printed as "tabname.*". However, that does not work at top level of a targetlist, because per SQL standard the parser will think that the "*" should result in column-by-column expansion; which is not at all what a whole-row Var implies. We used to just print the table name in such cases, which works most of the time; but it fails if the table name matches a column name available anywhere in the FROM clause. This could lead for instance to a view being interpreted differently after dump and reload. Adding parentheses doesn't fix it, but there is a reasonably simple kluge we can use instead: attach a no-op cast, so that the "*" isn't syntactically at top level anymore. This makes the printing of such whole-row Vars a lot more consistent with other Vars, and may indeed fix more cases than just the reported one; I'm suspicious that cases involving schema qualification probably didn't work properly before, either. Per bug report and fix proposal from Abbas Butt, though this patch is quite different in detail from his. Back-patch to all supported versions.
* Fix syslogger's rotation disable/re-enable logic.Tom Lane2012-04-27
| | | | | | | | | | | | | | | | | | If it fails to open a new log file, the syslogger assumes there's something wrong with its parameters (such as log_directory), and stops attempting automatic time-based or size-based log file rotations. Sending it SIGHUP is supposed to start that up again. However, the original coding for that was really bogus, involving clobbering a couple of GUC variables and hoping that SIGHUP processing would restore them. Get rid of that technique in favor of maintaining a separate flag showing we've turned rotation off. Per report from Mark Kirkwood. Also, the syslogger will automatically attempt to create the log_directory directory if it doesn't exist, but that was only happening at startup. For consistency and ease of use, it should do the same whenever the value of log_directory is changed by SIGHUP. Back-patch to all supported branches.
* Prevent index-only scans from returning wrong answers under Hot Standby.Robert Haas2012-04-26
| | | | | | | | | The alternative of disallowing index-only scans in HS operation was discussed, but the consensus was that it was better to treat marking a page all-visible as a recovery conflict for snapshots that could still fail to see XIDs on that page. We may in the future try to soften this, so that we simply force index scans to do heap fetches in cases where this may be an issue, rather than throwing a hard conflict.
* Fix oversight in recent parameterized-path patch.Tom Lane2012-04-26
| | | | | | bitmap_scan_cost_est() has to be able to cope with a BitmapOrPath, but I'd taken a shortcut that didn't work for that case. Noted by Heikki. Add some regression tests since this area is evidently under-covered.
* Fix planner's handling of RETURNING lists in writable CTEs.Tom Lane2012-04-25
| | | | | | | | | | | | | | | | | | | | | | | | setrefs.c failed to do "rtoffset" adjustment of Vars in RETURNING lists, which meant they were left with the wrong varnos when the RETURNING list was in a subquery. That was never possible before writable CTEs, of course, but now it's broken. The executor fails to notice any problem because ExecEvalVar just references the ecxt_scantuple for any normal varno; but EXPLAIN breaks when the varno is wrong, as illustrated in a recent complaint from Bartosz Dmytrak. Since the eventual rtoffset of the subquery is not known at the time we are preparing its plan node, the previous scheme of executing set_returning_clause_references() at that time cannot handle this adjustment. Fortunately, it turns out that we don't really need to do it that way, because all the needed information is available during normal setrefs.c execution; we just have to dig it out of the ModifyTable node. So, do that, and get rid of the kluge of early setrefs processing of RETURNING lists. (This is a little bit of a cheat in the case of inherited UPDATE/DELETE, because we are not passing a "root" struct that corresponds exactly to what the subplan was built with. But that doesn't matter, and anyway this is less ugly than early setrefs processing was.) Back-patch to 9.1, where the problem became possible to hit.
* Another trivial comment-typo fix.Tom Lane2012-04-25
|
* Casts to or from a domain type are ignored; warn and document.Robert Haas2012-04-24
| | | | | | | | Prohibiting this outright would break dumps taken from older versions that contain such casts, which would create far more pain than is justified here. Per report by Jaime Casanova and subsequent discussion.
* Lots of doc corrections.Robert Haas2012-04-23
| | | | Josh Kupershmidt
* Rearrange lazy_scan_heap to avoid visibility map race conditions.Robert Haas2012-04-23
| | | | | | | | | | | | | | | | | | | We must set the visibility map bit before releasing our exclusive lock on the heap page; otherwise, someone might clear the heap page bit before we set the visibility map bit, leading to a situation where the visibility map thinks the page is all-visible but it's really not. This problem has existed since 8.4, but it wasn't critical before we had index-only scans, since the worst case scenario was that the page wouldn't get vacuumed until the next scan_all vacuum. Along the way, a couple of minor, related improvements: (1) if we pause the heap scan to do an index vac cycle, release any visibility map page we're holding, since really long-running pins are not good for a variety of reasons; and (2) warn if we see a page that's marked all-visible in the visibility map but not on the page level, since that should never happen any more (it was allowed in previous releases, but not in 9.2).