postgresql - postgresql mirror

	Commit message	Author	Age
*	Make lazy_vacuum_rel call pg_rusage_init only if needed.	Robert Haas	2011-08-18
	do_analyze_rel already does it this way. Euler Taveira de Oliveira
*	Remove obsolete README file.	Robert Haas	2011-08-18
	Perhaps we ought to add some other kind of documentation here instead, but for now let's get rid of this woefully obsolete description of the sinval machinery.
*	Translation updates	Peter Eisentraut	2011-08-17
*	Fix comment about which version had the BACKUP METHOD line in backup_label, again.	Heikki Linnakangas	2011-08-17
	It was invalidated again by Fujii's patch to 9.1.
*	Revise sinval code to remove no-longer-used tuple TID from inval messages.	Tom Lane	2011-08-16
	This requires adjusting the API for syscache callback functions: they now get a hash value, not a TID, to identify the target tuple. Most of them weren't paying any attention to that argument anyway, but plancache did require a small amount of fixing. Also, improve performance a trifle by avoiding sending duplicate inval messages when a heap_update isn't changing the catcache lookup columns.
*	Forget about targeting catalog cache invalidations by tuple TID.	Tom Lane	2011-08-16
	The TID isn't stable enough: we might queue an sinval event before a VACUUM FULL, and then process it afterwards, when the target tuple no longer has the same TID. So we must invalidate entries on the basis of hash value only. The old coding can be shown to result in various bizarre, hard-to-reproduce errors in the presence of concurrent VACUUM FULLs on system catalogs, and could easily result in permanent catalog corruption, up to and including complete loss of tables. This commit is just a minimal fix that removes the unsafe comparison. We should remove transmission of the tuple TID from sinval messages altogether, and then arrange to suppress the extra message in the common case of a heap_update that doesn't change the key hashvalue. But that's going to be much more invasive, and will only produce a probably-marginal performance gain, so it doesn't seem like material for a back-patch. Back-patch to 9.0. Before that, VACUUM FULL refused to do any tuple moving if it found any INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples (and CLUSTER would give up altogether), so there was no risk of moving a tuple that might be the subject of an unsent sinval message.
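A minimal sketch in C of why matching on the key hash alone is safe. The struct and function names are hypothetical, and the cache is flattened into a plain array; the real catcache keeps entries in hash buckets and tracks far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, greatly simplified cache entry. */
typedef struct
{
    uint32_t hash_value;    /* hash of the entry's lookup-key columns */
    bool     valid;
} demo_cache_entry;

/*
 * Invalidate every valid entry whose key hash matches. A hash collision may
 * also invalidate an unrelated entry, which only costs a reload; what can no
 * longer happen is a stale entry surviving because the message carried a
 * TID that VACUUM FULL has since changed.
 * Returns the number of entries invalidated.
 */
static size_t
demo_invalidate_by_hash(demo_cache_entry *entries, size_t n, uint32_t hash_value)
{
    size_t count = 0;

    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].valid && entries[i].hash_value == hash_value)
        {
            entries[i].valid = false;
            count++;
        }
    }
    return count;
}
```

Over-invalidation is the deliberate trade-off: it is always safe, whereas under-invalidation can corrupt state.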
*	Fix incorrect order of operations during sinval reset processing.	Tom Lane	2011-08-16
	We have to be sure that we have revalidated each nailed-in-cache relcache entry before we try to use it to load data for some other relcache entry. The introduction of "mapped relations" in 9.0 broke this, because although we updated the state kept in relmapper.c early enough, we failed to propagate that information into relcache entries soon enough; in particular, we could try to fetch pg_class rows out of pg_class before we'd updated its relcache entry's rd_node.relNode value from the map. This bug accounts for Dave Gould's report of failures after "vacuum full pg_class", and I believe that there is risk for other system catalogs as well. The core part of the fix is to copy relmapper data into the relcache entries during "phase 1" in RelationCacheInvalidate(), before they'll be used in "phase 2". To try to future-proof the code against other similar bugs, I also rearranged the order in which nailed relations are visited during phase 2: now it's pg_class first, then pg_class_oid_index, then other nailed relations. This should ensure that RelationClearRelation can apply RelationReloadIndexInfo to all nailed indexes without risking use of not-yet-revalidated relcache entries. Back-patch to 9.0 where the relation mapper was introduced.
*	Preserve toast value OIDs in toast-swap-by-content for CLUSTER/VACUUM FULL.	Tom Lane	2011-08-16
	This works around the problem that a catalog cache entry might contain a toast pointer that we try to dereference just as a VACUUM FULL completes on that catalog. We will see the sinval message on the cache entry when we acquire lock on the toast table, but by that point we've already told tuptoaster.c "here's the pointer to fetch", so it's difficult from a code structural standpoint to update the pointer before we use it. Much less painful to ensure that toast pointers are not invalidated in the first place. We have to add a bit of code to deal with the case that a value that previously wasn't toasted becomes so; but that should be a seldom-exercised corner case, so the inefficiency shouldn't be significant. Back-patch to 9.0. In prior versions, we didn't allow CLUSTER on system catalogs, and VACUUM FULL didn't result in reassignment of toast OIDs, so there was no problem.
*	Fix race condition in relcache init file invalidation.	Tom Lane	2011-08-16
	The previous code tried to synchronize by unlinking the init file twice, but that doesn't actually work: it leaves a window wherein a third process could read the already-stale init file but miss the SI messages that would tell it the data is stale. The result would be bizarre failures in catalog accesses, typically "could not read block 0 in file ..." later during startup. Instead, hold RelCacheInitLock across both the unlink and the sending of the SI messages. This is more straightforward, and might even be a bit faster since only one unlink call is needed. This has been wrong since it was put in (in 2002!), so back-patch to all supported releases.
*	Fix bogus comment that claimed that the new BACKUP METHOD line in backup_label was new in 9.0.	Heikki Linnakangas	2011-08-16
	Spotted by Fujii Masao.
*	Add "Reason code" prefix to internal SSI error messages	Peter Eisentraut	2011-08-15
	This makes it clearer that the error message is perhaps not supposed to be understood by users, and it also makes it somewhat clearer that it was not accidentally omitted from translation. Idea from Heikki Linnakangas, except that we don't mark "Reason code" for translation at this point, because that would make the implementation too cumbersome.
*	Fix unsafe order of operations in foreign-table DDL commands.	Tom Lane	2011-08-14
	When updating or deleting a system catalog tuple, it's necessary to acquire RowExclusiveLock on the catalog before looking up the tuple; otherwise a concurrent VACUUM FULL on the catalog might move the tuple to a different TID before we can apply the update. Coding patterns that find the tuple via a table scan aren't at risk here, but when obtaining the tuple from a catalog cache, correct ordering is important; and several routines in foreigncmds.c got it wrong. Noted while running the regression tests in parallel with VACUUM FULL of assorted system catalogs. For consistency I moved all the heap_open calls to the starts of their functions, including a couple for which there was no actual bug. Back-patch to 8.4 where foreigncmds.c was added.
*	Fix incorrect timeout handling during initial authentication transaction.	Tom Lane	2011-08-13
	The statement start timestamp was not set before initiating the transaction that is used to look up client authentication information in pg_authid. In consequence, enable_sig_alarm computed a wrong value (far in the past) for statement_fin_time. That didn't have any immediate effect, because the timeout alarm was set without reference to statement_fin_time; but if we subsequently blocked on a lock for a short time, CheckStatementTimeout would consult the bogus value when we cancelled the lock timeout wait, and then conclude we'd timed out, leading to immediate failure of the connection attempt. Thus an innocent "vacuum full pg_authid" would cause failures of concurrent connection attempts. Noted while testing other, more serious consequences of vacuum full on system catalogs. We should set the statement timestamp before StartTransactionCommand(), so that the transaction start timestamp is also valid. I'm not sure if there are any non-cosmetic effects of it not being valid, but the xact timestamp is at least sent to the statistics machinery. Back-patch to 9.0. Before that, the client authentication timeout was done outside any transaction and did not depend on this state to be valid.
*	Teach unix_latch.c to use poll() where available.	Tom Lane	2011-08-11
	poll() is preferred over select() on platforms where both are available, because it tends to be a bit faster and it doesn't have an arbitrary limit on the range of FD numbers that can be accessed. The FD range limit does not appear to be a risk factor for any 9.1 usages, so this doesn't need to be back-patched, but we need to have it in place if we keep on expanding the uses of WaitLatch.
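A minimal sketch in C of waiting on one descriptor with poll(). The helper name demo_wait_readable is hypothetical and not part of unix_latch.c; it only shows the call shape. select() works on fd_set bitmaps sized by FD_SETSIZE, whereas poll() takes an explicit array of struct pollfd, which is why it has no ceiling on descriptor numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>     /* pipe() and write() for the usage example */

/*
 * Wait until fd is readable or timeout_ms elapses (-1 waits forever).
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
static int
demo_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int         rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Simplification: a retry after EINTR restarts the full timeout. */
    do
        rc = poll(&pfd, 1, timeout_ms);
    while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    /* POLLHUP counts as readable: a read() would report EOF at once. */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```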
*	Unbreak legacy syntax "COMMENT ON RULE x IS y", with no relation name.	Robert Haas	2011-08-11
	check_object_ownership() isn't happy about the null relation pointer. We could fix it there, but this seems more future-proof.
*	Remove wal_sender_delay GUC, because it's no longer useful.	Tom Lane	2011-08-10
	The latch infrastructure is now capable of detecting all cases where the walsender loop needs to wake up, so there is no reason to have an arbitrary timeout. Also, modify the walsender loop logic to follow the standard pattern of ResetLatch, test for work to do, WaitLatch. The previous coding was both hard to follow and buggy: it would sometimes busy-loop despite having nothing available to do, e.g. between receipt of a signal and the next time it was caught up with new WAL, and it also had interesting choices like deciding to update to WALSNDSTATE_STREAMING on the strength of information known to be obsolete.
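A process-local sketch in C of the ResetLatch, test for work, WaitLatch loop described in this commit. The DemoLatch type and all demo_ functions are hypothetical; they use a self-pipe within a single process, while real latches live in shared memory and can be set by other processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical process-local latch built on a self-pipe. */
typedef struct
{
    volatile bool is_set;
    int           read_fd;
    int           write_fd;
} DemoLatch;

static bool
demo_latch_init(DemoLatch *latch)
{
    int         fds[2];

    if (pipe(fds) != 0)
        return false;
    /* Non-blocking ends: setting never blocks, draining stops when empty. */
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    latch->is_set = false;
    latch->read_fd = fds[0];
    latch->write_fd = fds[1];
    return true;
}

/* Only a flag store and write(), so usable from a signal handler. */
static void
demo_set_latch(DemoLatch *latch)
{
    if (latch->is_set)
        return;
    latch->is_set = true;
    if (write(latch->write_fd, "x", 1) < 0)
    {
        /* EAGAIN: the pipe is full, so a wakeup is already pending. */
    }
}

static void
demo_reset_latch(DemoLatch *latch)
{
    char        buf[64];

    latch->is_set = false;
    /* Drain stale wakeup bytes so the next poll() only sees new ones. */
    while (read(latch->read_fd, buf, sizeof(buf)) > 0)
        ;
}

/* Returns true if the latch is set, false on timeout. */
static bool
demo_wait_latch(DemoLatch *latch, int timeout_ms)
{
    struct pollfd pfd = { .fd = latch->read_fd, .events = POLLIN, .revents = 0 };

    if (latch->is_set)
        return true;
    (void) poll(&pfd, 1, timeout_ms);   /* EINTR etc. fall through to recheck */
    return latch->is_set;
}

/* Sample work function for the loop below: always finds one item. */
static int
demo_one_item(void)
{
    return 1;
}

/*
 * Reset first, then check for work, then wait. A set that arrives while
 * work is being checked leaves the latch set, so the wait returns at once
 * instead of sleeping through the wakeup. Returns items processed.
 */
static int
demo_worker_loop(DemoLatch *latch, int (*do_pending_work)(void),
                 int iterations, int timeout_ms)
{
    int         processed = 0;

    for (int i = 0; i < iterations; i++)
    {
        demo_reset_latch(latch);
        processed += do_pending_work();
        (void) demo_wait_latch(latch, timeout_ms);
    }
    return processed;
}
```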
*	Add a bit of debug logging to backend_read_statsfile().	Tom Lane	2011-08-10
	This is in hopes of learning more about what causes "pgstat wait timeout" warnings in the buildfarm. This patch should probably be reverted once we've learned what we can. As coded, it will result in regression test "failures" at half the delay that the existing code does, so I expect to see a few more than before.
*	Change the autovacuum launcher to use WaitLatch instead of a poll loop.	Tom Lane	2011-08-10
	In pursuit of this (and with the expectation that WaitLatch will be needed in more places), convert the latch field that was already added to PGPROC for sync rep into a generic latch that is activated for all PGPROC-owning processes, and change many of the standard backend signal handlers to set that latch when a signal happens. This will allow WaitLatch callers to be wakened properly by these signals. In passing, fix a whole bunch of signal handlers that had been hacked to do things that might change errno, without adding the necessary save/restore logic for errno. Also make some minor fixes in unix_latch.c, and clean up a bizarre and unsafe scheme for disowning the process's latch. Much of this has to be back-patched into 9.1. Peter Geoghegan, with additional work by Tom
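A minimal sketch in C of the errno save/restore this commit adds. The handler, wakeup_fd, and got_signal names are hypothetical. The point is that anything the handler calls, such as write(), can overwrite errno at an arbitrary moment in the interrupted code, for example between a failed read() and the errno check that follows it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;
static int  wakeup_fd = -1;     /* hypothetical: e.g. write end of a self-pipe */

static void
demo_signal_handler(int signo)
{
    int         save_errno = errno;    /* write() below may overwrite errno */

    (void) signo;
    got_signal = 1;
    if (wakeup_fd >= 0)
    {
        if (write(wakeup_fd, "x", 1) < 0)
        {
            /* Ignore the failure; errno is restored below either way. */
        }
    }
    errno = save_errno;         /* interrupted code sees its own errno again */
}

static int
demo_install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```

Without the two errno lines, the usage below would see EBADF from the handler's failed write() instead of the EDOM value set before the signal.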
*	If backup-end record is not seen, and we reach end of recovery from a streamed backup, throw an error and refuse to start up.	Heikki Linnakangas	2011-08-10
	The restore has not finished correctly in that case and the data directory is possibly corrupt. We already errored out in case of archive recovery, but could not during crash recovery because we couldn't distinguish between the case that pg_start_backup() was called and the database then crashed (must not error, data is OK), and the case that we're restoring from a backup and not all the needed WAL was replayed (data can be corrupt). To distinguish those cases, add a line to backup_label to indicate whether the backup was taken with pg_start/stop_backup(), or by streaming (i.e. pg_basebackup). This requires re-initdb, because of a new field added to the control file.
*	Measure WaitLatch's timeout parameter in milliseconds, not microseconds.	Tom Lane	2011-08-09
	The original definition had the problem that timeouts exceeding about 2100 seconds couldn't be specified on 32-bit machines. Milliseconds seem like sufficient resolution, and finer grain than that would be fantasy anyway on many platforms. Back-patch to 9.1 so that this aspect of the latch API won't change between 9.1 and later releases. Peter Geoghegan
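The "about 2100 seconds" figure follows from the width of the parameter, assuming the timeout was held in a signed 32-bit long as on typical 32-bit platforms. A small C check of the arithmetic (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Longest timeout, in whole seconds, that fits in a signed 32-bit value
 * when the timeout is counted in the given unit.
 */
static int64_t
demo_max_timeout_secs(int64_t units_per_second)
{
    assert(units_per_second > 0);
    return (int64_t) INT32_MAX / units_per_second;
}
```

With microseconds, demo_max_timeout_secs(1000000) is 2147 seconds, about 36 minutes; counting in milliseconds raises the ceiling to 2147483 seconds, a little under 25 days.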
*	Documentation improvement and minor code cleanups for the latch facility.	Tom Lane	2011-08-09
	Improve the documentation around weak-memory-ordering risks, and do a pass of general editorialization on the comments in the latch code. Make the Windows latch code more like the Unix latch code where feasible; in particular provide the same Assert checks in both implementations. Fix poorly-placed WaitLatch call in syncrep.c. This patch resolves, for the moment, concerns around weak-memory-ordering bugs in latch-related code: we have documented the restrictions and checked that existing calls meet them. In 9.2 I hope that we will install suitable memory barrier instructions in SetLatch/ResetLatch, so that their callers don't need to be quite so careful.
*	Avoid creating PlaceHolderVars immediately within PlaceHolderVars.	Tom Lane	2011-08-09
	Such a construction is useless since the lower PlaceHolderVar is already nullable; no need to make it more so. Noted while pursuing bug #6154. This is just a minor planner efficiency improvement, since the final plan will come out the same anyway after PHVs are flattened. So not worth the risk of back-patching.
*	Use clearer notation for getnameinfo() return handling	Peter Eisentraut	2011-08-09
	Writing if (getnameinfo(...)) handle_error(); reads quite strangely, so use something like if (getnameinfo(...) != 0) handle_error(); instead.
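A minimal sketch in C of the != 0 style applied to a complete getnameinfo() call. The wrapper demo_numeric_host is hypothetical; getnameinfo(), NI_NUMERICHOST, and gai_strerror() are standard POSIX. A nonzero result is an EAI_* error code, which the explicit comparison makes visible.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Convert a dotted-quad IPv4 address into numeric host text via
 * getnameinfo(). Returns 0 on success, -1 on failure.
 */
static int
demo_numeric_host(const char *dotted, char *host, size_t hostlen)
{
    struct sockaddr_in sin;
    int         rc;

    assert(host != NULL && hostlen > 0);
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, dotted, &sin.sin_addr) != 1)
        return -1;

    rc = getnameinfo((struct sockaddr *) &sin, sizeof(sin),
                     host, hostlen, NULL, 0, NI_NUMERICHOST);
    /* 0 means success; anything else is an EAI_* code, not a boolean. */
    if (rc != 0)
    {
        fprintf(stderr, "getnameinfo failed: %s\n", gai_strerror(rc));
        return -1;
    }
    return 0;
}
```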
*	Change the way string relopts are allocated.	Heikki Linnakangas	2011-08-09
	Don't try to allocate the default value for a string relopt in the same palloc chunk as the relopt_string struct. That didn't work too well if you added a built-in string relopt in the stringRelOpts array, as it's not possible to have an initializer for a variable length struct in C. This makes the code slightly simpler too. While we're at it, move the call to the validator function in add_string_reloption to before the allocation, so that if someone does pass a bogus default value, we don't leak memory.
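A sketch in C of both changes, using hypothetical demo_ names rather than the real relopt_string and add_string_reloption definitions. The default value sits behind a pointer, so entries can appear in a statically initialized array (C forbids arrays of structs that end in a flexible array member), and validation happens before any allocation.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef bool (*demo_validator) (const char *value);

/* Hypothetical string option: the default is a pointer, not trailing data. */
typedef struct
{
    const char *name;
    const char *default_val;
} demo_relopt_string;

/* Legal only because the struct has no flexible array member. */
static const demo_relopt_string demo_builtin_opts[] = {
    {"demo_string_opt", "default"},
};

/* Example validator: reject empty strings. */
static bool
demo_nonempty(const char *value)
{
    return value[0] != '\0';
}

/*
 * Create a new option. The validator runs before anything is allocated,
 * so a bogus default is rejected without leaking memory.
 */
static demo_relopt_string *
demo_add_string_reloption(const char *name, const char *default_val,
                          demo_validator validator)
{
    demo_relopt_string *opt;
    char       *copy = NULL;

    if (default_val != NULL && validator != NULL && !validator(default_val))
        return NULL;

    if (default_val != NULL && (copy = strdup(default_val)) == NULL)
        return NULL;
    opt = malloc(sizeof(*opt));
    if (opt == NULL)
    {
        free(copy);
        return NULL;
    }
    opt->name = name;
    opt->default_val = copy;
    return opt;
}
```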
*	Fix grammar and spelling in log message.	Heikki Linnakangas	2011-08-09
*	Fix nested PlaceHolderVar expressions that appear only in targetlists.	Tom Lane	2011-08-09
	A PlaceHolderVar's expression might contain another, lower-level PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one certainly will be also, and so we have to make sure that both of them get into the placeholder_list with correct ph_may_need values during the initial pre-scan of the query (before deconstruct_jointree starts). We did this correctly for PlaceHolderVars appearing in the query quals, but overlooked the issue for those appearing in the top-level targetlist; with the result that nested placeholders referenced only in the targetlist did not work correctly, as illustrated in bug #6154. While at it, add some error checking to find_placeholder_info to ensure that we don't try to create new placeholders after it's too late to do so; they have to all be created before deconstruct_jointree starts. Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.
*	Clean up ill-advised attempt to invent a private set of Node tags.	Tom Lane	2011-08-06
	Somebody thought it'd be cute to invent a set of Node tag numbers that were defined independently of, and indeed conflicting with, the main tag-number list. While this accidentally failed to fail so far, it would certainly lead to trouble as soon as anyone wanted to, say, apply copyObject to these node types. Clang was already complaining about the use of makeNode on these tags, and I think quite rightly so. Fix by pushing these node definitions into the mainstream, including putting replnodes.h where it belongs.
*	Reduce PG_SYSLOG_LIMIT to 900 bytes.	Tom Lane	2011-08-05
	The previous limit of 1024 was set on the assumption that all modern syslog implementations have line length limits of 2KB or so. However, this is false, as at least Solaris and sysklogd truncate at only 1KB. 900 seems to leave enough room for the max likely length of the tacked-on prefixes, so let's go with that. As with the previous change, it doesn't seem wise to back-patch this into already-released branches; but it should be OK to sneak it into 9.1. Noah Misch
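A simplified sketch in C of splitting an over-long message into pieces of at most 900 bytes. DEMO_SYSLOG_LIMIT, the callback type, and the stats helper are hypothetical. This version cuts at raw byte offsets; a production logger would also want to break at newlines and avoid cutting a multibyte character in half.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SYSLOG_LIMIT 900   /* mirrors the new PG_SYSLOG_LIMIT value */

typedef void (*demo_emit_fn) (const char *chunk, size_t len, int seq, void *arg);

/*
 * Emit msg in numbered pieces of at most DEMO_SYSLOG_LIMIT bytes.
 * Returns the number of pieces (an empty message yields one empty piece).
 */
static int
demo_syslog_split(const char *msg, demo_emit_fn emit, void *arg)
{
    size_t      len = strlen(msg);
    int         seq = 0;

    do
    {
        size_t      n = len < DEMO_SYSLOG_LIMIT ? len : DEMO_SYSLOG_LIMIT;

        emit(msg, n, ++seq, arg);
        msg += n;
        len -= n;
    } while (len > 0);
    return seq;
}

/* Sample callback: record total bytes and the longest piece. */
typedef struct
{
    size_t      total_bytes;
    size_t      longest;
} demo_split_stats;

static void
demo_collect_stats(const char *chunk, size_t len, int seq, void *arg)
{
    demo_split_stats *stats = arg;

    (void) chunk;
    (void) seq;
    stats->total_bytes += len;
    if (len > stats->longest)
        stats->longest = len;
}
```

A 2000-byte message comes out as pieces of 900, 900, and 200 bytes.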
*	Allow per-column foreign data wrapper options.	Robert Haas	2011-08-05
	Shigeru Hanada, with fairly minor editing by me.
*	Create VXID locks "lazily" in the main lock table.	Robert Haas	2011-08-04
	Instead of entering them on transaction startup, we materialize them only when someone wants to wait, which will occur only during CREATE INDEX CONCURRENTLY. In Hot Standby mode, the startup process must also be able to probe for conflicting VXID locks, but the lock need never be fully materialized, because the startup process does not use the normal lock wait mechanism. Since most VXID locks never need to touch the lock manager partition locks, this can significantly reduce blocking contention on read-heavy workloads. Patch by me. Review by Jeff Davis.
*	Make pgbench use erand48() rather than random().	Robert Haas	2011-08-03
	glibc renders random() thread-safe by wrapping a futex lock around it; testing reveals that this limits the performance of pgbench on machines with many CPU cores. Rather than switching to random_r(), which is only available on GNU systems and crashes unless you use undocumented alchemy to initialize the random state properly, switch to our built-in implementation of erand48(), which is both thread-safe and concurrent. Since the list of reasons not to use the operating system's erand48() is getting rather long, rename ours to pg_erand48() (and similarly for our implementations of lrand48() and srand48()) and just always use those. We were already doing this on Cygwin anyway, and the glibc implementation is not quite thread-safe, so pgbench wouldn't be able to use that either. Per discussion with Tom Lane.
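The property relied on here is that the erand48() interface keeps the generator state in caller-supplied storage, so each thread can own its state and no lock is needed. Below is a from-scratch sketch in C of the 48-bit linear congruential step that POSIX specifies for the drand48 family; it is not PostgreSQL's pg_erand48() source, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* POSIX drand48-family generator: X(n+1) = (a * X(n) + c) mod 2^48. */
#define DEMO_RAND48_A    UINT64_C(0x5DEECE66D)
#define DEMO_RAND48_C    UINT64_C(0xB)
#define DEMO_RAND48_MASK ((UINT64_C(1) << 48) - 1)

/*
 * Advance the caller-owned 48-bit state and return a double in [0, 1).
 * xseed[0] holds the low 16 bits. No static or global state is touched,
 * so threads that each own a seed array need no locking.
 */
static double
demo_erand48(unsigned short xseed[3])
{
    uint64_t    x = (uint64_t) xseed[0] |
                    ((uint64_t) xseed[1] << 16) |
                    ((uint64_t) xseed[2] << 32);

    /* Unsigned wraparound mod 2^64 followed by the mask gives mod 2^48. */
    x = (x * DEMO_RAND48_A + DEMO_RAND48_C) & DEMO_RAND48_MASK;

    xseed[0] = (unsigned short) (x & 0xFFFF);
    xseed[1] = (unsigned short) ((x >> 16) & 0xFFFF);
    xseed[2] = (unsigned short) ((x >> 32) & 0xFFFF);

    return (double) x / (double) (UINT64_C(1) << 48);
}

/* Uniform integer in [min, max] drawn from one thread's private state. */
static long
demo_random_range(unsigned short xseed[3], long min, long max)
{
    assert(min <= max);
    return min + (long) ((double) (max - min + 1) * demo_erand48(xseed));
}
```

In a pgbench-like program, each worker thread would keep its own three-element seed array and pass it on every call.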
*	Move CheckRecoveryConflictDeadlock() call to a safer place.	Tom Lane	2011-08-02
	This kluge was inserted in a spot apparently chosen at random: the lock manager's state is not yet fully set up for the wait, and in particular LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock will not get cleaned up if the ereport is thrown. This seems to not cause any observable problem in trivial test cases, because LockReleaseAll will silently clean up the debris; but I was able to cause failures with tests involving subtransactions. Fixes breakage induced by commit c85c941470efc44494fd7a5f426ee85fc65c268c. Back-patch to all affected branches.
*	Fix incorrect initialization of ProcGlobal->startupBufferPinWaitBufId.	Tom Lane	2011-08-02
	It was initialized in the wrong place and to the wrong value. With bad luck this could result in incorrect query-cancellation failures in hot standby sessions, should a HS backend be holding a pin on buffer number 1 while trying to acquire a lock.
*	Avoid integer overflow when LIMIT + OFFSET >= 2^63.	Heikki Linnakangas	2011-08-02
	This fixes bug #6139 reported by Hitoshi Harada.
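A generic overflow-safe addition in C, shown only to illustrate the hazard; it is not necessarily the check this commit added, and the function name is hypothetical. The test happens before the addition because signed overflow is undefined behavior in C and cannot be reliably detected afterwards.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute offset + limit for nonnegative 64-bit values, saturating at
 * INT64_MAX instead of overflowing.
 */
static int64_t
demo_offset_plus_limit(int64_t offset, int64_t limit)
{
    assert(offset >= 0 && limit >= 0);
    if (limit > INT64_MAX - offset)
        return INT64_MAX;       /* true sum would be 2^63 or more */
    return offset + limit;
}
```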
*	Minor stylistic corrections.	Robert Haas	2011-08-01
*	Add host name resolution information to pg_hba.conf error messages	Peter Eisentraut	2011-07-31
	This is to be able to analyze issues with host names in pg_hba.conf.
*	Reduce sinval synchronization overhead.	Robert Haas	2011-07-29
	Testing shows that the overhead of acquiring and releasing SInvalReadLock and msgNumLock on high-core count boxes can waste a lot of CPU time and hurt performance. This patch adds a per-backend flag that allows us to skip all that locking in most cases. Further testing shows that this improves performance even when sinval traffic is very high. Patch by me. Review and testing by Noah Misch.
*	Minor message style adjustment	Peter Eisentraut	2011-07-27
*	Check to see whether libxml2 handles error context the way we expect.	Tom Lane	2011-07-26
	It turns out to be possible to link against a libxml2.so that does this differently than the version we configured and built against, so we need a runtime check to avoid bizarre behavior. Per report from Bernd Helmle. Patch by Florian Pflug.
*	Replace printf format %i by %d	Peter Eisentraut	2011-07-26
	They are identical, but the overwhelming majority of the code uses %d, so standardize on that.
*	Silence compiler warning about uninitialized variable.	Andrew Dunstan	2011-07-25
	It is set correctly on the only path that uses it, but the compiler can't know that.
*	Use OpenSSL's SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER flag.	Tom Lane	2011-07-24
	This disables an entirely unnecessary "sanity check" that causes failures in nonblocking mode, because OpenSSL complains if we move or compact the write buffer. The only actual requirement is that we not modify pending data once we've attempted to send it, which we don't. Per testing and research by Martin Pihlak, though this fix is a lot simpler than his patch. I put the same change into the backend, although it's less clear whether it's necessary there. We do use nonblock mode in some situations in streaming replication, so seems best to keep the same behavior in the backend as in libpq. Back-patch to all supported releases.
*	Rethink behavior of CREATE OR REPLACE during CREATE EXTENSION.	Tom Lane	2011-07-23
	The original implementation simply did nothing when replacing an existing object during CREATE EXTENSION. The folly of this was exposed by a report from Marc Munro: if the existing object belongs to another extension, we are left in an inconsistent state. We should insist that the object does not belong to another extension, and then add it to the current extension if not already a member.
*	Unbreak unlogged tables.	Robert Haas	2011-07-22
	I broke this in commit 5da79169d3e9f0fab47da03318c44075b3f824c5, which was obviously insufficiently well tested. Add some regression tests in the hope of making future slip-ups more likely to be noticed.
*	Make xpath() do something useful with XPath expressions that return scalars.	Tom Lane	2011-07-21
	Previously, xpath() simply returned an empty array if the expression did not yield a node set. This is useless for expressions that return scalars, such as one with name() at the top level. Arrange to return the scalar value as a single-element xml array, instead. (String values will be suitably escaped.) This change will also cause xpath_exists() to return true, not false, for such expressions. Florian Pflug, reviewed by Radoslaw Smogura
*	Ensure that xpath() escapes special characters in string values.	Tom Lane	2011-07-20
	Without this it's possible for the output to not be legal XML, as illustrated by the added regression test cases. NB: this change will need to be called out as an incompatibility in the 9.2 release notes, since it's possible somebody was relying on the old behavior, even though it's clearly wrong. Florian Pflug, reviewed by Radoslaw Smogura
*	Support SECURITY LABEL on databases, tablespaces, and roles.	Robert Haas	2011-07-20
	This requires a new shared catalog, pg_shseclabel. Along the way, fix the security_label regression tests so that they don't monkey with the labels of any pre-existing objects. This is unlikely to matter in practice, since only the label for the "dummy" provider was being manipulated. But this way still seems cleaner. KaiGai Kohei, with fairly extensive hacking by me.
*	Rewrite libxml error handling to be more robust.	Tom Lane	2011-07-20
	libxml reports some errors (like invalid xmlns attributes) via the error handler hook, but still returns a success indicator to the library caller. This causes us to miss some errors that are important to report. Since the "generic" error handler hook doesn't know whether the message it's getting is for an error, warning, or notice, stop using that and instead start using the "structured" error handler hook, which gets enough information to be useful. While at it, arrange to save and restore the error handler hook setting in each libxml-using function, rather than assuming we can set and forget the hook. This should improve the odds of working nicely with third-party libraries that also use libxml. In passing, volatile-ize some local variables that get modified within PG_TRY blocks. I noticed this while testing with an older gcc version than I'd previously tried to compile xml.c with. Florian Pflug and Tom Lane, with extensive review/testing by Noah Misch
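A minimal sketch in C of why such variables need volatile. PG_TRY is built on sigsetjmp(); this stand-in uses plain setjmp()/longjmp(), and all names are hypothetical. The C standard leaves a non-volatile automatic variable indeterminate after longjmp() if it was modified between the setjmp() call and the jump.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf demo_error_env;

/* Stand-in for ereport(ERROR): jump back to the active "try" block. */
static void
demo_throw(void)
{
    longjmp(demo_error_env, 1);
}

/*
 * Run three steps, "throwing" at step fail_at (0 means never), and report
 * how many steps finished. steps_done changes between setjmp() and
 * longjmp(); without volatile, an optimizing compiler may keep it in a
 * register that the jump restores to its value at setjmp() time.
 */
static int
demo_count_steps(int fail_at)
{
    volatile int steps_done = 0;

    if (setjmp(demo_error_env) == 0)
    {
        /* "try" block */
        for (int i = 1; i <= 3; i++)
        {
            if (i == fail_at)
                demo_throw();
            steps_done++;
        }
    }
    else
    {
        /* "catch" block: steps_done is reliable here only because it is volatile */
        assert(fail_at >= 1 && fail_at <= 3);
    }
    return steps_done;
}
```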
*	Remove O(N^2) performance issue with multiple SAVEPOINTs.	Simon Riggs	2011-07-19
	Subtransaction locks are now released en masse at main commit, rather than repeatedly re-scanning for locks as we ascend the nested transaction tree. Split transaction state TBLOCK_SUBEND into two states, TBLOCK_SUBCOMMIT and TBLOCK_SUBRELEASE to allow the commit path to be optimised using the existing code in ResourceOwnerRelease() which appears to have been intended for this usage, judging from comments therein.
*	Some refinement for the "fast path" lock patch.	Robert Haas	2011-07-19
	1. In GetLockStatusData, avoid initializing instance before we've ensured that the array is large enough. Otherwise, if repalloc moves the block around, we're hosed. 2. Add the word "Relation" to the name of some identifiers, to avoid assuming that the fast-path mechanism will only ever apply to relations (though these particular parts certainly will). Some of the macros could possibly use similar treatment, but the names are getting awfully long already. 3. Add a missing word to comment in AtPrepare_Locks().
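A minimal sketch in C of the ordering in point 1, with plain realloc() standing in for repalloc() and hypothetical struct names: grow the array first, then take the pointer to the new slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record and growable array, standing in for lock-status data. */
typedef struct
{
    int         lock_id;
    int         mode;
} demo_lock_entry;

typedef struct
{
    demo_lock_entry *items;
    size_t      n;
    size_t      cap;
} demo_lock_array;

/*
 * Append one entry. realloc() may move the block, so a pointer into the
 * array computed before the call could point at freed memory afterwards;
 * the slot pointer is therefore taken only after any growth.
 */
static bool
demo_append(demo_lock_array *arr, int lock_id, int mode)
{
    demo_lock_entry *instance;

    if (arr->n == arr->cap)
    {
        size_t      newcap = arr->cap ? arr->cap * 2 : 4;
        demo_lock_entry *grown = realloc(arr->items, newcap * sizeof(*grown));

        if (grown == NULL)
            return false;
        arr->items = grown;
        arr->cap = newcap;
    }

    instance = &arr->items[arr->n++];   /* computed after any move */
    assert(arr->n <= arr->cap);
    instance->lock_id = lock_id;
    instance->mode = mode;
    return true;
}
```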