postgresql - postgresql mirror

	Commit message	Author	Age
*	Fix a stupid bug I introduced into XLogFlush().	Robert Haas	2012-07-02
	Commit f11e8be3e812cdbbc139c1b4e49141378b118dee broke this; it was right in Peter's original patch, but I messed it up before committing.
*	Fix position of WalSndWakeupRequest call.	Robert Haas	2012-07-02
	This avoids discriminating against wal_sync_method = open_sync or open_datasync. Fujii Masao, reviewed by Andres Freund
*	Assorted message style improvements	Peter Eisentraut	2012-07-02
*	Fix to_date's handling of year 519.	Tom Lane	2012-07-02
	A thinko in commit 029dfdf1157b6d837a7b7211cd35b00c6bcd767c caused the year 519 to be handled differently from either adjacent year, which was not the intention AFAICS. Report and diagnosis by Marc Cousin. In passing, remove redundant re-tests of year value.
*	Work a little harder on comments for walsender wakeup patch.	Robert Haas	2012-07-02
	Per gripe from Tom Lane.
*	Make commit_delay much smarter.	Robert Haas	2012-07-02
	Instead of letting every backend participating in a group commit wait independently, have the first one that becomes ready to flush WAL wait for the configured delay, and let all the others wait just long enough for that first process to complete its flush. This greatly increases the chances of being able to configure a commit_delay setting that actually improves performance. As a side consequence of this change, commit_delay now affects all WAL flushes, rather than just commits. There was some discussion on pgsql-hackers about whether to rename the GUC to, say, wal_flush_delay, but in the absence of consensus I am leaving it alone for now. Peter Geoghegan, with some changes, mostly to the documentation, by me.
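To make the leader/follower idea concrete, here is a deterministic toy model, not PostgreSQL's XLogFlush code. The function `count_group_flushes` and its parameters are hypothetical; the model assumes every backend that becomes ready before the leader's sleep ends is covered by the leader's single flush, and that backends arriving during a flush wait for the next leader.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of leader-based group commit.
 * ready[] holds the sorted times at which backends become ready to flush
 * WAL. When no flush is in progress, the first waiting backend becomes the
 * leader: it sleeps commit_delay, then performs one flush that covers every
 * backend that became ready before its sleep ended. Followers only wait for
 * that flush to finish. Returns the number of flushes performed.
 */
static size_t count_group_flushes(const long *ready, size_t n,
                                  long commit_delay, long flush_time)
{
    size_t flushes = 0;
    size_t i = 0;
    long busy_until = 0;    /* completion time of the previous flush */

    assert(commit_delay >= 0 && flush_time >= 0);

    while (i < n) {
        /* A new leader cannot start until the previous flush completes. */
        long start = ready[i] > busy_until ? ready[i] : busy_until;
        long cutoff = start + commit_delay;

        /* Everyone ready by the end of the leader's sleep shares the flush. */
        while (i < n && ready[i] <= cutoff)
            i++;

        busy_until = cutoff + flush_time;
        flushes++;
    }
    return flushes;
}
```

For arrival times 0, 10, 20, 30 and 200 with a flush time of 50, this model performs three flushes with commit_delay = 0 and two with commit_delay = 40, while the first committer waits 40 units longer.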
*	Make walsender more responsive.	Robert Haas	2012-07-02
	Per testing by Andres Freund, this improves replication performance and reduces replication latency and latency jitter. I was a bit concerned about moving more work into XLogInsert, but testing seems to show that it's not a problem in practice. Along the way, improve comments for WaitLatchOrSocket. Andres Freund. Review and stylistic cleanup by me.
*	Fix race condition in enum value comparisons.	Tom Lane	2012-07-01
	When (re)loading the typcache comparison cache for an enum type's values, use an up-to-date MVCC snapshot, not the transaction's existing snapshot. This avoids problems if we encounter an enum OID that was created since our transaction started. Per report from Andres Freund and diagnosis by Robert Haas. To ensure this is safe even if enum comparison manages to get invoked before we've set a transaction snapshot, tweak GetLatestSnapshot to redirect to GetTransactionSnapshot instead of throwing error when FirstSnapshotSet is false. The existing uses of GetLatestSnapshot (in ri_triggers.c) don't care since they couldn't be invoked except in a transaction that's already done some work --- but it seems just conceivable that this might not be true of enums, especially if we ever choose to use enums in system catalogs. Note that the comparable coding in enum_endpoint and enum_range_internal remains GetTransactionSnapshot; this is perhaps debatable, but if we changed it those functions would have to be marked volatile, which doesn't seem attractive. Back-patch to 9.1 where ALTER TYPE ADD VALUE was added.
*	Suppress compiler warnings in readfuncs.c.	Tom Lane	2012-06-30
	Commit 7357558fc8866e3a449aa9473c419b593d67b5b6 introduced "(void) token;" into the READ_TEMP_LOCALS() macro, to suppress complaints from gcc 4.6 when the value of token was not used anywhere in a particular node-read function. However, this just moved the warning around: inspection of buildfarm results shows that some compilers are now complaining that token is being read before it's set. Revert the READ_TEMP_LOCALS() macro change and instead put "(void) token;" into READ_NODE_FIELD(), which is the principal culprit for cases where the warning might occur. In principle we might need the same in READ_BITMAPSET_FIELD() and/or READ_LOCATION_FIELD(), but it seems unlikely that a node would consist only of such fields, so I'll leave them alone for now.
*	Remove inappropriate semicolons after function definitions.	Tom Lane	2012-06-30
	Solaris Studio warns about this, and some compilers might think it's an outright syntax error.
*	Declare AnonymousShmem pointer as "void *".	Tom Lane	2012-06-30
	The original coding had it as "PGShmemHeader *", but that doesn't offer any notational benefit because we don't dereference it. And it was resulting in compiler warnings on some platforms, notably buildfarm member castoroides, where mmap() and munmap() are evidently declared to take and return "char *".
*	Prevent CREATE TABLE LIKE/INHERITS from (mis)copying whole-row Vars.	Tom Lane	2012-06-30
	If a CHECK constraint or index definition contained a whole-row Var (that is, "table.*"), an attempt to copy that definition via CREATE TABLE LIKE or table inheritance produced incorrect results: the copied Var still claimed to have the rowtype of the source table, rather than the created table. For the LIKE case, it seems reasonable to just throw error for this situation, since the point of LIKE is that the new table is not permanently coupled to the old, so there's no reason to assume its rowtype will stay compatible. In the inheritance case, we should ideally allow such constraints, but doing so will require nontrivial refactoring of CREATE TABLE processing (because we'd need to know the OID of the new table's rowtype before we adjust inherited CHECK constraints). In view of the lack of previous complaints, that doesn't seem worth the risk in a back-patched bug fix, so just make it throw error for the inheritance case as well. Along the way, replace change_varattnos_of_a_node() with a more robust function map_variable_attnos(), which is capable of being extended to handle insertion of ConvertRowtypeExpr whenever we get around to fixing the inheritance case nicely, and in the meantime it returns a failure indication to the caller so that a helpful message with some context can be thrown. Also, this code will do the right thing with subselects (if we ever allow them in CHECK or indexes), and it range-checks varattnos before using them to index into the map array. Per report from Sergey Konoplev. Back-patch to all supported branches.
*	Validate xlog record header before enlarging the work area to store it.	Heikki Linnakangas	2012-06-30
	If the record header is garbled, we're now quite likely to notice it before we try to make a bogus memory allocation and run out of memory. That can still happen, if the xlog record is split across pages (we cannot verify the record header until reading the next page in that scenario), but this reduces the chances. An out-of-memory is treated as a corrupt record anyway, so this isn't a correctness issue, just a case of giving a better error message. Per Amit Kapila's suggestion.
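A minimal sketch of the validate-before-allocate order. `RecordHeader`, `MAX_RMID`, `MAX_RECORD_LEN`, `ReadState` and `prepare_buffer` are hypothetical names; real header validation would likely check more fields than the two shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record header: only the fields this sketch needs. */
typedef struct {
    uint32_t tot_len;   /* total record length, header included */
    uint8_t  rmid;      /* resource manager id */
} RecordHeader;

#define MAX_RMID        15              /* hypothetical upper bound */
#define MAX_RECORD_LEN  (1u << 30)      /* hypothetical sanity cap */

typedef struct {
    char    *buf;       /* work area for reassembling a record */
    uint32_t bufsize;
} ReadState;

/* Reject obviously garbled headers before trusting tot_len. */
static bool header_is_valid(const RecordHeader *hdr)
{
    return hdr->tot_len >= sizeof(RecordHeader) &&
           hdr->tot_len <= MAX_RECORD_LEN &&
           hdr->rmid <= MAX_RMID;
}

/* Grow the work area only after the header has been validated. */
static bool prepare_buffer(ReadState *st, const RecordHeader *hdr)
{
    assert(st != NULL && hdr != NULL);

    if (!header_is_valid(hdr))
        return false;               /* corrupt record: no allocation tried */

    if (hdr->tot_len > st->bufsize) {
        char *nbuf = realloc(st->buf, hdr->tot_len);

        if (nbuf == NULL)
            return false;           /* out of memory is also "corrupt" */
        st->buf = nbuf;
        st->bufsize = hdr->tot_len;
    }
    return true;
}
```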
*	Fix confusion between "size" and "AnonymousShmemSize".	Tom Lane	2012-06-29
	Noted by Andres Freund. Also improve a couple of comments.
*	Initialize shared memory copy of ckptXidEpoch correctly when not in recovery.	Heikki Linnakangas	2012-06-29
	This bug was introduced by commit 20d98ab6e4110087d1816cd105a40fcc8ce0a307, so backpatch this to 9.0-9.2 like that one. This fixes bug #6710, reported by Tarvi Pillessaar
*	Fix NOTIFY to cope with I/O problems, such as out-of-disk-space.	Tom Lane	2012-06-29
	The LISTEN/NOTIFY subsystem got confused if SimpleLruZeroPage failed, which would typically happen as a result of a write() failure while attempting to dump a dirty pg_notify page out of memory. Subsequently, all attempts to send more NOTIFY messages would fail with messages like "Could not read from file "pg_notify/nnnn" at offset nnnnn: Success". Only restarting the server would clear this condition. Per reports from Kevin Grittner and Christoph Berg. Back-patch to 9.0, where the problem was introduced during the LISTEN/NOTIFY rewrite.
*	Provide MAP_FAILED if sys/mman.h doesn't.	Tom Lane	2012-06-28
	On old HPUX this has to be #defined to -1. It might be that other values are required on other dinosaur systems, but we'll worry about that when and if we get reports.
*	Update outdated comment; xlp_rem_len field is in page header now.	Heikki Linnakangas	2012-06-28
	Spotted by Amit Kapila
*	Fix broken mmap failure-detection code, and improve error message.	Robert Haas	2012-06-28
	Per an observation by Thom Brown that my previous commit made an overly large shmem allocation crash the server, on Linux.
*	Dramatically reduce System V shared memory consumption.	Robert Haas	2012-06-28
	Except when compiling with EXEC_BACKEND, we'll now allocate only a tiny amount of System V shared memory (as an interlock to protect the data directory) and allocate the rest as anonymous shared memory via mmap. This will hopefully spare most users the hassle of adjusting operating system parameters before being able to start PostgreSQL with a reasonable value for shared_buffers. There are a bunch of documentation updates needed here, and we might need to adjust some of the HINT messages related to shared memory as well. But it's not 100% clear how portable this is, so before we write the documentation, let's give it a spin on the buildfarm and see what turns red.
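The anonymous mmap allocation, the MAP_FAILED fallback and the failure check described in the commits above can be sketched in POSIX C. The helper names are hypothetical, and the MAP_ANON fallback for BSD-style systems is an assumption not taken from these commits.

```c
#define _DEFAULT_SOURCE             /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Old platforms (e.g. old HP-UX) may not define MAP_FAILED at all. */
#ifndef MAP_FAILED
#define MAP_FAILED ((void *) -1)
#endif

/* Assumption: some BSD-derived systems spell the flag MAP_ANON. */
#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * Hypothetical helper: map size bytes of anonymous memory that is shared
 * with children created by fork(). Returns NULL on failure. The result is
 * void * because this helper never dereferences the mapping itself.
 */
static void *create_anonymous_shmem(size_t size)
{
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    /* mmap reports failure with MAP_FAILED, not NULL. */
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return ptr;
}

static int release_anonymous_shmem(void *ptr, size_t size)
{
    return munmap(ptr, size);
}
```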
*	Add missing space in event_source GUC description.	Robert Haas	2012-06-28
	This has apparently been wrong since event_source was added. Alexander Lakhin
*	Make UtilityContainsQuery recurse until it finds a non-utility Query.	Tom Lane	2012-06-27
	The callers of UtilityContainsQuery want it to return a non-utility Query if it returns anything at all. However, since we made CREATE TABLE AS/SELECT INTO into a utility command instead of a variant of SELECT, a command like "EXPLAIN SELECT INTO" results in two nested utility statements. So what we need UtilityContainsQuery to do is drill down to the bottom non-utility Query. I had thought of this possibility in setrefs.c, and fixed it there by looping around the UtilityContainsQuery call; but overlooked that the call sites in plancache.c have a similar issue. In those cases it's notationally inconvenient to provide an external loop, so let's redefine UtilityContainsQuery as recursing down to a non-utility Query instead. Noted by Rushabh Lathia. This is a somewhat cleaned-up version of his proposed patch.
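A sketch of the recursive drill-down, using pared-down stand-ins for parse-tree nodes. `Query`, `CmdType`, `utilityStmt` and `utility_contains_query` here are simplified hypothetical versions, not PostgreSQL's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for parse-tree nodes. */
typedef enum { CMD_SELECT, CMD_UTILITY } CmdType;

typedef struct Query {
    CmdType       commandType;
    struct Query *utilityStmt;   /* statement wrapped by a utility, or NULL */
} Query;

/*
 * Given a utility statement, return the non-utility Query it contains,
 * drilling through any number of nested utility statements (for example
 * EXPLAIN wrapping SELECT INTO). Returns NULL if there is none.
 */
static Query *utility_contains_query(Query *utility)
{
    Query *inner;

    assert(utility->commandType == CMD_UTILITY);
    inner = utility->utilityStmt;
    if (inner == NULL)
        return NULL;                              /* e.g. plain DDL */
    if (inner->commandType == CMD_UTILITY)
        return utility_contains_query(inner);     /* keep drilling down */
    return inner;
}
```

Recursing inside the function means callers such as the plancache.c call sites need no loop of their own.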
*	Fix two more neglected comments, still referring to log/seg.	Heikki Linnakangas	2012-06-27
	Fujii Masao
*	I neglected many comments in the log+seg -> 64-bit segno patch. Fix.	Heikki Linnakangas	2012-06-27
	Reported by Amit Kapila.
*	Allow pg_terminate_backend() to be used on backends with matching role.	Robert Haas	2012-06-26
	A similar change was made previously for pg_cancel_backend, so now it all matches again. Dan Farina, reviewed by Fujii Masao, Noah Misch, and Jeff Davis, with slight kibitzing on the doc changes by me.
*	When LWLOCK_STATS is defined, count spindelays.	Robert Haas	2012-06-26
	When LWLOCK_STATS is not defined, the only change is that SpinLockAcquire now returns the number of delays. Patch by me, review by Jeff Janes.
*	Cope with smaller-than-normal BLCKSZ setting in SPGiST indexes on text.	Tom Lane	2012-06-26
	The original coding failed miserably for BLCKSZ of 4K or less, as reported by Josh Kupershmidt. With the present design for text indexes, a given inner tuple could have up to 256 labels (requiring either 3K or 4K bytes depending on MAXALIGN), which means that we can't positively guarantee no failures for smaller blocksizes. But we can at least make it behave sanely so long as there are few enough labels to fit on a page. Considering that btree is also more prone to "index tuple too large" failures when BLCKSZ is small, it's not clear that we should expend more work than this on this case.
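The 3K/4K figures can be reproduced with a little alignment arithmetic, assuming each label is stored as an 8-byte tuple header plus a 2-byte label, padded to MAXALIGN. That layout is an assumption chosen because it matches the numbers quoted above; `labels_space` is hypothetical, and `TYPEALIGN` mirrors the usual round-up-to-a-power-of-two idiom.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to a multiple of alignval, which must be a power of 2. */
#define TYPEALIGN(alignval, len) \
    (((size_t) (len) + ((alignval) - 1)) & ~((size_t) ((alignval) - 1)))

/* Assumed per-label layout: 8-byte tuple header plus 2-byte label. */
#define NODE_HEADER_SIZE 8
#define LABEL_SIZE       2

/* Bytes consumed by nlabels label entries for a given MAXALIGN. */
static size_t labels_space(size_t nlabels, size_t maxalign)
{
    assert(maxalign != 0 && (maxalign & (maxalign - 1)) == 0);
    return nlabels * TYPEALIGN(maxalign, NODE_HEADER_SIZE + LABEL_SIZE);
}
```

Each entry needs 10 bytes, which rounds to 12 with 4-byte MAXALIGN (256 × 12 = 3072) and to 16 with 8-byte MAXALIGN (256 × 16 = 4096), so a worst-case inner tuple cannot fit on a 4K page.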
*	Make DROP FUNCTION hint more informative.	Robert Haas	2012-06-26
	If you decide you want to take the hint, this gives you something you can paste right back to the server. Dean Rasheed
*	Reduce use of heavyweight locking inside hash AM.	Robert Haas	2012-06-26
	Avoid using LockPage(rel, 0, lockmode) to protect against changes to the bucket mapping. Instead, an exclusive buffer content lock is now viewed as sufficient permission to modify the metapage, and a shared buffer content lock is used when such modifications need to be prevented. This more relaxed locking regimen makes it possible that, when we're busy getting a heavyweight lock on the bucket we intend to search or insert into, a bucket split might occur underneath us. To compensate for that possibility, we use a loop-and-retry system: release the metapage content lock, acquire the heavyweight lock on the target bucket, and then reacquire the metapage content lock and check that the bucket mapping has not changed. Normally it hasn't, and we're done. But if by chance it has, we simply unlock the metapage, release the heavyweight lock we acquired previously, lock the new bucket, and loop around again. Even in the worst case we cannot loop very many times here, since we don't split the same bucket again until we've split all the other buckets, and 2^N gets big pretty fast. This results in greatly improved concurrency, because we're effectively replacing two lwlock acquire-and-release cycles in exclusive mode (on one of the lock manager locks) with a single acquire-and-release cycle in shared mode (on the metapage buffer content lock). Testing shows that it's still not quite as good as btree; for that, we'd probably have to find some way of getting rid of the heavyweight bucket locks as well, which does not appear straightforward. Patch by me, review by Jeff Janes.
*	Tighten up includes in sinvaladt.h, twophase.h, proc.h	Alvaro Herrera	2012-06-25
	Remove proc.h from sinvaladt.h and twophase.h; also replace xlog.h in proc.h with xlogdefs.h.
*	Unify calling conventions for postgres/postmaster sub-main functions	Peter Eisentraut	2012-06-25
	There was a wild mix of calling conventions: Some were declared to return void and didn't return, some returned an int exit code, some claimed to return an exit code, which the callers checked, but actually never returned, and so on. Now all of these functions are declared to return void and decorated with attribute noreturn and don't return. That's easiest, and most code already worked that way.
*	Fix typo in DEBUG message, introduced by recent WAL refactoring.	Robert Haas	2012-06-25
	Fujii Masao
*	Replace int2/int4 in C code with int16/int32	Peter Eisentraut	2012-06-25
	The latter was already the dominant use, and it's preferable because in C the convention is that intXX means XX bits. Therefore, allowing mixed use of int2, int4, int8, int16, int32 is obviously confusing. Remove the typedefs for int2 and int4 for now. They don't seem to be widely used outside of the PostgreSQL source tree, and the few uses can probably be cleaned up by the time this ships.
*	Oops. Remove stray paren.	Heikki Linnakangas	2012-06-24
	I didn't notice this on my laptop as I don't HAVE_FSYNC_WRITETHROUGH.
*	Replace XLogRecPtr struct with a 64-bit integer.	Heikki Linnakangas	2012-06-24
	This simplifies code that needs to do arithmetic on XLogRecPtrs. To avoid changing the on-disk format of data pages, the LSN on data pages is still stored in the old format. That should keep pg_upgrade happy. However, we have XLogRecPtrs embedded in the control file, and in the structs that are sent over the replication protocol, so this change breaks compatibility of pg_basebackup and server. I didn't do anything about this in this patch; per discussion on -hackers, the right thing to do would be to change the replication protocol to be architecture-independent, so that you could use a newer version of pg_receivexlog, for example, against an older server version.
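A sketch of what the 64-bit representation buys: conversion to and from the two-halves layout that data pages keep, and plain integer arithmetic across the old 4GB boundary. The type and function names are modeled on, but not copied from, PostgreSQL's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Old layout, still used for the LSN stored on data pages. */
typedef struct {
    uint32_t xlogid;    /* high 32 bits: logical log file number */
    uint32_t xrecoff;   /* low 32 bits: byte offset within that file */
} PageXLogRecPtr;

/* New representation: a plain 64-bit byte position in the WAL stream. */
typedef uint64_t XLogRecPtr;

static XLogRecPtr page_lsn_to_ptr(PageXLogRecPtr p)
{
    return ((uint64_t) p.xlogid << 32) | p.xrecoff;
}

static PageXLogRecPtr ptr_to_page_lsn(XLogRecPtr ptr)
{
    PageXLogRecPtr p;

    p.xlogid = (uint32_t) (ptr >> 32);
    p.xrecoff = (uint32_t) ptr;     /* keeps the low 32 bits */
    return p;
}
```

Advancing {xlogid = 1, xrecoff = 0x10} by 0xFFFFFFF0 bytes is now a single addition that yields {2, 0}; with the struct, callers had to handle the carry into xlogid themselves.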
*	Allow WAL record header to be split across pages.	Heikki Linnakangas	2012-06-24
	This saves a few bytes of WAL space, but the real motivation is to make it predictable how much WAL space a record requires, as it no longer depends on whether we need to waste the last few bytes at end of WAL page because the header doesn't fit. The total length field of WAL record, xl_tot_len, is moved to the beginning of the WAL record header, so that it is still always found on the first page where a WAL record begins. Bump WAL version number again as this is an incompatible change.
*	Move WAL continuation record information to WAL page header.	Heikki Linnakangas	2012-06-24
	The continuation record only contained one field, xl_rem_len, so it makes things simpler to just include it in the WAL page header. This wastes four bytes on pages that don't begin with a continuation from the previous page, plus four bytes on every page, because of padding. The motivation of this is to make it easier to calculate how much space a WAL record needs. Before this patch, it depended on how many page boundaries the record crosses. The motivation of that, in turn, is to separate the allocation of space in the WAL from the copying of the record data to the allocated space. Keeping the calculation of space required simple helps to keep the critical section of allocating the space from WAL short. But that's not included in this patch yet. Bump WAL version number again, as this is an incompatible change.
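With the record header allowed to straddle pages and the remaining-length field moved into every page header, the end of a record depends only on its start position and length. A hypothetical sketch, assuming the same page-header size on every page (the real format uses a longer header on the first page of each segment):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: return the position just past a record of len
 * bytes starting at ptr. Every page starts with a header of hdr bytes;
 * because the record header may itself be split, the record simply
 * continues after the next page header, with no wasted page tail.
 * Assumes ptr does not point into a page header.
 */
static uint64_t advance_past_record(uint64_t ptr, uint32_t len,
                                    uint32_t page_size, uint32_t hdr)
{
    assert(ptr % page_size >= hdr);

    while (len > 0) {
        uint32_t room = page_size - (uint32_t) (ptr % page_size);

        if (len <= room) {
            ptr += len;             /* the rest fits on this page */
            len = 0;
        } else {
            len -= room;            /* fill this page ... */
            ptr += room + hdr;      /* ... and skip the next page header */
        }
    }
    return ptr;
}
```

For example, with 8192-byte pages and a 24-byte header, a 30-byte record starting 10 bytes before a page boundary ends at 8182 + 10 + 24 + 20 = 8236.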
*	Don't waste the last segment of each 4GB logical log file.	Heikki Linnakangas	2012-06-24
	The comments claimed that wasting the last segment made it easier to do calculations with XLogRecPtrs, because you don't have problems representing last-byte-position-plus-1 that way. In my experience, however, it only made things more complicated, because there were two ways to represent the boundary at the beginning of a logical log file: xlogid = n+1 and xrecoff = 0, or xlogid = n and xrecoff = 4GB - XLOG_SEG_SIZE. Some functions were picky about which representation was used. Also, use a 64-bit segment number instead of the log/seg combination, to point to a certain WAL segment. We assume that all platforms have a working 64-bit integer type nowadays. This is an incompatible change in WAL format, so bumping WAL version number.
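With a single 64-bit segment number, locating a segment is one division, and a WAL file name can still be split into three 8-hex-digit parts (timeline, logical file, segment within it). This sketch assumes the default 16MB segment size; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define XLOG_SEG_SIZE (16u * 1024 * 1024)   /* assumed 16MB segments */
/* Segments per 4GB logical file: all 256 are used, none skipped. */
#define SEGMENTS_PER_XLOGID (UINT64_C(0x100000000) / XLOG_SEG_SIZE)

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

static XLogSegNo ptr_to_segno(XLogRecPtr ptr)
{
    return ptr / XLOG_SEG_SIZE;
}

/* Build a 24-hex-digit WAL file name; buf needs at least 25 bytes. */
static void wal_file_name(char *buf, size_t bufsize, uint32_t tli,
                          XLogSegNo segno)
{
    snprintf(buf, bufsize, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / SEGMENTS_PER_XLOGID),
             (unsigned) (segno % SEGMENTS_PER_XLOGID));
}
```

Position 0x1FF000000 falls in segment 511, named 0000000100000001000000FF on timeline 1: the last segment (FF) of logical file 1, the one the old scheme never used.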
*	Fix memory leak in ARRAY(SELECT ...) subqueries.	Tom Lane	2012-06-21
	Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which I think can only happen if the sub-select is embedded in a larger, correlated subquery) would leak memory for the duration of the query, due to not reclaiming the array generated in the previous execution. Per bug #6698 from Armando Miraglia. Diagnosis and fix idea by Heikki, patch itself by me. This has been like this all along, so back-patch to all supported versions.
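The fix pattern, freeing the previous execution's array before publishing a new one, in a hypothetical sketch that uses malloc/free in place of PostgreSQL's memory contexts; `ArraySubplanState` and `exec_array_subplan` are illustrative names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-subplan state that keeps the latest array result. */
typedef struct {
    int   *cur_array;   /* result of the previous execution, or NULL */
    size_t cur_len;
} ArraySubplanState;

/*
 * Re-execute the sub-select and publish its rows as a new array. The
 * previous array is freed first; without that, every re-execution would
 * leak one array for the rest of the query.
 */
static const int *exec_array_subplan(ArraySubplanState *st,
                                     const int *rows, size_t nrows)
{
    int *result = malloc(nrows > 0 ? nrows * sizeof(int) : 1);

    assert(result != NULL);     /* sketch: no real error handling */
    if (nrows > 0)
        memcpy(result, rows, nrows * sizeof(int));

    free(st->cur_array);        /* reclaim the previous execution's array */
    st->cur_array = result;
    st->cur_len = nrows;
    return result;
}
```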
*	Repair comment mangled by a pgindent run long ago	Alvaro Herrera	2012-06-21
*	Add a small cache of locks owned by a resource owner in ResourceOwner.	Heikki Linnakangas	2012-06-21
	This speeds up reassigning locks to the parent owner, when the transaction holds a lot of locks, but only a few of them belong to the current resource owner. This particularly helps pg_dump when dumping a large number of objects. The cache can hold up to 15 locks in each resource owner. After that, the cache is marked as overflowed, and we fall back to the old method of scanning the whole local lock table. The tradeoff here is that the cache has to be scanned whenever a lock is released, so if the cache is too large, lock release becomes more expensive. 15 seems enough to cover pg_dump, and doesn't have much impact on lock release. Jeff Janes, reviewed by Amit Kapila and Heikki Linnakangas.
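A sketch of a small per-owner cache with an overflow marker. The names are hypothetical; only the limit of 15 and the fall-back-on-overflow behavior come from the commit message. Here a count of MAX_RESOWNER_LOCKS + 1 encodes "overflowed".

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RESOWNER_LOCKS 15

/*
 * Hypothetical per-resource-owner lock cache. nlocks counts cached locks;
 * a value above MAX_RESOWNER_LOCKS means the cache overflowed, and callers
 * must fall back to scanning the whole local lock table.
 */
typedef struct {
    int         nlocks;
    const void *locks[MAX_RESOWNER_LOCKS];
} LockCache;

static bool lock_cache_overflowed(const LockCache *c)
{
    return c->nlocks > MAX_RESOWNER_LOCKS;
}

static void lock_cache_remember(LockCache *c, const void *lock)
{
    if (lock_cache_overflowed(c))
        return;                         /* no longer tracking individually */
    if (c->nlocks < MAX_RESOWNER_LOCKS)
        c->locks[c->nlocks] = lock;
    c->nlocks++;                        /* reaching MAX + 1 marks overflow */
}

/* Runs on every lock release, which is why the cache must stay small. */
static void lock_cache_forget(LockCache *c, const void *lock)
{
    if (lock_cache_overflowed(c))
        return;
    for (int i = c->nlocks - 1; i >= 0; i--) {
        if (c->locks[i] == lock) {
            c->locks[i] = c->locks[c->nlocks - 1];   /* swap-remove */
            c->nlocks--;
            return;
        }
    }
    assert(!"lock not found in resource owner cache");
}
```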
*	Remove incomplete/incorrect support for zero-column foreign keys.	Tom Lane	2012-06-20
	The original coding in ri_triggers.c had partial support for the concept of zero-column foreign key constraints. But this is not defined in the SQL standard, nor was it ever allowed by any other part of Postgres, nor was it very fully implemented even here (e.g., there was no support for preventing PK-table deletions that would violate the constraint). Doesn't seem very useful to carry 100-plus lines of code for a corner case that no one is interested in making work. Instead, just add a check that the column list read from pg_constraint is non-empty.
*	Increase MAX_SYSCACHE_CALLBACKS from 20 to 32.	Tom Lane	2012-06-20
	By my count there are 18 callers of CacheRegisterSyscacheCallback in the core code in HEAD, so we are potentially leaving as few as 2 slots for any add-on code to use (though possibly not all these callers would actually activate in any particular session). That doesn't seem like a lot of headroom, so let's pump it up a little.
*	Cache the results of ri_FetchConstraintInfo in a backend-local cache.	Tom Lane	2012-06-20
	Extracting data from pg_constraint turned out to take as much as 10% of the runtime in a bulk-update case where the foreign key column wasn't changing, because we did it over again for each tuple. Fix that by maintaining a backend-local cache of the results. This is really a pretty small patch, but converting the trigger functions to work with pointers rather than local struct variables requires a lot of mechanical changes.
*	Improve tests for whether we can skip queueing RI enforcement triggers.	Tom Lane	2012-06-19
	During an update of a PK row, we can skip firing the RI trigger if any old key value is NULL, because then the row could not have had any matching rows in the FK table. Conversely, during an update of an FK row, the outcome is determined if any new key value is NULL. In either case it becomes unnecessary to compare individual key values. This patch was inspired by discussion of Vik Reykja's patch to use IS NOT DISTINCT semantics for the key comparisons. In the event there is no need for that and so this patch looks nothing like his, but he should still get credit for having re-opened consideration of the trigger skip logic.
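The NULL-based shortcuts can be written as two small predicates. This is a simplified, hypothetical sketch: `KeyDatum` stands in for a nullable key column, the FK side assumes MATCH SIMPLE, and the real trigger code handles further cases that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical nullable key column value. */
typedef struct {
    bool isnull;
    int  value;
} KeyDatum;

/* PK-side update: does the RI trigger need to run? */
static bool pk_update_needs_check(const KeyDatum *oldkey,
                                  const KeyDatum *newkey, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
        if (oldkey[i].isnull)
            return false;   /* no FK row could have matched the old key */
    for (int i = 0; i < nkeys; i++)
        if (newkey[i].isnull || newkey[i].value != oldkey[i].value)
            return true;    /* key changed: referencing rows may exist */
    return false;           /* key unchanged: nothing to enforce */
}

/* FK-side update under MATCH SIMPLE: does the new row need a PK lookup? */
static bool fk_update_needs_check_simple(const KeyDatum *oldkey,
                                         const KeyDatum *newkey, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
        if (newkey[i].isnull)
            return false;   /* any NULL column satisfies MATCH SIMPLE */
    for (int i = 0; i < nkeys; i++)
        if (oldkey[i].isnull || oldkey[i].value != newkey[i].value)
            return true;    /* key changed: a matching PK row must exist */
    return false;
}
```

Both predicates settle the NULL cases before comparing any individual key values, which is the shortcut the commit describes.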
*	Share RI trigger code between NO ACTION and RESTRICT cases.	Tom Lane	2012-06-19
	These triggers are identical except for whether ri_Check_Pk_Match is to be called, so factor out the common code to save a couple hundred lines. Also, eliminate null-column checks in ri_Check_Pk_Match, since they're duplicate with the calling functions and require unnecessary complication in its API statement. Simplify the way code is shared between RI_FKey_check_ins and RI_FKey_check_upd, too.
*	Improve comments about why SET DEFAULT triggers must recheck for matches.	Tom Lane	2012-06-18
	I was confused about this, so try to make it clearer for the next person. (This seems like a fairly inefficient way of dealing with a corner case, but I don't have a better idea offhand. Maybe if there were a way to turn off the RI_FKey_keyequal_upd_fk event filter temporarily?)
*	Allow ON UPDATE/DELETE SET DEFAULT plans to be cached.	Tom Lane	2012-06-18
	Once upon a time, somebody was worried that cached RI plans wouldn't get remade with new default values after ALTER TABLE ... SET DEFAULT, so they didn't allow caching of plans for ON UPDATE/DELETE SET DEFAULT actions. That time is long gone, though (and even at the time I doubt this was the greatest hazard posed by ALTER TABLE...). So allow these triggers to cache their plans just like the others. The cache_plan argument to ri_PlanCheck is now vestigial, since there are no callers that don't pass "true"; but I left it alone in case there is any future need for it.
*	Remove derived fields from RI_QueryKey, and do a bit of other cleanup.	Tom Lane	2012-06-18
	We really only need the foreign key constraint's OID and the query type code to uniquely identify each plan we are caching for FK checks. The other stuff that was in the struct had no business being used as part of a hash key, and was all just being copied from struct RI_ConstraintInfo anyway. Get rid of the unnecessary fields, and readjust various function APIs to make them use RI_ConstraintInfo not RI_QueryKey as info source. I'd be surprised if this makes any measurable performance difference, but it certainly feels cleaner.
*	Update SQL spec references in ri_triggers code to match SQL:2008.	Tom Lane	2012-06-18
	Now that what we're implementing isn't SQL92, we probably shouldn't cite chapter and verse in that spec anymore. Also fix some comments that talked about MATCH FULL but in fact were in code that's also used for MATCH SIMPLE. No code changes in this commit, just comments.