This might help us debug what's happening on some buildfarm members.
In passing, reduce the message from ereport to elog --- it doesn't seem
like this should be a user-facing case, so not worth translating.
|
The one for the OCLASS_COLLATION case was noticed by
CLOBBER_CACHE_ALWAYS buildfarm members; the others I spotted by manual
code inspection.
Also remove a redundant check.
|
These columns can be passed to pg_get_object_address() and used to
reconstruct the dropped objects' identities in a remote server containing
similar objects, so that the drop can be replicated.
Reviewed by Stephen Frost, Heikki Linnakangas, Abhijit Menon-Sen, Andres
Freund.
|
This function returns object type and objname/objargs arrays, which can
be passed to pg_get_object_address. This is especially useful because
the textual representation can be copied to a remote server in order to
obtain the corresponding OID-based address. In essence, this function
is the inverse of the recently added pg_get_object_address().
Catalog version bumped due to the addition of the new function.
Also add docs to pg_get_object_address.
|
In COMMENT, DROP, SECURITY LABEL, and the new pg_get_object_address
function, we were representing types as a list of names, same as other
objects; but types are special objects that require their own
representation to be totally accurate. In the original COMMENT code we
had a note about fixing it which was lost in the course of c10575ff005.
Change all those places to use TypeName instead, as suggested by that
comment.
Right now the original coding doesn't cause any bugs, so no backpatch.
It is more problematic for proposed future code that operates with object
addresses from the SQL interface; type details such as array-ness are
lost when working with the degraded representation.
Thanks to Petr Jelínek and Dimitri Fontaine for offlist help on finding
a solution to a shift/reduce grammar conflict.
|
Noted by Coverity.
|
This avoids an ugly-looking "cache lookup failure" message.
Ugliness pointed out by Andres Freund.
|
We were trying to acquire the lock even when we were subsequently
not sleeping in some other transaction, which opens us up unnecessarily
to deadlocks. In particular, this is troublesome if an update tries to
lock an updated version of a tuple and finds itself doing EvalPlanQual
update chain walking; more than two sessions doing this concurrently
will find themselves sleeping on each other because the HW tuple lock
acquisition in heap_lock_tuple called from EvalPlanQualFetch races with
the same tuple lock being acquired in heap_update -- one of these
sessions sleeps on the other one to finish while holding the tuple lock,
and the other one sleeps on the tuple lock.
Per trouble report from Andrew Sackville-West in
http://www.postgresql.org/message-id/20140731233051.GN17765@andrew-ThinkPad-X230
His scenario can be reduced to a relatively simple
isolationtester spec file, which I don't include in this commit; the
reason is that the current isolationtester is not able to deal with more
than one blocked session concurrently and it blocks instead of raising
the expected deadlock. In the future, if we improve isolationtester, it
would be good to include the spec file in the isolation schedule. I
posted it in
http://www.postgresql.org/message-id/20141212205254.GC1768@alvh.no-ip.org
Hat tip to Mark Kirkwood, who helped diagnose the trouble.
|
Per buildfarm member locust.
|
This reverts commit 60838df922345b26a616e49ac9fab808a35d1f85.
That change needs a bit more thought to be workable. In view of
the potentially machine-dependent stuff that went in today,
we need all of the buildfarm to be testing those other changes.
|
StrategyGetBuffer() has proven to be a bottleneck in a number of
buffer-acquisition-heavy workloads. To some degree this has already
been alleviated by 5d7962c6, but it still can be quite a heavy
bottleneck. The problem is that in unfortunate usage patterns a
single StrategyGetBuffer() call will have to look at a large number of
buffers - in turn making it likely that the process will be put to
sleep while still holding the spinlock.
Replace most of the usage of the buffer_strategy_lock spinlock for the
clock sweep by an atomic nextVictimBuffer variable. That variable,
modulo NBuffers, is the current hand of the clock sweep. The buffer
clock-sweep then only needs to acquire the spinlock after a
wraparound. And even then only in the process that did the wrapping
around. That alleviates nearly all the contention on the relevant
spinlock, although significant contention on the cacheline can still
exist.
Reviewed-By: Robert Haas and Amit Kapila
Discussion: 20141010160020.GG6670@alap3.anarazel.de,
20141027133218.GA2639@awork2.anarazel.de
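For illustration (not the committed code), a self-contained C model of the
lock-free clock hand; the names and NBUFFERS are invented, and the real code
is more careful about concurrent wraparounds:

    #include <stdatomic.h>
    #include <stdio.h>

    #define NBUFFERS 8

    /* Monotonically increasing clock hand; the victim is hand % NBUFFERS. */
    static atomic_uint next_victim;
    static unsigned complete_passes;  /* spinlock-protected in the real code */

    static unsigned
    clock_sweep_tick(void)
    {
        unsigned victim = atomic_fetch_add(&next_victim, 1);

        if (victim >= NBUFFERS)
        {
            unsigned wrapped = victim % NBUFFERS;

            /* Only the process whose increment crossed the boundary folds
             * the counter back down, so the lock (elided here) is needed
             * roughly once per NBUFFERS ticks rather than on every tick. */
            if (wrapped == 0)
            {
                atomic_fetch_sub(&next_victim, NBUFFERS);
                complete_passes++;
            }
            victim = wrapped;
        }
        return victim;
    }

    int
    main(void)
    {
        for (int i = 0; i < 3 * NBUFFERS; i++)
            printf("victim: %u\n", clock_sweep_tick());
        return 0;
    }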
|
The old LWLock implementation had the problem that concurrent lock
acquisitions required exclusively acquiring a spinlock. Often that
could lead to acquirers waiting behind the spinlock, even if the
actual LWLock was free.
The new implementation doesn't acquire the spinlock when acquiring the
lock itself. Instead the new atomic operations are used to atomically
manipulate the state. Only the waitqueue, used solely in the slow
path, is still protected by the spinlock. See lwlock.c's header for
an explanation of the algorithm used.
For some common workloads on larger machines this can yield
significant performance improvements, particularly in read-mostly
workloads.
Reviewed-By: Amit Kapila and Robert Haas
Author: Andres Freund
Discussion: 20130926225545.GB26663@awork2.anarazel.de
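As a toy model of the fast path (C11 atomics; the constants and layout are
invented, and the committed lwlock.c is considerably more involved), a shared
acquisition is just one compare-and-swap on the state word:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* One atomic word holds the lock state: an exclusive bit plus a count
     * of shared holders (layout invented for this sketch). */
    #define LW_VAL_EXCLUSIVE (1u << 24)

    typedef struct
    {
        atomic_uint state;
    } lwlock_model;

    static bool
    try_lock_shared(lwlock_model *lock)
    {
        unsigned old = atomic_load(&lock->state);

        while ((old & LW_VAL_EXCLUSIVE) == 0)
        {
            /* No exclusive holder: add ourselves as a shared holder with a
             * single CAS -- no spinlock involved. */
            if (atomic_compare_exchange_weak(&lock->state, &old, old + 1))
                return true;
            /* CAS failure refreshed "old"; loop and re-test. */
        }
        /* Exclusively held: fall back to the waitqueue, which is the slow
         * path and still spinlock-protected. */
        return false;
    }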
|
Besides being shorter and much easier to read, it changes the logic
LWLockRelease() to release all shared lockers when waking up any. This
can yield some significant performance improvements - and the fairness
isn't really much worse than before, as we always allowed new shared
lockers to jump the queue.
|
Hiding context messages usually is not a good idea - except for rather
verbose debugging/development utensils like LOG_DEBUG, where the
amount of repeated context messages just bloats the log without adding
information.
|
Back-patch to 9.4, where this problem was added.
|
Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. pglz_decompress contained a call
to elog to emit an error message in case of corrupted data. The function
is changed to return a status code so that its callers can report the error instead.
This commit is required for the upcoming WAL compression feature, so that
the WAL reader facility can decompress WAL data using pglz_decompress.
Michael Paquier
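A sketch of the resulting calling convention (the wrapper and message are
invented; the signature is the one this commit puts in the pglz header):

    #include "postgres.h"
    #include "common/pg_lzcompress.h"

    static void
    decompress_or_error(const char *src, int32 srclen, char *dst, int32 rawsize)
    {
        /* On corrupt input pglz_decompress() now returns a negative value
         * instead of elog()ing, so the caller chooses how to report it. */
        if (pglz_decompress(src, srclen, dst, rawsize) < 0)
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("compressed pglz data is corrupt")));
    }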
|
fe263d1 changed the REINDEX logic so that those fields are not used at all,
but forgot to remove them.
Sawada Masahiko
|
MSVC doesn't realize ereport(ERROR) doesn't return.
David Rowley
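The usual workaround looks like this hypothetical fragment: a dummy return
after ereport(ERROR) satisfies compilers that cannot see the call never
returns:

    static int
    lookup_thing(int kind)
    {
        switch (kind)
        {
            case 1:
                return 10;
            default:
                ereport(ERROR,
                        (errmsg("unrecognized kind: %d", kind)));
                return 0;   /* keep MSVC quiet: ereport(ERROR) does not return */
        }
    }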
|
This reverts commit 1826987a46d079458007b7b6bbcbbd852353adbb.
The overall design was deemed unacceptable, in discussion following the
previous commit message; we might find some parts of it still
salvageable, but I don't want to be on the hook for fixing it, so let's
wait until we have a new patch.
|
This allows access to get_object_address from SQL, which is useful to
obtain OID addressing information from data equivalent to that emitted
by the parser. This is necessary infrastructure for a project to let
replication systems propagate object-dropping events to remote servers,
where the schema might differ from that of the server originating the
DROP.
This patch also adds support for OBJECT_DEFAULT to get_object_address;
that is, it is now possible to refer to a column's default value.
Catalog version bumped due to the new function.
Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
Freund, Abhijit Menon-Sen, Adam Brightwell.
|
The previous representation using a boolean column for each attribute
would not scale as well as we want to add further attributes.
Extra auxiliary functions are added to go along with this change, to
make up for the old representation's lost convenience of access.
Catalog version bumped due to change in catalogs and the new functions.
Author: Adam Brightwell, minor tweaks by Álvaro
Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
|
Apart from enabling comments on domain constraints, this enables a
future project to replicate object dropping to remote servers: with the
current mechanism there's no way to distinguish between the two types of
constraints, so there's no way to know what to drop.
Also added support for the domain constraint comments in psql's \dd and
pg_dump.
Catalog version bumped due to the change in ObjectType enum.
|
This allows it to be used with ALTER ROLE SET.
Although the old setting of PGC_BACKEND prevented changes after session
start, after discussion it seemed more useful to allow ALTER ROLE SET
instead and just document that changes during a session have no effect.
This is similar to how session_preload_libraries works already.
An alternative would be to change things to allow PGC_BACKEND and
PGC_SU_BACKEND settings to be changed by ALTER ROLE SET. But that might
need further research (e.g., log_connections would probably not work).
Based on a patch by Kyotaro Horiguchi
|
We have other general-purpose data structures in src/backend/lib, so it
seems like a better home for the red-black tree as well.
|
This performs slightly better, uses less memory, and needs slightly less
code in GiST than the red-black tree previously used.
Reviewed by Peter Geoghegan
|
XLogFileInit() returns a file descriptor, which needs to be closed. The leak
was short-lived, since the startup process exits shortly afterwards, but it
was clearly a bug, nevertheless.
Per Coverity report.
|
Add "normal" and "original" flags as output columns to the
pg_event_trigger_dropped_objects() function. With this it's possible to
distinguish which objects, among those listed, need to be explicitly
referenced when trying to replicate a deletion.
This is necessary so that the list of objects can be pruned to the
minimum necessary to replicate the DROP command in a remote server that
might have a slightly different schema (for instance, TOAST tables and
constraints with different names, and such).
Catalog version bumped due to change of function definition.
Reviewed by: Abhijit Menon-Sen, Stephen Frost, Heikki Linnakangas,
Robert Haas.
|
We used time(NULL) to set a TimestampTz field, which gave bogus results.
Noticed while looking at pg_xlogdump output.
Backpatch to 9.3 and above, where fast promotion was introduced.
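The distinction, sketched (the field name here is illustrative): TimestampTz
counts microseconds since 2000-01-01, while time() returns Unix seconds, so
a raw assignment is off in both unit and epoch:

    /* WRONG: time(NULL) yields Unix seconds, not a TimestampTz */
    rec.end_time = (TimestampTz) time(NULL);

    /* Right: GetCurrentTimestamp() yields microseconds since 2000-01-01 */
    rec.end_time = GetCurrentTimestamp();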
|
In LWLockRelease() (and in 9.4+ LWLockUpdateVar()) we release enqueued
waiters using PGSemaphoreUnlock(). As there are other sources of such
unlocks, backends only wake up if MyProc->lwWaiting is set to false,
which is only done in the aforementioned functions.
Before this commit there were dangers because the store to lwWaiting
could become visible before the store to lwWaitLink. This could both
happen due to compiler reordering (on most compilers) and on some
platforms due to the CPU reordering stores.
The possible consequence of this is that a backend stops waiting
before lwWaitLink is set to NULL. If that backend then tries to
acquire another lock and has to wait there the list could become
corrupted once the lwWaitLink store is finally performed.
Add a write memory barrier to prevent that issue.
Unfortunately, barrier support was only added in 9.2. Given
that the issue has not knowingly been observed in practice, it seems
sufficient to prohibit compiler reordering using volatile for 9.0 and
9.1. Actual problems due to compiler reordering are more likely
anyway.
Discussion: 20140210134625.GA15246@awork2.anarazel.de
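The hazard, modeled with C11 atomics (struct and function invented; the
actual fix inserts pg_write_barrier() between plain stores):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct waiter
    {
        struct waiter *next;     /* stands in for lwWaitLink */
        atomic_bool    waiting;  /* stands in for lwWaiting */
    };

    static void
    wake_up(struct waiter *w)
    {
        /* First unlink the waiter from the queue... */
        w->next = NULL;

        /* ...then let it stop waiting.  The release store plays the role
         * of the commit's write memory barrier: without it the
         * waiting=false store could become visible first, the waiter could
         * re-enqueue elsewhere, and the delayed "next = NULL" store would
         * then corrupt the new queue when it finally lands. */
        atomic_store_explicit(&w->waiting, false, memory_order_release);
    }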
|
Per Tom Lane
|
Author: Jim Nasby, some kibitzing by Heikki Linnakangas.
Discussion leading to current behavior and precise wording fueled by
thoughts from Robert Haas and Andres Freund.
|
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create(). Nearly all such
callers were specifying tag_hash or oid_hash, which is tedious and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate. Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions. hash_create() itself will
take care of optimizing when the key size is four bytes.
This nets out to saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, but I didn't analyze closely.
In future we could look into offering a similar optimized hashing function
for 8-byte keys. Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.
For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules. Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.
Teodor Sigaev and Tom Lane
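A before/after sketch of a hypothetical caller (the entry type is invented;
HASHCTL and the flags are the real hsearch.h interfaces):

    #include "postgres.h"
    #include "utils/hsearch.h"

    typedef struct
    {
        Oid key;        /* hash key: 4 bytes of binary data */
        int payload;
    } MyEntry;          /* invented entry type */

    static HTAB *
    create_my_table(void)
    {
        HASHCTL ctl;

        memset(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(MyEntry);

        /* Old way: pick the support function yourself, e.g.
         *     ctl.hash = oid_hash;
         *     return hash_create("my table", 256, &ctl,
         *                        HASH_ELEM | HASH_FUNCTION);
         * which made it easy to miss that hash_uint32 is best here. */

        /* New way: just declare the key to be binary data; hash_create()
         * itself picks hash_uint32 for 4-byte keys. */
        return hash_create("my table", 256, &ctl, HASH_ELEM | HASH_BLOBS);
    }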
|
Two changes:
1. When copying a WAL segment from the old timeline to create the first segment
on the new timeline, only copy up to the point where the timeline switch
happens, and zero-fill the rest. This avoids corner cases where we might
think that the copied WAL from the previous timeline belongs to the new
timeline.
2. If the timeline switch happens at a segment boundary, don't copy the
whole old segment to the new timeline. It's pointless, because it's 100%
identical to the old segment.
|
The st_changecount protocol needs memory barriers to ensure that
the apparent order of execution is as it desires. Otherwise,
for example, the CPU might rearrange the code so that st_changecount
is incremented twice before the modification on a machine with
weak memory ordering. This surprising result can lead to bugs.
This commit introduces macros to load and store st_changecount
with the memory barriers. These are called before and after
PgBackendStatus entries are modified or copied into private memory,
in order to prevent CPU from reordering PgBackendStatus access.
Per discussion on pgsql-hackers, we decided not to back-patch this
to 9.4 or before until we get an actual bug report about this.
Patch by me. Review by Robert Haas.
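The protocol, modeled self-contained in C11 (names invented; the real code
uses the new pgstat macros and PostgreSQL's barrier primitives):

    #include <stdatomic.h>

    typedef struct
    {
        atomic_int changecount;   /* stands in for st_changecount */
        int        payload;       /* stands in for the status fields */
    } entry;

    static void
    write_entry(entry *e, int value)
    {
        /* An odd count tells readers a write is in progress; the implied
         * barriers keep the increments from being reordered around the
         * payload store, which is exactly what the commit is about. */
        atomic_fetch_add(&e->changecount, 1);
        e->payload = value;
        atomic_fetch_add(&e->changecount, 1);
    }

    static int
    read_entry(entry *e)
    {
        int before, after, value;

        do
        {
            before = atomic_load(&e->changecount);
            value = e->payload;
            after = atomic_load(&e->changecount);
        } while (before != after || (before & 1) != 0);  /* torn: retry */

        return value;
    }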
|
In generate_series_step_numeric(), the variables "start_num"
and "stop_num" may be freed before the next call, so they must be
kept somewhere that survives across calls. Previously they were not,
which could cause incorrect behavior of
generate_series(numeric, numeric). This commit fixes the problem by
copying them into multi_call_memory_ctx.
Andrew Gierth
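The fix follows the standard set-returning-function pattern, sketched below
with an invented integer-series function: state that must survive between
calls is allocated while switched into multi_call_memory_ctx:

    #include "postgres.h"
    #include "funcapi.h"

    typedef struct
    {
        int current;    /* invented cross-call state */
        int stop;
    } MyState;

    PG_FUNCTION_INFO_V1(my_series);

    Datum
    my_series(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;
        MyState         *state;

        if (SRF_IS_FIRSTCALL())
        {
            MemoryContext oldcontext;

            funcctx = SRF_FIRSTCALL_INIT();

            /* Allocations made here survive across calls; this is where
             * the fix copies start_num/stop_num instead of leaving them
             * in per-call memory. */
            oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
            state = (MyState *) palloc(sizeof(MyState));
            state->current = PG_GETARG_INT32(0);
            state->stop = PG_GETARG_INT32(1);
            funcctx->user_fctx = state;
            MemoryContextSwitchTo(oldcontext);
        }

        funcctx = SRF_PERCALL_SETUP();
        state = (MyState *) funcctx->user_fctx;

        if (state->current <= state->stop)
            SRF_RETURN_NEXT(funcctx, Int32GetDatum(state->current++));

        SRF_RETURN_DONE(funcctx);
    }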
|
Etsuro Fujita
|
When starting up from a basebackup taken off a standby, extra logic has
to be applied to compute the point where the data directory is
consistent. Normal base backups use a WAL record for that purpose, but
that isn't possible on a standby.
That logic had an error check ensuring that the cluster's control file
indicates being in recovery. Unfortunately that check was too strict,
disregarding the fact that the control file could also indicate that
the cluster was shut down while in recovery.
That's possible when a cluster started from a basebackup is shut
down before the backup label has been removed. When everything goes
well that's a short window, but when either restore_command or
primary_conninfo isn't configured correctly the window can get much
wider. That's because in between reading and unlinking the label we
restore the last checkpoint from WAL, which can need additional WAL.
To fix, simply also allow starting when the control file indicates
"shutdown in recovery". There are nicer fixes imaginable, but they'd be
more invasive.
Backpatch to 9.2 where support for taking basebackups from standbys
was added.
|
As with NOT NULL constraints, we consider that such constraints are merely
reports of constraints that are being enforced by the remote server (or
other underlying storage mechanism). Their only real use is to allow
planner optimizations, for example in constraint-exclusion checks. Thus,
the code changes here amount to little more than removal of the error that
was formerly thrown for applying CHECK to a foreign table.
(In passing, do a bit of cleanup of the ALTER FOREIGN TABLE reference page,
which had accumulated some weird decisions about ordering etc.)
Shigeru Hanada and Etsuro Fujita, reviewed by Kyotaro Horiguchi and
Ashutosh Bapat.
|
Adam Brightwell, per report from Martín Marqués.
|
MapArrayTypeName would copy up to NAMEDATALEN-1 bytes of the base type
name, which of course is wrong: after prepending '_' there is only room for
NAMEDATALEN-2 bytes. Aside from being the wrong result, this case would
lead to overrunning the statically allocated work buffer. This would be a
security bug if the function were ever used outside bootstrap mode, but it
isn't, at least not in any currently supported branches.
Aside from fixing the off-by-one loop logic, this patch gets rid of the
static work buffer by having MapArrayTypeName pstrdup its result; the sole
caller was already doing that, so this just requires moving the pstrdup
call. This saves a few bytes but mainly it makes the API a lot cleaner.
Back-patch on the off chance that there is some third-party code using
MapArrayTypeName with less-secure input. Pushing pstrdup into the function
should not cause any serious problems for such hypothetical code; at worst
there might be a short-term memory leak.
Per Coverity scanning.
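The shape of the fix as a self-contained model (simplified; the real function
also deals with bootstrap-specific details): after prepending '_' only
NAMEDATALEN-2 bytes of the base name fit, and the result is allocated rather
than static:

    #include <stdlib.h>
    #include <string.h>

    #define NAMEDATALEN 64

    static char *
    map_array_type_name(const char *base)
    {
        char   buf[NAMEDATALEN];
        size_t i = 1, j = 0;

        buf[0] = '_';
        /* Stop at NAMEDATALEN-1 characters total, leaving room for the
         * terminator: that admits only NAMEDATALEN-2 bytes of "base". */
        while (base[j] != '\0' && i < NAMEDATALEN - 1)
            buf[i++] = base[j++];
        buf[i] = '\0';

        return strdup(buf);   /* the real code pstrdup()s instead of
                               * returning a static buffer */
    }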
|
Mostly these issues concern the non-use of function results. These
have been changed to use (void) pushJsonbValue(...) instead of assigning
the result to a variable that gets overwritten before it is used.
There is a larger issue: we should perhaps revise the API of
pushJsonbValue() so that, instead of returning a value, it modifies a
state argument. The current idiom is rather clumsy. However, changing
that requires quite a bit more work, so this change should do for the
moment.
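The idiom in question, in an illustrative fragment (pushJsonbValue() and the
WJB_* tokens are the real jsonb.h interfaces; the function is invented):

    #include "postgres.h"
    #include "utils/jsonb.h"

    static Jsonb *
    build_singleton_array(JsonbValue *scalar)
    {
        JsonbParseState *state = NULL;
        JsonbValue      *res;

        /* Intermediate results used to be assigned to a variable that was
         * immediately overwritten; the (void) casts document that they
         * are deliberately discarded. */
        (void) pushJsonbValue(&state, WJB_BEGIN_ARRAY, NULL);
        (void) pushJsonbValue(&state, WJB_ELEM, scalar);
        res = pushJsonbValue(&state, WJB_END_ARRAY, NULL);  /* value kept */

        return JsonbValueToJsonb(res);
    }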
|
Backpatch the applicable parts, just to make backpatching future patches
easier.
|
"PG_RETURN_FLOAT8(x)" is not "return x", except perhaps by accident
on some platforms.
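For instance, in a hypothetical V1 function: the function must return a
Datum, and PG_RETURN_FLOAT8() performs that conversion via Float8GetDatum(),
which a bare return does not:

    #include "postgres.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(halve);

    Datum
    halve(PG_FUNCTION_ARGS)
    {
        float8 x = PG_GETARG_FLOAT8(0);

        /* return x / 2.0;  -- WRONG: a float8 is not a Datum */
        PG_RETURN_FLOAT8(x / 2.0);   /* converts via Float8GetDatum() */
    }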
|
Alexander Korotkov, reviewed by Emre Hasegeli.
|
Fabrízio de Royes Mello, reviewed by Rushabh Lathia.
|
The code for advancing through the input rows overlooked the case that we
might already be past the first row of the row pair now being considered,
in case the previous percentile also fell between the same two input rows.
Report and patch by Andrew Gierth; logic rewritten a bit for clarity by me.
|
The functions are:
to_jsonb()
jsonb_object()
jsonb_build_object()
jsonb_build_array()
jsonb_agg()
jsonb_object_agg()
Along the way, better logic is implemented in
json_categorize_type() to match that in the newly implemented
jsonb_categorize_type().
Andrew Dunstan, reviewed by Pavel Stehule and Alvaro Herrera.
|
The functions remove object fields, including in nested objects, that
have null as a value. In certain cases this can lead to considerably
smaller datums, with no loss of semantic information.
Andrew Dunstan, reviewed by Pavel Stehule.
|
This avoids duplicating the code.
Michael Paquier, reviewed by Simon Riggs and me