postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	Make CREATE/ALTER FUNCTION support NOT LEAKPROOF.	Robert Haas	2012-02-15
\| \| \| \|	Because it isn't good to be able to turn things on, and not off again.
*	Preserve column names in the execution-time tupledesc for a RowExpr.	Tom Lane	2012-02-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hstore and json datatypes both have record-conversion functions that pay attention to column names in the composite values they're handed. We used to not worry about inserting correct field names into tuple descriptors generated at runtime, but given these examples it seems useful to do so. Observe the nicer-looking results in the regression tests whose results changed. catversion bump because there is a subtle change in requirements for stored rule parsetrees: RowExprs from ROW() constructs now have to include field names. Andrew Dunstan and Tom Lane
*	Allow LEAKPROOF functions for better performance of security views.	Robert Haas	2012-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't normally allow quals to be pushed down into a view created with the security_barrier option, but functions without side effects are an exception: they're OK. This allows much better performance in common cases, such as when using an equality operator (that might even be indexable). There is an outstanding issue here with the CREATE FUNCTION / ALTER FUNCTION syntax: there's no way to use ALTER FUNCTION to unset the leakproof flag. But I'm committing this as-is so that it doesn't have to be rebased again; we can fix up the grammar in a future commit. KaiGai Kohei, with some wordsmithing by me.
*	Fix heap_multi_insert to set t_self field in the caller's tuples.	Heikki Linnakangas	2012-02-13
\| \| \| \| \| \| \| \|	If tuples were toasted, heap_multi_insert didn't update the ctid on the original tuples. This caused a failure if there was an after trigger (including a foreign key) on the table and a tuple got toasted. Per off-list report and test case from Ted Phelps.
*	Add a comment to AdjustIntervalForTypmod to reduce chance of future bugs.	Robert Haas	2012-02-09
\| \| \| \| \| \|	It's not entirely evident how the logic here relates to the interval_transform function, so let's clue people in that they need to check that if the rules change.
*	Improve interval_transform function to detect a few more cases.	Robert Haas	2012-02-09
\| \| \| \|	Noah Misch, per a review comment from me.
*	Add new keywords SNAPSHOT and TYPES to the keyword list in gram.y	Heikki Linnakangas	2012-02-09
\| \| \| \| \| \| \| \|	These were added to kwlist.h as unreserved keywords in separate patches, but authors forgot to add them to the corresponding list in gram.y. Because of that, even though they were supposed to be unreserved keywords, they could not be used as identifiers. src/tools/check_keywords.pl is your friend.
*	Throw error sooner for unlogged GiST indexes.	Tom Lane	2012-02-08
\| \| \| \| \| \|	Throwing an error only after we've built the main index fork is pretty unfriendly when the table already contains data. Per gripe from Jay Levitt.
*	Check misplaced window functions before checking aggregate/group by sanity.	Tom Lane	2012-02-08
\| \| \| \| \| \| \| \| \| \| \|	If somebody puts a window function in WHERE, we should complain about that in so many words. The previous coding tended to complain about the window function's arguments instead, which is likely to be misleading to users who are unclear on the semantics of window functions; as seen for example in bug #6440 from Matyas Novak. Just another example of how "add new code at the end" is frequently a bad heuristic.
*	Add transform functions for various temporal typmod coercions.	Robert Haas	2012-02-08
\| \| \| \| \| \|	This enables ALTER TABLE to skip table and index rebuilds in some cases. Noah Misch, with trivial changes by me.
*	Rename LWLockWaitUntilFree to LWLockAcquireOrWait.	Heikki Linnakangas	2012-02-08
\| \| \| \| \|	LWLockAcquireOrWait makes it more clear that the lock is acquired if it's free.
*	Fix typos pointed out by Noah Misch.	Robert Haas	2012-02-07
\|
*	Add a transform function for varbit typmod coercions.	Robert Haas	2012-02-07
\| \| \| \| \| \| \| \|	This enables ALTER TABLE to skip table and index rebuilds when the new type is unconstrained varbit, or when the allowable number of bits is not decreasing. Noah Misch, with review and a fix for an OID collision by me.
*	Add a transform function for numeric typmod coercions.	Robert Haas	2012-02-07
\| \| \| \| \| \| \| \| \|	This enables ALTER TABLE to skip table and index rebuilds when a column is changed to an unconstrained numeric, or when the scale is unchanged and the precision does not decrease. Noah Misch, with a few stylistic changes and a fix for an OID collision by me.
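The numeric rule in the message above can be written as a small predicate. This is only a sketch, not the PostgreSQL transform function: the function name and the representation of a type modifier as `None` (unconstrained) or a `(precision, scale)` tuple are assumptions made for illustration. The message does not spell out the unconstrained-to-constrained case; the sketch conservatively treats it as needing a rebuild.

```python
from typing import Optional, Tuple

# Hypothetical representation: None means unconstrained numeric,
# otherwise a (precision, scale) pair. This is not PostgreSQL's packed typmod.
Typmod = Optional[Tuple[int, int]]


def numeric_rebuild_can_be_skipped(old: Typmod, new: Typmod) -> bool:
    """Return True when the rule quoted in the commit message applies."""
    if new is None:
        # Changing to an unconstrained numeric: rebuild can be skipped.
        return True
    if old is None:
        # Assumption: unconstrained -> constrained is treated as needing a rebuild.
        return False
    old_precision, old_scale = old
    new_precision, new_scale = new
    # Scale unchanged and precision not decreasing.
    return new_scale == old_scale and new_precision >= old_precision
```

For example, widening `(10, 2)` to `(12, 2)` satisfies the rule, while changing the scale to `(12, 3)` does not.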
*	Add TIMING option to EXPLAIN, to allow elimination of timing overhead.	Robert Haas	2012-02-07
\| \| \| \| \| \| \| \|	Sometimes it may be useful to get actual row counts out of EXPLAIN (ANALYZE) without paying the cost of timing every node entry/exit. With this patch, you can say EXPLAIN (ANALYZE, TIMING OFF) to get that. Tomas Vondra, reviewed by Eric Theise, with minor doc changes by me.
*	When building with LWLOCK_STATS, initialize the stats in LWLockWaitUntilFree.	Heikki Linnakangas	2012-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If LWLockWaitUntilFree was called before the first LWLockAcquire call, you would either crash because of access to an uninitialized array or account the acquisition incorrectly. LWLockConditionalAcquire doesn't have this problem because it doesn't update the lwlock stats. In practice, this never happens because there is no codepath where you would call LWLockWaitUntilFree before LWLockAcquire after a new process is launched. But that's just accidental; there's no guarantee that that's always going to be true in the future. Spotted by Jeff Janes.
*	Fix postmaster to attempt restart after a hot-standby crash.	Tom Lane	2012-02-06
\| \| \| \| \| \| \| \| \| \| \| \|	The postmaster was coded to treat any unexpected exit of the startup process (i.e., the WAL replay process) as a catastrophic crash, and not try to restart it. This was OK so long as the startup process could not have any sibling postmaster children. However, if a hot-standby backend crashes, we SIGQUIT the startup process along with everything else, and the resulting exit is hardly "unexpected". Treating it as such meant we failed to restart a standby server after any child crash at all, not only a crash of the WAL replay process as intended. Adjust that. Back-patch to 9.0 where hot standby was introduced.
*	Avoid throwing ERROR during WAL replay of DROP TABLESPACE.	Tom Lane	2012-02-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless removal of the tablespace's directories succeeds, that does not guarantee that the same operation will succeed during WAL replay. Foreseeable reasons for it to fail include temp files created in the tablespace by Hot Standby backends, wrong directory permissions on a standby server, etc etc. The original coding threw ERROR if replay failed to remove the directories, but that is a serious overreaction. Throwing an error aborts recovery, and worse means that manual intervention will be needed to get the database to start again, since otherwise the same error will recur on subsequent attempts to replay the same WAL record. And the consequence of failing to remove the directories is only that some probably-small amount of disk space is wasted, so it hardly seems justified to throw an error. Accordingly, arrange to report such failures as LOG messages and keep going when a failure occurs during replay. Back-patch to 9.0 where Hot Standby was introduced. In principle such problems can occur in earlier releases, but Hot Standby increases the odds of trouble significantly. Given the lack of field reports of such issues, I'm satisfied with patching back as far as the patch applies easily.
*	Add locking around WAL-replay modification of shared-memory variables.	Tom Lane	2012-02-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Originally, most of this code assumed that no Postgres backends could be running concurrently with it, and so no locking could be needed. That assumption fails in Hot Standby. While it's still true that Hot Standby backends should never change values like nextXid, they can examine them, and consistency is important in some cases such as when computing a snapshot. Therefore, prudence requires that WAL replay code obtain the relevant locks when modifying such variables, even though it can examine them without taking a lock. We were following that coding rule in some places but not all. This commit applies the coding rule uniformly to all updates of ShmemVariableCache and MultiXactState fields; a search of the replay routines did not find any other cases that seemed to be at risk. In addition, this commit fixes a longstanding thinko in replay of NEXTOID and checkpoint records: we tried to advance nextOid only if it was behind the value in the WAL record, but the comparison would draw the wrong conclusion if OID wraparound had occurred since the previous value. Better to just unconditionally assign the new value, since OID assignment shouldn't be happening during replay anyway. The additional locking seems to be more in the nature of future-proofing than fixing any live bug, so I am not going to back-patch it. The NEXTOID fix will be back-patched separately.
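The NEXTOID problem described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not the replay code itself: the function names are hypothetical, and the only fact relied on is that OIDs are 32-bit unsigned counters that wrap around.

```python
OID_MODULUS = 2 ** 32  # OIDs are 32-bit unsigned counters


def replay_conditional(next_oid: int, record_oid: int) -> int:
    """Old behaviour as described: advance only if we appear to be behind."""
    if next_oid < record_oid:
        return record_oid
    return next_oid


def replay_unconditional(next_oid: int, record_oid: int) -> int:
    """Fixed behaviour as described: always take the value from the WAL record."""
    return record_oid


# The counter was close to the top of its range, then wrapped to a small value.
before_wrap = OID_MODULUS - 10
after_wrap = (before_wrap + 50) % OID_MODULUS

# The conditional version keeps the stale pre-wraparound value.
stale = replay_conditional(before_wrap, after_wrap)
# The unconditional version follows the record.
fresh = replay_unconditional(before_wrap, after_wrap)
```

Because the wrapped value is numerically smaller than the old one, the "only advance if behind" test never fires, which is the wrong conclusion the message refers to.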
*	Fix transient clobbering of shared buffers during WAL replay.	Tom Lane	2012-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RestoreBkpBlocks was in the habit of zeroing and refilling the target buffer; which was perfectly safe when the code was written, but is unsafe during Hot Standby operation. The reason is that we have coding rules that allow backends to continue accessing a tuple in a heap relation while holding only a pin on its buffer. Such a backend could see transiently zeroed data, if WAL replay had occasion to change other data on the page. This has been shown to be the cause of bug #6425 from Duncan Rance (who deserves kudos for developing a sufficiently-reproducible test case) as well as Bridget Frey's re-report of bug #6200. It most likely explains the original report as well, though we don't yet have confirmation of that. To fix, change the code so that only bytes that are supposed to change will change, even transiently. This actually saves cycles in RestoreBkpBlocks, since it's not writing the same bytes twice. Also fix seq_redo, which has the same disease, though it has to work a bit harder to meet the requirement. So far as I can tell, no other WAL replay routines have this type of bug. In particular, the index-related replay routines, which would certainly be broken if they had to meet the same standard, are not at risk because we do not have coding rules that allow access to an index page when not holding a buffer lock on it. Back-patch to 9.0 where Hot Standby was added.
*	Improve comment.	Tom Lane	2012-02-04
\|
*	Add missing Assert and fix inaccurate elog message in standby_redo().	Tom Lane	2012-02-04
\| \| \| \| \| \|	All other WAL redo routines either call RestoreBkpBlocks() or Assert that they haven't been passed any backup blocks. Make this one do likewise. Also, fix incorrect routine name in its failure message.
*	Allow SQL-language functions to reference parameters by name.	Tom Lane	2012-02-04
\| \| \| \|	Matthew Draper, reviewed by Hitoshi Harada
*	Add array_to_json and row_to_json functions.	Andrew Dunstan	2012-02-03
\| \| \| \| \| \| \|	Also move the escape_json function from explain.c to json.c where it seems to belong. Andrew Dunstan, reviewed by Abhijit Menon-Sen.
*	Allow spgist's text_ops to handle pattern-matching operators.	Robert Haas	2012-02-02
\| \| \| \| \| \| \|	This was presumably intended to work this way all along, but a few key bits of indxpath.c didn't get the memo. Robert Haas and Tom Lane
*	Avoid re-checking for visibility map extension too frequently.	Robert Haas	2012-02-01
\| \| \| \| \| \| \| \| \| \|	When testing bits (but not when setting or clearing them), we now won't check whether the map has been extended. This significantly improves performance in the case where the visibility map doesn't exist yet, by avoiding an extra system call per tuple. To make sure backends notice eventually, send an smgr inval on VM extension. Dean Rasheed, with minor modifications by me.
*	initdb: Add options --auth-local and --auth-host	Peter Eisentraut	2012-02-01
\| \| \| \|	reviewed by Robert Haas and Pavel Stehule
*	Try to be more consistent about accepting denormalized float8 numbers.	Tom Lane	2012-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \|	On some platforms, strtod() reports ERANGE for a denormalized value (ie, one that can be represented as distinct from zero, but is too small to have full precision). On others, it doesn't. It seems better to try to accept these values consistently, so add a test to see if the result value indicates a true out-of-range condition. This should be okay per Single Unix Spec. On machines where the underlying math isn't IEEE standard, the behavior for such small numbers may not be very consistent, but then it wouldn't be anyway. Marti Raudsepp, after a proposal by Jeroen Vermeulen
*	Built-in JSON data type.	Robert Haas	2012-01-31
\| \| \| \| \| \| \| \| \| \|	Like the XML data type, we simply store JSON data as text, after checking that it is valid. More complex operations such as canonicalization and comparison may come later, but this is enough for now. There are a few open issues here, such as whether we should attempt to detect UTF-8 surrogate pairs represented as \uXXXX\uYYYY, but this gets the basic framework in place.
*	Fix bug in the new wait-until-lwlock-is-free mechanism.	Heikki Linnakangas	2012-01-31
\| \| \| \| \| \|	If there was a wait-until-free process at the head of the wait queue, followed by an exclusive locker, the exclusive locker was not woken up as it should have been.
*	Add sequence USAGE privileges to information schema	Peter Eisentraut	2012-01-30
\| \| \| \| \| \| \|	The sequence USAGE privilege is sufficiently similar to the SQL standard that it seems reasonable to show in the information schema. Also add some compatibility notes about it on the GRANT reference page.
*	Make group commit more effective.	Heikki Linnakangas	2012-01-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a backend needs to flush the WAL, and someone else is already flushing the WAL, wait until it releases the WALInsertLock and check if we still need to do the flush or if the other backend already did the work for us, before acquiring WALInsertLock. This helps group commit, because when the WAL flush finishes, all the backends that were waiting for it can be woken up in one go, and they can all concurrently observe that they're done, rather than being woken up one by one in a cascading fashion. This is based on a new LWLock function, LWLockWaitUntilFree(), which has peculiar semantics. If the lock is immediately free, it grabs the lock and returns true. If it's not free, it waits until it is released, but then returns false without grabbing the lock. This is used in XLogFlush(), so that when the lock is acquired, the backend flushes the WAL, but if it's not, the backend first checks the current flush location before retrying. Original patch and benchmarking by Peter Geoghegan and Simon Riggs, although this patch as committed ended up being very different from that.
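The "peculiar semantics" described above can be modelled in a few lines. This is a toy sketch built on Python's `threading.Condition`, not the LWLock implementation: the class name, the method names, and the release counter used to detect a release are all assumptions made for illustration.

```python
import threading


class AcquireOrWaitLock:
    """Toy model: grab the lock if free, otherwise wait for a release and give up."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._held = False
        self._releases = 0  # bumped on every release so waiters can detect one

    def acquire_or_wait(self) -> bool:
        with self._cond:
            if not self._held:
                # Lock is immediately free: take it and report success.
                self._held = True
                return True
            # Lock is busy: sleep until the next release, then return
            # WITHOUT taking the lock.
            seen = self._releases
            while self._releases == seen:
                self._cond.wait()
            return False

    def release(self) -> None:
        with self._cond:
            self._held = False
            self._releases += 1
            # Wake every waiter in one go.
            self._cond.notify_all()
```

Following the message's description, a caller that gets `True` would perform the flush itself and then release the lock, while a caller that gets `False` would first recheck the current flush location and retry only if its data is still unflushed.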
*	Minor bug fix and cleanup from self-review of sync rep queues patch.	Simon Riggs	2012-01-30
\|
*	Various minor comments changes from bgwriter to checkpointer.	Simon Riggs	2012-01-30
\|
*	Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.	Heikki Linnakangas	2012-01-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When default_text_search_config, default_tablespace, or temp_tablespaces setting is set per-user or per-database, with an "ALTER USER/DATABASE SET ..." statement, don't throw an error if the text search configuration or tablespace does not exist. In case of text search configuration, even if it doesn't exist in the current database, it might exist in another database, where the setting is intended to have its effect. This behavior is now the same as search_path's. Tablespaces are cluster-wide, so the same argument doesn't hold for tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER SET ..." statements before the "CREATE TABLESPACE" statements. Arguably that's pg_dumpall's fault - it should dump the statements in such an order that the tablespace is created first and then the "ALTER USER SET default_tablespace ..." statements after that - but it seems better to be consistent with search_path and default_text_search_config anyway. Besides, you could still create a dump that throws an error, by creating the tablespace, running "ALTER USER SET default_tablespace", then dropping the tablespace and running pg_dumpall on that. Backpatch to all supported versions.
*	Assorted comment fixes, mostly just typos, but some obsolete statements.	Tom Lane	2012-01-29
\| \| \| \|	YAMAMOTO Takashi
*	Tweak index costing for problems with partial indexes.	Tom Lane	2012-01-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	btcostestimate() makes an estimate of the number of index tuples that will be visited based on knowledge of which index clauses can actually bound the scan within nbtree. However, it forgot to account for partial indexes in this calculation, with the result that the cost of the index scan could be significantly overestimated for a partial index. Fix that by merging the predicate with the abbreviated indexclause list, in the same way as we do with the full list to estimate how many heap tuples will be visited. Also, slightly increase the "fudge factor" that's meant to give preference to smaller indexes over larger ones. While this is applied to all indexes, it's most important for partial indexes since it can be the only factor that makes a partial index look cheaper than a similar full index. Experimentation shows that the existing value is so small as to easily get swamped by noise such as page-boundary-roundoff behavior. I'm tempted to kick it up more than this, but will refrain for now. Per report from Ruben Blanco. These are long-standing issues, but given the lack of prior complaints I'm not going to risk changing planner behavior in back branches by back-patching.
*	Fix pushing of index-expression qualifications through UNION ALL.	Tom Lane	2012-01-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 57664ed25e5dea117158a2e663c29e60b3546e1c, I made the planner wrap non-simple-variable outputs of appendrel children (IOW, child SELECTs of UNION ALL subqueries) inside PlaceHolderVars, in order to solve some issues with EquivalenceClass processing. However, this means that any upper-level WHERE clauses mentioning such outputs will now contain PlaceHolderVars after they're pushed down into the appendrel child, and that prevents indxpath.c from recognizing that they could be matched to index expressions. To fix, add explicit stripping of PlaceHolderVars from index operands, same as we have long done for RelabelType nodes. Add a regression test covering both this and the plain-UNION case (which is a totally different code path, but should also be able to do it). Per bug #6416 from Matteo Beccati. Back-patch to 9.1, same as the previous change.
*	Fix handling of init_plans list in inheritance_planner().	Tom Lane	2012-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Formerly we passed an empty list to each per-child-table invocation of grouping_planner, and then merged the results into the global list. However, that fails if there's a CTE attached to the statement, because create_ctescan_plan uses the list to find the plan referenced by a CTE reference; so it was unable to find any CTEs attached to the outer UPDATE or DELETE. But there's no real reason not to use the same list throughout the process, and doing so is simpler and faster anyway. Per report from Josh Berkus of "could not find plan for CTE" failures. Back-patch to 9.1 where we added support for WITH attached to UPDATE or DELETE. Add some regression test cases, too.
*	Fix handling of data-modifying CTE subplans in EvalPlanQual.	Tom Lane	2012-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can't just skip initializing such subplans, because the referencing CTE node will expect to find the subplan available when it initializes. That in turn means that ExecInitModifyTable must allow the case (which actually it needed to do anyway, since there's no guarantee that ModifyTable is exactly at the top of the CTE plan tree). So move the complaint about not being allowed in EvalPlanQual mode to execution instead of initialization. Testing turned up yet another problem, which is that we'd try to re-initialize the result relation's index list, leading to leaks and dangling pointers. Per report from Phil Sorber. Back-patch to 9.1 where data-modifying CTEs were introduced.
*	Prevent logging "failed to stat file: success" for temp files	Magnus Hagander	2012-01-28
\| \| \| \| \| \| \|	This was broken in commit bc3347484a7bf9eddb98e4352d84599cae9a31c6, the addition of statistics counters for temp files. Reported by Thom Brown
*	Undo 8.4-era lobotomization of subquery pullup rules.	Tom Lane	2012-01-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After the planner was fixed to convert some IN/EXISTS subqueries into semijoins or antijoins, we had to prevent it from doing that in some cases where the plans risked getting much worse. The reason the plans got worse was that in the unoptimized implementation, subqueries could reference parameters from the outer query at any join level, and so full table scans could be avoided even if they were one or more levels of join below where the semi/anti join would be. Now that we have sufficient mechanism in the planner to handle such cases properly, it should no longer be necessary to play dumb here. This reverts commits 07b9936a0f10d746e5076239813a5e938f2f16be and cd1f0d04bf06938c0ee5728fc8424d62bcf2eef3. The latter was a stopgap fix that wasn't really sufficiently analyzed at the time. Rather than just restricting ourselves to cases where the new join can be stacked on the right-hand input, we should also consider whether it can be stacked on the left-hand input.
*	Use parameterized paths to generate inner indexscans more flexibly.	Tom Lane	2012-01-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the planner so that it can generate nestloop-with- inner-indexscan plans even with one or more levels of joining between the indexscan and the nestloop join that is supplying the parameter. The executor was fixed to handle such cases some time ago, but the planner was not ready. This should improve our plans in many situations where join ordering restrictions formerly forced complete table scans. There is probably a fair amount of tuning work yet to be done, because of various heuristics that have been added to limit the number of parameterized paths considered. However, we are not going to find out what needs to be adjusted until the code gets some real-world use, so it's time to get it in there where it can be tested easily. Note API change for index AM amcostestimate functions. I'm not aware of any non-core index AMs, but if there are any, they will need minor adjustments.
*	Show default privileges in information schema	Peter Eisentraut	2012-01-27
\| \| \| \| \| \| \| \| \| \| \| \|	Hitherto, the information schema only showed explicitly granted privileges that were visible in the *acl catalog columns. If no privileges had been granted, the implicit privileges were not shown. To fix that, add an SQL-accessible version of the acldefault() function, and use that inside the aclexplode() calls to substitute the catalog-specific default privilege set for null values. reviewed by Abhijit Menon-Sen
*	Revert unfortunate whitespace change	Peter Eisentraut	2012-01-27
\| \| \| \| \| \| \|	In e5e2fc842c418432756d8b5825ff107c6c5fc4c3, blank lines were removed after a comment block, which now looks as though the comment refers to the immediately following code, but it actually refers to the preceding code. So put the blank lines back.
*	Disallow ALTER DOMAIN on non-domain type everywhere	Peter Eisentraut	2012-01-27
\| \| \| \| \| \|	This has been the behavior already in most cases, but through omission, ALTER DOMAIN / OWNER TO and ALTER DOMAIN / SET SCHEMA would silently work on non-domain types as well.
*	Hide most variable-length fields from Form_pg_* structs	Peter Eisentraut	2012-01-27
\| \| \| \| \| \| \| \| \| \| \| \| \|	Those fields only appear in the structs so that genbki.pl can create the BKI bootstrap files for the catalogs. But they are not actually usable from C. So hiding them can prevent coding mistakes, saves stack space, and can help the compiler. In certain catalogs, the first variable-length field has been kept visible after manual inspection. These exceptions are noted in C comments. reviewed by Tom Lane
*	Do not access indclass through Form_pg_index	Peter Eisentraut	2012-01-27
\| \| \| \| \| \| \| \| \| \| \| \|	Normally, accessing variable-length members of catalog structures past the first one doesn't work at all. Here, it happened to work because indnatts was checked to be 1, and so the defined FormData_pg_index layout, using int2vector[1] and oidvector[1] for variable-length arrays, happened to match the actual memory layout. But it's a very fragile assumption, and it's not in a performance-critical path, so code it properly using heap_getattr() instead. bug analysis by Tom Lane
*	Initialize the new bgwriterLatch field properly.	Heikki Linnakangas	2012-01-27
\| \| \| \|	Peter Geoghegan
*	Adjust tuplesort.c based on the fact that we never use the OS's qsort().	Robert Haas	2012-01-26
\| \| \| \| \| \| \| \| \| \|	Our own qsort_arg() implementation doesn't have the defect previously observed to affect only QNX 4, so it seems sufficient to assert that it isn't broken rather than retesting. Also, update a few comments to clarify why it's valuable to retain a tie-break rule based on CTID during index builds. Peter Geoghegan, with slight tweaks by me.