This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.
Greg Stark with some help from Tom Lane
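
As an illustration of the kind of code this breaks, a minimal sketch (backend-style C; `tuple`, `attnum`, `tupdesc`, and `process_bytes` are assumed context, not code from this commit): any direct access to a varlena attribute fetched out of a formed tuple must now go through a detoast step.

    bool        isnull;
    Datum       d = heap_getattr(tuple, attnum, tupdesc, &isnull);

    if (!isnull)
    {
        /* Unsafe now: using VARDATA(DatumGetPointer(d)) directly, since the
         * field may be toasted even though the tuple never went to disk. */
        struct varlena *v = PG_DETOAST_DATUM(d);    /* expands if needed */

        process_bytes(VARDATA(v), VARSIZE(v) - VARHDRSZ);
    }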
are in their commit critical sections via flags in the ProcArray. Checkpoint
can watch the ProcArray to determine when it's safe to proceed. This is
a considerably better solution to the original problem of race conditions
between checkpoint and transaction commit: it speeds up commit, since there's
one less lock to fool with, and it prevents the problem of checkpoint being
delayed indefinitely when there's a constant flow of commits. Heikki, with
some kibitzing from Tom.
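
A rough sketch of the mechanism (the flag and helper names here are assumptions, not necessarily the committed identifiers):

    /* Committing backend: advertise the commit critical section through its
     * ProcArray entry instead of holding a shared lock. */
    MyProc->inCommit = true;        /* flag name is an assumption */
    /* ... write the commit WAL record, update pg_clog ... */
    MyProc->inCommit = false;

    /* Checkpoint: wait until no backend is mid-commit before proceeding.
     * BackendsInCommit() is a hypothetical helper scanning the ProcArray. */
    while (BackendsInCommit() > 0)
        pg_usleep(10000L);          /* recheck every 10 ms */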
Add the latter to the values checked in pg_control, since it can't be changed
without invalidating toast table content. This commit in itself shouldn't
change any behavior, but it lays some necessary groundwork for experimentation
with these toast-control numbers.
Note: while TOAST_TUPLE_THRESHOLD can now be changed without initdb, some
thought still needs to be given to needs_toast_table() in toasting.c before
unleashing random changes.
pg_type.typtype wherever practical. Tom Dunstan, with some kibitzing
from Tom Lane.
will be released by transaction abort before _bt_end_vacuum gets called.
If either of these "can't happen" errors actually happened, we'd freeze up
trying to acquire an already-held lock. Latest word is that this does
not explain Martin Pitt's trouble report, but it still looks like a bug.
--- Simon.
Also, code review and cleanup for the previous COPY-no-WAL patches --- Tom.
pointer" in every Snapshot struct. This allows removal of the case-by-case
tests in HeapTupleSatisfiesVisibility, which should make it a bit faster
(I didn't try any performance tests though). More importantly, we are no
longer violating portable C practices by assuming that small integers are
distinct from all pointer values, and HeapTupleSatisfiesDirty no longer
has a non-reentrant API involving side-effects on a global variable.
There were a couple of places calling HeapTupleSatisfiesXXX routines
directly rather than through the HeapTupleSatisfiesVisibility macro.
Since these places had to be changed anyway, I chose to make them go
through the macro for uniformity.
Along the way I renamed HeapTupleSatisfiesSnapshot to HeapTupleSatisfiesMVCC
to emphasize that it's only used with MVCC-type snapshots. I was sorely
tempted to rename HeapTupleSatisfiesVisibility to HeapTupleSatisfiesSnapshot,
but forebore for the moment to avoid confusion and reduce the likelihood that
this patch breaks some of the pending patches. Might want to reconsider
doing that later.
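
The portability point is easiest to see in a generic before/after sketch (types and declarations simplified, not the exact PostgreSQL ones):

    /* Before: Snapshot arguments were compared against small-integer
     * sentinels such as ((Snapshot) 2); C does not guarantee such values
     * are distinct from all real pointers. */

    /* After: every snapshot carries its own visibility test. */
    typedef struct SnapshotData
    {
        bool        (*satisfies) (HeapTuple tup,
                                  struct SnapshotData *snap,
                                  Buffer buf);
        /* ... xmin, xmax, xip[], curcid for MVCC snapshots ... */
    } SnapshotData;

    #define HeapTupleSatisfiesVisibility(tup, snap, buf) \
        ((*(snap)->satisfies) (tup, snap, buf))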
of a multi-statement simple-Query message. This bug goes all the way
back, but unfortunately is not nearly so easy to fix in existing releases;
it is only the recent ProcessUtility API change that makes it fixable in
HEAD. Per report from William Garrison.
Make configuration parameters fall back to their default values when they
are removed from the configuration file.
Joachim Wieland
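
Presumably this follows the usual mark-and-sweep shape for a configuration reload; a sketch under that assumption, with invented flag and helper names:

    /* On SIGHUP: */
    for (i = 0; i < num_vars; i++)
        vars[i]->seen_in_file = false;      /* invented marker */

    ParseConfigFile(ConfigFileName);        /* every assignment found in the
                                             * file sets seen_in_file */

    for (i = 0; i < num_vars; i++)
    {
        /* set from the file last time, absent now: revert to the default */
        if (vars[i]->source == PGC_S_FILE && !vars[i]->seen_in_file)
            ResetToDefault(vars[i]);        /* invented helper */
    }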
module and teach PREPARE and protocol-level prepared statements to use it.
In service of this, rearrange utility-statement processing so that parse
analysis does not assume table schemas can't change before execution for
utility statements (necessary because we don't attempt to re-acquire locks
for utility statements when reusing a stored plan). This requires some
refactoring of the ProcessUtility API, but it ends up cleaner anyway,
for instance we can get rid of the QueryContext global.
Still to do: fix up SPI and related code to use the plan cache; I'm tempted to
try to make SQL functions use it too. Also, there are at least some aspects
of system state that we want to ensure remain the same during a replan as in
the original processing; search_path certainly ought to behave that way for
instance, and perhaps there are others.
are removed from the configuration file.
Joachim Wieland
Heikki Linnakangas
Florian G. Pflug
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names. Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught. In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
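
In simplified form (the real macros later grew more complex), the renamed struct and the canonical accessors look roughly like:

    struct varlena
    {
        int32       vl_len_;    /* trailing underscore: direct accesses now
                                 * fail to compile and get flushed out */
        char        vl_dat[1];  /* data follows the length word */
    };

    #define VARHDRSZ        ((int32) sizeof(int32))
    #define VARSIZE(PTR)    (((struct varlena *) (PTR))->vl_len_)
    #define VARDATA(PTR)    (((struct varlena *) (PTR))->vl_dat)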
I refactored findsplitloc and checksplitloc so that the division of
labor is clearer, IMO. I pushed all the space calculations done inside the
loop into checksplitloc.
I also fixed the off-by-4 in the free space calculation caused by
PageGetFreeSpace subtracting sizeof(ItemIdData). Even though it was
harmless, it was distracting, and I felt it might come back to
bite us in the future if we change the page layout or alignments.
There's now a new function PageGetExactFreeSpace that doesn't do the
subtraction.
findsplitloc now tries the "just the new item to right page" split as
well. If people don't like the refactoring, I can write a patch to just
add that.
Heikki Linnakangas
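
The new function is essentially the raw gap between the line-pointer array and the tuple space, roughly:

    Size
    PageGetExactFreeSpace(Page page)
    {
        int         space = (int) ((PageHeader) page)->pd_upper -
                            (int) ((PageHeader) page)->pd_lower;

        if (space < 0)
            return 0;           /* guard against a corrupted page header */
        return (Size) space;
    }

    /* PageGetFreeSpace, by contrast, also subtracts sizeof(ItemIdData),
     * anticipating that the caller will insert a new line pointer too. */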
continuously, and asks the postmaster to start "autovacuum workers" to perform
them. The workers do the actual vacuum work. This allows for future
improvements, such as allowing multiple autovacuum jobs to run in parallel.
For now, the code keeps the original behavior of having a single autovac
process at any time by sleeping until the previous worker has finished.
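
In outline, the division of labor looks like this sketch (the loop shape and most helper names are illustrative; SendPostmasterSignal() and the PMSIGNAL constant are from pmsignal.h):

    for (;;)                        /* autovacuum launcher main loop */
    {
        if (some_database_needs_vacuum())   /* hypothetical scheduling check */
            SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);

        wait_for_worker_exit();     /* hypothetical: one worker at a time */
        sleep_for_naptime();        /* hypothetical */
    }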
can be used by src/tools/fsync/test_fsync.c.
it was executed in. Someday it might be nice to allow cross-DB commits, but
work would be needed in NOTIFY and perhaps other places. Per Heikki.
keeping private state in each backend that has inserted and deleted the same
tuple during its current top-level transaction. This is sufficient since
there is no need to be able to determine the cmin/cmax from any other
transaction. This gets us back down to 23-byte headers, removing a penalty
paid in 8.0 to support subtransactions. Patch by Heikki Linnakangas, with
minor revisions by moi, following a design hashed out awhile back on the
pghackers list.
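
A condensed sketch of the scheme (simplified; names approximate):

    /* On disk there is now a single t_cid field holding cmin, cmax, or a
     * "combo" id. A backend that both inserts and deletes a tuple in the
     * same top-level transaction stores a combo id and privately remembers
     * the real pair: */
    typedef struct
    {
        CommandId   cmin;           /* inserting command */
        CommandId   cmax;           /* deleting command */
    } ComboCidKeyData;

    static ComboCidKeyData *comboCids;  /* backend-local, indexed by combo id */
    static int  usedComboCids;

    /* No other transaction ever needs this tuple's cmin/cmax: to everyone
     * else the inserting transaction simply hasn't committed, so the
     * backend-local table suffices. */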
the buildfarm via Stefan Kaltenbrunner.
Patch from Heikki Linnakangas.
where possible, and fix some sites that apparently thought that fgets()
would write one byte past the end of the buffer.
Also add some strlcpy() to eliminate some weird memory handling.
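
The two library facts behind the cleanup, in a self-contained sketch (`some_long_string` is a stand-in):

    #include <stdio.h>
    #include <string.h>

    char        buf[64];

    /* strlcpy (BSD; PostgreSQL carries a port for platforms lacking it)
     * always NUL-terminates and never writes more than size bytes: */
    strlcpy(buf, some_long_string, sizeof(buf));

    /* fgets() reads at most size-1 bytes and always NUL-terminates, so
     * passing sizeof(buf) is correct; the old "sizeof(buf) - 1" calls
     * misread that contract and just wasted a byte: */
    if (fgets(buf, sizeof(buf), stdin) != NULL)
        fputs(buf, stdout);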
> Currently, an index split writes all the data on the split page to
> WAL. That's a lot of WAL traffic. The tuples that are copied to the
> right page need to be WAL logged, but the tuples that stay on the
> original page don't.
Heikki Linnakangas
already collected in the current transaction; this allows plpgsql functions to
watch for stats updates even though they are confined to a single transaction.
Use this instead of the previous kluge involving pg_stat_file() to wait for
the stats collector to update in the stats regression test. Internally,
decouple storage of stats snapshots from transaction boundaries; they'll
now stick around until someone calls pgstat_clear_snapshot --- which xact.c
still does at transaction end, to maintain the previous behavior. This makes
the logic a lot cleaner, at the price of a couple dozen cycles per transaction
exit.
"database system is ready to accept connections", which is issued by the
postmaster when it really is ready to accept connections. Per proposal from
Markus Schiltknecht and subsequent discussion.
describe the maximum size of index tuples (which is typically AM-dependent
anyway); and consequently remove the bogus deduction for "special space"
that was built into it.
Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two
bytes per toast chunk, and to ensure that the calculation correctly tracks any
future changes in page header size. The computation had been inaccurate in a
way that didn't cause any harm except space wastage, but future changes could
have broken it more drastically.
Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte
more than it could safely be. This didn't cause any harm in practice because
it's only compared against maxalign'd lengths, but future changes in the size
of page headers or btree special space could have exposed the problem.
initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the
storage of toast tables.
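
Paraphrasing the corrected computation (simplified from the real macros; MAXALIGN_DOWN here means "round down to a maxalign boundary"), the point is that the page header size now flows through explicitly instead of being hard-wired:

    #define EXTERN_TUPLES_PER_PAGE  4   /* aim for 4 chunks per toast page */

    /* Per-chunk budget: the page minus its header and one line pointer per
     * chunk, divided evenly, rounded down to an alignment boundary: */
    #define EXTERN_TUPLE_MAX_SIZE \
        MAXALIGN_DOWN((BLCKSZ - \
                       MAXALIGN(SizeOfPageHeaderData + \
                                EXTERN_TUPLES_PER_PAGE * sizeof(ItemIdData))) \
                      / EXTERN_TUPLES_PER_PAGE)

    /* Chunk payload: subtract the chunk tuple's own overhead (heap tuple
     * header, chunk_id Oid, chunk_seq int32, varlena header): */
    #define TOAST_MAX_CHUNK_SIZE \
        (EXTERN_TUPLE_MAX_SIZE - \
         MAXALIGN(offsetof(HeapTupleHeaderData, t_bits)) - \
         sizeof(Oid) - sizeof(int32) - VARHDRSZ)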
threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code
creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this
number is not itself maxaligned, so if heap_insert maxaligns t_len before
comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to
tuptoaster.c, wasting cycles. (It turns out that this does not happen on
8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the
TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples
will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really
incorrect, but we can't remove it now, see below.) There isn't any particular
value in maxaligning before comparing to the thresholds, so just don't do
that, which saves a small number of cycles in itself.
These numbers should be rejiggered to minimize wasted space on toast-relation
pages, but we can't do that in the back branches because changing
TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast
tables). We can move the toast decision thresholds a bit, though, which is
what this patch effectively does.
Thanks to Pavan Deolasee for discovering the unintended recursion.
Back-patch into 8.2, but not further, pending more testing. (HEAD is about
to get a further patch modifying the thresholds, so it won't help much
for testing this form of the patch.)
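
A worked miniature of the wasted-cycles path (the numbers are invented for a 4-byte-MAXALIGN build):

    /* Suppose TOAST_TUPLE_THRESHOLD evaluates to 2030, not a multiple of 4.
     * The toaster shrinks a wide tuple until t_len == 2030 and stops. */

    if (MAXALIGN(tup->t_len) > TOAST_TUPLE_THRESHOLD)   /* old: 2032 > 2030 */
        ;   /* re-enters the toaster, which finds nothing left to compress
             * or move out-of-line and returns the same tuple: wasted work */

    if (tup->t_len > TOAST_TUPLE_THRESHOLD)             /* new: 2030 > 2030 */
        ;   /* false, so no useless recursion */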
Standard English uses "may", "can", and "might" in different ways:
may - permission, "You may borrow my rake."
can - ability, "I can lift that log."
might - possibility, "It might rain today."
Unfortunately, in conversational English, their use is often mixed, as
in, "You may use this variable to do X", when in fact, "can" is a better
choice. Similarly, "It may crash" is better stated, "It might crash".
In this case extractQuery should return -1 as nentries. This changes the
prototype of the extractQuery method to use int32* instead of uint32* for the
nentries argument.
Based on that, gincostestimate can recognize two corner cases: nothing will
be found, or a seqscan should be used.
Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php
PS: the tsearch_core patch will need to be slightly modified to support these
changes, but I'm awaiting a verdict on the review of the tsearch_core patch.
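
Under the new convention an extractQuery support function might look like this sketch (signature reduced to the three classic arguments; `query_matches_nothing` and the key-building code are placeholders):

    Datum *
    my_extract_query(Datum query, int32 *nentries, StrategyNumber strategy)
    {
        if (query_matches_nothing(query))
        {
            *nentries = -1;     /* new: no index entry can satisfy the
                                 * query, so the scan returns no rows */
            return NULL;
        }

        /* normal case: palloc and fill an array of key datums */
        *nentries = nkeys;
        return entries;
    }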
Hashing for aggregation purposes still needs work, so it's not time to
mark any cross-type operators as hashable for general use, but these cases
work if the operators are so marked by hand in the system catalogs.
actually used for anything.
exactly at the point where we need to insert a new item, the calculation used
the wrong size for the "high key" of the new left page. This could lead to
choosing an unworkable split, resulting in "PANIC: failed to add item to the
left sibling" (or "right sibling") failure. Although this bug has been there
a long time, it's very difficult to trigger a failure before 8.2, since there
was generally a lot of free space on both sides of a chosen split. In 8.2,
where the user-selected fill factor determines how much free space the code
tries to leave, an unworkable split is much more likely. Report by Joe
Conway, diagnosis and fix by Heikki Linnakangas.
created it.
Simon Riggs
currentMarkData from IndexScanDesc to the opaque structs for the
AMs that need this information (currently gist and hash).
Patch from Heikki Linnakangas, fixes by Neil Conway.
accessing it, like DROP DATABASE. This allows the regression tests to pass
with autovacuum enabled, which opens the gates for finally enabling autovacuum
by default.
hold true for operators in a btree operator family. This is mostly to
clarify my own thinking about what the planner can assume for optimization
purposes. (blowing dust off an old abstract-algebra textbook...)
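
Presumably the written-down laws are the usual strict-total-order axioms, extended so that each operator may be a cross-type member of the family; in LaTeX, something along these lines:

    \begin{align*}
    &\text{trichotomy:}   && \text{exactly one of } a < b,\; a = b,\; b < a \text{ holds} \\
    &\text{transitivity:} && a < b \;\wedge\; b < c \;\Rightarrow\; a < c \\
    &\text{commutators:}  && a < b \iff b > a, \qquad a \le b \iff b \ge a \\
    &\text{weak/strict:}  && a \le b \iff (a < b \;\vee\; a = b)
    \end{align*}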
nattr field, and rename the field.
Heikki Linnakangas
management. The paper clearly describes many of the ideas embodied in
our current hashing code, but as far as I could find out there is not
a direct code heritage. (Mike Olsen recalls discussion of this paper
at Postgres meetings but believes it "informed the Postgres implementation
probably just at the design level". Margo herself says she wasn't
involved with Postgres' hash code.) Credit where credit is due 'n all
that, even if fifteen years after the fact.
per-column options for btree indexes. The planner's support for this is still
pretty rudimentary; it does not yet know how to plan mergejoins with
nondefault ordering options. The documentation is pretty rudimentary, too.
I'll work on improving that stuff later.
Note incompatible change from prior behavior: ORDER BY ... USING will now be
rejected if the operator is not a less-than or greater-than member of some
btree opclass. This prevents less-than-sane behavior if an operator that
doesn't actually define a proper sort ordering is selected.
back-stamped for this.
having md.c return a success/failure boolean to smgr.c, which was just going
to elog anyway, let md.c issue the elog messages itself. This allows better
error reporting, particularly in cases such as "short read" or "short write"
which Peter was complaining of. Also, remove the kluge of allowing mdread()
to return zeroes from a read-beyond-EOF: this is now an error condition
except when InRecovery or zero_damaged_pages = true. (Hash indexes used to
require that behavior, but no more.) Also, enforce that mdwrite() is to be
used for rewriting existing blocks while mdextend() is to be used for
extending the relation EOF. This restriction lets us get rid of the old
ad-hoc defense against creating huge files by an accidental reference to
a bogus block number: we'll only create new segments in mdextend() not
mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since
we need to allow updates of blocks that were later truncated away.)
Also, clean up the original makeshift patch for bug #2737: move the
responsibility for padding relation segments to full length into md.c.
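
In outline, with simplified signatures, the contract md.c now enforces:

    /* New blocks appear only at EOF and only via mdextend(); this is the
     * one place new segment files can be created: */
    mdextend(reln, blocknum, buffer, isTemp);   /* blocknum == current EOF */

    /* mdwrite() rewrites existing blocks; a target past EOF is an error
     * rather than a silent file extension (except when InRecovery): */
    mdwrite(reln, blocknum, buffer, isTemp);

    /* Reads beyond EOF likewise fail instead of returning zeroes, unless
     * InRecovery or zero_damaged_pages is set: */
    mdread(reln, blocknum, buffer);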
about typmod representation for standard types out into type-specific
typmod I/O functions. Teodor Sigaev, with some editorialization by
Tom Lane.
or contradictory keys even in cross-data-type scenarios. This is another
benefit of the opfamily rewrite: we can find the needed comparison
operators now.
cases. Operator classes now exist within "operator families". While most
families are equivalent to a single class, related classes can be grouped
into one family to represent the fact that they are semantically compatible.
Cross-type operators are now naturally adjunct parts of a family, without
having to wedge them into a particular opclass as we had done originally.
This commit restructures the catalogs and cleans up enough of the fallout so
that everything still works at least as well as before, but most of the work
needed to actually improve the planner's behavior will come later. Also,
there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way
to create a new family right now is to allow CREATE OPERATOR CLASS to make
one by default. I owe some more documentation work, too. But that can all
be done in smaller pieces once this infrastructure is in place.
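
For example, a family lets the planner ask directly for a cross-type comparison operator; a sketch (get_opfamily_member() is the lookup helper in the sources, though whether it arrived with this exact commit is an assumption; the surrounding variables are illustrative):

    /* "give me int4 < int8 within this btree opfamily, if it exists" */
    Oid         ltop = get_opfamily_member(opfamily,
                                           INT4OID,     /* left input type */
                                           INT8OID,     /* right input type */
                                           BTLessStrategyNumber);

    if (OidIsValid(ltop))
        ;   /* family membership guarantees this operator sorts compatibly
             * with the rest of the family */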
in normal operation, and we can avoid rewriting pg_control at every log
segment switch if we don't insist that these values be valid. Reducing
the number of pg_control updates is a good idea for both performance and
reliability. It does make pg_resetxlog's life a bit harder, but that seems
a good tradeoff; and anyway the change to pg_resetxlog amounts to automating
something people formerly needed to do by hand, namely look at the existing
pg_xlog files to make sure the new WAL start point was past them.
In passing, change the wording of xlog.c's "database system was interrupted"
messages: describe the pg_control timestamp as "last known up at" rather than
implying it is the exact time of service interruption. With this change the
timestamp will generally be the time of the last checkpoint, which could be
many minutes before the failure; and we've already seen indications that
people tend to misinterpret the old wording.
initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane
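
The "look at the existing pg_xlog files" step that pg_resetxlog now automates amounts to scanning for the highest-numbered segment (file names are 24 hex digits: timeline, log id, segment); a self-contained sketch:

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    /* Find the newest WAL segment so the new start point lies past it. */
    static void
    find_max_segment(unsigned int *maxlog, unsigned int *maxseg)
    {
        DIR        *dir = opendir("pg_xlog");
        struct dirent *de;
        unsigned int tli, log, seg;

        *maxlog = *maxseg = 0;
        while (dir && (de = readdir(dir)) != NULL)
        {
            if (strlen(de->d_name) == 24 &&
                sscanf(de->d_name, "%08X%08X%08X", &tli, &log, &seg) == 3)
            {
                if (log > *maxlog || (log == *maxlog && seg > *maxseg))
                {
                    *maxlog = log;
                    *maxseg = seg;
                }
            }
        }
        if (dir)
            closedir(dir);
        /* the new WAL should then begin at segment (*maxlog, *maxseg + 1) */
    }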