postgresql - postgresql mirror

	Commit message	Author	Age
*	Update error messages, per notes from Tom.	Magnus Hagander	2008-04-24
\| \| \| \|	Laurenz Albe
*	Prevent shutdown in normal mode if online backup is running, and	Magnus Hagander	2008-04-23
\| \| \| \| \| \| \| \| \|	have pg_ctl warn about this. Cancel running online backups (by renaming the backup_label file, thus rendering the backup useless) when shutting down in fast mode. Laurenz Albe
*	Fix using too many LWLocks bug, reported by Craig Ringer	Teodor Sigaev	2008-04-22
\| \| \| \| \| \| \| \| \|	<craig@postnewspapers.com.au>. It was my mistake: I missed the limit on the number of held locks. GIN no longer uses continuous locks, but still holds buffers pinned to prevent interference with vacuum's deletion algorithm. Backpatch is needed.
*	Allow float8, int8, and related datatypes to be passed by value on machines	Tom Lane	2008-04-21
\| \| \| \| \| \| \| \| \| \|	where Datum is 8 bytes wide. Since this will break old-style C functions (those still using version 0 calling convention) that have arguments or results of these types, provide a configure option to disable it and retain the old pass-by-reference behavior. Likewise, provide a configure option to disable the recently-committed float4 pass-by-value change. Zoltan Boszormenyi, plus configurability stuff by me.
*	Clean up a few places where Datums were being treated as pointers (and vice	Alvaro Herrera	2008-04-17
\| \| \| \| \| \|	versa) without going through DatumGetPointer. Gavin Sherry, with Feng Tian.
*	Repair two places where SIGTERM exit could leave shared memory state	Tom Lane	2008-04-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.
*	Push index operator lossiness determination down to GIST/GIN opclass	Tom Lane	2008-04-14
\| \| \| \| \| \| \| \| \| \| \|	"consistent" functions, and remove pg_amop.opreqcheck, as per recent discussion. The main immediate benefit of this is that we no longer need 8.3's ugly hack of requiring @@@ rather than @@ to test weight-using tsquery searches on GIN indexes. In future it should be possible to optimize some other queries better than is done now, by detecting at runtime whether the index match is exact or not. Tom Lane, after an idea of Heikki's, and with some help from Teodor.
*	Phase 2 of project to make index operator lossiness be determined at runtime	Tom Lane	2008-04-13
\| \| \| \| \| \| \| \| \| \| \| \|	instead of plan time. Extend the amgettuple API so that the index AM returns a boolean indicating whether the indexquals need to be rechecked, and make that rechecking happen in nodeIndexscan.c (currently the only place where it's expected to be needed; other callers of index_getnext are just erroring out for now). For the moment, GIN and GIST have stub logic that just always sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing that control down to the opclass consistent() functions. The planner no longer pays any attention to amopreqcheck, and that catalog column will go away in due course.
*	Create new routines systable_beginscan_ordered, systable_getnext_ordered,	Tom Lane	2008-04-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	systable_endscan_ordered that have API similar to systable_beginscan etc (in particular, the passed-in scankeys have heap not index attnums), but guarantee ordered output, unlike the existing functions. For the moment these are just very thin wrappers around index_beginscan/index_getnext/etc. Someday they might need to get smarter; but for now this is just a code refactoring exercise to reduce the number of direct callers of index_getnext, in preparation for changing that function's API. In passing, remove index_getnext_indexitem, which has been dead code for quite some time, and will have even less use than that in the presence of run-time-lossy indexes.
*	Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole	Tom Lane	2008-04-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	indexscan always occurs in one call, and the results are returned in a TIDBitmap instead of a limited-size array of TIDs. This should improve speed a little by reducing AM entry/exit overhead, and it is necessary infrastructure if we are ever to support bitmap indexes. In an only slightly related change, add support for TIDBitmaps to preserve (somewhat lossily) the knowledge that particular TIDs reported by an index need to have their quals rechecked when the heap is visited. This facility is not really used yet; we'll need to extend the forced-recheck feature to plain indexscans before it's useful, and that hasn't been coded yet. The intent is to use it to clean up 8.3's horrid @@@ kluge for text search with weighted queries. There might be other uses in future, but that one alone is sufficient reason. Heikki Linnakangas, with some adjustments by me.
*	Improve hash_any() to use word-wide fetches when hashing suitably aligned	Tom Lane	2008-04-06
\| \| \| \| \| \| \| \| \| \| \| \| \|	data. This makes for a significant speedup at the cost that the results now vary between little-endian and big-endian machines; which forces us to add explicit ORDER BYs in a couple of regression tests to preserve machine-independent comparison results. Also, force initdb by bumping catversion, since the contents of hash indexes will change (at least on big-endian machines). Kenneth Marshall and Tom Lane, based on work from Bob Jenkins. This commit does not adopt Bob's new faster mix() algorithm, however, since we still need to convince ourselves that that doesn't degrade the quality of the hashing.
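
The endianness dependence mentioned above can be seen in a minimal sketch. This is not the hash_any() code; the two helper names are hypothetical, and memcpy stands in for the aligned word fetch so that the example stays portable.

```c
#include <stdint.h>
#include <string.h>

/* Assemble a 32-bit word one byte at a time: same value on every machine. */
static uint32_t
fetch_bytewise(const unsigned char *p)
{
    return (uint32_t) p[0]
        | ((uint32_t) p[1] << 8)
        | ((uint32_t) p[2] << 16)
        | ((uint32_t) p[3] << 24);
}

/*
 * Fetch the same four bytes as one native word. The value depends on the
 * machine's byte order, so any hash mixed from it does too.
 */
static uint32_t
fetch_wordwise(const unsigned char *p)
{
    uint32_t    w;

    memcpy(&w, p, sizeof(w));
    return w;
}
```

On a little-endian machine the two functions agree; on a big-endian machine they differ, which is why hash-dependent output order in regression tests needed explicit ORDER BYs.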
*	Have pg_stop_backup() wait for all archive files to be sent, rather than	Bruce Momjian	2008-04-05
\| \| \| \| \| \| \|	returning right away. This guarantees that when pg_stop_backup() returns, you have a valid backup. Simon Riggs
*	Remove heap_release_fetch, which is no longer used anywhere; this simplifies	Tom Lane	2008-04-03
\| \| \| \|	heap_fetch a little.
*	Move the HTSU_Result enum definition into snapshot.h, to avoid including	Alvaro Herrera	2008-03-26
\| \| \| \| \| \|	tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted the includes alphabetically in some source files.
*	Rename snapmgmt.c/h to snapmgr.c/h, for consistency with other files.	Alvaro Herrera	2008-03-26
\| \| \| \|	Per complaint from Tom Lane.
*	Separate snapshot management code from tuple visibility code, create a	Alvaro Herrera	2008-03-26
\| \| \| \| \| \| \| \| \| \| \| \| \|	snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.
*	Simplify and standardize conversions between TEXT datums and ordinary C	Tom Lane	2008-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	strings. This patch introduces four support functions cstring_to_text, cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and two macros CStringGetTextDatum and TextDatumGetCString. A number of existing macros that provided variants on these themes were removed. Most of the places that need to make such conversions now require just one function or macro call, in place of the multiple notational layers that used to be needed. There are no longer any direct calls of textout or textin, and we got most of the places that were using handmade conversions via memcpy (there may be a few still lurking, though). This commit doesn't make any serious effort to eliminate transient memory leaks caused by detoasting toasted text objects before they reach text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few places where it was easy, but much more could be done. Brendan Jurd and Tom Lane
*	More README src cleanups.	Bruce Momjian	2008-03-21
*	Make source code READMEs more consistent. Add CVS tags to all README files.	Bruce Momjian	2008-03-20
*	Enable probes to work with Mac OS X Leopard and other OSes that will	Peter Eisentraut	2008-03-17
\| \| \| \| \| \| \| \| \| \| \|	support DTrace in the future. Switch from using DTRACE_PROBEn macros to the dynamically generated macros. Use "dtrace -h" to create a header file that contains the dynamically generated macros to be used in the source code instead of the DTRACE_PROBEn macros. A dummy header file is generated for builds without DTrace support. Author: Robert Lor <Robert.Lor@sun.com>
*	Fix TransactionIdIsCurrentTransactionId() to use binary search instead of	Tom Lane	2008-03-17
\| \| \| \| \| \| \| \| \|	linear search when checking child-transaction XIDs. This makes for an important speedup in transactions that have large numbers of children, as in a recent example from Craig Ringer. We can also get rid of an ugly kluge that represented lists of TransactionIds as lists of OIDs. Heikki Linnakangas
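
A minimal sketch of the idea, assuming the child XIDs are kept in a sorted array. The type and function names are hypothetical, and plain numeric ordering is a simplification made for this example, not a statement about how the backend compares XIDs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a transaction ID. */
typedef uint32_t xid_t;

/*
 * Return true if xid occurs in child_xids, which must be sorted ascending.
 * Each probe halves the range, so the cost is O(log n) rather than O(n).
 */
static bool
xid_in_sorted_children(const xid_t *child_xids, size_t n, xid_t xid)
{
    size_t      low = 0;
    size_t      high = n;       /* search the half-open range [low, high) */

    while (low < high)
    {
        size_t      mid = low + (high - low) / 2;

        if (child_xids[mid] == xid)
            return true;
        if (child_xids[mid] < xid)
            low = mid + 1;
        else
            high = mid;
    }
    return false;
}
```

For a transaction with a million children this is about 20 probes per check instead of up to a million comparisons.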
*	When creating a large hash index, pre-sort the index entries by estimated	Tom Lane	2008-03-16
\| \| \| \| \| \| \| \| \| \|	bucket number, so as to ensure locality of access to the index during the insertion step. Without this, building an index significantly larger than available RAM takes a very long time because of thrashing. On the other hand, sorting is just useless overhead when the index does fit in RAM. We choose to sort when the initial index size exceeds effective_cache_size. This is a revised version of work by Tom Raney and Shreya Bhargava.
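
The pre-sort can be sketched as follows. The entry layout, the mask-based bucket mapping, and all names are assumptions made for illustration; only the rule "sort when the initial index size exceeds effective_cache_size" comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical index entry: a hash code plus an opaque payload. */
typedef struct
{
    uint32_t    hash;
    uint32_t    payload;
} entry_t;

/* Set by presort_by_bucket(); a file-level variable keeps the sketch short
 * but makes it non-reentrant. */
static uint32_t bucket_mask;

/* Order entries by the bucket they will land in. */
static int
cmp_by_bucket(const void *a, const void *b)
{
    uint32_t    ba = ((const entry_t *) a)->hash & bucket_mask;
    uint32_t    bb = ((const entry_t *) b)->hash & bucket_mask;

    return (ba > bb) - (ba < bb);
}

/* num_buckets is assumed to be a power of two. */
static void
presort_by_bucket(entry_t *entries, size_t n, uint32_t num_buckets)
{
    bucket_mask = num_buckets - 1;
    qsort(entries, n, sizeof(entry_t), cmp_by_bucket);
}

/* Sorting only pays off when the index will not fit in cache. */
static bool
should_presort(uint64_t est_index_pages, uint64_t effective_cache_pages)
{
    return est_index_pages > effective_cache_pages;
}
```

After sorting, insertions touch each bucket's pages in one contiguous run instead of jumping between buckets at random.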
*	Change hash index creation so that rather than always establishing exactly	Tom Lane	2008-03-15
\| \| \| \| \| \| \| \| \| \| \|	two buckets at the start, we create a number of buckets appropriate for the estimated size of the table. This avoids a lot of expensive bucket-split actions during initial index build on an already-populated table. This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch to reduce hash index build time. I'm committing it separately to make it easier for people to test the effects of this separately from the effects of their other core idea (pre-sorting the index entries by bucket number).
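
One way to size the initial bucket array is sketched below. The power-of-two rounding, the upper cap, and the tuples_per_bucket parameter are assumptions of this example; the commit message only says that the bucket count is chosen to suit the estimated table size.

```c
#include <stdint.h>

/*
 * Pick an initial bucket count for an estimated number of tuples.
 * tuples_per_bucket must be positive. The result is a power of two,
 * never less than 2 (the old fixed starting size).
 */
static uint32_t
initial_bucket_count(double est_tuples, double tuples_per_bucket)
{
    double      wanted = est_tuples / tuples_per_bucket;
    uint32_t    buckets = 2;

    /* Double until we cover the estimate or reach the illustrative cap. */
    while ((double) buckets < wanted && buckets < 0x40000000u)
        buckets <<= 1;
    return buckets;
}
```

Starting at the right size means the build does not have to split buckets repeatedly as the already-populated table is loaded.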
*	Fix heap_page_prune's problem with failing to send cache invalidation	Tom Lane	2008-03-13
\| \| \| \| \| \| \| \| \| \| \|	messages if the calling transaction aborts later on. Collapsing out line pointer redirects is a done deal as soon as we complete the page update, so syscache must be notified even if the VACUUM FULL as a whole doesn't complete. To fix, add some functionality to inval.c to allow the pending inval messages to be sent immediately while heap_page_prune is still running. The implementation is a bit chintzy: it will only work in the context of VACUUM FULL. But that's all we need now, and it can always be extended later if needed. Per my trouble report of a week ago.
*	Make TransactionIdIsInProgress check transam.c's single-item XID status cache	Tom Lane	2008-03-11
\| \| \| \| \| \| \| \|	before it goes groveling through the ProcArray. In situations where the same recently-committed transaction ID is checked repeatedly by tqual.c, this saves a lot of shared-memory searches. And it's cheap enough that it shouldn't hurt noticeably when it doesn't help. Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.
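
The shape of this optimization is a one-entry cache consulted before the expensive search. The sketch below is an illustration only: every name is hypothetical, and the "slow" function is a dummy standing in for the ProcArray scan.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;         /* hypothetical stand-in for a transaction ID */

static xid_t    cached_completed_xid = 0;   /* 0 means "nothing cached" here */
static unsigned slow_lookups = 0;           /* counts expensive searches */

/* Dummy replacement for the shared-memory search. */
static bool
slow_xid_is_in_progress(xid_t xid)
{
    slow_lookups++;
    return xid >= 1000;         /* arbitrary rule for the demo */
}

/* Remember the most recent XID known to have completed. */
static void
note_xid_completed(xid_t xid)
{
    cached_completed_xid = xid;
}

/* Check the single-item cache first; fall back to the slow path on a miss. */
static bool
xid_is_in_progress(xid_t xid)
{
    if (xid != 0 && xid == cached_completed_xid)
        return false;           /* known completed, no search needed */
    return slow_xid_is_in_progress(xid);
}
```

A hit costs one comparison, and a miss adds only that comparison to the existing search, which matches the commit's remark that the check should not hurt noticeably when it does not help.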
*	Remove no-longer-used XLogCacheByte field of XLogCtl.	Tom Lane	2008-03-10
\| \| \| \|	Itagaki Takahiro
*	Refactor heap_page_prune so that instead of changing item states on-the-fly,	Tom Lane	2008-03-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	it accumulates the set of changes to be made and then applies them. It had to accumulate the set of changes anyway to prepare a WAL record for the pruning action, so this isn't an enormous change; the only new complexity is to not doubly mark tuples that are visited twice in the scan. The main advantage is that we can substantially reduce the scope of the critical section in which the changes are applied, thus avoiding PANIC in foreseeable cases like running out of memory in inval.c. A nice secondary advantage is that it is now far clearer that WAL replay will actually do the same thing that the original pruning did. This commit doesn't do anything about the open problem that CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change caused by collapsing out a redirect pointer. But whatever we do about that, it'll be a good idea to not do it inside a critical section.
*	This patch addresses some issues in TOAST compression strategy that	Tom Lane	2008-03-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	were discussed last year, but we felt it was too late in the 8.3 cycle to change the code immediately. Specifically, the patch:
	* Reduces the minimum datum size to be considered for compression from 256 to 32 bytes, as suggested by Greg Stark.
	* Increases the required compression rate for compressed storage from 20% to 25%, again per Greg's suggestion.
	* Replaces force_input_size (size above which compression is forced) with a maximum size to be considered for compression. It was agreed that allowing large inputs to escape the minimum-compression-rate requirement was not bright, and that indeed we'd rather have a knob that acted in the other direction. I set this value to 1MB for the moment, but it could use some performance studies to tune it.
	* Adds an early-failure path to the compressor as suggested by Jan: if it's been unable to find even one compressible substring in the first 1KB (parameterizable), assume we're looking at incompressible input and give up. (Possibly this logic can be improved, but I'll commit it as-is for now.)
	* Improves the toasting heuristics so that when we have very large fields with attstorage 'x' or 'e', we will push those out to toast storage before considering inline compression of shorter fields. This also responds to a suggestion of Greg's, though my original proposal for a solution was a bit off base because it didn't fix the problem for large 'e' fields.
	There was some discussion in the earlier threads of exposing some of the compression knobs to users, perhaps even on a per-column basis. I have not done anything about that here. It seems to me that if we are changing around the parameters, we'd better get some experience and be sure we are happy with the design before we set things in stone by providing user-visible knobs.
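
The size and rate thresholds described above can be written down as two small predicates. This is a sketch, not the compressor's code: the macro and function names are hypothetical, and whether each limit is inclusive is an assumption of the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_INPUT_SIZE  32              /* bytes; previously 256 */
#define MAX_INPUT_SIZE  (1024 * 1024)   /* 1MB; provisional per the commit */
#define MIN_COMP_RATE   25              /* percent saved; previously 20 */

/* Is a datum of this size a candidate for compression at all? */
static bool
worth_trying_compression(size_t raw_size)
{
    return raw_size >= MIN_INPUT_SIZE && raw_size <= MAX_INPUT_SIZE;
}

/*
 * Keep the compressed form only if it saves at least MIN_COMP_RATE percent.
 * raw_size is bounded by MAX_INPUT_SIZE here, so the multiplication cannot
 * overflow.
 */
static bool
compressed_result_acceptable(size_t raw_size, size_t compressed_size)
{
    size_t      required_saving = raw_size * MIN_COMP_RATE / 100;

    return compressed_size <= raw_size - required_saving;
}
```

With these values a 1000-byte datum is stored compressed only if it shrinks to 750 bytes or fewer.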
*	Change hashscan.c to keep its list of active hash index scans in	Tom Lane	2008-03-07
\| \| \| \| \| \| \| \| \| \| \| \| \|	TopMemoryContext, rather than scattered through executor per-query contexts. This poses no danger of memory leak since the ResourceOwner mechanism guarantees release of no-longer-needed items. It is needed because the per-query context might already be released by the time we try to clean up the hash scan list. Report by ykhuang, diagnosis by Heikki. Back-patch to 8.0, where the ResourceOwner-based cleanup was introduced. The given test case does not fail before 8.2, probably because we rearranged transaction abort processing somehow; but this coding is undoubtedly risky so I'll patch 8.0 and 8.1 anyway.
*	Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a	Tom Lane	2008-03-04
\| \| \| \| \| \| \|	temporary table; we can't support that because there's no way to clean up the source backend's internal state if the eventual COMMIT PREPARED is done by another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(. Patch by Heikki Linnakangas, original trouble report by John Smith.
*	Reducing the assumed alignment of struct varlena means that the compiler	Tom Lane	2008-02-29
\| \| \| \| \| \| \| \| \| \|	is also licensed to put a local variable declared that way at an unaligned address. Which will not work if the variable is then manipulated with SET_VARSIZE or other macros that assume alignment. So the previous patch is not an unalloyed good, but on balance I think it's still a win, since we have very few places that do that sort of thing. Fix the one place in tuptoaster.c that does it. Per buildfarm results from gypsy_moth (I'm a bit surprised that only one machine showed a failure).
*	Change the declaration of struct varlena so that the length word is	Tom Lane	2008-02-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	represented as "char ...[4]" not "int32". Since the length word is never supposed to be accessed via this struct member anyway, this won't break any existing code that is following the rules. The advantage is that C compilers will no longer assume that a pointer to struct varlena is word-aligned, which prevents incorrect optimizations in TOAST-pointer access and perhaps other places. gcc doesn't seem to do this (at least not at -O2), but the problem is demonstrable on some other compilers. I changed struct inet as well, but didn't bother to touch a lot of other struct definitions in which it wouldn't make any difference because there were other fields forcing int alignment anyway. Hopefully none of those struct definitions are used for accessing unaligned Datums.
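
The effect on assumed alignment can be illustrated with two struct layouts. Both struct names and the member names are illustrative, and memcpy stands in for whatever accessor rule-following code would use; the sketch needs a C11 compiler for alignof.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Old-style layout: the int32 member lets the compiler assume that any
 * pointer to this struct is 4-byte aligned. */
struct varlena_int32
{
    int32_t     vl_len_;
    char        vl_dat[1];
};

/* New-style layout: only char members, so no such assumption is licensed. */
struct varlena_char4
{
    char        vl_len_[4];
    char        vl_dat[1];
};

/* Read the length word without assuming alignment. */
static uint32_t
read_length_word(const struct varlena_char4 *v)
{
    uint32_t    len;

    memcpy(&len, v->vl_len_, sizeof(len));
    return len;
}
```

On common ABIs, alignof(struct varlena_char4) is 1 while alignof(struct varlena_int32) equals alignof(int32_t), which is exactly the difference the commit relies on.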
*	Remove another target I forgot during the refactoring	Peter Eisentraut	2008-02-19
*	Refactor backend makefiles to remove lots of duplicate code	Peter Eisentraut	2008-02-19
*	Replace time_t with pg_time_t (same values, but always int64) in on-disk	Tom Lane	2008-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	data structures and backend internal APIs. This solves problems we've seen recently with inconsistent layout of pg_control between machines that have 32-bit time_t and those that have already migrated to 64-bit time_t. Also, we can get out from under the problem that Windows' Unix-API emulation is not consistent about the width of time_t. There are a few remaining places where local time_t variables are used to hold the current or recent result of time(NULL). I didn't bother changing these since they do not affect any cross-module APIs and surely all platforms will have 64-bit time_t before overflow becomes an actual risk. time_t should be avoided for anything visible to extension modules, however.
*	Add a GUC variable "synchronize_seqscans" to allow clients to disable the new	Tom Lane	2008-01-30
\| \| \| \| \|	synchronized-scanning behavior, and make pg_dump disable sync scans so that it will reliably preserve row ordering. Per recent discussions.
*	Provide a clearer error message if the pg_control version number looks	Peter Eisentraut	2008-01-21
\| \| \| \|	wrong because of mismatched byte ordering.
*	Revise memory management for libxml calls. Instead of keeping libxml's data	Tom Lane	2008-01-15
\| \| \| \| \| \| \| \| \| \| \|	in whichever context happens to be current during a call of an xml.c function, use a dedicated context that will not go away until we explicitly delete it (which we do at transaction end or subtransaction abort). This makes recovery after an error much simpler --- we don't have to individually delete the data structures created by libxml. Also, we need to initialize and cleanup libxml only once per transaction (if there's no error) instead of once per function call, so it should be a bit faster. We'll need to keep an eye out for intra-transaction memory leaks, though. Alvaro and Tom.
*	Fix CREATE INDEX CONCURRENTLY so that it won't use synchronized scan for	Tom Lane	2008-01-14
\| \| \| \| \| \| \| \|	its second pass over the table. It has to start at block zero, else the "merge join" logic for detecting which TIDs are already in the index doesn't work. Hence, extend heapam.c's API so that callers can enable or disable syncscan. (I put in an option to disable buffer access strategy, too, just in case somebody needs it.) Per report from Hannes Dorbath.
*	Make standard maintenance operations (including VACUUM, ANALYZE, REINDEX,	Tom Lane	2008-01-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and CLUSTER) execute as the table owner rather than the calling user, using the same privilege-switching mechanism already used for SECURITY DEFINER functions. The purpose of this change is to ensure that user-defined functions used in index definitions cannot acquire the privileges of a superuser account that is performing routine maintenance. While a function used in an index is supposed to be IMMUTABLE and thus not able to do anything very interesting, there are several easy ways around that restriction; and even if we could plug them all, there would remain a risk of reading sensitive information and broadcasting it through a covert channel such as CPU usage. To prevent bypassing this security measure, execution of SET SESSION AUTHORIZATION and SET ROLE is now forbidden within a SECURITY DEFINER context. Thanks to Itagaki Takahiro for reporting this vulnerability. Security: CVE-2007-6600
*	Update copyrights in source tree to 2008.	Bruce Momjian	2008-01-01
*	Improve a number of elog messages for not-supposed-to-happen cases in btrees,	Tom Lane	2007-12-31
\| \| \| \| \| \| \| \| \|	since these seem to happen after all in corrupted indexes. Make sure we supply the index name in all cases, and provide relevant block numbers where available. Also consistently identify the index name as such. Back-patch to 8.2, in hopes that this might help Mason Hale figure out his problem.
*	Code review for LIKE ... INCLUDING INDEXES patch. Fix failure to propagate	Tom Lane	2007-12-01
\| \| \| \| \| \| \| \| \| \|	constraint status of copied indexes (bug #3774), as well as various other small bugs such as failure to pstrdup when needed. Allow INCLUDING INDEXES indexes to be merged with identical declared indexes (perhaps not real useful, but the code is there and having it not apply to LIKE indexes seems pretty unorthogonal). Avoid useless work in generateClonedIndexStmt(). Undo some poorly chosen API changes, and put a couple of routines in modules that seem to be better places for them.
*	Avoid incrementing the CommandCounter when CommandCounterIncrement is called	Tom Lane	2007-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	but no database changes have been made since the last CommandCounterIncrement. This should result in a significant improvement in the number of "commands" that can typically be performed within a transaction before hitting the 2^32 CommandId size limit. In particular this buys back (and more) the possible adverse consequences of my previous patch to fix plan caching behavior. The implementation requires tracking whether the current CommandCounter value has been "used" to mark any tuples. CommandCounter values stored into snapshots are presumed not to be used for this purpose. This requires some small executor changes, since the executor used to conflate the curcid of the snapshot it was using with the command ID to mark output tuples with. Separating these concepts allows some small simplifications in executor APIs. Something for the TODO list: look into having CommandCounterIncrement not do AcceptInvalidationMessages. It seems fairly bogus to be doing it there, but exactly where to do it instead isn't clear, and I'm disinclined to mess with asynchronous behavior during late beta.
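
The "used" tracking can be sketched with a counter and a flag. The struct and function names are hypothetical, and overflow handling at the 2^32 limit is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-transaction state. */
typedef struct
{
    uint32_t    current_cid;    /* current command ID */
    bool        cid_used;       /* has current_cid been stamped on a tuple? */
} command_state_t;

/* Stamping a tuple with the current command ID marks that ID as used. */
static uint32_t
mark_tuple(command_state_t *s)
{
    s->cid_used = true;
    return s->current_cid;
}

/* Advance the counter only if the current value was actually used. */
static void
command_counter_increment(command_state_t *s)
{
    if (!s->cid_used)
        return;                 /* nothing changed since the last increment */
    s->current_cid++;
    s->cid_used = false;
}
```

A transaction that issues many read-only commands between writes therefore consumes one command ID per write rather than one per command.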
*	Improve GIN index build's tracking of memory usage by using	Tom Lane	2007-11-16
\| \| \| \| \| \| \| \| \|	GetMemoryChunkSpace, not just the palloc request size. This brings the allocatedMemory counter close enough to reality (as measured by MemoryContextStats printouts) that I think we can get rid of the arbitrary factor-of-2 adjustment that was put into the code initially. Given the sensitivity of GIN build to work memory size, not using as much of work memory as we're allowed to seems a pretty bad idea.
*	Repair still another bug in the btree page split WAL reduction patch:	Tom Lane	2007-11-16
\| \| \| \| \| \| \|	it failed for splits of non-leaf pages because in such pages the first data key on a page is suppressed, and so we can't just copy the first key from the right page to reconstitute the left page's high key. Problem found by Koichi Suzuki, patch by Heikki.
*	Small comment spacing improvement.	Bruce Momjian	2007-11-16
*	Fix pgindent to properly handle 'else' and single-line comments on the	Bruce Momjian	2007-11-15
\| \| \| \| \|	same line; previous fix was only partial. Re-run pgindent on files that need it.
*	Re-run pgindent with updated list of typedefs. (Updated README should	Bruce Momjian	2007-11-15
\| \| \| \|	avoid this problem in the future.)
*	When logging the recovery.conf parameters, show them quoted as they would	Peter Eisentraut	2007-11-15
\| \| \| \|	appear in the configuration file.