aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
...
* Normalize fgets() calls to use sizeof() for calculating the buffer sizePeter Eisentraut2007-02-08
| | | | | | | where possible, and fix some sites that apparently thought that fgets() will overwrite the buffer by one byte. Also add some strlcpy() to eliminate some weird memory handling.
* Reduce WAL activity for page splits:Bruce Momjian2007-02-08
| | | | | | | | | > Currently, an index split writes all the data on the split page to > WAL. That's a lot of WAL traffic. The tuples that are copied to the > right page need to be WAL logged, but the tuples that stay on the > original page don't. Heikki Linnakangas
* Add a function pg_stat_clear_snapshot() that discards any statistics snapshotTom Lane2007-02-07
| | | | | | | | | | | | already collected in the current transaction; this allows plpgsql functions to watch for stats updates even though they are confined to a single transaction. Use this instead of the previous kluge involving pg_stat_file() to wait for the stats collector to update in the stats regression test. Internally, decouple storage of stats snapshots from transaction boundaries; they'll now stick around until someone calls pgstat_clear_snapshot --- which xact.c still does at transaction end, to maintain the previous behavior. This makes the logic a lot cleaner, at the price of a couple dozen cycles per transaction exit.
* Remove the xlog-centric "database system is ready" message and replace it withTom Lane2007-02-07
| | | | | | "database system is ready to accept connections", which is issued by the postmaster when it really is ready to accept connections. Per proposal from Markus Schiltknecht and subsequent discussion.
* Remove some dead code, per Heikki.Tom Lane2007-02-06
|
* Rename MaxTupleSize to MaxHeapTupleSize to clarify that it's not meant toTom Lane2007-02-05
| | | | | | | | | | | | | | | | | | | | describe the maximum size of index tuples (which is typically AM-dependent anyway); and consequently remove the bogus deduction for "special space" that was built into it. Adjust TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE to avoid wasting two bytes per toast chunk, and to ensure that the calculation correctly tracks any future changes in page header size. The computation had been inaccurate in a way that didn't cause any harm except space wastage, but future changes could have broken it more drastically. Fix the calculation of BTMaxItemSize, which was formerly computed as 1 byte more than it could safely be. This didn't cause any harm in practice because it's only compared against maxalign'd lengths, but future changes in the size of page headers or btree special space could have exposed the problem. initdb forced because of change in TOAST_MAX_CHUNK_SIZE, which alters the storage of toast tables.
* Don't MAXALIGN in the checks to decide whether a tuple is over TOAST'sTom Lane2007-02-04
| | | | | | | | | | | | | | | | | | | | | | | | | | threshold for tuple length. On 4-byte-MAXALIGN machines, the toast code creates tuples that have t_len exactly TOAST_TUPLE_THRESHOLD ... but this number is not itself maxaligned, so if heap_insert maxaligns t_len before comparing to TOAST_TUPLE_THRESHOLD, it'll uselessly recurse back to tuptoaster.c, wasting cycles. (It turns out that this does not happen on 8-byte-MAXALIGN machines, because for them the outer MAXALIGN in the TOAST_MAX_CHUNK_SIZE macro reduces TOAST_MAX_CHUNK_SIZE so that toast tuples will be less than TOAST_TUPLE_THRESHOLD in size. That MAXALIGN is really incorrect, but we can't remove it now, see below.) There isn't any particular value in maxaligning before comparing to the thresholds, so just don't do that, which saves a small number of cycles in itself. These numbers should be rejiggered to minimize wasted space on toast-relation pages, but we can't do that in the back branches because changing TOAST_MAX_CHUNK_SIZE would force an initdb (by changing the contents of toast tables). We can move the toast decision thresholds a bit, though, which is what this patch effectively does. Thanks to Pavan Deolasee for discovering the unintended recursion. Back-patch into 8.2, but not further, pending more testing. (HEAD is about to get a further patch modifying the thresholds, so it won't help much for testing this form of the patch.)
* Wording cleanup for error messages. Also change can't -> cannot.Bruce Momjian2007-02-01
| | | | | | | | | | | | | | Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
* Fix a few typos in comments in GiN.Neil Conway2007-02-01
|
* Allow GIN's extractQuery method to signal that nothing can satisfy the query.Teodor Sigaev2007-01-31
| | | | | | | | | | | | | In this case extractQuery should returns -1 as nentries. This changes prototype of extractQuery method to use int32* instead of uint32* for nentries argument. Based on that gincostestimate may see two corner cases: nothing will be found or seqscan should be used. Per proposal at http://archives.postgresql.org/pgsql-hackers/2007-01/msg01581.php PS tsearch_core patch should be sightly modified to support changes, but I'm waiting a verdict about reviewing of tsearch_core patch.
* Add support for cross-type hashing in hash index searches and hash joins.Tom Lane2007-01-30
| | | | | | Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.
* Add comment noting that hashm_procid in a hash index's metapage isn'tTom Lane2007-01-29
| | | | actually used for anything.
* Correct an old logic error in btree page splitting: when considering a splitTom Lane2007-01-27
| | | | | | | | | | | | exactly at the point where we need to insert a new item, the calculation used the wrong size for the "high key" of the new left page. This could lead to choosing an unworkable split, resulting in "PANIC: failed to add item to the left sibling" (or "right sibling") failure. Although this bug has been there a long time, it's very difficult to trigger a failure before 8.2, since there was generally a lot of free space on both sides of a chosen split. In 8.2, where the user-selected fill factor determines how much free space the code tries to leave, an unworkable split is much more likely. Report by Joe Conway, diagnosis and fix by Heikki Linnakangas.
* Prevent WAL logging when COPY is done in the same transation thatBruce Momjian2007-01-25
| | | | | | created it. Simon Riggs
* Refactor the index AM API slightly: move currentItemData andNeil Conway2007-01-20
| | | | | | | currentMarkData from IndexScanDesc to the opaque structs for the AMs that need this information (currently gist and hash). Patch from Heikki Linnakangas, fixes by Neil Conway.
* Remove remains of old depend target.Peter Eisentraut2007-01-20
|
* Arrange for autovacuum to be killed when another operation wants to be aloneAlvaro Herrera2007-01-16
| | | | | | accessing it, like DROP DATABASE. This allows the regression tests to pass with autovacuum enabled, which open the gates for finally enabling autovacuum by default.
* Add some notes about the basic mathematical laws that the system presumesTom Lane2007-01-12
| | | | | | hold true for operators in a btree operator family. This is mostly to clarify my own thinking about what the planner can assume for optimization purposes. (blowing dust off an old abstract-algebra textbook...)
* Enable another five tuple status bits by using the high bits of theBruce Momjian2007-01-09
| | | | | | nattr field, and rename the field. Heikki Linnakangas
* Add a citation to Seltzer and Yigit's Usenix '91 paper about hash tableTom Lane2007-01-09
| | | | | | | | | | management. The paper clearly describes many of the ideas embodied in our current hashing code, but as far as I could find out there is not a direct code heritage. (Mike Olsen recalls discussion of this paper at Postgres meetings but believes it "informed the Postgres implementation probably just at the design level". Margo herself says she wasn't involved with Postgres' hash code.) Credit where credit is due 'n all that, even if fifteen years after the fact.
* Support ORDER BY ... NULLS FIRST/LAST, and add ASC/DESC/NULLS FIRST/NULLS LASTTom Lane2007-01-09
| | | | | | | | | | | | per-column options for btree indexes. The planner's support for this is still pretty rudimentary; it does not yet know how to plan mergejoins with nondefault ordering options. The documentation is pretty rudimentary, too. I'll work on improving that stuff later. Note incompatible change from prior behavior: ORDER BY ... USING will now be rejected if the operator is not a less-than or greater-than member of some btree opclass. This prevents less-than-sane behavior if an operator that doesn't actually define a proper sort ordering is selected.
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.
* Fix some small typos in comments. Greg StarkTom Lane2007-01-04
|
* Clean up smgr.c/md.c APIs as per discussion a couple months ago. Instead ofTom Lane2007-01-03
| | | | | | | | | | | | | | | | | | having md.c return a success/failure boolean to smgr.c, which was just going to elog anyway, let md.c issue the elog messages itself. This allows better error reporting, particularly in cases such as "short read" or "short write" which Peter was complaining of. Also, remove the kluge of allowing mdread() to return zeroes from a read-beyond-EOF: this is now an error condition except when InRecovery or zero_damaged_pages = true. (Hash indexes used to require that behavior, but no more.) Also, enforce that mdwrite() is to be used for rewriting existing blocks while mdextend() is to be used for extending the relation EOF. This restriction lets us get rid of the old ad-hoc defense against creating huge files by an accidental reference to a bogus block number: we'll only create new segments in mdextend() not mdwrite() or mdread(). (Again, when InRecovery we allow it anyway, since we need to allow updates of blocks that were later truncated away.) Also, clean up the original makeshift patch for bug #2737: move the responsibility for padding relation segments to full length into md.c.
* Support type modifiers for user-defined types, and pull most knowledgeTom Lane2006-12-30
| | | | | | about typmod representation for standard types out into type-specific typmod I/O functions. Teodor Sigaev, with some editorialization by Tom Lane.
* Fix up btree's initial scankey processing to be able to detect redundantTom Lane2006-12-28
| | | | | | or contradictory keys even in cross-data-type scenarios. This is another benefit of the opfamily rewrite: we can find the needed comparison operators now.
* Restructure operator classes to allow improved handling of cross-data-typeTom Lane2006-12-23
| | | | | | | | | | | | | | | | cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.
* Remove the logId/logSeg fields from pg_control, because they are not neededTom Lane2006-12-08
| | | | | | | | | | | | | | | | | | | in normal operation, and we can avoid rewriting pg_control at every log segment switch if we don't insist that these values be valid. Reducing the number of pg_control updates is a good idea for both performance and reliability. It does make pg_resetxlog's life a bit harder, but that seems a good tradeoff; and anyway the change to pg_resetxlog amounts to automating something people formerly needed to do by hand, namely look at the existing pg_xlog files to make sure the new WAL start point was past them. In passing, change the wording of xlog.c's "database system was interrupted" messages: describe the pg_control timestamp as "last known up at" rather than implying it is the exact time of service interruption. With this change the timestamp will generally be the time of the last checkpoint, which could be many minutes before the failure; and we've already seen indications that people tend to misinterpret the old wording. initdb forced due to change in pg_control layout. Simon Riggs and Tom Lane
* Add a txn_start column to pg_stat_activity. This makes it easier toNeil Conway2006-12-06
| | | | | | | | identify long-running transactions. Since we already need to record the transaction-start time (e.g. for now()), we don't need any additional system calls to report this information. Catversion bumped, initdb required.
* Minor adjustments to make failures in startup/shutdown behave more cleanly.Tom Lane2006-11-30
| | | | | | | | | | | | | | | | StartupXLOG and ShutdownXLOG no longer need to be critical sections, because in all contexts where they are invoked, elog(ERROR) would be translated to elog(FATAL) anyway. (One change in bgwriter.c is needed to make this true: set ExitOnAnyError before trying to exit. This is a good fix anyway since the existing code would have gone into an infinite loop on elog(ERROR) during shutdown.) That avoids a misleading report of PANIC during semi-orderly failures. Modify the postmaster to include the startup process in the set of processes that get SIGTERM when a fast shutdown is requested, and also fix it to not try to restart the bgwriter if the bgwriter fails while trying to write the shutdown checkpoint. Net result is that "pg_ctl stop -m fast" does something reasonable for a system in warm standby mode, and so should Unix system shutdown (ie, universal SIGTERM). Per gripe from Stephen Harris and some corner-case testing of my own.
* Fix bug with page deletion. If inner page is removed and it tries toTeodor Sigaev2006-11-30
| | | | | | | | | | remove page on next level linked from next inner page, ginScanToDelete() wrongly sets parent page. Bug reveals when many item pointers from index was deleted ( several hundred thousands). Bug is discovered by hubert depesz lubaczewski <depesz@gmail.com> Suppose, we need rc2 before release...
* Add a comment noting that heap_copytuple_with_tuple() results in aNeil Conway2006-11-23
| | | | | | HeapTuple that is no longer allocated as a single palloc() block; if used carelessly, this might result in a subsequent memory leak after heap_freetuple().
* Several changes to reduce the probability of running out of memory duringTom Lane2006-11-23
| | | | | | | | | | | | | | | | | AbortTransaction, which would lead to recursion and eventual PANIC exit as illustrated in recent report from Jeff Davis. First, in xact.c create a special dedicated memory context for AbortTransaction to run in. This solves the problem as long as AbortTransaction doesn't need more than 32K (or whatever other size we create the context with). But in corner cases it might. Second, in trigger.c arrange to keep pending after-trigger event records in separate contexts that can be freed near the beginning of AbortTransaction, rather than having them persist until CleanupTransaction as before. Third, in portalmem.c arrange to free executor state data earlier as well. These two changes should result in backing off the out-of-memory condition before AbortTransaction needs any significant amount of memory, at least in typical cases such as memory overrun due to too many trigger events or too big an executor hash table. And all the same for subtransaction abort too, of course.
* On systems that have setsid(2) (which should be just about everything exceptTom Lane2006-11-21
| | | | | | | | | | | | | Windows), arrange for each postmaster child process to be its own process group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole process group not only the direct child process. This provides saner behavior for archive and recovery scripts; in particular, it's possible to shut down a warm-standby recovery server using "pg_ctl stop -m immediate", since delivery of SIGQUIT to the startup subprocess will result in killing the waiting recovery_command. Also, this makes Query Cancel and statement_timeout apply to scripts being run from backends via system(). (There is no support in the core backend for that, but it's widely done using untrusted PLs.) Per gripe from Stephen Harris and subsequent discussion.
* Repair problems with hash indexes that span multiple segments: the hash code'sTom Lane2006-11-19
| | | | | | | | | | | | | | | | | | preference for filling pages out-of-order tends to confuse the sanity checks in md.c, as per report from Balazs Nagy in bug #2737. The fix is to ensure that the smgr-level code always has the same idea of the logical EOF as the hash index code does, by using ReadBuffer(P_NEW) where we are adding a single page to the end of the index, and using smgrextend() to reserve a large batch of pages when creating a new splitpoint. The patch is a bit ugly because it avoids making any changes in md.c, which seems the most prudent approach for a backpatchable beta-period fix. After 8.3 development opens, I'll take a look at a cleaner but more invasive patch, in particular getting rid of the now unnecessary hack to allow reading beyond EOF in mdread(). Backpatch as far as 7.4. The bug likely exists in 7.3 as well, but because of the magnitude of the 7.3-to-7.4 changes in hash, the later-version patch doesn't even begin to apply. Given the other known bugs in the 7.3-era hash code, it does not seem worth trying to develop a separate patch for 7.3.
* Repair two related errors in heap_lock_tuple: it was failing to recognizeTom Lane2006-11-17
| | | | | | | | | cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.
* String fixPeter Eisentraut2006-11-16
|
* Fix some typos in comments.Neil Conway2006-11-12
|
* Suppress a few 'uninitialized variable' warnings that gcc emits only atTom Lane2006-11-11
| | | | | -O3 or higher (presumably because it inlines more things). Per gripe from Mark Mielke.
* Clean up some misleading references to %p being a full path, per Simon.Tom Lane2006-11-10
|
* Change Windows rename and unlink substitutes so that they time out afterTom Lane2006-11-08
| | | | | | | | 30 seconds instead of retrying forever. Also modify xlog.c so that if it fails to rename an old xlog segment up to a future slot, it will unlink the segment instead. Per discussion of bug #2712, in which it became apparent that Windows can handle unlinking a file that's being held open, but not renaming it.
* Fix recently-understood problems with handling of XID freezing, particularlyTom Lane2006-11-05
| | | | | | | | | | | | | | | in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.
* Fix "failed to re-find parent key" btree VACUUM failure by revising pageTom Lane2006-11-01
| | | | | | | | | | | deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.
* Add some code to CREATE DATABASE to check for pre-existing subdirectoriesTom Lane2006-10-18
| | | | | | that conflict with the OID that we want to use for the new database. This avoids the risk of trying to remove files that maybe we shouldn't remove. Per gripe from Jon Lapham and subsequent discussion of 27-Sep.
* Message style improvementsPeter Eisentraut2006-10-06
|
* Cleanup for pglz_compress code: remove dead code, const-ify API ofTom Lane2006-10-05
| | | | | | remaining functions, simplify pglz_compress's API to not require a useless data copy when compression fails. Also add a check in pglz_decompress that the expected amount of data was decompressed.
* Make use of qsort_arg in several places that were formerly using klugyTom Lane2006-10-05
| | | | | | static variables. This avoids any risk of potential non-reentrancy, and in particular offers a much cleaner workaround for the Intel compiler bug that was affecting ginutil.c.
* pgindent run for 8.2.Bruce Momjian2006-10-04
|
* Make some sentences consistent with similar ones.Bruce Momjian2006-10-03
| | | | Euler Taveira de Oliveira
* Degrade the transaction-id wraparound point message from LOG to DEBUG1, perAlvaro Herrera2006-09-26
| | | | | | discussion. Patch from Simon Riggs.