aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
...
* Update the in-code documentation about the transaction system. Move itTom Lane2004-08-01
| | | | | into a README file instead of being in xact.c's header comment. Alvaro Herrera.
* Error message style adjustments, per Alvaro Herrera.Tom Lane2004-08-01
|
* Some mop-up work for savepoints (nested transactions). Store a smallTom Lane2004-08-01
| | | | | | | | | | | | number of active subtransaction XIDs in each backend's PGPROC entry, and use this to avoid expensive probes into pg_subtrans during TransactionIdIsInProgress. Extend EOXactCallback API to allow add-on modules to get control at subxact start/end. (This is deliberately not compatible with the former API, since any uses of that API probably need manual review anyway.) Add basic reference documentation for SAVEPOINT and related commands. Minor other cleanups to check off some of the open issues for subtransactions. Alvaro Herrera and Tom Lane.
* plpgsql does exceptions.Tom Lane2004-07-31
| | | | | | There are still some things that need refinement; in particular I fear that the recognized set of error condition names probably has little in common with what Oracle recognizes. But it's a start.
* Fix subtransaction behavior for large objects, temp namespace, files,Tom Lane2004-07-28
| | | | | | | password/group files. Also allow read-only subtransactions of a read-write parent, but not vice versa. These are the reasonably noncontroversial parts of Alvaro's recent mop-up patch, plus further work on large objects to minimize use of the TopTransactionResourceOwner.
* Replace nested-BEGIN syntax for subtransactions with spec-compliantTom Lane2004-07-27
| | | | | | | SAVEPOINT/RELEASE/ROLLBACK-TO syntax. (Alvaro) Cause COMMIT of a failed transaction to report ROLLBACK instead of COMMIT in its command tag. (Tom) Fix a few loose ends in the nested-transactions stuff.
* Add cross-check that current timeline of pg_control is an ancestor ofTom Lane2004-07-22
| | | | | | recovery_target_timeline --- otherwise there is no path from the backup to the requested timeline. This check was foreseen in the original discussion but I forgot to implement it.
* Add a check on file size as an additional safety check that a WAL fileTom Lane2004-07-22
| | | | | | | | | recovered from archive is not corrupt. It's not much but it will catch one common problem, viz out-of-disk-space. Also, force a WAL recovery scan when recovery.conf is present, even if pg_control shows a clean shutdown. This allows recovery with a tar backup that was taken with the postmaster shut down, as per complaint from Mark Kirkwood.
* Invent WAL timelines, as per recent discussion, to make point-in-timeTom Lane2004-07-21
| | | | | | | | recovery more manageable. Also, undo recent change to add FILE_HEADER and WASTED_SPACE records to XLOG; instead make the XLOG page header variable-size with extra fields in the first page of an XLOG file. This should fix the boundary-case bugs observed by Mark Kirkwood. initdb forced due to change of XLOG representation.
* Remove unportable use of strptime() to parse recovery target time spec.Tom Lane2004-07-19
| | | | Instead use our own abstimein code, which is more flexible anyway.
* XLOG file archiving and point-in-time recovery. There are still someTom Lane2004-07-19
| | | | | | loose ends and a glaring lack of documentation, but it basically works. Simon Riggs with some editorialization by Tom Lane.
* Invent ResourceOwner mechanism as per my recent proposal, and use it toTom Lane2004-07-17
| | | | | | | | keep track of portal-related resources separately from transaction-related resources. This allows cursors to work in a somewhat sane fashion with nested transactions. For now, cursor behavior is non-subtransactional, that is a cursor's state does not roll back if you abort a subtransaction that fetched from the cursor. We might want to change that later.
* Rename XLOG_BTREE_NEWPAGE xlog record type into XLOG_HEAP_NEWPAGE, andTom Lane2004-07-11
| | | | | | | | | | shift support code into heapam.c accordingly. This is in service of soon-to-be-committed ALTER TABLE SET TABLESPACE code that will want to use this same record type for both heaps and indexes. Theoretically I should have forced initdb for this, but in practice there is no change in xlog contents because CVS tip will never really emit this record type anyhow...
* Fix no-longer-correct bit-pushing in TransactionIdSetStatus, per Alvaro.Tom Lane2004-07-03
|
* Further review of xact.c state machine for nested transactions. FixTom Lane2004-07-01
| | | | | problems with starting subtransactions inside already-failed transactions. Clean up some comments.
* Nested transactions. There is still much left to do, especially on theTom Lane2004-07-01
| | | | | | | performance front, but with feature freeze upon us I think it's time to drive a stake in the ground and say that this will be in 7.5. Alvaro Herrera, with some help from Tom Lane.
* Tablespaces. Alternate database locations are dead, long live tablespaces.Tom Lane2004-06-18
| | | | | | | | | There are various things left to do: contrib dbsize and oid2name modules need work, and so does the documentation. Also someone should think about COMMENT ON TABLESPACE and maybe RENAME TABLESPACE. Also initlocation is dead, it just doesn't know it yet. Gavin Sherry and Tom Lane.
* Give inet/cidr datatypes their own hash function that ignores the inet vsTom Lane2004-06-13
| | | | | | | cidr type bit, the same as network_eq does. This is needed for hash joins and hash aggregation to work correctly on these types. Per bug report from Michael Fuhr, 2004-04-13. Also, improve hash function for int8 as suggested by Greg Stark.
* Infrastructure for I/O of composite types: arrange for the I/O routinesTom Lane2004-06-06
| | | | | | | | | | of a composite type to get that type's OID as their second parameter, in place of typelem which is useless. The actual changes are mostly centralized in getTypeInputInfo and siblings, but I had to fix a few places that were fetching pg_type.typelem for themselves instead of using the lsyscache.c routines. Also, I renamed all the related variables from 'typelem' to 'typioparam' to discourage people from assuming that they necessarily contain array element types.
* Tweak palloc/repalloc to allow zero bytes to be requested, as per recentTom Lane2004-06-05
| | | | | proposal. Eliminate several dozen now-unnecessary hacks to avoid palloc(0). (It's likely there are more that I didn't find.)
* Make the world very nearly safe for composite-type columns in tables.Tom Lane2004-06-05
| | | | | | | | | | | | | | | | | | | 1. Solve the problem of not having TOAST references hiding inside composite values by establishing the rule that toasting only goes one level deep: a tuple can contain toasted fields, but a composite-type datum that is to be inserted into a tuple cannot. Enforcing this in heap_formtuple is relatively cheap and it avoids a large increase in the cost of running the tuptoaster during final storage of a row. 2. Fix some interesting problems in expansion of inherited queries that reference whole-row variables. We never really did this correctly before, but it's now relatively painless to solve by expanding the parent's whole-row Var into a RowExpr() selecting the proper columns from the child. If you dike out the preventive check in CheckAttributeType(), composite-type columns now seem to actually work. However, we surely cannot ship them like this --- without I/O for composite types, you can't get pg_dump to dump tables containing them. So a little more work still to do.
* Resurrect heap_deformtuple(), this time implemented as a singly nestedTom Lane2004-06-04
| | | | | | | | | | loop over the fields instead of a loop around heap_getattr. This is considerably faster (O(N) instead of O(N^2)) when there are nulls or varlena fields, since those prevent use of attcacheoff. Replace loops over heap_getattr with heap_deformtuple in situations where all or most of the fields have to be fetched, such as printtup and tuptoaster. Profiling done more than a year ago shows that this should be a nice win for situations involving many-column tables.
* Adjust our timezone library to use pg_time_t (typedef'd as int64) inTom Lane2004-06-03
| | | | | | | | | | | | | | | | | | | place of time_t, as per prior discussion. The behavior does not change on machines without a 64-bit-int type, but on machines with one, which is most, we are rid of the bizarre boundary behavior at the edges of the 32-bit-time_t range (1901 and 2038). The system will now treat times over the full supported timestamp range as being in your local time zone. It may seem a little bizarre to consider that times in 4000 BC are PST or EST, but this is surely at least as reasonable as propagating Gregorian calendar rules back that far. I did not modify the format of the zic timezone database files, which means that for the moment the system will not know about daylight-savings periods outside the range 1901-2038. Given the way the files are set up, it's not a simple decision like 'widen to 64 bits'; we have to actually think about the range of years that need to be supported. We should probably inquire what the plans of the upstream zic people are before making any decisions of our own.
* Adjust btree index build to not use shared buffers, thereby avoiding theTom Lane2004-06-02
| | | | | | | | | | | locking conflict against concurrent CHECKPOINT that was discussed a few weeks ago. Also, if not using WAL archiving (which is always true ATM but won't be if PITR makes it into this release), there's no need to WAL-log the index build process; it's sufficient to force-fsync the completed index before commit. This seems to gain about a factor of 2 in my tests, which is consistent with writing half as much data. I did not try it with WAL on a separate drive though --- probably the gain would be a lot less in that scenario.
* Minor code rationalization: FlushRelationBuffers just returns void,Tom Lane2004-05-31
| | | | | | | | rather than an error code, and does elog(ERROR) not elog(WARNING) when it detects a problem. All callers were simply elog(ERROR)'ing on failure return anyway, and I find it hard to envision a caller that would not, so we may as well simplify the callers and produce the more useful error message directly.
* Per previous discussions, get rid of use of sync(2) in favor ofTom Lane2004-05-31
| | | | | | | | | | | | | | | | | explicitly fsync'ing every (non-temp) file we have written since the last checkpoint. In the vast majority of cases, the burden of the fsyncs should fall on the bgwriter process not on backends. (To this end, we assume that an fsync issued by the bgwriter will force out blocks written to the same file by other processes using other file descriptors. Anyone have a problem with that?) This makes the world safe for WIN32, which ain't even got sync(2), and really makes the world safe for Unixen as well, because sync(2) never had the semantics we need: it offers no way to wait for the requested I/O to finish. Along the way, fix a bug I recently introduced in xlog recovery: file truncation replay failed to clear bufmgr buffers for the dropped blocks, which could result in 'PANIC: heap_delete_redo: no block' later on in xlog replay.
* Use the new List API function names throughout the backend, and disable theNeil Conway2004-05-30
| | | | | list compatibility API by default. While doing this, I decided to keep the llast() macro around and introduce llast_int() and llast_oid() variants.
* Separate out bgwriter code into a logically separate module, ratherTom Lane2004-05-29
| | | | | | | | | than being random pieces of other files. Give bgwriter responsibility for all checkpoint activity (other than a post-recovery checkpoint); so this child process absorbs the functionality of the former transient checkpoint and shutdown subprocesses. While at it, create an actual include file for postmaster.c, which for some reason never had its own file before.
* Code review for EXEC_BACKEND changes. Reduce the number of #ifdefs byTom Lane2004-05-28
| | | | | | | | | | | about a third, make it work on non-Windows platforms again. (But perhaps I broke the WIN32 code, since I have no way to test that.) Fold all the paths that fork postmaster child processes to go through the single routine SubPostmasterMain, which takes care of resurrecting the state that would normally be inherited from the postmaster (including GUC variables). Clean up some places where there's no particularly good reason for the EXEC and non-EXEC cases to work differently. Take care of one or two FIXMEs that remained in the code.
* Get rid of the former rather baroque mechanism for propagating the valuesTom Lane2004-05-27
| | | | | | | | | | of ThisStartUpID and RedoRecPtr into new backends. It's a lot easier just to make them all grab the values out of shared memory during startup. This helps to decouple the postmaster from checkpoint execution, which I need since I'm intending to let the bgwriter do it instead, and it also fixes a bug in the Win32 port: ThisStartUpID wasn't getting propagated at all AFAICS. (Doesn't give me a lot of faith in the amount of testing that port has gotten.)
* Reimplement the linked list data structure used throughout the backend.Neil Conway2004-05-26
| | | | | | | | | | | | | | | | In the past, we used a 'Lispy' linked list implementation: a "list" was merely a pointer to the head node of the list. The problem with that design is that it makes lappend() and length() linear time. This patch fixes that problem (and others) by maintaining a count of the list length and a pointer to the tail node along with each head node pointer. A "list" is now a pointer to a structure containing some meta-data about the list; the head and tail pointers in that structure refer to ListCell structures that maintain the actual linked list of nodes. The function names of the list API have also been changed to, I hope, be more logically consistent. By default, the old function names are still available; they will be disabled-by-default once the rest of the tree has been updated to use the new API names.
* For multi-table ANALYZE, use per-table transactions when possibleTom Lane2004-05-22
| | | | | (ie, when not inside a transaction block), so that we can avoid holding locks longer than necessary. Per trouble report from Philip Warner.
* Put back #include <sys/time.h> in files that seem to need it on Linux.Tom Lane2004-05-21
|
* Integrate src/timezone library for all platforms. There is more we canTom Lane2004-05-21
| | | | | | and should do now that we control our own destiny for timezone handling, but this commit gets the bulk of the picayune diffs in place. Magnus Hagander and Tom Lane.
* Fix speling.Tom Lane2004-05-20
|
* Get rid of rd_nblocks field in relcache entries. Turns out this wasTom Lane2004-05-08
| | | | | | | | | costing us lots more to maintain than it was worth. On shared tables it was of exactly zero benefit because we couldn't trust it to be up to date. On temp tables it sometimes saved an lseek, but not often enough to be worth getting excited about. And the real problem was that we forced an lseek on every relcache flush in order to update the field. So all in all it seems best to lose the complexity.
* Solve the 'Turkish problem' with undesirable locale behavior for caseTom Lane2004-05-07
| | | | | | | | | | | | | conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.
* Tweak indexscan and seqscan code to arrange that steps from one page toTom Lane2004-04-21
| | | | | | | | | the next are handled by ReleaseAndReadBuffer rather than separate ReleaseBuffer and ReadBuffer calls. This cuts the number of acquisitions of the BufMgrLock by a factor of 2 (possibly more, if an indexscan happens to pull successive rows from the same heap page). Unfortunately this doesn't seem enough to get us out of the recently discussed context-switch storm problem, but it's surely worth doing anyway.
* * Most changes are to fix warnings issued when compiling win32Bruce Momjian2004-04-19
| | | | | | | | | | | | | | | | | | | | | * removed a few redundant defines * get_user_name safe under win32 * rationalized pipe read EOF for win32 (UPDATED PATCH USED) * changed all backend instances of sleep() to pg_usleep - except for the SLEEP_ON_ASSERT in assert.c, as it would exceed a 32-bit long [Note to patcher: If a SLEEP_ON_ASSERT of 2000 seconds is acceptable, please replace with pg_usleep(2000000000L)] I added a comment to that part of the code: /* * It would be nice to use pg_usleep() here, but only does 2000 sec * or 33 minutes, which seems too short. */ sleep(1000000); Claudio Natoli
* This is a cleanup patch for access/transam/xact.c. It only removes someBruce Momjian2004-04-05
| | | | | | | #ifdef NOT_USED code, and adds a new TBLOCK state which signals the fact that StartTransaction() has been executed. Alvaro Herrera
* Replace TupleTableSlot convention for whole-row variables and functionTom Lane2004-04-01
| | | | | | | | results with tuples as ordinary varlena Datums. This commit does not in itself do much for us, except eliminate the horrid memory leak associated with evaluation of whole-row variables. However, it lays the groundwork for allowing composite types as table columns, and perhaps some other useful features as well. Per my proposal of a few days ago.
* Cleanup vectors of GISTENTRY and eliminate problem with 64-bit strict-alignedTeodor Sigaev2004-03-30
| | | | | | boxes. Change interface to user-defined GiST support methods union and picksplit. Now instead of bytea struct it used special GistEntryVector structure.
* Increase xlog str_time() static string variable, per Korean User's Group.Bruce Momjian2004-03-22
|
* Add NOWAIT option to LOCK commandTatsuo Ishii2004-03-11
|
* Replace opendir/closedir calls throughout the backend with AllocateDirTom Lane2004-02-23
| | | | | | | | | | and FreeDir routines modeled on the existing AllocateFile/FreeFile. Like the latter, these routines will avoid failing on EMFILE/ENFILE conditions whenever possible, and will prevent leakage of directory descriptors if an elog() occurs while one is open. Also, reduce PANIC to ERROR in MoveOfflineLogs() --- this is not critical code and there is no reason to force a DB restart on failure. All per recent trouble report from Olivier Hubaut.
* Here is an updated version of the win32 readdir patch.Bruce Momjian2004-02-17
| | | | | | | | | | | | | | 1) Now puts in exactly the same change as the current-cvs mingw code does. (see http://cvs.sourceforge.net/viewcvs.py/mingw/runtime/mingwex/dirent.c?r1= 1.3&r2=1.4, second part of the patch). 2) Updates both xlog.c and slru.c in backend/access/transam/ 3) Also updates pg_resetxlog, which also uses readdir() and checks the errno value after the loop. Magnus Hagander
* Commit the reasonably uncontroversial parts of J.R. Nield's PITR patch, toTom Lane2004-02-11
| | | | | | | | | | | wit: Add a header record to each WAL segment file so that it can be reliably identified. Avoid splitting WAL records across segment files (this is not strictly necessary, but makes it simpler to incorporate the header records). Make WAL entries for file creation, deletion, and truncation (as foreseen but never implemented by Vadim). Also, add support for making XLOG_SEG_SIZE configurable at compile time, similarly to BLCKSZ. Fix a couple bugs I introduced in WAL replay during recent smgr API changes. initdb is forced due to changes in pg_control contents.
* Centralize implementation of delay code by creating a pg_usleep()Tom Lane2004-02-10
| | | | | | | | | subroutine in src/port/pgsleep.c. Remove platform dependencies from miscadmin.h and put them in port.h where they belong. Extend recent vacuum cost-based-delay patch to apply to VACUUM FULL, ANALYZE, and non-btree index vacuuming. By the way, where is the documentation for the cost-based-delay patch?
* Restructure smgr API as per recent proposal. smgr no longer depends onTom Lane2004-02-10
| | | | | | | | | the relcache, and so the notion of 'blind write' is gone. This should improve efficiency in bgwriter and background checkpoint processes. Internal restructuring in md.c to remove the not-very-useful array of MdfdVec objects --- might as well just use pointers. Also remove the long-dead 'persistent main memory' storage manager (mm.c), since it seems quite unlikely to ever get resurrected.
* Cost based vacuum delay feature.Jan Wieck2004-02-06
| | | | Jan