aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/heap/hio.c
Commit message (Collapse)AuthorAge
* Extend amcheck to check heap pages.Robert Haas2020-10-22
| | | | | | | | Mark Dilger, reviewed by Peter Geoghegan, Andres Freund, Álvaro Herrera, Michael Paquier, Amul Sul, and by me. Some last-minute cosmetic revisions by me. Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Revert "Avoid the creation of the free space map for small heap relations".Amit Kapila2019-05-07
| | | | | | | | | | | | | | | | | | | | | This feature was using a process local map to track the first few blocks in the relation. The map was reset each time we get the block with enough freespace. It was discussed that it would be better to track this map on a per-relation basis in relcache and then invalidate the same whenever vacuum frees up some space in the page or when FSM is created. The new design would be better both in terms of API design and performance. List of commits reverted, in reverse chronological order: 06c8a5090e Improve code comments in b0eaa4c51b. 13e8643bfc During pg_upgrade, conditionally skip transfer of FSMs. 6f918159a9 Add more tests for FSM. 9c32e4c350 Clear the local map when not used. 29d108cdec Update the documentation for FSM behavior.. 08ecdfe7e5 Make FSM test portable. b0eaa4c51b Avoid creation of the free space map for small heap relations. Discussion: https://postgr.es/m/20190416180452.3pm6uegx54iitbt5@alap3.anarazel.de
* Clear the local map when not used.Amit Kapila2019-03-01
| | | | | | | | | | | | | | | | | After commit b0eaa4c51b, we use a local map of pages to find the required space for small relations. We do clear this map when we have found a block with enough free space, when we extend the relation, or on transaction abort so that it can be used next time. However, we miss to clear it when we didn't find any pages to try from the map which leads to an assertion failure when we later tried to use it after relation extension. In the passing, I have improved some comments in this area. Reported-by: Tom Lane based on buildfarm results Author: Amit Kapila Reviewed-by: John Naylor Tested-by: Kuntal Ghosh Discussion: https://postgr.es/m/32368.1551114120@sss.pgh.pa.us
* Avoid creation of the free space map for small heap relations, take 2.Amit Kapila2019-02-04
| | | | | | | | | | | | | | | | | | | | | | | | Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been. Once the FSM is created for a heap we don't remove it even if somebody deletes all the rows from the corresponding relation. We don't think it is a useful optimization as it is quite likely that relation will again grow to the same size. Author: John Naylor, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Mithun C Y Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
* Move page initialization from RelationAddExtraBlocks() to use, take 2.Andres Freund2019-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Previously when extending the relation, there was a moment between extending the relation, and acquiring an exclusive lock on the new page, in which another backend could lock the page. To avoid new content being put on that new page, vacuumlazy needed to acquire the extension lock for a brief moment when encountering a new page. A second corner case, only working somewhat by accident, was that RelationGetBufferForTuple() sometimes checks the last page in a relation for free space, without consulting the FSM; that only worked because PageGetHeapFreeSpace() interprets the zero page header in a new page as no free space. The lack of handling this properly required reverting the previous attempt in 684200543b. This issue can be solved by using RBM_ZERO_AND_LOCK when extending the relation, thereby avoiding this window. There's some added complexity when RelationGetBufferForTuple() is called with another buffer (for updates), to avoid deadlocks, but that's rarely hit at runtime. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
* Avoid possible deadlock while locking multiple heap pages.Amit Kapila2019-02-02
| | | | | | | | | | | | | | | | | To avoid deadlock, backend acquires a lock on heap pages in block number order. In certain cases, lock on heap pages is dropped and reacquired. In this case, the locks are dropped for reading in corresponding VM page/s. The issue is we re-acquire locks in bufferId order whereas the intention was to acquire in blockid order. This commit ensures that we will always acquire locks on heap pages in blockid order. Reported-by: Nishant Fnu Author: Nishant Fnu Reviewed-by: Amit Kapila and Robert Haas Backpatch-through: 9.4 Discussion: https://postgr.es/m/5883C831-2ED1-47C8-BFAC-2D5BAE5A8CAE@amazon.com
* Revert "Move page initialization from RelationAddExtraBlocks() to use."Andres Freund2019-01-28
| | | | | | | | | | | | This reverts commit fc02e6724f3ce069b33284bce092052ab55bd751 and e6799d5a53011985d916fdb48fe014a4ae70422e. Parts of the buildfarm error out with ERROR: page %u of relation "%s" should be empty but is not errors, and so far I/we do not know why. fc02e672 didn't fix the issue. As I cannot reproduce the issue locally, it seems best to get the buildfarm green again, and reproduce the issue without time pressure.
* Move page initialization from RelationAddExtraBlocks() to use.Andres Freund2019-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we initialized pages when bulk extending in RelationAddExtraBlocks(). That has a major disadvantage: It ties RelationAddExtraBlocks() to heap, as other types of storage are likely to need different amounts of special space, have different amount of free space (previously determined by PageGetHeapFreeSpace()). That we're relying on initializing pages, but not WAL logging the initialization, also means the risk for getting "WARNING: relation \"%s\" page %u is uninitialized --- fixing" style warnings in vacuums after crashes/immediate shutdowns, is considerably higher. The warning sounds much more serious than what they are. Fix those two issues together by not initializing pages in RelationAddExtraPages() (but continue to do so in RelationGetBufferForTuple(), which is linked much more closely to heap), and accepting uninitialized pages as normal in vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it to the FSM, but does nothing else. We chose to not issue a debug message, much less a warning in that case - it seems rarely useful, and quite likely to scare people unnecessarily. For now empty pages aren't added to the VM, because standbys would not re-discover such pages after a promotion. In contrast to other sources for empty pages, there's no corresponding WAL records triggering FSM updates during replay. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
* Revert "Avoid creation of the free space map for small heap relations."Amit Kapila2019-01-28
| | | | This reverts commit ac88d2962a96a9c7e83d5acfc28fe49a72812086.
* Avoid creation of the free space map for small heap relations.Amit Kapila2019-01-28
| | | | | | | | | | | | | | | | | | | | | | | | Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been. Once the FSM is created for a heap we don't remove it even if somebody deletes all the rows from the corresponding relation. We don't think it is a useful optimization as it is quite likely that relation will again grow to the same size. Author: John Naylor with design inputs and some code contribution by Amit Kapila Reviewed-by: Amit Kapila Tested-by: Mithun C Y Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Remove UpdateFreeSpaceMap(), use FreeSpaceMapVacuumRange() instead.Tom Lane2018-03-29
| | | | | | | | | | | | | | | | | FreeSpaceMapVacuumRange has the same effect, is more efficient if many pages are involved, and makes fewer assumptions about how it's used. Notably, Claudio Freire pointed out that UpdateFreeSpaceMap could fail if the specified freespace value isn't the maximum possible. This isn't a problem for the single existing user, but the function represents an attractive nuisance IMO, because it's named as though it were a general-purpose update function and its limitations are undocumented. In any case we don't need multiple ways to get the same result. In passing, do some code review and cleanup in RelationAddExtraBlocks. In particular, I see no excuse for it to omit the PageIsNew safety check that's done in the mainline extension path in RelationGetBufferForTuple. Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
* Remove redundant IndexTupleDSize macro.Tom Lane2018-02-28
| | | | | | | | | | | Use IndexTupleSize everywhere, instead. Also, remove IndexTupleSize's internal typecast, as that's not really needed and might mask coding errors. Change some pointer variable datatypes in the call sites to compensate for that and make it clearer what we're assuming. Ildar Musin, Robert Haas, Stephen Frost Discussion: https://postgr.es/m/0274288e-9e88-13b6-c61c-7b36928bf221@postgrespro.ru
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Phase 2 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* pgindent run for 9.6Robert Haas2016-06-09
|
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Extend relations multiple blocks at a time to improve scalability.Robert Haas2016-04-08
| | | | | | | | | Contention on the relation extension lock can become quite fierce when multiple processes are inserting data into the same relation at the same time at a high rate. Experimentation shows the extending the relation multiple blocks at a time improves scalability. Dilip Kumar, reviewed by Petr Jelinek, Amit Kapila, and me.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.Andres Freund2015-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by specifying the columns of a unique constraint) or by naming a unique or exclusion constraint. DO NOTHING avoids the constraint violation, without touching the pre-existing row. DO UPDATE SET ... [WHERE ...] updates the pre-existing tuple, and has access to both the tuple proposed for insertion and the existing tuple; the optional WHERE clause can be used to prevent an update from being executed. The UPDATE SET and WHERE clauses have access to the tuple proposed for insertion using the "magic" EXCLUDED alias, and to the pre-existing tuple using the table name or its alias. This feature is often referred to as upsert. This is implemented using a new infrastructure called "speculative insertion". It is an optimistic variant of regular insertion that first does a pre-check for existing tuples and then attempts an insert. If a violating tuple was inserted concurrently, the speculatively inserted tuple is deleted and a new attempt is made. If the pre-check finds a matching tuple the alternative DO NOTHING or DO UPDATE action is taken. If the insertion succeeds without detecting a conflict, the tuple is deemed inserted. To handle the possible ambiguity between the excluded alias and a table named excluded, and for convenience with long relation names, INSERT INTO now can alias its target table. Bumps catversion as stored rules change. Author: Peter Geoghegan, with significant contributions from Heikki Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes. Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs, Dean Rasheed, Stephen Frost and many others.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Allow use of "z" flag in our printf calls, and use it where appropriate.Tom Lane2014-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | Since C99, it's been standard for printf and friends to accept a "z" size modifier, meaning "whatever size size_t has". Up to now we've generally dealt with printing size_t values by explicitly casting them to unsigned long and using the "l" modifier; but this is really the wrong thing on platforms where pointers are wider than longs (such as Win64). So let's start using "z" instead. To ensure we can do that on all platforms, teach src/port/snprintf.c to understand "z", and add a configure test to force use of that implementation when the platform's version doesn't handle "z". Having done that, modify a bunch of places that were using the unsigned-long hack to use "z" instead. This patch doesn't pretend to have gotten everyplace that could benefit, but it catches many of them. I made an effort in particular to ensure that all uses of the same error message text were updated together, so as not to increase the number of translatable strings. It's possible that this change will result in format-string warnings from pre-C99 compilers. We might have to reconsider if there are any popular compilers that will warn about this; but let's start by seeing what the buildfarm thinks. Andres Freund, with a little additional work by me
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera2012-08-30
| | | | | | | | | | | | This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Modify RelationGetBufferForTuple() to use a typedef, rather than aBruce Momjian2011-10-12
| | | | struct, to help pgindent.
* Update comments related to the crash-safety of the visibility map.Robert Haas2011-09-27
| | | | | | | | In hio.c, document how we avoid deadlock with respect to visibility map buffer locks. In visibilitymap.c, update the LOCKING section of the file header comment. Both oversights noted by Heikki Linnakangas.
* Try again to make the visibility map crash safe.Robert Haas2011-06-27
| | | | | My previous attempt was quite a bit less than half-baked with respect to heap_update().
* Make the visibility map crash-safe.Robert Haas2011-06-21
| | | | | | | | | | | | | | | | | | | | This involves two main changes from the previous behavior. First, when we set a bit in the visibility map, emit a new WAL record of type XLOG_HEAP2_VISIBLE. Replay sets the page-level PD_ALL_VISIBLE bit and the visibility map bit. Second, when inserting, updating, or deleting a tuple, we can no longer get away with clearing the visibility map bit after releasing the lock on the corresponding heap page, because an intervening crash might leave the visibility map bit set and the page-level bit clear. Making this work requires a bit of interface refactoring. In passing, a few minor but related cleanups: change the test in visibilitymap_set and visibilitymap_clear to throw an error if the wrong page (or no page) is pinned, rather than silently doing nothing; this case should never occur. Also, remove duplicate definitions of InvalidXLogRecPtr. Patch by me, review by Noah Misch.
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Fix up rickety handling of relation-truncation interlocks.Tom Lane2010-02-09
| | | | | | | | | | | | | | | | | | | | Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Improve bulk-insert performance by keeping the current target buffer pinnedTom Lane2008-11-06
| | | | | | | (but not locked, as that would risk deadlocks). Also, make it work in a small ring of buffers to avoid having bulk inserts trash the whole buffer arena. Robert Haas, after an idea of Simon Riggs'.
* Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, theHeikki Linnakangas2008-09-30
| | | | | | | | | | | | | free space information is stored in a dedicated FSM relation fork, with each relation (except for hash indexes; they don't use FSM). This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any trace of them from the backend, initdb, and documentation. Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also introduce a new variant of the get_raw_page(regclass, int4, int4) function in contrib/pageinspect that let's you to return pages from any relation fork, and a new fsm_page_contents() function to inspect the new FSM pages.
* Clean up the use of some page-header-access macros: principally, useTom Lane2008-07-13
| | | | | | | | | | SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.
* Move BufferGetPageSize and BufferGetPage from bufpage.h to bufmgr.h. It isAlvaro Herrera2008-06-08
| | | | | | | | | | more logical that way, and also it reduces the amount of unnecessary includes in bufpage.h, which is widely used. Zdenek Kotala. My previous patch to bufpage.h should also have credited him as author, but I forgot (sorry about that).
* Put back bufmgr.h in bufpage.h -- it is needed by some macros.Alvaro Herrera2008-05-12
| | | | | Remove #include bufmgr.h from (most?) source files which already include bufpage.h.
* Restructure some header files a bit, in particular heapam.h, by removing someAlvaro Herrera2008-05-12
| | | | | | | | | | | | unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|