aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
...
* Update function commentsPeter Eisentraut2018-03-16
| | | | | | | After a6542a4b6870a019cd952d055d2e7af2da2fe102, some function comments were misplaced. Fix that. Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
* Log when a BRIN autosummarization request failsAlvaro Herrera2018-03-14
| | | | | | | | | | | | | | | | Autovacuum's 'workitem' request queue is of limited size, so requests can fail if they arrive more quickly than autovacuum can process them. Emit a log message when this happens, to provide better visibility of this. Backpatch to 10. While this represents an API change for AutoVacuumRequestWork, that function is not yet prepared to deal with external modules calling it, so there doesn't seem to be any risk (other than log spam, that is.) Author: Masahiko Sawada Reviewed-by: Fabrízio Mello, Ildar Musin, Álvaro Herrera Discussion: https://postgr.es/m/CAD21AoB1HrQhp6_4rTyHN5kWEJCEsG8YzsjZNt-ctoXSn5Uisw@mail.gmail.com
* Fix HEAP_INSERT_IS_SPECULATIVE to HEAP_INSERT_SPECULATIVE in comments.Andres Freund2018-03-05
| | | | | | | This was wrong since 168d5805e4c08bed7b95d351bf097cff7c07dd65, which introduced speculative inserts. Author: Andres Freund
* Fix VM buffer pin management in heap_lock_updated_tuple_rec().Tom Lane2018-03-02
| | | | | | | | | | | | | Sloppy coding in this function could lead to leaking a VM buffer pin, or to attempting to free the same pin twice. Repair. While at it, reduce the code's tendency to free and reacquire the same page pin. Back-patch to 9.6; before that, this routine did not concern itself with VM pages. Amit Kapila and Tom Lane Discussion: https://postgr.es/m/CAA4eK1KJKwhc=isgTQHjM76CAdVswzNeAuZkh_cx-6QgGkSEgA@mail.gmail.com
* Make gistvacuumcleanup() count the actual number of index tuples.Tom Lane2018-03-02
| | | | | | | | | | | | | | | Previously, it just returned the heap tuple count, which might be only an estimate, and would be completely the wrong thing if the index is partial. Since this function scans every index page anyway to find free pages, it's practically free to count the surviving index tuples. Let's do that and return an accurate count. This is easily visible as a wrong reltuples value for a partial GiST index following VACUUM, so back-patch to all supported branches. Andrey Borodin, reviewed by Michail Nikolaev Discussion: https://postgr.es/m/151956654251.6915.675951950408204404.pgcf@coridan.postgresql.org
* Relax overly strict sanity check for upgraded ancient databasesAlvaro Herrera2018-03-01
| | | | | | | | | | | | | | | | Commit 4800f16a7ad0 added some sanity checks to ensure we don't accidentally corrupt data, but in one of them we failed to consider the effects of a database upgraded from 9.2 or earlier, where a tuple exclusively locked prior to the upgrade has a slightly different bit pattern. Fix that by using the macro that we fixed in commit 74ebba84aeb6 for similar situations. Reported-by: Alexandre Garcia Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAPYLKR6yxV4=pfW0Gwij7aPNiiPx+3ib4USVYnbuQdUtmkMaEA@mail.gmail.com Andres suspects that this bug may have wider ranging consequences, but I couldn't find anything.
* Remove redundant IndexTupleDSize macro.Tom Lane2018-02-28
| | | | | | | | | | | Use IndexTupleSize everywhere, instead. Also, remove IndexTupleSize's internal typecast, as that's not really needed and might mask coding errors. Change some pointer variable datatypes in the call sites to compensate for that and make it clearer what we're assuming. Ildar Musin, Robert Haas, Stephen Frost Discussion: https://postgr.es/m/0274288e-9e88-13b6-c61c-7b36928bf221@postgrespro.ru
* Use platform independent type for TupleTableSlot->tts_off.Andres Freund2018-02-20
| | | | | | | | | | | | | Previously tts_off was, for unknown reasons, of type long. For one that's unnecessary as tuples are restricted in length, for another long would be a bad choice of type even if that weren't the case, as it's not reliably wider than an int. Also HeapTupleHeader->t_len is a uint32. This is split off from a larger patch implementing JITed tuple deforming. Seems like an independent improvement, as tiny as it is. Author: Andres Freund
* Minor comment fixPeter Eisentraut2018-02-17
|
* Fix typo in commentMagnus Hagander2018-02-16
|
* Silence assorted "variable may be used uninitialized" warnings.Tom Lane2018-02-14
| | | | | | | | | | | | | | | All of these are false positives, but in each case a fair amount of analysis is needed to see that, and it's not too surprising that not all compilers are smart enough. (In particular, in the logtape.c case, a compiler lacking the knowledge provided by the Assert would almost surely complain, so that this warning will be seen in any non-assert build.) Some of these are of long standing while others are pretty recent, but it only seems worth fixing them in HEAD. Jaime Casanova, tweaked a bit by me Discussion: https://postgr.es/m/CAJGNTeMcYAMJdPAom52dppLMtF-UnEZi0dooj==75OEv1EoBZA@mail.gmail.com
* Update out-of-date comment in StartupXLOG.Robert Haas2018-02-07
| | | | | | | | | Commit 4b0d28de06b28e57c540fca458e4853854fbeaf8 should have updated this comment, but did not. Thomas Munro Discussion: http://postgr.es/m/CAEepm=0iJ8aqQcF9ij2KerAkuHF3SwrVTzjMdm1H4w++nfBf9A@mail.gmail.com
* Support all SQL:2011 options for window frame clauses.Tom Lane2018-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in a reasonably data-type-extensible fashion. That problem is resolved here by adding the ability for btree operator classes to provide an "in_range" support function that defines how to add or subtract the RANGE offset value. Factoring it this way also allows the operator class to avoid overflow problems near the ends of the datatype's range, if it wishes to expend effort on that. (In the committed patch, the integer opclasses handle that issue, but it did not seem worth the trouble to avoid overflow failures for datetime types.) The patch includes in_range support for the integer_ops opfamily (int2/int4/int8) as well as the standard datetime types. Support for other numeric types has been requested, but that seems like suitable material for a follow-on patch. In addition, the patch adds GROUPS mode which counts the offset in ORDER-BY peer groups rather than rows, and it adds the frame_exclusion options specified by SQL:2011. As far as I can see, we are now fully up to spec on window framing options. Existing behaviors remain unchanged, except that I changed the errcode for a couple of existing error reports to meet the SQL spec's expectation that negative "offset" values should be reported as SQLSTATE 22013. Internally and in relevant parts of the documentation, we now consistently use the terminology "offset PRECEDING/FOLLOWING" rather than "value PRECEDING/FOLLOWING", since the term "value" is confusingly vague. Oliver Ford, reviewed and whacked around some by me Discussion: https://postgr.es/m/CAGMVOdu9sivPAxbNN0X+q19Sfv9edEPv=HibOJhB14TJv_RCQg@mail.gmail.com
* Doc: move info for btree opclass implementors into main documentation.Tom Lane2018-02-06
| | | | | | | | | | | | | | | Up to now, useful info for writing a new btree opclass has been buried in the backend's nbtree/README file. Let's move it into the SGML docs, in preparation for extending it with info about "in_range" functions in the upcoming window RANGE patch. To do this, I chose to create a new chapter for btree indexes in Part VII (Internals), parallel to the chapters that exist for the newer index AMs. This is a pretty short chapter as-is. At some point somebody might care to flesh it out with more detail about btree internals, but that is beyond the scope of my ambition for today. Discussion: https://postgr.es/m/23141.1517874668@sss.pgh.pa.us
* Be more wary about shm_toc_lookup failure.Tom Lane2018-02-02
| | | | | | | | | | | | | | | | Commit 445dbd82a basically missed the point of commit d46633506, which was that we shouldn't allow shm_toc_lookup() failure to lead to a core dump or assertion crash, because the odds of such a failure should never be considered negligible. It's correct that we can't expect the PARALLEL_KEY_ERROR_QUEUE TOC entry to be there if we have no workers. But if we have no workers, we're not going to do anything in this function with the lookup result anyway, so let's just skip it. That lets the code use the easy-to-prove-safe noError=false case, rather than anything requiring effort to review. Back-patch to v10, like the previous commit. Discussion: https://postgr.es/m/3647.1517601675@sss.pgh.pa.us
* Support parallel btree index builds.Robert Haas2018-02-02
| | | | | | | | | | | | | | | | | | | | | | To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding how many workers to use is fairly primitive at present, but it's better than not having the feature. We can refine it as we get more experience. Peter Geoghegan with some help from Rushabh Lathia. While Heikki Linnakangas is not an author of this patch, he wrote other patches without which this feature would not have been possible, and therefore the release notes should possibly credit him as an author of this feature. Reviewed by Claudio Freire, Heikki Linnakangas, Thomas Munro, Tels, Amit Kapila, me. Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
* Add new function WaitForParallelWorkersToAttach.Robert Haas2018-02-02
| | | | | | | | | | | | | | | | Once this function has been called, we know that all workers have started and attached to their error queues -- so if any of them subsequently exit uncleanly, we'll be sure to throw an ERROR promptly. Otherwise, users of the ParallelContext machinery must be careful not to wait forever for a worker that has failed to start. Parallel query manages to work without needing this for reasons explained in new comments added by this patch, but it's a useful primitive for other parallel operations, such as the pending patch to make creating a btree index run in parallel. Amit Kapila, revised by me. Additional review by Peter Geoghegan. Discussion: http://postgr.es/m/CAA4eK1+e2MzyouF5bg=OtyhDSX+=Ao=3htN=T-r_6s3gCtKFiw@mail.gmail.com
* Fix possible failure to mark hash metapage dirty.Robert Haas2018-02-01
| | | | | | | Report and suggested fix by Lixian Zou. Amit Kapila put it in the form of a patch and reviewed. Discussion: http://postgr.es/m/151739848647.1239.12528851873396651946@wrigleys.postgresql.org
* Avoid unnecessary use of pg_strcasecmp for already-downcased identifiers.Tom Lane2018-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a lot of code in which option names, which from the user's viewpoint are logically keywords, are passed through the grammar as plain identifiers, and then matched to string literals during command execution. This approach avoids making words into lexer keywords unnecessarily. Some places matched these strings using plain strcmp, some using pg_strcasecmp. But the latter should be unnecessary since identifiers would have been downcased on their way through the parser. Aside from any efficiency concerns (probably not a big factor), the lack of consistency in this area creates a hazard of subtle bugs due to different places coming to different conclusions about whether two option names are the same or different. Hence, standardize on using strcmp() to match any option names that are expected to have been fed through the parser. This does create a user-visible behavioral change, which is that while formerly all of these would work: alter table foo set (fillfactor = 50); alter table foo set (FillFactor = 50); alter table foo set ("fillfactor" = 50); alter table foo set ("FillFactor" = 50); now the last case will fail because that double-quoted identifier is different from the others. However, none of our documentation says that you can use a quoted identifier in such contexts at all, and we should discourage doing so since it would break if we ever decide to parse such constructs as true lexer keywords rather than poor man's substitutes. So this shouldn't create a significant compatibility issue for users. Daniel Gustafsson, reviewed by Michael Paquier, small changes by me Discussion: https://postgr.es/m/29405B24-564E-476B-98C0-677A29805B84@yesql.se
* Update obsolete sentence in README.parallel.Robert Haas2018-01-23
| | | | | | Since 9.6, heavyweight locking is not an abstract and unhandled concern of the parallel machinery, but rather something to which we have a specific approach.
* Report an ERROR if a parallel worker fails to start properly.Robert Haas2018-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 28724fd90d2f85a0573a8107b48abad062a86d83 fixed things so that if a background worker fails to start due to fork() failure or because it is terminated before startup succeeds, BGWH_STOPPED will be reported. However, that only helps if the code that uses the background worker machinery notices the change in status, and the code in parallel.c did not. To fix that, do two things. First, make sure that when a worker exits, it triggers the leader to read from error queues. That way, if a worker which has attached to an error queue exits uncleanly, the leader is sure to throw some error, either the contents of the ErrorResponse sent by the worker, or "lost connection to parallel worker" if it exited without sending one. To cover the case where the worker never starts up in the first place or exits before attaching to the error queue, the ParallelContext now keeps track of which workers have sent at least one message via the error queue. A worker which sends no messages by the time the parallel operation finishes will be checked to see whether it exited before attaching to the error queue; if so, a new error message, "parallel worker failed to initialize", will be reported. If not, we'll continue to wait until it either starts up and exits cleanly, starts up and exits uncleanly, or fails to start, and then take the appropriate action. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoYnBgXgdTu6wk5YPdWhmgabYc9nY_pFLq=tB=FSLYkD8Q@mail.gmail.com
* Replace AclObjectKind with ObjectTypePeter Eisentraut2018-01-19
| | | | | | | | | AclObjectKind was basically just another enumeration for object types, and we already have a preferred one for that. It's only used in aclcheck_error. By using ObjectType instead, we can also give some more precise error messages, for example "index" instead of "relation". Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
* Local partitioned indexesAlvaro Herrera2018-01-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | When CREATE INDEX is run on a partitioned table, create catalog entries for an index on the partitioned table (which is just a placeholder since the table proper has no data of its own), and recurse to create actual indexes on the existing partitions; create them in future partitions also. As a convenience gadget, if the new index definition matches some existing index in partitions, these are picked up and used instead of creating new ones. Whichever way these indexes come about, they become attached to the index on the parent table and are dropped alongside it, and cannot be dropped on isolation unless they are detached first. To support pg_dump'ing these indexes, add commands CREATE INDEX ON ONLY <table> (which creates the index on the parent partitioned table, without recursing) and ALTER INDEX ATTACH PARTITION (which is used after the indexes have been created individually on each partition, to attach them to the parent index). These reconstruct prior database state exactly. Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit Langote, Jesper Pedersen, Simon Riggs, David Rowley Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
* Transfer state pertaining to pending REINDEX operations to workers.Robert Haas2018-01-19
| | | | | | | | | | | | This will allow the pending patch for parallel CREATE INDEX to work on system catalogs, and to provide the same level of protection against use of user indexes while they are being rebuilt that we have for non-parallel CREATE INDEX. Patch by me, reviewed by Peter Geoghegan. Discussion: http://postgr.es/m/CA+TgmoYN-YQU9JsGQcqFLovZ-C+Xgp1_xhJQad=cunGG-_p5gg@mail.gmail.com Discussion: http://postgr.es/m/CAH2-Wzkv4UNkXYhqQRqk-u9rS7h5c-4cCW+EqQ8K_WSeS43aZg@mail.gmail.com
* Change some bogus PageGetLSN calls to BufferGetLSNAtomicAlvaro Herrera2018-01-09
| | | | | | | | | | | | | | | | | | | | | | | | | | As src/backend/access/transam/README says, PageGetLSN may only be called by processes holding either exclusive lock on buffer, or a shared lock on buffer plus buffer header lock. Therefore any place that only holds a shared buffer lock must use BufferGetLSNAtomic instead of PageGetLSN, which internally obtains buffer header lock prior to reading the LSN. A few callsites failed to comply with this rule. This was detected by running all tests under a new (not committed) assertion that verifies PageGetLSN locking contract. All but one of the callsites that failed the assertion are fixed by this patch. Remaining callsites were inspected manually and determined not to need any change. The exception (unfixed callsite) is in TestForOldSnapshot, which only has a Page argument, making it impossible to access the corresponding Buffer from it. Fixing that seems a much larger patch that will have to be done separately; and that's just as well, since it was only introduced in 9.6 and other bugs are much older. Some of these bugs are ancient; backpatch all the way back to 9.3. Authors: Jacob Champion, Asim Praveen, Ashwin Agrawal Reviewed-by: Michaël Paquier Discussion: https://postgr.es/m/CABAq_6GXgQDVu3u12mK9O5Xt5abBZWQ0V40LZCE+oUf95XyNFg@mail.gmail.com
* Add TIMELINE to backup_label fileSimon Riggs2018-01-06
| | | | | | | Allows new test to confirm timelines match Author: Michael Paquier Reviewed-by: David Steele
* Clean up tupdesc.c for recent changes.Tom Lane2018-01-03
| | | | | | | | | | | | | | | | | | | TupleDescCopy needs to have the same effects as CreateTupleDescCopy in that, since it doesn't copy constraints, it should clear the per-attribute fields associated with them. Oversight in commit cc5f81366. Since TupleDescCopy has already established the presumption that it can just flat-copy the entire attribute array in one go, propagate that approach into CreateTupleDescCopy and CreateTupleDescCopyConstr. (I'm suspicious that this would lead to valgrind complaints if we had any trailing padding in the struct, but we do not, and anyway fixing that seems like a job for a separate commit.) Add some better comments. Thomas Munro, reviewed by Vik Fearing, some additional hacking by me Discussion: https://postgr.es/m/CAEepm=0NvOGZ8B6GbQyQe2C_c2m3LKJ9w=8OMBaYRLgZ_Gw6Nw@mail.gmail.com
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Don't cast between GinNullCategory and boolPeter Eisentraut2018-01-02
| | | | | | | | | | | | | | The original idea was that we could use an isNull-style bool array directly as a GinNullCategory array. However, the existing code already acknowledges that that doesn't really work, because of the possibility that bool as currently defined can have arbitrary bit patterns for true values. So it has to loop through the nullFlags array to set each bool value to an acceptable value. But if we are looping through the whole array anyway, we might as well build a proper GinNullCategory array instead and abandon the type casting. That makes the code much safer in case bool is ever changed to something else. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
* Extend near-wraparound hints to include replication slotsSimon Riggs2017-12-29
| | | | | Author: Feike Steenbergen Reviewed-by: Michael Paquier
* Add optional compression method to SP-GiSTTeodor Sigaev2017-12-22
| | | | | | | | | | | | | | Patch allows to have different types of column and value stored in leaf tuples of SP-GiST. The main application of feature is to transform complex column type to simple indexed type or for truncating too long value, transformation could be lossy. Simple example: polygons are converted to their bounding boxes, this opclass follows. Authors: me, Heikki Linnakangas, Alexander Korotkov, Nikita Glukhov Reviewed-By: all authors + Darafei Praliaskouski Discussions: https://www.postgresql.org/message-id/5447B3FF.2080406@sigaev.ru https://www.postgresql.org/message-id/flat/54907069.1030506@sigaev.ru#54907069.1030506@sigaev.ru
* Adjust assertion in GetCurrentCommandId.Robert Haas2017-12-21
| | | | | | | | | | | | | | | | | currentCommandIdUsed is only used to skip redundant increments of the command counter, and CommandCounterIncrement() is categorically denied under parallelism anyway. Therefore, it's OK for GetCurrentCommandId() to mark the counter value used, as long as it happens in the leader, not a worker. Prior to commit e9baa5e9fa147e00a2466ab2c40eb99c8a700824, the slightly incorrect check didn't matter, but now it does. A test case added by commit 1804284042e659e7d16904e7bbb0ad546394b6a3 uncovered the problem by accident; it caused failures with force_parallel_mode=on/regress. Report and review by Andres Freund. Patch by me. Discussion: http://postgr.es/m/20171221143106.5lhtygohvmazli3x@alap3.anarazel.de
* Cancel CV sleep during subtransaction abort.Robert Haas2017-12-21
| | | | | | | | | | | | | | Generally, error recovery paths that need to do things like LWLockReleaseAll and pgstat_report_wait_end also need to call ConditionVariableCancelSleep, but AbortSubTransaction was missed. Since subtransaction abort might destroy up the DSM segment that contains the ConditionVariable stored in cv_sleep_target, this can result in a crash for anything using condition variables. Reported and diagnosed by Andres Freund. Discussion: http://postgr.es/m/20171221110048.rxk6464azzl5t2fi@alap3.anarazel.de
* Fix bug in cancellation of non-exclusive backup to avoid assertion failure.Fujii Masao2017-12-19
| | | | | | | | | | | | | | | | | | | | | | | | Previously an assertion failure occurred when pg_stop_backup() for non-exclusive backup was aborted while it's waiting for WAL files to be archived. This assertion failure happened in do_pg_abort_backup() which was called when a non-exclusive backup was canceled. do_pg_abort_backup() assumes that there is at least one non-exclusive backup running when it's called. But pg_stop_backup() can be canceled even after it marks the end of non-exclusive backup (e.g., during waiting for WAL archiving). This broke the assumption that do_pg_abort_backup() relies on, and which caused an assertion failure. This commit changes do_pg_abort_backup() so that it does nothing when non-exclusive backup has been already marked as completed. That is, the asssumption is also changed, and do_pg_abort_backup() now can handle even the case where it's called when there is no running backup. Backpatch to 9.6 where SQL-callable non-exclusive backup was added. Author: Masahiko Sawada and Michael Paquier Reviewed-By: Robert Haas and Fujii Masao Discussion: https://www.postgresql.org/message-id/CAD21AoD2L1Fu2c==gnVASMyFAAaq3y-AQ2uEVj-zTCGFFjvmDg@mail.gmail.com
* Perform a lot more sanity checks when freezing tuples.Andres Freund2017-12-14
| | | | | | | | | | | | | | | | The previous commit has shown that the sanity checks around freezing aren't strong enough. Strengthening them seems especially important because the existance of the bug has caused corruption that we don't want to make even worse during future vacuum cycles. The errors are emitted with ereport rather than elog, despite being "should never happen" messages, so a proper error code is emitted. To avoid superflous translations, mark messages as internal. Author: Andres Freund and Alvaro Herrera Reviewed-By: Alvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de Backpatch: 9.3-
* Fix parallel index scan hang with deleted or half-dead pages.Robert Haas2017-12-13
| | | | | | | | | | The previous coding forgot to release the scan before seizing it again, leading to a lockup. Report by Patrick Hemmer. Diagnosis by Thomas Munro. Patch by Amit Kapila. Discussion: http://postgr.es/m/CAEepm=2xZUcOGP9V0O_G0=2P2wwXwPrkF=upWTCJSisUxMnuSg@mail.gmail.com
* Rethink MemoryContext creation to improve performance.Tom Lane2017-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes a number of interrelated changes to reduce the overhead involved in creating/deleting memory contexts. The key ideas are: * Include the AllocSetContext header of an aset.c context in its first malloc request, rather than allocating it separately in TopMemoryContext. This means that we now always create an initial or "keeper" block in an aset, even if it never receives any allocation requests. * Create freelists in which we can save and recycle recently-destroyed asets (this idea is due to Robert Haas). * In the common case where the name of a context is a constant string, just store a pointer to it in the context header, rather than copying the string. The first change eliminates a palloc/pfree cycle per context, and also avoids bloat in TopMemoryContext, at the price that creating a context now involves a malloc/free cycle even if the context never receives any allocations. That would be a loser for some common usage patterns, but recycling short-lived contexts via the freelist eliminates that pain. Avoiding copying constant strings not only saves strlen() and strcpy() overhead, but is an essential part of the freelist optimization because it makes the context header size constant. Currently we make no attempt to use the freelist for contexts with non-constant names. (Perhaps someday we'll need to think harder about that, but in current usage, most contexts with custom names are long-lived anyway.) The freelist management in this initial commit is pretty simplistic, and we might want to refine it later --- but in common workloads that will never matter because the freelists will never get full anyway. To create a context with a non-constant name, one is now required to call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME option. AllocSetContextCreate becomes a wrapper macro, and it includes a test that will complain about non-string-literal context name parameters on gcc and similar compilers. An unfortunate side effect of making AllocSetContextCreate a macro is that one is now *required* to use the size parameter abstraction macros (ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of writing out individual size parameters no longer works unless you switch to AllocSetContextCreateExtended. Internally to the memory-context-related modules, the context creation APIs are simplified, removing the rather baroque original design whereby a context-type module called mcxt.c which then called back into the context-type module. That saved a bit of code duplication, but not much, and it prevented context-type modules from exercising control over the allocation of context headers. In passing, I converted the test-and-elog validation of aset size parameters into Asserts to save a few more cycles. The original thought was that callers might compute size parameters on the fly, but in practice nobody does that, so it's useless to expend cycles on checking those numbers in production builds. Also, mark the memory context method-pointer structs "const", just for cleanliness. Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us
* Fix mistake in commentPeter Eisentraut2017-12-08
| | | | Reported-by: Masahiko Sawada <sawada.mshk@gmail.com>
* Simplify do_pg_start_backup's API by opening pg_tblspc internally.Tom Lane2017-12-04
| | | | | | | | | | | | | do_pg_start_backup() expects its callers to pass in an open DIR pointer for the pg_tblspc directory, but there's no apparent advantage in that. It complicates the callers without adding any flexibility, and there's no robustness advantage, since we surely have to be prepared for errors during the scan of pg_tblspc anyway. In fact, by holding an extra kernel resource during operations like the preliminary checkpoint, we might be making things a fraction more failure-prone not less. Hence, remove that argument and open the directory just for the duration of the actual scan. Discussion: https://postgr.es/m/28752.1512413887@sss.pgh.pa.us
* Clean up assorted messiness around AllocateDir() usage.Tom Lane2017-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a couple of low-probability bugs that could lead to reporting an irrelevant errno value (and hence possibly a wrong SQLSTATE) concerning directory-open or file-open failures. It also fixes places where we took shortcuts in reporting such errors, either by using elog instead of ereport or by using ereport but forgetting to specify an errcode. And it eliminates a lot of just plain redundant error-handling code. In service of all this, export fd.c's formerly-static function ReadDirExtended, so that external callers can make use of the coding pattern dir = AllocateDir(path); while ((de = ReadDirExtended(dir, path, LOG)) != NULL) if they'd like to treat directory-open failures as mere LOG conditions rather than errors. Also fix FreeDir to be a no-op if we reach it with dir == NULL, as such a coding pattern would cause. Then, remove code at many call sites that was throwing an error or log message for AllocateDir failure, as ReadDir or ReadDirExtended can handle that job just fine. Aside from being a net code savings, this gets rid of a lot of not-quite-up-to-snuff reports, as mentioned above. (In some places these changes result in replacing a custom error message such as "could not open tablespace directory" with more generic wording "could not open directory", but it was agreed that the custom wording buys little as long as we report the directory name.) In some other call sites where we can't just remove code, change the error reports to be fully project-style-compliant. Also reorder code in restoreTwoPhaseData that was acquiring a lock between AllocateDir and ReadDir; in the unlikely but surely not impossible case that LWLockAcquire changes errno, AllocateDir failures would be misreported. There is no great value in opening the directory before acquiring TwoPhaseStateLock, so just do it in the other order. Also fix CheckXLogRemoved to guarantee that it preserves errno, as quite a number of call sites are implicitly assuming. (Again, it's unlikely but I think not impossible that errno could change during a SpinLockAcquire. If so, this function was broken for its own purposes as well as breaking callers.) And change a few places that were using not-per-project-style messages, such as "could not read directory" when "could not open directory" is more correct. Back-patch the exporting of ReadDirExtended, in case we have occasion to back-patch some fix that makes use of it; it's not needed right now but surely making it global is pretty harmless. Also back-patch the restoreTwoPhaseData and CheckXLogRemoved fixes. The rest of this is essentially cosmetic and need not get back-patched. Michael Paquier, with a bit of additional work by me Discussion: https://postgr.es/m/CAB7nPqRpOCxjiirHmebEFhXVTK7V5Jvw4bz82p7Oimtsm3TyZA@mail.gmail.com
* Update typedefs.list and re-run pgindentRobert Haas2017-11-29
| | | | Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
* Fix ReinitializeParallelDSM to tolerate finding no error queues.Robert Haas2017-11-28
| | | | | | | | | | | | | | | | Commit d4663350646ca0c069a36d906155a0f7e3372eb7 changed things so that shm_toc_lookup would fail with an error rather than silently returning NULL in the hope that such failures would be reported in a useful way rather than via a system crash. However, it overlooked the fact that the lookup of PARALLEL_KEY_ERROR_QUEUE in ReinitializeParallelDSM is expected to fail when no DSM segment was created in the first place; in that case, we end up with a backend-private memory segment that still contains an entry for PARALLEL_KEY_FIXED but no others. Consequently a benign failure to initialize parallelism can escalate into an elog(ERROR); repair. Discussion: http://postgr.es/m/CA+Tgmob8LFw55DzH1QEREpBEA9RJ_W_amhBFCVZ6WMwUhVpOqg@mail.gmail.com
* Fix typo.Robert Haas2017-11-28
| | | | | | Masahiko Sawada Discussion: http://postgr.es/m/CAD21AoCq_QG6UEo6yb-purmhLQjDLsCA2_B+8Wh3ah9P2-XmrQ@mail.gmail.com
* Pad XLogReaderState's per-buffer data_bufsz more aggressively.Simon Riggs2017-11-27
| | | | | | | | | | | | Originally, we palloc'd this buffer just barely big enough to hold the largest xlog backup block seen so far. We now MAXALIGN the palloc size. The original coding could result in many repeated palloc cycles, in the worst case where we see a series ofgradually larger xlog records. We ameliorate that similarly to 8735978e7aebfbc499843630131c18d1f7346c79 by imposing a minimum buffer size of BLCKSZ. Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
* Pad XLogReaderState's main_data buffer more aggressively.Tom Lane2017-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | Originally, we palloc'd this buffer just barely big enough to hold the largest xlog record seen so far. It turns out that that can result in valgrind complaints, because some compilers will emit code that assumes it can safely fetch padding bytes at the end of a struct, and those padding bytes were unallocated so far as aset.c was concerned. We can fix that by MAXALIGN'ing the palloc request size, ensuring that it is big enough to include any possible padding that might've been omitted from the on-disk record. An additional objection to the original coding is that it could result in many repeated palloc cycles, in the worst case where we see a series of gradually larger xlog records. We can ameliorate that cheaply by imposing a minimum buffer size that's large enough for most xlog records. BLCKSZ/2 was chosen after a bit of discussion. In passing, remove an obsolete comment in struct xl_heap_new_cid that the combocid field is free due to alignment considerations. Perhaps that was true at some point, but it's not now. Back-patch to 9.5 where this code came in. Discussion: https://postgr.es/m/E1eHa4J-0006hI-Q8@gemulon.postgresql.org
* Parameter toast_tuple_target controls TOAST for new rowsSimon Riggs2017-11-20
| | | | | | | | | | | | Specifies the point at which we try to move long column values into TOAST tables. No effect on existing rows. Discussion: https://postgr.es/m/CANP8+jKsVmw6CX6YP9z7zqkTzcKV1+Uzr3XjKcZW=2Ya00OyQQ@mail.gmail.com Author: Simon Riggs <simon@2ndQudrant.com> Reviewed-by: Andrew Dunstan <andrew.dunstan@2ndQuadrant.com>
* Fix broken cleanup interlock for GIN pending list.Robert Haas2017-11-16
| | | | | | | | | | | | | The pending list must (for correctness) always be cleaned up by vacuum, and should (for the avoidance of surprising behavior) always be cleaned up by an explicit call to gin_clean_pending_list, but cleanup is optional when inserting. The old logic got this backward: cleanup was forced if (stats == NULL), but that's going to be *false* when vacuuming and *true* for inserts. Masahiko Sawada, reviewed by me. Discussion: http://postgr.es/m/CAD21AoBLUSyiYKnTYtSAbC+F=XDjiaBrOUEGK+zUXdQ8owfPKw@mail.gmail.com
* Add some const decorations to prototypesPeter Eisentraut2017-11-10
| | | | Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
* Change TRUE/FALSE to true/falsePeter Eisentraut2017-11-08
| | | | | | | | | | | | | | The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
* Remove secondary checkpointSimon Riggs2017-11-07
| | | | | | | | | | Previously server reserved WAL for last two checkpoints, which used too much disk space for small servers. Bumps PG_CONTROL_VERSION Author: Simon Riggs <simon@2ndQuadrant.com> Reviewed-by: Michael Paquier <michael.paquier@gmail.com>