aboutsummaryrefslogtreecommitdiff
path: root/contrib
Commit message (Collapse)AuthorAge
* Initialize structure at declarationPeter Eisentraut2019-03-25
| | | | | | Avoids extra memset call and cast. Discussion: https://www.postgresql.org/message-id/flat/7a5cbea7-b8df-e910-0f10-04014bcad701%402ndquadrant.com
* Avoid double-free in vacuumlo error path.Tom Lane2019-03-24
| | | | | | | | | | | The code would do "PQclear(res)" twice if lo_unlink failed, evidently due to careless thinking about how far out a "break" would break. Remove the extra PQclear and adjust the loop logic so that we'll fall out of both levels of loop after an error, as was clearly the intent. Spotted by Coverity. I have no idea why it took this long to notice, since the bug has been there since commit 67ccbb080. Accordingly, back-patch to all supported branches.
* tableam: Add tuple_{insert, delete, update, lock} and use.Andres Freund2019-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to reasonably use those, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM. Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that find the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM specific knowledge to differentiate. The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possible slightly less helpful for users, it seems like an acceptable trade-off. As part of this commit HTSU_Result has been renamed to TM_Result, and its members been expanded to differentiated between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData. The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with succeeded = false). Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit. Author: Andres Freund and Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
* Collations with nondeterministic comparisonPeter Eisentraut2019-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison). This patch makes changes in three areas: - CREATE COLLATION DDL changes and system catalog changes to support this new flag. - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information. - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm(). Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
* Allow amcheck to re-find tuples using new search.Peter Geoghegan2019-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach contrib/amcheck's bt_index_parent_check() function to take advantage of the uniqueness property of heapkeyspace indexes in support of a new verification option: non-pivot tuples (non-highkey tuples on the leaf level) can optionally be re-found using a new search for each, that starts from the root page. If a tuple cannot be re-found, report that the index is corrupt. The new "rootdescend" verification option is exhaustive, and can therefore make a call to bt_index_parent_check() take a lot longer. Re-finding tuples during verification is mostly intended as an option for backend developers, since the corruption scenarios that it alone is uniquely capable of detecting seem fairly far-fetched. For example, "rootdescend" verification is much more likely to detect corruption of the least significant byte of a key from a pivot tuple in the root page of a B-Tree that already has at least three levels. Typically, only a few tuples on a cousin leaf page are at risk of "getting overlooked" by index scans in this scenario. The corrupt key in the root page is only slightly corrupt: corrupt enough to give wrong answers to some queries, and yet not corrupt enough to allow the problem to be detected without verifying agreement between the leaf page and the root page, skipping at least one internal page level. The existing bt_index_parent_check() checks never cross more than a single level. Author: Peter Geoghegan Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/CAH2-Wz=yTWnVu+HeHGKb2AGiADL9eprn-cKYAto4MkKOuiGtRQ@mail.gmail.com
* Make heap TID a tiebreaker nbtree index column.Peter Geoghegan2019-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make nbtree treat all index tuples as having a heap TID attribute. Index searches can distinguish duplicates by heap TID, since heap TID is always guaranteed to be unique. This general approach has numerous benefits for performance, and is prerequisite to teaching VACUUM to perform "retail index tuple deletion". Naively adding a new attribute to every pivot tuple has unacceptable overhead (it bloats internal pages), so suffix truncation of pivot tuples is added. This will usually truncate away the "extra" heap TID attribute from pivot tuples during a leaf page split, and may also truncate away additional user attributes. This can increase fan-out, especially in a multi-column index. Truncation can only occur at the attribute granularity, which isn't particularly effective, but works well enough for now. A future patch may add support for truncating "within" text attributes by generating truncated key values using new opclass infrastructure. Only new indexes (BTREE_VERSION 4 indexes) will have insertions that treat heap TID as a tiebreaker attribute, or will have pivot tuples undergo suffix truncation during a leaf page split (on-disk compatibility with versions 2 and 3 is preserved). Upgrades to version 4 cannot be performed on-the-fly, unlike upgrades from version 2 to version 3. contrib/amcheck continues to work with version 2 and 3 indexes, while also enforcing stricter invariants when verifying version 4 indexes. These stricter invariants are the same invariants described by "3.1.12 Sequencing" from the Lehman and Yao paper. A later patch will enhance the logic used by nbtree to pick a split point. This patch is likely to negatively impact performance without smarter choices around the precise point to split leaf pages at. Making these two mostly-distinct sets of enhancements into distinct commits seems like it might clarify their design, even though neither commit is particularly useful on its own. The maximum allowed size of new tuples is reduced by an amount equal to the space required to store an extra MAXALIGN()'d TID in a new high key during leaf page splits. The user-facing definition of the "1/3 of a page" restriction is already imprecise, and so does not need to be revised. However, there should be a compatibility note in the v12 release notes. Author: Peter Geoghegan Reviewed-By: Heikki Linnakangas, Alexander Korotkov Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
* Refactor nbtree insertion scankeys.Peter Geoghegan2019-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dedicated struct to represent nbtree insertion scan keys. Having a dedicated struct makes the difference between search type scankeys and insertion scankeys a lot clearer, and simplifies the signature of several related functions. This is based on a suggestion by Andrey Lepikhov. Streamline how unique index insertions cache binary search progress. Cache the state of in-progress binary searches within _bt_check_unique() for later instead of having callers avoid repeating the binary search in an ad-hoc manner. This makes it easy to add a new optimization: _bt_check_unique() now falls out of its loop immediately in the common case where it's already clear that there couldn't possibly be a duplicate. The new _bt_check_unique() scheme makes it a lot easier to manage cached binary search effort afterwards, from within _bt_findinsertloc(). This is needed for the upcoming patch to make nbtree tuples unique by treating heap TID as a final tiebreaker column. Unique key binary searches need to restore lower and upper bounds. They cannot simply continue to use the >= lower bound as the offset to insert at, because the heap TID tiebreaker column must be used in comparisons for the restored binary search (unlike the original _bt_check_unique() binary search, where scankey's heap TID column must be omitted). Author: Peter Geoghegan, Heikki Linnakangas Reviewed-By: Heikki Linnakangas, Andrey Lepikhov Discussion: https://postgr.es/m/CAH2-WzmE6AhUdk9NdWBf4K3HjWXZBX3+umC7mH7+WDrKcRtsOw@mail.gmail.com
* Remove leftover reference to oid column.Andres Freund2019-03-18
| | | | | | | I (Andres) missed this in 578b229718e8. Author: John Naylor Discussion: https://postgr.es/m/CACPNZCtd+ckUgibRFs9KewK4Yr5rj3Oipefquupw+XJZebFhrA@mail.gmail.com
* Don't auto-restart per-database autoprewarm workers.Robert Haas2019-03-18
| | | | | | | | | | | | | We should try to prewarm each database only once. Otherwise, if prewarming fails for some reason, it will just keep retrying in an infnite loop. This can happen if, for example, the database has been dropped. The existing code was intended to implement the try-once behavior, but failed to do so because it neglected to set worker.bgw_restart_time to BGW_NEVER_RESTART. Mithun Cy, per a report from Hans Buschmann Discussion: http://postgr.es/m/CA+hUKGKpQJCWcgyy3QTC9vdn6uKAR_8r__A-MMm2GYfj45caag@mail.gmail.com
* Fix volatile vs. pointer confusionPeter Eisentraut2019-03-14
| | | | | | | | | | | | | | Variables used after a longjmp() need to be declared volatile. In case of a pointer, it's the pointer itself that needs to be declared volatile, not the pointed-to value. So we need PyObject *volatile items; instead of volatile PyObject *items; /* wrong */ Discussion: https://www.postgresql.org/message-id/flat/f747368d-9e1a-c46a-ac76-3c27da32e8e4%402ndquadrant.com
* tableam: Add and use scan APIs.Andres Freund2019-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Too allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically: 1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for a other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on, it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks. 2) The portion of parallel scans that's shared between backends need to be able to do so without the user doing per-AM work. To achieve that new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap. table_block_parallelscan_{estimate, intiialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc. 3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples). Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore that seems cleaner. To be able to sensible adapt code to use the above, two further callbacks have been introduced: a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AMs type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have been needed to be adapted for table_slot_callbacks(), making separation not worthwhile. b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information). Additionally a few infrastructure changes were needed: I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will so for the foreseeable future, the index API (see 1) above) now only deals with slots. The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs. Author: Andres Freund, Haribabu Kommi, Alvaro Herrera Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
* Move hash_any prototype from access/hash.h to utils/hashutils.hAlvaro Herrera2019-03-11
| | | | | | | | | | | | | | | | | | | | | ... as well as its implementation from backend/access/hash/hashfunc.c to backend/utils/hash/hashfn.c. access/hash is the place for the hash index AM, not really appropriate for generic facilities, which is what hash_any is; having things the old way meant that anything using hash_any had to include the AM's include file, pointlessly polluting its namespace with unrelated, unnecessary cruft. Also move the HTEqual strategy number to access/stratnum.h from access/hash.h. To avoid breaking third-party extension code, add an #include "utils/hashutils.h" to access/hash.h. (An easily removed line by committers who enjoy their asbestos suits to protect them from angry extension authors.) Discussion: https://postgr.es/m/201901251935.ser5e4h6djt2@alvherre.pgsql
* Tighten use of OpenTransientFile and CloseTransientFileMichael Paquier2019-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two sets of issues related to the use of transient files in the backend: 1) OpenTransientFile() has been used in some code paths with read-write flags while read-only is sufficient, so switch those calls to be read-only where necessary. These have been reported by Joe Conway. 2) When opening transient files, it is up to the caller to close the file descriptors opened. In error code paths, CloseTransientFile() gets called to clean up things before issuing an error. However in normal exit paths, a lot of callers of CloseTransientFile() never actually reported errors, which could leave a file descriptor open without knowing about it. This is an issue I complained about a couple of times, but never had the courage to write and submit a patch, so here we go. Note that one frontend code path is impacted by this commit so as an error is issued when fetching control file data, making backend and frontend to be treated consistently. Reported-by: Joe Conway, Michael Paquier Author: Michael Paquier Reviewed-by: Álvaro Herrera, Georgios Kokolatos, Joe Conway Discussion: https://postgr.es/m/20190301023338.GD1348@paquier.xyz Discussion: https://postgr.es/m/c49b69ec-e2f7-ff33-4f17-0eaa4f2cef27@joeconway.com
* Remove unused macroPeter Eisentraut2019-02-28
| | | | It has never been used as long as hstore has been in the tree.
* Use slots in trigger infrastructure, except for the actual invocation.Andres Freund2019-02-26
| | | | | | | | | | | | | | | | | | | | | In preparation for abstracting table storage, convert trigger.c to track tuples in slots. Which also happens to make code calling triggers simpler. As the calling interface for triggers themselves is not changed in this patch, HeapTuples still are extracted from the slot at that time. But that's handled solely inside trigger.c, not visible to callers. It's quite likely that we'll want to revise the external trigger interface, but that's a separate large project. As part of this work the slots used for old/new/return tuples are moved from EState into ResultRelInfo, as different updated tables might need different slots. The slots are now also now created on-demand, which is good both from an efficiency POV, but also makes the modifying code simpler. Author: Andres Freund, Amit Khandekar and Ashutosh Bapat Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Doc: Update the documentation for FSM behavior for small tables.Amit Kapila2019-02-20
| | | | | | | | | | | | | | In commit b0eaa4c51b, we have avoided the creation of FSM for small tables. So the functions that use FSM to compute the free space can return zero for such tables. This was previously also possible for the cases where the vacuum has not been triggered to update FSM. This commit updates the comments in code and documentation to reflect this behavior. Author: John Naylor Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CACPNZCtba-3m1q3A8gxA_vxg=T7gQf7gMbpR4Ciy5LntY-j+0Q@mail.gmail.com
* Suppress another case of MSVC warning 4146.Noah Misch2019-02-16
|
* In imath.h, replace stdint.h usage with c.h equivalents.Noah Misch2019-02-16
| | | | | | | | | As things stood, buildfarm member dory failed. MSVC versions lacking stdint.h are unusable for building PostgreSQL, but pg_config.h.win32 doesn't know that. Even so, we support other systems lacking stdint.h, including buildfarm member gaur. Per a suggestion from Tom Lane. Discussion: https://postgr.es/m/9598.1550353336@sss.pgh.pa.us
* Import changes from IMath versions (1.3, 1.29].Noah Misch2019-02-16
| | | | | | | | | | | | | | Upstream fixed bugs over the years, but none are confirmed to have affected pgcrypto. We're better off naively tracking upstream than reactively maintaining a twelve-year-old snapshot of upstream. Add a header comment describing the synchronization procedure. Discard use of INVERT_COMPARE_RESULT(); the domain of the comparisons in question is {-1,0,1}, controlled entirely by code in imath.c. Andrew Gierth reviewed the Makefile change. Tom Lane reviewed the synchronization procedure description. Discussion: https://postgr.es/m/20190203035704.GA6226@rfd.leadboat.com
* Allow user control of CTE materialization, and change the default behavior.Tom Lane2019-02-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically we've always materialized the full output of a CTE query, treating WITH as an optimization fence (so that, for example, restrictions from the outer query cannot be pushed into it). This is appropriate when the CTE query is INSERT/UPDATE/DELETE, or is recursive; but when the CTE query is non-recursive and side-effect-free, there's no hazard of changing the query results by pushing restrictions down. Another argument for materialization is that it can avoid duplicate computation of an expensive WITH query --- but that only applies if the WITH query is called more than once in the outer query. Even then it could still be a net loss, if each call has restrictions that would allow just a small part of the WITH query to be computed. Hence, let's change the behavior for WITH queries that are non-recursive and side-effect-free. By default, we will inline them into the outer query (removing the optimization fence) if they are called just once. If they are called more than once, we will keep the old behavior by default, but the user can override this and force inlining by specifying NOT MATERIALIZED. Lastly, the user can force the old behavior by specifying MATERIALIZED; this would mainly be useful when the query had deliberately been employing WITH as an optimization fence to prevent a poor choice of plan. Andreas Karlsson, Andrew Gierth, David Fetter Discussion: https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk
* Make use of compiler builtins and/or assembly for CLZ, CTZ, POPCNT.Tom Lane2019-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test for the compiler builtins __builtin_clz, __builtin_ctz, and __builtin_popcount, and make use of these in preference to handwritten C code if they're available. Create src/port infrastructure for "leftmost one", "rightmost one", and "popcount" so as to centralize these decisions. On x86_64, __builtin_popcount generally won't make use of the POPCNT opcode because that's not universally supported yet. Provide code that checks CPUID and then calls POPCNT via asm() if available. This requires indirecting through a function pointer, which is an annoying amount of overhead for a one-instruction operation, but it's probably not worth working harder than this for our current use-cases. I'm not sure we've found all the existing places that could profit from this new infrastructure; but we at least touched all the ones that used copied-and-pasted versions of the bitmapset.c code, and got rid of multiple copies of the associated constant arrays. While at it, replace c-compiler.m4's one-per-builtin-function macros with a single one that can handle all the cases we need to worry about so far. Also, because I'm paranoid, make those checks into AC_LINK checks rather than just AC_COMPILE; the former coding failed to verify that libgcc has support for the builtin, in cases where it's not inline code. David Rowley, Thomas Munro, Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com
* Refactor index cost estimation functions in view of IndexClause changes.Tom Lane2019-02-15
| | | | | | | | | | | | | | | | | | | Get rid of deconstruct_indexquals() in favor of just iterating over the IndexClause list directly. The extra services that that function used to provide, such as hiding clause commutation and associating the right index column with each clause, are no longer useful given the new data structure. I'd originally thought that it'd provide a useful amount of abstraction by freeing callers from paying attention to the exact clause type of each indexqual, but that hope proves to have been vain, because few callers can ignore the semantic differences between different clause types. Indeed, removing it results in a net code savings, and probably some cycles shaved by not having to build an extra list-of-structs data structure. Also, export a few formerly-static support functions, with the goal of allowing extension AMs to write functionality equivalent to genericcostestimate() without pointless code duplication. Discussion: https://postgr.es/m/24586.1550106354@sss.pgh.pa.us
* Resolve one unconstify usePeter Eisentraut2019-02-14
| | | | | | | A small API change makes it unnecessary. Reported-by: Mark Dilger <hornschnorter@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
* Change floating-point output format for improved performance.Andrew Gierth2019-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
* More unconstify usePeter Eisentraut2019-02-13
| | | | | | | Replace casts whose only purpose is to cast away const with the unconstify() macro. Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
* Remove useless castsPeter Eisentraut2019-02-13
| | | | | | | Some of these were uselessly casting away "const", some were just nearby, but they where all unnecessary anyway. Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
* Relax overly strict assertionAlvaro Herrera2019-02-12
| | | | | | | | | | | | | | | Ever since its birth, ReorderBufferBuildTupleCidHash() has contained an assertion that a catalog tuple cannot change Cmax after acquiring one. But that's wrong: if a subtransaction executes DDL that affects that catalog tuple, and later aborts and another DDL affects the same tuple, it will change Cmax. Relax the assertion to merely verify that the Cmax remains valid and monotonically increasing, instead. Add a test that tickles the relevant code. Diagnosed by, and initial patch submitted by: Arseny Sher Co-authored-by: Arseny Sher Discussion: https://postgr.es/m/874l9p8hyw.fsf@ars-thinkpad
* Build out the planner support function infrastructure.Tom Lane2019-02-09
| | | | | | | | | | | | | | | | | | | | | | | | Add support function requests for estimating the selectivity, cost, and number of result rows (if a SRF) of the target function. The lack of a way to estimate selectivity of a boolean-returning function in WHERE has been a recognized deficiency of the planner since Berkeley days. This commit finally fixes it. In addition, non-constant estimates of cost and number of output rows are now possible. We still fall back to looking at procost and prorows if the support function doesn't service the request, of course. To make concrete use of the possibility of estimating output rowcount for SRFs, this commit adds support functions for array_unnest(anyarray) and the integer variants of generate_series; the lack of plausible rowcount estimates for those, even when it's obvious to a human, has been a repeated subject of complaints. Obviously, much more could now be done in this line, but I'm mostly just trying to get the infrastructure in place. Discussion: https://postgr.es/m/15193.1548028093@sss.pgh.pa.us
* Split create_foreignscan_path() into three functions.Tom Lane2019-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now postgres_fdw has been using create_foreignscan_path() to generate not only base-relation paths, but also paths for foreign joins and foreign upperrels. This is wrong, because create_foreignscan_path() calls get_baserel_parampathinfo() which will only do the right thing for baserels. It accidentally fails to fail for unparameterized paths, which are the only ones postgres_fdw (thought it) was handling, but we really need different APIs for the baserel and join cases. In HEAD, the best thing to do seems to be to split up the baserel, joinrel, and upperrel cases into three functions so that they can have different APIs. I haven't actually given create_foreign_join_path a different API in this commit: we should spend a bit of time thinking about just what we want to do there, since perhaps FDWs would want to do something different from the build-up-a-join-pairwise approach that get_joinrel_parampathinfo expects. In the meantime, since postgres_fdw isn't prepared to generate parameterized joins anyway, just give it a defense against trying to plan joins with lateral refs. In addition (and this is what triggered this whole mess) fix bug #15613 from Srinivasan S A, by teaching file_fdw and postgres_fdw that plain baserel foreign paths still have outer refs if the relation has lateral_relids. Add some assertions in relnode.c to catch future occurrences of the same error --- in particular, to catch other FDWs doing that, but also as backstop against core-code mistakes like the one fixed by commit bdd9a99aa. Bug #15613 also needs to be fixed in the back branches, but the appropriate fix will look quite a bit different there, since we don't want to assume that existing FDWs get the word right away. Discussion: https://postgr.es/m/15613-092be1be9576c728@postgresql.org
* Avoid amcheck inline compression false positives.Peter Geoghegan2019-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous tacit assumption that index_form_tuple() hides differences in the TOAST state of its input datums was wrong. Normalize input varlena datums by decompressing compressed values, and forming a new index tuple for fingerprinting using uncompressed inputs. The final normalized representation may actually be compressed once again within index_form_tuple(), though that shouldn't matter. When the original tuple is found to have no datums that are compressed inline, fingerprint the original tuple directly. Normalization avoids false positive reports of corruption in certain cases. For example, the executor can apply toasting with some inline compression to an entire heap tuple because its input has a single external TOAST pointer. Varlena datums for other attributes that are not particularly good candidates for inline compression can be compressed in the heap tuple in passing, without the representation of the same values in index tuples ever receiving concomitant inline compression. Add a test case to recreate the issue in a simpler though less realistic way: by exploiting differences in pg_attribute.attstorage between heap and index relations. This bug was discovered by me during testing of an upcoming set of nbtree enhancements. It was also independently reported by Andreas Kunert, as bug #15597. His test case was rather more realistic than the one I ended up using. Bug: #15597 Discussion: https://postgr.es/m/CAH2-WznrVd9ie+TTJ45nDT+v2nUt6YJwQrT9SebCdQKtAvfPZw@mail.gmail.com Discussion: https://postgr.es/m/15597-294e5d3e7f01c407@postgresql.org Backpatch: 11-, where heapallindexed verification was introduced.
* Make FSM test portable.Amit Kapila2019-02-04
| | | | | | | | | | | | In b0eaa4c51b, we allow FSM to be created only after 4 pages. One of the tests check the FSM contents and to do that it populates many tuples in the relation. The FSM contents depend on the availability of freespace in the page and it could vary because of the alignment of tuples. This commit removes the dependency on FSM contents. Author: Amit Kapila Discussion: https://postgr.es/m/CAA4eK1KADF6K1bagr0--mGv3dMcZ%3DH_Z-Qtvdfbp5PjaC6PJJA%40mail.gmail.com
* Avoid creation of the free space map for small heap relations, take 2.Amit Kapila2019-02-04
| | | | | | | | | | | | | | | | | | | | | | | | Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been. Once the FSM is created for a heap we don't remove it even if somebody deletes all the rows from the corresponding relation. We don't think it is a useful optimization as it is quite likely that relation will again grow to the same size. Author: John Naylor, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Mithun C Y Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
* Renaming for new subscripting mechanismAlvaro Herrera2019-02-01
| | | | | | | | | | | | Over at patch https://commitfest.postgresql.org/21/1062/ Dmitry wants to introduce a more generic subscription mechanism, which allows subscripting not only arrays but also other object types such as JSONB. That functionality is introduced in a largish invasive patch, out of which this internal renaming patch was extracted. Author: Dmitry Dolgov Reviewed-by: Tom Lane, Arthur Zakirov Discussion: https://postgr.es/m/CA+q6zcUK4EqPAu7XRRO5CCjMwhz5zvg+rfWuLzVoxp_5sKS6=w@mail.gmail.com
* Add combining characters to unaccent.rules.Thomas Munro2019-02-01
| | | | | | | | Strip certain classes of combining characters, so that accents encoded this way are removed. Author: Hugh Ranalli Discussion: https://postgr.es/m/15548-cef1b3f8de190d4f%40postgresql.org
* Rename nodes/relation.h to nodes/pathnodes.h.Tom Lane2019-01-29
| | | | | | | | | | | | | The old name of this file was never a very good indication of what it was for. Now that there's also access/relation.h, we have a potential confusion hazard as well, so let's rename it to something more apropos. Per discussion, "pathnodes.h" is reasonable, since a good fraction of the file is Path node definitions. While at it, tweak a couple of other headers that were gratuitously importing relation.h into modules that don't need it. Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
* Refactor planner's header files.Tom Lane2019-01-29
| | | | | | | | | | | | | | | | | | | | | | | | Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on. The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner. This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day. Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
* postgres_fdw: Fix test for cached costs in estimate_path_cost_size().Etsuro Fujita2019-01-29
| | | | | | | | | | | estimate_path_cost_size() failed to re-use cached costs when the cached startup/total cost was 0, so it calculated the costs redundantly. This is an oversight in commit aa09cd242f; but apply the patch to HEAD only because there are no reports of actual trouble from that. Author: Etsuro Fujita Discussion: https://postgr.es/m/5C4AF3F3.4060409%40lab.ntt.co.jp
* In the planner, replace an empty FROM clause with a dummy RTE.Tom Lane2019-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fact that "SELECT expression" has no base relations has long been a thorn in the side of the planner. It makes it hard to flatten a sub-query that looks like that, or is a trivial VALUES() item, because the planner generally uses relid sets to identify sub-relations, and such a sub-query would have an empty relid set if we flattened it. prepjointree.c contains some baroque logic that works around this in certain special cases --- but there is a much better answer. We can replace an empty FROM clause with a dummy RTE that acts like a table of one row and no columns, and then there are no such corner cases to worry about. Instead we need some logic to get rid of useless dummy RTEs, but that's simpler and covers more cases than what was there before. For really trivial cases, where the query is just "SELECT expression" and nothing else, there's a hazard that adding the extra RTE makes for a noticeable slowdown; even though it's not much processing, there's not that much for the planner to do overall. However testing says that the penalty is very small, close to the noise level. In more complex queries, this is able to find optimizations that we could not find before. The new RTE type is called RTE_RESULT, since the "scan" plan type it gives rise to is a Result node (the same plan we produced for a "SELECT expression" query before). To avoid confusion, rename the old ResultPath path type to GroupResultPath, reflecting that it's only used in degenerate grouping cases where we know the query produces just one grouped row. (It wouldn't work to unify the two cases, because there are different rules about where the associated quals live during query_planner.) Note: although this touches readfuncs.c, I don't think a catversion bump is required, because the added case can't occur in stored rules, only plans. Patch by me, reviewed by David Rowley and Mark Dilger Discussion: https://postgr.es/m/15944.1521127664@sss.pgh.pa.us
* Revert "Avoid creation of the free space map for small heap relations."Amit Kapila2019-01-28
| | | | This reverts commit ac88d2962a96a9c7e83d5acfc28fe49a72812086.
* Avoid creation of the free space map for small heap relations.Amit Kapila2019-01-28
| | | | | | | | | | | | | | | | | | | | | | | | Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been. Once the FSM is created for a heap we don't remove it even if somebody deletes all the rows from the corresponding relation. We don't think it is a useful optimization as it is quite likely that relation will again grow to the same size. Author: John Naylor with design inputs and some code contribution by Amit Kapila Reviewed-by: Amit Kapila Tested-by: Mithun C Y Discussion: https://www.postgresql.org/message-id/CAJVSVGWvB13PzpbLEecFuGFc5V2fsO736BsdTakPiPAcdMM5tQ@mail.gmail.com
* Change function call information to be variable length.Andres Freund2019-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this change FunctionCallInfoData, the struct arguments etc for V1 function calls are stored in, always had space for FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two arrays. For nearly every function call 100 arguments is far more than needed, therefore wasting memory. Arg and argnull being two separate arrays also guarantees that to access a single argument, two cachelines have to be touched. Change the layout so there's a single variable-length array with pairs of value / isnull. That drastically reduces memory consumption for most function calls (on x86-64 a two argument function now uses 64bytes, previously 936 bytes), and makes it very likely that argument value and its nullness are on the same cacheline. Arguments are stored in a new NullableDatum struct, which, due to padding, needs more memory per argument than before. But as usually far fewer arguments are stored, and individual arguments are cheaper to access, that's still a clear win. It's likely that there's other places where conversion to NullableDatum arrays would make sense, e.g. TupleTableSlots, but that's for another commit. Because the function call information is now variable-length allocations have to take the number of arguments into account. For heap allocations that can be done with SizeForFunctionCallInfoData(), for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro that helps to allocate an appropriately sized and aligned variable. Some places with stack allocation function call information don't know the number of arguments at compile time, and currently variably sized stack allocations aren't allowed in postgres. Therefore allow for FUNC_MAX_ARGS space in these cases. They're not that common, so for now that seems acceptable. Because of the need to allocate FunctionCallInfo of the appropriate size, older extensions may need to update their code. To avoid subtle breakages, the FunctionCallInfoData struct has been renamed to FunctionCallInfoBaseData. Most code only references FunctionCallInfo, so that shouldn't cause much collateral damage. This change is also a prerequisite for more efficient expression JIT compilation (by allocating the function call information on the stack, allowing LLVM to optimize it away); previously the size of the call information caused problems inside LLVM's optimizer. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
* postgres_fdw: Account for tlist eval costs in estimate_path_cost_size().Etsuro Fujita2019-01-24
| | | | | | | | | | | | | | | | | | Previously, estimate_path_cost_size() didn't account for tlist eval costs, except when costing a foreign-grouping path using local statistics, but such costs should be accounted for when costing that path using remote estimates, because some of the tlist expressions might be evaluated locally. Also, such costs should be accounted for in the case of a foreign-scan or foreign-join path, because the tlist might contain PlaceHolderVars, which postgres_fdw currently evaluates locally. This also fixes an oversight in my commit f8f6e44676. Like that commit, apply this to HEAD only to avoid destabilizing existing plan choices. Author: Etsuro Fujita Discussion: https://postgr.es/m/5BFD3EAD.2060301%40lab.ntt.co.jp
* Fix misc typos in comments.Heikki Linnakangas2019-01-23
| | | | | | Spotted mostly by Fabien Coelho. Discussion: https://www.postgresql.org/message-id/alpine.DEB.2.21.1901230947050.16643@lancre
* Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c.Andres Freund2019-01-21
| | | | | | | | | | | | | | Given these routines are heap specific, and that there will be more generic visibility support in via table AM, it makes sense to move the prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will not be exposed in a generic fashion, because they are too storage specific). Similarly, the code in tqual.c is specific to heap, so moving it into access/heap/ makes sense. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Move generic snapshot related code from tqual.h to snapmgr.h.Andres Freund2019-01-21
| | | | | | | | | | | | | | | | The code in tqual.c is largely heap specific. Due to the upcoming pluggable storage work, it therefore makes sense to move it into access/heap/ (as the file's header notes, the tqual name isn't very good). But the various statically allocated snapshot and snapshot initialization functions are now (see previous commit) generic and do not depend on functions declared in tqual.h anymore. Therefore move. Also move XidInMVCCSnapshot as that's useful for future AMs, and already used outside of tqual.c. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Change snapshot type to be determined by enum rather than callback.Andres Freund2019-01-21
| | | | | | | | | | | | | | | | | | | | | | This is in preparation for allowing the same snapshot be used for different table AMs. With the current callback based approach we would need one callback for each supported AM, which clearly would not be extensible. Thus add a new Snapshot->snapshot_type field, and move the dispatch into HeapTupleSatisfiesVisibility() (which is now a function). Later work will then dispatch calls to HeapTupleSatisfiesVisibility() and other AMs visibility functions depending on the type of the table. The central SnapshotType enum also seems like a good location to centralize documentation about the intended behaviour of various types of snapshots. As tqual.h isn't included by bufmgr.h any more (as HeapTupleSatisfies* isn't referenced by TestForOldSnapshot() anymore) a few files now need to include it directly. Author: Andres Freund, loosely based on earlier work by Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
* Fix sepgsql regression test.Tom Lane2019-01-21
| | | | | | | Message order in the expected output changes due to commit f1ad067fc. Per buildfarm. Discussion: https://postgr.es/m/20190121201134.dyx6anto6akflh5d@alap3.anarazel.de
* Remove superfluous tqual.h includes.Andres Freund2019-01-21
| | | | | | | | | | | | Most of these had been obsoleted by 568d4138c / the SnapshotNow removal. This is is preparation for moving most of tqual.[ch] into either snapmgr.h or heapam.h, which in turn is in preparation for pluggable table AMs. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Replace uses of heap_open et al with the corresponding table_* function.Andres Freund2019-01-21
| | | | | Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Replace heapam.h includes with {table, relation}.h where applicable.Andres Freund2019-01-21
| | | | | | | | | A lot of files only included heapam.h for relation_open, heap_open etc - replace the heapam.h include in those files with the narrower header. Author: Andres Freund Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de