aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Fix oversight in data-type change for autovacuum_vacuum_cost_delay.Tom Lane2019-03-26
| | | | | | | | | | | | | Commit caf626b2c missed that the relevant reloptions entry needs to be moved from the intRelOpts[] array to realRelOpts[]. Somewhat surprisingly, it seems to work anyway, perhaps because the desired default and limit values are all integers. We ought to have either a simpler data structure or better cross-checking here, but that's for another patch. Nikolay Shaplov Discussion: https://postgr.es/m/4861742.12LTaSB3sv@x200m
* Get rid of duplicate child RTE for a partitioned table.Tom Lane2019-03-26
| | | | | | | | | | | | | | | | | | We've been creating duplicate RTEs for partitioned tables just because we do so for regular inheritance parent tables. But unlike regular-inheritance parents which are themselves regular tables and thus need to be scanned, partitioned tables don't need the extra RTE. This makes the conditions for building a child RTE the same as those for building an AppendRelInfo, allowing minor simplification in expand_single_inheritance_child. Since the planner's actual processing is driven off the AppendRelInfo list, nothing much changes beyond that, we just have one fewer useless RTE entry. Amit Langote, reviewed and hacked a bit by me Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Fix misplaced constPeter Eisentraut2019-03-26
| | | | | | These instances were apparently trying to carry the const qualifier from the arguments through the complex casts, but for that the const qualifier was misplaced.
* Remove heap_hot_search().Andres Freund2019-03-25
| | | | | | | | | | | | | After 71bdc99d0d7, "tableam: Add helper for indexes to check if a corresponding table tuples exist." there's no in-core user left. As there's unlikely to be an external user, and such an external user could easily be adjusted to use table_index_fetch_tuple_check(), remove heap_hot_search(). Per complaint from Peter Geoghegan Author: Andres Freund Discussion: https://postgr.es/m/CAH2-Wzn0Oq4ftJrTqRAsWy2WGjv0QrJcwoZ+yqWsF_Z5vjUBFw@mail.gmail.com
* Fix crash when using partition bound expressionsMichael Paquier2019-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 7c079d7, partition bounds are able to use generalized expression syntax when processed, treating "minvalue" and "maxvalue" as specific cases as they get passed down for transformation as a column references. The checks for infinite bounds in range expressions have been lax though, causing crashes when trying to use column reference names with more than one field. Here is an example causing a crash: CREATE TABLE list_parted (a int) PARTITION BY LIST (a); CREATE TABLE part_list_crash PARTITION OF list_parted FOR VALUES IN (somename.somename); Note that the creation of the second relation should fail as partition bounds cannot have column references in their expressions, so when finding an expression which does not match the expected infinite bounds, then this commit lets the generic transformation machinery check after it. The error message generated in this case references as well a missing RTE, which is confusing. This problem will be treated separately as it impacts as well default expressions for some time, and for now only the cases where a crash can happen are fixed. While on it, extend the set of regression tests in place for list partition bounds and add an extra set for range partition bounds. Reported-by: Alexander Lakhin Author: Michael Paquier Reviewed-by: Amit Langote Discussion: https://postgr.es/m/15668-0377b1981aa1a393@postgresql.org
* tableam: Add table_get_latest_tid, to wrap heap_get_latest_tid.Andres Freund2019-03-25
| | | | | | | | | | This primarily is to allow WHERE CURRENT OF to continue to work as it currently does. It's not clear to me that these semantics make sense for every AM, but it works for the in-core heap, and the out of core zheap. We can refine it further at a later point if necessary. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* tableam: Add helper for indexes to check if a corresponding table tuples exist.Andres Freund2019-03-25
| | | | | | | | This is, likely exclusively, useful to verify that conflicts detected in a unique index are with live tuples, rather than dead ones. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Improve planner's selectivity estimates for inequalities on CTID.Tom Lane2019-03-25
| | | | | | | | | | | | | | | | | We were getting just DEFAULT_INEQ_SEL for comparisons such as "ctid >= constant", but it's possible to do a lot better if we don't mind some assumptions about the table's tuple density being reasonably uniform. There are already assumptions much like that elsewhere in the planner, so that hardly seems like much of an objection. Extracted from a patch set that also proposes to introduce a special executor node type for such queries. Not sure if that's going to make it into v12, but improving the selectivity estimate is useful independently of that. Edmund Horner, reviewed by David Rowley Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
* Suppress Append and MergeAppend plan nodes that have a single child.Tom Lane2019-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | If there's only one child relation, the Append or MergeAppend isn't doing anything useful, and can be elided. It does have a purpose during planning though, which is to serve as a buffer between parent and child Var numbering. Therefore we keep it all the way through to setrefs.c, and get rid of it only after fixing references in the plan level(s) above it. This works largely the same as setrefs.c's ancient hack to get rid of no-op SubqueryScan nodes, and can even share some code with that. Note the change to make setrefs.c use apply_tlist_labeling rather than ad-hoc code. This has the effect of propagating the child's resjunk and ressortgroupref labels, which formerly weren't propagated when removing a SubqueryScan. Doing that is demonstrably necessary for the [Merge]Append cases, and seems harmless for SubqueryScan, if only because trivial_subqueryscan is afraid to collapse cases where the resjunk marking differs. (I suspect that restriction could now be removed, though it's unclear that it'd make any new matches possible, since the outer query can't have references to a child resjunk column.) David Rowley, reviewed by Alvaro Herrera and Tomas Vondra Discussion: https://postgr.es/m/CAKJS1f_7u8ATyJ1JGTMHFoKDvZdeF-iEBhs+sM_SXowOr9cArg@mail.gmail.com
* Add "split after new tuple" nbtree optimization.Peter Geoghegan2019-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add additional heuristics to the algorithm for locating an optimal split location. New logic identifies localized monotonically increasing values in indexes with multiple columns. When this insertion pattern is detected, page splits split just after the new item that provoked a page split (or apply leaf fillfactor in the style of a rightmost page split). This optimization is a variation of the long established leaf fillfactor optimization used during rightmost page splits. 50/50 page splits are only appropriate with a pattern of truly random insertions, where the average space utilization ends up at 65% - 70%. Without this patch, affected cases have leaf pages that are no more than about 50% full on average. Future insertions can never make use of the free space left behind. With this patch, affected cases have leaf pages that are about 90% full on average (assuming a fillfactor of 90). Localized monotonically increasing insertion patterns are presumed to be fairly common in real-world applications. There is a fair amount of anecdotal evidence for this. Both pg_depend system catalog indexes (pg_depend_depender_index and pg_depend_reference_index) are at least 20% smaller after the regression tests are run when the optimization is available. Furthermore, many of the indexes created by a fair use implementation of TPC-C for Postgres are consistently about 40% smaller when the optimization is available. Note that even pg_upgrade'd v3 indexes make use of this optimization. Author: Peter Geoghegan Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/CAH2-WzkpKeZJrXvR_p7VSY1b-s85E3gHyTbZQzR0BkJ5LrWF_A@mail.gmail.com
* Further code review for new integerset code.Tom Lane2019-03-25
| | | | | Mostly cosmetic adjustments, but I added a more reliable method of detecting whether an iteration is in progress.
* Fix use of wrong datatype with sizeof().Robert Haas2019-03-25
| | | | | | | | OID and int are the same size, but they are not the same thing. David Rowley Discussion: http://postgr.es/m/CAKJS1f_MhS++XngkTvWL9X1v8M5t-0N0B-R465yHQY=TmNV0Ew@mail.gmail.com
* Add progress reporting for CLUSTER and VACUUM FULL.Robert Haas2019-03-25
| | | | | | | | | | | | | | | This uses the same progress reporting infrastructure added in commit c16dc1aca5e01e6acaadfcf38f5fc964a381dc62 and extends it to these additional cases. We lack the ability to track the internal progress of sorts and index builds so the information reported is coarse-grained for some parts of the operation, but it still seems like a significant improvement over having nothing at all. Tatsuro Yamada, reviewed by Thomas Munro, Masahiko Sawada, Michael Paquier, Jeff Janes, Alvaro Herrera, Rafia Sabih, and by me. A fair amount of polishing also by me. Discussion: http://postgr.es/m/59A77072.3090401@lab.ntt.co.jp
* Get rid of backtracking in jsonpath_scan.lAlexander Korotkov2019-03-25
| | | | | | | | | | Non-backtracking flex parsers work faster than backtracking ones. So, this commit gets rid of backtracking in jsonpath_scan.l. That required explicit handling of some cases as well as manual backtracking for some cases. More regression tests for numerics are added. Discussion: https://mail.google.com/mail/u/0?ik=a20b091faa&view=om&permmsgid=msg-f%3A1628425344167939063 Author: John Naylor, Nikita Gluknov, Alexander Korotkov
* Cosmetic changes for jsonpath_gram.y and jsonpath_scan.lAlexander Korotkov2019-03-25
| | | | | | | | | | | This commit include formatting improvements, renamings and comments. Also, it makes jsonpath_scan.l be more uniform with other our lexers. Firstly, states names are renamed to more short alternatives. Secondly, <INITIAL> prefix removed from the rules. Corresponding rules are moved to the tail, so they would anyway work only in initial state. Author: Alexander Korotkov Reviewed-by: John Naylor
* Clean up the Simple-8b encoder code.Heikki Linnakangas2019-03-25
| | | | | | | | | | | | | | | | | | | | | | | | Coverity complained that simple8b_encode() might read beyond the end of the 'diffs' array, in the loop to encode the integers. That was a false positive, because we never get into the loop in modes 0 or 1, and the array is large enough for all the other modes. But I admit it's very subtle, so it's not surprising that Coverity didn't see it, and it's not very obvious to humans either. Refactor it, so that the second loop re-computes the differences, instead of carrying them over from the first loop in the 'diffs' array. This way, the 'diffs' array is not needed anymore. It makes no measurable difference in performance, and seems more straightforward this way. Also, improve the comments in simple8b_encode(): fix the comment about its return value that was flat-out wrong, and explain the condition when it returns EMPTY_CODEWORD better. In the passing, move the 'selector' from the codeword's low bits to the high bits. It doesn't matter much, but looking at the original paper, and googling around for other Simple-8b implementations, that's how it's usually done. Per Coverity, and Tom Lane's report off-list.
* Add macro to cast away volatile without allowing changes to underlying typePeter Eisentraut2019-03-25
| | | | | | This adds unvolatize(), which works just like unconstify() but for volatile. Discussion: https://www.postgresql.org/message-id/flat/7a5cbea7-b8df-e910-0f10-04014bcad701%402ndquadrant.com
* tableam: Add and use table_fetch_row_version().Andres Freund2019-03-25
| | | | | | | | | | | | | | | | | | This is essentially the tableam version of heapam_fetch(), i.e. fetching a tuple identified by a tid, performing visibility checks. Note that this different from table_index_fetch_tuple(), which is for index lookups. It therefore has to handle a tid pointing to an earlier version of a tuple if the AM uses an optimization like heap's HOT. Add comments to that end. This commit removes the stats_relation argument from heap_fetch, as it's been unused for a long time. Author: Andres Freund Reviewed-By: Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* tableam: Use in CREATE TABLE AS and CREATE MATERIALIZED VIEW.Andres Freund2019-03-24
| | | | | | | | | | | | | Previously those directly performed a heap_insert(). Use table_insert() instead. The input slot of those routines is not of the target relation - we could fix that by copying if necessary, but that'd not be beneficial for performance. As those codepaths don't access any AM specific tuple fields (say xmin/xmax), there's no need to use an AM specific slot. Author: Andres Freund Reviewed-By: Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Sort dependent objects before reporting them in DROP ROLE.Tom Lane2019-03-24
| | | | | | | | | | | | | | | | | | Commit 8aa9dd74b didn't quite finish the job in this area after all, because DROP ROLE has a code path distinct from DROP OWNED BY, and it was still reporting dependent objects in whatever order the index scan returned them in. Buildfarm experience shows that index ordering of equal-keyed objects is significantly less stable than before in the wake of using heap TIDs as tie-breakers. So if we try to hide the unstable ordering by suppressing DETAIL reports, we're just going to end up having to do that for every DROP that reports multiple objects. That's not great from a coverage or problem-detection standpoint, and it's something we'll inevitably forget in future patches, leading to more iterations of fixing-an- unstable-result. So let's just bite the bullet and sort here too. Discussion: https://postgr.es/m/E1h6eep-0001Mw-Vd@gemulon.postgresql.org
* Remove dead code from nbtsplitloc.c.Peter Geoghegan2019-03-24
| | | | | | | | | | It doesn't make sense to consider the possibility that there will only be one candidate split point when choosing among split points to find the split with the lowest penalty. This is a vestige of an earlier version of the patch that became commit fab25024. Issue spotted while rereviewing coverage of the nbtree patch series using gcov.
* Make current_logfiles use permissions assigned to files in data directoryMichael Paquier2019-03-24
| | | | | | | | | | | | | | | | | | | | | | | Since its introduction in 19dc233c, current_logfiles has been assigned the same permissions as a log file, which can be enforced with log_file_mode. This setup can lead to incompatibility problems with group access permissions as current_logfiles is not located in the log directory, but at the root of the data folder. Hence, if group permissions are used but log_file_mode is more restrictive, a backup with a user in the group having read access could fail even if the log directory is located outside of the data folder. Per discussion with the folks mentioned below, we have concluded that current_logfiles should not be treated as a log file as it only stores metadata related to log files, and that it should use the same permissions as all other files in the data directory. This solution has the merit to be simple and fixes all the interaction problems between group access and log_file_mode. Author: Haribabu Kommi Reviewed-by: Stephen Frost, Robert Haas, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAJrrPGcEotF1P7AWoeQyD3Pqr-0xkQg_Herv98DjbaMj+naozw@mail.gmail.com Backpatch-through: 11, where group access has been added.
* Transaction chainingPeter Eisentraut2019-03-24
| | | | | | | | | | | | | Add command variants COMMIT AND CHAIN and ROLLBACK AND CHAIN, which start new transactions with the same transaction characteristics as the just finished one, per SQL standard. Support for transaction chaining in PL/pgSQL is also added. This functionality is especially useful when running COMMIT in a loop in PL/pgSQL. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr> Discussion: https://www.postgresql.org/message-id/flat/28536681-324b-10dc-ade8-ab46f7645a5a@2ndquadrant.com
* tableam: Add tuple_{insert, delete, update, lock} and use.Andres Freund2019-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to reasonably use those, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM. Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that find the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM specific knowledge to differentiate. The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possible slightly less helpful for users, it seems like an acceptable trade-off. As part of this commit HTSU_Result has been renamed to TM_Result, and its members been expanded to differentiated between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData. The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with succeeded = false). Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit. Author: Andres Freund and Haribabu Kommi Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
* Remove inadequate check for duplicate "xml" PI.Tom Lane2019-03-23
| | | | | | I failed to think about PIs starting with "xml". We don't really need this check at all, so just take it out. Oversight in commit 8d1dadb25 et al.
* Accept XML documents when xmloption = content, as required by SQL:2006+.Tom Lane2019-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we were using the SQL:2003 definition, which doesn't allow this, but that creates a serious dump/restore gotcha: there is no setting of xmloption that will allow all valid XML data. Hence, switch to the 2006 definition. Since libxml doesn't accept <!DOCTYPE> directives in the mode we use for CONTENT parsing, the implementation is to detect <!DOCTYPE> in the input and switch to DOCUMENT parsing mode. This should not cost much, because <!DOCTYPE> should be close to the front of the input if it's there at all. It's possible that this causes the error messages for malformed input to be slightly different than they were before, if said input includes <!DOCTYPE>; but that does not seem like a big problem. In passing, buy back a few cycles in parsing of large XML documents by not doing strlen() of the whole input in parse_xml_decl(). Back-patch because dump/restore failures are not nice. This change shouldn't break any cases that worked before, so it seems safe to back-patch. Chapman Flack (revised a bit by me) Discussion: https://postgr.es/m/CAN-V+g-6JqUQEQZ55Q3toXEN6d5Ez5uvzL4VR+8KtvJKj31taw@mail.gmail.com
* Add nbtree high key "continuescan" optimization.Peter Geoghegan2019-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach nbtree forward index scans to check the high key before moving to the right sibling page in the hope of finding that it isn't actually necessary to do so. The new check may indicate that the scan definitely cannot find matching tuples to the right, ending the scan immediately. We already opportunistically force a similar "continuescan orientated" key check of the final non-pivot tuple when it's clear that it cannot be returned to the scan due to being dead-to-all. The new high key check is complementary. The new approach for forward scans is more effective than checking the final non-pivot tuple, especially with composite indexes and non-unique indexes. The improvements to the logic for picking a split point added by commit fab25024 make it likely that relatively dissimilar high keys will appear on a page. A distinguishing key value that can only appear on non-pivot tuples on the right sibling page will often be present in leaf page high keys. Since forcing the final item to be key checked no longer makes any difference in the case of forward scans, the existing extra key check is now only used for backwards scans. Backward scans continue to opportunistically check the final non-pivot tuple, which is actually the first non-pivot tuple on the page (not the last). Note that even pg_upgrade'd v3 indexes make use of this optimization. Author: Peter Geoghegan, Heikki Linnakangas Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/CAH2-WzkOmUduME31QnuTFpimejuQoiZ-HOf0pOWeFZNhTMctvA@mail.gmail.com
* Add unreachable "break" to satisfy -Wimplicit-fallthrough.Tom Lane2019-03-23
| | | | | | gcc is a bit pickier about this than perhaps it should be. Discussion: https://postgr.es/m/E1h6zzT-0003ft-DD@gemulon.postgresql.org
* Revert "Add gitignore entries for jsonpath_gram.h"Peter Eisentraut2019-03-23
| | | | | | This reverts commit 4e274a043fc8310ce1148190ef674beca06e990c. These files aren't actually built anymore since 550b9d26f.
* Add gitignore entries for jsonpath_gram.hPeter Eisentraut2019-03-22
|
* Rearrange make_partitionedrel_pruneinfo to avoid work when we can't prune.Tom Lane2019-03-22
| | | | | | | | | | | | | | | | Postpone most of the effort of constructing PartitionedRelPruneInfos until after we have found out whether run-time pruning is needed at all. This costs very little duplicated effort (basically just an extra find_base_rel() call per partition) and saves quite a bit when we can't do run-time pruning. Also, merge the first loop (for building relid_subpart_map) into the second loop, since we don't need the map to be valid during that loop. Amit Langote Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Don't copy PartitionBoundInfo in set_relation_partition_info.Tom Lane2019-03-22
| | | | | | | | | | | | | | | | I (tgl) remain dubious that it's a good idea for PartitionDirectory to hold a pin on a relcache entry throughout planning, rather than copying the data or using some kind of refcount scheme. However, it's certainly the responsibility of the PartitionDirectory code to ensure that what it's handing back is a stable data structure, not that of its caller. So this is a pretty clear oversight in commit 898e5e329, and one that can cost a lot of performance when there are many partitions. Amit Langote (extracted from a much larger patch set) Discussion: https://postgr.es/m/CA+TgmoY3bRmGB6-DUnoVy5fJoreiBJ43rwMrQRCdPXuKt4Ykaw@mail.gmail.com Discussion: https://postgr.es/m/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
* Fix yet more portability bugs in integerset and its tests.Heikki Linnakangas2019-03-22
| | | | | | | | | | | | There were more large constants that needed UINT64CONST. And one variable was declared as "int", when it needed to be uint64. These bugs were only visible on 32-bit systems; clearly I should've tested on one, given that this code does a lot of work with 64-bit integers. Also, in the test "huge distances" test, the code created some values with random distances between them, but the test logic didn't take into account the possibility that the random distance was exactly 1. That never actually happens with the seed we're using, but let's be tidy.
* Fix bug in the GiST vacuum's 2nd stage.Heikki Linnakangas2019-03-22
| | | | | | | We mustn't assume that the IndexVacuumInfo pointer passed to bulkdelete() stage is still valid in the vacuumcleanup() stage. Per very pink buildfarm.
* Delete empty pages during GiST VACUUM.Heikki Linnakangas2019-03-22
| | | | | | | | | | | | | | | To do this, we scan GiST two times. In the first pass we make note of empty leaf pages and internal pages. At second pass we scan through internal pages, looking for downlinks to the empty pages. Deleting internal pages is still not supported, like in nbtree, the last child of an internal page is never deleted. That means that if you have a workload where new keys are always inserted to different area than where old keys are removed, the index will still grow without bound. But the rate of growth will be an order of magnitude slower than before. Author: Andrey Borodin Discussion: https://www.postgresql.org/message-id/B1E4DF12-6CD3-4706-BDBD-BF3283328F60@yandex-team.ru
* Add IntegerSet, to hold large sets of 64-bit ints efficiently.Heikki Linnakangas2019-03-22
| | | | | | | | | | | | | | | | | | | | | The set is implemented as a B-tree, with a compact representation at leaf items, using Simple-8b algorithm, so that clusters of nearby values use less memory. The IntegerSet isn't used for anything yet, aside from the test code, but we have two patches in the works that would benefit from this: A patch to allow GiST vacuum to delete empty pages, and a patch to reduce heap VACUUM's memory usage, by storing the list of dead TIDs more efficiently and lifting the 1 GB limit on its size. This includes a unit test module, in src/test/modules/test_integerset. It can be used to verify correctness, as a regression test, but if you run it manully, it can also print memory usage and execution time of some of the tests. Author: Heikki Linnakangas, Andrey Borodin Reviewed-by: Julien Rouhaud Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb729f@iki.fi
* Collations with nondeterministic comparisonPeter Eisentraut2019-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a flag "deterministic" to collations. If that is false, such a collation disables various optimizations that assume that strings are equal only if they are byte-wise equal. That then allows use cases such as case-insensitive or accent-insensitive comparisons or handling of strings with different Unicode normal forms. This functionality is only supported with the ICU provider. At least glibc doesn't appear to have any locales that work in a nondeterministic way, so it's not worth supporting this for the libc provider. The term "deterministic comparison" in this context is from Unicode Technical Standard #10 (https://unicode.org/reports/tr10/#Deterministic_Comparison). This patch makes changes in three areas: - CREATE COLLATION DDL changes and system catalog changes to support this new flag. - Many executor nodes and auxiliary code are extended to track collations. Previously, this code would just throw away collation information, because the eventually-called user-defined functions didn't use it since they only cared about equality, which didn't need collation information. - String data type functions that do equality comparisons and hashing are changed to take the (non-)deterministic flag into account. For comparison, this just means skipping various shortcuts and tie breakers that use byte-wise comparison. For hashing, we first need to convert the input string to a canonical "sort key" using the ICU analogue of strxfrm(). Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Discussion: https://www.postgresql.org/message-id/flat/1ccc668f-4cbc-0bef-af67-450b47cdfee7@2ndquadrant.com
* Fix crash with pg_partition_rootMichael Paquier2019-03-22
| | | | | | | | | | | Trying to call the function with the top-most parent of a partition tree was leading to a crash. In this case the correct result is to return the top-most parent itself. Reported-by: Álvaro Herrera Author: Michael Paquier Reviewed-by: Amit Langote Discussion: https://postgr.es/m/20190322032612.GA323@alvherre.pgsql
* Fix dependency recording bug for partitioned PKsAlvaro Herrera2019-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | When DefineIndex recurses to create constraints on partitions, it needs to use the value returned by index_constraint_create to set up partition dependencies. However, in the course of fixing the DEPENDENCY_INTERNAL_AUTO mess, commit 1d92a0c9f7dd introduced some code to that function that clobbered the return value, causing the recorded OID to be of the wrong object. Close examination of pg_depend after creating the tables leads to indescribable objects :-( My sin (in commit bdc3d7fa2376, while preparing for DDL deparsing in event triggers) was to use a variable name for the return value that's typically used for throwaway objects in dependency-setting calls ("referenced"). Fix by changing the variable names to match extended practice (the return value is "myself" rather than "referenced".) The pg_upgrade test notices the problem (in an indirect way: the pg_dump outputs are in different order), but only if you create the objects in a specific way that wasn't being used in the existing tests. Add a stanza to leave some objects around that shows the bug. Catversion bump because preexisting databases might have bogus pg_depend entries. Discussion: https://postgr.es/m/20190318204235.GA30360@alvherre.pgsql
* Improve error reporting for DROP FUNCTION/PROCEDURE/AGGREGATE/ROUTINE.Tom Lane2019-03-21
| | | | | | | | | | | | | | | | | | | | | | | These commands allow the argument type list to be omitted if there is just one object that matches by name. However, if that syntax was used with DROP IF EXISTS and there was more than one match, you got a "function ... does not exist, skipping" notice message rather than a truthful complaint about the ambiguity. This was basically due to poor factorization and a rats-nest of logic, so refactor the relevant lookup code to make it cleaner. Note that this amounts to narrowing the scope of which sorts of error conditions IF EXISTS will bypass. Per discussion, we only intend it to skip no-such-object cases, not multiple-possible-matches cases. Per bug #15572 from Ash Marath. Although this definitely seems like a bug, it's not clear that people would thank us for changing the behavior in minor releases, so no back-patch. David Rowley, reviewed by Julien Rouhaud and Pavel Stehule Discussion: https://postgr.es/m/15572-ed1b9ed09503de8a@postgresql.org
* Add DNS SRV support for LDAP server discovery.Thomas Munro2019-03-21
| | | | | | | | | | | | | | | | | LDAP servers can be advertised on a network with RFC 2782 DNS SRV records. The OpenLDAP command-line tools automatically try to find servers that way, if no server name is provided by the user. Teach PostgreSQL to do the same using OpenLDAP's support functions, when building with OpenLDAP. For now, we assume that HAVE_LDAP_INITIALIZE (an OpenLDAP extension available since OpenLDAP 2.0 and also present in Apple LDAP) implies that you also have ldap_domain2hostlist() (which arrived in the same OpenLDAP version and is also present in Apple LDAP). Author: Thomas Munro Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/CAEepm=2hAnSfhdsd6vXsM6VZVN0br-FbAZ-O+Swk18S5HkCP=A@mail.gmail.com
* Sort the dependent objects before deletion in DROP OWNED BY.Tom Lane2019-03-20
| | | | | | | | | | This finishes a task we left undone in commit f1ad067fc, by extending the delete-in-descending-OID-order rule to deletions triggered by DROP OWNED BY. We've coped with machine-dependent deletion orders one time too many, and the new issues caused by Peter G's recent nbtree hacking seem like the last straw. Discussion: https://postgr.es/m/E1h6eep-0001Mw-Vd@gemulon.postgresql.org
* Add index_get_partition convenience functionAlvaro Herrera2019-03-20
| | | | | | | | This new function simplifies some existing coding, as well as supports future patches. Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql Reviewed-by: Amit Langote, Jesper Pedersen
* Fix spurious compiler warning in nbtxlog.c.Peter Geoghegan2019-03-20
| | | | | | Cleanup from commit dd299df8. Per complaint from Tom Lane.
* Restore RI trigger sanity checkAlvaro Herrera2019-03-20
| | | | | | | | I unnecessarily removed this check in 3de241dba86f because I misunderstood what the final representation of constraints across a partitioning hierarchy was to be. Put it back (in both branches). Discussion: https://postgr.es/m/201901222145.t6wws6t6vrcu@alvherre.pgsql
* Consider secondary factors during nbtree splits.Peter Geoghegan2019-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach nbtree to give some consideration to how "distinguishing" candidate leaf page split points are. This should not noticeably affect the balance of free space within each half of the split, while still making suffix truncation truncate away significantly more attributes on average. The logic for choosing a leaf split point now uses a fallback mode in the case where the page is full of duplicates and it isn't possible to find even a minimally distinguishing split point. When the page is full of duplicates, the split should pack the left half very tightly, while leaving the right half mostly empty. Our assumption is that logical duplicates will almost always be inserted in ascending heap TID order with v4 indexes. This strategy leaves most of the free space on the half of the split that will likely be where future logical duplicates of the same value need to be placed. The number of cycles added is not very noticeable. This is important because deciding on a split point takes place while at least one exclusive buffer lock is held. We avoid using authoritative insertion scankey comparisons to save cycles, unlike suffix truncation proper. We use a faster binary comparison instead. Note that even pg_upgrade'd v3 indexes make use of these optimizations. Benchmarking has shown that even v3 indexes benefit, despite the fact that suffix truncation will only truncate non-key attributes in INCLUDE indexes. Grouping relatively similar tuples together is beneficial in and of itself, since it reduces the number of leaf pages that must be accessed by subsequent index scans. Author: Peter Geoghegan Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/CAH2-WzmmoLNQOj9mAD78iQHfWLJDszHEDrAzGTUMG3mVh5xWPw@mail.gmail.com
* Make heap TID a tiebreaker nbtree index column.Peter Geoghegan2019-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make nbtree treat all index tuples as having a heap TID attribute. Index searches can distinguish duplicates by heap TID, since heap TID is always guaranteed to be unique. This general approach has numerous benefits for performance, and is prerequisite to teaching VACUUM to perform "retail index tuple deletion". Naively adding a new attribute to every pivot tuple has unacceptable overhead (it bloats internal pages), so suffix truncation of pivot tuples is added. This will usually truncate away the "extra" heap TID attribute from pivot tuples during a leaf page split, and may also truncate away additional user attributes. This can increase fan-out, especially in a multi-column index. Truncation can only occur at the attribute granularity, which isn't particularly effective, but works well enough for now. A future patch may add support for truncating "within" text attributes by generating truncated key values using new opclass infrastructure. Only new indexes (BTREE_VERSION 4 indexes) will have insertions that treat heap TID as a tiebreaker attribute, or will have pivot tuples undergo suffix truncation during a leaf page split (on-disk compatibility with versions 2 and 3 is preserved). Upgrades to version 4 cannot be performed on-the-fly, unlike upgrades from version 2 to version 3. contrib/amcheck continues to work with version 2 and 3 indexes, while also enforcing stricter invariants when verifying version 4 indexes. These stricter invariants are the same invariants described by "3.1.12 Sequencing" from the Lehman and Yao paper. A later patch will enhance the logic used by nbtree to pick a split point. This patch is likely to negatively impact performance without smarter choices around the precise point to split leaf pages at. Making these two mostly-distinct sets of enhancements into distinct commits seems like it might clarify their design, even though neither commit is particularly useful on its own. The maximum allowed size of new tuples is reduced by an amount equal to the space required to store an extra MAXALIGN()'d TID in a new high key during leaf page splits. The user-facing definition of the "1/3 of a page" restriction is already imprecise, and so does not need to be revised. However, there should be a compatibility note in the v12 release notes. Author: Peter Geoghegan Reviewed-By: Heikki Linnakangas, Alexander Korotkov Discussion: https://postgr.es/m/CAH2-WzkVb0Kom=R+88fDFb=JSxZMFvbHVC6Mn9LJ2n=X=kS-Uw@mail.gmail.com
* Refactor nbtree insertion scankeys.Peter Geoghegan2019-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dedicated struct to represent nbtree insertion scan keys. Having a dedicated struct makes the difference between search type scankeys and insertion scankeys a lot clearer, and simplifies the signature of several related functions. This is based on a suggestion by Andrey Lepikhov. Streamline how unique index insertions cache binary search progress. Cache the state of in-progress binary searches within _bt_check_unique() for later instead of having callers avoid repeating the binary search in an ad-hoc manner. This makes it easy to add a new optimization: _bt_check_unique() now falls out of its loop immediately in the common case where it's already clear that there couldn't possibly be a duplicate. The new _bt_check_unique() scheme makes it a lot easier to manage cached binary search effort afterwards, from within _bt_findinsertloc(). This is needed for the upcoming patch to make nbtree tuples unique by treating heap TID as a final tiebreaker column. Unique key binary searches need to restore lower and upper bounds. They cannot simply continue to use the >= lower bound as the offset to insert at, because the heap TID tiebreaker column must be used in comparisons for the restored binary search (unlike the original _bt_check_unique() binary search, where scankey's heap TID column must be omitted). Author: Peter Geoghegan, Heikki Linnakangas Reviewed-By: Heikki Linnakangas, Andrey Lepikhov Discussion: https://postgr.es/m/CAH2-WzmE6AhUdk9NdWBf4K3HjWXZBX3+umC7mH7+WDrKcRtsOw@mail.gmail.com
* Get rid of jsonpath_gram.h and jsonpath_scanner.hAlexander Korotkov2019-03-20
| | | | | | | | | Jsonpath grammar and scanner are both quite small. It doesn't worth complexity to compile them separately. This commit makes grammar and scanner be compiled at once. Therefore, jsonpath_gram.h and jsonpath_gram.h are no longer needed. This commit also does some reorganization of code in jsonpath_gram.y. Discussion: https://postgr.es/m/d47b2023-3ecb-5f04-d253-d557547cf74f%402ndQuadrant.com
* Rename typedef in jsonpath_gram.y from "string" to "JsonPathString"Alexander Korotkov2019-03-19
| | | | Reason is the same as in 75c57058b0.