aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Skip full index scan during cleanup of B-tree indexes when possibleTeodor Sigaev2018-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete calls and one amvacuumcleanup call. When workload on particular table is append-only, then autovacuum isn't intended to touch this table. However, user may run vacuum manually in order to fill visibility map and get benefits of index-only scans. Then ambulkdelete wouldn't be called for indexes of such table (because no heap tuples were deleted), only amvacuumcleanup would be called In this case, amvacuumcleanup would perform full index scan for two objectives: put recyclable pages into free space map and update index statistics. This patch allows btvacuumclanup to skip full index scan when two conditions are satisfied: no pages are going to be put into free space map and index statistics isn't stalled. In order to check first condition, we store oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then there are some recyclable pages. In order to check second condition we store number of heap tuples observed during previous full index scan by cleanup. If fraction of newly inserted tuples is less than vacuum_cleanup_index_scale_factor, then statistics isn't considered to be stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default). This patch bumps B-tree meta-page version. Upgrade of meta-page is performed "on the fly": during VACUUM meta-page is rewritten with new version. No special handling in pg_upgrade is required. Author: Masahiko Sawada, Alexander Korotkov Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov Discussion: https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com
* Don't clone internal triggers to partitionsAlvaro Herrera2018-04-03
| | | | | | | | | | | | | | Trigger cloning to partitions was supposed to occur for user-visible triggers only, but during development the protection that prevented it from occurring to internal triggers was lost. Reinstate it, as well as add a test case to ensure internal triggers (in the tested case, triggers implementing a deferred unique constraint) are not cloned. Without the code fix, the partitions in the test end up with different numbers of triggers, which is clearly wrong ... Bug in 86f575948c77. Discussion: https://postgr.es/m/20180403214903.ozfagwjcpk337uw7@alvherre.pgsql
* Pass correct TupDesc to ri_NullCheck() in AssertAlvaro Herrera2018-04-03
| | | | | | | | | | | | | Previous coding was passing the wrong table's tuple descriptor, which accidentally fails to fail because no existing test case exercises a foreign key in which the referenced attributes are further to the right of the referencing attributes. Add a test so that further breakage is visible. This got broken in 16828d5c0273. Discussion: https://postgr.es/m/20180403204723.fqte755nukgm42uf@alvherre.pgsql
* Prevent accidental linking of system-supplied copies of libpq.so etc.Tom Lane2018-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were being careless in some places about the order of -L switches in link command lines, such that -L switches referring to external directories could come before those referring to directories within the build tree. This made it possible to accidentally link a system-supplied library, for example /usr/lib/libpq.so, in place of the one built in the build tree. Hilarity ensued, the more so the older the system-supplied library is. To fix, break LDFLAGS into two parts, a sub-variable LDFLAGS_INTERNAL and the main LDFLAGS variable, both of which are "recursively expanded" so that they can be incrementally adjusted by different makefiles. Establish a policy that -L switches for directories in the build tree must always be added to LDFLAGS_INTERNAL, while -L switches for external directories must always be added to LDFLAGS. This is sufficient to ensure a safe search order. For simplicity, we typically also put -l switches for the respective libraries into those same variables. (Traditional make usage would have us put -l switches into LIBS, but cleaning that up is a project for another day, as there's no clear need for it.) This turns out to also require separating SHLIB_LINK into two variables, SHLIB_LINK and SHLIB_LINK_INTERNAL, with a similar rule about which switches go into which variable. And likewise for PG_LIBS. Although this change might appear to affect external users of pgxs.mk, I think it doesn't; they shouldn't have any need to touch the _INTERNAL variables. In passing, tweak src/common/Makefile so that the value of CPPFLAGS recorded in pg_config lacks "-DFRONTEND" and the recorded value of LDFLAGS lacks "-L../../../src/common". Both of those things are mistakes, apparently introduced during prior code rearrangements, as old versions of pg_config don't print them. In general we don't want anything that's specific to the src/common subdirectory to appear in those outputs. This is certainly a bug fix, but in view of the lack of field complaints, I'm unsure whether it's worth the risk of back-patching. In any case it seems wise to see what the buildfarm makes of it first. Discussion: https://postgr.es/m/25214.1522604295@sss.pgh.pa.us
* C comment: mention null handling in BuildTupleFromCStrings()Bruce Momjian2018-04-03
| | | | | | Discussion: https://postgr.es/m/CAFjFpRcF-wNbe0w-m3NpkEwr9shmOZ=GoESOzd2Wog9h55J8sA@mail.gmail.com Author: Ashutosh Bapat
* Add prefix operator for TEXT type.Teodor Sigaev2018-04-03
| | | | | | | | | | | | The prefix operator along with SP-GiST indexes can be used as an alternative for LIKE 'word%' commands and it doesn't have a limitation of string/prefix length as B-Tree has. Bump catalog version Author: Ildus Kurbangaliev with some editorization by me Review by: Arthur Zakirov, Alexander Korotkov, and me Discussion: https://www.postgresql.org/message-id/flat/20180202180327.222b04b3@wp.localdomain
* Properly use INT64_FORMAT in outputMagnus Hagander2018-04-03
| | | | Per buildfarm animal prairiedog, suggestion solution from Tom.
* Fix for checksum validation patchMagnus Hagander2018-04-03
| | | | | | | Reorder the check for non-BLCKSZ size reads to make sure we don't abort sending the file in this case. Missed in the previous commit.
* Validate page level checksums in base backupsMagnus Hagander2018-04-03
| | | | | | | | | | | | | | | When base backups are run over the replication protocol (for example using pg_basebackup), verify the checksums of all data blocks if checksums are enabled. If checksum failures are encountered, log them as warnings but don't abort the backup. This becomes the default behaviour in pg_basebackup (provided checksums are enabled on the server), so add a switch (-k) to disable the checks if necessary. Author: Michael Banck Reviewed-By: Magnus Hagander, David Steele Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de
* WITH support in MERGESimon Riggs2018-04-03
| | | | | | Author: Peter Geoghegan Recursive support removed, no tests Docs added by me
* New files for MERGESimon Riggs2018-04-03
|
* MERGE SQL Command following SQL:2016Simon Riggs2018-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows a task that would other require multiple PL statements. e.g. MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular and partitioned tables, including column and row security enforcement, as well as support for row, statement and transition triggers. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used statically from PL/pgSQL. MERGE does not yet support inheritance, write rules, RETURNING clauses, updatable views or foreign tables. MERGE follows SQL Standard per the most recent SQL:2016. Includes full tests and documentation, including full isolation tests to demonstrate the concurrent behavior. This version written from scratch in 2017 by Simon Riggs, using docs and tests originally written in 2009. Later work from Pavan Deolasee has been both complex and deep, leaving the lead author credit now in his hands. Extensive discussion of concurrency from Peter Geoghegan, with thanks for the time and effort contributed. Various issues reported via sqlsmith by Andreas Seltenreich Authors: Pavan Deolasee, Simon Riggs Reviewer: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
* Revert "MERGE SQL Command following SQL:2016"Simon Riggs2018-04-02
| | | | This reverts commit e6597dc3533946b98acba7871bd4ca1f7a3d4c1d.
* Revert "Modified files for MERGE"Simon Riggs2018-04-02
| | | | This reverts commit 354f13855e6381d288dfaa52bcd4f2cb0fd4a5eb.
* Modified files for MERGESimon Riggs2018-04-02
|
* MERGE SQL Command following SQL:2016Simon Riggs2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows a task that would other require multiple PL statements. e.g. MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular and partitioned tables, including column and row security enforcement, as well as support for row, statement and transition triggers. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used statically from PL/pgSQL. MERGE does not yet support inheritance, write rules, RETURNING clauses, updatable views or foreign tables. MERGE follows SQL Standard per the most recent SQL:2016. Includes full tests and documentation, including full isolation tests to demonstrate the concurrent behavior. This version written from scratch in 2017 by Simon Riggs, using docs and tests originally written in 2009. Later work from Pavan Deolasee has been both complex and deep, leaving the lead author credit now in his hands. Extensive discussion of concurrency from Peter Geoghegan, with thanks for the time and effort contributed. Various issues reported via sqlsmith by Andreas Seltenreich Authors: Pavan Deolasee, Simon Riggs Reviewers: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
* Fix some dubious WAL-parsing code.Tom Lane2018-04-02
| | | | | | | | | | Coverity complained about possible buffer overrun in two places added by commit 1eb6d6527, and AFAICS it's reasonable to worry: even granting that the WAL originator properly truncated the commit GID to GIDSIZE, we should not really bet our lives on that having the same value as it does in the current build. Hence, use strlcpy() not strcpy(), and adjust the pointer advancement logic to be sure we skip over the whole source string even if strlcpy() truncated it.
* Make be-secure-common.c more consistent for future SSL implementationsPeter Eisentraut2018-04-02
| | | | | | | | | | | | | | | | | Recent commit 8a3d9425 has introduced be-secure-common.c, which is aimed at including backend-side APIs that can be used by any SSL implementation. The purpose is similar to fe-secure-common.c for the frontend-side APIs. However, this has forgotten to include check_ssl_key_file_permissions() in the move, which causes a double dependency between be-secure.c and be-secure-openssl.c. Refactor the code in a more logical way. This also puts into light an API which is usable by future SSL implementations for permissions on SSL key files. Author: Michael Paquier <michael@paquier.xyz>
* postgres_fdw: Push down partition-wise aggregation.Robert Haas2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 7012b132d07c2b4ea15b0b3cb1ea9f3278801d98, postgres_fdw has been able to push down the toplevel aggregation operation to the remote server. Commit e2f1eb0ee30d144628ab523432320f174a2c8966 made it possible to break down the toplevel aggregation into one aggregate per partition. This commit lets postgres_fdw push down aggregation in that case just as it does at the top level. In order to make this work, this commit adds an additional argument to the GetForeignUpperPaths FDW API. A matching argument is added to the signature for create_upper_paths_hook. Third-party code using either of these will need to be updated. Also adjust create_foreignscan_plan() so that it picks up the correct set of relids in this case. Jeevan Chalke, reviewed by Ashutosh Bapat and by me and with some adjustments by me. The larger patch series of which this patch is a part was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, and Rafia Sabih. Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
* Fix a boatload of typos in C comments.Tom Lane2018-04-01
| | | | | | Justin Pryzby Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
* Fix non-portable use of round().Andres Freund2018-03-31
| | | | | | | | | | | | | | | round() is from C99. Use rint() instead. There are behavioral differences between round() and rint(), but they should not matter to the Bloom filter optimal_k() function. We already assume POSIX behavior for rint(), so there is no question of rint() not using "rounds towards nearest" as its rounding mode. Cleanup from commit 51bc271790eb234a1ba4d14d3e6530f70de92ab5. Per buildfarm member thrips. Author: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzn76eCGUonARy-wrVtMHsf+4cvbK_oJAWTLfORTU5ki0w@mail.gmail.com
* Add Bloom filter implementation.Andres Freund2018-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | A Bloom filter is a space-efficient, probabilistic data structure that can be used to test set membership. Callers will sometimes incur false positives, but never false negatives. The rate of false positives is a function of the total number of elements and the amount of memory available for the Bloom filter. Two classic applications of Bloom filters are cache filtering, and data synchronization testing. Any user of Bloom filters must accept the possibility of false positives as a cost worth paying for the benefit in space efficiency. This commit adds a test harness extension module, test_bloomfilter. It can be used to get a sense of how the Bloom filter implementation performs under varying conditions. This is infrastructure for the upcoming "heapallindexed" amcheck patch, which verifies the consistency of a heap relation against one of its indexes. Author: Peter Geoghegan Reviewed-By: Andrey Borodin, Michael Paquier, Thomas Munro, Andres Freund Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g@mail.gmail.com
* Small cleanups in fast default code.Andrew Dunstan2018-04-01
| | | | Problems identified by Andres Freund and Haribabu Kommi
* Remove PARTIAL_LINKING build mode.Andres Freund2018-03-30
| | | | | | | | | | | | | In 9956ddc19164b02dc1925fb389a1af77472eba5e, ten years ago, the current objfile.txt based linking model was introduced. It's time to retire the old SUBSYS.o based model. This primarily is pertinent because the bitcode files for LLVM based inlining are not produced when using PARTIAL_LINKING. It does not seem worth to fix PARTIAL_LINKING to support that. Author: Andres Freund Discussion: https://postgr.es/m/20180121204356.d5oeu34jetqhmdv2@alap3.anarazel.de
* Fix bug with view locking code.Tatsuo Ishii2018-03-31
| | | | | | | | | | | | | | | LockViewRecurese() obtains view relation using heap_open() and passes it to get_view_query() to get view info. It immediately closes the relation then uses the returned view info by calling LockViewRecurse_walker(). Since get_view_query() returns a pointer within the relcache, the relcache should be kept until LockViewRecurse_walker() returns. Otherwise the relation could point to a garbage memory area. Fix is moving the heap_close() call after LockViewRecurse_walker(). Problem reported by Tom Lane (buildfarm is unhappy, especially prion since it enables -DRELCACHE_FORCE_RELEASE cpp flag), fix by me.
* Add SKIP_LOCKED option to RangeVarGetRelidExtended().Andres Freund2018-03-30
| | | | | | | | This will be used for VACUUM (SKIP LOCKED). Author: Nathan Bossart Reviewed-By: Michael Paquier and Andres Freund Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de
* Combine options for RangeVarGetRelidExtended() into a flags argument.Andres Freund2018-03-30
| | | | | | | | | | | | | A followup patch will add a SKIP_LOCKED option. To avoid introducing evermore arguments, breaking existing callers each time, introduce a flags argument. This'll no doubt break a few external users... Also change the MISSING_OK behaviour so a DEBUG1 debug message is emitted when a relation is not found. Author: Nathan Bossart Reviewed-By: Michael Paquier and Andres Freund Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de
* Enhance pg_stat_wal_receiver view to display host and port of sender server.Fujii Masao2018-03-31
| | | | | | | | | | | | | | | | Previously there was no way in the standby side to find out the host and port of the sender server that the walreceiver was currently connected to when multiple hosts and ports were specified in primary_conninfo. For that purpose, this patch adds sender_host and sender_port columns into pg_stat_wal_receiver view. They report the host and port that the active replication connection currently uses. Bump catalog version. Author: Haribabu Kommi Reviewed-by: Michael Paquier and me Discussion: https://postgr.es/m/CAJrrPGcV_aq8=cdqkFhVDJKEnDQ70yRTTdY9RODzMnXNrCz2Ow@mail.gmail.com
* Ensure that WAL pages skipped by a forced WAL switch are zero-filled.Tom Lane2018-03-30
| | | | | | | | | | | | In the previous coding, skipped pages were mostly zeroes, but they still had valid WAL page headers. That makes them very much less compressible than an unbroken string of zeroes would be --- about 10X worse for bzip2 compression, for instance. We don't need those headers, so tweak the logic so that we zero them out. Chapman Flack, reviewed by Daniel Gustafsson Discussion: https://postgr.es/m/579297F8.7020107@anastigmatix.net
* Remove obsolete SLRU wrapping and warnings from predicate.c.Tom Lane2018-03-30
| | | | | | | | | | | | | | | | When SSI was developed, slru.c was limited to segment files with names in the range 0000-FFFF. This didn't allow enough space for predicate.c to store every possible XID when spilling old transactions to disk, so it would wrap around sooner and print warnings. Since commits 638cf09e and 73c986ad increased the number of segment files slru.c could manage, that behavior is unnecessary. Therefore remove that code. Also remove the macro OldSerXidSegment, which has been unused since 4cd3fb6e. Thomas Munro, reviewed by Anastasia Lubennikova Discussion: https://postgr.es/m/CAEepm=3XfsTSxgEbEOmxu0QDiXy0o18NUg2nC89JZcCGE+XFPA@mail.gmail.com
* Improve out-of-memory error reports by including memory context name.Tom Lane2018-03-30
| | | | | | | | | | | | | | | | | Add the target context's name to the errdetail field of "out of memory" errors in mcxt.c. Per discussion, this seems likely to be useful to help narrow down the cause of a reported failure, and it costs little. Also, now that context names are required to be compile-time constants in all cases, there's little reason to be concerned about security issues from exposing these names to users. (Because of such concerns, we are *not* including the context "ident" field.) In passing, add unlikely() markers to the allocation-failed tests, just to be sure the compiler is on the right page about that. Also, in palloc and friends, copy CurrentMemoryContext into a local variable, as that's almost surely cheaper to reference than a global. Discussion: https://postgr.es/m/1099.1522285628@sss.pgh.pa.us
* Do index FSM vacuuming sooner.Tom Lane2018-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | In btree and SP-GiST indexes, move the responsibility for calling IndexFreeSpaceMapVacuum from the vacuumcleanup phase to the bulkdelete phase, and do it if and only if we found some pages that could be put into FSM. As in commit 851a26e26, the idea is to make free pages visible to FSM searchers sooner when vacuuming very large tables (large enough to need multiple bulkdelete scans). This adds more redundant work than that commit did, since we have to scan the entire index FSM each time rather than being able to localize what needs to be updated; but it still seems worthwhile. However, we can buy something back by not touching the FSM at all when there are no pages that can be put in it. That will result in slower recovery from corrupt upper FSM pages in such a scenario, but it doesn't seem like that's a case we need to optimize for. Hash indexes don't use FSM at all. GIN, GiST, and bloom indexes update FSM during the vacuumcleanup phase not bulkdelete, so that doing something comparable to this would be a much more invasive change, and it's not clear it's worth it. BRIN indexes do things sufficiently differently that this change doesn't apply to them, either. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
* Don't call IS_DUMMY_REL() when cheapest_total_path might be junk.Robert Haas2018-03-30
| | | | | | | | | | | Unlike the previous coding, this might result in a Gather per Append subplan when the target list is parallel-restricted, but such a plan is probably worth considering in that case, since a single Gather on top of the entire Append is impossible. Per Andres Freund and the buildfarm. Discussion: http://postgr.es/m/20180330050351.bmxx4cdtz67czjda@alap3.anarazel.de
* Predicate locking in GIN indexTeodor Sigaev2018-03-30
| | | | | | | | | | | Predicate locks are used on per page basis only if fastupdate = off, in opposite case predicate lock on pending list will effectively lock whole index, to reduce locking overhead, just lock a relation. Entry and posting trees are essentially B-tree, so locks are acquired on leaf pages only. Author: Shubham Barai with some editorization by me and Dmitry Ivanov Review by: Alexander Korotkov, Dmitry Ivanov, Fedor Sigaev Discussion: https://www.postgresql.org/message-id/flat/CALxAEPt5sWW+EwTaKUGFL5_XFcZ0MuGBcyJ70oqbWqr42YKR8Q@mail.gmail.com
* Fix typo in commentMagnus Hagander2018-03-30
| | | | Author: Michael Paquier <michael@paquier.xyz>
* Allow to lock views.Tatsuo Ishii2018-03-30
| | | | | | | | | | Now all tables used in view definitions can be recursively locked by a LOCK command. Author: Yugo Nagata Reviewed by Robert Haas, Thomas Munro and me. Discussion: https://postgr.es/m/20171011183629.eb2817b3.nagata%40sraoss.co.jp
* Improve JIT docs.Andres Freund2018-03-29
| | | | | Author: John Naylor and Andres Freund Discussion: https://postgr.es/m/CAJVSVGUs-VcwSY7-Kx-GQe__8hvWuA4Uhyf3gxoMXeiZqebE9g@mail.gmail.com
* Remove 'target' from GroupPathExtraData.Robert Haas2018-03-29
| | | | | | | | It's not needed. Jeevan Chalke Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
* Rewrite the code that applies scan/join targets to paths.Robert Haas2018-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If the toplevel scan/join target list is parallel-safe, postpone generating Gather (or Gather Merge) paths until after the toplevel has been adjusted to return it. This (correctly) makes queries with expensive functions in the target list more likely to choose a parallel plan, since the cost of the plan now reflects the fact that the evaluation will happen in the workers rather than the leader. The original complaint about this problem was from Jeff Janes. If the toplevel scan/join relation is partitioned, recursively apply the changes to all partitions. This sometimes allows us to get rid of Result nodes, because Append is not projection-capable but its children may be. It also cleans up what appears to be incorrect SRF handling from commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old code had no knowledge of SRFs for child scan/join rels. Because we now use create_projection_path() in some cases where we formerly used apply_projection_to_path(), this changes the ordering of columns in some queries generated by postgres_fdw. Update regression outputs accordingly. Patch by me, reviewed by Amit Kapila and by Ashutosh Bapat. Other fixes for this problem (substantially different from this version) were reviewed by Dilip Kumar, Amit Khandekar, and Marina Polyakova. Discussion: http://postgr.es/m/CAMkU=1ycXNipvhWuweUVpKuyu6SpNjF=yHWu4c4US5JgVGxtZQ@mail.gmail.com
* Postpone generate_gather_paths for topmost scan/join rel.Robert Haas2018-03-29
| | | | | | | | | | | | | | | Don't call generate_gather_paths for the topmost scan/join relation when it is initially populated with paths. Instead, do the work in grouping_planner. By itself, this gains nothing; in fact it loses slightly because we end up calling set_cheapest() for the topmost scan/join rel twice rather than once. However, it paves the way for a future commit which will postpone generate_gather_paths for the topmost scan/join relation even further, allowing more accurate costing of parallel paths. Amit Kapila and Robert Haas. Earlier versions of this patch (which different substantially) were reviewed by Dilip Kumar, Amit Khandekar, Marina Polyakova, and Ashutosh Bapat.
* Teach create_projection_plan to omit projection where possible.Robert Haas2018-03-29
| | | | | | | | | | | | | | | | | We sometimes insert a ProjectionPath into a plan tree when projection is not strictly required. The existing code already arranges to avoid emitting a Result node when the ProjectionPath's subpath can perform the projection itself, but previously it didn't consider the possibility that the parent node might not actually require the projection to be performed at all. Skipping projection when it's not required can not only avoid Result nodes that aren't needed, but also avoid losing the "physical tlist" optimization unneccessarily. Patch by me, reviewed by Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoakT5gmahbPWGqrR2nAdFOMAOnOXYoWHRdVfGWs34t6_A@mail.gmail.com
* C comments: "a" <--> "an" correctionsBruce Momjian2018-03-29
| | | | | | | | Reported-by: Michael Paquier, Abhijit Menon-Sen Discussion: https://postgr.es/m/20180305045854.GB2266@paquier.xyz Author: Michael Paquier, Abhijit Menon-Sen, me
* README change: update for hash access methodBruce Momjian2018-03-29
| | | | | | Reported-by: Thomas Munro, Justin Pryzby Discussion: https://postgr.es/m/CAEepm=1_682z-09DNHj4GkCJAqWK-D6h9Oq5ea84T1oqq1-Utg@mail.gmail.com
* Remove unnecessary BufferGetPage() calls in fsm_vacuum_page().Tom Lane2018-03-29
| | | | | | | | Just noticed that these were quite redundant, since we're holding the page address in a local variable anyway, and we have pin on the buffer throughout. Also improve a comment.
* Remove UpdateFreeSpaceMap(), use FreeSpaceMapVacuumRange() instead.Tom Lane2018-03-29
| | | | | | | | | | | | | | | | | FreeSpaceMapVacuumRange has the same effect, is more efficient if many pages are involved, and makes fewer assumptions about how it's used. Notably, Claudio Freire pointed out that UpdateFreeSpaceMap could fail if the specified freespace value isn't the maximum possible. This isn't a problem for the single existing user, but the function represents an attractive nuisance IMO, because it's named as though it were a general-purpose update function and its limitations are undocumented. In any case we don't need multiple ways to get the same result. In passing, do some code review and cleanup in RelationAddExtraBlocks. In particular, I see no excuse for it to omit the PageIsNew safety check that's done in the mainline extension path in RelationGetBufferForTuple. Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
* C comment: fix wording about shared memory message queueBruce Momjian2018-03-29
| | | | | | Reported-by: Tels Discussion: https://postgr.es/m/e66e05bc55f5ce904e361ad17a3395ae.squirrel@sm.webmail.pair.com
* While vacuuming a large table, update upper-level FSM data every so often.Tom Lane2018-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VACUUM updates leaf-level FSM entries immediately after cleaning the corresponding heap blocks. fsmpage.c updates the intra-page search trees on the leaf-level FSM pages when this happens, but it does not touch the upper-level FSM pages, so that the released space might not actually be findable by searchers. Previously, updating the upper-level pages happened only at the conclusion of the VACUUM run, in a single FreeSpaceMapVacuum() call. This is bad because the VACUUM might get canceled before ever reaching that point, so that from the point of view of searchers no space has been freed at all, leading to table bloat. We can improve matters by updating the upper pages immediately after each cycle of index-cleaning and heap-cleaning, processing just the FSM pages corresponding to the range of heap blocks we have now fully cleaned. This adds a small amount of extra work, since the FSM pages leading down to each range boundary will be touched twice, but it's pretty negligible compared to everything else going on in a large VACUUM. If there are no indexes, VACUUM doesn't work in cycles but just cleans each heap page on first visit. In that case we just arbitrarily update upper FSM pages after each 8GB of heap. That maintains the goal of not letting all this work slide until the very end, and it doesn't seem worth expending extra complexity on a case that so seldom occurs in practice. In either case, the FSM is fully up to date before any attempt is made to truncate the relation, so that the most likely scenario for VACUUM cancellation no longer results in out-of-date upper FSM pages. When we do successfully truncate, adjusting the FSM to reflect that is now fully handled within FreeSpaceMapTruncateRel. Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional tweaks by me Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
* Add casts from jsonbTeodor Sigaev2018-03-29
| | | | | | | | | | | | | Add explicit cast from scalar jsonb to all numeric and bool types. It would be better to have cast from scalar jsonb to text too but there is already a cast from jsonb to text as just text representation of json. There is no way to have two different casts for the same type's pair. Bump catalog version Author: Anastasia Lubennikova with editorization by Nikita Glukhov and me Review by: Aleksander Alekseev, Nikita Glukhov, Darafei Praliaskouski Discussion: https://www.postgresql.org/message-id/flat/0154d35a-24ae-f063-5273-9ffcdf1c7f2e@postgrespro.ru
* Fix typo in commentMagnus Hagander2018-03-29
| | | | Arthur Zakirov, confirmed by Thomas Munro
* Allow committing inside cursor loopPeter Eisentraut2018-03-28
| | | | | | | | | | | | | | | Previously, committing or aborting inside a cursor loop was prohibited because that would close and remove the cursor. To allow that, automatically convert such cursors to holdable cursors so they survive commits or rollbacks. Portals now have a new state "auto-held", which means they have been converted automatically from pinned. An auto-held portal is kept on transaction commit or rollback, but is still removed when returning to the main loop on error. This supports all languages that have cursor loop constructs: PL/pgSQL, PL/Python, PL/Perl. Reviewed-by: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>