path: root/contrib/postgres_fdw/sql
Commit message (Author, Date)
...
* postgres_fdw: Avoid 'variable not found in subplan target list' error. (Etsuro Fujita, 2022-09-14)

  The tlist of the EvalPlanQual outer plan for a ForeignScan node is adjusted to produce a tuple whose descriptor matches the scan tuple slot for the ForeignScan node. But in the case where the outer plan contains an extra Sort node, if the new tlist contained columns required only for evaluating PlaceHolderVars or columns required only for evaluating local conditions, this would cause setrefs.c to fail with the error.

  The cause of this is that when creating the outer plan by injecting the Sort node into an alternative local join plan that could emit such extra columns as well, we fail to arrange for the outer plan to propagate them up through the Sort node, causing setrefs.c to fail to match them up in the new tlist to what is available from the outer plan. Repair.

  Per report from Alexander Pyhalov.

  Richard Guo and Etsuro Fujita, reviewed by Alexander Pyhalov and Tom Lane. Backpatch to all supported versions.

  Discussion: http://postgr.es/m/cfb17bf6dfdf876467bd5ef533852d18%40postgrespro.ru

* postgres_fdw: Disable batch insertion when there are WCO constraints. (Etsuro Fujita, 2022-08-05)

  When inserting into a view referencing a foreign table that has WITH CHECK OPTION constraints, in single-insert mode postgres_fdw retrieves the data that was actually inserted on the remote side, so that the WITH CHECK OPTION constraints are enforced with that data locally. In batch-insert mode, however, it cannot currently retrieve the data (except for the row first inserted through the view), so the constraints are enforced with the data passed from the core (except for the first-inserted row). This led to incorrect results when inserting into a view referencing a foreign table in which a remote BEFORE ROW INSERT trigger changes the rows inserted through the view so that they violate the view's WITH CHECK OPTION constraint. Also, such a query caused an assertion failure in assert-enabled builds.

  Fix these by disabling batch insertion when inserting into such a view.

  Back-patch to v14 where batch insertion was added.

  Discussion: https://postgr.es/m/CAPmGK17LpbTZs4m4a_6THP54UBeK9fHvX8aVVA%2BC6yEZDZwQcg%40mail.gmail.com

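  For illustration (table, column, and option names hypothetical), a view whose CHECK OPTION now forces single-row inserts:

      CREATE VIEW v AS SELECT * FROM ft_t WHERE a > 0
        WITH CHECK OPTION;
      -- Inserts through v run row by row even if batch_size > 1, so the
      -- CHECK OPTION is verified against the rows actually stored remotely.
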
* Improve performance of ORDER BY / DISTINCT aggregates (David Rowley, 2022-08-02)

  ORDER BY / DISTINCT aggregates have, since implemented in Postgres, been executed by always performing a sort in nodeAgg.c to sort the tuples in the current group into the correct order before calling the transition function on the sorted tuples. This was not great as often there might be an index that could have provided pre-sorted input and allowed the transition functions to be called as the rows come in, rather than having to store them in a tuplestore in order to sort them once all the tuples for the group have arrived.

  Here we change the planner so it requests a path with a sort order which suits the largest number of ORDER BY / DISTINCT aggregate functions, and add new code to the executor to allow it to support the processing of ORDER BY / DISTINCT aggregates where the tuples are already sorted in the correct order.

  Since there can be many ORDER BY / DISTINCT aggregates in any given query level, it's very possible that we can't find an order that suits all of these aggregates. The sort order that the planner chooses is simply the one that suits the most aggregate functions. We take the most strictly sorted variation of each order and see how many aggregate functions can use that, then we try again with the order of the remaining aggregates to see if another order would suit more aggregate functions. For example:

      SELECT agg(a ORDER BY a), agg2(a ORDER BY a,b) ...

  would request the sort order to be {a, b} because {a} is a subset of the sort order of {a,b}, but

      SELECT agg(a ORDER BY a), agg2(a ORDER BY c) ...

  would just pick a plan ordered by {a} (we give precedence to aggregates which are earlier in the targetlist), and

      SELECT agg(a ORDER BY a), agg2(a ORDER BY b), agg3(a ORDER BY b) ...

  would choose to order by {b} since two aggregates suit that vs just one that requires input ordered by {a}.

  Author: David Rowley
  Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane
  Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com

* postgres_fdw: be more wary about shippability of reg* constants. (Tom Lane, 2022-07-17)

  Don't consider a constant of regconfig or other reg* types to be shippable unless it refers to a built-in object, or an object in an extension that's been marked shippable. Without this restriction, we're too likely to send a constant that will fail to parse on the remote server.

  For the regconfig type only, consider OIDs up to 16383 to be "built in", rather than the normal cutoff of 9999. Otherwise the initdb-created text search configurations will be considered unshippable, which is unlikely to make anyone happy.

  It's possible that this new restriction will de-optimize queries that were working satisfactorily before. Users can restore any lost performance by making sure that objects that can be expected to exist on the remote side are in shippable extensions. However, that's not a change that people are likely to be happy about having to make after a minor-release update. Between that consideration and the lack of field complaints, let's just change this in HEAD.

  Noted while fixing bug #17483, although this is not precisely the problem that that report complained about.

  Discussion: https://postgr.es/m/1423433.1652722406@sss.pgh.pa.us

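  A sketch of the distinction (foreign table and column names hypothetical):

      -- 'english' is an initdb-created regconfig constant (OID < 16384),
      -- so this qual can still be shipped to the remote server:
      SELECT * FROM ft_docs
        WHERE to_tsvector('english', body) @@ to_tsquery('fdw');
      -- A constant naming a user-defined text search configuration is
      -- shipped only if its extension is listed in the foreign server's
      -- "extensions" option.
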
* postgres_fdw: set search_path to 'pg_catalog' while deparsing constants. (Tom Lane, 2022-07-17)

  The motivation for this is to ensure successful transmission of the values of constants of regconfig and other reg* types. The remote will be reading them with search_path = 'pg_catalog', so schema qualification is necessary when referencing objects in other schemas.

  Per bug #17483 from Emmanuel Quincerot. Back-patch to all supported versions. (There's some other stuff to do here, but it's less back-patchable.)

  Discussion: https://postgr.es/m/1423433.1652722406@sss.pgh.pa.us

* Support TRUNCATE triggers on foreign tables. (Fujii Masao, 2022-07-12)

  Now some foreign data wrappers support the TRUNCATE command. So it's useful to support TRUNCATE triggers on foreign tables for audit logging or for preventing undesired truncation.

  Author: Yugo Nagata
  Reviewed-by: Fujii Masao, Ian Lawrence Barwick
  Discussion: https://postgr.es/m/20220630193848.5b02e0d6076b86617a915682@sraoss.co.jp

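  A sketch of such an audit trigger (function and table names hypothetical); note that TRUNCATE triggers must be statement-level:

      CREATE FUNCTION audit_truncate() RETURNS trigger LANGUAGE plpgsql AS
      $$ BEGIN RAISE NOTICE 'TRUNCATE on %', TG_TABLE_NAME; RETURN NULL; END $$;
      CREATE TRIGGER ft_truncate_audit
        AFTER TRUNCATE ON ft_t
        FOR EACH STATEMENT EXECUTE FUNCTION audit_truncate();
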
* Disable asynchronous execution if using gating Result nodes. (Etsuro Fujita, 2022-04-28)

  mark_async_capable_plan(), which is called from create_append_plan() to determine whether subplans are async-capable, failed to take into account that the given subplan created from a given subpath might include a gating Result node if the subpath is a SubqueryScanPath or ForeignPath. This caused a segmentation fault there when the subplan created from a SubqueryScanPath includes the Result node, or caused ExecAsyncRequest() to throw an error about an unrecognized node type when the subplan created from a ForeignPath includes the Result node, because in the latter case the Result node was unintentionally considered as async-capable, but we don't currently support executing Result nodes asynchronously. Fix by modifying mark_async_capable_plan() to disable asynchronous execution in such cases.

  Also, adjust code in the ProjectionPath case in mark_async_capable_plan(), for consistency with other cases, and adjust/improve comments there.

  is_async_capable_path() added in commit 27e1f1456, which was rewritten to mark_async_capable_plan() in a later commit, has the same issue, causing the error at execution mentioned above, so back-patch to v14 where the aforesaid commit went in.

  Per report from Justin Pryzby.

  Etsuro Fujita, reviewed by Zhihong Yu and Justin Pryzby.

  Discussion: https://postgr.es/m/20220408124338.GK24419%40telsasoft.com

* postgres_fdw: Disable batch insert when BEFORE ROW INSERT triggers exist. (Etsuro Fujita, 2022-04-21)

  Previously, we allowed this, but such triggers might query the table to insert into and act differently if the tuples that have already been processed and prepared for insertion are not there, so disable it in such cases.

  Back-patch to v14 where batch insert was added.

  Discussion: https://postgr.es/m/CAPmGK16_uPqsmgK0-LpLSUk54_BoK13bPrhxhfjSoSTVz414hA%40mail.gmail.com

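  For illustration (names hypothetical), a local trigger that now forces single-row inserts into the foreign table:

      CREATE FUNCTION fix_row() RETURNS trigger LANGUAGE plpgsql AS
      $$ BEGIN NEW.b := upper(NEW.b); RETURN NEW; END $$;
      CREATE TRIGGER ft_fix_row BEFORE INSERT ON ft_t
        FOR EACH ROW EXECUTE FUNCTION fix_row();
      -- INSERTs into ft_t now run row by row even if batch_size > 1 is set.
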
* Allow asynchronous execution in more cases. (Etsuro Fujita, 2022-04-06)

  In commit 27e1f1456, create_append_plan() only allowed the subplan created from a given subpath to be executed asynchronously when it was an async-capable ForeignPath. To extend coverage, this patch handles cases when the given subpath includes some other Path types as well that can be omitted in the plan processing, such as a ProjectionPath directly atop an async-capable ForeignPath, allowing asynchronous execution in partitioned-scan/partitioned-join queries with non-Var tlist expressions and more UNION queries.

  Andrey Lepikhov and Etsuro Fujita, reviewed by Alexander Pyhalov and Zhihong Yu.

  Discussion: https://postgr.es/m/659c37a8-3e71-0ff2-394c-f04428c76f08%40postgrespro.ru

* Fix postgres_fdw to check shippability of sort clauses properly. (Tom Lane, 2022-03-31)

  postgres_fdw would push ORDER BY clauses to the remote side without verifying that the sort operator is safe to ship. Moreover, it failed to print a suitable USING clause if the sort operator isn't default for the sort expression's type. The net result of this is that the remote sort might not have anywhere near the semantics we expect, which'd be disastrous for locally-performed merge joins in particular.

  We addressed similar issues in the context of ORDER BY within an aggregate function call in commit 7012b132d, but failed to notice that query-level ORDER BY was broken. Thus, much of the necessary logic already existed, but it requires refactoring to be usable in both cases.

  Back-patch to all supported branches. In HEAD only, remove the core code's copy of find_em_expr_for_rel, which is no longer used and really should never have been pushed into equivclass.c in the first place.

  Ronan Dunklau, per report from David Rowley; reviews by David Rowley, Ranier Vilela, and myself

  Discussion: https://postgr.es/m/CAApHDvr4OeC2DBVY--zVP83-K=bYrTD7F8SZDhN4g+pj2f2S-A@mail.gmail.com

* postgres_fdw: Add support for parallel commit. (Etsuro Fujita, 2022-02-24)

  postgres_fdw commits remote (sub)transactions opened on remote server(s) in a local (sub)transaction one by one when the local (sub)transaction commits. This patch allows it to commit the remote (sub)transactions in parallel to improve performance.

  This is enabled by the server option "parallel_commit". The default is false.

  Etsuro Fujita, reviewed by Fujii Masao and David Zhang.

  Discussion: http://postgr.es/m/CAPmGK17dAZCXvwnfpr1eTfknTGdt%3DhYTV9405Gt5SqPOX8K84w%40mail.gmail.com

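  A sketch of enabling it (server name hypothetical):

      ALTER SERVER loopback OPTIONS (ADD parallel_commit 'true');
      -- Remote transactions opened on this server are now committed in
      -- parallel when the local transaction commits.
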
* postgres_fdw: Make postgres_fdw.application_name support more escape sequences. (Fujii Masao, 2022-02-18)

  Commit 6e0cb3dec1 allowed postgres_fdw.application_name to include escape sequences %a (application name), %d (database name), %u (user name) and %p (pid). In addition to them, this commit makes it support the escape sequences for session ID (%c) and cluster name (%C). These are helpful to investigate where each remote transaction came from.

  Author: Fujii Masao
  Reviewed-by: Ryohei Takahashi, Kyotaro Horiguchi
  Discussion: https://postgr.es/m/1041dc9a-c976-049f-9f14-e7d94c29c4b2@oss.nttdata.com

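  A hypothetical setting combining old and new escape sequences:

      SET postgres_fdw.application_name TO 'myapp_%d_%u_%c';
      -- On the remote server, pg_stat_activity.application_name then shows
      -- the local database name, user name, and session ID.
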
* postgres_fdw: Fix handling of a pending asynchronous request in postgresReScanForeignScan(). (Etsuro Fujita, 2022-01-27)

  Commit 27e1f1456 failed to process a pending asynchronous request made for a given ForeignScan node in postgresReScanForeignScan() (if any) in cases where we would only reset the next_tuple counter in that function, contradicting the assumption that there should be no pending asynchronous requests that have been made for async-capable subplans for the parent Append node after ReScan. This led to an assert failure in an assert-enabled build. I think this would also lead to mis-rewinding the cursor in that function in the case where we have already fetched one batch for the ForeignScan node and the asynchronous request has been made for the second batch, because even in that case we would just reset the counter when called from that function, so we would fail to execute MOVE BACKWARD ALL.

  To fix, modify that function to process the asynchronous request before restarting the scan.

  While at it, add a comment to a function to match other places.

  Per bug #17344 from Alexander Lakhin. Back-patch to v14 where the aforesaid commit came in.

  Patch by me. Test case by Alexander Lakhin, adjusted by me. Reviewed and tested by Alexander Lakhin and Dmitry Dolgov.

  Discussion: https://postgr.es/m/17344-226b78b00de73a7e@postgresql.org

* postgres_fdw: Add regression test for postgres_fdw.application_name GUC. (Fujii Masao, 2022-01-07)

  Commit 449ab63505 added the postgres_fdw.application_name GUC that specifies a value for the application_name configuration parameter used when postgres_fdw establishes a connection to a foreign server. Also commit 6e0cb3dec1 allowed it to include escape sequences. Both commits added regression tests for the GUC, but those tests were reverted by commits 98dbef90eb and 5e64ad3697 because they were unstable and caused some buildfarm members to report failures.

  This is the third try to add the regression test for the postgres_fdw.application_name GUC.

  One of the issues that made the test unstable was the use of postgres_fdw_disconnect_all() to close the existing remote connections. The test expected that the remote connection and its corresponding backend at the remote server disappeared just after postgres_fdw_disconnect_all() was executed, but it could take a bit of time for them to disappear. To make sure that they exit, this commit makes the test use pg_terminate_backend() with a timeout at the remote server, instead. If the timeout is set to greater than zero, this function waits until they are actually terminated (or until the given time has passed).

  Another issue was that the test didn't take into consideration the case where postgres_fdw.application_name containing some escape sequences was converted to a string larger than NAMEDATALEN. In this case it was truncated to less than NAMEDATALEN when passed to the remote server, but the test wrongly expected that the full string of application_name was always viewable. This commit changes the test so that it can handle that case.

  Author: Fujii Masao
  Reviewed-by: Masahiko Sawada, Hayato Kuroda, Kyotaro Horiguchi
  Discussion: https://postgr.es/m/3220909.1631054766@sss.pgh.pa.us
  Discussion: https://postgr.es/m/20211224.180006.2247635208768233073.horikyota.ntt@gmail.com
  Discussion: https://postgr.es/m/e7b61420-a97b-8246-77c4-a0d48fba5a45@oss.nttdata.com

* postgres_fdw: Revert unstable tests for postgres_fdw.application_name. (Fujii Masao, 2021-12-24)

  Commit 6e0cb3dec1 added tests that check that escape sequences in the postgres_fdw.application_name setting are replaced with status information as expected. But they were unstable and caused some buildfarm members to report the failure. This commit reverts those unstable tests.

* postgres_fdw: Allow postgres_fdw.application_name to include escape sequences. (Fujii Masao, 2021-12-24)

  The application_name used when postgres_fdw establishes a connection to a foreign server can be specified in either or both of a connection parameter of a server object and the GUC postgres_fdw.application_name. This commit allows those parameters to include escape sequences that begin with the % character. postgres_fdw replaces those escape sequences with status information. For example, %d and %u are replaced with the database name and user name of the local server, respectively.

  This feature makes it easier to add information to the application_name of a remote connection that is useful for tracking remote transactions or queries.

  Author: Hayato Kuroda
  Reviewed-by: Kyotaro Horiguchi, Masahiro Ikeda, Hou Zhijie, Fujii Masao
  Discussion: https://postgr.es/m/TYAPR01MB5866FAE71C66547C64616584F5EB9@TYAPR01MB5866.jpnprd01.prod.outlook.com
  Discussion: https://postgr.es/m/TYCPR01MB5870D1E8B949DAF6D3B84E02F5F29@TYCPR01MB5870.jpnprd01.prod.outlook.com

* Remove assertion for ALTER TABLE .. DETACH PARTITION CONCURRENTLY (Michael Paquier, 2021-12-22)

  One code path related to this flavor of ALTER TABLE was checking that the relation to detach has to be a normal table or a partitioned table, which would fail if using the command with a different relation kind. Views, sequences and materialized views cannot be part of a partition tree, so these would cause the command to fail anyway, but the assertion was triggered. Foreign tables can be part of a partition tree, and again the assertion would have failed.

  The simplest solution is just to remove this assertion, so that we get the same failure as the non-concurrent code path. While on it, add a regression test in postgres_fdw for the concurrent partition detach of a foreign table, as per a suggestion from Alexander Lakhin.

  Issue introduced in 71f4c8c.

  Reported-by: Alexander Lakhin
  Author: Michael Paquier, Alexander Lakhin
  Reviewed-by: Peter Eisentraut, Kyotaro Horiguchi
  Discussion: https://postgr.es/m/17339-a9e09aaf38a3457a@postgresql.org
  Backpatch-through: 14

* Improve publication error messages (Daniel Gustafsson, 2021-11-17)

  Commit 81d5995b4b introduced more fine-grained error messages for incorrect relkinds for publication, while unlogged and temporary tables were reported using the same message. This provides separate error messages for these types of relpersistence.

  Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
  Reviewed-by: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>
  Reviewed-by: Euler Taveira <euler@eulerto.com>
  Discussion: https://postgr.es/m/CALj2ACW9S=AswyQHjtO6WMcsergMkCBTtzXGrM8DX26DzfeTLQ@mail.gmail.com

* postgres_fdw: suppress casts on constants in limited cases. (Tom Lane, 2021-11-12)

  When deparsing an expression of the form "remote_var OP constant", we'd normally apply a cast to the constant to make sure that the remote parser thinks it's of the same type we do. However, doing so is often not necessary, and it causes problems if the user has intentionally declared the local column as being of a different type than the remote column. A plausible use-case for that is using text to represent a type that's an enum on the remote side. A comparison on such a column will get shipped as "var = 'foo'::text", which blows up on the remote side because there's no enum = text operator. But if we simply leave off the explicit cast, the comparison will do exactly what the user wants.

  It's possible to do this without major risk of semantic problems, by relying on the longstanding parser heuristic that "if one operand of an operator is of type unknown, while the other one has a known type, assume that the unknown operand is also of that type". Hence, this patch leaves off the cast only if (a) the operator inputs have the same type locally; (b) the constant will print as a string literal or NULL, both of which are initially taken as type unknown; and (c) the non-Const input is a plain foreign Var.

  Rule (c) guarantees that the remote parser will know the type of the non-Const input; moreover, it means that if this cast-omission does cause any semantic surprises, that can only happen in cases where the local column has a different type than the remote column. That wasn't guaranteed to work anyway, and this patch should represent a net usability gain for such cases.

  One point that I (tgl) remain slightly uncomfortable with is that we will ignore an implicit RelabelType when deciding if the non-Const input is a plain Var. That makes it a little squishy to argue that the remote should resolve the Const as being of the same type as its Var, because then our Const is not the same type as our Var. However, if we don't do that, then this hack won't work as desired if the user chooses to use varchar rather than text to represent some remote column. That seems useful, so do it like this for now. We might have to give up the RelabelType-ignoring bit if any problems surface.

  Dian Fay, with review and kibitzing by me

  Discussion: https://postgr.es/m/C9LU294V7K4F.34LRRDU449O45@lamia

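  A sketch of the effect (table and column names hypothetical; the local column "status" is declared text while the remote column is an enum):

      SELECT * FROM ft_orders WHERE status = 'shipped';
      -- Previously deparsed as:  WHERE status = 'shipped'::text
      --   (fails remotely: no enum = text operator)
      -- Now deparsed as:         WHERE status = 'shipped'
      --   (the remote parser resolves the literal as the enum type)
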
* Improve HINT message that FDW reports when there are no valid options. (Fujii Masao, 2021-10-27)

  The foreign data wrapper's validator function provides a HINT message with a list of valid options for the object specified in a CREATE or ALTER command, when the option given in the command is invalid. Previously postgresql_fdw_validator() and the validator functions for postgres_fdw and dblink_fdw worked in that way even when there were no valid options for the object, which could lead to a HINT message with an empty list (because there were no valid options). For example, ALTER FOREIGN DATA WRAPPER postgres_fdw OPTIONS (format 'csv') reported the following ERROR and HINT messages. This behavior was confusing.

      ERROR:  invalid option "format"
      HINT:  Valid options in this context are:

  There is no such issue in file_fdw. The validator function for file_fdw reports the HINT message "There are no valid options in this context." instead in that case. This commit improves postgresql_fdw_validator() and the validator functions for postgres_fdw and dblink_fdw so that they do likewise. For example, this change causes the above ALTER FOREIGN DATA WRAPPER command to report the following messages.

      ERROR:  invalid option "format"
      HINT:  There are no valid options in this context.

  Author: Kosei Masumura
  Reviewed-by: Bharath Rupireddy, Fujii Masao
  Discussion: https://postgr.es/m/557d06cebe19081bfcc83ee2affc98d3@oss.nttdata.com

* Fix null-pointer crash in postgres_fdw's conversion_error_callback. (Tom Lane, 2021-10-06)

  Commit c7b7311f6 adjusted conversion_error_callback to always use information from the query's rangetable, to avoid doing catalog lookups in an already-failed transaction. However, as a result of the utterly inadequate documentation for make_tuple_from_result_row, I failed to realize that fsstate could be NULL in some contexts. That led to a crash if we got a conversion error in such a context. Fix by falling back to the previous coding when fsstate is NULL. Improve the commentary, too.

  Per report from Andrey Borodin. Back-patch to 9.6, like the previous patch.

  Discussion: https://postgr.es/m/08916396-55E4-4D68-AB3A-BD6066F9E5C0@yandex-team.ru

* Revoke PUBLIC CREATE from public schema, now owned by pg_database_owner. (Noah Misch, 2021-09-09)

  This switches the default ACL to what the documentation has recommended since CVE-2018-1058. Upgrades will carry forward any old ownership and ACL. Sites that declined the 2018 recommendation should take a fresh look. Recipes for commissioning a new database cluster from scratch may need to create a schema, grant more privileges, etc. Out-of-tree test suites may require such updates.

  Reviewed by Peter Eisentraut.

  Discussion: https://postgr.es/m/20201031163518.GB4039133@rfd.leadboat.com

* postgres_fdw: Revert unstable tests for postgres_fdw.application_name. (Fujii Masao, 2021-09-08)

  Commit 449ab63505 added tests that check that the postgres_fdw.application_name GUC works as expected. But they were unstable and caused some buildfarm members to report the failure. This commit reverts those unstable tests.

  Reported-by: Tom Lane as per buildfarm
  Discussion: https://postgr.es/m/3220909.1631054766@sss.pgh.pa.us

* postgres_fdw: Allow application_name of remote connection to be set via GUC. (Fujii Masao, 2021-09-07)

  This commit adds the postgres_fdw.application_name GUC, which specifies a value for the application_name configuration parameter used when postgres_fdw establishes a connection to a foreign server. This GUC setting always overrides the application_name option of the foreign server object. This GUC is useful when we want to specify our own application_name per remote connection.

  Previously the application_name of a remote connection could be set basically only via the options of a server object. That meant that every session connecting to the same foreign server basically had to use the same application_name. Also, to change the setting, we had to execute an "ALTER SERVER ... OPTIONS ..." command. It was inconvenient.

  Author: Hayato Kuroda
  Reviewed-by: Masahiro Ikeda, Fujii Masao
  Discussion: https://postgr.es/m/TYCPR01MB5870D1E8B949DAF6D3B84E02F5F29@TYCPR01MB5870.jpnprd01.prod.outlook.com

* postgres_fdw: Fix issues with generated columns in foreign tables. (Etsuro Fujita, 2021-08-05)

  postgres_fdw imported generated columns from the remote tables as plain columns, and caused failures like "ERROR: cannot insert a non-DEFAULT value into column "foo"" when inserting into the foreign tables, as it tried to insert values into the generated columns. To fix, we do the following under the assumption that generated columns in a postgres_fdw foreign table are defined so that they represent generated columns in the underlying remote table:

  * Send DEFAULT for the generated columns to the foreign server on insert or update, not generated column values computed on the local server.

  * Add to postgresImportForeignSchema() an option "import_generated" to include column generated expressions in the definitions of foreign tables imported from a foreign server. The option is true by default.

  The assumption seems reasonable, because that would make a query of the postgres_fdw foreign table return values for the generated columns that are consistent with the generated expression.

  While here, fix another issue in postgresImportForeignSchema(): it tried to include column generated expressions as column default expressions in the foreign table definitions when the import_default option was enabled.

  Per bug #16631 from Daniel Cherniy. Back-patch to v12 where generated columns were added.

  Discussion: https://postgr.es/m/16631-e929fe9db0ffc7cf%40postgresql.org

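  A sketch of opting out of the new import default (server and schema names hypothetical):

      IMPORT FOREIGN SCHEMA public FROM SERVER loopback INTO local_schema
        OPTIONS (import_generated 'false');
      -- Generated-column expressions on the remote side are then omitted
      -- from the imported foreign table definitions.
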
* In postgres_fdw, allow CASE expressions to be pushed to the remote server. (Tom Lane, 2021-07-30)

  This is simple enough except for the need to check whether CaseTestExpr nodes have a collation that is not derived from a remote Var. For that, examine the CASE's "arg" expression and then pass that info down into the recursive examination of the WHEN expressions.

  Alexander Pyhalov, reviewed by Gilles Darold and myself

  Discussion: https://postgr.es/m/fda09032e90d85d9b726a41e03f9097f@postgrespro.ru

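  For illustration (names hypothetical), a qual that can now be evaluated remotely:

      EXPLAIN (VERBOSE, COSTS OFF)
      SELECT * FROM ft_t WHERE (CASE WHEN a > 0 THEN a ELSE -a END) < 10;
      -- The CASE expression now appears in the plan's Remote SQL instead
      -- of being applied as a local filter.
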
* postgres_fdw: Fix handling of pending asynchronous requests. (Etsuro Fujita, 2021-07-30)

  A pending asynchronous request is handled by process_pending_request(), which previously not only processed an in-progress remote query but performed ExecForeignScan() to produce a tuple to return to the local server asynchronously from the result of the remote query. But that led to a server crash when executing a query, or led to an "InstrStartNode called twice in a row" or "InstrEndLoop called on running node" failure when doing EXPLAIN ANALYZE of it, in cases where the plan tree for it contained multiple async-capable nodes accessing the same initplan/subplan that contained multiple async-capable nodes scanning the same foreign tables as for the parent async-capable nodes, as reported by Andrey Lepikhov. The reason is that the second step in process_pending_request() invoked when executing the initplan/subplan for one of the parent async-capable nodes caused recursive execution of the initplan/subplan for another of the parent async-capable nodes.

  To fix, split process_pending_request() into the two steps and postpone the second step until ForeignAsyncConfigureWait() is called for each of the pending asynchronous requests. Also, in ExecAppendAsyncEventWait() we assumed that FDWs would register at least one wait event in a WaitEventSet created there when they were called from ForeignAsyncConfigureWait() in that function, but allow FDWs to register zero wait events in the WaitEventSet; modify ExecAppendAsyncEventWait() to just return in that case.

  Oversight in commit 27e1f1456. Back-patch to v14 where that commit went in.

  Andrey Lepikhov and Etsuro Fujita

  Discussion: https://postgr.es/m/fe5eaa19-1704-e4a4-76ee-3b9d37ade399@postgrespro.ru

* Change the name of the Result Cache node to Memoize (David Rowley, 2021-07-14)

  "Result Cache" was never a great name for this node, but nobody managed to come up with another name that anyone liked enough. That was until David Johnston mentioned "Node Memoization", which Tom Lane revised to just "Memoize". People seem to like "Memoize", so let's do the rename.

  Reviewed-by: Justin Pryzby
  Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
  Backpatch-through: 14, where Result Cache was introduced

* Rename debug_invalidate_system_caches_always to debug_discard_caches. (Tom Lane, 2021-07-13)

  The name introduced by commit 4656e3d66 was agreed to be unreasonably long. To match this change, rename initdb's recently-added --clobber-cache option to --discard-caches.

  Discussion: https://postgr.es/m/1374320.1625430433@sss.pgh.pa.us

* Fix crash in postgres_fdw for provably-empty remote UPDATE/DELETE. (Tom Lane, 2021-07-07)

  In 86dc90056, I'd written find_modifytable_subplan with the assumption that if the immediate child of a ModifyTable is a Result, it must be a projecting Result with a subplan. However, if the UPDATE or DELETE has a provably-constant-false WHERE clause, that's not so: we'll generate a dummy subplan with a childless Result. Add the missing null-check so we don't crash on such cases.

  Per report from Alexander Pyhalov.

  Discussion: https://postgr.es/m/b9a6f53549456b2f3e2fd150dcd79d72@postgrespro.ru

* postgres_fdw: Tighten up allowed values for batch_size, fetch_size options. (Fujii Masao, 2021-07-07)

  Previously values such as '100$%$#$#' and '9,223,372,' were accepted and treated as valid integers for the postgres_fdw options batch_size and fetch_size, whereas this is not the case with the fdw_startup_cost and fdw_tuple_cost options, for which an error is thrown. This was because endptr was not used while converting strings to integers using strtol.

  This commit changes the logic so that it uses the parse_int function instead of strtol, as it serves the purpose by returning false in case it is unable to convert the string to an integer. Note that this function also rounds off values such as '100.456' to 100 and '100.567' or '100.678' to 101.

  While on this, use parse_real for the fdw_startup_cost and fdw_tuple_cost options. Since parse_int and parse_real are being used for reloptions and GUCs, it is more appropriate to use them in postgres_fdw rather than using strtol and strtod directly.

  Back-patch to v14.

  Author: Bharath Rupireddy
  Reviewed-by: Ashutosh Bapat, Tom Lane, Kyotaro Horiguchi, Fujii Masao
  Discussion: https://postgr.es/m/CALj2ACVMO6wY5Pc4oe1OCgUOAtdjHuFsBDw8R5uoYR86eWFQDA@mail.gmail.com

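  For illustration (server name hypothetical):

      ALTER SERVER loopback OPTIONS (ADD batch_size '100.456');
      -- accepted: parse_int rounds the value to 100
      ALTER SERVER loopback OPTIONS (SET batch_size '100$%$#$#');
      -- now rejected with an error instead of being silently accepted
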
* Avoid doing catalog lookups in postgres_fdw's conversion_error_callback. (Tom Lane, 2021-07-06)

  As in 50371df26, this is a bad idea since the callback can't really know what error is being thrown and thus whether or not it is safe to attempt catalog accesses. Rather than pushing said accesses into the mainline code where they'd usually be a waste of cycles, we can look at the query's rangetable instead.

  This change does mean that we'll be printing query aliases (if any were used) rather than the table or column's true name. But that doesn't seem like a bad thing: it's certainly a more useful definition in self-join cases, for instance. In any case, it seems unlikely that any applications would be depending on this detail, so it seems safe to change.

  Patch by me. Original complaint by Andres Freund; Bharath Rupireddy noted the connection to conversion_error_callback.

  Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de

* Further stabilize postgres_fdw test. (Tom Lane, 2021-06-24)

  The queries involving ft1_nopw don't stably return the same row anymore. I surmise that an autovacuum hitting "S 1"."T 1" right after the updates introduced by f61db909d/5843659d0 freed some space, changing where subsequent insertions get stored. It's only by good luck that these results were stable before, though, since a LIMIT without ORDER BY isn't well defined, and it's not like we've ever treated that table as append-only in this test script.

  Since we only really care whether these commands succeed or not, just replace "SELECT *" with "SELECT 1".

  Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2021-06-23%2019%3A52%3A08

* Stabilize test case added by commit f61db909d. (Tom Lane, 2021-06-20)

  Buildfarm members ayu and tern have sometimes shown a different plan than expected for this query. I'd been unable to reproduce that before today, but I finally realized what is happening. If there is a concurrent open transaction (probably an autovacuum run in the buildfarm, but this can also be arranged manually), then the index entries for the rows removed by the DELETE a few lines up are not killed promptly, causing a change in the planner's estimate of the extremal value of ft2.c1, which moves the rowcount estimate for "c1 > 1100" by enough to change the join plan from nestloop to hash.

  To fix, change the query condition to "c1 > 1000", causing the hash plan to be preferred whether or not a concurrent open transaction exists. Since this UPDATE is tailored to be a no-op, nothing else changes.

  Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-09%2022%3A45%3A48
  Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=ayu&dt=2021-06-13%2022%3A38%3A18
  Report: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-06-20%2004%3A55%3A36

* Fix copying data into slots with FDW batching (Tomas Vondra, 2021-06-16)

  Commit b676ac443b optimized handling of tuple slots with bulk inserts into foreign tables, so that the slots are initialized only once and reused for all batches. The data was however copied into the slots only after the initialization, inserting duplicate values when the slot gets reused. Fixed by moving the ExecCopySlot outside the init branch.

  The existing postgres_fdw tests failed to catch this due to inserting data into foreign tables without unique indexes, and then checking only the number of inserted rows. This adds a new test with both a unique index and a check of inserted values.

  Reported-by: Alexander Pyhalov
  Discussion: https://postgr.es/m/7a8cf8d56b3d18e5c0bccd6cd42d04ac%40postgrespro.ru

* Adjust batch size in postgres_fdw to not use too many parameters (Tomas Vondra, 2021-06-08)

  The FE/BE protocol identifies parameters with an Int16 index, which limits the maximum number of parameters per query to 65535. With batching added to postgres_fdw this limit is much easier to hit, as the whole batch is essentially a single query.

  The failures are a bit unpredictable, because they also depend on the number of columns in the query. So instead of just failing, this patch tweaks the batch_size to not exceed the maximum number of parameters.

  Reported-by: Hou Zhijie <houzj.fnst@cn.fujitsu.com>
  Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
  Discussion: https://postgr.es/m/OS0PR01MB571603973C0AC2874AD6BF2594299%40OS0PR01MB5716.jpnprd01.prod.outlook.com

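  Illustrative arithmetic (the column count is hypothetical):

      -- A foreign table with 10 columns can bind at most
      -- floor(65535 / 10) = 6553 rows per batch, so a configured
      -- batch_size larger than that is effectively capped at 6553.
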
* Fix rescanning of async-aware Append nodes. (Etsuro Fujita, 2021-06-07)

  In cases where run-time pruning isn't required, the synchronous and asynchronous subplans for an async-aware Append node determined using classify_matching_subplans() should be re-used when rescanning the node, but the previous code re-determined them using that function repeatedly each time when rescanning the node, leading to incorrect results in a normal build and an Assert failure in an Assert-enabled build as that function doesn't assume that it's called repeatedly in such cases. Fix the code as mentioned above.

  My oversight in commit 27e1f1456.

  While at it, initialize async-related pointers/variables to NULL/zero explicitly in ExecInitAppend() and ExecReScanAppend(), just to be sure. (The variables would have been set to zero before we get to the latter function, but let's do so.)

  Reviewed-by: Kyotaro Horiguchi
  Discussion: https://postgr.es/m/CAPmGK16Q4B2_KY%2BJH7rb7wQbw54AUprp7TMekGTd2T1B62yysQ%40mail.gmail.com

* Fix postgres_fdw failure with whole-row Vars of type RECORD. (Tom Lane, 2021-06-04)

  Commit 86dc90056 expects that FDWs can cope with whole-row Vars for their tables, even if the Vars are marked with vartype RECORDOID. Previously, whole-row Vars generated by the planner had vartype equal to the relevant table's rowtype OID. (The point behind this change is to enable sharing of resjunk columns across inheritance child tables.)

  It turns out that postgres_fdw fails to cope with this, though through bad fortune none of its test cases exposed that. Things mostly work, but when we try to read back a value of such a Var, the expected rowtype is not available to record_in(). Fortunately, it's not difficult to hack up the tupdesc that controls this process to substitute the foreign table's rowtype for RECORDOID. Thus we can solve the runtime problem while still sharing the resjunk column with other tables.

  Per report from Alexander Pyhalov.

  Discussion: https://postgr.es/m/7817fb9ebd6661cdf9b67dec6e129a78@postgrespro.ru

* Fix planner's row-mark code for inheritance from a foreign table. (Tom Lane, 2021-06-02)

  Commit 428b260f8 broke planning of cases where row marks are needed (SELECT FOR UPDATE, etc) and one of the query's tables is a foreign table that has regular table(s) as inheritance children. We got the reverse case right, but apparently were thinking that foreign tables couldn't be inheritance parents. Not so; so we need to be able to add a CTID junk column while adding a new child, not only a wholerow junk column.

  Back-patch to v12 where the faulty code came in.

  Amit Langote

  Discussion: https://postgr.es/m/CA+HiwqEmo3FV1LAQ4TVyS2h1WM=kMkZUmbNuZSCnfHvMcUcPeA@mail.gmail.com

* Prevent asynchronous execution of direct foreign-table modifications. (Etsuro Fujita, 2021-05-13)

  Commits 27e1f1456 and 86dc90056, which were independently discussed, cause a crash when executing an inherited foreign UPDATE/DELETE query with asynchronous execution enabled, where children of an Append node that is the direct/indirect child of the ModifyTable node are rewritten so as to modify foreign tables directly by postgresPlanDirectModify(); as in that case the direct modifications are executed asynchronously, which is not currently supported by asynchronous execution. Fix by disabling asynchronous execution of the direct modifications in that function.

  Author: Etsuro Fujita
  Reviewed-by: Amit Langote
  Discussion: https://postgr.es/m/CAPmGK158e9sJOfuWxfn%2B0ynrspXQU3JhNjSCbaoeSzMvnga%2Bbw%40mail.gmail.com

* Fix EXPLAIN ANALYZE for async-capable nodes. (Etsuro Fujita, 2021-05-12)

  EXPLAIN ANALYZE for an async-capable ForeignScan node associated with postgres_fdw is done just by using instrumentation for ExecProcNode() called from the node's callbacks, causing the following problems:

  1) If the remote table to scan is empty, the node is incorrectly considered as "never executed" by the command even if the node is executed, as ExecProcNode() isn't called from the node's callbacks at all in that case.

  2) The command fails to collect timings for things other than ExecProcNode() done in the node, such as creating a cursor for the node's remote query.

  To fix these problems, add instrumentation for async-capable nodes, and modify postgres_fdw accordingly.

  My oversight in commit 27e1f1456.

  While at it, update a comment for the AsyncRequest struct in execnodes.h and the documentation for the ForeignAsyncRequest API in fdwhandler.sgml to match the code in ExecAsyncAppendResponse() in nodeAppend.c, and fix typos in comments in nodeAppend.c.

  Per report from Andrey Lepikhov, though I didn't use his patch.

  Reviewed-by: Andrey Lepikhov
  Discussion: https://postgr.es/m/2eb662bb-105d-fc20-7412-2f027cc3ca72%40postgrespro.ru

* Disable cache clobber to avoid breaking postgres_fdw termination test. (Tom Lane, 2021-05-04)

  Commit 93f414614 improved a pre-existing test case so that it would show whether or not termination of the "remote" worker process happened. This soon exposed that, when debug_invalidate_system_caches_always (nee CLOBBER_CACHE_ALWAYS) is enabled, no such termination occurs. That's because cache invalidation forces postgres_fdw connections to be dropped at end of transaction, so that there's no worker to terminate. There's a race condition as to whether the worker will manage to get out of the BackendStatusArray before we look, but at least on buildfarm member hyrax, it's failed twice in two attempts.

  Rather than re-lobotomizing the test, let's fix this by transiently disabling debug_invalidate_system_caches_always. (Hooray for that being just a GUC nowadays, rather than a compile-time option.) If this proves not to be enough to make the test stable, we can do the other thing instead.

  Discussion: https://postgr.es/m/3854538.1620081771@sss.pgh.pa.us

* Don't pass "ONLY" options specified in TRUNCATE to foreign data wrapper.Fujii Masao2021-04-27
| | | | | | | | | | | | | | | | | | | | | | Commit 8ff1c94649 allowed TRUNCATE command to truncate foreign tables. Previously the information about "ONLY" options specified in TRUNCATE command were passed to the foreign data wrapper. Then postgres_fdw constructed the TRUNCATE command to issue the remote server and included "ONLY" options in it based on the passed information. On the other hand, "ONLY" options specified in SELECT, UPDATE or DELETE have no effect when accessing or modifying the remote table, i.e., are not passed to the foreign data wrapper. So it's inconsistent to make only TRUNCATE command pass the "ONLY" options to the foreign data wrapper. Therefore this commit changes the TRUNCATE command so that it doesn't pass the "ONLY" options to the foreign data wrapper, for the consistency with other statements. Also this commit changes postgres_fdw so that it always doesn't include "ONLY" options in the TRUNCATE command that it constructs. Author: Fujii Masao Reviewed-by: Bharath Rupireddy, Kyotaro Horiguchi, Justin Pryzby, Zhihong Yu Discussion: https://postgr.es/m/551ed8c1-f531-818b-664a-2cecdab99cd8@oss.nttdata.com
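  A sketch of the new behavior (table names hypothetical):

      TRUNCATE ONLY ft_t;
      -- "ONLY" now affects only the local inheritance tree; postgres_fdw
      -- constructs the remote command without ONLY, roughly:
      --     TRUNCATE public.t
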
* Simplify tests of postgres_fdw terminating connections (Michael Paquier, 2021-04-14)

  The tests introduced in 32a9c0b for connections broken and re-established rely on pg_terminate_backend() for their logic. When these were introduced, this function simply sent a signal to a backend without waiting for the operation to complete, and the tests repeatedly looked at pg_stat_activity to check if the operation was completed or not. Since aaf0432, it is possible to define a timeout to make pg_terminate_backend() wait for a certain duration, so make use of it, with a timeout reasonably large enough (3min) to give enough room for the tests to pass even on slow machines.

  Some measurements show that the tests of postgres_fdw are much faster with this change. For example, on my laptop, they now take 4s instead of 6s.

  Author: Bharath Rupireddy
  Discussion: https://postgr.es/m/CALj2ACXGY_EfGrMTjKjHy2zi-u1u9rdeioU_fro0T6Jo8t56KQ@mail.gmail.com

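  A sketch of the waiting form (the application_name filter is hypothetical):

      -- Terminate the remote session and wait up to 3 minutes for it to exit:
      SELECT pg_terminate_backend(pid, 180000)
        FROM pg_stat_activity
       WHERE application_name = 'fdw_retry_check';
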
* Allow TRUNCATE command to truncate foreign tables. (Fujii Masao, 2021-04-08)

  This commit introduces a new foreign data wrapper API for TRUNCATE. It extends the TRUNCATE command so that it accepts foreign tables as the targets to truncate and invokes that API. Also it extends postgres_fdw so that it can issue a TRUNCATE command to foreign servers, by adding a new routine for that TRUNCATE API.

  The information about options specified in the TRUNCATE command, e.g., ONLY, CASCADE, etc., is passed to the FDW via the API. The list of foreign tables to truncate is also passed to the FDW. The FDW truncates the foreign data sources that the passed foreign tables specify, based on that information. For example, postgres_fdw constructs a TRUNCATE command using them and issues it to the foreign server.

  For performance, the TRUNCATE command invokes the FDW routine for TRUNCATE once per foreign server that the foreign tables to truncate belong to.

  Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
  Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
  Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
  Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com

* postgres_fdw: Allow partitions specified in LIMIT TO to be imported. (Fujii Masao, 2021-04-07)

  Commit f49bcd4ef3 disallowed postgres_fdw to import table partitions. Because all data can be accessed through the partitioned table which is the root of the partitioning hierarchy, importing only the partitioned table should allow access to all the data without creating extra objects. This is a reasonable default when importing a whole schema. But there may be cases where users want to explicitly import one of a partitioned table's partitions.

  For that use case, this commit allows postgres_fdw to import tables or foreign tables which are partitions of some other table only when they are explicitly specified in the LIMIT TO clause. It doesn't change the behavior that any partitions not specified in LIMIT TO are automatically excluded in the IMPORT FOREIGN SCHEMA command.

  Author: Matthias van de Meent
  Reviewed-by: Bernd Helmle, Amit Langote, Michael Paquier, Fujii Masao
  Discussion: https://postgr.es/m/CAEze2Whwg4i=mzApMe+PXxCEfgoZmHGqdqQFW7J4bmj_5p6t1A@mail.gmail.com

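  For illustration (remote schema, partition, and server names hypothetical):

      IMPORT FOREIGN SCHEMA public LIMIT TO (orders_2021)
        FROM SERVER loopback INTO local_schema;
      -- orders_2021 is imported even if it is a partition, because it is
      -- listed explicitly in LIMIT TO.
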
* postgres_fdw: Add option to control whether to keep connections open. (Fujii Masao, 2021-04-02)

  This commit adds a new option keep_connections that controls whether postgres_fdw keeps the connections to the foreign server open so that subsequent queries can re-use them. This option can only be specified for a foreign server. The default is on. If set to off, all connections to the foreign server will be discarded at the end of a transaction. Closed connections will be re-established when they are necessary by future queries using a foreign table.

  This option is useful, for example, when users want to prevent the connections from eating up the foreign server's connection capacity.

  Author: Bharath Rupireddy
  Reviewed-by: Alexey Kondratov, Vignesh C, Fujii Masao
  Discussion: https://postgr.es/m/CALj2ACVvrp5=AVp2PupEm+nAC8S4buqR3fJMmaCoc7ftT0aD2A@mail.gmail.com

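  A sketch of disabling it (server name hypothetical):

      ALTER SERVER loopback OPTIONS (ADD keep_connections 'off');
      -- Connections to this server are now discarded at transaction end
      -- and re-established by later queries as needed.
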
* Add Result Cache executor node (take 2) (David Rowley, 2021-04-02)

  Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached.

  For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better.

  The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache.

  For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters.

  For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated.

  The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for.

  For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release.

  Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name.

  Author: David Rowley
  Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie
  Tested-By: Konstantin Knizhnik
  Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
  Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com

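  For illustration (table names hypothetical):

      EXPLAIN (COSTS OFF)
      SELECT * FROM t1, LATERAL (SELECT * FROM t2 WHERE t2.a = t1.a) s;
      -- With enable_resultcache on (the default), a "Result Cache" node
      -- may appear between the Nested Loop and the scan of t2.
      SET enable_resultcache TO off;  -- opt back into the old behavior
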
* Revert b6002a796 (David Rowley, 2021-04-01)

  This removes "Add Result Cache executor node". It seems that something weird is going on with the tracking of cache hits and misses as highlighted by many buildfarm animals. It's not yet clear what the problem is as other parts of the plan indicate that the cache did work correctly, it's just the hits and misses that were being reported as 0. This is especially a bad time to have the buildfarm so broken, so reverting before too many more animals go red.

  Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com

* Add Result Cache executor node (David Rowley, 2021-04-01)

  This is the original commit that b6002a796 (above) reverted and that was later re-applied as "take 2". Its message is identical to the "take 2" entry above, except for the credits:

  Author: David Rowley
  Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu
  Tested-By: Konstantin Knizhnik
  Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com
  Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com