path: root/src/backend/utils/adt
* Adjust populate_record_field() to handle errors softly (Amit Langote, 2024-01-24)

  This adds a Node *escontext parameter to it and a bunch of functions downstream to it, replacing any ereport()s in that path by either errsave() or ereturn() as appropriate. This also adds code to those functions where necessary to return early upon encountering a soft error.

  The changes here are mainly intended to suppress errors in the functions of jsonfuncs.c. Functions in any external modules, such as arrayfuncs.c, that those functions may in turn call are not changed here, based on the assumption that the various checks in jsonfuncs.c functions should ensure that only values that are structurally valid get passed to the functions in those external modules. An exception is made for domain_check() to allow handling domain constraint violation errors softly.

  For testing, this adds a function jsonb_populate_record_valid(), which returns true if jsonb_populate_record() would finish without causing an error for the provided JSON object, false otherwise. Note that jsonb_populate_record() internally calls populate_record(), which in turn uses populate_record_field().

  Extracted from a much larger patch to add SQL/JSON query functions.

  Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
  Author: Teodor Sigaev <teodor@sigaev.ru>
  Author: Oleg Bartunov <obartunov@gmail.com>
  Author: Alexander Korotkov <aekorotkov@gmail.com>
  Author: Andrew Dunstan <andrew@dunslane.net>
  Author: Amit Langote <amitlangote09@gmail.com>

  Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He, Peter Eisentraut

  Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
  Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
  Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
  Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com
  Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com

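  A rough illustration of the new validity-check function; the composite type below is a made-up example, not part of the commit:

      CREATE TYPE jsonpop_rec AS (i int, t text);

      SELECT jsonb_populate_record_valid(NULL::jsonpop_rec, '{"i": 1, "t": "ok"}');  -- true
      SELECT jsonb_populate_record_valid(NULL::jsonpop_rec, '{"i": "oops"}');        -- false, instead of an error
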
* Add planner support functions for range operators <@ and @>. (Tom Lane, 2024-01-20)

  These support functions will transform expressions with constant range values into direct comparisons on the range bound values, which are frequently better-optimizable. The transformation is skipped, however, if it would require double evaluation of a volatile or expensive element expression.

  Along the way, add the range opfamily OID to range typcache entries, since load_rangetype_info has to compute that anyway and it seems silly to duplicate the work later.

  Kim Johan Andersson and Jian He, reviewed by Laurenz Albe

  Discussion: https://postgr.es/m/94f64d1f-b8c0-b0c5-98bc-0793a34e0851@kimmet.dk

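  The kind of query shape this targets, sketched with made-up table and column names:

      -- With a constant range, the planner can now rewrite this...
      SELECT * FROM events WHERE id <@ int4range(10, 20);
      -- ...into roughly "id >= 10 AND id < 20", which ordinary btree
      -- indexes and statistics handle well.
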
* Add support for parsing of large XML data (>= 10MB) (Michael Paquier, 2024-01-17)

  This commit adds XML_PARSE_HUGE to the libxml2 functions used in core for the parsing of XML objects, raising the original limit of 10MB supported by libxml2. In most code paths of upstream, XML_MAX_TEXT_LENGTH (10^7) is the historical limit that gets upgraded to XML_MAX_HUGE_LENGTH (10^9) once XML_PARSE_HUGE is given to the parser calls. These are still limited by any palloc() calls for text, up to 1GB.

  This makes it possible for the backend to handle XML objects larger than 10MB in general, with also a higher depth limit. This change affects the contrib module xml2, the xml data type and SQL/XML.

  Author: Dmitry Koval
  Reviewed-by: Tom Lane, Michael Paquier
  Discussion: https://postgr.es/m/18274-98d16bc03520665f@postgresql.org

* struct XmlTableRoutine: use C99 designated initializers (Alvaro Herrera, 2024-01-16)

  As in c27f8621eed et al. Not as critical as other cases we've handled, but I figure if we're going to add JsonbTableRoutine using TableFuncRoutine, this makes it easier to jump around the code.

* Allow examine_simple_variable() to work on INSERT RETURNING Vars. (Tom Lane, 2024-01-08)

  Since commit 599b33b94, this function assumed that every RTE_RELATION RangeTblEntry would have an associated RelOptInfo. But that's not so: we only build RelOptInfos for relations that are scanned by the query. In particular the target of an INSERT won't have one, so that Vars appearing in an INSERT ... RETURNING list will not have an associated RelOptInfo.

  This apparently wasn't a problem before commit f7816aec2 taught examine_simple_variable() to drill down into CTEs containing INSERT RETURNING, but it is now. To fix, add a fallback code path that gets the userid to use directly from the RTEPermissionInfo associated with the RTE. (Sadly, we must have two code paths, because not every RTE has a RTEPermissionInfo either.)

  Per report from Alexander Lakhin. No back-patch, since the case is apparently unreachable before f7816aec2.

  Discussion: https://postgr.es/m/608a4886-6c60-0f9e-97d5-591256bd4150@gmail.com

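  A sketch of the query shape involved (table and column names invented):

      -- The planner now examines Vars from the INSERT's RETURNING list
      -- without assuming the target relation has a RelOptInfo.
      WITH ins AS (
          INSERT INTO t (x) VALUES (1) RETURNING x
      )
      SELECT * FROM ins WHERE x > 0;
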
* Teach estimate_array_length() to use statistics where available. (Tom Lane, 2024-01-04)

  If we have DECHIST statistics about the argument expression, use the average number of distinct elements as the array length estimate. (It'd be better to use the average total number of elements, but that is not currently calculated by compute_array_stats(), and it's unclear that it'd be worth extra effort to get.)

  To do this, we have to change the signature of estimate_array_length to pass the "root" pointer. While at it, also change its result type to "double". That's probably not really necessary, but it avoids any risk of overflow of the value extracted from DECHIST. All existing callers are going to use the result in a "double" calculation anyway.

  Paul Jungwirth, reviewed by Jian He and myself

  Discussion: https://postgr.es/m/CA+renyUnM2d+SmrxKpDuAdpiq6FOM=FByvi6aS6yi__qyf6j9A@mail.gmail.com

* Update copyright for 2024 (Bruce Momjian, 2024-01-03)

  Reported-by: Michael Paquier
  Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz
  Backpatch-through: 12

* Second attempt at organizing jsonpath operators and methods (Peter Eisentraut, 2024-01-03)

  Second attempt at 283a95da923. Since we can't reorder the enum values of JsonPathItemType, instead reorder the switch cases where they are used to generally follow the order of the enum values, for better maintainability.

* Revert "Reorganise jsonpath operators and methods"Peter Eisentraut2024-01-03
| | | | | | | | This reverts commit 283a95da923605c1cc148155db2d865d0801b419. The reordering of JsonPathItemType affects the binary on-disk compatibility of the jsonpath type, so we must not change it. Revert for now and consider.
* Reorganise jsonpath operators and methods (Peter Eisentraut, 2024-01-03)

  Various jsonpath operators and methods add various keywords, switch cases, and documentation entries in some order. However, they are not consistent; reorder them for better maintainability or readability.

  Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
  Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com

* Add numeric_int8_opt_error() to optionally suppress errors (Peter Eisentraut, 2024-01-03)

  This matches the existing numeric_int4_opt_error() (see commit 16d489b0fe). It will be used by a future JSON-related patch, which wants to report errors in its own way and thus does not want the internal functions to throw any error.

  Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com>
  Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com

* jsonpath_exec: fix typo "absense" -> "absence" (Robert Haas, 2024-01-02)

  Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna.

  Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org

* tsquery: fix typo "rewrited" -> "rewritten" (Robert Haas, 2024-01-02)

  Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna.

  Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org

* Fix typos in comments and in one isolation test. (Robert Haas, 2024-01-02)

  Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Some subtractions by me.

  Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org

* Allow upgrades to preserve the full subscription's state. (Amit Kapila, 2024-01-02)

  This feature will allow us to replicate the changes on subscriber nodes after the upgrade. Previously, only the subscription metadata information was preserved. Without the list of relations and their state, it's not possible to re-enable the subscriptions without missing some records, as the list of relations can only be refreshed after enabling the subscription (and therefore starting the apply worker). Even if we added a way to refresh the subscription while enabling a publication, we still wouldn't know which relations are new on the publication side, and therefore should be fully synced, and which shouldn't.

  To preserve the subscription relations, this patch teaches pg_dump to restore the content of pg_subscription_rel from the old cluster by using the binary_upgrade_add_sub_rel_state SQL function. This is supported only in binary upgrade mode.

  The subscription's replication origin is needed to ensure that we don't replicate anything twice. To preserve the replication origins, this patch teaches pg_dump to update the replication origin along with creating a subscription, by using the binary_upgrade_replorigin_advance SQL function to restore the underlying replication origin remote LSN. This is supported only in binary upgrade mode.

  pg_upgrade will check that all the subscription relations are in 'i' (init) or in 'r' (ready) state and will error out if that's not the case, logging the reason for the failure. This helps to avoid the risk of any dangling slot or origin after the upgrade. A query in the same spirit as that check is sketched below.

  Author: Vignesh C, Julien Rouhaud, Shlok Kyal
  Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda
  Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud

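  A rough way to eyeball the state check on the old cluster before upgrading:

      -- all rows should be in 'i' (init) or 'r' (ready) state
      SELECT srrelid::regclass AS relation, srsubstate
      FROM pg_subscription_rel
      WHERE srsubstate NOT IN ('i', 'r');
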
* Improve the implementation of information_schema._pg_expandarray(). (Tom Lane, 2023-12-27)

  This function was originally coded with a handmade expansion of the array subscripts. We can do it a little faster and far more legibly today, by using unnest() WITH ORDINALITY.

  While at it, let's apply the rowcount estimation support that exists for the underlying unnest() function: reduce the default ROWS estimate to 100 and attach array_unnest_support. I'm not sure that array_unnest_support can do anything useful today with the call sites that exist in information_schema, but it can't hurt, and the existing default rowcount of 1000 is surely much too high for any of these cases.

  The psql.sql regression script is using _pg_expandarray() as a test case for \sf+. While we could keep doing so, the new one-line function body makes a poor test case for \sf+ row-numbering, so switch it to print another information_schema function.

  Discussion: https://postgr.es/m/1424303.1703355485@sss.pgh.pa.us

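  The construct the function now builds on, for reference:

      SELECT * FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS u(x, n);
      --  x | n
      -- ---+---
      --  a | 1
      --  b | 2
      --  c | 3
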
* Enhance checkpointer restartpoint statistics (Alexander Korotkov, 2023-12-25)

  This commit introduces enhancements to the pg_stat_checkpointer view by adding three new columns: restartpoints_timed, restartpoints_req, and restartpoints_done. These additions aim to improve the visibility and monitoring of restartpoint processes on replicas.

  Previously, it was challenging to differentiate between successful and failed restartpoint requests. This limitation arises because restartpoints on replicas are dependent on checkpoint records from the primary, and cannot occur more frequently than these checkpoints.

  The new columns allow for clear distinction and tracking of restartpoint requests, their triggers, and successful completions. This enhancement aids database administrators and developers in better understanding and diagnosing issues related to restartpoint behavior, particularly in scenarios where restartpoint requests may fail.

  System catalog is changed. Catversion is bumped.

  Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru
  Author: Anton A. Melnikov
  Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov

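  On a standby, the new counters can be inspected directly:

      SELECT restartpoints_timed, restartpoints_req, restartpoints_done
      FROM pg_stat_checkpointer;
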
* Micro-optimize datum_to_json_internal() some more. (Nathan Bossart, 2023-12-18)

  Commit dc3f9bc549 mainly targeted the JSONTYPE_NUMERIC code path. This commit applies similar optimizations (e.g., removing unnecessary runtime calls to strlen() and palloc()) to nearby code.

  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/20231208203708.GA4126315%40nathanxps13

* Micro-optimize JSONTYPE_NUMERIC code path in json.c. (Nathan Bossart, 2023-12-08)

  This commit does the following:

  * In datum_to_json_internal(), the call to IsValidJsonNumber() is replaced with simplified validation code. This avoids an extra call to strlen() in this path, and it avoids validating the entire string (which is okay since we know we're dealing with a numeric data type's output).

  * In datum_to_json_internal(), the call to escape_json() in the JSONTYPE_NUMERIC path is replaced with code that just surrounds the string with quotes. In passing, some other nearby calls to appendStringInfo() have been replaced with similar code to avoid unnecessary calls to vsnprintf().

  * In composite_to_json(), the length of the separator is now determined at compile time to avoid unnecessary calls to strlen().

  On my machine, this speeds up a benchmark for the proposed COPY TO (FORMAT json) command with many integers by upwards of 20%. There are likely other code paths that could be given a similar treatment, but that is left as a future exercise.

  Reviewed-by: Jeff Davis, Tom Lane, David Rowley, John Naylor
  Discussion: https://postgr.es/m/20231207231251.GB3359478%40nathanxps13

* Rename ShmemVariableCache to TransamVariables (Heikki Linnakangas, 2023-12-08)

  The old name was misleading: it's not a cache, the values kept in the struct are the authoritative source.

  Reviewed-by: Tristan Partin, Richard Guo
  Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi

* Fix issues in binary_upgrade_logical_slot_has_caught_up(). (Amit Kapila, 2023-12-07)

  Commit 29d0a77fa6 labelled binary_upgrade_logical_slot_has_caught_up() as a non-strict function to allow providing a better error message to callers in case the passed slot_name is NULL. On further discussion, it seems that it is not helpful to have a different error message for NULL input in this function, so this patch marks the function as strict.

  This patch also removes the explicit permission check to use replication slots, as this function is invoked only by superusers, and instead adds an Assert.

  Reported-by: Masahiko Sawada
  Author: Hayato Kuroda
  Reviewed-by: Vignesh C
  Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com

* Apply quotes more consistently to GUC names in logs (Michael Paquier, 2023-11-30)

  Quotes are applied to GUCs in a very inconsistent way across the code base, with a mix of double quotes or no quotes used. This commit removes double quotes around all the GUC names that are obviously referred to as parameters with non-English words (use of underscore, mixed case, etc).

  This is the result of a discussion with Álvaro Herrera, Nathan Bossart, Laurenz Albe, Peter Eisentraut, Tom Lane and Daniel Gustafsson.

  Author: Peter Smith
  Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com

* Don't use bms_membership() in cases where we don't need to (David Rowley, 2023-11-28)

  00b41463c adjusted Bitmapset so that an empty set is always represented as NULL. This makes checking for empty sets far cheaper than it used to be.

  There were various places in the code where we'd call bms_membership() to handle the 3 possible BMS_Membership values. For the BMS_SINGLETON case, we'd also call bms_singleton_member() to find the single set member. This can now be done in a more optimal way by first checking if the set is NULL and then not bothering with bms_membership(), simply calling bms_get_singleton_member() instead to find the single member. This function will return false if there are multiple members in the set.

  Here we also tidy up some logic in examine_variable() for the single member case. There's now no need to call bms_is_member() as we've already established that we're working with a singleton Bitmapset, so we can just check if varRelid matches the singleton member.

  Reviewed-by: Richard Guo
  Discussion: https://postgr.es/m/CAApHDvqW+CxNPcY245GaWiuqkkqgTudtG2ncGvvSjGn2wdTZLA@mail.gmail.com

* Guard against overflow in interval_mul() and interval_div(). (Dean Rasheed, 2023-11-18)

  Commits 146604ec43 and a898b409f6 added overflow checks to interval_mul(), but not to interval_div(), which contains almost identical code, and so is susceptible to the same kinds of overflows. In addition, those checks did not catch all possible overflow conditions.

  Add additional checks to the "cascade down" code in interval_mul(), and copy all the overflow checks over to the corresponding code in interval_div(), so that they both generate "interval out of range" errors, rather than returning bogus results.

  Given that these errors are relatively easy to hit, back-patch to all supported branches.

  Per bug #18200 from Alexander Lakhin, and subsequent investigation.

  Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org

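  Illustrative cases that now raise clean errors rather than producing bogus values:

      SELECT interval '100000 years' * 999999;     -- ERROR: interval out of range
      SELECT interval '100000 years' / 0.0000001;  -- ERROR: interval out of range
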
* Extract column statistics from CTE references, if possible. (Tom Lane, 2023-11-17)

  examine_simple_variable() left this as an unimplemented case years ago, with the result that plans for queries involving un-flattened CTEs might be much stupider than necessary. It's not hard to extend the existing logic for RTE_SUBQUERY cases to also be able to drill down into CTEs, so let's do that.

  There was some discussion of whether this patch breaks the idea of a MATERIALIZED CTE being an optimization fence. We concluded it's okay, because we already allow the outer planner level to see the estimated width and rowcount of the CTE result, and letting it see column statistics too seems fairly equivalent. Basically, what we expect of the optimization fence is that the outer query should not affect the plan chosen for the CTE query. Once that plan is chosen, it's okay for the outer planner level to make use of whatever information we have about it.

  Jian Guo and Tom Lane, per complaint from Hans Buschmann

  Discussion: https://postgr.es/m/4504e67078d648cdac3651b2960da6e7@nidsa.net

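  A sketch of a query that benefits (table and column names invented):

      EXPLAIN
      WITH c AS MATERIALIZED (
          SELECT status FROM orders
      )
      SELECT * FROM c WHERE status = 'open';
      -- the outer row estimate can now use orders.status statistics
      -- instead of a default selectivity guess
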
* Don't specify number of dimensions in cases where we don't know it. (Tom Lane, 2023-11-17)

  A few places in array_in() and plperl would report a misleading value (always MAXDIM+1) for the number of dimensions in the input, because we'd error out as soon as that was clearly too large rather than scanning the entire input. There doesn't seem to be much value in offering the true number, at least not enough to justify the extra complication involved in trying to get it. So just remove that parenthetical remark. We already have other places that do it like that, anyway.

  Per suggestions from Alexander Lakhin and Heikki Linnakangas.

  Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us

* Add target "slru" to pg_stat_reset_shared()Michael Paquier2023-11-16
| | | | | | | | | | | | | | | | Currently, pg_stat_reset_shared() cannot reset the counters in the view pg_stat_slru even if it is a type of shared stats. This patch adds support for a new value in pg_stat_reset_shared(), called "slru", able to do that. Note that pg_stat_reset_shared(NULL) also resets SLRU counters. There may be a point in removing pg_stat_reset_slru() that was introduced in 28cac71bd368 (v13~) as the new option overlaps with this function, but we would lose the ability to reset individual SLRU counters. This is left for future reconsideration. Author: Atsushi Torikoshi Discussion: https://postgr.es/m/e3c25d72e81378e7b64f3c52e0306fc9@oss.nttdata.com
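  Usage of the new target:

      SELECT pg_stat_reset_shared('slru');
      SELECT name, blks_hit, stats_reset FROM pg_stat_slru;  -- counters zeroed, stats_reset updated
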
* Support +/- infinity in the interval data type. (Dean Rasheed, 2023-11-14)

  This adds support for infinity to the interval data type, using the same input/output representation as the other date/time data types that support infinity. This allows various arithmetic operations on infinite dates, timestamps and intervals.

  The new values are represented by setting all fields of the interval to INT32/64_MIN for -infinity, and INT32/64_MAX for +infinity. This ensures that they compare as less/greater than all other interval values, without the need for any special-case comparison code.

  Note that, since those 2 values were formerly accepted as legal finite intervals, pg_upgrade and dump/restore from an old database will turn them from finite to infinite intervals. That seems OK, since those exact values should be extremely rare in practice, and they are outside the documented range supported by the interval type, which gives us a certain amount of leeway.

  Bump catalog version.

  Joseph Koshakow, Jian He, and Ashutosh Bapat, reviewed by me.

  Discussion: https://postgr.es/m/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com

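  Examples of the new values in action:

      SELECT interval 'infinity' > interval '100000000 years';       -- true
      SELECT timestamptz '2023-11-14 00:00' + interval '-infinity';  -- -infinity
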
* Improve readability and error detection of array_in(). (Tom Lane, 2023-11-13)

  Rewrite array_in() and its subroutines so that we make only one pass over the input text, rather than two. This requires potentially re-pallocing the working arrays values[] and nulls[] larger than our initial guess, but that cost will hopefully be made up by avoiding duplicate parsing. In any case this coding seems much clearer and more straightforward than what we had before.

  This also fixes array_in() to reject non-rectangular input (that is, different brace depths in different parts of the input) more reliably than before, and to give a better error message when it does so. This is analogous to the plpython and plperl fixes in 0553528e7 and f47004add. Like those PLs, we now accept input such as '{{},{}}' as a valid representation of an empty array, which we did not before.

  Additionally, reject explicit array subscripts that are outside the integer range (previously you just got whatever atoi() converted them to), and make some other minor improvements in error reporting.

  Although this is arguably a bug fix, it's also a behavioral change that might trip somebody up, so no back-patch.

  Tom Lane, Heikki Linnakangas, and Jian He. Thanks to Alexander Lakhin for the initial report and for review/testing.

  Discussion: https://postgr.es/m/2794005.1683042087@sss.pgh.pa.us

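  Illustrations of the behavioral changes:

      SELECT '{{},{}}'::int[];   -- now accepted: an empty array
      SELECT '{{1},2}'::int[];   -- rejected: inconsistent brace depth
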
* Add ability to reset all shared stats types in pg_stat_reset_shared() (Michael Paquier, 2023-11-12)

  Currently, pg_stat_reset_shared() can use an argument to specify the target of statistics to reset, doing nothing for NULL as it is strict.

  This patch adds to pg_stat_reset_shared() the possibility to reset all the stats types already handled in this function rather than do nothing if the argument value given is NULL or if nothing is specified (proisstrict is switched to false). Like previously, SLRUs are not included in what gets reset.

  The idea to use NULL or no argument to control if all the shared stats already covered by this function should be reset has been proposed by Andres Freund.

  Bump catalog version.

  Author: Atsushi Torikoshi
  Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy, Matthias van de Meent
  Discussion: https://postgr.es/m/4291a55137ddda77cf7cc5f46e846daf@oss.nttdata.com

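  The two new equivalent spellings:

      SELECT pg_stat_reset_shared();      -- resets all covered shared stats types
      SELECT pg_stat_reset_shared(NULL);  -- same, now that the function is no longer strict
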
* Avoid integer overflow hazard in interval_time(). (Dean Rasheed, 2023-11-09)

  When casting an interval to a time, the original code suffered from 64-bit integer overflow for inputs with a sufficiently large negative "time" field, leading to bogus results. Fix by rewriting the algorithm in a simpler form, that more obviously cannot overflow.

  While at it, improve the test coverage to include negative interval inputs.

  Discussion: https://postgr.es/m/CAEZATCXoUKHkcuq4q63hkiPsKZJd0kZWzgKtU%2BNT0aU4wbf_Pw%40mail.gmail.com

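  The affected cast, for reference; large negative inputs now wrap into the 0-24h range instead of overflowing:

      SELECT (interval '-1000000 hours')::time;
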
* Stop including parsenodes.h in plannodes.h (Alvaro Herrera, 2023-11-07)

  I added it by mistake in commit 7103ebb7aae8. To clean up, struct MergeAction needs to be moved to primnodes.h from parsenodes.h. (This forces us to also move OverridingKind to primnodes.h.)

  Having to add parsenodes.h to bootstrap.h as fallout is a bit surprising, since nothing nominally needs it there. However, per comments in bootscanner.l, it is needed so that YYSTYPE can be declared. I think this only started with commit dac048f71ebb, but I didn't actually verify that.

  In passing, stop including parsenodes.h in tcopprot.h. Nothing needs it there.

  Per discussion on a patch by Ashutosh Bapat.

  Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
  Discussion: https://postgr.es/m/202311071106.6y7b2ascqjlz@alvherre.pgsql

* Detect integer overflow while computing new array dimensions. (Tom Lane, 2023-11-06)

  array_set_element() and related functions allow an array to be enlarged by assigning to subscripts outside the current array bounds. While these places were careful to check that the new bounds are allowable, they neglected to consider the risk of integer overflow in computing the new bounds. In edge cases, we could compute new bounds that are invalid but get past the subsequent checks, allowing bad things to happen. Memory stomps that are potentially exploitable for arbitrary code execution are possible, and so is disclosure of server memory.

  To fix, perform the hazardous computations using overflow-detecting arithmetic routines, which fortunately exist in all still-supported branches.

  The test cases added for this generate (after patching) errors that mention the value of MaxArraySize, which is platform-dependent. Rather than introduce multiple expected-files, use psql's VERBOSITY parameter to suppress the printing of the message text. v11 psql lacks that parameter, so omit the tests in that branch.

  Our thanks to Pedro Gallegos for reporting this problem.

  Security: CVE-2023-5869

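  A rough sketch of the hazardous pattern (names invented); the bounds arithmetic here now fails cleanly instead of wrapping:

      CREATE TEMP TABLE arr_demo (a int[]);
      INSERT INTO arr_demo VALUES ('[-2147483648:-2147483648]={0}');
      UPDATE arr_demo SET a[2147483647] = 1;  -- dimension calculation would overflow; now an error
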
* Remove distprep (Peter Eisentraut, 2023-11-06)

  A PostgreSQL release tarball contains a number of prebuilt files, in particular files produced by bison, flex, and perl, as well as html and man documentation. We have done this consistent with established practice at the time to not require these tools for building from a tarball. Some of these tools were hard to get, or get the right version of, from time to time, and shipping the prebuilt output was a convenience to users.

  Now this has at least two problems:

  One, we have to make the build system(s) work in two modes: building from a git checkout and building from a tarball. This is pretty complicated, but it works so far for autoconf/make. It does not currently work for meson; you can currently only build with meson from a git checkout. Making meson builds work from a tarball seems very difficult or impossible. One particular problem is that since meson requires a separate build directory, we cannot make the build update files like gram.h in the source tree. So if you were to build from a tarball and update gram.y, you will have a gram.h in the source tree and one in the build tree, but the way things work is that the compiler will always use the one in the source tree. So you cannot, for example, make any gram.y changes when building from a tarball. This seems impossible to fix in a non-horrible way.

  Second, there is increased interest nowadays in precisely tracking the origin of software. We can reasonably track contributions into the git tree, and users can reasonably track the path from a tarball to packages and downloads and installs. But what happens between the git tree and the tarball is obscure and in some cases non-reproducible.

  The solution for both of these issues is to get rid of the step that adds prebuilt files to the tarball. The tarball now only contains what is in the git tree (*). Getting the additional build dependencies is no longer a problem nowadays, and the complications to keep these dual build modes working are significant. And of course we want to get the meson build system working universally.

  This commit removes the make distprep target altogether. The make dist target continues to do its job, it just doesn't call distprep anymore.

  (*) - The tarball also contains the INSTALL file that is built at make dist time, but not by distprep. This is unchanged for now.

  The make maintainer-clean target, whose job it is to remove the prebuilt files in addition to what make distclean does, is now just an alias for make distclean. (In practice, it is probably obsolete given that git clean is available.)

  The following programs are now hard build requirements in configure (they were already required by meson.build):

  - bison
  - flex
  - perl

  Reviewed-by: Michael Paquier <michael@paquier.xyz>
  Reviewed-by: Andres Freund <andres@anarazel.de>
  Discussion: https://www.postgresql.org/message-id/flat/e07408d9-e5f2-d9fd-5672-f53354e9305e@eisentraut.org

* Add XMLText function (SQL/XML X038) (Daniel Gustafsson, 2023-11-06)

  This function implements the standard XMLText function, which converts text into xml text nodes. It uses the libxml2 function xmlEncodeSpecialChars to escape predefined entities (&"<>), so that those do not cause any conflict when concatenating the text node output with existing xml documents.

  This also adds a note in features.sgml about not supporting XML(SEQUENCE). The SQL specification defines a RETURNING clause to a set of XML functions, where RETURNING CONTENT or RETURNING SEQUENCE can be defined. Since PostgreSQL doesn't support XML(SEQUENCE), all of these functions operate with an implicit RETURNING CONTENT.

  Author: Jim Jones <jim.jones@uni-muenster.de>
  Reviewed-by: Vik Fearing <vik@postgresfriends.org>
  Discussion: https://postgr.es/m/86617a66-ec95-581f-8d54-08059cca8885@uni-muenster.de

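  Example of the escaping behavior:

      SELECT xmltext('1 < 2 & 3 > 2');
      --  1 &lt; 2 &amp; 3 &gt; 2
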
* Additional unicode primitive functions. (Jeff Davis, 2023-11-01)

  Introduce unicode_version(), icu_unicode_version(), and unicode_assigned(). The latter requires introducing a new lookup table for the Unicode General Category, which is generated along with the other Unicode lookup tables.

  Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
  Reviewed-by: Peter Eisentraut

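  Quick usage sketch (the version string returned depends on the build):

      SELECT unicode_version();        -- e.g. '15.1'
      SELECT unicode_assigned('abc');  -- true if every code point is assigned
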
* Fix function name in comment (Daniel Gustafsson, 2023-11-01)

  The name of the function resulting from the macro expansion was incorrectly stated. Backpatch to 16 where it was introduced.

  Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
  Discussion: https://postgr.es/m/20231101.172308.1740861597185391383.horikyota.ntt@gmail.com
  Backpatch-through: v16

* doc: 1-byte varlena headers can be used for user PLAIN storage (Bruce Momjian, 2023-10-31)

  This also updates some C comments.

  Reported-by: suchithjn22@gmail.com
  Discussion: https://postgr.es/m/167336599095.2667301.15497893107226841625@wrigleys.postgresql.org
  Author: Laurenz Albe (doc patch)
  Backpatch-through: 11

* Introduce pg_stat_checkpointer (Michael Paquier, 2023-10-30)

  Historically, the statistics of the checkpointer have always been part of pg_stat_bgwriter. This commit removes a few columns from pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent, renamed columns (plus a new one for the reset timestamp):

  - checkpoints_timed -> num_timed
  - checkpoints_req -> num_requested
  - checkpoint_write_time -> write_time
  - checkpoint_sync_time -> sync_time
  - buffers_checkpoint -> buffers_written

  The fields of PgStat_CheckpointerStats and its SQL functions are renamed to match the new field names, for consistency.

  Note that background writer and checkpointer have been split into two different processes in commits 806a2aee3791 and bf405ba8e460. The pgstat structures were already split, making this change straightforward.

  Bump catalog version.

  Author: Bharath Rupireddy
  Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier
  Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com

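  The new view in use:

      SELECT num_timed, num_requested, write_time, sync_time, buffers_written
      FROM pg_stat_checkpointer;
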
* Refactor some code related to transaction-level statistics for relations (Michael Paquier, 2023-10-30)

  This commit refactors find_tabstat_entry() so that transaction counters for inserted, updated and deleted tuples are included in the result returned. If a shared entry is found for a relation, its result is now a copy of the PgStat_TableStatus entry retrieved from shared memory. This idea has been proposed by Andres Freund.

  While on it, the following SQL functions, used in system views, are refactored with macros, in the same spirit as 83a1a1b56645, reducing the amount of code:

  - pg_stat_get_xact_tuples_deleted()
  - pg_stat_get_xact_tuples_inserted()
  - pg_stat_get_xact_tuples_updated()

  There is now only one caller of find_tabstat_entry() in the tree.

  Author: Bertrand Drouvot
  Discussion: https://postgr.es/m/b9e1f543-ee93-8168-d530-d961708ad9d3@gmail.com

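  These functions back the transaction-level statistics views; a quick way to see them at work (table name invented):

      BEGIN;
      INSERT INTO t VALUES (1);
      SELECT n_tup_ins FROM pg_stat_xact_user_tables WHERE relname = 't';  -- counts this transaction's inserts
      ROLLBACK;
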
* Guard against overflow in make_interval(). (Dean Rasheed, 2023-10-29)

  The original code did very little to guard against integer or floating point overflow when computing the interval's fields. Detect any such overflows and error out, rather than silently returning bogus results.

  Joseph Koshakow, reviewed by Ashutosh Bapat and me.

  Discussion: https://postgr.es/m/CAAvxfHcm1TPwH_zaGWuFoL8pZBestbRZTU6Z%3D-RvAdSXTPbKfg%40mail.gmail.com

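  An input that now errors instead of silently wrapping (years * 12 exceeds the 32-bit month field):

      SELECT make_interval(years => 178956971);
      -- ERROR: interval out of range
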
* Remove buffers_backend and buffers_backend_fsync from pg_stat_bgwriter (Michael Paquier, 2023-10-27)

  Two attributes related to checkpointer statistics are removed in this commit:

  - buffers_backend, that counts the number of buffers written directly by a backend.
  - buffers_backend_fsync, that counts the number of times a backend had to do fsync() on its own.

  These are actually not checkpointer properties but backend properties. Also, pg_stat_io provides a more accurate and equivalent report of these numbers, by tracking all the I/O stats related to backends, including writes and fsyncs, so storing them in pg_stat_bgwriter was redundant.

  Thanks also to Robert Haas and Amit Kapila for their input.

  Bump catalog version.

  Author: Bharath Rupireddy
  Reviewed-by: Bertrand Drouvot, Andres Freund
  Discussion: https://postgr.es/m/20230210004604.mcszbscsqs3bc5nx@awork3.anarazel.de

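  The equivalent information now lives in pg_stat_io:

      SELECT backend_type, object, context, writes, fsyncs
      FROM pg_stat_io
      WHERE backend_type = 'client backend';
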
* Optimize various aggregate deserialization functions, take 2 (David Rowley, 2023-10-27)

  f0efa5aec added initReadOnlyStringInfo to allow a StringInfo to be initialized from an existing buffer and also relaxed the requirement that a StringInfo's buffer must be NUL terminated at data[len]. Now that we have that, there's no need for these aggregate deserial functions to use appendBinaryStringInfo(), which rather wastefully palloc'd a new buffer and memcpy'd in the bytea's buffer. Instead, we can just use the bytea's buffer and point the StringInfo directly to that using the new initializer function.

  In Amdahl's law terms, this speeds up the serial portion of parallel aggregates and makes sum(numeric), avg(numeric), var_pop(numeric), var_samp(numeric), variance(numeric), stddev_pop(numeric), stddev_samp(numeric), stddev(numeric), array_agg(anyarray), string_agg(text) and string_agg(bytea) scale better in parallel queries.

  Author: David Rowley
  Discussion: https://postgr.es/m/CAApHDvr%3De-YOigriSHHm324a40HPqcUhSp6pWWgjz5WwegR%3DcQ%40mail.gmail.com

* Add trailing commas to enum definitions (Peter Eisentraut, 2023-10-26)

  Since C99, there can be a trailing comma after the last value in an enum definition. A lot of new code has been introducing this style on the fly. Some new patches are now taking an inconsistent approach to this. Some add the last comma on the fly if they add a new last value, some are trying to preserve the existing style in each place, some are even dropping the last comma if there was one. We could nudge this all in a consistent direction if we just add the trailing commas everywhere once.

  I omitted a few places where there was a fixed "last" value that will always stay last. I also skipped the header files of libpq and ecpg, in case people want to use those with older compilers. There were also a small number of cases where the enum type wasn't used anywhere (but the enum values were), which ended up confusing pgindent a bit, so I left those alone.

  Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org

* Introduce the concept of read-only StringInfos (David Rowley, 2023-10-26)

  There were various places in our codebase which conjured up a StringInfo by manually assigning the StringInfo fields and setting the data field to point to some existing buffer. There wasn't much consistency here as to what fields like maxlen got set to, and in one location we didn't ensure that the buffer was correctly NUL terminated at len bytes, as per what was documented as required in stringinfo.h.

  Here we introduce 2 new functions to initialize StringInfos. One allows callers to initialize a StringInfo passing along a buffer that is already allocated by palloc. Here the StringInfo code uses this buffer directly rather than doing any memcpying into a new allocation. Having this as a function allows us to verify the buffer is correctly NUL terminated. StringInfos initialized this way can be appended to and reset just like any other normal StringInfo.

  The other new initialization function also accepts an existing buffer, but the given buffer does not need to be a pointer to a palloc'd chunk. This buffer could be a pointer pointing partway into some palloc'd chunk, or may not even be palloc'd at all. StringInfos initialized this way are deemed as "read-only". This means that it's not possible to append to them or reset them.

  For the latter of the two new initialization functions mentioned above, we relax the requirement that the data buffer must be NUL terminated. Relaxing this requirement is convenient in a few places, as it can save us from having to allocate an entire new buffer just to add the NUL terminator, or save us from having to temporarily add a NUL only to have to put the original char back again later.

  Incompatibility note: here we also forego adding the NUL in a few places where it does not seem to be required. These locations are passing the given StringInfo into a type's receive function. It does not seem like any of our built-in receive functions require this, but perhaps there's some UDT out there in the wild which does require this. It is likely worthy of a mention in the release notes that a UDT's receive function mustn't rely on the input StringInfo being NUL terminated.

  Author: David Rowley
  Reviewed-by: Tom Lane
  Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com

* Migrate logical slots to the new node during an upgrade. (Amit Kapila, 2023-10-26)

  While reading information from the old cluster, a list of logical slots is fetched. At the later part of upgrading, pg_upgrade revisits the list and restores slots by executing pg_create_logical_replication_slot() on the new cluster. Migration of logical replication slots is only supported when the old cluster is version 17.0 or later.

  If the old node has invalid slots or slots with unconsumed WAL records, pg_upgrade fails. These checks are needed to prevent data loss.

  The significant advantage of this commit is that it makes it easy to continue logical replication even after upgrading the publisher node. Previously, pg_upgrade allowed copying publications to a new node. With this patch, adjusting the connection string to the new publisher will cause the apply worker on the subscriber to connect to the new publisher automatically. This enables seamless continuation of logical replication, even after an upgrade.

  Author: Hayato Kuroda, Hou Zhijie
  Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
  Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
  Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com

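  A rough pre-upgrade sanity check on the old publisher, in the spirit of what pg_upgrade verifies:

      SELECT slot_name, wal_status, confirmed_flush_lsn
      FROM pg_replication_slots
      WHERE slot_type = 'logical';
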
* Doc: modernize comment for boolin(). (Tom Lane, 2023-10-19)

  Most of the behavior described by this comment was moved to parse_bool_with_len() some time ago. Move what's still valuable there too, and drop the rest.

  Peter Smith

  Discussion: https://postgr.es/m/CAHut+PtMJURKp=U8Z=Ktp0zV40sEb1f-iEk9FvY2GQe+5ZBnwg@mail.gmail.com

* Dodge a compiler bug affecting timetz_zone/timetz_izone. (Tom Lane, 2023-10-17)

  Use a modulo operator instead of implementing the same behavior with a loop. The loop solution is doubtless microscopically faster for the typical case of only wrapping into the very next day, but maybe not so much for large interval values. In any case, timetz is such a backwater that it's doubtful anybody would notice any performance change anyway.

  This avoids a compiler bug occurring in AIX's xlc, even in pretty late-model revisions.

  We did not have test coverage for the case where the initial result->time value is negative, so add that.

  For the moment, install this only in HEAD. My plan is to back-patch the test case, and then the code change assuming that buildfarm testing proves the bug occurs in the back branches. (That seems pretty likely, but let's find out for sure.)

  Per buildfarm results from commits 97957fdba and 2f0472030. Thanks to Michael Paquier for the idea to use a modulo operation to replace the faulty loop.

  Discussion: https://postgr.es/m/CA+hUKGK=DOC+hE-62FKfZy=Ybt5uLkrg3zCZD-jFykM-iPn8yw@mail.gmail.com

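  The day-wrap behavior being exercised, for reference:

      SELECT timetz '00:30+05' AT TIME ZONE 'UTC';  -- 19:30:00+00, wrapped into the previous day
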
* Harden has_xxx_privilege() functions against concurrent object drops. (Tom Lane, 2023-10-14)

  The versions of these functions that accept object OIDs are supposed to return NULL, rather than failing, if the target object has been dropped. This makes it safe(r) to use them in queries that scan catalogs, since the functions will be applied to objects that are visible in the query's snapshot but might now be gone according to the catalog snapshot.

  In most cases we implemented this by doing a SearchSysCacheExists test and assuming that if that succeeds, we can safely invoke the appropriate aclchk.c function, which will immediately re-fetch the same syscache entry. It was argued that if the existence test succeeds then the followup fetch must succeed as well, for lack of any intervening AcceptInvalidationMessages call.

  Alexander Lakhin demonstrated that this is not so when CATCACHE_FORCE_RELEASE is enabled: the syscache entry will be forcibly dropped at the end of SearchSysCacheExists, and then it is possible for the catalog snapshot to get advanced while re-fetching the entry. Alexander's test case requires the operation to happen inside a parallel worker, but that seems incidental to the fundamental problem. What remains obscure is whether there is a way for this to happen in a non-debug build. Nonetheless, CATCACHE_FORCE_RELEASE is a very useful test methodology, so we'd better make the code safe for it.

  After some discussion we concluded that the most future-proof fix is to give up the assumption that checking SearchSysCacheExists can guarantee success of a later fetch. At best that assumption leads to fragile code --- for example, has_type_privilege appears broken for array types even if you believe the assumption holds. And it's not even particularly efficient. There had already been some work towards extending the aclchk.c APIs to include "is_missing" output flags, so this patch extends that work to cover all the aclchk.c functions that are used by the has_xxx_privilege() functions. (This allows getting rid of some ad-hoc decisions about not throwing errors in certain places in aclchk.c.)

  In passing, this fixes the has_sequence_privilege() functions to provide the same guarantees as their cousins: for some reason the SearchSysCacheExists tests never got added to those.

  There is more work to do to remove the unsafe coding pattern with SearchSysCacheExists in other places, but this is a pretty self-contained patch, so I'll commit it separately.

  Per bug #18014 from Alexander Lakhin. Given the lack of hard evidence that there's a bug in non-debug builds, I'm content to fix this only in HEAD. (Perhaps we should clean up the has_sequence_privilege() oversight in the back branches, but in the absence of field complaints I'm not too excited about that either.)

  Discussion: https://postgr.es/m/18014-28c81cb79d44295d@postgresql.org

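  The contract being hardened, from SQL:

      SELECT has_table_privilege(0::oid, 'SELECT');  -- NULL for a nonexistent or dropped table, not an error
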
* Add support for AT LOCAL (Michael Paquier, 2023-10-13)

  When converting a timestamp to/from with/without time zone, the SQL Standard specifies an AT LOCAL variant of AT TIME ZONE which uses the session's time zone. This includes three system functions able to do the work in the same way as the existing flavors for AT TIME ZONE, except that these need to be marked as stable as they depend on the session's TimeZone GUC.

  Bump catalog version.

  Author: Vik Fearing
  Reviewed-by: Laurenz Albe, Cary Huang, Michael Paquier
  Discussion: https://postgr.es/m/8e25dec4-5667-c1a5-6581-167d710c2182@postgresfriends.org

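  Usage sketch:

      SET TimeZone = 'Europe/Paris';
      SELECT timestamptz '2023-10-13 10:00+00' AT LOCAL;
      -- equivalent to: ... AT TIME ZONE 'Europe/Paris'
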