aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
Commit message (Collapse)AuthorAge
...
* Work around spurious compiler warning in inet operatorsAndres Freund2023-03-16
| | | | | | | | | | | | | | | | | | | | | | | | | | gcc 12+ has complaints like the following: ../../../../../pgsql/src/backend/utils/adt/network.c: In function 'inetnot': ../../../../../pgsql/src/backend/utils/adt/network.c:1893:34: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 1893 | pdst[nb] = ~pip[nb]; | ~~~~~~~~~^~~~~~~~~~ ../../../../../pgsql/src/include/utils/inet.h:27:23: note: at offset -1 into destination object 'ipaddr' of size 16 27 | unsigned char ipaddr[16]; /* up to 128 bits of address */ | ^~~~~~ ../../../../../pgsql/src/include/utils/inet.h:27:23: note: at offset -1 into destination object 'ipaddr' of size 16 This is due to a compiler bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104986 It has been a year since the bug has been reported without getting fixed. As the warnings are verbose and use of gcc 12 is becoming more common, it seems worth working around the bug. Particularly because a simple reformulation of the loop condition fixes the issue and isn't any less readable. Author: Tom Lane <tgl@sss.pgh.pa.us> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/144536.1648326206@sss.pgh.pa.us Backpatch: 11-
* Tighten error checks in datetime input, and remove bogus "ISO" format.Tom Lane2023-03-16
| | | | | | | | | | | | | | | | | | | | | | | | DecodeDateTime and DecodeTimeOnly had support for date input in the style "Y2023M03D16", which the comments claimed to be an "ISO" format. However, so far as I can find there is no such format in ISO 8601; they write units before numbers in intervals, but not in datetimes. Furthermore, the lesser-known ISO 8601-2 spec actually defines an incompatible format "2023Y03M16D". None of our documentation mentions such a format either. So let's just drop it. That leaves us with only two cases for a prefix unit specifier in datetimes: Julian dates written as Jnnnn, and the "T" separator defined by ISO 8601. Add checks to catch misuse of these specifiers, that is consecutive specifiers or a dangling specifier at the end of the string. We do not however disallow a specifier that is separated from the field that it disambiguates (by noise words or unrelated fields). That being the case, remove some overly-aggressive error checks from the ISOTIME cases. Joseph Koshakow, editorialized a bit by me; thanks also to Peter Eisentraut for some standards-reading. Discussion: https://postgr.es/m/CAAvxfHf2Q1gKLiHGnuPOiyf0ASvKUM4BnMfsXuwgtYEb_Gx0Zw@mail.gmail.com
* Use "data directory" not "current directory" in error messages.Tom Lane2023-03-16
| | | | | | | | | | | The user receiving the message might not understand where the server's "current directory" is. "Data directory" seems clearer. (This would not be good for frontend code, but both of these messages are only issued in the backend.) Kyotaro Horiguchi Discussion: https://postgr.es/m/20230316.111646.1564684434328830712.horikyota.ntt@gmail.com
* Remove PgStat_BackendFunctionEntryMichael Paquier2023-03-16
| | | | | | | | | | This structure included only PgStat_FunctionCounts, and removing it facilitates some upcoming refactoring for pgstatfuncs.c to use more macros rather that mostly-duplicated functions. Author: Bertrand Drouvot Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/11d531fe-52fc-c6ea-7e8e-62f1b6ec626e@gmail.com
* Support [NO] INDENT option in XMLSERIALIZE().Tom Lane2023-03-15
| | | | | | | | | | | | | | This adds the ability to pretty-print XML documents ... according to libxml's somewhat idiosyncratic notions of what's pretty, anyway. One notable divergence from a strict reading of the spec is that libxml is willing to collapse empty nodes "<node></node>" to just "<node/>", whereas SQL and the underlying XML spec say that this option should only result in whitespace tweaks. Nonetheless, it seems close enough to justify using the SQL-standard syntax. Jim Jones, reviewed by Peter Smith and myself Discussion: https://postgr.es/m/2f5df461-dad8-6d7d-4568-08e10608a69b@uni-muenster.de
* Fix corner case bug in numeric to_char() some more.Tom Lane2023-03-14
| | | | | | | | | | | | | | | The band-aid applied in commit f0bedf3e4 turns out to still need some work: it made sure we didn't set Np->last_relevant too small (to the left of the decimal point), but it didn't prevent setting it too large (off the end of the partially-converted string). This could result in fetching data beyond the end of the allocated space, which with very bad luck could cause a SIGSEGV, though I don't see any hazard of interesting memory disclosure. Per bug #17839 from Thiago Nunes. The bug's pretty ancient, so back-patch to all supported versions. Discussion: https://postgr.es/m/17839-aada50db24d7b0da@postgresql.org
* Add support for the error functions erf() and erfc().Dean Rasheed2023-03-14
| | | | | | | | | | | | | | | | | | | | | Expose the standard error functions as SQL-callable functions. These are expected to be useful to people working with normal distributions, and we use them here to test the distribution from random_normal(). Since these functions are defined in the POSIX and C99 standards, they should in theory be available on all supported platforms. If that turns out not to be the case, more work will be needed. On all platforms tested so far, using extra_float_digits = -1 in the regression tests is sufficient to allow for variations between implementations. However, past experience has shown that there are almost certainly going to be additional unexpected portability issues, so these tests may well need further adjustments, based on the buildfarm results. Dean Rasheed, reviewed by Nathan Bossart and Thomas Munro. Discussion: https://postgr.es/m/CAEZATCXv5fi7+Vu-POiyai+ucF95+YMcCMafxV+eZuN1B-=MkQ@mail.gmail.com
* Fix JSON error reporting for many cases of erroneous string values.Tom Lane2023-03-13
| | | | | | | | | | | | | | | | | | | | | | | The majority of error exit cases in json_lex_string() failed to set lex->token_terminator, causing problems for the error context reporting code: it would see token_terminator less than token_start and do something more or less nuts. In v14 and up the end result could be as bad as a crash in report_json_context(). Older versions accidentally avoided that fate; but all versions produce error context lines that are far less useful than intended, because they'd stop at the end of the prior token instead of continuing to where the actually-bad input is. To fix, invent some macros that make it less notationally painful to do the right thing. Also add documentation about what the function is actually required to do; and in >= v14, add an assertion in report_json_context about token_terminator being sufficiently far advanced. Per report from Nikolay Shaplov. Back-patch to all supported versions. Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
* Reject combining "epoch" and "infinity" with other datetime fields.Tom Lane2023-03-09
| | | | | | | | | | | Datetime input formerly accepted combinations such as '1995-08-06 infinity', but this seems like a clear error. Reject any combination of regular y/m/d/h/m/s fields with these special tokens. Joseph Koshakow, reviewed by Keisuke Kuroda and myself Discussion: https://postgr.es/m/CAAvxfHdm8wwXwG_FFRaJ1nTHiMWb7YXS2YKCzCt8Q0a2ZoMcHg@mail.gmail.com
* Allow tailoring of ICU locales with custom rulesPeter Eisentraut2023-03-08
| | | | | | | | | | | | This exposes the ICU facility to add custom collation rules to a standard collation. New options are added to CREATE COLLATION, CREATE DATABASE, createdb, and initdb to set the rules. Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Daniel Verite <daniel@manitou-mail.org> Discussion: https://www.postgresql.org/message-id/flat/821c71a4-6ef0-d366-9acf-bb8e367f739f@enterprisedb.com
* Add support for unit "B" to pg_size_bytes()Peter Eisentraut2023-03-07
| | | | | | | | This makes it consistent with the units support in GUC. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/0106914a-9eb5-22be-40d8-652cc88c827d%40enterprisedb.com
* Fix incorrect comment in pg_get_partkeydef()David Rowley2023-03-07
| | | | | | | | | The comment claimed the output of the function was prefixed by "PARTITION BY". This is incorrect. Author: Japin Li Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/MEYP282MB166923B446FF5FE55B9DACB7B6B69@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
* SQL JSON path enhanced numeric literalsPeter Eisentraut2023-03-05
| | | | | | | | | | | | | Add support for non-decimal integer literals and underscores in numeric literals to SQL JSON path language. This follows the rules of ECMAScript, as referred to by the SQL standard. Internally, all the numeric literal parsing of jsonpath goes through numeric_in, which already supports all this, so this patch is just a bit of lexer work and some tests and documentation. Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b11b25bb-6ec1-d42f-cedd-311eae59e1fb@enterprisedb.com
* Fix incorrect format placeholdersPeter Eisentraut2023-03-03
|
* Avoid fetching one past the end of translate()'s "to" parameter.Tom Lane2023-03-01
| | | | | | | | | | | | | | | | | This is usually harmless, but if you were very unlucky it could provoke a segfault due to the "to" string being right up against the end of memory. Found via valgrind testing (so we might've found it earlier, except that our regression tests lacked any exercise of translate()'s deletion feature). Fix by switching the order of the test-for-end-of-string and advance-pointer steps. While here, compute "to_ptr + tolen" just once. (Smarter compilers might figure that out for themselves, but let's just make sure.) Report and fix by Daniil Anisimov, in bug #17816. Discussion: https://postgr.es/m/17816-70f3d2764e88a108@postgresql.org
* Rework pg_input_error_message(), now renamed pg_input_error_info()Michael Paquier2023-02-28
| | | | | | | | | | | | | | | | | | | pg_input_error_info() is now a SQL function able to return a row with more than just the error message generated for incorrect data type inputs when these are able to handle soft failures, returning more contents of ErrorData, as of: - The error message (same as before). - The error detail, if set. - The error hint, if set. - SQL error code. All the regression tests that relied on pg_input_error_message() are updated to reflect the effects of the rename. Per discussion with Tom Lane and Andrew Dunstan. Author: Nathan Bossart Discussion: https://postgr.es/m/139a68e1-bd1f-a9a7-b5fe-0be9845c6311@dunslane.net
* Suppress compiler warnings in new pgstats code.Tom Lane2023-02-27
| | | | | | | | | | | | | | | | | | | | | Some clang versions whine about comparing an enum variable to a value outside the range of the enum, on the grounds that the result must be constant. In the cases we fix here, the loops will terminate only if the enum variable can in fact hold a value one beyond its declared range. While that's very likely to always be true for these enum types, it still seems like a poor coding practice to assume it; so use "int" loop variables instead to silence the warnings. (This matches what we've done in other places, for example loops over the range of ForkNumber.) While at it, let's drop the XXX_FIRST macros for these enums and just write zeroes for the loop start values. The apparent flexibility seems rather illusory given that iterating up to one-less-than- the-number-of-values is only correct for a zero-based range. Melanie Plageman Discussion: https://postgr.es/m/20520.1677435600@sss.pgh.pa.us
* Silence more compiler warnings introduced by d87d548cd0.Tom Lane2023-02-26
| | | | | | Per buildfarm, there are still a couple of functions where we get warnings from compilers that don't know that elog(ERROR) doesn't return.
* Silence compiler warnings introduced by d87d548cd0.Jeff Davis2023-02-24
| | | | | Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20230224002029.GQ1653@telsasoft.com
* Remove unnecessary #ifdef USE_ICU and branch.Jeff Davis2023-02-23
| | | | | | | | Now that the provider-independent API pg_strnxfrm() is available, we no longer need the special cases for ICU in hashfunc.c and varchar.c. Reviewed-by: Peter Eisentraut, Peter Geoghegan Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
* Refactor to introduce pg_locale_deterministic().Jeff Davis2023-02-23
| | | | | | | | Avoids the need of callers to test for NULL, and also avoids the need to access the pg_locale_t structure directly. Reviewed-by: Peter Eisentraut, Peter Geoghegan Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
* Refactor to add pg_strcoll(), pg_strxfrm(), and variants.Jeff Davis2023-02-23
| | | | | | | | | | | | | Offers a generally better separation of responsibilities for collation code. Also, a step towards multi-lib ICU, which should be based on a clean separation of the routines required for collation providers. Callers with NUL-terminated strings should call pg_strcoll() or pg_strxfrm(); callers with strings and their length should call the variants pg_strncoll() or pg_strnxfrm(). Reviewed-by: Peter Eisentraut, Peter Geoghegan Discussion: https://postgr.es/m/a581136455c940d7bd0ff482d3a2bd51af25a94f.camel%40j-davis.com
* Implement ANY_VALUE aggregatePeter Eisentraut2023-02-22
| | | | | | | | | | | SQL:2023 defines an ANY_VALUE aggregate whose purpose is to emit an implementation-dependent (i.e. non-deterministic) value from the aggregated rows. Author: Vik Fearing <vik@postgresfriends.org> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/5cff866c-10a8-d2df-32cb-e9072e6b04a2@postgresfriends.org
* Detect overflow in timestamp[tz] subtraction.Tom Lane2023-02-20
| | | | | | | | | | It's possible to overflow the int64 microseconds field of the output interval when subtracting two timestamps. Detect that instead of silently returning a bogus result. Nick Babadzhanian Discussion: https://postgr.es/m/CABw73Uq2oJ3E+kYvvDuY04EkhhkChim2e-PaghBDjOmgUAMWGw@mail.gmail.com
* Fix parsing of ISO-8601 interval fields with exponential notation.Tom Lane2023-02-20
| | | | | | | | | | | | | | | | | | | | | | | | Historically we've accepted interval input like 'P.1e10D'. This is probably an accident of having used strtod() to do the parsing, rather than something anyone intended, but it's been that way for a long time. Commit e39f99046 broke this by trying to parse the integer and fractional parts separately, without accounting for the possibility of an exponent. In principle that coding allowed for precise conversions of field values wider than 15 decimal digits, but that does not seem like a goal worth sweating bullets for. So, rather than trying to manage an exponent on top of the existing complexity, let's just revert to the previous coding that used strtod() by itself. We can still improve on the old code to the extent of allowing the value to range up to 1.0e15 rather than only INT_MAX. (Allowing more than that risks creating problems due to precision loss: the converted fractional part might have absolute value more than 1. Perhaps that could be dealt with in some way, but it really does not seem worth additional effort.) Per bug #17795 from Alexander Lakhin. Back-patch to v15 where the faulty code came in. Discussion: https://postgr.es/m/17795-748d6db3ed95d313@postgresql.org
* Print the correct aliases for DML target tables in ruleutils.Tom Lane2023-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | ruleutils.c blindly printed the user-given alias (or nothing if there hadn't been one) for the target table of INSERT/UPDATE/DELETE queries. That works a large percentage of the time, but not always: for queries appearing in WITH, it's possible that we chose a different alias to avoid conflict with outer-scope names. Since the chosen alias would be used in any Var references to the target table, this'd lead to an inconsistent printout with consequences such as dump/restore failures. The correct logic for printing (or not) a relation alias was embedded in get_from_clause_item. Factor it out to a separate function so that we don't need a jointree node to use it. (Only a limited part of that function can be reached from these new call sites, but this seems like the cleanest non-duplicative factorization.) In passing, I got rid of a redundant "\d+ rules_src" step in rules.sql. Initial report from Jonathan Katz; thanks to Vignesh C for analysis. This has been broken for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/e947fa21-24b2-f922-375a-d4f763ef3e4b@postgresql.org Discussion: https://postgr.es/m/CALDaNm1MMntjmT_NJGp-Z=xbF02qHGAyuSHfYHias3TqQbPF2w@mail.gmail.com
* Change argument type of pq_sendbytes from char * to void *Peter Eisentraut2023-02-14
| | | | | | | | This is a follow-up to 1f605b82ba66ece8b421b10d41094dd2e3c0c48b. It allows getting rid of further casts at call sites. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/783a4edb-84f9-6df2-7470-2ef5ccc6607a@enterprisedb.com
* Consolidate ItemPointer to Datum conversion functionsPeter Eisentraut2023-02-13
| | | | | | | | | Instead of defining the same set of macros several times, define it once in an appropriate header file. In passing, convert to inline functions. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/844dd4c5-e5a1-3df1-bfaf-d1e1c2a16e45%40enterprisedb.com
* Avoid dereferencing an undefined pointer in DecodeInterval().Tom Lane2023-02-12
| | | | | | | | | | | | | | | Commit e39f99046 moved some code up closer to the start of DecodeInterval(), without noticing that it had been implicitly relying on previous checks to reject the case of empty input. Given empty input, we'd now dereference a pointer that hadn't been set, possibly leading to a core dump. (But if we fail to provoke a SIGSEGV, nothing bad happens, and the expected syntax error is thrown a bit later.) Per bug #17788 from Alexander Lakhin. Back-patch to v15 where the fault was introduced. Discussion: https://postgr.es/m/17788-dabac9f98f7eafd5@postgresql.org
* Add pg_stat_io view, providing more detailed IO statisticsAndres Freund2023-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Builds on 28e626bde00 and f30d62c2fc6. See the former for motivation. Rows of the view show IO operations for a particular backend type, IO target object, IO context combination (e.g. a client backend's operations on permanent relations in shared buffers) and each column in the view is the total number of IO Operations done (e.g. writes). So a cell in the view would be, for example, the number of blocks of relation data written from shared buffers by client backends since the last stats reset. In anticipation of tracking WAL IO and non-block-oriented IO (such as temporary file IO), the "op_bytes" column specifies the unit of the "reads", "writes", and "extends" columns for a given row. Rows for combinations of IO operation, backend type, target object and context that never occur, are ommitted entirely. For example, checkpointer will never operate on temporary relations. Similarly, if an IO operation never occurs for such a combination, the IO operation's cell will be null, to distinguish from 0 observed IO operations. For example, bgwriter should not perform reads. Note that some of the cells in the view are redundant with fields in pg_stat_bgwriter (e.g. buffers_backend). For now, these have been kept for backwards compatibility. Bumps catversion. Author: Melanie Plageman <melanieplageman@gmail.com> Author: Samay Sharma <smilingsamay@gmail.com> Reviewed-by: Maciek Sakrejda <m.sakrejda@gmail.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de
* pgstat: Infrastructure for more detailed IO statisticsAndres Freund2023-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds the infrastructure for more detailed IO statistics. The calls to actually count IOs, a system view to access the new statistics, documentation and tests will be added in subsequent commits, to make review easier. While we already had some IO statistics, e.g. in pg_stat_bgwriter and pg_stat_database, they did not provide sufficient detail to understand what the main sources of IO are, or whether configuration changes could avoid IO. E.g., pg_stat_bgwriter.buffers_backend does contain the number of buffers written out by a backend, but as that includes extending relations (always done by backends) and writes triggered by the use of buffer access strategies, it cannot easily be used to tune background writer or checkpointer. Similarly, pg_stat_database.blks_read cannot easily be used to tune shared_buffers / compute a cache hit ratio, as the use of buffer access strategies will often prevent a large fraction of the read blocks to end up in shared_buffers. The new IO statistics count IO operations (evict, extend, fsync, read, reuse, and write), and are aggregated for each combination of backend type (backend, autovacuum worker, bgwriter, etc), target object of the IO (relations, temp relations) and context of the IO (normal, vacuum, bulkread, bulkwrite). What is tracked in this series of patches, is sufficient to perform the aforementioned analyses. Further details, e.g. tracking the number of buffer hits, would make that even easier, but was left out for now, to keep the scope of the already large patchset manageable. Bumps PGSTAT_FILE_FORMAT_ID. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20200124195226.lth52iydq2n2uilq@alap3.anarazel.de
* Remove useless casts to (void *) in arguments of some system functionsPeter Eisentraut2023-02-07
| | | | | | | | The affected functions are: bsearch, memcmp, memcpy, memset, memmove, qsort, repalloc Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
* Remove useless casts to (void *) in hash_search() callsPeter Eisentraut2023-02-06
| | | | | | | | | | | Some of these appear to be leftovers from when hash_search() took a char * argument (changed in 5999e78fc45dcb91784b64b6e9ae43f4e4f68ca2). Since after this there is some more horizontal space available, do some light reformatting where suitable. Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/fd9adf5d-b1aa-e82f-e4c7-263c30145807%40enterprisedb.com
* Allow underscores in integer and numeric constants.Dean Rasheed2023-02-04
| | | | | | | | | | | | | | | | | | | This allows underscores to be used in integer and numeric literals, and their corresponding type input functions, for visual grouping. For example: 1_500_000_000 3.14159_26535_89793 0xffff_ffff 0b_1001_0001 A single underscore is allowed between any 2 digits, or immediately after the base prefix indicator of non-decimal integers, per SQL:202x draft. Peter Eisentraut and Dean Rasheed Discussion: https://postgr.es/m/84aae844-dc55-a4be-86d9-4f0fa405cc97%40enterprisedb.com
* Remove unused code related to unknown typePeter Eisentraut2023-02-04
| | | | | | | | These are leftovers obsoleted by cfd9be939e9c516243c5b6a49ad1e1a9a38f1052. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/e7887965-9e70-fd01-c2d1-5bc02f9169aa%40enterprisedb.com
* Make int64_div_fast_to_numeric() more robust.Dean Rasheed2023-02-03
| | | | | | | | | | | | | | | | | | The prior coding of int64_div_fast_to_numeric() had a number of bugs that would cause it to fail under different circumstances, such as with log10val2 <= 0, or log10val2 a multiple of 4, or in the "slow" numeric path with log10val2 >= 10. None of those could be triggered by any of our current code, which only uses log10val2 = 3 or 6. However, they made it a hazard for any future code that might use it. Also, since this is exported by numeric.c, users writing their own C code might choose to use it. Therefore fix, and back-patch to v14, where it was introduced. Dean Rasheed, reviewed by Tom Lane. Discussion: https://postgr.es/m/CAEZATCW8gXgW0tgPxPgHDPhVX71%2BSWFRkhnXy%2BTfGDsKLepu2g%40mail.gmail.com
* Clarify the choice of rscale in numeric_sqrt().Dean Rasheed2023-02-02
| | | | | | | | | | | | | Improve the comment explaining the choice of rscale in numeric_sqrt(), and ensure that the code works consistently when other values of NBASE/DEC_DIGITS are used. Note that, in practice, we always expect DEC_DIGITS == 4, and this does not change the computation in that case. Joel Jacobson and Dean Rasheed Discussion: https://postgr.es/m/06712c29-98e9-43b3-98da-f234d81c6e49%40app.fastmail.com
* Ensure that numeric.c compiles with other NBASE values.Dean Rasheed2023-02-02
| | | | | | | | | | As noted in the comments, support for different NBASE values is really only of historical interest, but as long as we're keeping it, we might as well make sure that it compiles. Joel Jacobson Discussion: https://postgr.es/m/06712c29-98e9-43b3-98da-f234d81c6e49%40app.fastmail.com
* Make Vars be outer-join-aware.Tom Lane2023-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner. To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason. In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join. This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner. Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.) Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Teach planner about more monotonic window functionsDavid Rowley2023-01-27
| | | | | | | | | | | | | 9d9c02ccd introduced runConditions for window functions to allow monotonic window function evaluation to be made more efficient when the window function value went beyond some value that it would never go back from due to its monotonic nature. That commit added prosupport functions to inform the planner that row_number(), rank(), dense_rank() and some forms of count(*) were monotonic. Here we add support for ntile(), cume_dist() and percent_rank(). Reviewed-by: Melanie Plageman Discussion: https://postgr.es/m/CAApHDvqR+VqB8s+xR-24bzJbU8xyFrBszJ17qKgECf7cWxLCaA@mail.gmail.com
* Improve TimestampDifferenceMilliseconds to cope with overflow sanely.Tom Lane2023-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | We'd like to use TimestampDifferenceMilliseconds with the stop_time possibly being TIMESTAMP_INFINITY, but up to now it's disclaimed responsibility for overflow cases. Define it to clamp its output to the range [0, INT_MAX], handling overflow correctly. (INT_MAX rather than LONG_MAX seems appropriate, because the function is already described as being intended for calculating wait times for WaitLatch et al, and that infrastructure only handles waits up to INT_MAX. Also, this choice gets rid of cross-platform behavioral differences.) Having done that, we can replace some ad-hoc code in walreceiver.c with a simple call to TimestampDifferenceMilliseconds. While at it, fix some buglets in existing callers of TimestampDifferenceMilliseconds: basebackup_copy.c had not read the memo about TimestampDifferenceMilliseconds never returning a negative value, and postmaster.c had not read the memo about Min() and Max() being macros with multiple-evaluation hazards. Neither of these quite seem worth back-patching. Patch by me; thanks to Nathan Bossart for review. Discussion: https://postgr.es/m/3126727.1674759248@sss.pgh.pa.us
* Add non-decimal integer support to type numeric.Dean Rasheed2023-01-23
| | | | | | | | | | | | | | | | | | This enhances the numeric type input function, adding support for hexadecimal, octal, and binary integers of any size, up to the limits of the numeric type. Since 6fcda9aba8, such non-decimal integers have been accepted by the parser as integer literals and passed through to numeric_in(). This commit gives numeric_in() the ability to handle them. While at it, simplify the handling of NaN and infinities, reducing the number of calls to pg_strncasecmp(), and arrange for pg_strncasecmp() to not be called at all for regular numbers. This gives a significant performance improvement for decimal inputs, more than offsetting the small performance hit of checking for non-decimal input. Discussion: https://postgr.es/m/CAEZATCV8XShnmT9HZy25C%2Bo78CVOFmUN5EM9FRAZ5xvYTggPMg%40mail.gmail.com
* Optimise numeric division for 3 and 4 base-NBASE digit divisors.Dean Rasheed2023-01-23
| | | | | | | | | | | | | | | | | On platforms with 128-bit integer support, introduce a new function div_var_int64(), along the same lines as div_var_int() added in d1b307eef2 for divisors with 1 or 2 base-NBASE digits, and use it to speed up div_var() and div_var_fast() in a similar way when the divisor has 3 or 4 base-NBASE digits. This gives significant performance gains for divisors with 9-16 decimal digits. Joel Jacobson. Discussion: https://postgr.es/m/b7a5893d-af18-4c0b-8918-96de5f1bbf39%40app.fastmail.com https://postgr.es/m/CAEZATCXGm%3DDyTq%3DFrcOqC0gPMVveKUYTaD5KRRoajrUTiWxVMw%40mail.gmail.com
* Allow parallel aggregate on string_agg and array_aggDavid Rowley2023-01-23
| | | | | | | | | | | | | | This adds combine, serial and deserial functions for the array_agg() and string_agg() aggregate functions, thus allowing these aggregates to partake in partial aggregations. This allows both parallel aggregation to take place when these aggregates are present and also allows additional partition-wise aggregation plan shapes to include plans that require additional aggregation once the partially aggregated results from the partitions have been combined. Author: David Rowley Reviewed-by: Andres Freund, Tomas Vondra, Stephen Frost, Tom Lane Discussion: https://postgr.es/m/CAKJS1f9sx_6GTcvd6TMuZnNtCh0VhBzhX6FZqw17TgVFH-ga_A@mail.gmail.com
* Use appendStringInfoSpaces in more placesDavid Rowley2023-01-20
| | | | | | | | | | | | | | | | | | | | | | | This adjusts a few places which were appending a string constant containing spaces onto a StringInfo. We have appendStringInfoSpaces for that job, so let's use that instead. For the change to jsonb.c's add_indent() function, appendStringInfoString was being called inside a loop to append 4 spaces on each loop. This meant that enlargeStringInfo would get called once per loop. Here it should be much more efficient to get rid of the loop and just calculate the number of spaces with "level * 4" and just append all the spaces in one go. Here we additionally adjust the appendStringInfoSpaces function so it makes use of memset rather than a while loop to apply the required spaces to the StringInfo. One of the problems with the while loop was that it was incrementing one variable and decrementing another variable once per loop. That's more work than what's required to get the job done. We may as well use memset for this rather than trying to optimize the existing loop. Some testing has shown memset is faster even for very small sizes. Discussion: https://postgr.es/m/CAApHDvp_rKkvwudBKgBHniNRg67bzXVjyvVKfX0G2zS967K43A@mail.gmail.com
* Fix ts_headline() to handle ORs and phrase queries more honestly.Tom Lane2023-01-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch largely reverts what I did in commits c9b0c678d and 78e73e875. The maximum cover length limit that I added in 78e73e875 (to band-aid over c9b0c678d's performance issues) creates too many user-visible behavior discrepancies, as complained of for example in bug #17691. The real problem with hlCover() is not what I thought at the time, but more that it seems to have been designed with only AND tsquery semantics in mind. It doesn't work quite right for OR, and even less so for NOT or phrase queries. However, we can improve that situation by building a variant of TS_execute() that returns a list of match locations. We already get an ExecPhraseData struct representing match locations for the primitive case of a simple match, as well as one for a phrase match; we just need to add some logic to combine these for AND and OR operators. The result is a list of ExecPhraseDatas, which hlCover can regard as having simple AND semantics, so that its old algorithm works correctly. There's still a lot not to like about ts_headline's behavior, but I think the remaining issues have to do with the heuristics used in mark_hl_words and mark_hl_fragments (which, likewise, were not revisited when phrase search was added). Improving those is a task for another day. Patch by me; thanks to Alvaro Herrera for review. Discussion: https://postgr.es/m/840.1669405935@sss.pgh.pa.us
* Remove some dead code in selfuncs.cAlvaro Herrera2023-01-19
| | | | | | | | | | | RelOptInfo.userid is the same for all relations in a given inheritance tree, so the code in examine_variable() and example_simple_variable() that repeats the ACL checks on the root parent rel instead of a given leaf child relations need not recompute userid too. Author: Amit Langote <amitlangote09@gmail.com> Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20221210201753.GA27893@telsasoft.com
* Display the leader apply worker's PID for parallel apply workers.Amit Kapila2023-01-18
| | | | | | | | | | | | | | | | | Add leader_pid to pg_stat_subscription. leader_pid is the process ID of the leader apply worker if this process is a parallel apply worker. If this field is NULL, it indicates that the process is a leader apply worker or a synchronization worker. The new column makes it easier to distinguish parallel apply workers from other kinds of workers and helps to identify the leader for the parallel workers corresponding to a particular subscription. Additionally, update the leader_pid column in pg_stat_activity as well to display the PID of the leader apply worker for parallel apply workers. Author: Hou Zhijie Reviewed-by: Peter Smith, Sawada Masahiko, Amit Kapila, Shveta Mallik Discussion: https://postgr.es/m/CAA4eK1+wyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw@mail.gmail.com
* Remove dead code in formatting.cJohn Naylor2023-01-17
| | | | | | | | | | | | | | | Remove some code guarded by IS_MINUS() or IS_PLUS(), where the entire stanza is inside an else-block where both of these are false. This should slightly improve test coverage. While at it, remove coding that apparently assumes that unsetting a bit is so expensive that we have to first check if it's already set in the first place. Per Coverity report from Ranier Vilela Analysis and review by Justin Pryzby Discussion: https://www.postgresql.org/message-id/20221223010818.GP1153%40telsasoft.com
* Store IdentLine->pg_user as an AuthTokenMichael Paquier2023-01-16
| | | | | | | | | | | | | | | While system_user was stored as an AuthToken in IdentLine, pg_user was stored as a plain string. This commit changes the code as we start storing pg_user as an AuthToken too. This does not have any functional changes, as all the operations on pg_user only use the string from the AuthToken. There is no regexp compiled and no check based on its quoting, yet. This is in preparation of more features that intend to extend its capabilities, like support for regexps and group membership. Author: Jelte Fennema Discussion: https://postgr.es/m/CAGECzQRNow4MwkBjgPxywXdJU_K3a9+Pm78JB7De3yQwwkTDew@mail.gmail.com