aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
Commit message (Collapse)AuthorAge
...
* Improve selectivity estimation for assorted match-style operators.Tom Lane2020-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quite a few matching operators such as JSONB's @> used "contsel" and "contjoinsel" as their selectivity estimators. That was a bad idea, because (a) contsel is only a stub, yielding a fixed default estimate, and (b) that default is 0.001, meaning we estimate these operators as five times more selective than equality, which is surely pretty silly. There's a good model for improving this in ltree's ltreeparentsel(): for any "var OP constant" query, we can try applying the operator to all of the column's MCV and histogram values, taking the latter as being a random sample of the non-MCV values. That code is actually 100% generic, except for the question of exactly what default selectivity ought to be plugged in when we don't have stats. Hence, migrate the guts of ltreeparentsel() into the core code, provide wrappers "matchingsel" and "matchingjoinsel" with a more-appropriate default estimate, and use those for the non-geometric operators that formerly used contsel (mostly JSONB containment operators and tsquery matching). Also apply this code to some match-like operators in hstore, ltree, and pg_trgm, including the former users of ltreeparentsel as well as ones that improperly used contsel. Since commit 911e70207 just created new versions of those extensions that we haven't released yet, we can sneak this change into those new versions instead of having to create an additional generation of update scripts. Patch by me, reviewed by Alexey Bashtanov Discussion: https://postgr.es/m/12237.1582833074@sss.pgh.pa.us
* Teach pg_ls_dir_files() to ignore ENOENT failures from stat().Tom Lane2020-03-31
| | | | | | | | | | | | | | | | Buildfarm experience shows that this function can fail with ENOENT if some other process unlinks a file between when we read the directory entry and when we try to stat() it. The problem is old but we had not noticed it until 085b6b667 added regression test coverage. To fix, just ignore ENOENT failures. There is one other case that this might hide: a symlink that points to nowhere. That seems okay though, at least better than erroring. Back-patch to v10 where this function was added, since the regression test cases were too. Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Fix assorted typosMagnus Hagander2020-03-31
| | | | Author: Daniel Gustafsson <daniel@yesql.se>
* Implement operator class parametersAlexander Korotkov2020-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PostgreSQL provides set of template index access methods, where opclasses have much freedom in the semantics of indexing. These index AMs are GiST, GIN, SP-GiST and BRIN. There opclasses define representation of keys, operations on them and supported search strategies. So, it's natural that opclasses may be faced some tradeoffs, which require user-side decision. This commit implements opclass parameters allowing users to set some values, which tell opclass how to index the particular dataset. This commit doesn't introduce new storage in system catalog. Instead it uses pg_attribute.attoptions, which is used for table column storage options but unused for index attributes. In order to evade changing signature of each opclass support function, we implement unified way to pass options to opclass support functions. Options are set to fn_expr as the constant bytea expression. It's possible due to the fact that opclass support functions are executed outside of expressions, so fn_expr is unused for them. This commit comes with some examples of opclass options usage. We parametrize signature length in GiST. That applies to multiple opclasses: tsvector_ops, gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and gist_hstore_ops. Also we parametrize maximum number of integer ranges for gist__int_ops. However, the main future usage of this feature is expected to be json, where users would be able to specify which way to index particular json parts. Catversion is bumped. Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru Author: Nikita Glukhov, revised by me Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
* Improve the performance and accuracy of numeric sqrt() and ln().Dean Rasheed2020-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using Newton's method to compute numeric square roots, use the Karatsuba square root algorithm, which performs better for numbers of all sizes. In practice, this is 3-5 times faster for inputs with just a few digits and up to around 10 times faster for larger inputs. Also, the new algorithm guarantees that the final digit of the result is correctly rounded, since it computes an integer square root with truncation, containing at least 1 extra decimal digit before rounding. The former algorithm would occasionally round the wrong way because it rounded both the intermediate and final results. In addition, arrange for sqrt_var() to explicitly support negative rscale values (rounding before the decimal point). This allows the argument reduction phase of ln_var() to be optimised for large inputs, since it only needs to compute square roots with a few more digits than the final ln() result, rather than computing all the digits before the decimal point. For very large inputs, this can be many thousands of times faster. In passing, optimise div_var_fast() in a couple of places where it was doing unnecessary work. Patch be me, reviewed by Tom Lane and Tels. Discussion: https://postgr.es/m/CAEZATCV1A7+jD3P30Zu31KjaxeSEyOn3v9d6tYegpxcq3cQu-g@mail.gmail.com
* Trigger autovacuum based on number of INSERTsDavid Rowley2020-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally autovacuum has only ever invoked a worker based on the estimated number of dead tuples in a table and for anti-wraparound purposes. For the latter, with certain classes of tables such as insert-only tables, anti-wraparound vacuums could be the first vacuum that the table ever receives. This could often lead to autovacuum workers being busy for extended periods of time due to having to potentially freeze every page in the table. This could be particularly bad for very large tables. New clusters, or recently pg_restored clusters could suffer even more as many large tables may have the same relfrozenxid, which could result in large numbers of tables requiring an anti-wraparound vacuum all at once. Here we aim to reduce the work required by anti-wraparound and aggressive vacuums in general, by triggering autovacuum when the table has received enough INSERTs. This is controlled by adding two new GUCs and reloptions; autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor. These work exactly the same as the existing scale factor and threshold controls, only base themselves off the number of inserts since the last vacuum, rather than the number of dead tuples. New controls were added rather than reusing the existing controls, to allow these new vacuums to be tuned independently and perhaps even completely disabled altogether, which can be done by setting autovacuum_vacuum_insert_threshold to -1. We make no attempt to skip index cleanup operations on these vacuums as they may trigger for an insert-mostly table which continually doesn't have enough dead tuples to trigger an autovacuum for the purpose of removing those dead tuples. If we were to skip cleaning the indexes in this case, then it is possible for the index(es) to become bloated over time. There are additional benefits to triggering autovacuums based on inserts, as tables which never contain enough dead tuples to trigger an autovacuum are now more likely to receive a vacuum, which can mark more of the table as "allvisible" and encourage the query planner to make use of Index Only Scans. Currently, we still obey vacuum_freeze_min_age when triggering these new autovacuums based on INSERTs. For large insert-only tables, it may be beneficial to lower the table's autovacuum_freeze_min_age so that tuples are eligible to be frozen sooner. Here we've opted not to zero that for these types of vacuums, since the table may just be insert-mostly and we may otherwise freeze tuples that are still destined to be updated or removed in the near future. There was some debate to what exactly the new scale factor and threshold should default to. For now, these are set to 0.2 and 1000, respectively. There may be some motivation to adjust these before the release. Author: Laurenz Albe, Darafei Praliaskouski Reviewed-by: Alvaro Herrera, Masahiko Sawada, Chris Travers, Andres Freund, Justin Pryzby Discussion: https://postgr.es/m/CAC8Q8t%2Bj36G_bLF%3D%2B0iMo6jGNWnLnWb1tujXuJr-%2Bx8ZCCTqoQ%40mail.gmail.com
* Go back to returning int from ereport auxiliary functions.Tom Lane2020-03-25
| | | | | | | | | | | | | | | | | | | This reverts the parts of commit 17a28b03645e27d73bf69a95d7569b61e58f06eb that changed ereport's auxiliary functions from returning dummy integer values to returning void. It turns out that a minority of compilers complain (not entirely unreasonably) about constructs such as (condition) ? errdetail(...) : 0 if errdetail() returns void rather than int. We could update those call sites to say "(void) 0" perhaps, but the expectation for this patch set was that ereport callers would not have to change anything. And this aspect of the patch set was already the most invasive and least compelling part of it, so let's just drop it. Per buildfarm. Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
* Add collation versions for Windows.Thomas Munro2020-03-25
| | | | | | | | On Vista and later, use GetNLSVersionEx() to request collation version information. Reviewed-by: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
* Allow NULL version for individual collations.Thomas Munro2020-03-25
| | | | | | | | | | | Remove the documented restriction that collation providers must either return NULL for all collations or non-NULL for all collations. Use NULL for glibc collations like "C.UTF-8", which might otherwise lead future proposed commits to force unnecessary index rebuilds. Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Discussion: https://postgr.es/m/CA%2BhUKGJvqup3s%2BJowVTcacZADO6dOhfdBmvOPHLS3KXUJu41Jw%40mail.gmail.com
* Improve the internal implementation of ereport().Tom Lane2020-03-24
| | | | | | | | | | | | | | | | | | | | | | Change all the auxiliary error-reporting routines to return void, now that we no longer need to pretend they are passing something useful to errfinish(). While this probably doesn't save anything significant at the machine-code level, it allows detection of some additional types of mistakes. Pass the error location details (__FILE__, __LINE__, PG_FUNCNAME_MACRO) to errfinish not errstart. This shaves a few cycles off the case where errstart decides we're not going to emit anything. Re-implement elog() as a trivial wrapper around ereport(), removing the separate support infrastructure it used to have. Aside from getting rid of some now-surplus code, this means that elog() now really does have exactly the same semantics as ereport(), in particular that it can skip evaluation work if the message is not to be emitted. Andres Freund and Tom Lane Discussion: https://postgr.es/m/CA+fd4k6N8EjNvZpM8nme+y+05mz-SM8Z_BgkixzkA34R+ej0Kw@mail.gmail.com
* Add object names to partition integrity violations.Amit Kapila2020-03-23
| | | | | | | | | | | | All errors of SQLSTATE class 23 should include the name of an object associated with the error in separate fields of the error report message. We do this so that applications need not try to extract them from the possibly-localized human-readable text of the message. Reported-by: Chris Bandy Author: Chris Bandy Reviewed-by: Amit Kapila and Amit Langote Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com
* Introduce "anycompatible" family of polymorphic types.Tom Lane2020-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the pseudo-types anycompatible, anycompatiblearray, anycompatiblenonarray, and anycompatiblerange. They work much like anyelement, anyarray, anynonarray, and anyrange respectively, except that the actual input values need not match precisely in type. Instead, if we can find a common supertype (using the same rules as for UNION/CASE type resolution), then the parser automatically promotes the input values to that type. For example, "myfunc(anycompatible, anycompatible)" can match a call with one integer and one bigint argument, with the integer automatically promoted to bigint. With anyelement in the definition, the user would have had to cast the integer explicitly. The new types also provide a second, independent set of type variables for function matching; thus with "myfunc(anyelement, anyelement, anycompatible) returns anycompatible" the first two arguments are constrained to be the same type, but the third can be some other type, and the result has the type of the third argument. The need for more than one set of type variables was foreseen back when we first invented the polymorphic types, but we never did anything about it. Pavel Stehule, revised a bit by me Discussion: https://postgr.es/m/CAFj8pRDna7VqNi8gR+Tt2Ktmz0cq5G93guc3Sbn_NVPLdXAkqA@mail.gmail.com
* Implement type regcollationPeter Eisentraut2020-03-18
| | | | | | | | | This will be helpful for a following commit and it's also just generally useful, like the other reg* types. Author: Julien Rouhaud Reviewed-by: Thomas Munro and Michael Paquier Discussion: https://postgr.es/m/CAEepm%3D0uEQCpfq_%2BLYFBdArCe4Ot98t1aR4eYiYTe%3DyavQygiQ%40mail.gmail.com
* Remove useless pfree()s at the ends of various ValuePerCall SRFs.Tom Lane2020-03-16
| | | | | | | | | | | | | | | | | | | | We don't need to manually clean up allocations in a SRF's multi_call_memory_ctx, because the SRF_RETURN_DONE infrastructure takes care of that (and also ensures that it will happen even if the function never gets a final call, which simple manual cleanup cannot do). Hence, the code removed by this patch is a waste of code and cycles. Worse, it gives the impression that cleaning up manually is a thing, which can lead to more serious errors such as those fixed in commits 085b6b667 and b4570d33a. So we should get rid of it. These are not quite actual bugs though, so I couldn't muster the enthusiasm to back-patch. Fix in HEAD only. Justin Pryzby Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Avoid holding a directory FD open across assorted SRF calls.Tom Lane2020-03-16
| | | | | | | | | | | | | | | | | This extends the fixes made in commit 085b6b667 to other SRFs with the same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(), pg_ls_dir(), and pg_tablespace_databases(). Also adjust various comments and documentation to warn against expecting to clean up resources during a ValuePerCall SRF's final call. Back-patch to all supported branches, since these functions were all born broken. Justin Pryzby, with cosmetic tweaks by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Rearrange pseudotypes.c to get rid of duplicative code.Tom Lane2020-03-14
| | | | | | | | | | | | | | | | | | | | Commit a5954de10 replaced a lot of manually-coded stub I/O routines with code generated by macros. That was a good idea but it didn't go far enough, because there were still manually-coded stub input routines for types that had live output routines. Refactor the macro so that we can generate just a stub input routine at need. Also create similar macros to generate stub binary I/O routines, since we have some of those now. The only stub functions that remain hand-coded are shell_in() and shell_out(), which need to be separate because they use different error messages. While here, rearrange the commentary to discuss each type not each function. This provides a better way to explain the *why* of which types need which support, rather than just duplicatively annotating the functions. Discussion: https://postgr.es/m/24137.1584139352@sss.pgh.pa.us
* Unify several ways to tracking backend typePeter Eisentraut2020-03-13
| | | | | | | | | | | | | | Add a new global variable MyBackendType that uses the same BackendType enum that was previously only used by the stats collector. That way several duplicate ways of checking what type a particular process is can be simplified. Since it's no longer just for stats, move to miscinit.c and rename existing functions to match the expanded purpose. Reviewed-by: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Kuntal Ghosh <kuntalghosh.2007@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/c65e5196-4f04-4ead-9353-6088c19615a3@2ndquadrant.com
* Avoid holding a directory FD open across pg_ls_dir_files() calls.Tom Lane2020-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This coding technique is undesirable because (a) it leaks the FD for the rest of the transaction if the SRF is not run to completion, and (b) allocated FDs are a scarce resource, but multiple interleaved uses of the relevant functions could eat many such FDs. In v11 and later, a query such as "SELECT pg_ls_waldir() LIMIT 1" yields a warning about the leaked FD, and the only reason there's no warning in earlier branches is that fd.c didn't whine about such leaks before commit 9cb7db3f0. Even disregarding the warning, it wouldn't be too hard to run a backend out of FDs with careless use of these SQL functions. Hence, rewrite the function so that it reads the directory within a single call, returning the results as a tuplestore rather than via value-per-call mode. There are half a dozen other built-in SRFs with similar problems, but let's fix this one to start with, just to see if the buildfarm finds anything wrong with the code. In passing, fix bogus error report for stat() failure: it was whining about the directory when it should be fingering the individual file. Doubtless a copy-and-paste error. Back-patch to v10 where this function was added. Justin Pryzby, with cosmetic tweaks and test cases by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Remove utils/acl.h from catalog/objectaddress.hPeter Eisentraut2020-03-10
| | | | | | | | | | | | | | | | | | The need for this was removed by 8b9e9644dc6a9bd4b7a97950e6212f63880cf18b. A number of files now need to include utils/acl.h or parser/parse_node.h explicitly where they previously got it indirectly somehow. Since parser/parse_node.h already includes nodes/parsenodes.h, the latter is then removed where the former was added. Also, remove nodes/pg_list.h from objectaddress.h, since that's included via nodes/parsenodes.h. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Discussion: https://www.postgresql.org/message-id/flat/7601e258-26b2-8481-36d0-dc9dca6f28f1%402ndquadrant.com
* Add tg_updatedcols to TriggerDataPeter Eisentraut2020-03-09
| | | | | | | | | | This allows a trigger function to determine for an UPDATE trigger which columns were actually updated. This allows some optimizations in generic trigger functions such as lo_manage and tsvector_update_trigger. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://www.postgresql.org/message-id/flat/11c5f156-67a9-0fb5-8200-2a8018eb2e0c@2ndquadrant.com
* Allow Unicode escapes in any server encoding, not only UTF-8.Tom Lane2020-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | SQL includes provisions for numeric Unicode escapes in string literals and identifiers. Previously we only accepted those if they represented ASCII characters or the server encoding was UTF-8, making the conversion to internal form trivial. This patch adjusts things so that we'll call the appropriate encoding conversion function in less-trivial cases, allowing the escape sequence to be accepted so long as it corresponds to some character available in the server encoding. This also applies to processing of Unicode escapes in JSONB. However, the old restriction still applies to client-side JSON processing, since that hasn't got access to the server's encoding conversion infrastructure. This patch includes some lexer infrastructure that simplifies throwing errors with error cursors pointing into the middle of a string (or other complex token). For the moment I only used it for errors relating to Unicode escapes, but we might later expand the usage to some other cases. Patch by me, reviewed by John Naylor. Discussion: https://postgr.es/m/2393.1578958316@sss.pgh.pa.us
* Remove the "opaque" pseudo-type and associated compatibility hacks.Tom Lane2020-03-05
| | | | | | | | | | | | | | | | | | | A long time ago, it was necessary to declare datatype I/O functions, triggers, and language handler support functions in a very type-unsafe way involving a single pseudo-type "opaque". We got rid of those conventions in 7.3, but there was still support in various places to automatically convert such functions to the modern declaration style, to be able to transparently re-load dumps from pre-7.3 servers. It seems unnecessary to continue to support that anymore, so take out the hacks; whereupon the "opaque" pseudo-type itself is no longer needed and can be dropped. This is part of a group of patches removing various server-side kluges for transparently upgrading pre-8.0 dump files. Since we've had few complaints about dropping pg_dump's support for dumping from pre-8.0 servers (commit 64f3524e2), it seems okay to now remove these kluges. Discussion: https://postgr.es/m/4110.1583255415@sss.pgh.pa.us
* Remove RangeIOData->typiofuncAlvaro Herrera2020-03-05
| | | | | | | | | | | We used to carry the I/O function OID in RangeIOData, but it's not used for anything. Since the struct is not exposed to the world anyway, we can simplify it a bit. Also, rename the FmgrInfo member to match the accompanying 'typioparam' and put them in a more sensible order. Reviewed by Tom Lane and Paul Jungwirth. Discussion: https://postgr.es/m/20200304215711.GA8732@alvherre.pgsql
* Introduce macros for typalign and typstorage constants.Tom Lane2020-03-04
| | | | | | | | | | | | | | | | | | | | | Our usual practice for "poor man's enum" catalog columns is to define macros for the possible values and use those, not literal constants, in C code. But for some reason lost in the mists of time, this was never done for typalign/attalign or typstorage/attstorage. It's never too late to make it better though, so let's do that. The reason I got interested in this right now is the need to duplicate some uses of the TYPSTORAGE constants in an upcoming ALTER TYPE patch. But in general, this sort of change aids greppability and readability, so it's a good idea even without any specific motivation. I may have missed a few places that could be converted, and it's even more likely that pending patches will re-introduce some hard-coded references. But that's not fatal --- there's no expectation that we'd actually change any of these values. We can clean up stragglers over time. Discussion: https://postgr.es/m/16457.1583189537@sss.pgh.pa.us
* Allow to_date/to_timestamp to recognize non-English month/day names.Tom Lane2020-03-03
| | | | | | | | | | | | | | | | | | | | | | | | | to_char() has long allowed the TM (translation mode) prefix to specify output of translated month or day names; but that prefix had no effect in input format strings. Now it does. to_date() and to_timestamp() will now recognize the same month or day names that to_char() would output for the same format code. Matching is case-insensitive (per the active collation's notion of what that means), just as it has always been for English month/day names without the TM prefix. (As per the discussion thread, there are lots of cases that this feature will not handle, such as alternate day names. But being able to accept what to_char() will output seems useful enough.) In passing, fix some shaky English and violations of message style guidelines in jsonpath errors for the .datetime() method, which depends on this code. Juan José Santamaría Flecha, reviewed and modified by me, with other commentary from Alvaro Herrera, Tomas Vondra, Arthur Zakirov, Peter Eisentraut, Mark Dilger. Discussion: https://postgr.es/m/CAC+AXB3u1jTngJcoC1nAHBf=M3v-jrEfo86UFtCqCjzbWS9QhA@mail.gmail.com
* Report progress of streaming base backup.Fujii Masao2020-03-03
| | | | | | | | | | | | | This commit adds pg_stat_progress_basebackup view that reports the progress while an application like pg_basebackup is taking a base backup. This uses the progress reporting infrastructure added by c16dc1aca5e0, adding support for streaming base backup. Bump catversion. Author: Fujii Masao Reviewed-by: Kyotaro Horiguchi, Amit Langote, Sergei Kornilov Discussion: https://postgr.es/m/9ed8b801-8215-1f3d-62d7-65bff53f6e94@oss.nttdata.com
* Fix corner-case loss of precision in numeric ln().Dean Rasheed2020-03-01
| | | | | | | | | | | | | | | | | | | | | | When deciding on the local rscale to use for the Taylor series expansion, ln_var() neglected to account for the fact that the result is subsequently multiplied by a factor of 2^(nsqrt+1), where nsqrt is the number of square root operations performed in the range reduction step, which can be as high as 22 for very large inputs. This could result in a loss of precision, particularly when combined with large rscale values, for which a large number of Taylor series terms is required (up to around 400). Fix by computing a few extra digits in the Taylor series, based on the weight of the multiplicative factor log10(2^(nsqrt+1)). It remains to be proven whether or not the other 8 extra digits used for the Taylor series is appropriate, but this at least deals with the obvious oversight of failing to account for the effects of the final multiplication. Per report from Justin AnyhowStep. Reviewed by Tom Lane. Discussion: https://postgr.es/m/16280-279f299d9c06e56f@postgresql.org
* Move src/backend/utils/hash/hashfn.c to src/commonRobert Haas2020-02-27
| | | | | | | | | | | | | | This also involves renaming src/include/utils/hashutils.h, which becomes src/include/common/hashfn.h. Perhaps an argument can be made for keeping the hashutils.h name, but it seemed more consistent to make it match the name of the file, and also more descriptive of what is actually going on here. Patch by me, reviewed by Suraj Kharage and Mark Dilger. Off-list advice on how not to break the Windows build from Davinder Singh and Amit Kapila. Discussion: http://postgr.es/m/CA+TgmoaRiG4TXND8QuM6JXFRkM_1wL2ZNhzaUKsuec9-4yrkgw@mail.gmail.com
* Add equalimage B-Tree support functions.Peter Geoghegan2020-02-26
| | | | | | | | | | | | | | | | | | | | | | Invent the concept of a B-Tree equalimage ("equality implies image equality") support function, registered as support function 4. This indicates whether it is safe (or not safe) to apply optimizations that assume that any two datums considered equal by an operator class's order method must be interchangeable without any loss of semantic information. This is static information about an operator class and a collation. Register an equalimage routine for almost all of the existing B-Tree opclasses. We only need two trivial routines for all of the opclasses that are included with the core distribution. There is one routine for opclasses that index non-collatable types (which returns 'true' unconditionally), plus another routine for collatable types (which returns 'true' when the collation is a deterministic collation). This patch is infrastructure for an upcoming patch that adds B-Tree deduplication. Author: Peter Geoghegan, Anastasia Lubennikova Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
* Assume that we have <wchar.h>.Tom Lane2020-02-21
| | | | | | | | | | | | | Windows has this, and so do all other live platforms according to the buildfarm; it's been required by POSIX since SUSv2. So remove the configure probe and tests of HAVE_WCHAR_H. This is part of a series of commits to get rid of no-longer-relevant configure checks and dead src/port/ code. I'm committing them separately to make it easier to back out individual changes if they prove less portable than I expect. Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
* Assume that we have cbrt().Tom Lane2020-02-21
| | | | | | | | | | | | Windows has this, and so do all other live platforms according to the buildfarm, so remove the configure probe and float.c's substitute code. This is part of a series of commits to get rid of no-longer-relevant configure checks and dead src/port/ code. I'm committing them separately to make it easier to back out individual changes if they prove less portable than I expect. Discussion: https://postgr.es/m/15379.1582221614@sss.pgh.pa.us
* Remove duplicated words in commentsMichael Paquier2020-02-18
| | | | | | Author: Daniel Gustafsson Reviewed-by: Vik Fearing Discussion: https://postgr.es/m/EBC3BFEB-664C-4063-81ED-29F1227DB012@yesql.se
* Avoid a performance regression in float overflow/underflow detection.Tom Lane2020-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6bf0bc842 replaced float.c's CHECKFLOATVAL() macro with static inline subroutines, but that wasn't too well thought out. In the original coding, the unlikely condition (isinf(result) or result == 0) was checked first, and the inf_is_valid or zero_is_valid condition only afterwards. The inline-subroutine coding caused that to be swapped around, which is pretty horrid for performance because (a) in common cases the is_valid condition is twice as expensive to evaluate (e.g., requiring two isinf() calls not one) and (b) in common cases the is_valid condition is false, requiring us to perform the unlikely-condition check anyway. Net result is that one isinf() call becomes two or three, resulting in visible performance loss as reported by Keisuke Kuroda. The original fix proposal was to revert the replacement of the macro, but on second thought, that macro was just a bad idea from the beginning: if anything it's a net negative for readability of the code. So instead, let's just open-code all the overflow/underflow tests, being careful to test the unlikely condition first (and mark it unlikely() to help the compiler get the point). Also, rather than having N copies of the actual ereport() calls, collapse those into out-of-line error subroutines to save some code space. This does mean that the error file/line numbers won't be very helpful for figuring out where the issue really is --- but we'd already burned that bridge by putting the ereports into static inlines. In HEAD, check_float[48]_val() are gone altogether. In v12, leave them present in float.h but unused in the core code, just in case some extension is depending on them. Emre Hasegeli, with some kibitzing from me and Andres Freund Discussion: https://postgr.es/m/CANDwggLe1Gc1OrRqvPfGE=kM9K0FSfia0hbeFCEmwabhLz95AA@mail.gmail.com
* Fix bug in Tid scan.Fujii Masao2020-02-07
| | | | | | | | | | | | | | | | | | | | | | | Commit 147e3722f7 changed Tid scan so that it calls table_beginscan() and uses the scan option for seq scan. This change caused two issues. (1) The change caused Tid scan to take a predicate lock on the entire relation in serializable transaction even when relation-level lock is not necessary. This could lead to an unexpected serialization error. (2) The change caused Tid scan to increment the number of seq_scan in pg_stat_*_tables views even though it's not seq scan. This could confuse the users. This commit adds the scan option for Tid scan and makes Tid scan use it, to avoid those issues. Back-patch to v12, where the bug was introduced. Author: Tatsuhito Kasahara Reviewed-by: Kyotaro Horiguchi, Masahiko Sawada, Fujii Masao Discussion: https://postgr.es/m/CAP0=ZVKy+gTbFmB6X_UW0pP3WaeJ-fkUWHoD-pExS=at3CY76g@mail.gmail.com
* Refactor hash_agg_entry_size().Jeff Davis2020-02-06
| | | | | | Consolidate the calculations for hash table size estimation. This will help with upcoming Hash Aggregation work that will add additional call sites.
* Add leader_pid to pg_stat_activityMichael Paquier2020-02-06
| | | | | | | | | | | | | | | This new field tracks the PID of the group leader used with parallel query. For parallel workers and the leader, the value is set to the PID of the group leader. So, for the group leader, the value is the same as its own PID. Note that this reflects what PGPROC stores in shared memory, so as leader_pid is NULL if a backend has never been involved in parallel query. If the backend is using parallel query or has used it at least once, the value is set until the backend exits. Author: Julien Rouhaud Reviewed-by: Sergei Kornilov, Guillaume Lelarge, Michael Paquier, Tomas Vondra Discussion: https://postgr.es/m/CAOBaU_Yy5bt0vTPZ2_LUM6cUcGeqmYNoJ8-Rgto+c2+w3defYA@mail.gmail.com
* Add declaration-level assertions for compile-time checksMichael Paquier2020-02-03
| | | | | | | | | | | Those new assertions can be used at file scope, outside of any function for compilation checks. This commit provides implementations for C and C++, and fallback implementations. Author: Peter Smith Reviewed-by: Andres Freund, Kyotaro Horiguchi, Dagfinn Ilmari Mannsåker, Michael Paquier Discussion: https://postgr.es/m/201DD0641B056142AC8C6645EC1B5F62014B8E8030@SYD1217
* Optimizations for integer to decimal output.Andrew Gierth2020-02-01
| | | | | | | | | | Using a lookup table of digit pairs reduces the number of divisions needed, and calculating the length upfront saves some work; these ideas are taken from the code previously committed for floats. David Fetter, reviewed by Kyotaro Horiguchi, Tels, and me. Discussion: https://postgr.es/m/20190924052620.GP31596%40fetter.org
* Fix not-quite-right string comparison in parse_jsonb_index_flags().Tom Lane2020-01-31
| | | | | | | | | | | | | This code would accept "strinX", where X is any 1-byte character, as meaning "string". Clearly it wasn't meant to do that. No back-patch, since this doesn't affect correct queries and there's some tiny chance we'd break somebody's incorrect query in a minor release. Report and patch by Dominik Czarnota. Discussion: https://postgr.es/m/CABEVAa1dU0mDCAfaT8WF2adVXTDsLVJy_izotg6ze_hh-cn8qQ@mail.gmail.com
* Clean up newlines following left parenthesesAlvaro Herrera2020-01-30
| | | | | | | | | | | | We used to strategically place newlines after some function call left parentheses to make pgindent move the argument list a few chars to the left, so that the whole line would fit under 80 chars. However, pgindent no longer does that, so the newlines just made the code vertically longer for no reason. Remove those newlines, and reflow some of those lines for some extra naturality. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
* Remove excess parens in ereport() callsAlvaro Herrera2020-01-30
| | | | | | | Cosmetic cleanup, not worth backpatching. Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql Reviewed-by: Tom Lane, Michael Paquier
* Move jsonapi.c and jsonapi.h to src/common.Robert Haas2020-01-29
| | | | | | | | | | | To make this work, (1) makeJsonLexContextCstringLen now takes the encoding to be used as an argument; (2) check_stack_depth() is made to do nothing in frontend code, and (3) elog(ERROR, ...) is changed to pg_log_fatal + exit in frontend code. Mark Dilger, reviewed and slightly revised by me. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
* Apply project best practices to switches over enum values.Tom Lane2020-01-27
| | | | | | | | | In the wake of 1f3a02173, assorted buildfarm members were warning about "control reaches end of non-void function" or the like. Do what we've done elsewhere: in place of a "default" switch case that will prevent the compiler from warning about unhandled enum values, put a catchall elog() after the switch. And return a dummy value to satisfy compilers that don't know elog() doesn't return.
* Move some code from jsonapi.c to jsonfuncs.c.Robert Haas2020-01-27
| | | | | | | | | | | | | | Specifically, move those functions that depend on ereport() from jsonapi.c to jsonfuncs.c, in preparation for allowing jsonapi.c to be used from frontend code. A few cases where elog(ERROR, ...) is used for can't-happen conditions are left alone; we can handle those in some other way in frontend code. Reviewed by Mark Dilger and Andrew Dunstan. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
* Adjust pg_parse_json() so that it does not directly ereport().Robert Haas2020-01-27
| | | | | | | | | | | | | | | | | | | | | Instead, it now returns a value indicating either success or the type of error which occurred. The old behavior is still available by calling pg_parse_json_or_ereport(). If the new interface is used, an error can be thrown by passing the return value of pg_parse_json() to json_ereport_error(). pg_parse_json() can still elog() in can't-happen cases, but it seems like that issue is best handled separately. Adjust json_lex() and json_count_array_elements() to return an error code, too. This is all in preparation for making the backend's json parser available to frontend code. Reviewed and/or tested by Mark Dilger and Andrew Dunstan. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
* Add functions gcd() and lcm() for integer and numeric types.Dean Rasheed2020-01-25
| | | | | | | | | These compute the greatest common divisor and least common multiple of a pair of numbers using the Euclidean algorithm. Vik Fearing, reviewed by Fabien Coelho. Discussion: https://postgr.es/m/adbd3e0b-e3f1-5bbc-21db-03caf1cef0f7@2ndquadrant.com
* Remove jsonapi.c's lex_accept().Robert Haas2020-01-24
| | | | | | | | | | | | | | | | At first glance, this function seems useful, but it actually increases the amount of code required rather than decreasing it. Inline the logic into the callers instead; most callers don't use the 'lexeme' argument for anything and as a result considerable simplification is possible. Along the way, fix the header comment for the nearby function lex_expect(), which mislabeled it as lex_accept(). Patch by me, reviewed by David Steele, Mark Dilger, and Andrew Dunstan. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
* Split JSON lexer/parser from 'json' data type support.Robert Haas2020-01-24
| | | | | | | | | | | | | | | | | | | | Keep the code that pertains to the 'json' data type in json.c, but move the lexing and parsing code to a new file jsonapi.c, a name I chose because the corresponding prototypes are in jsonapi.h. This seems like a logical division, because the JSON lexer and parser are also used by the 'jsonb' data type, but the SQL-callable functions in json.c are a separate thing. Also, the new jsonapi.c file needs to include far fewer header files than json.c, which seems like a good sign that this is an appropriate place to insert an abstraction boundary. I took the opportunity to remove a few apparently-unneeded includes from json.c at the same time. Patch by me, reviewed by David Steele, Mark Dilger, and Andrew Dunstan. The previous commit was, too, but I forgot to note it in the commit message. Discussion: http://postgr.es/m/CA+TgmoYfOXhd27MUDGioVh6QtpD0C1K-f6ObSA10AWiHBAL5bA@mail.gmail.com
* Adjust src/include/utils/jsonapi.h so it's not backend-only.Robert Haas2020-01-24
| | | | | | | | | | | | | | | The major change here is that we no longer include jsonb.h into jsonapi.h. The reason that was necessary is that jsonapi.h included several prototypes functions in jsonfuncs.c that depend on the Jsonb type. Move those prototypes to a new header, jsonfuncs.h, and include it where needed. The other change is that JsonEncodeDateTime is now declared in json.h rather than jsonapi.h. Taken together, these steps eliminate all dependencies of jsonapi.h on backend-only data types and header files, so that it can potentially be included in frontend code.
* Fix an oversight in commit 4c70098ff.Tom Lane2020-01-23
| | | | | | | | | | | | | | | | | | | | I had supposed that the from_char_seq_search() call sites were all passing the constant arrays you'd expect them to pass ... but on looking closer, the one for DY format was passing the days[] array not days_short[]. This accidentally worked because the day abbreviations in English are all the same as the first three letters of the full day names. However, once we took out the "maximum comparison length" logic, it stopped working. As penance for that oversight, add regression test cases covering this, as well as every other switch case in DCH_from_char() that was not reached according to the code coverage report. Also, fold the DCH_RM and DCH_rm cases into one --- now that seq_search is case independent, there's no need to pass different comparison arrays for those cases. Back-patch, as the previous commit was.