aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/adt
Commit message (Collapse)AuthorAge
* Fix and simplify some usages of TimestampDifference().Tom Lane2020-11-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce TimestampDifferenceMilliseconds() to simplify callers that would rather have the difference in milliseconds, instead of the select()-oriented seconds-and-microseconds format. This gets rid of at least one integer division per call, and it eliminates some apparently-easy-to-mess-up arithmetic. Two of these call sites were in fact wrong: * pg_prewarm's autoprewarm_main() forgot to multiply the seconds by 1000, thus ending up with a delay 1000X shorter than intended. That doesn't quite make it a busy-wait, but close. * postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute microseconds not milliseconds, thus ending up with a delay 1000X longer than intended. Somebody along the way had noticed this problem but misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather than fixing the units. This was relatively harmless in context, because we don't care that much about exactly how long this delay is; still, it's wrong. There are a few more callers of TimestampDifference() that don't have a direct need for seconds-and-microseconds, but can't use TimestampDifferenceMilliseconds() either because they do need microsecond precision or because they might possibly deal with intervals long enough to overflow 32-bit milliseconds. It might be worth inventing another API to improve that, but that seems outside the scope of this patch; so those callers are untouched here. Given the fact that we are fixing some bugs, and the likelihood that future patches might want to back-patch code that uses this new API, back-patch to all supported branches. Alexey Kondratov and Tom Lane Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
* Fix handling of BC years in to_date/to_timestamp.Tom Lane2020-09-30
| | | | | | | | | | | | | | | | | | | | | | | Previously, a conversion such as to_date('-44-02-01','YYYY-MM-DD') would result in '0045-02-01 BC', as the code attempted to interpret the negative year as BC, but failed to apply the correction needed for our internal handling of BC years. Fix the off-by-one problem. Also, arrange for the combination of a negative year and an explicit "BC" marker to cancel out and produce AD. This is how the negative-century case works, so it seems sane to do likewise. Continue to read "year 0000" as 1 BC. Oracle would throw an error, but we've accepted that case for a long time so I'm hesitant to change it in a back-patch. Per bug #16419 from Saeed Hubaishan. Back-patch to all supported branches. Dar Alathar-Yemen and Tom Lane Discussion: https://postgr.es/m/16419-d8d9db0a7553f01b@postgresql.org
* Move new LOCKTAG_DATABASE_FROZEN_IDS to end of enum LockTagType.Noah Misch2020-08-15
| | | | | | | | | | | Several PGXN modules reference LockTagType values; renumbering would force a recompile of those modules. Oversight in back-patch of today's commit 566372b3d6435639e4cc4476d79b8505a0297c87. Back-patch to released branches, v12 through 9.5. Reported by Tom Lane. Discussion: https://postgr.es/m/921383.1597523945@sss.pgh.pa.us
* Prevent concurrent SimpleLruTruncate() for any given SLRU.Noah Misch2020-08-15
| | | | | | | | | | | | | | | | | The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit. If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
* Fix recently-introduced performance problem in ts_headline().Tom Lane2020-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new hlCover() algorithm that I introduced in commit c9b0c678d turns out to potentially take O(N^2) or worse time on long documents, if there are many occurrences of individual query words but few or no substrings that actually satisfy the query. (One way to hit this behavior is with a "common_word & rare_word" type of query.) This seems unavoidable given the original goal of checking every substring of the document, so we have to back off that idea. Fortunately, it seems unlikely that anyone would really want headlines spanning all of a long document, so we can avoid the worse-than-linear behavior by imposing a maximum length of substring that we'll consider. For now, just hard-wire that maximum length as a multiple of max_words times max_fragments. Perhaps at some point somebody will argue for exposing it as a ts_headline parameter, but I'm hesitant to make such a feature addition in a back-patched bug fix. I also noted that the hlFirstIndex() function I'd added in that commit was unnecessarily stupid: it really only needs to check whether a HeadlineWordEntry's item pointer is null or not. This wouldn't make all that much difference in typical cases with queries having just a few terms, but a cycle shaved is a cycle earned. In addition, add a CHECK_FOR_INTERRUPTS call in TS_execute_recurse. This ensures that hlCover's loop is cancellable if it manages to take a long time, and it may protect some other TS_execute callers as well. Back-patch to 9.6 as the previous commit was. I also chose to add the CHECK_FOR_INTERRUPTS call to 9.5. The old hlCover() algorithm seems to avoid the O(N^2) behavior, at least on the test case I tried, but nonetheless it's not very quick on a long document. Per report from Stephen Frost. Discussion: https://postgr.es/m/20200724160535.GW12375@tamriel.snowman.net
* Fix whitespacePeter Eisentraut2020-07-17
|
* Fix pg_current_logfile() to not emit a carriage return on Windows.Tom Lane2020-07-09
| | | | | | | | | | | | | | | | | | | | Due to not having our signals straight about CRLF vs. LF line termination, the output of pg_current_logfile() included a trailing \r on Windows. To fix, force the file descriptor it uses into text mode. While here, move a couple of local variable declarations to make the function's logic clearer. In v12 and v13, also back-patch the test added by 1c4e88e2f so that this function has some test coverage. However, the 004_logrotate.pl test script doesn't exist before v12, and it didn't seem worth adding to older branches just for this. Per report from Thomas Kellerer. Back-patch to v10 where this function was added. Discussion: https://postgr.es/m/412ae8da-76bb-640f-039a-f3513499e53d@gmx.net
* Fix "ignoring return value" complaints from commit 96d1f423f9Joe Conway2020-07-04
| | | | | | | | | | | | | | | The cfbot and some BF animals are complaining about the previous read_binary_file commit because of ignoring return value of ‘fread’. So let's make everyone happy by testing the return value even though not strictly needed. Reported by Justin Pryzby, and suggested patch by Tom Lane. Backpatched to v11 same as the previous commit. Reported-By: Justin Pryzby Reviewed-By: Tom Lane Discussion: https://postgr.es/m/flat/969b8d82-5bb2-5fa8-4eb1-f0e685c5d736%40joeconway.com Backpatch-through: 11
* Read until EOF vice stat-reported size in read_binary_fileJoe Conway2020-07-04
| | | | | | | | | | | | | | | | | | read_binary_file(), used by SQL functions pg_read_file() and friends, uses stat to determine file length to read, when not passed an explicit length as an argument. This is problematic, for example, if the file being read is a virtual file with a stat-reported length of zero. Arrange to read until EOF, or StringInfo data string lenth limit, is reached instead. Original complaint and patch by me, with significant review, corrections, advice, and code optimizations by Tom Lane. Backpatched to v11. Prior to that only paths relative to the data and log dirs were allowed for files, so no "zero length" files were reachable anyway. Reviewed-By: Tom Lane Discussion: https://postgr.es/m/flat/969b8d82-5bb2-5fa8-4eb1-f0e685c5d736%40joeconway.com Backpatch-through: 11
* Fix mishandling of NaN counts in numeric_[avg_]combine.Tom Lane2020-06-11
| | | | | | | | | | | | | | | | | | | | | | When merging two NumericAggStates, the code missed adding the new state's NaNcount unless its N was also nonzero; since those counts are independent, this is wrong. This would only have visible effect if some partial aggregate scans found only NaNs while earlier ones found only non-NaNs; then we could end up falsely deciding that there were no NaNs and fail to return a NaN final result as expected. That's pretty improbable, so it's no surprise this hasn't been reported from the field. Still, it's a bug. I didn't try to produce a regression test that would show the bug, but I did notice that these functions weren't being reached at all in our regression tests, so I improved the tests to at least exercise them. With these additions, I see pretty complete code coverage on the aggregation-related functions in numeric.c. Back-patch to 9.6 where this code was introduced. (I only added the improved test case as far back as v10, though, since the relevant part of aggregates.sql isn't there at all in 9.6.)
* Reject "23:59:60.nnn" in datetime input.Tom Lane2020-06-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It's intentional that we don't allow values greater than 24 hours, while we do allow "24:00:00" as well as "23:59:60" as inputs. However, the range check was miscoded in such a way that it would accept "23:59:60.nnn" with a nonzero fraction. For time or timetz, the stored result would then be greater than "24:00:00" which would fail dump/reload, not to mention possibly confusing other operations. Fix by explicitly calculating the result and making sure it does not exceed 24 hours. (This calculation is redundant with what will happen later in tm2time or tm2timetz. Maybe someday somebody will find that annoying enough to justify refactoring to avoid the duplication; but that seems too invasive for a back-patched bug fix, and the cost is probably unmeasurable anyway.) Note that this change also rejects such input as the time portion of a timestamp(tz) value. Back-patch to v10. The bug is far older, but to change this pre-v10 we'd need to ensure that the logic behaves sanely with float timestamps, which is possibly nontrivial due to roundoff considerations. Doesn't really seem worth troubling with. Per report from Christoph Berg. Discussion: https://postgr.es/m/20200520125807.GB296739@msg.df7cb.de
* Fix use-after-release mistake in currtid() and currtid2() for viewsMichael Paquier2020-06-01
| | | | | | | | | | This issue has been present since the introduction of this code as of a3519a2 from 2002, and has been found by buildfarm member prion that uses RELCACHE_FORCE_RELEASE via the tests introduced recently in e786be5. Discussion: https://postgr.es/m/20200601022055.GB4121@paquier.xyz Backpatch-through: 9.5
* Add CHECK_FOR_INTERRUPTS() to the repeat() functionJoe Conway2020-05-28
| | | | | | | | | | | The repeat() function loops for potentially a long time without ever checking for interrupts. This prevents, for example, a query cancel from interrupting until the work is all done. Fix by inserting a CHECK_FOR_INTERRUPTS() into the loop. Backpatch to all supported versions. Discussion: https://www.postgresql.org/message-id/flat/8692553c-7fe8-17d9-cbc1-7cddb758f4c6%40joeconway.com
* Fix the MSVC build for versions 2015 and later.Amit Kapila2020-05-14
| | | | | | | | | | | | | | | | | | | Visual Studio 2015 and later versions should still be able to do the same as Visual Studio 2012, but the declaration of locale_name is missing in _locale_t, causing the code compilation to fail, hence this falls back instead on to enumerating all system locales by using EnumSystemLocalesEx to find the required locale name.  If the input argument is in Unix-style then we can get ISO Locale name directly by using GetLocaleInfoEx() with LCType as LOCALE_SNAME. In passing, change the documentation references of the now obsolete links. Note that this problem occurs only with NLS enabled builds. Author: Juan José Santamaría Flecha, Davinder Singh and Amit Kapila Reviewed-by: Ranier Vilela and Amit Kapila Backpatch-through: 9.5 Discussion: https://postgr.es/m/CAHzhFSFoJEWezR96um4-rg5W6m2Rj9Ud2CNZvV4NWc9tXV7aXQ@mail.gmail.com
* Fix YA text phrase search bug.Tom Lane2020-05-07
| | | | | | | | | | | | | | | | | | | checkcondition_str() failed to report multiple matches for a prefix pattern correctly: it would dutifully merge the match positions, but then after exiting that loop, if the last prefix-matching word had had no suitable positions, it would report there were no matches. The upshot would be failing to recognize a match that the query should match. It looks like you need all of these conditions to see the bug: * a phrase search (else we don't ask for match position details) * a prefix search item (else we don't get to this code) * a weight restriction (else checkclass_str won't fail) Noted while investigating a problem report from Pavel Borisov, though this is distinct from the issue he was on about. Back-patch to 9.6 where phrase search was added.
* Get rid of trailing semicolons in C macro definitions.Tom Lane2020-05-01
| | | | | | | | | | | | | | | | | | | | | | | Writing a trailing semicolon in a macro is almost never the right thing, because you almost always want to write a semicolon after each macro call instead. (Even if there was some reason to prefer not to, pgindent would probably make a hash of code formatted that way; so within PG the rule should basically be "don't do it".) Thus, if we have a semi inside the macro, the compiler sees "something;;". Much of the time the extra empty statement is harmless, but it could lead to mysterious syntax errors at call sites. In perhaps an overabundance of neatnik-ism, let's run around and get rid of the excess semicolons whereever possible. The only thing worse than a mysterious syntax error is a mysterious syntax error that only happens in the back branches; therefore, backpatch these changes where relevant, which is most of them because most of these mistakes are old. (The lack of reported problems shows that this is largely a hypothetical issue, but still, it could bite us in some future patch.) John Naylor and Tom Lane Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
* Fix full text search to handle NOT above a phrase search correctly.Tom Lane2020-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Queries such as '!(foo<->bar)' failed to find matching rows when implemented as a GiST or GIN index search. That's because of failing to handle phrase searches as tri-valued when considering a query without any position information for the target tsvector. We can only say that the phrase operator might match, not that it does match; and therefore its NOT also might match. The previous coding incorrectly inverted the approximate phrase result to decide that there was certainly no match. To fix, we need to make TS_phrase_execute return a real ternary result, and then bubble that up accurately in TS_execute. As long as we have to do that anyway, we can simplify the baroque things TS_phrase_execute was doing internally to manage tri-valued searching with only a bool as explicit result. For now, I left the externally-visible result of TS_execute as a plain bool. There do not appear to be any outside callers that need to distinguish a three-way result, given that they passed in a flag saying what to do in the absence of position data. This might need to change someday, but we wouldn't want to back-patch such a change. Although tsginidx.c has its own TS_execute_ternary implementation for use at upper index levels, that sadly managed to get this case wrong as well :-(. Fixing it is a lot easier fortunately. Per bug #16388 from Charles Offenbacher. Back-patch to 9.6 where phrase search was introduced. Discussion: https://postgr.es/m/16388-98cffba38d0b7e6e@postgresql.org
* Allow pg_read_all_stats to access all stats views againMagnus Hagander2020-04-20
| | | | | | | | | | | | The views pg_stat_progress_* had not gotten the memo that pg_read_all_stats is supposed to be able to read all statistics. Also make a pass over all text-returning pg_stat_xyz functions that could return "insufficient privilege" and make sure they also respect pg_read_all_status. Reported-by: Andrey M. Borodin Reviewed-by: Andrey M. Borodin, Kyotaro Horiguchi Discussion: https://postgr.es/m/13145F2F-8458-4977-9D2D-7B2E862E5722@yandex-team.ru
* Fix circle_in to accept "(x,y),r" as it's advertised to do.Tom Lane2020-04-07
| | | | | | | | | | | Our documentation describes four allowed input syntaxes for circles, but the regression tests tried only three ... with predictable consequences. Remarkably, this has been wrong since the circle datatype was added in 1997, but nobody noticed till now. David Zhang, with some help from me Discussion: https://postgr.es/m/332c47fa-d951-7574-b5cc-a8f7f7201202@highgo.ca
* Adjust bytea get_bit/set_bit to cope with bytea strings > 256MB.Tom Lane2020-04-07
| | | | | | | | | | | | | | | | | | | Since the existing bit number argument can't exceed INT32_MAX, it's not possible for these functions to manipulate bits beyond the first 256MB of a bytea value. However, it'd be good if they could do at least that much, and not fall over entirely for longer bytea values. Adjust the comparisons to be done in int64 arithmetic so that works. Also tweak the error reports to show sane values in case of overflow. Also add some test cases to improve the miserable code coverage of these functions. Apply patch to back branches only; HEAD has a better solution as of commit 26a944cf2. Extracted from a much larger patch by Movead Li Discussion: https://postgr.es/m/20200312115135445367128@highgo.ca
* Teach pg_ls_dir_files() to ignore ENOENT failures from stat().Tom Lane2020-03-31
| | | | | | | | | | | | | | | | Buildfarm experience shows that this function can fail with ENOENT if some other process unlinks a file between when we read the directory entry and when we try to stat() it. The problem is old but we had not noticed it until 085b6b667 added regression test coverage. To fix, just ignore ENOENT failures. There is one other case that this might hide: a symlink that points to nowhere. That seems okay though, at least better than erroring. Back-patch to v10 where this function was added, since the regression test cases were too. Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Avoid holding a directory FD open across assorted SRF calls.Tom Lane2020-03-16
| | | | | | | | | | | | | | | | | This extends the fixes made in commit 085b6b667 to other SRFs with the same bug, namely pg_logdir_ls(), pgrowlocks(), pg_timezone_names(), pg_ls_dir(), and pg_tablespace_databases(). Also adjust various comments and documentation to warn against expecting to clean up resources during a ValuePerCall SRF's final call. Back-patch to all supported branches, since these functions were all born broken. Justin Pryzby, with cosmetic tweaks by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Avoid holding a directory FD open across pg_ls_dir_files() calls.Tom Lane2020-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This coding technique is undesirable because (a) it leaks the FD for the rest of the transaction if the SRF is not run to completion, and (b) allocated FDs are a scarce resource, but multiple interleaved uses of the relevant functions could eat many such FDs. In v11 and later, a query such as "SELECT pg_ls_waldir() LIMIT 1" yields a warning about the leaked FD, and the only reason there's no warning in earlier branches is that fd.c didn't whine about such leaks before commit 9cb7db3f0. Even disregarding the warning, it wouldn't be too hard to run a backend out of FDs with careless use of these SQL functions. Hence, rewrite the function so that it reads the directory within a single call, returning the results as a tuplestore rather than via value-per-call mode. There are half a dozen other built-in SRFs with similar problems, but let's fix this one to start with, just to see if the buildfarm finds anything wrong with the code. In passing, fix bogus error report for stat() failure: it was whining about the directory when it should be fingering the individual file. Doubtless a copy-and-paste error. Back-patch to v10 where this function was added. Justin Pryzby, with cosmetic tweaks and test cases by me Discussion: https://postgr.es/m/20200308173103.GC1357@telsasoft.com
* Fix an oversight in commit 4c70098ff.Tom Lane2020-01-23
| | | | | | | | | | | | | | | | | | | | I had supposed that the from_char_seq_search() call sites were all passing the constant arrays you'd expect them to pass ... but on looking closer, the one for DY format was passing the days[] array not days_short[]. This accidentally worked because the day abbreviations in English are all the same as the first three letters of the full day names. However, once we took out the "maximum comparison length" logic, it stopped working. As penance for that oversight, add regression test cases covering this, as well as every other switch case in DCH_from_char() that was not reached according to the code coverage report. Also, fold the DCH_RM and DCH_rm cases into one --- now that seq_search is case independent, there's no need to pass different comparison arrays for those cases. Back-patch, as the previous commit was.
* Clean up formatting.c's logic for matching constant strings.Tom Lane2020-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | seq_search(), which is used to match input substrings to constants such as month and day names, had a lot of bizarre and unnecessary behaviors. It was mostly possible to avert our eyes from that before, but we don't want to duplicate those behaviors in the upcoming patch to allow recognition of non-English month and day names. So it's time to clean this up. In particular: * seq_search scribbled on the input string, which is a pretty dangerous thing to do, especially in the badly underdocumented way it was done here. Fortunately the input string is a temporary copy, but that was being made three subroutine levels away, making it something easy to break accidentally. The behavior is externally visible nonetheless, in the form of odd case-folding in error reports about unrecognized month/day names. The scribbling is evidently being done to save a few calls to pg_tolower, but that's such a cheap function (at least for ASCII data) that it's pretty pointless to worry about. In HEAD I switched it to be pg_ascii_tolower to ensure it is cheap in all cases; but there are corner cases in Turkish where this'd change behavior, so leave it as pg_tolower in the back branches. * seq_search insisted on knowing the case form (all-upper, all-lower, or initcap) of the constant strings, so that it didn't have to case-fold them to perform case-insensitive comparisons. This likewise seems like excessive micro-optimization, given that pg_tolower is certainly very cheap for ASCII data. It seems unsafe to assume that we know the case form that will come out of pg_locale.c for localized month/day names, so it's better just to define the comparison rule as "downcase all strings before comparing". (The choice between downcasing and upcasing is arbitrary so far as English is concerned, but it might not be in other locales, so follow citext's lead here.) * seq_search also had a parameter that'd cause it to report a match after a maximum number of characters, even if the constant string were longer than that. This was not actually used because no caller passed a value small enough to cut off a comparison. Replicating that behavior for localized month/day names seems expensive as well as useless, so let's get rid of that too. * from_char_seq_search used the maximum-length parameter to truncate the input string in error reports about not finding a matching name. This leads to rather confusing reports in many cases. Worse, it is outright dangerous if the input string isn't all-ASCII, because we risk truncating the string in the middle of a multibyte character. That'd lead either to delivering an illegible error message to the client, or to encoding-conversion failures that obscure the actual data problem. Get rid of that in favor of truncating at whitespace if any (a suggestion due to Alvaro Herrera). In addition to fixing these things, I const-ified the input string pointers of DCH_from_char and its subroutines, to make sure there aren't any other scribbling-on-input problems. The risk of generating a badly-encoded error message seems like enough of a bug to justify back-patching, so patch all supported branches. Discussion: https://postgr.es/m/29432.1579731087@sss.pgh.pa.us
* Fix edge-case crashes and misestimation in range containment selectivity.Tom Lane2020-01-12
| | | | | | | | | | | | | | | | | | | | | | | | When estimating the selectivity of "range_var <@ range_constant" or "range_var @> range_constant", if the upper (or respectively lower) bound of the range_constant was above the last bin of the range_var's histogram, the code would access uninitialized memory and potentially crash (though it seems the probability of a crash is quite low). Handle the endpoint cases explicitly to fix that. While at it, be more paranoid about the possibility of getting NaN or other silly results from the range type's subdiff function. And improve some comments. Ordinarily we'd probably add a regression test case demonstrating the bug in unpatched code. But it's too hard to get it to crash reliably because of the uninitialized-memory dependence, so skip that. Per bug #16122 from Adam Scott. It's been broken from the beginning, apparently, so backpatch to all supported branches. Diagnosis by Michael Paquier, patch by Andrey Borodin and Tom Lane. Discussion: https://postgr.es/m/16122-eb35bc248c806c15@postgresql.org
* Reimplement nullification of walsender timestampAlvaro Herrera2020-01-08
| | | | | | | | | | | | | | | Make the value null only at pg_stat_activity-output time, as suggested by Tom Lane, instead of messing with the internal state. This should appease buildfarm members with force_parallel_mode=regress, which are running parallel queries on logical replication walsenders. The fact that walsenders can run parallel queries should perhaps be studied more carefully, but for the moment let's get rid of the red blots in buildfarm. Backpatch to pg10, like the previous commit. Discussion: https://postgr.es/m/30804.1578438763@sss.pgh.pa.us
* Fix EXTRACT(ISOYEAR FROM timestamp) for years BC.Tom Lane2019-12-12
| | | | | | | | | The test cases added by commit 26ae3aa80 exposed an old oversight in timestamp[tz]_part: they didn't correct the result of date2isoyear() for BC years, so that we produced an off-by-one answer for such years. Fix that, and back-patch to all supported branches. Discussion: https://postgr.es/m/SG2PR06MB37762CAE45DB0F6CA7001EA9B6550@SG2PR06MB3776.apcprd06.prod.outlook.com
* Remove redundant function calls in timestamp[tz]_part().Tom Lane2019-12-12
| | | | | | | | | | | | | | | | | | | | The DTK_DOW/DTK_ISODOW and DTK_DOY switch cases in timestamp_part() and timestamptz_part() contained calls of timestamp2tm() that were fully redundant with the ones done just above the switch. This evidently crept in during commit 258ee1b63, which relocated that code from another place where the calls were indeed needed. Just delete the redundant calls. I (tgl) noted that our test coverage of these functions left quite a bit to be desired, so extend timestamp.sql and timestamptz.sql to cover all the branches. Back-patch to all supported branches, as the previous commit was. There's no real issue here other than some wasted cycles in some not-too-heavily-used code paths, but the test coverage seems valuable. Report and patch by Li Japin; test case adjustments by me. Discussion: https://postgr.es/m/SG2PR06MB37762CAE45DB0F6CA7001EA9B6550@SG2PR06MB3776.apcprd06.prod.outlook.com
* Allow access to child table statistics if user can read parent table.Tom Lane2019-11-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fix for CVE-2017-7484 disallowed use of pg_statistic data for planning purposes if the user would not be able to select the associated column and a non-leakproof function is to be applied to the statistics values. That turns out to disable use of pg_statistic data in some common cases involving inheritance/partitioning, where the user does have permission to select from the parent table that was actually named in the query, but not from a child table whose stats are needed. Since, in non-corner cases, the user *can* select the child table's data via the parent, this restriction is not actually useful from a security standpoint. Improve the logic so that we also check the permissions of the originally-named table, and allow access if select permission exists for that. When checking access to stats for a simple child column, we can map the child column number back to the parent, and perform this test exactly (including not allowing access if the child column isn't exposed by the parent). For expression indexes, the current logic just insists on whole-table select access, and this patch allows access if the user can select the whole parent table. In principle, if the child table has extra columns, this might allow access to stats on columns the user can't read. In practice, it's unlikely that the planner is going to do any stats calculations involving expressions that are not visible to the query, so we'll ignore that fine point for now. Perhaps someday we'll improve that logic to detect exactly which columns are used by an expression index ... but today is not that day. Back-patch to v11. The issue was created in 9.2 and up by the CVE-2017-7484 fix, but this patch depends on the append_rel_array[] planner data structure which only exists in v11 and up. In practice the issue is most urgent with partitioned tables, so fixing v11 and later should satisfy much of the practical need. Dilip Kumar and Amit Langote, with some kibitzing by me Discussion: https://postgr.es/m/3876.1531261875@sss.pgh.pa.us
* Defend against self-referential views in relation_is_updatable().Tom Lane2019-11-21
| | | | | | | | | | | | | | | | | | | | | While a self-referential view doesn't actually work, it's possible to create one, and it turns out that this breaks some of the information_schema views. Those views call relation_is_updatable(), which neglected to consider the hazards of being recursive. In older PG versions you get a "stack depth limit exceeded" error, but since v10 it'd recurse to the point of stack overrun and crash, because commit a4c35ea1c took out the expression_returns_set() call that was incidentally checking the stack depth. Since this function is only used by information_schema views, it seems like it'd be better to return "not updatable" than suffer an error. Hence, add tracking of what views we're examining, in just the same way that the nearby fireRIRrules() code detects self-referential views. I added a check_stack_depth() call too, just to be defensive. Per private report from Manuel Rigger. Back-patch to all supported versions.
* Provide statistics for hypothetical BRIN indexesMichael Paquier2019-11-21
| | | | | | | | | | | | | | | | | | | | | | Trying to use hypothetical indexes with BRIN currently fails when trying to access a relation that does not exist when looking for the statistics. With the current API, it is not possible to easily pass a value for pages_per_range down to the hypothetical index, so this makes use of the default value of BRIN_DEFAULT_PAGES_PER_RANGE, which should be fine enough in most cases. Being able to refine or enforce the hypothetical costs in more optimistic ways would require more refactoring by filling in the statistics when building IndexOptInfo in plancat.c. This would involve ABI breakages around the costing routines, something not fit for stable branches. This is broken since 7e534ad, so backpatch down to v10. Author: Julien Rouhaud, Heikki Linnakangas Reviewed-by: Álvaro Herrera, Tom Lane, Michael Paquier Discussion: https://postgr.es/m/CAOBaU_ZH0LKEA8VFCocr6Lpte1ab0b6FpvgS0y4way+RPSXfYg@mail.gmail.com Backpatch-through: 10
* Further fix dumping of views that contain just VALUES(...).Tom Lane2019-11-16
| | | | | | | | | | | | | | | | | | | | | | | It turns out that commit e9f1c01b7 missed a case: we must print a VALUES clause in long format if get_query_def is given a resultDesc that would require the query's output column name(s) to be different from what the bare VALUES clause would produce. This applies in case an ALTER ... RENAME COLUMN has been done to a view that formerly could be printed in simple format, as shown in the added regression test case. It also explains bug #16119 from Dmitry Telpt, because it turns out that (unlike CREATE VIEW) CREATE MATERIALIZED VIEW fails to apply any column aliases it's given to the stored ON SELECT rule. So to get them to be printed, we have to account for the resultDesc renaming. It might be worth changing the matview code so that it creates the ON SELECT rule with the correct aliases; but we'd still need these messy checks in get_simple_values_rte to handle the case of a subsequent column rename, so any such change would be just neatnik-ism not a bug fix. Like the previous patch, back-patch to all supported branches. Discussion: https://postgr.es/m/16119-e64823f30a45a754@postgresql.org
* Skip system attributes when applying mvdistinct statsTomas Vondra2019-11-16
| | | | | | | | | | | | | | | | | When estimating number of distinct groups, we failed to ignore system attributes when matching the group expressions to mvdistinct stats, causing failures like ERROR: negative bitmapset member not allowed Fix that by simply skipping anything that is not a regular attribute. Backpatch to PostgreSQL 10, where the extended stats were introduced. Bug: #16111 Reported-by: Tuomas Leikola Author: Tomas Vondra Backpatch-through: 10 Discussion: https://postgr.es/m/16111-687799584c3a7e73@postgresql.org
* Fix integer-overflow edge case detection in interval_mul and pgbench.Tom Lane2019-11-07
| | | | | | | | | | | | | | | | | | This patch adopts the overflow check logic introduced by commit cbdb8b4c0 into two more places. interval_mul() failed to notice if it computed a new microseconds value that was one more than INT64_MAX, and pgbench's double-to-int64 logic had the same sorts of edge-case problems that cbdb8b4c0 fixed in the core code. To make this easier to get right in future, put the guts of the checks into new macros in c.h, and add commentary about how to use the macros correctly. Back-patch to all supported branches, as we did with the previous fix. Yuya Watari Discussion: https://postgr.es/m/CAJ2pMkbkkFw2hb9Qb1Zj8d06EhWAQXFLy73St4qWv6aX=vqnjw@mail.gmail.com
* Catch invalid typlens in a couple of placesPeter Eisentraut2019-11-04
| | | | | | | | | | Rearrange the logic in record_image_cmp() and record_image_eq() to error out on unexpected typlens (either not supported there or completely invalid due to corruption). Barring corruption, this is not possible today but it seems more future-proof and robust to fix this. Reported-by: Peter Geoghegan <pg@bowt.ie>
* Avoid failure when selecting a namespace node in XMLTABLE.Tom Lane2019-10-25
| | | | | | | | | | | | | | | It appears that libxml2 doesn't bother to set the "children" field of an XML_NAMESPACE_DECL node to null; that field just contains garbage. In v10 and v11, this can result in a crash in XMLTABLE(). The rewrite done in commit 251cf2e27 fixed this, somewhat accidentally, in v12. We're not going to back-patch 251cf2e27, however. The case apparently doesn't have wide use, so rather than risk introducing other problems, just add a safety check to throw an error. Even though no bug manifests in v12/HEAD, add the relevant test case there too, to prevent future regressions. Chapman Flack (per private report)
* Fix bitshiftright()'s zero-padding some more.Tom Lane2019-10-04
| | | | | | | | | | | | | | | Commit 5ac0d9360 failed to entirely fix bitshiftright's habit of leaving one-bits in the pad space that should be all zeroes, because in a moment of sheer brain fade I'd concluded that only the code path used for not-a-multiple-of-8 shift distances needed to be fixed. Of course, a multiple-of-8 shift distance can also cause the problem, so we need to forcibly zero the extra bits in both cases. Per bug #16037 from Alexander Lakhin. As before, back-patch to all supported branches. Discussion: https://postgr.es/m/16037-1d1ebca564db54f4@postgresql.org
* Fix failure to zero-pad the result of bitshiftright().Tom Lane2019-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the bitstring length is not a multiple of 8, we'd shift the rightmost bits into the pad space, which must be zeroes --- bit_cmp, for one, depends on that. This'd lead to the result failing to compare equal to what it should compare equal to, as reported in bug #16013 from Daryl Waycott. This is, if memory serves, not the first such bug in the bitstring functions. In hopes of making it the last one, do a bit more work than minimally necessary to fix the bug: * Add assertion checks to bit_out() and varbit_out() to complain if they are given incorrectly-padded input. This will improve the odds that manual testing of any new patch finds problems. * Encapsulate the padding-related logic in macros to make it easier to use. Also, remove unnecessary padding logic from bit_or() and bitxor(). Somebody had already noted that we need not re-pad the result of bit_and() since the inputs are required to be the same length, but failed to extrapolate that to the other two. Also, move a comment block that once was near the head of varbit.c (but people kept putting other stuff in front of it), to put it in the header block. Note for the release notes: if anyone has inconsistent data as a result of saving the output of bitshiftright() in a table, it's possible to fix it with something like UPDATE mytab SET bitcol = ~(~bitcol) WHERE bitcol != ~(~bitcol); This has been broken since day one, so back-patch to all supported branches. Discussion: https://postgr.es/m/16013-c2765b6996aacae9@postgresql.org
* Fix memory leak with lower, upper and initcap with ICU-provided collationsMichael Paquier2019-09-03
| | | | | | | | | | | | | | The leak happens in str_tolower, str_toupper and str_initcap, which are used in several places including their equivalent SQL-level functions, and can only be triggered when using an ICU-provided collation when converting the input string. b615920 fixed a similar leak. Backpatch down 10 where ICU collations have been introduced. Author: Konstantin Knizhnik Discussion: https://postgr.es/m/94c0ad0a-cbc2-e4a3-7829-2bdeaf9146db@postgrespro.ru Backpatch-through: 10
* Restore json{b}_populate_record{set}'s ability to take type info from AS.Tom Lane2019-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the record argument is NULL and has no declared type more concrete than RECORD, we can't extract useful information about the desired rowtype from it. In this case, see if we're in FROM with an AS clause, and if so extract the needed rowtype info from AS. It worked like this before v11, but commit 37a795a60 removed the behavior, reasoning that it was undocumented, inefficient, and utterly not self-consistent. If you want to take type info from an AS clause, you should be using the json_to_record() family of functions not the json_populate_record() family. Also, it was already the case that the "populate" functions would fail for a null-valued RECORD input (with an unfriendly "record type has not been registered" error) when there wasn't an AS clause at hand, and it wasn't obvious that that behavior wasn't OK when there was one. However, it emerges that some people were depending on this to work, and indeed the rather off-point error message you got if you left off AS encouraged slapping on AS without switching to the json_to_record() family. Hence, put back the fallback behavior of looking for AS. While at it, improve the run-time error you get when there's no place to obtain type info; we can do a lot better than "record type has not been registered". (We can't, unfortunately, easily improve the parse-time error message that leads people down this path in the first place.) While at it, I refactored the code a bit to avoid duplicating the same logic in several different places. Per bug #15940 from Jaroslav Sivy. Back-patch to v11 where the current coding came in. (The pre-v11 deficiencies in this area aren't regressions, so we'll leave those branches alone.) Patch by me, based on preliminary analysis by Dmitry Dolgov. Discussion: https://postgr.es/m/15940-2ab76dc58ffb85b6@postgresql.org
* Fix planner's test for case-foldable characters in ILIKE with ICU.Tom Lane2019-08-12
| | | | | | | | | | | | | As coded, the ICU-collation path in pattern_char_isalpha() failed to consider regular ASCII letters to be case-varying. This led to like_fixed_prefix treating too much of an ILIKE pattern as being a fixed prefix, so that indexscans derived from an ILIKE clause might miss entries that they should find. Per bug #15892 from James Inform. This is an oversight in the original ICU patch (commit eccfef81e), so back-patch to v10 where that came in. Discussion: https://postgr.es/m/15892-e5d2bea3e8a04a1b@postgresql.org
* Fix some incorrect parsing of time with time zone stringsMichael Paquier2019-08-07
| | | | | | | | | | | | | | | | | | | When parsing a timetz string with a dynamic timezone abbreviation or a timezone not specified, it was possible to generate incorrect timestamps based on a date which uses some non-initialized variables if the input string did not specify fully a date to parse. This is already checked when a full timezone spec is included in the input string, but the two other cases mentioned above missed the same checks. This gets fixed by generating an error as this input is invalid, or in short when a date is not fully specified. Valgrind was complaining about this problem. Bug: #15910 Author: Alexander Lakhin Discussion: https://postgr.es/m/15910-2eba5106b9aa0c61@postgresql.org Backpatch-through: 9.4
* Require the schema qualification in pg_temp.type_name(arg).Noah Misch2019-08-05
| | | | | | | | | | | | Commit aa27977fe21a7dfa4da4376ad66ae37cb8f0d0b5 introduced this restriction for pg_temp.function_name(arg); do likewise for types created in temporary schemas. Programs that this breaks should add "pg_temp." schema qualification or switch to arg::type_name syntax. Back-patch to 9.4 (all supported versions). Reviewed by Tom Lane. Reported by Tom Lane. Security: CVE-2019-10208
* Tweak our special-case logic for the IANA "Factory" timezone.Tom Lane2019-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pg_timezone_names() tries to avoid showing the "Factory" zone in the view, mainly because that has traditionally had a very long "abbreviation" such as "Local time zone must be set--see zic manual page", so that showing it messes up psql's formatting of the whole view. Since tzdb version 2016g, IANA instead uses the abbreviation "-00", which is sane enough that there's no reason to discriminate against it. On the other hand, it emerges that FreeBSD and possibly other packagers are so wedded to backwards compatibility that they hack the IANA data to keep the old spelling --- and not just that old spelling, but even older spellings that IANA used back in the stone age. This caused the filter logic to fail to suppress "Factory" at all on such platforms, though the formatting problem is definitely real in that case. To solve both problems, get rid of the hard-wired assumption about exactly what Factory's abbreviation is, and instead reject abbreviations exceeding 31 characters. This will allow Factory to appear in the view if and only if it's using the modern abbreviation. In passing, simplify the code we add to zic.c to support "zic -P" to remove its now-obsolete hacks to not print the Factory zone's abbreviation. Unlike pg_timezone_names(), there's no reason for that code to support old/nonstandard timezone data. Since we generally prefer to keep timezone-related behavior the same in all branches, and since this is arguably a bug fix, back-patch to all supported branches. Discussion: https://postgr.es/m/3961.1564086915@sss.pgh.pa.us
* Fix loss of fractional digits for large values in cash_numeric().Tom Lane2019-07-26
| | | | | | | | | | | | | | | | Money values exceeding about 18 digits (depending on lc_monetary) could be inaccurately converted to numeric, due to select_div_scale() deciding it didn't need to compute any fractional digits. Force its hand by setting the dscale of one division input to equal the number of fractional digits we need. In passing, rearrange the logic to not do useless work in locales where money values are considered integral. Per bug #15925 from Slawomir Chodnicki. Back-patch to all supported branches. Discussion: https://postgr.es/m/15925-da9953e2674bb5c8@postgresql.org
* Fix error in commit e6feef57.Jeff Davis2019-07-18
| | | | | | | I was careless passing a datum directly to DATE_NOT_FINITE without calling DatumGetDateADT() first. Backpatch-through: 9.4
* Fix daterange canonicalization for +/- infinity.Jeff Davis2019-07-18
| | | | | | | | | | | | | | | | | | | The values 'infinity' and '-infinity' are a part of the DATE type itself, so a bound of the date 'infinity' is not the same as an unbounded/infinite range. However, it is still wrong to try to canonicalize such values, because adding or subtracting one has no effect. Fix by treating 'infinity' and '-infinity' the same as unbounded ranges for the purposes of canonicalization (but not other purposes). Backpatch to all versions because it is inconsistent with the documented behavior. Note that this could be an incompatibility for applications relying on the behavior contrary to the documentation. Author: Laurenz Albe Reviewed-by: Thomas Munro Discussion: https://postgr.es/m/77f24ea19ab802bc9bc60ddbb8977ee2d646aec1.camel%40cybertec.at Backpatch-through: 9.4
* Fix get_actual_variable_range() to cope with broken HOT chains.Tom Lane2019-07-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3ca930fc3 modified get_actual_variable_range() to use a new "SnapshotNonVacuumable" snapshot type for selecting tuples that it would consider valid. However, because that snapshot type can accept recently-dead tuples, this caused a bug when using a recently-created index: we might accept a recently-dead tuple that is an early member of a broken HOT chain and does not actually match the index entry. Then, the data extracted from the heap tuple would not necessarily be an endpoint value of the column; it could even be NULL, leading to get_actual_variable_range() itself reporting "found unexpected null value in index". Even without an error, this could lead to poor plan choices due to an erroneous notion of the endpoint value. We can improve matters by changing the code to use the index-only scan technique (which didn't exist when get_actual_variable_range was originally written). If any of the tuples in a HOT chain are live enough to satisfy SnapshotNonVacuumable, we take the data from the index entry, ignoring what is in the heap. This fixes the problem without changing the live-vs-dead-tuple behavior from what was intended by commit 3ca930fc3. A side benefit is that for static tables we might not have to touch the heap at all (when the extremal value is in an all-visible page). In addition, we can save some overhead by not having to create a complete ExecutorState, and we don't need to run FormIndexDatum, avoiding more cycles as well as the possibility of failure for indexes on expressions. (I'm not sure that this code would ever be used to determine the extreme value of an expression, in the current state of the planner; but it's definitely possible that lower-order columns of the selected index could be expressions. So one could construct perhaps-artificial examples in which the old code unexpectedly failed due to trying to compute an expression's value for a now-dead row.) Per report from Manuel Rigger. Back-patch to v11 where commit 3ca930fc3 came in. Discussion: https://postgr.es/m/CA+u7OA7W4NWEhCvftdV6_8bbm2vgypi5nuxfnSEJQqVKFSUoMg@mail.gmail.com
* Don't read fields of a misaligned ExpandedObjectHeader or AnyArrayType.Noah Misch2019-06-30
| | | | | | | | | | | UBSan complains about this. Instead, cast to a suitable type requiring only 4-byte alignment. DatumGetAnyArrayP() already assumes one can cast between AnyArrayType and ArrayType, so this doesn't introduce a new assumption. Back-patch to 9.5, where AnyArrayType was introduced. Reviewed by Tom Lane. Discussion: https://postgr.es/m/20190629210334.GA1244217@rfd.leadboat.com