aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils
Commit message (Collapse)AuthorAge
* Cope with inplace update making catcache stale during TOAST fetch.Noah Misch2024-06-27
| | | | | | | | | | | | | | | | This extends ad98fb14226ae6456fbaed7990ee7591cbe5efd2 to invals of inplace updates. Trouble requires an inplace update of a catalog having a TOAST table, so only pg_database was at risk. (The other catalog on which core code performs inplace updates, pg_class, has no TOAST table.) Trouble would require something like the inplace-inval.spec test. Consider GRANT ... ON DATABASE fetching a stale row from cache and discarding a datfrozenxid update that vac_truncate_clog() has already relied upon. Back-patch to v12 (all supported versions). Reviewed (in an earlier version) by Robert Haas. Discussion: https://postgr.es/m/20240114201411.d0@rfd.leadboat.com Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
* Avoid crashing when a JIT-inlined backend function throws an error.Tom Lane2024-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | errfinish() assumes that the __FUNC__ and __FILE__ arguments it's passed are compile-time constant strings that can just be pointed to rather than physically copied. However, it's possible for LLVM to generate code in which those pointers point into a dynamically loaded code segment. If that segment gets unloaded before we're done with the ErrorData struct, we have dangling pointers that will lead to SIGSEGV. In simple cases that won't happen, because we won't unload LLVM code before end of transaction. But it's possible to happen if the error is thrown within end-of-transaction code run by _SPI_commit or _SPI_rollback, because since commit 2e517818f those functions clean up by ending the transaction and starting a new one. Rather than fixing this by adding pstrdup() overhead to every elog/ereport sequence, let's fix it by copying the risky pointers in CopyErrorData(). That solves it for _SPI_commit/_SPI_rollback because they use that function to preserve the error data across the transaction end/restart sequence; and it seems likely that any other code doing something similar would need to do that too. I'm suspicious that this behavior amounts to an LLVM bug (or a bug in our use of it?), because it implies that string constant references that should be pointer-equal according to a naive understanding of C semantics will sometimes not be equal. However, even if it is a bug and someday gets fixed, we'll have to cope with the current behavior for a long time to come. Report and patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/1565654.1719425368@sss.pgh.pa.us
* Fix parsing of ignored operators in websearch_to_tsquery().Tom Lane2024-06-13
| | | | | | | | | | | | | | | | | | | | | | | | | The manual says clearly that punctuation in the input of websearch_to_tsquery() is ignored, except for the special cases of dashes and quotes. However, this failed for cases like "(foo bar) or something", or in general an ISOPERATOR character in front of the "or". We'd switch back to WAITOPERAND state, then ignore the operator character while remaining in that state, and then reach the "or" in WAITOPERAND state which (intentionally) makes us treat it as data. The fix is simple enough: if we see an ISOPERATOR character while in WAITOPERATOR state, we have to skip it while staying in that state. (We don't need to worry about other punctuation characters: those will be consumed as though they were words, but then rejected by lexizing.) In v14 and up (since commit eb086056f) we can simplify the code a bit more too, because there is no longer a reason for the WAITOPERAND state to distinguish between quoted and unquoted operands. Per bug #18479 from Manos Emmanouilidis. Back-patch to all supported branches. Discussion: https://postgr.es/m/18479-d9b46e2fc242c33e@postgresql.org
* Detect more overflows in timestamp[tz]_pl_interval.Tom Lane2024-04-28
| | | | | | | | | | | | | | | | | In commit 25cd2d640 I (tgl) opined that "The additions of the months and microseconds fields could also overflow, of course. However, I believe we need no additional checks there; the existing range checks should catch such cases". This is demonstrably wrong however for the microseconds field, and given that discovery it seems prudent to be paranoid about the months addition as well. Report and patch by Joseph Koshakow. As before, back-patch to all supported branches. (However, the test case doesn't work before v15 because we didn't allow wider-than-int32 numbers in interval literals. A variant test could probably be built that fits within that restriction, but it didn't seem worth the trouble.) Discussion: https://postgr.es/m/CAAvxfHf77sRHKoEzUw9_cMYSpbpNS2C+J_+8Dq4+0oi8iKopeA@mail.gmail.com
* Fix type-checking of RECORD-returning functions in FROM, redux.Tom Lane2024-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2ed8f9a01 intended to institute a policy that if a RangeTblFunction has a coldeflist, then the function return type is certainly RECORD, and we should use the coldeflist as the source of truth about what the columns of the record type are. When the original function has been folded to a constant, inspection of the constant might give a different answer. This situation will lead to a tuple-type-mismatch error at execution, but up until that point we need to consistently believe the coldeflist, or we'll have problems from different bits of code reaching different conclusions. expandRTE didn't get that memo though, and would try to produce a tupdesc based on the constant in this situation, leading to an assertion failure. (Desultory testing suggests that non-assert builds often manage to give the expected error, although I also saw a "cache lookup failed for type 0" error, and it seems at least possible that a crash could happen.) Some other callers of get_expr_result_type and get_expr_result_tupdesc were also being incautious about this. While none of them seem to have actual bugs, they're working harder than necessary in this case, besides which it seems safest to have an explicit policy of not using those functions on an RTE with a coldeflist. Adjust the code accordingly, and add commentary to funcapi.c about this policy. Also fix an obsolete comment that claimed "get_expr_result_type() doesn't know how to extract type info from a RECORD constant". That hasn't been true since commit d57534740. Per bug #18422 from Alexander Lakhin. As with the previous commit, back-patch to all supported branches. Discussion: https://postgr.es/m/18422-89ca86c8eac5246d@postgresql.org
* Allow "make check"-style testing to work with musl C library.Tom Lane2024-03-26
| | | | | | | | | | | | | | | | | | | | | The musl dynamic linker saves a pointer to the process' environment value of LD_LIBRARY_PATH very early in startup. When we move/clobber the environment to make more room for ps status strings, we clobber that value and thereby prevent libraries from being found via LD_LIBRARY_PATH, which breaks the use of a temporary installation for testing purposes. To fix, stop collecting usable space for ps status if we notice that the variable we are about to clobber is LD_LIBRARY_PATH. This will result in some reduction in how long the ps status can be, but it's only likely to occur in temporary test contexts, so it doesn't seem like a big problem. In any case, we don't have to do it if we see we are on glibc, which surely is where the majority of our Linux testing is done. Thomas Munro, Bruce Momjian, and Tom Lane, per report from Wolfgang Walther. Back-patch to all supported branches, with the hope that we'll set up a buildfarm animal to test on this platform. Discussion: https://postgr.es/m/fddd1cd6-dc16-40a2-9eb5-d7fef2101488@technowledgy.de
* Backpatch missing check_stack_depth() to some recursive functionsAlexander Korotkov2024-03-11
| | | | | | | Backpatch changes from d57b7cc333, 75bcba6cbd to all supported branches per proposal of Egor Chindyaskin. Discussion: https://postgr.es/m/DE5FD776-A8CD-4378-BCFA-3BF30F1F6D60%40mail.ru
* Revert "Fix parallel-safety check of expressions and predicate for index builds"Michael Paquier2024-03-07
| | | | | | | | | | | This reverts commit eae7be600be7, following a discussion with Tom Lane, due to concerns that this impacts the decisions made by the planner for the number of workers spawned based on the inlining and const-folding of index expressions and predicate for cases that would have worked until this commit. Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us Backpatch-through: 12
* Fix parallel-safety check of expressions and predicate for index buildsMichael Paquier2024-03-06
| | | | | | | | | | | | | | | | | | | | | | | | As coded, the planner logic that calculates the number of parallel workers to use for a parallel index build uses expressions and predicates from the relcache, which are flattened for the planner by eval_const_expressions(). As reported in the bug, an immutable parallel-unsafe function flattened in the relcache would become a Const, which would be considered as parallel-safe, even if the predicate or the expressions including the function are not safe in parallel workers. Depending on the expressions or predicate used, this could cause the parallel build to fail. Tests are included that check parallel index builds with parallel-unsafe predicate and expressions. Two routines are added to lsyscache.h to be able to retrieve expressions and predicate of an index from its pg_index data. Reported-by: Alexander Lakhin Author: Tender Wang Reviewed-by: Jian He, Michael Paquier Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com Backpatch-through: 12
* Fix integer underflow in shared memory debuggingDaniel Gustafsson2024-02-29
| | | | | | | | | | | dsa_dump would print a large negative number instead of zero for segment bin 0. Fix by explicitly checking for underflow and add special case for bin 0. Backpatch to all supported versions. Author: Ian Ilyasov <ianilyasov@outlook.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/GV1P251MB1004E0D09D117D3CECF9256ECD502@GV1P251MB1004.EURP251.PROD.OUTLOOK.COM Backpatch-through: v12
* Doc: improve a couple of comments in postgresql.conf.sample.Tom Lane2024-02-15
| | | | | | | | | Clarify comments associated with max_parallel_workers and related settings. Per bug #18343 from Christopher Kline. Discussion: https://postgr.es/m/18343-3a5e903d1d3692ab@postgresql.org
* Remove race condition in pg_get_expr().Tom Lane2024-02-09
| | | | | | | | | | | | | | | | | | | | | | | Since its introduction, pg_get_expr() has intended to silently return NULL if called with an invalid relation OID, as can happen when scanning the catalogs concurrently with relation drops. However, there is a race condition: we check validity of the OID at the start, but it could get dropped just afterward, leading to failures. This is the cause of some intermittent instability we're seeing in a proposed new test case, and presumably it's a hazard in the field as well. We can fix this by AccessShareLock-ing the target relation for the duration of pg_get_expr(). Since we don't require any permissions on the target relation, this is semantically a bit undesirable. But it turns out that the set_relation_column_names() subroutine already takes a transient AccessShareLock on that relation, and has done since commit 2ffa740be in 2012. Given the lack of complaints about that, it seems like there should be no harm in holding the lock a bit longer. Back-patch to all supported branches. Discussion: https://postgr.es/m/31ddcc01-a71b-4e8c-9948-01d1c47293ca@eisentraut.org
* Fix wrong logic in TransactionIdInRecentPast()Alexander Korotkov2024-02-09
| | | | | | | | | | | | | | | | | | | The TransactionIdInRecentPast() should return false for all the transactions older than TransamVariables->oldestClogXid. However, the function contains a bug in comparison FullTransactionId to TransactionID allowing full transactions between nextXid - 2^32 and oldestClogXid - 2^31. This commit fixes TransactionIdInRecentPast() by turning the oldestClogXid into FullTransactionId first, then performing the comparison. Backpatch to all supported versions. Reported-by: Egor Chindyaskin Bug: 18212 Discussion: https://postgr.es/m/18212-547307f8adf57262%40postgresql.org Author: Karina Litskevich Reviewed-by: Kyotaro Horiguchi Backpatch-through: 12
* Translate ENOMEM to ERRCODE_OUT_OF_MEMORY in errcode_for_file_access().Tom Lane2024-02-02
| | | | | | | | | | Previously you got ERRCODE_INTERNAL_ERROR, which seems inappropriate, especially given that we're trying to avoid emitting that in reachable cases. Alexander Kuzmenkov Discussion: https://postgr.es/m/CALzhyqzgQph0BY8-hFRRGdHhF8CoqmmDHW9S=hMZ-HMzLxRqDQ@mail.gmail.com
* Fix incompatibilities with libxml2 >= 2.12.0.Tom Lane2024-01-29
| | | | | | | | | | | | | | | | | | | | | | libxml2 changed the required signature of error handler callbacks to make the passed xmlError struct "const". This is causing build failures on buildfarm member caiman, and no doubt will start showing up in the field quite soon. Add a version check to adjust the declaration of xml_errorHandler() according to LIBXML_VERSION. 2.12.x also produces deprecation warnings for contrib/xml2/xpath.c's assignment to xmlLoadExtDtdDefaultValue. I see no good reason for that to still be there, seeing that we disabled external DTDs (at a lower level) years ago for security reasons. Let's just remove it. Back-patch to all supported branches, since they might all get built with newer libxml2 once it gets a bit more popular. (The back branches produce another deprecation warning about xpath.c's use of xmlSubstituteEntitiesDefault(). We ought to consider whether to back-patch all or part of commit 65c5864d7 to silence that. It's less urgent though, since it won't break the buildfarm.) Discussion: https://postgr.es/m/1389505.1706382262@sss.pgh.pa.us
* Detect Julian-date overflow in timestamp[tz]_pl_interval.Tom Lane2024-01-26
| | | | | | | | | | | | | | | | | | | | We perform addition of the days field of an interval via arithmetic on the Julian-date representation of the timestamp's date. This step is subject to int32 overflow, and we also should not let the Julian date become very negative, for fear of weird results from j2date. (In the timestamptz case, allow a Julian date of -1 to pass, since it might convert back to zero after timezone rotation.) The additions of the months and microseconds fields could also overflow, of course. However, I believe we need no additional checks there; the existing range checks should catch such cases. The difficulty here is that j2date's magic modular arithmetic could produce something that looks like it's in-range. Per bug #18313 from Christian Maurer. This has been wrong for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/18313-64d2c8952d81e84b@postgresql.org
* Re-pgindent catcache.c after previous commit.Tom Lane2024-01-13
| | | | | Discussion: https://postgr.es/m/1393953.1698353013@sss.pgh.pa.us Discussion: https://postgr.es/m/CAGjhLkOoBEC9mLsnB42d3CO1vcMx71MLSEuigeABbQ8oRdA6gw@mail.gmail.com
* Cope with catcache entries becoming stale during detoasting.Tom Lane2024-01-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've long had a policy that any toasted fields in a catalog tuple should be pulled in-line before entering the tuple in a catalog cache. However, that requires access to the catalog's toast table, and we'll typically do AcceptInvalidationMessages while opening the toast table. So it's possible that the catalog tuple is outdated by the time we finish detoasting it. Since no cache entry exists yet, we can't mark the entry stale during AcceptInvalidationMessages, and instead we'll press forward and build an apparently-valid cache entry. The upshot is that we have a race condition whereby an out-of-date entry could be made in a backend's catalog cache, and persist there indefinitely causing indeterminate misbehavior. To fix, use the existing systable_recheck_tuple code to recheck whether the catalog tuple is still up-to-date after we finish detoasting it. If not, loop around and restart the process of searching the catalog and constructing cache entries from the top. The case is rare enough that this shouldn't create any meaningful performance penalty, even in the SearchCatCacheList case where we need to tear down and reconstruct the whole list. Indeed, the case is so rare that AFAICT it doesn't occur during our regression tests, and there doesn't seem to be any easy way to build a test that would exercise it reliably. To allow testing of the retry code paths, add logic (in USE_ASSERT_CHECKING builds only) that randomly pretends that the recheck failed about one time out of a thousand. This is enough to ensure that we'll pass through the retry paths during most regression test runs. By adding an extra level of looping, this commit creates a need to reindent most of SearchCatCacheMiss and SearchCatCacheList. I'll do that separately, to allow putting those changes in .git-blame-ignore-revs. Patch by me; thanks to Alexander Lakhin for having built a test case to prove the bug is real, and to Xiaoran Wang for review. Back-patch to all supported branches. Discussion: https://postgr.es/m/1393953.1698353013@sss.pgh.pa.us Discussion: https://postgr.es/m/CAGjhLkOoBEC9mLsnB42d3CO1vcMx71MLSEuigeABbQ8oRdA6gw@mail.gmail.com
* Fix failure to verify PGC_[SU_]BACKEND GUCs in pg_file_settings view.Tom Lane2023-12-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set_config_option() bails out early if it detects that the option to be set is PGC_BACKEND or PGC_SU_BACKEND class and we're reading the config file in a postmaster child; we don't want to apply any new value in such a case. That's fine as far as it goes, but it fails to consider the requirements of the pg_file_settings view: for that, we need to check validity of the value even though we have no intention to apply it. Because we didn't, even very silly values for affected GUCs would be reported as valid by the view. There are only half a dozen such GUCs, which perhaps explains why this got overlooked for so long. Fix by continuing when changeVal is false; this parallels the logic in some other early-exit paths. Also, the check added by commit 924bcf4f1 to prevent GUC changes in parallel workers seems a few bricks shy of a load: it's evidently assuming that ereport(elevel, ...) won't return. Make sure we bail out if it does. The lack of trouble reports suggests that this is only a latent bug, i.e. parallel workers don't actually reach here with elevel < ERROR. (Per the code coverage report, we never reach here at all in the regression suite.) But we clearly don't want to risk proceeding if that does happen. Per report from Rıdvan Korkmaz. These are ancient bugs, so back-patch to all supported branches. Discussion: https://postgr.es/m/2089235.1703617353@sss.pgh.pa.us
* Guard against overflow in interval_mul() and interval_div().Dean Rasheed2023-11-18
| | | | | | | | | | | | | | | | | | | | Commits 146604ec43 and a898b409f6 added overflow checks to interval_mul(), but not to interval_div(), which contains almost identical code, and so is susceptible to the same kinds of overflows. In addition, those checks did not catch all possible overflow conditions. Add additional checks to the "cascade down" code in interval_mul(), and copy all the overflow checks over to the corresponding code in interval_div(), so that they both generate "interval out of range" errors, rather than returning bogus results. Given that these errors are relatively easy to hit, back-patch to all supported branches. Per bug #18200 from Alexander Lakhin, and subsequent investigation. Discussion: https://postgr.es/m/18200-5ea288c7b2d504b1%40postgresql.org
* Ensure we use the correct spelling of "ensure"David Rowley2023-11-10
| | | | | | | | | We seem to have accidentally used "insure" in a few places. Correct that. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com Backpatch-through: 12, oldest supported version
* Detect integer overflow while computing new array dimensions.Tom Lane2023-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | array_set_element() and related functions allow an array to be enlarged by assigning to subscripts outside the current array bounds. While these places were careful to check that the new bounds are allowable, they neglected to consider the risk of integer overflow in computing the new bounds. In edge cases, we could compute new bounds that are invalid but get past the subsequent checks, allowing bad things to happen. Memory stomps that are potentially exploitable for arbitrary code execution are possible, and so is disclosure of server memory. To fix, perform the hazardous computations using overflow-detecting arithmetic routines, which fortunately exist in all still-supported branches. The test cases added for this generate (after patching) errors that mention the value of MaxArraySize, which is platform-dependent. Rather than introduce multiple expected-files, use psql's VERBOSITY parameter to suppress the printing of the message text. v11 psql lacks that parameter, so omit the tests in that branch. Our thanks to Pedro Gallegos for reporting this problem. Security: CVE-2023-5869
* Set GUC "is_superuser" in all processes that set AuthenticatedUserId.Noah Misch2023-11-06
| | | | | | | | | It was always false in single-user mode, in autovacuum workers, and in background workers. This had no specifically-identified security consequences, but non-core code or future work might make it security-relevant. Back-patch to v11 (all supported versions). Jelte Fennema-Nio. Reported by Jelte Fennema-Nio.
* Be more wary about NULL values for GUC string variables.Tom Lane2023-11-02
| | | | | | | | | | | | | get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN has a NULL boot_val. Nosing around found a couple of other places that seemed insufficiently cautious about NULL string values, although those are likely unreachable in practice. Add some commentary defining the expectations for NULL values of string variables, in hopes of forestalling future additions of more such bugs. Xing Guo, Aleksander Alekseev, Tom Lane Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
* doc: 1-byte varlena headers can be used for user PLAIN storageBruce Momjian2023-10-31
| | | | | | | | | | | | This also updates some C comments. Reported-by: suchithjn22@gmail.com Discussion: https://postgr.es/m/167336599095.2667301.15497893107226841625@wrigleys.postgresql.org Author: Laurenz Albe (doc patch) Backpatch-through: 11
* Dodge a compiler bug affecting timetz_zone/timetz_izone.Tom Lane2023-10-20
| | | | | | | | | | | | | This avoids a compiler bug occurring in AIX's xlc, even in pretty late-model revisions. Buildfarm testing has now confirmed that only 64-bit xlc is affected. Although we are contemplating dropping support for xlc in v17, it's still supported in the back branches, so we need this fix. Back-patch of code changes from HEAD commit 19fa97731. (The test cases were already back-patched, in 4a427b82c et al.) Discussion: https://postgr.es/m/CA+hUKGK=DOC+hE-62FKfZy=Ybt5uLkrg3zCZD-jFykM-iPn8yw@mail.gmail.com
* Avoid calling proc_exit() in processes forked by system().Nathan Bossart2023-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The SIGTERM handler for the startup process immediately calls proc_exit() for the duration of the restore_command, i.e., a call to system(). This system() call forks a new process to execute the shell command, and this child process inherits the parent's signal handlers. If both the parent and child processes receive SIGTERM, both will attempt to call proc_exit(). This can end badly. For example, both processes will try to remove themselves from the PGPROC shared array. To fix this problem, this commit adds a check in StartupProcShutdownHandler() to see whether MyProcPid == getpid(). If they match, this is the parent process, and we can proc_exit() like before. If they do not match, this is a child process, and we just emit a message to STDERR (in a signal safe manner) and _exit(), thereby skipping any problematic exit callbacks. This commit also adds checks in proc_exit(), ProcKill(), and AuxiliaryProcKill() that verify they are not being called within such child processes. Suggested-by: Andres Freund Reviewed-by: Thomas Munro, Andres Freund Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13 Backpatch-through: 11
* Acquire ControlFileLock in relevant SQL functions.Thomas Munro2023-10-16
| | | | | | | | | | | | | | Commit dc7d70ea added functions that read the control file, but didn't acquire ControlFileLock. With unlucky timing, file systems that have weak interlocking like ext4 and ntfs could expose partially overwritten contents, and the checksum would fail. Back-patch to all supported releases. Reviewed-by: David Steele <david@pgmasters.net> Reviewed-by: Anton A. Melnikov <aamelnikov@inbox.ru> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/20221123014224.xisi44byq3cf5psi%40awork3.anarazel.de
* Fix datalen calculation in tsvectorrecv().Tom Lane2023-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After receiving position data for a lexeme, tsvectorrecv() advanced its "datalen" value by (npos+1)*sizeof(WordEntry) where the correct calculation is (npos+1)*sizeof(WordEntryPos). This accidentally failed to render the constructed tsvector invalid, but it did result in leaving some wasted space approximately equal to the space consumed by the position data. That could have several bad effects: * Disk space is wasted if the received tsvector is stored into a table as-is. * A legal tsvector could get rejected with "maximum total lexeme length exceeded" if the extra space pushes it over the MAXSTRPOS limit. * In edge cases, the finished tsvector could be assigned a length larger than the allocated size of its palloc chunk, conceivably leading to SIGSEGV when the tsvector gets copied somewhere else. The odds of a field failure of this sort seem low, though valgrind testing could probably have found this. While we're here, let's express the calculation as "sizeof(uint16) + npos * sizeof(WordEntryPos)" to avoid the type pun implicit in the "npos + 1" formulation. It's not wrong given that WordEntryPos had better be 2 bytes to avoid padding problems, but it seems clearer this way. Report and patch by Denis Erokhin. Back-patch to all supported versions. Discussion: https://postgr.es/m/009801d9f2d9$f29730c0$d7c59240$@datagile.ru
* Don't crash if cursor_to_xmlschema is used on a non-data-returning Portal.Tom Lane2023-09-18
| | | | | | | | | | | | | | | | cursor_to_xmlschema() assumed that any Portal must have a tupDesc, which is not so. Add a defensive check. It's plausible that this mistake occurred because of the rather poorly chosen name of the lookup function SPI_cursor_find(), which in such cases is returning something that isn't very much like a cursor. Add some documentation to try to forestall future errors of the same ilk. Report and patch by Boyu Yang (docs changes by me). Back-patch to all supported branches. Discussion: https://postgr.es/m/dd343010-c637-434c-a8cb-418f53bda3b8.yangboyu.yby@alibaba-inc.com
* Track nesting depth correctly when drilling down into RECORD Vars.Tom Lane2023-09-15
| | | | | | | | | | | | | | | | | | expandRecordVariable() failed to adjust the parse nesting structure correctly when recursing to inspect an outer-level Var. This could result in assertion failures or core dumps in corner cases. Likewise, get_name_for_var_field() failed to adjust the deparse namespace stack correctly when recursing to inspect an outer-level Var. In this case the likely result was a "bogus varno" error while deparsing a view. Per bug #18077 from Jingzhou Fu. Back-patch to all supported branches. Richard Guo, with some adjustments by me Discussion: https://postgr.es/m/18077-b9db97c6e0ab45d8@postgresql.org
* Fix get_expr_result_type() to find field names for RECORD Consts.Tom Lane2023-09-15
| | | | | | | | | | | | | | | | | This is a back-patch of commit d57534740 ("Fix EXPLAIN of SEARCH BREADTH FIRST with a constant initial value") into pre-v14 branches. At the time I'd thought it was not needed in branches that lack the SEARCH/CYCLE feature, but that was just a failure of imagination. It's possible to demonstrate "record type has not been registered" failures in older branches too, during deparsing of views that contain references to fields of composite constants. Back-patch only the code changes, as the test cases added by d57534740 all require SEARCH/CYCLE syntax. A suitable test case will be added in the upcoming fix for bug #18077. Discussion: https://postgr.es/m/17644-3bd1f3036d6d7a16@postgresql.org Discussion: https://postgr.es/m/3607145.1694803130@sss.pgh.pa.us
* Allow extracting fields from a ROW() expression in more cases.Tom Lane2023-09-15
| | | | | | | | | | | | | | | | | | | | | | Teach get_expr_result_type() to manufacture a tuple descriptor directly from a RowExpr node. If the RowExpr has type RECORD, this is the only way to get a tupdesc for its result, since even if the rowtype has been blessed, we don't have its typmod available at this point. (If the RowExpr has some named composite type, we continue to let the existing code handle it, since the RowExpr might well not have the correct column names embedded in it.) This fixes assorted corner cases illustrated by the added regression tests. This is a back-patch of the v13-era commit 8b7a0f1d1 into previous branches. At the time I'd judged it not important enough to back-patch, but the upcoming fix for bug #18077 includes a test case that depends on this working correctly; and 8b7a0f1d1 has now aged long enough to have good confidence that it won't break anything. Discussion: https://postgr.es/m/10872.1572202006@sss.pgh.pa.us Discussion: https://postgr.es/m/3607145.1694803130@sss.pgh.pa.us
* Revert "Improve error message on snapshot import in snapmgr.c"Michael Paquier2023-09-14
| | | | | | | | | | This reverts commit a0d87bcd9b57, following a remark from Andres Frend that the new error can be triggered with an incorrect SET TRANSACTION SNAPSHOT command without being really helpful for the user as it uses the internal file name. Discussion: https://postgr.es/m/20230914020724.hlks7vunitvtbbz4@awork3.anarazel.de Backpatch-through: 11
* Improve error message on snapshot import in snapmgr.cMichael Paquier2023-09-14
| | | | | | | | | | | | | | | | | When a snapshot file fails to be read in ImportSnapshot(), it would issue an ERROR as "invalid snapshot identifier" when opening a stream for it in read-only mode. This error message is reworded to be the same as all the other messages used in this case on failure, which is useful when debugging this area. Thinko introduced by bb446b689b66 where snapshot imports have been added. A backpatch down to 11 is done as this can improve any work related to snapshot imports in older branches. Author: Bharath Rupireddy Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/CALj2ACWmr=3KdxDkm8h7Zn1XxBoF6hdzq8WQyMn2y1OL5RYFrg@mail.gmail.com Backpatch-through: 11
* Fix exception safety bug in typcache.c.Thomas Munro2023-09-13
| | | | | | | | | | | | | | | | If an out-of-memory error was thrown at an unfortunate time, ensure_record_cache_typmod_slot_exists() could leak memory and leave behind a global state that produced an infinite loop on the next call. Fix by merging RecordCacheArray and RecordIdentifierArray into a single array. With only one allocation or re-allocation, there is no intermediate state. Back-patch to all supported releases. Reported-by: "James Pang (chaolpan)" <chaolpan@cisco.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/PH0PR11MB519113E738814BDDA702EDADD6EFA%40PH0PR11MB5191.namprd11.prod.outlook.com
* Fix out-of-bound read in gtsvector_picksplit()Michael Paquier2023-09-04
| | | | | | | | | | | | | | This could lead to an imprecise choice when splitting an index page of a GiST index on a tsvector, deciding which entries should remain on the old page and which entries should move to a new page. This is wrong since tsearch2 has been moved into core with commit 140d4ebcb46e, so backpatch all the way down. This error has been spotted by valgrind. Author: Alexander Lakhin Discussion: https://postgr.es/m/17950-6c80a8d2b94ec695@postgresql.org Backpatch-through: 11
* Avoid unnecessary plancache revalidation of utility statements.Tom Lane2023-08-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revalidation of a plancache entry (after a cache invalidation event) requires acquiring a snapshot. Normally that is harmless, but not if the cached statement is one that needs to run without acquiring a snapshot. We were already aware of that for TransactionStmts, but for some reason hadn't extrapolated to the other statements that PlannedStmtRequiresSnapshot() knows mustn't set a snapshot. This can lead to unexpected failures of commands such as SET TRANSACTION ISOLATION LEVEL. We can fix it in the same way, by excluding those command types from revalidation. However, we can do even better than that: there is no need to revalidate for any statement type for which parse analysis, rewrite, and plan steps do nothing interesting, which is nearly all utility commands. To mechanize this, invent a parser function stmt_requires_parse_analysis() that tells whether parse analysis does anything beyond wrapping a CMD_UTILITY Query around the raw parse tree. If that's what it does, then rewrite and plan will just skip the Query, so that it is not possible for the same raw parse tree to produce a different plan tree after cache invalidation. stmt_requires_parse_analysis() is basically equivalent to the existing function analyze_requires_snapshot(), except that for obscure reasons that function omits ReturnStmt and CallStmt. It is unclear whether those were oversights or intentional. I have not been able to demonstrate a bug from not acquiring a snapshot while analyzing these commands, but at best it seems mighty fragile. It seems safer to acquire a snapshot for parse analysis of these commands too, which allows making stmt_requires_parse_analysis and analyze_requires_snapshot equivalent. In passing this fixes a second bug, which is that ResetPlanCache would exclude ReturnStmts and CallStmts from revalidation. That's surely *not* safe, since they contain parsable expressions. Per bug #18059 from Pavel Kulakov. Back-patch to all supported branches. Discussion: https://postgr.es/m/18059-79c692f036b25346@postgresql.org
* Fix overly strict Assert in jsonpath codeDavid Rowley2023-08-02
| | | | | | | | | | | | | This was failing for queries which try to get the .type() of a jpiLikeRegex. For example: select jsonb_path_query('["string", "string"]', '($[0] like_regex ".{7}").type()'); Reported-by: Alexander Kozhemyakin Bug: #18035 Discussion: https://postgr.es/m/18035-64af5cdcb5adf2a9@postgresql.org Backpatch-through: 12, where SQL/JSON path was added.
* Add indisreplident to fields refreshed by RelationReloadIndexInfo()Michael Paquier2023-07-14
| | | | | | | | | | | | | | | | | | | RelationReloadIndexInfo() is a fast-path used for index reloads in the relation cache, and it has always forgotten about updating indisreplident, which is something that would happen after an index is selected for a replica identity. This can lead to incorrect cache information provided when executing a command in a transaction context that updates indisreplident. None of the code paths currently on HEAD that need to check upon pg_index.indisreplident fetch its value from the relation cache, always relying on a fresh copy on the syscache. Unfortunately, this may not be the case of out-of-core code, that could see out-of-date value. Author: Shruthi Gowda Reviewed-by: Robert Haas, Dilip Kumar, Michael Paquier Discussion: https://postgr.es/m/CAASxf_PBcxax0wW-3gErUyftZ0XrCs3Lrpuhq4-Z3Fak1DoW7Q@mail.gmail.com Backpatch-through: 11
* Handle DROP DATABASE getting interruptedAndres Freund2023-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, when DROP DATABASE got interrupted in the wrong moment, the removal of the pg_database row would also roll back, even though some irreversible steps have already been taken. E.g. DropDatabaseBuffers() might have thrown out dirty buffers, or files could have been unlinked. But we continued to allow connections to such a corrupted database. To fix this, mark databases invalid with an in-place update, just before starting to perform irreversible steps. As we can't add a new column in the back branches, we use pg_database.datconnlimit = -2 for this purpose. An invalid database cannot be connected to anymore, but can still be dropped. Unfortunately we can't easily add output to psql's \l to indicate that some database is invalid, it doesn't fit in any of the existing columns. Add tests verifying that a interrupted DROP DATABASE is handled correctly in the backend and in various tools. Reported-by: Evgeny Morozov <postgresql3@realityexists.net> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm7p6@awork3.anarazel.de Discussion: https://postgr.es/m/20230314174521.74jl6ffqsee5mtug@awork3.anarazel.de Backpatch: 11-, bug present in all supported versions
* Re-bin segment when memory pages are freed.Thomas Munro2023-07-04
| | | | | | | | | | | | | | | | | | | | | It's OK to be lazy about re-binning memory segments when allocating, because that can only leave segments in a bin that's too high. We'll search higher bins if necessary while allocating next time, and also eventually re-bin, so no memory can become unreachable that way. However, when freeing memory, the largest contiguous range of free pages might go up, so we should re-bin eagerly to make sure we don't leave the segment in a bin that is too low for get_best_segment() to find. The re-binning code is moved into a function of its own, so it can be called whenever free pages are returned to the segment's free page map. Back-patch to all supported releases. Author: Dongming Liu <ldming101@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version) Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAL1p7e8LzB2LSeAXo2pXCW4%2BRya9s0sJ3G_ReKOU%3DAjSUWjHWQ%40mail.gmail.com
* Avoid Assert failure when processing empty statement in aborted xact.Tom Lane2023-06-21
| | | | | | | | | | | | | | | | | | | | | exec_parse_message() wants to create a cached plan in all cases, including for empty input. The empty-input path does not have a test for being in an aborted transaction, making it possible that plancache.c will fail due to trying to do database lookups even though there's no real work to do. One solution would be to throw an aborted-transaction error in this path too, but it's not entirely clear whether the lack of such an error was intentional or whether some clients might be relying on non-error behavior. Instead, let's hack plancache.c so that it treats empty statements with the same logic it already had for transaction control commands, ensuring that it can soldier through even in an already-aborted transaction. Per bug #17983 from Alexander Lakhin. Back-patch to all supported branches. Discussion: https://postgr.es/m/17983-da4569fcb878672e@postgresql.org
* In array_position()/array_positions(), beware of empty input array.Tom Lane2023-05-04
| | | | | | | | | | | | These functions incautiously fetched the array's first lower bound even when the array is zero-dimensional, thus fetching the word after the allocated array space. While almost always harmless, with very bad luck this could result in SIGSEGV. Fix by adding an early exit for empty input. Per bug #17920 from Alexander Lakhin. Discussion: https://postgr.es/m/17920-f7c228c627b6d02e%40postgresql.org
* Fix oversights in array manipulation.Tom Lane2023-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nested-arrays code path in ExecEvalArrayExpr() used palloc to allocate the result array, whereas every other array-creating function has used palloc0 since 18c0b4ecc. This mostly works, but unused bits past the end of the nulls bitmap may end up undefined. That causes valgrind complaints with -DWRITE_READ_PARSE_PLAN_TREES, and could cause planner misbehavior as cited in 18c0b4ecc. There seems no very good reason why we should strive to avoid palloc0 in just this one case, so fix it the easy way with s/palloc/palloc0/. While looking at that I noted that we also failed to check for overflow of "nbytes" and "nitems" while summing the sizes of the sub-arrays, potentially allowing a crash due to undersized output allocation. For "nbytes", follow the policy used by other array-munging code of checking for overflow after each addition. (As elsewhere, the last addition of the array's overhead space doesn't need an extra check, since palloc itself will catch a value between 1Gb and 2Gb.) For "nitems", there's no very good reason to sum the inputs at all, since we can perfectly well use ArrayGetNItems' result instead of ignoring it. Per discussion of this bug, also remove redundant zeroing of the nulls bitmap in array_set_element and array_set_slice. Patch by Alexander Lakhin and myself, per bug #17858 from Alexander Lakhin; thanks also to Richard Guo. These bugs are a dozen years old, so back-patch to all supported branches. Discussion: https://postgr.es/m/17858-8fd287fd3663d051@postgresql.org
* Work around spurious compiler warning in inet operatorsAndres Freund2023-03-16
| | | | | | | | | | | | | | | | | | | | | | | | | | gcc 12+ has complaints like the following: ../../../../../pgsql/src/backend/utils/adt/network.c: In function 'inetnot': ../../../../../pgsql/src/backend/utils/adt/network.c:1893:34: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=] 1893 | pdst[nb] = ~pip[nb]; | ~~~~~~~~~^~~~~~~~~~ ../../../../../pgsql/src/include/utils/inet.h:27:23: note: at offset -1 into destination object 'ipaddr' of size 16 27 | unsigned char ipaddr[16]; /* up to 128 bits of address */ | ^~~~~~ ../../../../../pgsql/src/include/utils/inet.h:27:23: note: at offset -1 into destination object 'ipaddr' of size 16 This is due to a compiler bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104986 It has been a year since the bug has been reported without getting fixed. As the warnings are verbose and use of gcc 12 is becoming more common, it seems worth working around the bug. Particularly because a simple reformulation of the loop condition fixes the issue and isn't any less readable. Author: Tom Lane <tgl@sss.pgh.pa.us> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/144536.1648326206@sss.pgh.pa.us Backpatch: 11-
* Fix corner case bug in numeric to_char() some more.Tom Lane2023-03-14
| | | | | | | | | | | | | | | The band-aid applied in commit f0bedf3e4 turns out to still need some work: it made sure we didn't set Np->last_relevant too small (to the left of the decimal point), but it didn't prevent setting it too large (off the end of the partially-converted string). This could result in fetching data beyond the end of the allocated space, which with very bad luck could cause a SIGSEGV, though I don't see any hazard of interesting memory disclosure. Per bug #17839 from Thiago Nunes. The bug's pretty ancient, so back-patch to all supported versions. Discussion: https://postgr.es/m/17839-aada50db24d7b0da@postgresql.org
* Fix JSON error reporting for many cases of erroneous string values.Tom Lane2023-03-13
| | | | | | | | | | | | | | | | | | | | | | | The majority of error exit cases in json_lex_string() failed to set lex->token_terminator, causing problems for the error context reporting code: it would see token_terminator less than token_start and do something more or less nuts. In v14 and up the end result could be as bad as a crash in report_json_context(). Older versions accidentally avoided that fate; but all versions produce error context lines that are far less useful than intended, because they'd stop at the end of the prior token instead of continuing to where the actually-bad input is. To fix, invent some macros that make it less notationally painful to do the right thing. Also add documentation about what the function is actually required to do; and in >= v14, add an assertion in report_json_context about token_terminator being sufficiently far advanced. Per report from Nikolay Shaplov. Back-patch to all supported versions. Discussion: https://postgr.es/m/7332649.x5DLKWyVIX@thinkpad-pgpro
* Fix corruption due to vacuum_defer_cleanup_age underflowing 64bit xidsAndres Freund2023-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When vacuum_defer_cleanup_age is bigger than the current xid, including the epoch, the subtraction of vacuum_defer_cleanup_age would lead to a wrapped around xid. While that normally is not a problem, the subsequent conversion to a 64bit xid results in a 64bit-xid very far into the future. As that xid is used as a horizon to detect whether rows versions are old enough to be removed, that allows removal of rows that are still visible (i.e. corruption). If vacuum_defer_cleanup_age was never changed from the default, there is no chance of this bug occurring. This bug was introduced in dc7420c2c92. A lesser version of it exists in 12-13, introduced by fb5344c969a, affecting only GiST. The 12-13 version of the issue can, in rare cases, lead to pages in a gist index getting recycled too early, potentially causing index entries to be found multiple times. The fix is fairly simple - don't allow vacuum_defer_cleanup_age to retreat further than FirstNormalTransactionId. Patches to make similar bugs easier to find, by adding asserts to the 64bit xid infrastructure, have been proposed, but are not suitable for backpatching. Currently there are no tests for vacuum_defer_cleanup_age. A patch introducing infrastructure to make writing a test easier has been posted to the list. Reported-by: Michail Nikolaev <michail.nikolaev@gmail.com> Reviewed-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20230108002923.cyoser3ttmt63bfn@awork3.anarazel.de Backpatch: 12-, but impact/fix is smaller for 12-13
* Avoid fetching one past the end of translate()'s "to" parameter.Tom Lane2023-03-01
| | | | | | | | | | | | | | | | | This is usually harmless, but if you were very unlucky it could provoke a segfault due to the "to" string being right up against the end of memory. Found via valgrind testing (so we might've found it earlier, except that our regression tests lacked any exercise of translate()'s deletion feature). Fix by switching the order of the test-for-end-of-string and advance-pointer steps. While here, compute "to_ptr + tolen" just once. (Smarter compilers might figure that out for themselves, but let's just make sure.) Report and fix by Daniil Anisimov, in bug #17816. Discussion: https://postgr.es/m/17816-70f3d2764e88a108@postgresql.org