path: root/src/backend
* Address set of issues with errno handling (Michael Paquier, 2018-06-25)

  System calls mixed into error code paths were causing two issues that several code paths had not handled correctly:

  1) For write() calls, the system may sometimes write fewer bytes than requested without setting errno. Some paths were careful enough to consider that case and assume that errno should be set to ENOSPC; other calls missed it.

  2) errno set by a failing system call can be overwritten by later system calls that succeed once an error code path is taken, so what is reported to the user can be incorrect.

  This patch uses the brute-force approach of correcting all those code paths. Some refactoring could happen in the future, but that is left as future work, which is not targeted for back-branches anyway.

  Author: Michael Paquier
  Reviewed-by: Ashutosh Sharma
  Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz

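  A minimal standalone sketch of the write()/ENOSPC pattern this message describes; the helper name and variables are illustrative, not the committed code:

      #include <errno.h>
      #include <unistd.h>

      /*
       * Write len bytes; treat a short write that leaves errno unset as
       * running out of disk space, as described in the commit message above.
       */
      static int
      write_all_or_enospc(int fd, const void *buf, size_t len)
      {
          ssize_t     written;

          errno = 0;
          written = write(fd, buf, len);
          if (written < 0)
              return -1;                  /* errno already set by the kernel */
          if ((size_t) written != len)
          {
              if (errno == 0)
                  errno = ENOSPC;         /* short write without errno set */
              return -1;
          }
          return 0;
      }

  The second issue in the message (errno clobbered by later calls) is typically handled by saving errno before doing any further system calls on the error path.
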
* Fix partial aggregation for variance(int4) and related aggregates. (Tom Lane, 2018-06-21)

  A typo in numeric_poly_combine caused bogus results for queries using it, but of course would only manifest if parallel aggregation is performed.

  Reported by Rajkumar Raghuwanshi. David Rowley did the diagnosis and the fix; I editorialized rather heavily on his regression test additions.

  Back-patch to v10 where the breakage was introduced (by 9cca11c91).

  Discussion: https://postgr.es/m/CAKcux6nU4E2x8nkSBpLOT2DPvQ5LviJ3SGyAN6Sz7qDH4G4+Pw@mail.gmail.com

* Fix mishandling of sortgroupref labels while splitting SRF targetlists. (Tom Lane, 2018-06-21)

  split_pathtarget_at_srfs() neglected to worry about sortgroupref labels in the intermediate PathTargets it constructs. I think we'd supposed that their labeling didn't matter, but it does at least for the case that GroupAggregate/GatherMerge nodes appear immediately under the ProjectSet step(s). This results in "ERROR: ORDER/GROUP BY expression not found in targetlist" during create_plan(), as reported by Rajkumar Raghuwanshi.

  To fix, make this logic track the sortgroupref labeling of expressions, not just their contents. This also restores the pre-v10 behavior that separate GROUP BY expressions will be kept distinct even if they are textually equal().

  Discussion: https://postgr.es/m/CAKcux6=1_Ye9kx8YLBPmJs_xE72PPc6vNi5q2AOHowMaCWjJ2w@mail.gmail.com

* Accept TEXT and CDATA nodes in XMLTABLE's column_expression. (Alvaro Herrera, 2018-06-20)

  Column expressions that match TEXT or CDATA nodes must return the contents of the nodes themselves, not the content of non-existing children (i.e. the empty string).

  Author: Markus Winand
  Reported-by: Markus Winand
  Reviewed-by: Álvaro Herrera
  Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at

* Clarify use of temporary tables within partition trees (Michael Paquier, 2018-06-20)

  Since their introduction, partition trees have been a bit lossy regarding temporary relations. Inheritance trees respect the following patterns:

  1) A child relation can be temporary if the parent is permanent.
  2) A child relation can be temporary if the parent is temporary.
  3) A child relation cannot be permanent if the parent is temporary.
  4) The use of temporary relations also implies that parent and child need to belong to the same session.

  Partitions share many similar patterns with inheritance, but the handling of the partition bounds makes the situation a bit tricky for case 1), as the partition code bases a lot of its lookup code on PartitionDesc, which does not really look after relpersistence. This causes, for example, a temporary partition created by session A to be visible to another session B, preventing session B from creating an extra partition that overlaps with the temporary one created by A, with a non-intuitive error message. There could be use-cases where mixing permanent partitioned tables with temporary partitions makes sense, but that would be a new feature. Partitions respect 2), 3) and 4) already. It is a bit depressing to see those error checks happening in MergeAttributes(), whose purpose is different, but that's left as future refactoring work.

  Back-patch down to 10, which is where partitioning has been introduced, except that default partitions do not apply there. Documentation also includes limitations related to the use of temporary tables with partition trees.

  Reported-by: David Rowley
  Author: Amit Langote, Michael Paquier
  Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier
  Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com

* Prevent hard failures of standbys caused by recycled WAL segments (Michael Paquier, 2018-06-18)

  When a standby's WAL receiver stops reading WAL from a WAL stream, it writes data to the current WAL segment without first zeroing the page currently being written, which can cause the WAL reader to read junk data from a past recycled segment and then try to get a record from it. While the sanity checks in place provide most of the protection needed, in some rare circumstances, with chances increasing when a record header crosses a page boundary, the startup process could fail violently on an allocation failure, as follows:

  FATAL: invalid memory alloc request size XXX

  This is confusing for the user and also unhelpful, as in the worst case it requires a manual restart of the instance, potentially impacting the availability of the cluster, and it makes the WAL data look as if it were corrupted. The chances of seeing failures are higher if the connection between the standby and its root node is unstable, causing WAL pages to be only partially written.

  A couple of approaches have been discussed, like zeroing new WAL pages within the WAL receiver itself, but this has the disadvantage of impacting the performance of any existing instance, as it breaks the sequential writes done by the WAL receiver. This commit deals with the problem with a simpler approach, which has no performance impact and does not reduce detection of the problem: if a record is found with a length higher than 1GB for backends, then do not try any allocation and report a soft failure, which will force the standby to retry reading WAL. It could be possible that the allocation call passes and that an unnecessary amount of memory is allocated; however, follow-up checks on records would just fail, making this allocation short-lived anyway.

  This patch owes a great deal to Tsunakawa Takayuki for reporting the failure first, and then discussing a couple of potential approaches to the problem.

  Backpatch down to 9.5, which is where palloc_extended has been introduced.

  Reported-by: Tsunakawa Takayuki
  Reviewed-by: Tsunakawa Takayuki
  Author: Michael Paquier
  Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8B57AD@G01JPEXMBYT05

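  A minimal standalone sketch of the "soft failure instead of allocation" idea; the constant and function names are illustrative, not the committed code:

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Roughly 1GB, mirroring the threshold mentioned in the message. */
      #define MAX_SANE_RECORD_LENGTH  ((uint32_t) 0x3fffffff)

      /*
       * A length read from a possibly-recycled WAL page is untrusted input,
       * so refuse to allocate for absurd values and let the caller retry the
       * read instead of failing hard.
       */
      static bool
      record_length_is_sane(uint32_t total_len)
      {
          if (total_len > MAX_SANE_RECORD_LENGTH)
          {
              fprintf(stderr,
                      "suspicious record length %u, retrying WAL read\n",
                      (unsigned) total_len);
              return false;       /* soft failure: caller re-reads the page */
          }
          return true;
      }
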
* Fail BRIN control functions during recovery explicitly (Alvaro Herrera, 2018-06-14)

  They already fail anyway, but prior to this patch they raise an ugly error message about a lock that cannot be acquired. This just improves the message.

  Author: Masahiko Sawada
  Reported-by: Masahiko Sawada
  Discussion: https://postgr.es/m/CAD21AoBZau4g4_NUf3BKNd=CdYK+xaPdtJCzvOC1TxGdTiJx_Q@mail.gmail.com
  Reviewed-by: Kuntal Ghosh, Alexander Korotkov, Simon Riggs, Michaël Paquier, Álvaro Herrera

* Fix bugs in vacuum of shared rels, by keeping their relcache entries current. (Andres Freund, 2018-06-12)

  When vacuum processes a relation it uses the corresponding relcache entry's relfrozenxid / relminmxid as a cutoff for when to remove tuples etc. Unfortunately for nailed relations (i.e. critical system catalogs) bugs could frequently lead to the corresponding relcache entry being stale.

  This set of bugs could cause actual data corruption as vacuum would potentially not remove the correct row versions, potentially reviving them at a later point. After 699bf7d05c some corruptions in this vein were prevented, but the additional error checks could also trigger spuriously. Examples of such errors are:
  ERROR: found xmin ... from before relfrozenxid ...
  ERROR: found multixact ... from before relminmxid ...
  To be caused by this bug the errors have to occur on system catalog tables.

  The two bugs are:

  1) Invalidations for nailed relations were ignored, based on the theory that the relcache entry for such tables doesn't change. Which is largely true, except for fields like relfrozenxid etc. This means that changes to relations vacuumed in other sessions weren't picked up by already existing sessions. Luckily autovacuum doesn't have particularly long-running sessions.

  2) For shared *and* nailed relations, the shared relcache init file was never invalidated while running. That means that for such tables (e.g. pg_authid, pg_database) it's not just already existing sessions that are affected, but even new connections are as well. That explains why the reports usually were about pg_authid et al.

  To fix 1), revalidate the rd_rel portion of a relcache entry when invalid. This implies a bit of extra complexity to deal with bootstrapping, but it's not too bad. The fix for 2) is simpler: simply always remove both the shared and local init files.

  Author: Andres Freund
  Reviewed-By: Alvaro Herrera
  Discussion:
  https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
  https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
  https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
  https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
  Backpatch: 9.3-

* Fix access to just-closed relcache entry. (Tom Lane, 2018-06-11)

  It might be impossible for this to cause a problem in non-debug builds, since there'd be no opportunity for the relcache entry to get recycled before the fetch. It blows up nicely with -DRELCACHE_FORCE_RELEASE plus valgrind, though.

  Evidently introduced by careless refactoring in commit f0e44751d. Back-patch accordingly.

  Discussion: https://postgr.es/m/27543.1528758304@sss.pgh.pa.us

* Teach SHOW ALL to honor pg_read_all_settings membership (Alvaro Herrera, 2018-06-08)

  Also, fix the pg_settings view to display source filename and line number when invoked by a pg_read_all_settings member. This addition by me (Álvaro).

  Also, fix wording of the comment in GetConfigOption regarding the restriction it implements, renaming the parameter for extra clarity. Noted by Michaël.

  These were all oversights in commit 25fff40798fc; backpatch to pg10, where that commit first appeared.

  Author: Laurenz Albe
  Reviewed-by: Michaël Paquier, Álvaro Herrera
  Discussion: https://postgr.es/m/1519917758.6586.8.camel@cybertec.at

* Fix typo (Peter Eisentraut, 2018-06-08)

* Fix obsolete comment. (Heikki Linnakangas, 2018-06-07)

  The 'orig_slot' argument was removed in commit c0a8ae7be392, but that commit forgot to update the comment.

  Author: Amit Langote
  Discussion: https://www.postgresql.org/message-id/194ac4bf-7b4a-c887-bf26-bc1a85ea995a@lab.ntt.co.jp

* Fix function code in error report (Alvaro Herrera, 2018-06-06)

  This bug causes an lseek() failure to be reported as a "could not open" failure in the error message, muddling bug reports. I introduced this copy-and-pasteo in commit 78e122010422.

  Noticed while reviewing code for bug report #15221, from lily liang. In version 10 the affected function is only used by multixact.c and commit_ts, and only in corner-case circumstances, neither of which is involved in the reported bug (a pg_subtrans failure).

  Author: Álvaro Herrera

* Fix objectaddress.c code for publication relations. (Tom Lane, 2018-05-24)

  getObjectDescription and getObjectIdentity failed to schema-qualify the name of the published table, which is bad in getObjectDescription and unforgivable in getObjectIdentity. Actually, getObjectIdentity failed to emit the table's name at all unless "objname" output is requested, which accidentally works for some (all?) extant callers but is clearly not the intended API. Somebody had also not gotten the memo that the output of getObjectIdentity is not to be translated.

  To fix getObjectDescription, I made it call getRelationDescription, which required refactoring the translatable string for the case, but is more future-proof in case we ever publish relations that aren't plain tables. While at it, I made the English output look like "publication of table X in publication Y"; the added "of" seems to me to make it read much better.

  Back-patch to v10 where publications were introduced.

  Discussion: https://postgr.es/m/20180522.182020.114074746.horiguchi.kyotaro@lab.ntt.co.jp

* Properly schema-qualify additional object types in getObjectDescription(). (Tom Lane, 2018-05-24)

  Collations, conversions, extended statistics objects (in >= v10), and all four types of text search objects have schema-qualified names. getObjectDescription() ignored that and would emit just the base name of the object, potentially producing wrong or at least highly misleading output. Fix it to add the schema name whenever the object is not "visible" in the current search path, as is the rule for other schema-qualifiable object types.

  Although in common situations the output won't change, this seems to me (tgl) to be a bug worthy of back-patching, hence do so.

  Kyotaro Horiguchi, per a complaint from me

  Discussion: https://postgr.es/m/20180522.182020.114074746.horiguchi.kyotaro@lab.ntt.co.jp

* Widen COPY FROM's current-line-number counter from 32 to 64 bits. (Tom Lane, 2018-05-22)

  Because the code for the HEADER option skips a line when this counter is zero, a very long COPY FROM WITH HEADER operation would drop a line every 2^32 lines. A lesser but still unfortunate problem is that errors would show a wrong input line number for errors occurring beyond the 2^31'st input line.

  While such large input streams seemed impractical when this code was first written, they're not any more. Widening the counter (and some associated variables) to uint64 should be enough to prevent problems for the foreseeable future.

  David Rowley

  Discussion: https://postgr.es/m/CAKJS1f88yh-6wwEfO6QLEEvH3BEugOq2QX1TOja0vCauoynmOQ@mail.gmail.com

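  A tiny standalone illustration of why the 32-bit counter misbehaves: after 2^32 lines it wraps back to zero, so a "skip the header when the counter is zero" test fires a second time. Variable names are illustrative:

      #include <stdint.h>
      #include <stdio.h>

      int
      main(void)
      {
          uint32_t    lineno32 = UINT32_MAX;  /* about to process line 2^32 */
          uint64_t    lineno64 = UINT32_MAX;

          lineno32++;                         /* wraps to 0 */
          lineno64++;                         /* keeps counting */

          printf("32-bit counter: %u -> header skip fires again: %s\n",
                 (unsigned) lineno32, lineno32 == 0 ? "yes" : "no");
          printf("64-bit counter: %llu\n", (unsigned long long) lineno64);
          return 0;
      }
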
* Fix SQL:2008 FETCH FIRST syntax to allow parameters. (Andrew Gierth, 2018-05-21)

  OFFSET <x> ROWS FETCH FIRST <y> ROWS ONLY syntax is supposed to accept <simple value specification>, which includes parameters as well as literals. When this syntax was added all those years ago, it was done inconsistently, with <x> and <y> being different subsets of the standard syntax.

  Rectify that by making <x> and <y> accept the same thing, and allowing either a (signed) numeric literal or a c_expr there, which allows for parameters, variables, and parenthesized arbitrary expressions.

  Per bug #15200 from Lukas Eder.

  Backpatch all the way, since this has been broken from the start.

  Discussion: https://postgr.es/m/877enz476l.fsf@news-spur.riddles.org.uk
  Discussion: http://postgr.es/m/152647780335.27204.16895288237122418685@wrigleys.postgresql.org

* Fix unsafe usage of strerror(errno) within ereport(). (Tom Lane, 2018-05-21)

  This is the converse of the unsafe-usage-of-%m problem: the reason ereport/elog provide that format code is mainly to dodge the hazard of errno getting changed before control reaches functions within the arguments of the macro. I only found one instance of this hazard, but it's been there since 9.4 :-(.

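  A minimal standalone sketch of the hazard and the usual remedy; the function and variable names are illustrative (in the backend, the simplest fix is normally to use the %m format code, which is expanded from errno saved before the message arguments run):

      #include <errno.h>
      #include <stdio.h>
      #include <string.h>

      /*
       * Hazardous shape: if build_detail() makes system calls, it can clobber
       * errno before strerror(errno) is evaluated:
       *
       *   fprintf(stderr, "could not open %s: %s\n",
       *           build_detail(path), strerror(errno));
       */

      /* Safer shape: capture errno before anything else runs. */
      static void
      report_open_failure(const char *path)
      {
          int         saved_errno = errno;

          fprintf(stderr, "could not open %s: %s\n",
                  path, strerror(saved_errno));
      }
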
* Fix error message on short read of pg_control (Magnus Hagander, 2018-05-18)

  Instead of saying "error: success", indicate that we got a working read but it was too short.

* Fix misprocessing of equivalence classes involving record_eq(). (Tom Lane, 2018-05-16)

  canonicalize_ec_expression() is supposed to agree with coerce_type() as to whether a RelabelType should be inserted to make a subexpression be valid input for the operators of a given opclass. However, it did the wrong thing with named-composite-type inputs to record_eq(): it put in a RelabelType to RECORDOID, which the parser doesn't. In some cases this was harmless because all code paths involving a particular equivalence class did the same thing, but in other cases this would result in failing to recognize a composite-type expression as being a member of an equivalence class that it actually is a member of. The most obvious bad effect was to fail to recognize that an index on a composite column could provide the sort order needed for a mergejoin on that column, as reported by Teodor Sigaev. I think there might be other, subtler, cases that result in misoptimization. It also seems possible that an unwanted RelabelType would sometimes get into an emitted plan --- but because record_eq and friends don't examine the declared type of their input expressions, that would not create any visible problems.

  To fix, just treat RECORDOID as if it were a polymorphic type, which in some sense it is. We might want to consider formalizing that a bit more someday, but for the moment this seems to be the only place where an IsPolymorphicType() test ought to include RECORDOID as well.

  This has been broken for a long time, so back-patch to all supported branches.

  Discussion: https://postgr.es/m/a6b22369-e3bf-4d49-f59d-0c41d3551e81@sigaev.ru

* Fix type checking for support functions of parallel VARIADIC aggregates. (Tom Lane, 2018-05-15)

  The impact of VARIADIC on the combine/serialize/deserialize support functions of an aggregate wasn't thought through carefully. There is actually no impact, because variadicity isn't passed through to these functions (and it doesn't seem like it would need to be). However, lookup_agg_function was mistakenly told to check things as though it were passed through. The net result was that it was impossible to declare an aggregate that had both VARIADIC input and parallelism support functions.

  In passing, fix a runtime check in nodeAgg.c for the combine function's strictness to make its error message agree with the creation-time check. The previous message was actually backwards, and it doesn't seem like there's a good reason to have two versions of this message text anyway.

  Back-patch to 9.6 where parallel aggregation was introduced.

  Alexey Bashtanov; message fix by me

  Discussion: https://postgr.es/m/f86dde87-fef4-71eb-0480-62754aaca01b@imap.cc

* Update time zone data files to tzdata release 2018e. (Tom Lane, 2018-05-09)

  DST law changes in North Korea. Redefinition of "daylight savings" in Ireland, as well as for some past years in Namibia and Czechoslovakia. Additional historical corrections for Czechoslovakia.

  With this change, the IANA database models Irish timekeeping as following "standard time" in summer, and "daylight savings" in winter, so that the daylight savings offset is one hour behind standard time not one hour ahead. This does not change their UTC offset (+1:00 in summer, 0:00 in winter) nor their timezone abbreviations (IST in summer, GMT in winter), though now "IST" is more correctly read as "Irish Standard Time" not "Irish Summer Time". However, the "is_dst" column in the pg_timezone_names view will now be true in winter and false in summer for the Europe/Dublin zone.

  Similar changes were made for Namibia between 1994 and 2017, and for Czechoslovakia between 1946 and 1947.

  So far as I can find, no Postgres internal logic cares about which way tm_isdst is reported; in particular, since commit b2cbced9e we do not rely on it to decide how to interpret ambiguous timestamps during DST transitions. So I don't think this change will affect any Postgres behavior other than the timezone-view outputs.

  Discussion: https://postgr.es/m/30996.1525445902@sss.pgh.pa.us

* Translation updates (Peter Eisentraut, 2018-05-07)

  Source-Git-URL: git://git.postgresql.org/git/pgtranslation/messages.git
  Source-Git-Hash: 468bfbb8c2a61a4f396a8efdbf2b661c9afac3c2

* Fix scenario where streaming standby gets stuck at a continuation record. (Heikki Linnakangas, 2018-05-05)

  If a continuation record is split so that its first half has already been removed from the master, and is only present in pg_wal, and there is a recycled WAL segment in the standby server that looks like it would contain the second half, recovery would get stuck. The code in XLogPageRead() incorrectly started streaming at the beginning of the WAL record, even if we had already read the first page.

  Backpatch to 9.4. In principle, older versions have the same problem, but without replication slots, there was no straightforward mechanism to prevent the master from recycling old WAL that was still needed by standby. Without such a mechanism, I think it's reasonable to assume that there's enough slack in how many old segments are kept around to not run into this, or you have a WAL archive.

  Reported by Jonathon Nelson. Analysis and patch by Kyotaro HORIGUCHI, with some extra comments by me.

  Discussion: https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP%2BJoJAjoGx%3DGNuOAshEDWCext7BFvCQ%40mail.gmail.com

* Don't mark pages all-visible spuriously (Alvaro Herrera, 2018-05-04)

  Dan Wood diagnosed a long-standing problem that pages containing tuples that are locked by multixacts containing live lockers may spuriously end up as candidates for getting their all-visible flag set. This has the long-term effect that multixacts remain unfrozen; this may previously have passed undetected, but since commit XYZ it would be reported as "ERROR: found multixact 134100944 from before relminmxid 192042633" because when a later vacuum tries to freeze the page it detects that a multixact that should have gotten frozen, wasn't.

  Dan proposed a (correct) patch that simply sets a variable to its correct value, after a bogus initialization. But, per discussion, it seems better coding to avoid the bogus initializations altogether, since they could give rise to more bugs later. Therefore this fix rewrites the logic a little bit to avoid depending on the bogus initializations.

  This bug was part of a family introduced in 9.6 by commit a892234f830e; later, commit 38e9f90a227d fixed most of them, but this one was unnoticed.

  Authors: Dan Wood, Pavan Deolasee, Álvaro Herrera
  Reviewed-by: Masahiko Sawada, Pavan Deolasee, Álvaro Herrera
  Discussion: https://postgr.es/m/84EBAC55-F06D-4FBE-A3F3-8BDA093CE3E3@amazon.com

* Add HOLD_INTERRUPTS section into FinishPreparedTransaction. (Teodor Sigaev, 2018-05-03)

  If an interrupt arrives in the middle of FinishPreparedTransaction and any callback decides to call CHECK_FOR_INTERRUPTS (e.g. RemoveTwoPhaseFile can write a warning with ereport, which checks for interrupts), then it's possible to leave the current GXact undeleted.

  Backpatch to all supported branches

  Stas Kelvich

  Discussion: https://www.postgresql.org/message-id/3AD85097-A3F3-4EBA-99BD-C38EDF8D2949@postgrespro.ru

* Revert back-branch changes in power()'s behavior for NaN inputs. (Tom Lane, 2018-05-02)

  Per discussion, the value of fixing these bugs in the back branches doesn't outweigh the downsides of changing corner-case behavior in a minor release. Hence, revert commits 217d8f3a1 and 4d864de48 in the v10 branch and the corresponding commits in 9.3-9.6.

  Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A73E741@BPXM05GP.gisp.nec.co.jp

* Fix bogus code for extracting extended-statistics data from syscache. (Tom Lane, 2018-05-02)

  statext_dependencies_load and statext_ndistinct_load were not up to snuff, in addition to being randomly different from each other. In detail:

  * Deserialize the fetched bytea value before releasing the syscache entry, not after. This mistake causes visible regression test failures when running with -DCATCACHE_FORCE_RELEASE. Since it's not exposed by -DCLOBBER_CACHE_ALWAYS, I think there may be no production hazard here at present, but it's at least a latent bug.

  * Use DatumGetByteaPP not DatumGetByteaP to save a detoasting cycle for short stats values; the deserialize function has to be, and is, prepared for short-header values since its other caller uses PP.

  * Use a test-and-elog for null stats values in both functions, rather than a test-and-elog in one case and an Assert in the other. Perhaps Asserts would be sufficient in both cases, but I don't see a good argument for them being different.

  * Minor cosmetic changes to make these functions more visibly alike.

  Backpatch to v10 where this code came in.

  Amit Langote, minor additional hacking by me

  Discussion: https://postgr.es/m/1349aabb-3a1f-6675-9fc0-65e2ce7491dd@lab.ntt.co.jp

* Avoid wrong results for power() with NaN input on more platforms. (Tom Lane, 2018-04-29)

  Buildfarm results show that the modern POSIX rule that 1 ^ NaN = 1 was not honored on *BSD until relatively recently, and really old platforms don't believe that NaN ^ 0 = 1 either. (This is unsurprising, perhaps, since SUSv2 doesn't require either behavior.) In hopes of getting to platform-independent behavior, let's deal with all the NaN-input cases explicitly in dpow().

  Note that numeric_power() doesn't know either of these special cases. But since that behavior is platform-independent, I think it should be addressed separately, and probably not back-patched.

  Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A73E741@BPXM05GP.gisp.nec.co.jp

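  A minimal standalone sketch of handling the NaN cases explicitly before falling back to the platform's pow(), following the POSIX rules cited above (this is an illustration, not the committed dpow() code):

      #include <math.h>
      #include <stdio.h>

      static double
      pow_with_nan_rules(double x, double y)
      {
          if (isnan(x))
              return (y == 0.0) ? 1.0 : NAN;  /* NaN ^ 0 = 1 */
          if (isnan(y))
              return (x == 1.0) ? 1.0 : NAN;  /* 1 ^ NaN = 1 */
          return pow(x, y);
      }

      int
      main(void)
      {
          printf("1^NaN = %g, NaN^0 = %g, NaN^2 = %g\n",
                 pow_with_nan_rules(1.0, NAN),
                 pow_with_nan_rules(NAN, 0.0),
                 pow_with_nan_rules(NAN, 2.0));
          return 0;                           /* build with -lm */
      }
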
* Avoid wrong results for power() with NaN input on some platforms. (Tom Lane, 2018-04-29)

  Per spec, the result of power() should be NaN if either input is NaN. It appears that on some versions of Windows, the libc function does return NaN, but it also sets errno = EDOM, confusing our code that attempts to work around shortcomings of other platforms. Hence, add guard tests to avoid substituting a wrong result for the right one.

  It's been like this for a long time (and the odd behavior only appears in older MSVC releases, too) so back-patch to all supported branches.

  Dang Minh Huong, reviewed by David Rowley

  Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A73E741@BPXM05GP.gisp.nec.co.jp

* Remove outdated comment on how to set logtape's read buffer size. (Heikki Linnakangas, 2018-04-27)

  Commit b75f467b6e removed the LogicalTapeAssignReadBufferSize() function, but forgot to update this comment. The read buffer size is an argument to LogicalTapeRewindForRead() now. Doesn't seem worth going into the details in the file header comment, so remove the outdated sentence altogether.

* Fix handling of partition bounds for boolean partitioning columns. (Tom Lane, 2018-04-23)

  Previously, you could partition by a boolean column as long as you spelled the bound values as string literals, for instance FOR VALUES IN ('t'). The trouble with this is that ruleutils.c printed that as FOR VALUES IN (TRUE), which is reasonable syntax but wasn't accepted by the grammar. That results in dump-and-reload failures for such cases.

  Apply a minimal fix that just causes TRUE and FALSE to be converted to strings 'true' and 'false'. This is pretty grotty, but it's too late for a more principled fix in v11 (to say nothing of v10). We should revisit the whole issue of how partition bound values are parsed for v12.

  Amit Langote

  Discussion: https://postgr.es/m/e05c5162-1103-7e37-d1ab-6de3e0afaf70@lab.ntt.co.jp

* Fix race conditions when an event trigger is added concurrently with DDL. (Tom Lane, 2018-04-20)

  EventTriggerTableRewrite crashed if there were table_rewrite triggers present, but there had not been when the calling command started.

  EventTriggerDDLCommandEnd called ddl_command_end triggers if present, even if there had been no such triggers when the calling command started, which would lead to a failure in pg_event_trigger_ddl_commands.

  In both cases, fix by doing nothing; it's better to wait till the next command when things will be properly initialized.

  In passing, remove an elog(DEBUG1) call that might have seemed interesting four years ago but surely isn't today.

  We found this because of intermittent failures in the buildfarm. Thanks to Alvaro Herrera and Andrew Gierth for analysis.

  Back-patch to 9.5; some of this code exists before that, but the specific hazards we need to guard against don't.

  Discussion: https://postgr.es/m/5767.1523995174@sss.pgh.pa.us

* Change more places to be less trusting of RestrictInfo.is_pushed_down. (Tom Lane, 2018-04-20)

  On further reflection, commit e5d83995e didn't go far enough: pretty much everywhere in the planner that examines a clause's is_pushed_down flag ought to be changed to use the more complicated behavior where we also check the clause's required_relids. Otherwise we could make incorrect decisions about whether, say, a clause is safe to use as a hash clause.

  Some (many?) of these places are safe as-is, either because they are never reached while considering a parameterized path, or because there are additional checks that would reject a pushed-down clause anyway. However, it seems smarter to just code them all the same way rather than rely on easily-broken reasoning of that sort.

  In support of that, invent a new macro RINFO_IS_PUSHED_DOWN that should be used in place of direct tests on the is_pushed_down flag.

  Like the previous patch, back-patch to all supported branches.

  Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se

* Fix incorrect handling of join clauses pushed into parameterized paths. (Tom Lane, 2018-04-19)

  In some cases a clause attached to an outer join can be pushed down into the outer join's RHS even though the clause is not degenerate --- this can happen if we choose to make a parameterized path for the RHS. If the clause ends up attached to a lower outer join, we'd misclassify it as being a "join filter" not a plain "filter" condition at that node, leading to wrong query results.

  To fix, teach extract_actual_join_clauses to examine each join clause's required_relids, not just its is_pushed_down flag. (The latter now seems vestigial, or at least in need of rethinking, but we won't do anything so invasive as redefining it in a bug-fix patch.)

  This has been wrong since we introduced parameterized paths in 9.2, though it's evidently hard to hit given the lack of previous reports. The test case used here involves a lateral function call, and I think that a lateral reference may be required to get the planner to select a broken plan; though I wouldn't swear to that. In any case, even if LATERAL is needed to trigger the bug, it still affects all supported branches, so back-patch to all.

  Per report from Andreas Karlsson. Thanks to Andrew Gierth for preliminary investigation.

  Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se

* Better fix for deadlock hazard in CREATE INDEX CONCURRENTLY. (Tom Lane, 2018-04-18)

  Commit 54eff5311 did not account for the possibility that we'd have a transaction snapshot due to default_transaction_isolation being set high enough to require one. The transaction snapshot is enough to hold back our advertised xmin and thus risk deadlock anyway. The only way to get rid of that snap is to start a new transaction, so let's do that instead. Also throw in an assert checking that we really have gotten to a state where no xmin is being advertised.

  Back-patch to 9.4, like the previous commit.

  Discussion: https://postgr.es/m/CAMkU=1ztk3TpQdcUNbxq93pc80FrXUjpDWLGMeVBDx71GHNwZQ@mail.gmail.com

* Fix broken collation-aware searches in SP-GiST text opclass. (Tom Lane, 2018-04-16)

  spg_text_leaf_consistent() supposed that it should compare only Min(querylen, entrylen) bytes of the two strings, and then deal with any excess bytes in one string or the other by assuming the longer string is greater if the prefixes are equal. Quite aside from the fact that that's just wrong in some locales (e.g., 'ch' is not less than 'd' in cs_CZ), it also risked passing incomplete multibyte characters to strcoll(), with ensuing bad results. Instead, just pass the full strings to varstr_cmp, and let it decide what to do about unequal-length strings.

  Fortunately, this error doesn't imply any index corruption, it's just that searches might return the wrong set of entries.

  Per report from Emre Hasegeli, though this is not his patch. Thanks to Peter Geoghegan for review and discussion.

  This code was born broken, so back-patch to all supported branches. In HEAD, I failed to resist the temptation to do a bit of cosmetic cleanup/pgindent'ing on 710d90da1, too.

  Discussion: https://postgr.es/m/CAE2gYzzb6K51VnTq5i5p52z+j9p2duEa-K1T3RrC_GQEynAKEg@mail.gmail.com

* Fix bogus affix-merging code. (Tom Lane, 2018-04-12)

  NISortAffixes() compared successive compound affixes incorrectly, thus possibly failing to merge identical affixes, or (less likely) merging ones that shouldn't be merged. The user-visible effects of this are unclear, to me anyway.

  Per bug #15150 from Alexander Lakhin. It's been broken for a long time, so back-patch to all supported branches.

  Arthur Zakirov

  Discussion: https://postgr.es/m/152353327780.31225.13445405496721177988@wrigleys.postgresql.org

* Use the right memory context for partkey's FmgrInfo (Alvaro Herrera, 2018-04-12)

  We were using CurrentMemoryContext to put the partsupfunc fmgr_info into, which isn't right, because we want the PartitionKey as a whole to be in the isolated Relation->rd_partkeycxt context. This can cause a crash with user-defined support functions in the operator classes used by partitioning keys. (Maybe this can cause problems with core-supplied opclasses too, not sure.)

  This is demonstrably broken in Postgres 10, too, but the initial proposed fix runs afoul of a problem discussed back when 8a0596cb656e ("Get rid of copy_partition_key") reorganized that code: namely that it is possible to jump out of RelationBuildPartitionKey because of some error and leave a dangling memory context child of CacheMemoryContext. Also, while reviewing this I noticed that the removed-in-pg11 copy_partition_key was doing something wrong, unfixed in pg10, namely doing memcpy() on the FmgrInfo, which is bogus (should be doing fmgr_info_copy). Therefore, in branch pg10, the sane fix seems to be to backpatch both the aforementioned 8a0596cb656e and its followup be2343221fb7 ("Protect against hypothetical memory leaks in RelationGetPartitionKey"), so do that, then apply the fmgr_info memcxt bugfix on top.

  Add a test case exercising btree-based custom operator classes, which causes a crash prior to this fix.

  This is not a security problem, because in order to create an operator class you need superuser privileges anyway.

  Authors: Álvaro Herrera and Amit Langote
  Reported and diagnosed by: Amit Langote
  Discussion: https://postgr.es/m/3041e853-b1dd-a0c6-ff21-7cc5633bffd0@lab.ntt.co.jp

* Ignore nextOid when replaying an ONLINE checkpoint. (Tom Lane, 2018-04-11)

  The nextOid value is from the start of the checkpoint and may well be stale compared to values from more recent XLOG_NEXTOID records. Previously, we adopted it anyway, allowing the OID counter to go backwards during a crash. While this should be harmless, it contributed to the severity of the bug fixed in commit 0408e1ed5, by allowing duplicate TOAST OIDs to be assigned immediately following a crash. Without this error, that issue would only have arisen when TOAST objects just younger than a multiple of 2^32 OIDs were deleted and then not vacuumed in time to avoid a conflict.

  Pavan Deolasee

  Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com

* Do not select new object OIDs that match recently-dead entries. (Tom Lane, 2018-04-11)

  When selecting a new OID, we take care to avoid picking one that's already in use in the target table, so as not to create duplicates after the OID counter has wrapped around. However, up to now we used SnapshotDirty when scanning for pre-existing entries. That ignores committed-dead rows, so that we could select an OID matching a deleted-but-not-yet-vacuumed row. While that mostly worked, it has two problems:

  * If recently deleted, the dead row might still be visible to MVCC snapshots, creating a risk for duplicate OIDs when examining the catalogs within our own transaction. Such duplication couldn't be visible outside the object-creating transaction, though, and we've heard few if any field reports corresponding to such a symptom.

  * When selecting a TOAST OID, deleted toast rows definitely *are* visible to SnapshotToast, and will remain so until vacuumed away. This leads to a conflict that will manifest in errors like "unexpected chunk number 0 (expected 1) for toast value nnnnn". We've been seeing reports of such errors from the field for years, but the cause was unclear before.

  The fix is simple: just use SnapshotAny to search for conflicting rows. This results in a slightly longer window before object OIDs can be recycled, but that seems unlikely to create any large problems.

  Pavan Deolasee

  Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com

* Allocate enough shared string memory for stats of auxiliary processes. (Heikki Linnakangas, 2018-04-11)

  This fixes a bug whereby the st_appname, st_clienthostname, and st_activity_raw fields for auxiliary processes point beyond the end of their respective shared memory segments. As a result, the application_name of a backend might show up as the client hostname of an auxiliary process.

  Backpatch to v10, where this bug was introduced, when the auxiliary processes were added to the array.

  Author: Edmund Horner
  Reviewed-by: Michael Paquier
  Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com

* Make local copy of client hostnames in backend status array. (Heikki Linnakangas, 2018-04-11)

  The other strings, application_name and query string, were snapshotted to local memory in pgstat_read_current_status(), but we forgot to do that for client hostnames. As a result, the client hostname would appear to change in the local copy, if the client disconnected.

  Backpatch to all supported versions.

  Author: Edmund Horner
  Reviewed-by: Michael Paquier
  Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com

* Fix incorrect close() call in dsm_impl_mmap(). (Tom Lane, 2018-04-10)

  One improbable error-exit path in this function used close() where it should have used CloseTransientFile(). This is unlikely to be hit in the field, and I think the consequences wouldn't be awful (just an elog(LOG) bleat later). But a bug is a bug, so back-patch to 9.4 where this code came in.

  Pan Bian

  Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org

* Fix and improve pg_atomic_flag fallback implementation. (Andres Freund, 2018-04-06)

  The atomics fallback implementation for pg_atomic_flag was broken, returning the inverted value from pg_atomic_test_set_flag(). This went unnoticed because a) atomic flags were unused until recently b) the test code wasn't run when the fallback implementation was in use (because it didn't allow testing some edge cases).

  Fix the bug, and improve the fallback so it has the same behaviour as the non-fallback implementation in the problematic edge cases. That breaks ABI compatibility in the back branches when fallbacks are in use, but given they were broken until now...

  Author: Andres Freund
  Reported-by: Daniel Gustafsson
  Discussion: https://postgr.es/m/FB948276-7B32-4B77-83E6-D00167F8EEB4@yesql.se
  https://postgr.es/m/20180406233854.uni2h3mbnveczl32@alap3.anarazel.de
  Backpatch: 9.5-, where the atomics abstraction was introduced.

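  A minimal standalone sketch of the semantics the fallback has to match (an illustration built on a pthread mutex, not the actual PostgreSQL fallback): test-and-set must return true only when the flag was previously clear, i.e. when this caller is the one that set it; returning the inverse is the bug described above.

      #include <pthread.h>
      #include <stdbool.h>

      typedef struct
      {
          pthread_mutex_t lock;
          bool            value;
      } flag_fallback;

      #define FLAG_FALLBACK_INIT  { PTHREAD_MUTEX_INITIALIZER, false }

      static bool
      flag_test_and_set(flag_fallback *flag)
      {
          bool        was_clear;

          pthread_mutex_lock(&flag->lock);
          was_clear = !flag->value;
          flag->value = true;
          pthread_mutex_unlock(&flag->lock);

          return was_clear;       /* true means this caller acquired the flag */
      }
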
* Enforce child constraints during COPY into a partitioned table. (Robert Haas, 2018-04-06)

  The previous coding inadvertently checked the constraints for the partitioned table rather than the target partition, which could lead to data in a partition that fails to satisfy some constraint on that partition. This problem seems to date back to when table partitioning was introduced; prior to that, there was only one target table for a COPY, so the problem didn't occur, and the code just didn't get updated.

  Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat

  Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp

* Fix actual and potential double-frees around tuplesort usage. (Tom Lane, 2018-03-28)

  tuplesort_gettupleslot() passed back tuples allocated in the tuplesort's own memory context, even when the caller was responsible to free them. This created a double-free hazard, because some callers might destroy the tuplesort object (via tuplesort_end) before trying to clean up the last returned tuple. To avoid this, change the API to specify that the tuple is allocated in the caller's memory context. v10 and HEAD already did things that way, but in 9.5 and 9.6 this is a live bug that can demonstrably cause crashes with some grouping-set usages.

  In 9.5 and 9.6, this requires doing an extra tuple copy in some cases, which is unfortunate. But the amount of refactoring needed to avoid it seems excessive for a back-patched change, especially since the cases where an extra copy happens are less performance-critical.

  Likewise change tuplesort_getdatum() to return pass-by-reference Datums in the caller's context not the tuplesort's context. There seem to be no live bugs among its callers, but clearly the same sort of situation could happen in future.

  For other tuplesort fetch routines, continue to allocate the memory in the tuplesort's context. This is a little inconsistent with what we now do for tuplesort_gettupleslot() and tuplesort_getdatum(), but that's preferable to adding new copy overhead in the back branches where it's clearly unnecessary. These other fetch routines provide the weakest possible guarantees about tuple memory lifespan from v10 on, anyway, so this actually seems more consistent overall.

  Adjust relevant comments to reflect these API redefinitions.

  Arguably, we should change the pre-9.5 branches as well, but since there are no known failure cases there, it seems not worth the risk.

  Peter Geoghegan, per report from Bernd Helmle. Reviewed by Kyotaro Horiguchi; thanks also to Andreas Seltenreich for extracting a self-contained test case.

  Discussion: https://postgr.es/m/1512661638.9720.34.camel@oopsware.de

* Fix thinko in comment (Alvaro Herrera, 2018-03-26)

  The listed numbers disagreed with the ones being used in the symbols; but instead of just fixing the numbers in the comment, use the symbolic name instead, which seems clearer.

  This has been wrong all along, so apply back to 9.5 where BRIN was introduced.

  Reported-by: Tomas Vondra
  Discussion: https://postgr.es/m/5ff514f2-8b1e-6366-b11c-8e2ed442562d@2ndquadrant.com

* Fix typo (Alvaro Herrera, 2018-03-26)

* Fix make rules that generate multiple output files. (Tom Lane, 2018-03-23)

  For years, our makefiles have correctly observed that "there is no correct way to write a rule that generates two files". However, what we did is to provide empty rules that "generate" the secondary output files from the primary one, and that's not right either. Depending on the details of the creating process, the primary file might end up timestamped later than one or more secondary files, causing subsequent make runs to consider the secondary file(s) out of date. That's harmless in a plain build, since make will just re-execute the empty rule and nothing happens. But it's fatal in a VPATH build, since make will expect the secondary file to be rebuilt in the build directory. This would manifest as "file not found" failures during VPATH builds from tarballs, if we were ever unlucky enough to ship a tarball with apparently out-of-date secondary files. (It's not clear whether that has ever actually happened, but it definitely could.)

  To ensure that secondary output files have timestamps >= their primary's, change our makefile convention to be that we provide a "touch $@" action not an empty rule. Also, make sure that this rule actually gets invoked during a distprep run, else the hazard remains.

  It's been like this a long time, so back-patch to all supported branches.

  In HEAD, I skipped the changes in src/backend/catalog/Makefile, because those rules are due to get replaced soon in the bootstrap data format patch, and there seems no need to create a merge issue for that patch. If for some reason we fail to land that patch in v11, we'll need to back-fill the changes in that one makefile from v10.

  Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us
