aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Fix spinlock implementation for some !solaris sparc platforms.Andres Freund2014-09-09
| | | | | | | | | | | | | | | | | Some Sparc CPUs can be run in various coherence models, ranging from RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always run CPUs in TSO mode while in userland, but linux didn't use to and the various *BSDs still don't. Unfortunately the sparc TAS/S_UNLOCK were only correct under TSO. Fix that by adding the necessary memory barrier instructions. On sparcv8+, which should be all relevant CPUs, these are treated as NOPs if the current consistency model doesn't require the barriers. Discussion: 20140630222854.GW26930@awork2.anarazel.de Will be backpatched to all released branches once a few buildfarm cycles haven't shown up problems. As I've no access to sparc, this is blindly written.
* Fix segmentation fault that an empty prepared statement could cause.Fujii Masao2014-09-05
| | | | | | Back-patch to all supported branches. Per bug #11335 from Haruka Takatsuka
* Fix outdated commentAlvaro Herrera2014-08-22
|
* Fix bogus return macros in range_overright_internal().Tom Lane2014-08-16
| | | | | | | | | PG_RETURN_BOOL() should only be used in functions following the V1 SQL function API. This coding accidentally fails to fail since letting the compiler coerce the Datum representation of bool back to plain bool does give the right answer; but that doesn't make it a good idea. Back-patch to older branches just to avoid unnecessary code divergence.
* Fix failure to follow the directions when "init" fork was added.Fujii Masao2014-08-11
| | | | | | | | | | Specifically this commit updates forkname_to_number() so that the HINT message includes "init" fork, and also adds the description of "init" fork into pg_relation_size() document. This is a part of the commit 2d00190495b22e0d0ba351b2cda9c95fb2e3d083 which has fixed the same oversight in master and 9.4. Back-patch to 9.1 where "init" fork was added.
* Fix conversion of domains to JSON in 9.3 and 9.2.Tom Lane2014-08-09
| | | | | | | | | | | | | | | | | | | | | | | In commit 0ca6bda8e7501947c05f30c127f6d12ff90b5a64, I rewrote the json.c code that decided how to convert SQL data types into JSON values, so that it no longer relied on typcategory which is a pretty untrustworthy guide to the output format of user-defined datatypes. However, I overlooked the fact that CREATE DOMAIN inherits typcategory from the base type, so that the old coding did have the desirable property of treating domains like their base types --- but only in some cases, because not all its decisions turned on typcategory. The version of the patch that went into 9.4 and up did a getBaseType() call to ensure that domains were always treated like their base types, but I omitted that from the older branches, because it would result in a behavioral change for domains over json or hstore; a change that's arguably a bug fix, but nonetheless a change that users had not asked for. What I overlooked was that this meant that domains over numerics and boolean were no longer treated like their base types, and that we *did* get a complaint about, ie bug #11103 from David Grelaud. So let's do the getBaseType() call in the older branches as well, to restore their previous behavior in these cases. That means 9.2 and 9.3 will now make these decisions just like 9.4. We could probably kluge things to still ignore the domain's base type if it's json etc, but that seems a bit silly.
* Reject duplicate column names in foreign key referenced-columns lists.Tom Lane2014-08-09
| | | | | | | | | | | | | Such cases are disallowed by the SQL spec, and even if we wanted to allow them, the semantics seem ambiguous: how should the FK columns be matched up with the columns of a unique index? (The matching could be significant in the presence of opclasses with different notions of equality, so this issue isn't just academic.) However, our code did not previously reject such cases, but instead would either fail to match to any unique index, or generate a bizarre opclass-lookup error because of sloppy thinking in the index-matching code. David Rowley
* pg_upgrade: prevent oid conflicts with new-cluster TOAST tablesBruce Momjian2014-08-07
| | | | | | | | Previously, TOAST tables only required in the new cluster could cause oid conflicts if they were auto-numbered and a later conflicting oid had to be assigned. Backpatch through 9.3
* Avoid wholesale autovacuuming when autovacuum is nominally off.Tom Lane2014-07-30
| | | | | | | | | | | | | When autovacuum is nominally off, we will still launch autovac workers to vacuum tables that are at risk of XID wraparound. But after we'd done that, an autovac worker would proceed to autovacuum every table in the targeted database, if they meet the usual thresholds for autovacuuming. This is at best pretty unexpected; at worst it delays response to the wraparound threat. Fix it so that if autovacuum is nominally off, we *only* do forced vacuums and not any other work. Per gripe from Andrey Zhidenkov. This has been like this all along, so back-patch to all supported branches.
* Fix mishandling of background worker PGPROCs in EXEC_BACKEND builds.Robert Haas2014-07-30
| | | | | | | | | | InitProcess() relies on IsBackgroundWorker to decide whether the PGPROC for a new backend should be taken from ProcGlobal's freeProcs or from bgworkerFreeProcs. In EXEC_BACKEND builds, InitProcess() is called sooner than in non-EXEC_BACKEND builds, and IsBackgroundWorker wasn't getting initialized soon enough. Report by Noah Misch. Diagnosis and fix by me.
* Treat 2PC commit/abort the same as regular xacts in recovery.Heikki Linnakangas2014-07-29
| | | | | | | | | | | | | | | | | There were several oversights in recovery code where COMMIT/ABORT PREPARED records were ignored: * pg_last_xact_replay_timestamp() (wasn't updated for 2PC commits) * recovery_min_apply_delay (2PC commits were applied immediately) * recovery_target_xid (recovery would not stop if the XID used 2PC) The first of those was reported by Sergiy Zuban in bug #11032, analyzed by Tom Lane and Andres Freund. The bug was always there, but was masked before commit d19bd29f07aef9e508ff047d128a4046cc8bc1e2, because COMMIT PREPARED always created an extra regular transaction that was WAL-logged. Backpatch to all supported versions (older versions didn't have all the features and therefore didn't have all of the above bugs).
* Avoid access to already-released lock in LockRefindAndRelease.Robert Haas2014-07-24
| | | | Spotted by Tom Lane.
* Re-enable error for "SELECT ... OFFSET -1".Tom Lane2014-07-22
| | | | | | | | | | | The executor has thrown errors for negative OFFSET values since 8.4 (see commit bfce56eea45b1369b7bb2150a150d1ac109f5073), but in a moment of brain fade I taught the planner that OFFSET with a constant negative value was a no-op (commit 1a1832eb085e5bca198735e5d0e766a3cb61b8fc). Reinstate the former behavior by only discarding OFFSET with a value of exactly 0. In passing, adjust a planner comment that referenced the ancient behavior. Back-patch to 9.3 where the mistake was introduced.
* Check block number against the correct fork in get_raw_page().Tom Lane2014-07-22
| | | | | | | | | | | | | | | | | | get_raw_page tried to validate the supplied block number against RelationGetNumberOfBlocks(), which of course is only right when accessing the main fork. In most cases, the main fork is longer than the others, so that the check was too weak (allowing a lower-level error to be reported, but no real harm to be done). However, very small tables could have an FSM larger than their heap, in which case the mistake prevented access to some FSM pages. Per report from Torsten Foertsch. In passing, make the bad-block-number error into an ereport not elog (since it's certainly not an internal error); and fix sloppily maintained comment for RelationGetNumberOfBlocksInFork. This has been wrong since we invented relation forks, so back-patch to all supported branches.
* Reject out-of-range numeric timezone specifications.Tom Lane2014-07-21
| | | | | | | | | | | | | | | In commit 631dc390f49909a5c8ebd6002cfb2bcee5415a9d, we started to handle simple numeric timezone offsets via the zic library instead of the old CTimeZone/HasCTZSet kluge. However, we overlooked the fact that the zic code will reject UTC offsets exceeding a week (which seems a bit arbitrary, but not because it's too tight ...). This led to possibly setting session_timezone to NULL, which results in crashes in most timezone-related operations as of 9.4, and crashes in a small number of places even before that. So check for NULL return from pg_tzset_offset() and report an appropriate error message. Per bug #11014 from Duncan Gillis. Back-patch to all supported branches, like the previous patch. (Unfortunately, as of today that no longer includes 8.4.)
* Adjust cutoff points in newly-added sanity tests.Tom Lane2014-07-21
| | | | Per recommendation from Andres.
* Defend against bad relfrozenxid/relminmxid/datfrozenxid/datminmxid values.Tom Lane2014-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit a61daa14d56867e90dc011bbba52ef771cea6770, we fixed pg_upgrade so that it would install sane relminmxid and datminmxid values, but that does not cure the problem for installations that were already pg_upgraded to 9.3; they'll initially have "1" in those fields. This is not a big problem so long as 1 is "in the past" compared to the current nextMultiXact counter. But if an installation were more than halfway to the MXID wrap point at the time of upgrade, 1 would appear to be "in the future" and that would effectively disable tracking of oldest MXIDs in those tables/databases, until such time as the counter wrapped around. While in itself this isn't worse than the situation pre-9.3, where we did not manage MXID wraparound risk at all, the consequences of premature truncation of pg_multixact are worse now; so we ought to make some effort to cope with this. We discussed advising users to fix the tracking values manually, but that seems both very tedious and very error-prone. Instead, this patch adopts two amelioration rules. First, a relminmxid value that is "in the future" is allowed to be overwritten with a full-table VACUUM's actual freeze cutoff, ignoring the normal rule that relminmxid should never go backwards. (This essentially assumes that we have enough defenses in place that wraparound can never occur anymore, and thus that a value "in the future" must be corrupt.) Second, if we see any "in the future" values then we refrain from truncating pg_clog and pg_multixact. This prevents loss of clog data until we have cleaned up all the broken tracking data. In the worst case that could result in considerable clog bloat, but in practice we expect that relfrozenxid-driven freezing will happen soon enough to fix the problem before clog bloat becomes intolerable. (Users could do manual VACUUM FREEZEs if not.) Note that this mechanism cannot save us if there are already-wrapped or already-truncated-away MXIDs in the table; it's only capable of dealing with corrupt tracking values. But that's the situation we have with the pg_upgrade bug. For consistency, apply the same rules to relfrozenxid/datfrozenxid. There are not known mechanisms for these to get messed up, but if they were, the same tactics seem appropriate for fixing them.
* Translation updatesPeter Eisentraut2014-07-21
|
* Partial fix for dropped columns in functions returning composite.Tom Lane2014-07-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a view has a function-returning-composite in FROM, and there are some dropped columns in the underlying composite type, ruleutils.c printed junk in the column alias list for the reconstructed FROM entry. Before 9.3, this was prevented by doing get_rte_attribute_is_dropped tests while printing the column alias list; but that solution is not currently available to us for reasons I'll explain below. Instead, check for empty-string entries in the alias list, which can only exist if that column position had been dropped at the time the view was made. (The parser fills in empty strings to preserve the invariant that the aliases correspond to physical column positions.) While this is sufficient to handle the case of columns dropped before the view was made, we have still got issues with columns dropped after the view was made. In particular, the view could contain Vars that explicitly reference such columns! The dependency machinery really ought to refuse the column drop attempt in such cases, as it would do when trying to drop a table column that's explicitly referenced in views. However, we currently neglect to store dependencies on columns of composite types, and fixing that is likely to be too big to be back-patchable (not to mention that existing views in existing databases would not have the needed pg_depend entries anyway). So I'll leave that for a separate patch. Pre-9.3, ruleutils would print such Vars normally (with their original column names) even though it suppressed their entries in the RTE's column alias list. This is certainly bogus, since the printed view definition would fail to reload, but at least it didn't crash. However, as of 9.3 the printed column alias list is tightly tied to the names printed for Vars; so we can't treat columns as dropped for one purpose and not dropped for the other. This is why we can't just put back the get_rte_attribute_is_dropped test: it results in an assertion failure if the view in fact contains any Vars referencing the dropped column. Once we've got dependencies preventing such cases, we'll probably want to do it that way instead of relying on the empty-string test used here. This fix turned up a very ancient bug in outfuncs/readfuncs, namely that T_String nodes containing empty strings were not dumped/reloaded correctly: the node was printed as "<>" which is read as a string value of <>. Since (per SQL) we disallow empty-string identifiers, such nodes don't occur normally, which is why we'd not noticed. (Such nodes aren't used for literal constants, just identifiers.) Per report from Marc Schablewski. Back-patch to 9.3 which is where the rule printing behavior changed. The dangling-variable case is broken all the way back, but that's not what his complaint is about.
* Fix two low-probability memory leaks in regular expression parsing.Tom Lane2014-07-18
| | | | | | | | | | | | | | | | | If pg_regcomp failed after having invoked markst/cleanst, it would leak any "struct subre" nodes it had created. (We've already detected all regex syntax errors at that point, so the only likely causes of later failure would be query cancel or out-of-memory.) To fix, make sure freesrnode knows the difference between the pre-cleanst and post-cleanst cleanup procedures. Add some documentation of this less-than-obvious point. Also, newlacon did the wrong thing with an out-of-memory failure from realloc(), so that the previously allocated array would be leaked. Both of these are pretty low-probability scenarios, but a bug is a bug, so patch all the way back. Per bug #10976 from Arthur O'Dwyer.
* Fix bugs in SP-GiST search with range type's -|- (adjacent) operator.Heikki Linnakangas2014-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The consistent function contained several bugs: * The "if (which2) { ... }" block was broken. It compared the argument's lower bound against centroid's upper bound, while it was supposed to compare the argument's upper bound against the centroid's lower bound (the comment was correct, code was wrong). Also, it cleared bits in the "which1" variable, while it was supposed to clear bits in "which2". * If the argument's upper bound was equal to the centroid's lower bound, we descended to both halves (= all quadrants). That's unnecessary, searching the right quadrants is sufficient. This didn't lead to incorrect query results, but was clearly wrong, and slowed down queries unnecessarily. * In the case that argument's lower bound is adjacent to the centroid's upper bound, we also don't need to visit all quadrants. Per similar reasoning as previous point. * The code where we compare the previous centroid with the current centroid should match the code where we compare the current centroid with the argument. The point of that code is to redo the calculation done in the previous level, to see if we were supposed to traverse left or right (or up or down), and if we actually did. If we moved in the different direction, then we know there are no matches for bound. Refactor the code and adds comments to make it more readable and easier to reason about. Backpatch to 9.3 where SP-GiST support for range types was introduced.
* Fix REASSIGN OWNED for text search objectsAlvaro Herrera2014-07-15
| | | | | | | | | | | | | | | | | | | | | Trying to reassign objects owned by a user that had text search dictionaries or configurations used to fail with: ERROR: unexpected classid 3600 or ERROR: unexpected classid 3602 Fix by adding cases for those object types in a switch in pg_shdepend.c. Both REASSIGN OWNED and text search objects go back all the way to 8.1, so backpatch to all supported branches. In 9.3 the alter-owner code was made generic, so the required change in recent branches is pretty simple; however, for 9.2 and older ones we need some additional reshuffling to enable specifying objects by OID rather than name. Text search templates and parsers are not owned objects, so there's no change required for them. Per bug #9749 reported by Michal Novotný
* Fix bug with whole-row references to append subplans.Tom Lane2014-07-11
| | | | | | | | | | | | | | ExecEvalWholeRowVar incorrectly supposed that it could "bless" the source TupleTableSlot just once per query. But if the input is coming from an Append (or, perhaps, other cases?) more than one slot might be returned over the query run. This led to "record type has not been registered" errors when a composite datum was extracted from a non-blessed slot. This bug has been there a long time; I guess it escaped notice because when dealing with subqueries the planner tends to expand whole-row Vars into RowExprs, which don't have the same problem. It is possible to trigger the problem in all active branches, though, as illustrated by the added regression test.
* Don't assume a subquery's output is unique if there's a SRF in its tlist.Tom Lane2014-07-08
| | | | | | | | | | | | | While the x output of "select x from t group by x" can be presumed unique, this does not hold for "select x, generate_series(1,10) from t group by x", because we may expand the set-returning function after the grouping step. (Perhaps that should be re-thought; but considering all the other oddities involved with SRFs in targetlists, it seems unlikely we'll change it.) Put a check in query_is_distinct_for() so it's not fooled by such cases. Back-patch to all supported branches. David Rowley
* Add some errdetail to checkRuleResultList().Tom Lane2014-07-02
| | | | | | | | | | | | | | | | | | | This function wasn't originally thought to be really user-facing, because converting a table to a view isn't something we expect people to do manually. So not all that much effort was spent on the error messages; in particular, while the code will complain that you got the column types wrong it won't say exactly what they are. But since we repurposed the code to also check compatibility of rule RETURNING lists, it's definitely user-facing. It now seems worthwhile to add errdetail messages showing exactly what the conflict is when there's a mismatch of column names or types. This is prompted by bug #10836 from Matthias Raffelsieper, which might have been forestalled if the error message had reported the wrong column type as being "record". Per Alvaro's advice, back-patch to branches before 9.4, but resist the temptation to rephrase any existing strings there. Adding new strings is not really a translation degradation; anyway having the info presented in English is better than not having it at all.
* Have multixact be truncated by checkpoint, not vacuumAlvaro Herrera2014-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of truncating pg_multixact at vacuum time, do it only at checkpoint time. The reason for doing it this way is twofold: first, we want it to delete only segments that we're certain will not be required if there's a crash immediately after the removal; and second, we want to do it relatively often so that older files are not left behind if there's an untimely crash. Per my proposal in http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org we now execute the truncation in the checkpointer process rather than as part of vacuum. Vacuum is in only charge of maintaining in shared memory the value to which it's possible to truncate the files; that value is stored as part of checkpoints also, and so upon recovery we can reuse the same value to re-execute truncate and reset the oldest-value-still-safe-to-use to one known to remain after truncation. Per bug reported by Jeff Janes in the course of his tests involving bug #8673. While at it, update some comments that hadn't been updated since multixacts were changed. Backpatch to 9.3, where persistency of pg_multixact files was introduced by commit 0ac5ad5134f2.
* Don't allow relminmxid to go backwards during VACUUM FULLAlvaro Herrera2014-06-27
| | | | | | | | | | | | | We were allowing a table's pg_class.relminmxid value to move backwards when heaps were swapped by VACUUM FULL or CLUSTER. There is a similar protection against relfrozenxid going backwards, which we neglected to clone when the multixact stuff was rejiggered by commit 0ac5ad5134f276. Backpatch to 9.3, where relminmxid was introduced. As reported by Heikki in http://www.postgresql.org/message-id/52401AEA.9000608@vmware.com
* Fix broken Assert() introduced by 8e9a16ab8f7f0e58Alvaro Herrera2014-06-27
| | | | | | | | | | | | | | | | Don't assert MultiXactIdIsRunning if the multi came from a tuple that had been share-locked and later copied over to the new cluster by pg_upgrade. Doing that causes an error to be raised unnecessarily: MultiXactIdIsRunning is not open to the possibility that its argument came from a pg_upgraded tuple, and all its other callers are already checking; but such multis cannot, obviously, have transactions still running, so the assert is pointless. Noticed while investigating the bogus pg_multixact/offsets/0000 file left over by pg_upgrade, as reported by Andres Freund in http://www.postgresql.org/message-id/20140530121631.GE25431@alap3.anarazel.de Backpatch to 9.3, as the commit that introduced the buglet.
* Back-patch "Fix EquivalenceClass processing for nested append relations".Tom Lane2014-06-26
| | | | | | | | | | | | | | | | When we committed a87c729153e372f3731689a7be007bc2b53f1410, we somehow failed to notice that it didn't merely improve plan quality for expression indexes; there were very closely related cases that failed outright with "could not find pathkey item to sort". The failing cases seem to be those where the planner was already capable of selecting a MergeAppend plan, and there was inheritance involved: the lack of appropriate eclass child members would prevent prepare_sort_from_pathkeys() from succeeding on the MergeAppend's child plan nodes for inheritance child tables. Accordingly, back-patch into 9.1 through 9.3, along with an extra regression test case covering the problem. Per trouble report from Michael Glaesemann.
* Fix handling of nested JSON objects in json_populate_recordset and friends.Tom Lane2014-06-24
| | | | | | | | | | | | | | | | | | | | | populate_recordset_object_start() improperly created a new hash table (overwriting the link to the existing one) if called at nest levels greater than one. This resulted in previous fields not appearing in the final output, as reported by Matti Hameister in bug #10728. In 9.4 the problem also affects json_to_recordset. This perhaps missed detection earlier because the default behavior is to throw an error for nested objects: you have to pass use_json_as_text = true to see the problem. In addition, fix query-lifespan leakage of the hashtable created by json_populate_record(). This is pretty much the same problem recently fixed in dblink: creating an intended-to-be-temporary context underneath the executor's per-tuple context isn't enough to make it go away at the end of the tuple cycle, because MemoryContextReset is not MemoryContextResetAndDeleteChildren. Michael Paquier and Tom Lane
* Don't allow foreign tables with OIDs.Heikki Linnakangas2014-06-24
| | | | | | | | | | | | | | The syntax doesn't let you specify "WITH OIDS" for foreign tables, but it was still possible with default_with_oids=true. But the rest of the system, including pg_dump, isn't prepared to handle foreign tables with OIDs properly. Backpatch down to 9.1, where foreign tables were introduced. It's possible that there are databases out there that already have foreign tables with OIDs. There isn't much we can do about that, but at least we can prevent them from being created in the future. Patch by Etsuro Fujita, reviewed by Hadi Moshayedi.
* Do all-visible handling in lazy_vacuum_page() outside its critical section.Andres Freund2014-06-20
| | | | | | | | | | | | | | | | | | | | | | | | Since fdf9e21196a lazy_vacuum_page() rechecks the all-visible status of pages in the second pass over the heap. It does so inside a critical section, but both visibilitymap_test() and heap_page_is_all_visible() perform operations that should not happen inside one. The former potentially performs IO and both potentially do memory allocations. To fix, simply move all the all-visible handling outside the critical section. Doing so means that the PD_ALL_VISIBLE on the page won't be included in the full page image of the HEAP2_CLEAN record anymore. But that's fine, the flag will be set by the HEAP2_VISIBLE logged later. Backpatch to 9.3 where the problem was introduced. The bug only came to light due to the assertion added in 4a170ee9 and isn't likely to cause problems in production scenarios. The worst outcome is a avoidable PANIC restart. This also gets rid of the difference in the order of operations between master and standby mentioned in 2a8e1ac5. Per reports from David Leverton and Keith Fiske in bug #10533.
* Avoid leaking memory while evaluating arguments for a table function.Tom Lane2014-06-19
| | | | | | | | | | | | | ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM in the query-lifespan memory context. This is insignificant in simple cases where the function relation is scanned only once; but if the function is in a sub-SELECT or is on the inside of a nested loop, any memory consumed during argument evaluation can add up quickly. (The potential for trouble here had been foreseen long ago, per existing comments; but we'd not previously seen a complaint from the field about it.) To fix, create an additional temporary context just for this purpose. Per an example from MauMau. Back-patch to all active branches.
* Fix ancient encoding error in hungarian.stop.Tom Lane2014-06-10
| | | | | | | | | | | | | | | | | When we grabbed this file off the Snowball project's website, we mistakenly supposed that it was in LATIN1 encoding, but evidently it was actually in LATIN2. This resulted in ő (o-double-acute, U+0151, which is code 0xF5 in LATIN2) being misconverted into õ (o-tilde, U+00F5), as complained of in bug #10589 from Zoltán Sörös. We'd have messed up u-double-acute too, but there aren't any of those in the file. Other characters used in the file have the same codes in LATIN1 and LATIN2, which no doubt helped hide the problem for so long. The error is not only ours: the Snowball project also was confused about which encoding is required for Hungarian. But dealing with that will require source-code changes that I'm not at all sure we'll wish to back-patch. Fixing the stopword file seems reasonably safe to back-patch however.
* Fix infinite loop when splitting inner tuples in SPGiST text indexes.Tom Lane2014-06-09
| | | | | | | | | | | | | | | | | | | | | | | Previously, the code used a node label of zero both for strings that contain no bytes beyond the inner tuple's prefix, and for cases where an "allTheSame" inner tuple has to be split to allow a string with a different next byte to be inserted into it. Failing to distinguish these cases meant that if a string ending with the current prefix needed to be inserted into an allTheSame tuple, we got into an infinite loop, because after splitting the tuple we'd descend into the child allTheSame tuple and then find we need to split again. To fix, instead use -1 and -2 as the node labels for these two cases. This requires widening the node label type from "char" to int2, but fortunately SPGiST stores all pass-by-value node label types in their Datum representation, which means that this change is transparently upward compatible so far as the on-disk representation goes. We continue to recognize zero as a dummy node label for reading purposes, but will not attempt to push new index entries down into such a label, so that the loop won't occur even when dealing with an existing index. Per report from Teodor Sigaev. Back-patch to 9.2 where the faulty code was introduced.
* Wrap multixact/members correctly during extension, take 2Alvaro Herrera2014-06-09
| | | | | | | | In a50d97625497b7 I already changed this, but got it wrong for the case where the number of members is larger than the number of entries that fit in the last page of the last segment. As reported by Serge Negodyuck in a followup to bug #8673.
* Add defenses against running with a wrong selection of LOBLKSIZE.Tom Lane2014-06-05
| | | | | | | | | | | | | | | | | | | | | It's critical that the backend's idea of LOBLKSIZE match the way data has actually been divided up in pg_largeobject. While we don't provide any direct way to adjust that value, doing so is a one-line source code change and various people have expressed interest recently in changing it. So, just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in pg_control and cross-check that the backend's compiled-in setting matches the on-disk data. Also tweak the code in inv_api.c so that fetches from pg_largeobject explicitly verify that the length of the data field is not more than LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection at all in production builds. In some of the call sites an overlength data value would translate directly to a security-relevant stack clobber, so it seems worth one extra runtime comparison to be sure. In the back branches, we can't change the contents of pg_control; but we can still make the extra checks in inv_api.c, which will offer some amount of protection against running with the wrong value of LOBLKSIZE.
* Fix longstanding bug in HeapTupleSatisfiesVacuum().Andres Freund2014-06-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HeapTupleSatisfiesVacuum() didn't properly discern between DELETE_IN_PROGRESS and INSERT_IN_PROGRESS for rows that have been inserted in the current transaction and deleted in a aborted subtransaction of the current backend. At the very least that caused problems for CLUSTER and CREATE INDEX in transactions that had aborting subtransactions producing rows, leading to warnings like: WARNING: concurrent delete in progress within table "..." possibly in an endless, uninterruptible, loop. Instead of treating *InProgress xmins the same as *IsCurrent ones, treat them as being distinct like the other visibility routines. As implemented this separatation can cause a behaviour change for rows that have been inserted and deleted in another, still running, transaction. HTSV will now return INSERT_IN_PROGRESS instead of DELETE_IN_PROGRESS for those. That's both, more in line with the other visibility routines and arguably more correct. The latter because a INSERT_IN_PROGRESS will make callers look at/wait for xmin, instead of xmax. The only current caller where that's possibly worse than the old behaviour is heap_prune_chain() which now won't mark the page as prunable if a row has concurrently been inserted and deleted. That's harmless enough. As a cautionary measure also insert a interrupt check before the gotos in IndexBuildHeapScan() that lead to the uninterruptible loop. There are other possible causes, like a row that several sessions try to update and all fail, for repeated loops and the cost of doing so in the retry case is low. As this bug goes back all the way to the introduction of subtransactions in 573a71a5da backpatch to all supported releases. Reported-By: Sandro Santilli
* Set the process latch when processing recovery conflict interrupts.Andres Freund2014-06-03
| | | | | | | | | | | | | | | | | | | | | | | | Because RecoveryConflictInterrupt() didn't set the process latch anything using the latter to wait for events didn't get notified about recovery conflicts. Most latch users are never the target of recovery conflicts, which explains the lack of reports about this until now. Since 9.3 two possible affected users exist though: The sql callable pg_sleep() now uses latches to wait and background workers are expected to use latches in their main loop. Both would currently wait until the end of WaitLatch's timeout. Fix by adding a SetLatch() to RecoveryConflictInterrupt(). It'd also be possible to fix the issue by having each latch user set set_latch_on_sigusr1. That seems failure prone and though, as most of these callsites won't often receive recovery conflicts and thus will likely only be tested against normal query cancels et al. It'd also be unnecessarily verbose. Backpatch to 9.1 where latches were introduced. Arguably 9.3 would be sufficient, because that's where pg_sleep() was converted to waiting on the latch and background workers got introduced; but there could be user level code making use of the latch pre 9.3.
* Revert "Fix bogus %name-prefix option syntax in all our Bison files."Tom Lane2014-05-28
| | | | | | | | | | | | This reverts commit ece7aa8b0f57d92577055a88579555df895eb929. It turns out that the %name-prefix syntax without "=" does not work at all in pre-2.4 Bison. We are not prepared to make such a large jump in minimum required Bison version just to suppress a warning message in a version hardly any developers are using yet. When 3.0 gets more popular, we'll figure out a way to deal with this. In the meantime, BISONFLAGS=-Wno-deprecated is recommendable for anyone using 3.0 who doesn't want to see the warning.
* Fix bogus %name-prefix option syntax in all our Bison files.Tom Lane2014-05-28
| | | | | | | | | | | | %name-prefix doesn't use an "=" sign according to the Bison docs, but it silently accepted one anyway, until Bison 3.0. This was originally a typo of mine in commit 012abebab1bc72043f3f670bf32e91ae4ee04bd2, and we seem to have slavishly copied the error into all the other grammar files. Per report from Vik Fearing; analysis by Peter Eisentraut. Back-patch to all active branches, since somebody might try to build a back branch with up-to-date tools.
* Ensure cleanup in case of early errors in streaming base backupsMagnus Hagander2014-05-28
| | | | | | | | | | Move the code that sends the initial status information as well as the calculation of paths inside the ENSURE_ERROR_CLEANUP block. If this code failed, we would "leak" a counter of number of concurrent backups, thereby making the system always believe it was in backup mode. This could happen if the sending failed (which it probably never did given that the small amount of data to send would never cause a flush). It is very low risk, but all operations after do_pg_start_backup should be protected.
* Prevent auto_explain from changing the output of a user's EXPLAIN.Tom Lane2014-05-20
| | | | | | | | | | | | | | | | | | Commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf, which introduced the EXPLAIN (TIMING) option, for some reason coded explain.c to look at planstate->instrument->need_timer rather than es->timing to decide whether to print timing info. However, the former flag might get set as a result of contrib/auto_explain wanting timing information. We certainly don't want activation of auto_explain to change user-visible statement behavior, so fix that. Also fix an independent bug introduced in the same patch: in the code path for a never-executed node with a machine-friendly output format, if timing was selected, it would fail to print the Actual Rows and Actual Loops items. Per bug #10404 from Tomonari Katsumata. Back-patch to 9.2 where the faulty code was introduced.
* Use 0-based numbering in comments about backup blocks.Heikki Linnakangas2014-05-19
| | | | | | | | | The macros and functions that work with backup blocks in the redo function use 0-based numbering, so let's use that consistently in the function that generates the records too. Makes it so much easier to compare the generation and replay functions. Backpatch to 9.0, where we switched from 1-based to 0-based numbering.
* Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.Heikki Linnakangas2014-05-16
| | | | | | | | | | | | | | | | Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed that prepared transaction commits reuse that struct. Fix that. Because those fields were left unitialized, replaying a commit prepared WAL record in a hot standby node would fail to remove the relcache init file. That can lead to "could not open file" errors on the standby. Relcache init file only needs to be removed when a system table/index is rewritten in the transaction using two phase commit, so that should be rare in practice. In HEAD, the incorrect dbId/tsId values are also used for filtering in logical replication code, causing the transaction to always be filtered out. Analysis and fix by Andres Freund. Backpatch to 9.0 where hot standby was introduced.
* Fix unportable setvbuf() usage in initdb.Tom Lane2014-05-15
| | | | | | | | | | | | In yesterday's commit 2dc4f011fd61501cce507be78c39a2677690d44b, I tried to force buffering of stdout/stderr in initdb to be what it is by default when the program is run interactively on Unix (since that's how most manual testing is done). This tripped over the fact that Windows doesn't support _IOLBF mode. We dealt with that a long time ago in syslogger.c by falling back to unbuffered mode on Windows. Export that solution in port.h and use it in initdb. Back-patch to 8.4, like the previous commit.
* Handle duplicate XIDs in txid_snapshot.Heikki Linnakangas2014-05-15
| | | | | | | | | | The proc array can contain duplicate XIDs, when a transaction is just being prepared for two-phase commit. To cope, remove any duplicates in txid_current_snapshot(). Also ignore duplicates in the input functions, so that if e.g. you have an old pg_dump file that already contains duplicates, it will be accepted. Report and fix by Jan Wieck. Backpatch to all supported versions.
* Fix race condition in preparing a transaction for two-phase commit.Heikki Linnakangas2014-05-15
| | | | | | | | | | | | | | | | | | | | To lock a prepared transaction's shared memory entry, we used to mark it with the XID of the backend. When the XID was no longer active according to the proc array, the entry was implicitly considered as not locked anymore. However, when preparing a transaction, the backend's proc array entry was cleared before transfering the locks (and some other state) to the prepared transaction's dummy PGPROC entry, so there was a window where another backend could finish the transaction before it was in fact fully prepared. To fix, rewrite the locking mechanism of global transaction entries. Instead of an XID, just have simple locked-or-not flag in each entry (we store the locking backend's backend id rather than a simple boolean, but that's just for debugging purposes). The backend is responsible for explicitly unlocking the entry, and to make sure that that happens, install a callback to unlock it on abort or process exit. Backpatch to all supported versions.
* Code review for recent changes in relcache.c.Tom Lane2014-05-14
| | | | | | | | | | | | | | | | | | | | | | | rd_replidindex should be managed the same as rd_oidindex, and rd_keyattr and rd_idattr should be managed like rd_indexattr. Omissions in this area meant that the bitmapsets computed for rd_keyattr and rd_idattr would be leaked during any relcache flush, resulting in a slow but permanent leak in CacheMemoryContext. There was also a tiny probability of relcache entry corruption if we ran out of memory at just the wrong point in RelationGetIndexAttrBitmap. Otherwise, the fields were not zeroed where expected, which would not bother the code any AFAICS but could greatly confuse anyone examining the relcache entry while debugging. Also, create an API function RelationGetReplicaIndex rather than letting non-relcache code be intimate with the mechanisms underlying caching of that value (we won't even mention the memory leak there). Also, fix a relcache flush hazard identified by Andres Freund: RelationGetIndexAttrBitmap must not assume that rd_replidindex stays valid across index_open. The aspects of this involving rd_keyattr date back to 9.3, so back-patch those changes.
* Get rid of bogus dependency on typcategory in to_json() and friends.Tom Lane2014-05-09
| | | | | | | | | | | | | | | | | | | These functions were relying on typcategory to identify arrays and composites, which is not reliable and not the normal way to do it. Using typcategory to identify boolean, numeric types, and json itself is also pretty questionable, though the code in those cases didn't seem to be at risk of anything worse than wrong output. Instead, use the standard lsyscache functions to identify arrays and composites, and rely on a direct check of the type OID for the other cases. In HEAD, also be sure to look through domains so that a domain is treated the same as its base type for conversions to JSON. However, this is a small behavioral change; given the lack of field complaints, we won't back-patch it. In passing, refactor so that there's only one copy of the code that decides which conversion strategy to apply, not multiple copies that could (and have) gotten out of sync.