aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* WAL-log the extension of a new empty MV heap which is being populated.Kevin Grittner2013-03-06
| | | | | | | | | | This page with no tuples is used to distinguish an MV containing a zero-row resultset of its backing query from an MV which has not been populated by its backing query. Unless WAL-logged, recovery and hot standby don't work correctly with what should be an empty but scannable materialized view. Fixes bugs reported by Fujii Masao in testing MVs on hot standby.
* Fix to_char() to use ASCII-only case-folding rules where appropriate.Tom Lane2013-03-05
| | | | | | | | | | | | | | formatting.c used locale-dependent case folding rules in some code paths where the result isn't supposed to be locale-dependent, for example to_char(timestamp, 'DAY'). Since the source data is always just ASCII in these cases, that usually didn't matter ... but it does matter in Turkish locales, which have unusual treatment of "i" and "I". To confuse matters even more, the misbehavior was only visible in UTF8 encoding, because in single-byte encodings we used pg_toupper/pg_tolower which don't have locale-specific behavior for ASCII characters. Fix by providing intentionally ASCII-only case-folding functions and using these where appropriate. Per bug #7913 from Adnan Dursun. Back-patch to all active branches, since it's been like this for a long time.
* Fix overflow check in tm2timestamp (this time for sure).Tom Lane2013-03-04
| | | | | | I fixed this code back in commit 841b4a2d5, but didn't think carefully enough about the behavior near zero, which meant it improperly rejected 1999-12-31 24:00:00. Per report from Magnus Hagander.
* Remove accidentally-committed .orig file.Kevin Grittner2013-03-04
|
* Fix map_sql_value_to_xml_value() to treat domains like their base types.Tom Lane2013-03-03
| | | | | | | | | | | | This was already the case for domains over arrays, but not for domains over certain built-in types such as boolean. The special formatting rules for those types should apply to domains over them as well. Per discussion. While this is a bug fix, it's also a behavioral change that seems likely to trip up some applications. So no back-patch. Pavel Stehule
* Add a materialized view relations.Kevin Grittner2013-03-03
| | | | | | | | | | | | | | | | | | | | | | A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.
* Get rid of any toast table when converting a table to a view.Tom Lane2013-03-03
| | | | | | | | | | | | | Also make sure other fields of the view's pg_class entry are appropriate for a view; it shouldn't have relfrozenxid set for instance. This ancient omission isn't believed to have any serious consequences in versions 8.4-9.2, so no backpatch. But let's fix it before it does bite us in some serious way. It's just luck that the case doesn't cause problems for autovacuum. (It did cause problems in 8.3, but that's out of support.) Andres Freund
* Fix SQL function execution to be safe with long-lived FmgrInfos.Tom Lane2013-03-03
| | | | | | | | | | | | fmgr_sql had been designed on the assumption that the FmgrInfo it's called with has only query lifespan. This is demonstrably unsafe in connection with range types, as shown in bug #7881 from Andrew Gierth. Fix things so that we re-generate the function's cache data if the (sub)transaction it was made in is no longer active. Back-patch to 9.2. This might be needed further back, but it's not clear whether the case can realistically arise without range types, so for now I'll desist from back-patching further.
* Cannot use WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE.Heikki Linnakangas2013-02-27
| | | | | | | | | | In copy-out mode, the frontend should not send any messages until the backend has finished streaming, by sending a CopyDone message. I'm not sure if it would be legal for the client to send a new query before receiving the CopyDone message from the backend, but trying to support that would require bigger changes to the backend code structure. Fixes an assertion failure reported by Fujii Masao.
* Add support for piping COPY to/from an external program.Heikki Linnakangas2013-02-27
| | | | | | | | | | | | | | | | | | This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding psql \copy syntax. Like with reading/writing files, the backend version is superuser-only, and in the psql version, the program is run in the client. In the passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if you the stdin/stdout is quoted, it's now interpreted as a filename. For example, "\copy foo from 'stdin'" now reads from a file called 'stdin', not from standard input. Before this, there was no way to specify a filename called stdin, stdout, pstdin or pstdout. This creates a new function in pgport, wait_result_to_str(), which can be used to convert the exit status of a process, as returned by wait(3), to a human-readable string. Etsuro Fujita, reviewed by Amit Kapila.
* Add missing error check in regexp parser.Tom Lane2013-02-27
| | | | | | | | | | | | parseqatom() failed to check for an error return (NULL result) from its recursive call to parsebranch(), and in consequence could crash with a null-pointer dereference after an error return. This bug has been there since day one, but wasn't noticed before, probably because most error cases in parsebranch() didn't actually lead to returning NULL. Add the missing error check, and also tweak parsebranch() to exit in a less indirect fashion after a call to parseqatom() fails. Report by Tomasz Karlik, fix by me.
* Correct tense in log messagePeter Eisentraut2013-02-23
|
* Add quotes to messagesPeter Eisentraut2013-02-22
|
* Fix thinko in previous commit.Heikki Linnakangas2013-02-22
| | | | | | We must still initialize minRecoveryPoint if we start straight with archive recovery, e.g when recovering from a normal base backup taken with pg_start/stop_backup. Otherwise we never consider the system consistent.
* If recovery.conf is created after "pg_ctl stop -m i", do crash recovery.Heikki Linnakangas2013-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you create a base backup using an atomic filesystem snapshot, and try to perform PITR starting from that base backup, or if you just kill a master server and create recovery.conf to put it into standby mode, we don't know how far we need to recover before reaching consistency. Normally in crash recovery, we replay all the WAL present in pg_xlog, and assume that we're consistent after that. And normally in archive recovery, minRecoveryPoint, backupEndRequired, or backupEndPoint is set in the control file, indicating how far we need to replay to reach consistency. But if the server was previously up and running normally, and you kill -9 it or take an atomic filesystem snapshot, none of those fields are set in the control file. The solution is to perform crash recovery first, replaying all the WAL in pg_xlog. After that's done, we assume that the system is consistent like in normal crash recovery, and switch to archive recovery mode after that. Per report from Kyotaro HORIGUCHI. In his scenario, recovery.conf was created after "pg_ctl stop -m i". I'm not sure we need to support that exact scenario, but we should support backing up using a filesystem snapshot, which looks identical. This issue goes back to at least 9.0, where hot standby was introduced and we started to track when consistency is reached. In 9.1 and 9.2, we would open up for hot standby too early, and queries could briefly see an inconsistent state. But 9.2 made it more visible, as we started to PANIC if we see a reference to a non-existing page during recovery, if we've already reached consistency. This is a fairly big patch, so back-patch to 9.2 only, where the issue is more visible. We can consider back-patching further after this has received some more testing in 9.2 and master.
* Move relpath() to libpgcommonAlvaro Herrera2013-02-21
| | | | | | | This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.
* Remove useless variableAlvaro Herrera2013-02-21
| | | | Per Jeff Janes
* Add postgres_fdw contrib module.Tom Lane2013-02-21
| | | | | | | | There's still a lot of room for improvement, but it basically works, and we need this to be present before we can do anything much with the writable-foreign-tables patch. So let's commit it and get on with testing. Shigeru Hanada, reviewed by KaiGai Kohei and Tom Lane
* Fix yet another typo in comment.Heikki Linnakangas2013-02-20
| | | | Etsuro Fujita
* Split pgstat file in smaller piecesAlvaro Herrera2013-02-18
| | | | | | | | | | | | | | | | We now write one file per database and one global file, instead of having the whole thing in a single huge file. This reduces the I/O that must be done when partial data is required -- which is all the time, because each process only needs information on its own database anyway. Also, the autovacuum launcher does not need data about tables and functions in each database; having the global stats for all DBs is enough. Catalog version bumped because we have a new subdir under PGDATA. Author: Tomas Vondra. Some rework by Álvaro Testing by Jeff Janes Other discussion by Heikki Linnakangas, Tom Lane.
* Add ALTER ROLE ALL SET commandPeter Eisentraut2013-02-17
| | | | | | | | This generalizes the existing ALTER ROLE ... SET and ALTER DATABASE ... SET functionality to allow creating settings that apply to all users in all databases. reviewed by Pavel Stehule
* Better fix for "unarchived WAL files get deleted on crash recovery" bug.Heikki Linnakangas2013-02-15
| | | | | | | | | | Revert my earlier fix for the bug that unarchived WAL files get deleted on crash recovery, commit c9cc7e05c6d82a9781883a016c70d95aa4923122. We create a .done file for files streamed or restored from archive, so the WAL file recycling logic used during normal operation works just as well during archive recovery. Per Fujii Masao's suggestion.
* Force archive_status of .done for xlogs created by dearchival/replication.Simon Riggs2013-02-15
| | | | | | | | | This is a forward-patch of commit 6f4b8a4f4f7a2d683ff79ab59d3693714b965e3d, applied to 9.2 back in August. The plan was to do something else in master, but it looks like it's not going to happen, so let's just apply the 9.2 solution to master as well. Fujii Masao
* Don't delete unarchived WAL files during crash recovery.Heikki Linnakangas2013-02-15
| | | | | Bug reported by Jehan-Guillaume (ioguix) de Rorthais. This was introduced with the change to keep WAL files restored from archive in pg_xlog, in 9.2.
* Invent pre-commit/pre-prepare/pre-subcommit events for xact callbacks.Tom Lane2013-02-14
| | | | | | | | | | | | | | Currently it's only possible for loadable modules to get control during post-commit cleanup of a transaction. That doesn't work too well if they want to do something that could throw an error; for example, an FDW might need to issue a remote commit, which could well fail. To improve matters, extend the existing APIs for XactCallback and SubXactCallback functions to provide new pre-commit events for this purpose. The release notes will need to mention that existing callback functions should be checked to make sure they don't do something unwanted when one of the new event types occurs. In the examples within our source tree, contrib/sepgsql was fine but plpgsql had been a bit too cute.
* Fix CVE-2013-0255 properly.Tom Lane2013-02-13
| | | | | | | | | | | | Revert commit ab0f7b6089fd215f6ce6081e2e222c38d643a526 (in HEAD only) in favor of the proper solution, which is to declare enum_recv() correctly in the system catalogs. It should be declared to take type "internal" not "cstring". Also improve the type_sanity regression test, which should have caught this typo, so that it actually would. Most of the relevant checks on the signature of type I/O functions should not have been restricted to basetypes/pseudotypes, as they should apply to any type's I/O functions.
* Fix bogus when-to-deregister-from-listener-array logic.Tom Lane2013-02-13
| | | | | | | | | | | | | | | | | | Since a backend adds itself to the global listener array during Exec_ListenPreCommit, it's inappropriate for it to remove itself during Exec_UnlistenCommit or Exec_UnlistenAllCommit --- that leads to failure when committing a transaction that did UNLISTEN then LISTEN, since we end up not registered though we should be. (This leads to missing later notifications, or to Assert failures in assert-enabled builds.) Instead deal with deregistering at the bottom of AtCommit_Notify, when we know the final state of the listenChannels list. Also, simplify the representation of registration status by replacing the transient backendHasExecutedInitialListen flag with an amRegisteredListener flag. Per report from Greg Sabino Mullane. Back-patch to 9.0, where the problem was introduced during the LISTEN/NOTIFY rewrite.
* Update visibility map in the second phase of vacuum.Heikki Linnakangas2013-02-13
| | | | | | | | There's a high chance that a page becomes all-visible when the second phase of vacuum removes all the dead tuples on it, so it makes sense to check for that. Otherwise the visibility map won't get updated until the next vacuum. Pavan Deolasee, reviewed by Jeff Janes.
* Create libpgcommon, and move pg_malloc et al to itAlvaro Herrera2013-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | libpgcommon is a new static library to allow sharing code among the various frontend programs and backend; this lets us eliminate duplicate implementations of common routines. We avoid libpgport, because that's intended as a place for porting issues; per discussion, it seems better to keep them separate. The first use case, and the only implemented by this patch, is pg_malloc and friends, which many frontend programs were already using. At the same time, we can use this to provide palloc emulation functions for the frontend; this way, some palloc-using files in the backend can also be used by the frontend cleanly. To do this, we change palloc() in the backend to be a function instead of a macro on top of MemoryContextAlloc(). This was previously believed to cause loss of performance, but this implementation has been tweaked by Tom and Andres so that on modern compilers it provides a slight improvement over the previous one. This lets us clean up some places that were already with localized hacks. Most of the pg_malloc/palloc changes in this patch were authored by Andres Freund. Zoltán Böszörményi also independently provided a form of that. libpgcommon infrastructure was authored by Álvaro.
* Add noreturn attributes to some error reporting functionsPeter Eisentraut2013-02-12
|
* Support unlogged GiST index.Heikki Linnakangas2013-02-11
| | | | | | | | | | | | | | | The reason this wasn't supported before was that GiST indexes need an increasing sequence to detect concurrent page-splits. In a regular WAL- logged GiST index, the LSN of the page-split record is used for that purpose, and in a temporary index, we can get away with a backend-local counter. Neither of those methods works for an unlogged relation. To provide such an increasing sequence of numbers, create a "fake LSN" counter that is saved and restored across shutdowns. On recovery, unlogged relations are blown away, so the counter doesn't need to survive that either. Jeevan Chalke, based on discussions with Robert Haas, Tom Lane and me.
* Fix checkpoint after fast promotion.Heikki Linnakangas2013-02-11
| | | | | | | | | | The intention was to request a regular online checkpoint immediately after end of recovery, when performing "fast promotion". However, because the checkpoint was requested before other backends were allowed to write WAL, the checkpointer process performed a restartpoint rather than a checkpoint. Delay the RequestCheckPoint call until after recovery has truly ended, so that you get a real checkpoint.
* Include previous TLI in end-of-recovery and shutdown checkpoint records.Heikki Linnakangas2013-02-11
| | | | | | This isn't used for anything but a sanity check at the moment, but it could be highly valuable for debugging purposes. It could also be used to recreate timeline history by traversing WAL, which seems useful.
* Further cleanup of gistsplit.c.Tom Lane2013-02-10
| | | | | | | | | | | | | | | | | | | | After further reflection I was unconvinced that the existing coding is guaranteed to return valid union datums in every code path for multi-column indexes. Fix that by forcing a gistunionsubkey() call at the end of the recursion. Having done that, we can remove some clearly-redundant calls elsewhere. This should be a little faster for multi-column indexes (since the previous coding would uselessly do such a call for each column while unwinding the recursion), as well as much harder to break. Also, simplify the handling of cases where one side or the other of a primary split contains only don't-care tuples. The previous coding used a very ugly hack in removeDontCares() that essentially forced one random tuple to be treated as non-don't-care, providing a random initial choice of seed datum for the secondary split. It seems unlikely that that method will give better-than-random splits. Instead, treat such a split as degenerate and just let the next column determine the split, the same way that we handle fully degenerate cases where the two sides produce identical union datums.
* Remove useless picksplit-doesn't-support-secondary-split log spam.Tom Lane2013-02-10
| | | | | | | | | | | | | | This LOG message was put in over five years ago with the evident expectation that we'd make all GiST opclasses support secondary split directly. However, no such thing ever happened, and indeed the number of opclasses supporting it decreased to zero in 9.2. The reason is that improving on the default implementation isn't that easy --- the opclass-specific code that did exist, before 9.2, doesn't appear to have been any improvement over the default. Hence, remove the message altogether. There's certainly no point in nagging users about this in released branches, but I doubt that we'll ever implement complete opclass-specific support anyway.
* Remove vestigial secondary-split support in gist_box_picksplit().Tom Lane2013-02-10
| | | | | | | | | | | | | | | | | | | | | | | Not only is this implementation of secondary-split not better than the default implementation in gistsplit.c, it's actually worse. The gistsplit.c code at least looks to see if switching the left and right sides would make a better merge with the previously-split tuples, while this doesn't. In any case it's rather useless to support secondary split only in an edge case. There used to be more complete support for it here (in chooseLR()), but that was removed in commit 7f3bd86843e5aad84585a57d3f6b80db3c609916. It appears to me though that the chooseLR() code was really isomorphic to the default implementation, since it was still based on choosing the cheaper way of adding two sub-split vectors that had been chosen without regard to the primary split initially. I think an implementation of secondary split that could beat the default implementation would have to be pretty fully integrated into the split algorithm, not plastered on at the end. Back-patch to 9.2, but not further; previous branches have the chooseLR() code which I don't feel a great need to mess with. This is mainly so we just have two behaviors and not three among the various branches (IOW, this patch is cleanup for commit 7f3bd86843e5aad84585a57d3f6b80db3c609916's incomplete removal of secondary-split support).
* Document and clean up gistsplit.c.Tom Lane2013-02-10
| | | | | | | | | | Improve comments, rename some variables and functions, slightly simplify a couple of APIs, in an attempt to make this code readable by people other than its original author. Even though this is essentially just cosmetic, back-patch to all active branches, because otherwise it's going to make back-patching future fixes in this file very painful.
* Reduce log level of picksplit-doesn't-support-secondary-split whining.Tom Lane2013-02-09
| | | | | | This was agreed to back in 2007, but never actually done. Josh Hansen
* Add support for ALTER RULE ... RENAME TO.Tom Lane2013-02-08
| | | | Ali Dar, reviewed by Dean Rasheed.
* Simplify box_overlap computations.Tom Lane2013-02-08
| | | | | | | | | | | | Given the assumption that a box's high coordinates are not less than its low coordinates, the tests in box_ov() are overly complicated and can be reduced to about half as much work. Since many other functions in geo_ops.c rely on that assumption, there doesn't seem to be a good reason not to use it here. Per discussion of Alexander Korotkov's GiST fix, which was already using the simplified logic (in a non-fuzzy form, but the equivalence holds just as well for fuzzy).
* Fix gist_box_same and gist_point_consistent to handle fuzziness correctly.Tom Lane2013-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | While there's considerable doubt that we want fuzzy behavior in the geometric operators at all (let alone as currently implemented), nobody is stepping forward to redesign that stuff. In the meantime it behooves us to make sure that index searches agree with the behavior of the underlying operators. This patch fixes two problems in this area. First, gist_box_same was using fuzzy equality, but it really needs to use exact equality to prevent not-quite-identical upper index keys from being treated as identical, which for example would prevent an existing upper key from being extended by an amount less than epsilon. This would result in inconsistent indexes. (The next release notes will need to recommend that users reindex GiST indexes on boxes, polygons, circles, and points, since all four opclasses use gist_box_same.) Second, gist_point_consistent used exact comparisons for upper-page comparisons in ~= searches, when it needs to use fuzzy comparisons to ensure it finds all matches; and it used fuzzy comparisons for point <@ box searches, when it needs to use exact comparisons because that's what the <@ operator (rather inconsistently) does. The added regression test cases illustrate all three misbehaviors. Back-patch to all active branches. (8.4 did not have GiST point_ops, but it still seems prudent to apply the gist_box_same patch to it.) Alexander Korotkov, reviewed by Noah Misch
* Fix Xmax freeze conditionsAlvaro Herrera2013-02-08
| | | | | | | I broke this in 0ac5ad5134; previously, freezing a tuple marked with an IS_MULTI xmax was not necessary. Per brokenness report from Jeff Janes.
* Fix another typo in a commentMagnus Hagander2013-02-08
| | | | Noted by Thom Brown
* Fix typo in commentMagnus Hagander2013-02-08
| | | | Etsuro Fujita
* Fix performance issue in EXPLAIN (ANALYZE, TIMING OFF).Tom Lane2013-02-07
| | | | | | | | | | | | | | | | | | | Commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf, which added the TIMING option to EXPLAIN, had an oversight: if the TIMING option is disabled then control in InstrStartNode() goes through an elog(DEBUG2) call, which typically does nothing but takes a noticeable amount of time to do it. Tweak the logic to avoid that. In HEAD, also change the elog(DEBUG2)'s in instrument.c to elog(ERROR). It's not very clear why they weren't like that to begin with, but this episode shows that not complaining more vociferously about misuse is likely to do little except allow bugs to remain hidden. While at it, adjust some code that was making possibly-dangerous assumptions about flag bits being in the rightmost byte of the instrument_options word. Problem reported by Pavel Stehule (via Tomas Vondra).
* Repair bugs in GiST page splitting code for multi-column indexes.Tom Lane2013-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When considering a non-last column in a multi-column GiST index, gistsplit.c tries to improve on the split chosen by the opclass-specific pickSplit function by considering penalties for the next column. However, there were two bugs in this code: it failed to recompute the union keys for the leftmost index columns, even though these might well change after reassigning tuples; and it included the old union keys in the recomputation for the columns it did recompute, so that those keys couldn't get smaller even if they should. The first problem could result in an invalid index in which searches wouldn't find index entries that are in fact present; the second would make the index less efficient to search. Both of these errors were caused by misuse of gistMakeUnionItVec, whose API was designed in a way that just begged such errors to be made. There is no situation in which it's safe or useful to compute the union keys for a subset of the index columns, and there is no caller that wants any previous union keys to be included in the computation; so the undocumented choice to treat the union keys as in/out rather than pure output parameters is a waste of code as well as being dangerous. Hence, rather than just making a minimal patch, I've changed the API of gistMakeUnionItVec to remove the "startkey" parameter (it now always processes all index columns) and treat the attr/isnull arrays as purely output parameters. In passing, also get rid of a couple of unnecessary and dangerous uses of static variables in gistutil.c. It's remarkable that the one in gistMakeUnionKey hasn't given us portability troubles before now, because in addition to posing a re-entrancy hazard, it was unsafely assuming that a static char[] array would have at least Datum alignment. Per investigation of a trouble report from Tomas Vondra. (There are also some bugs in contrib/btree_gist to be fixed, but that seems like material for a separate patch.) Back-patch to all supported branches.
* Fix possible failure to send final transaction counts to stats collector.Tom Lane2013-02-07
| | | | | | | | | | | | | Normally, we suppress sending a tabstats message to the collector unless there were some actual table stats to send. However, during backend exit we should force out the message if there are any transaction commit/abort counts to send, else the session's last few commit/abort counts will never get reported at all. We had logic for this, but the short-circuit test at the top of pgstat_report_stat() ignored the "force" flag, with the consequence that session-ending transactions that touched no database-local tables would not get counted. Seems to be an oversight in my commit 641912b4d17fd214a5e5bae4e7bb9ddbc28b144b, which added the "force" flag. That was back in 8.3, so back-patch to all supported versions.
* Rely only on checkpoint 1 at end of recovery.Simon Riggs2013-02-07
| | | | | | | Searching for checkpoint 2 (previous) is not correct in all cases. Bug report from Heikki Linnakangas
* Enable building with Microsoft Visual Studio 2012.Andrew Dunstan2013-02-06
| | | | | | Backpatch to release 9.2 Brar Piening and Noah Misch, reviewed by Craig Ringer.
* Split out list of XLog resource managersAlvaro Herrera2013-02-06
| | | | | | | | | | The new rmgrlist.h header, containing all necessary data about built-in resource managers, allows other pieces of code to access them. In particular, this allows a future pg_xlogdump program to extract rm_desc function pointers, without having to keep a duplicate list of them.