aboutsummaryrefslogtreecommitdiff
path: root/src/backend/commands
Commit message (Collapse)AuthorAge
...
* Fix table_rewrite event trigger for ALTER TYPE/SET DATA TYPE CASCADEAlvaro Herrera2015-02-27
| | | | | | | | | | | | | When a composite type being used in a typed table is modified by way of ALTER TYPE, a table rewrite occurs appearing to come from ALTER TYPE. The existing event_trigger.c code was unable to cope with that and raised a spurious error. The fix is just to accept that command tag for the event, and document this properly. Noted while fooling with deparsing of DDL commands. This appears to be an oversight in commit 618c9430a. Thanks to Mark Wong for documentation wording help.
* Reconsider when to wait for WAL flushes/syncrep during commit.Andres Freund2015-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now RecordTransactionCommit() waited for WAL to be flushed (if synchronous_commit != off) and to be synchronously replicated (if enabled), even if a transaction did not have a xid assigned. The primary reason for that is that sequence's nextval() did not assign a xid, but are worthwhile to wait for on commit. This can be problematic because sometimes read only transactions do write WAL, e.g. HOT page prune records. That then could lead to read only transactions having to wait during commit. Not something people expect in a read only transaction. This lead to such strange symptoms as backends being seemingly stuck during connection establishment when all synchronous replicas are down. Especially annoying when said stuck connection is the standby trying to reconnect to allow syncrep again... This behavior also is involved in a rather complicated <= 9.4 bug where the transaction started by catchup interrupt processing waited for syncrep using latches, but didn't get the wakeup because it was already running inside the same overloaded signal handler. Fix the issue here doesn't properly solve that issue, merely papers over the problems. In 9.5 catchup interrupts aren't processed out of signal handlers anymore. To fix all this, make nextval() acquire a top level xid, and only wait for transaction commit if a transaction both acquired a xid and emitted WAL records. If only a xid has been assigned we don't uselessly want to wait just because of writes to temporary/unlogged tables; if only WAL has been written we don't want to wait just because of HOT prunes. The xid assignment in nextval() is unlikely to cause overhead in real-world workloads. For one it only happens SEQ_LOG_VALS/32 values anyway, for another only usage of nextval() without using the result in an insert or similar is affected. Discussion: 20150223165359.GF30784@awork2.anarazel.de, 369698E947874884A77849D8FE3680C2@maumau, 5CF4ABBA67674088B3941894E22A0D25@maumau Per complaint from maumau and Thom Brown Backpatch all the way back; 9.0 doesn't have syncrep, but it seems better to be consistent behavior across all maintained branches.
* Support more commands in event triggersAlvaro Herrera2015-02-23
| | | | | | | | COMMENT, SECURITY LABEL, and GRANT/REVOKE now also fire ddl_command_start and ddl_command_end event triggers, when they operate on database-local objects. Reviewed-By: Michael Paquier, Andres Freund, Stephen Frost
* Get rid of multiple applications of transformExpr() to the same tree.Tom Lane2015-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | transformExpr() has for many years had provisions to do nothing when applied to an already-transformed expression tree. However, this was always ugly and of dubious reliability, so we'd be much better off without it. The primary historical reason for it was that gram.y sometimes returned multiple links to the same subexpression, which is no longer true as of my BETWEEN fixes. We'd also grown some lazy hacks in CREATE TABLE LIKE (failing to distinguish between raw and already-transformed index specifications) and one or two other places. This patch removes the need for and support for re-transforming already transformed expressions. The index case is dealt with by adding a flag to struct IndexStmt to indicate that it's already been transformed; which has some benefit anyway in that tablecmds.c can now Assert that transformation has happened rather than just assuming. The other main reason was some rather sloppy code for array type coercion, which can be fixed (and its performance improved too) by refactoring. I did leave transformJoinUsingClause() still constructing expressions containing untransformed operator nodes being applied to Vars, so that transformExpr() still has to allow Var inputs. But that's a much narrower, and safer, special case than before, since Vars will never appear in a raw parse tree, and they don't have any substructure to worry about. In passing fix some oversights in the patch that added CREATE INDEX IF NOT EXISTS (missing processing of IndexStmt.if_not_exists). These appear relatively harmless, but still sloppy coding practice.
* Use FLEXIBLE_ARRAY_MEMBER in a number of other places.Tom Lane2015-02-21
| | | | I think we're about done with this...
* Use FLEXIBLE_ARRAY_MEMBER in some more places.Tom Lane2015-02-20
| | | | | | Fix a batch of structs that are only visible within individual .c files. Michael Paquier
* Have TRUNCATE update pgstat tuple countersAlvaro Herrera2015-02-20
| | | | | | | | | | | | | | | | | | This works by keeping a per-subtransaction record of the ins/upd/del counters before the truncate, and then resetting them; this record is useful to return to the previous state in case the truncate is rolled back, either in a subtransaction or whole transaction. The state is propagated upwards as subtransactions commit. When the per-table data is sent to the stats collector, a flag indicates to reset the live/dead counters to zero as well. Catalog version bumped due to the change in pgstat format. Author: Alexander Shulgin Discussion: 1007.1207238291@sss.pgh.pa.us Discussion: 548F7D38.2000401@BlueTreble.com Reviewed-by: Álvaro Herrera, Jim Nasby
* Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.Tom Lane2015-02-20
| | | | | | | | | | | | | | | | Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]". Aside from being more self-documenting, this should help prevent bogus warnings from static code analyzers and perhaps compiler misoptimizations. This patch is just a down payment on eliminating the whole problem, but it gets rid of a lot of easy-to-fix cases. Note that the main problem with doing this is that one must no longer rely on computing sizeof(the containing struct), since the result would be compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf also warns against spelling that offsetof(struct, lastfield[0]). Michael Paquier, review and additional fixes by me.
* Fix EXPLAIN output for cases where parent table is excluded by constraints.Tom Lane2015-02-17
| | | | | | | | | | | | | | The previous coding in EXPLAIN always labeled a ModifyTable node with the name of the target table affected by its first child plan. When originally written, this was necessarily the parent table of the inheritance tree, so everything was unconfusing. But when we added NO INHERIT constraints, it became possible for the parent table to be deleted from the plan by constraint exclusion while still leaving child tables present. This led to the ModifyTable plan node being labeled with the first surviving child, which was deemed confusing. Fix it by retaining the parent table's RT index in a new field in ModifyTable. Etsuro Fujita, reviewed by Ashutosh Bapat and myself
* Introduce and use infrastructure for interrupt processing during client reads.Andres Freund2015-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now large swathes of backend code ran inside signal handlers while reading commands from the client, to allow for speedy reaction to asynchronous events. Most prominently shared invalidation and NOTIFY handling. That means that complex code like the starting/stopping of transactions is run in signal handlers... The required code was fragile and verbose, and is likely to contain bugs. That approach also severely limited what could be done while communicating with the client. As the read might be from within openssl it wasn't safely possible to trigger an error, e.g. to cancel a backend in idle-in-transaction state. We did that in some cases, namely fatal errors, nonetheless. Now that FE/BE communication in the backend employs non-blocking sockets and latches to block, we can quite simply interrupt reads from signal handlers by setting the latch. That allows us to signal an interrupted read, which is supposed to be retried after returning from within the ssl library. As signal handlers now only need to set the latch to guarantee timely interrupt processing, remove a fair amount of complicated & fragile code from async.c and sinval.c. We could now actually start to process some kinds of interrupts, like sinval ones, more often that before, but that seems better done separately. This work will hopefully allow to handle cases like being blocked by sending data, interrupting idle transactions and similar to be implemented without too much effort. In addition to allowing getting rid of ImmediateInterruptOK, that is. Author: Andres Freund Reviewed-By: Heikki Linnakangas
* Be more careful to not lose sync in the FE/BE protocol.Heikki Linnakangas2015-02-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If any error occurred while we were in the middle of reading a protocol message from the client, we could lose sync, and incorrectly try to interpret a part of another message as a new protocol message. That will usually lead to an "invalid frontend message" error that terminates the connection. However, this is a security issue because an attacker might be able to deliberately cause an error, inject a Query message in what's supposed to be just user data, and have the server execute it. We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other operations that could ereport(ERROR) in the middle of processing a message, but a query cancel interrupt or statement timeout could nevertheless cause it to happen. Also, the V2 fastpath and COPY handling were not so careful. It's very difficult to recover in the V2 COPY protocol, so we will just terminate the connection on error. In practice, that's what happened previously anyway, as we lost protocol sync. To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set whenever we're in the middle of reading a message. When it's set, we cannot safely ERROR out and continue running, because we might've read only part of a message. PqCommReadingMsg acts somewhat similarly to critical sections in that if an error occurs while it's set, the error handler will force the connection to be terminated, as if the error was FATAL. It's not implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted to PANIC in critical sections, because we want to be able to use PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes advantage of that to prevent an OOM error from terminating the connection. To prevent unnecessary connection terminations, add a holdoff mechanism similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel interrupts, but still allow die interrupts. The rules on which interrupts are processed when are now a bit more complicated, so refactor ProcessInterrupts() and the calls to it in signal handlers so that the signal handlers always call it if ImmediateInterruptOK is set, and ProcessInterrupts() can decide to not do anything if the other conditions are not met. Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund. Backpatch to all supported versions. Security: CVE-2015-0244
* Clean up range-table building in copy.cStephen Frost2015-01-28
| | | | | | | | | | | | | | | | | | | Commit 804b6b6db4dcfc590a468e7be390738f9f7755fb added the build of a range table in copy.c to initialize the EState es_range_table since it can be needed in error paths. Unfortunately, that commit didn't appreciate that some code paths might end up not initializing the rte which is used to build the range table. Fix that and clean up a couple others things along the way- build it only once and don't explicitly set it on the !is_from path as it doesn't make any sense there (cstate is palloc0'd, so this isn't an issue from an initializing standpoint either). The prior commit went back to 9.0, but this only goes back to 9.1 as prior to that the range table build happens immediately after building the RTE and therefore doesn't suffer from this issue. Pointed out by Robert.
* Fix column-privilege leak in error-message pathsStephen Frost2015-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | While building error messages to return to the user, BuildIndexValueDescription, ExecBuildSlotValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included. Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription and ExecBuildSlotValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned. Further, in master only, do not return any data for cases where row security is enabled on the relation and row security should be applied for the user. This required a bit of refactoring and moving of things around related to RLS- note the addition of utils/misc/rls.c. Back-patch all the way, as column-level privileges are now in all supported versions. This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit.
* Fix volatile-safety issue in asyncQueueReadAllNotifications().Tom Lane2015-01-26
| | | | | | | | | | | | | | | | | The "pos" variable is modified within PG_TRY and then referenced within PG_CATCH, so for strict POSIX conformance it must be marked volatile. Superficially the code looked safe because pos's address was taken, which was sufficient to force it into memory ... but it's not sufficient to ensure that the compiler applies updates exactly where the program text says to. The volatility marking has to extend into a couple of subroutines too, but I think that's probably a good thing because the risk of out-of-order updates is mostly in those subroutines not asyncQueueReadAllNotifications() itself. In principle the compiler could have re-ordered operations such that an error could be thrown while "pos" had an incorrect value. It's unclear how real the risk is here, but for safety back-patch to all active branches.
* Clean up some mess in row-security patches.Tom Lane2015-01-24
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix unsafe coding around PG_TRY in RelationBuildRowSecurity: can't change a variable inside PG_TRY and then use it in PG_CATCH without marking it "volatile". In this case though it seems saner to avoid that by doing a single assignment before entering the TRY block. I started out just intending to fix that, but the more I looked at the row-security code the more distressed I got. This patch also fixes incorrect construction of the RowSecurityPolicy cache entries (there was not sufficient care taken to copy pass-by-ref data into the cache memory context) and a whole bunch of sloppiness around the definition and use of pg_policy.polcmd. You can't use nulls in that column because initdb will mark it NOT NULL --- and I see no particular reason why a null entry would be a good idea anyway, so changing initdb's behavior is not the right answer. The internal value of '\0' wouldn't be suitable in a "char" column either, so after a bit of thought I settled on using '*' to represent ALL. Chasing those changes down also revealed that somebody wasn't paying attention to what the underlying values of ACL_UPDATE_CHR etc really were, and there was a great deal of lackadaiscalness in the catalogs.sgml documentation for pg_policy and pg_policies too. This doesn't pretend to be a complete code review for the row-security stuff, it just fixes the things that were in my face while dealing with the bugs in RelationBuildRowSecurity.
* Fix whitespacePeter Eisentraut2015-01-22
|
* adjust ACL owners for REASSIGN and ALTER OWNER TOBruce Momjian2015-01-22
| | | | | | | | | | | When REASSIGN and ALTER OWNER TO are used, both the object owner and ACL list should be changed from the old owner to the new owner. This patch fixes types, foreign data wrappers, and foreign servers to change their ACL list properly; they already changed owners properly. BACKWARD INCOMPATIBILITY? Report by Alexey Bashtanov
* Use abbreviated keys for faster sorting of text datums.Robert Haas2015-01-19
| | | | | | | | | | | | | | | | | | | | | This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the abbreviated comparison is insufficent to resolve the comparison, we fall back on the normal comparator. This can be much faster than the old way of doing sorting if the first few bytes of the string are usually sufficient to resolve the comparison. There is the potential for a performance regression if all of the strings to be sorted are identical for the first 8+ characters and differ only in later positions; therefore, the SortSupport machinery now provides an infrastructure to abort the use of abbreviation if it appears that abbreviation is producing comparatively few distinct keys. HyperLogLog, a streaming cardinality estimator, is included in this commit and used to make that determination for text. Peter Geoghegan, reviewed by me.
* Show sort ordering options in EXPLAIN output.Tom Lane2015-01-16
| | | | | | | | | | | | Up to now, EXPLAIN has contented itself with printing the sort expressions in a Sort or Merge Append plan node. This patch improves that by annotating the sort keys with COLLATE, DESC, USING, and/or NULLS FIRST/LAST whenever nondefault sort ordering options are used. The output is now a reasonably close approximation of an ORDER BY clause equivalent to the plan's ordering. Marius Timmer, Lukas Kreft, and Arne Scheffer; reviewed by Mike Blackwell. Some additional hacking by me.
* Rearrange explain.c's API so callers need not embed sizeof(ExplainState).Tom Lane2015-01-15
| | | | | | | | | | The folly of the previous arrangement was just demonstrated: there's no convenient way to add fields to ExplainState without breaking ABI, even if callers have no need to touch those fields. Since we might well need to do that again someday in back branches, let's change things so that only explain.c has to have sizeof(ExplainState) compiled into it. This costs one extra palloc() per EXPLAIN operation, which is surely pretty negligible.
* Improve performance of EXPLAIN with large range tables.Tom Lane2015-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | As of 9.3, ruleutils.c goes to some lengths to ensure that table and column aliases used in its output are unique. Of course this takes more time than was required before, which in itself isn't fatal. However, EXPLAIN was set up so that recalculation of the unique aliases was repeated for each subexpression printed in a plan. That results in O(N^2) time and memory consumption for large plan trees, which did not happen in older branches. Fortunately, the expensive work is the same across a whole plan tree, so there is no need to repeat it; we can do most of the initialization just once per query and re-use it for each subexpression. This buys back most (not all) of the performance loss since 9.2. We need an extra ExplainState field to hold the precalculated deparse context. That's no problem in HEAD, but in the back branches, expanding sizeof(ExplainState) seems risky because third-party extensions might have local variables of that struct type. So, in 9.4 and 9.3, introduce an auxiliary struct to keep sizeof(ExplainState) the same. We should refactor the APIs to avoid such local variables in future, but that's material for a separate HEAD-only commit. Per gripe from Alexey Bashtanov. Back-patch to 9.3 where the issue was introduced.
* Fix logging of pages skipped due to pins during vacuum.Andres Freund2015-01-08
| | | | | | | | | | | | | | | | | | | | | The new logging introduced in 35192f06 made the incorrect assumption that scan_all vacuums would always wait for buffer pins; but they only do so if the page actually needs to be frozen. Fix that inaccuracy by removing the difference in log output based on scan_all and just always remove the same message. I chose to keep the split log message from the original commit for now, it seems likely that it'll be of use in the future. Also merge the line about buffer pins in autovacuum's log output into the existing "pages: ..." line. It seems odd to have a separate line about pins, without the "topic: " prefix others have. Also rename the new 'pinned_pages' variable to 'pinskipped_pages' because it actually tracks the number of pages that could *not* be pinned. Discussion: 20150104005324.GC9626@awork2.anarazel.de
* Reject ANALYZE commands during VACUUM FULL or another ANALYZE.Noah Misch2015-01-07
| | | | | | vacuum()'s static variable handling makes it non-reentrant; an ensuing null pointer deference crashed the backend. Back-patch to 9.0 (all supported versions).
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Add error handling for failing fstat() calls in copy.c.Andres Freund2015-01-04
| | | | | | | | | These calls are pretty much guaranteed not to fail unless something has gone horribly wrong, and even in that case we'd just error out a short time later. But since several code checkers complain about the missing check it seems worthwile to fix it nonetheless. Pointed out by Coverity.
* pg_event_trigger_dropped_objects: Add name/args output columnsAlvaro Herrera2014-12-30
| | | | | | | | | These columns can be passed to pg_get_object_address() and used to reconstruct the dropped objects identities in a remote server containing similar objects, so that the drop can be replicated. Reviewed by Stephen Frost, Heikki Linnakangas, Abhijit Menon-Sen, Andres Freund.
* Use TypeName to represent type names in certain commandsAlvaro Herrera2014-12-30
| | | | | | | | | | | | | | | | | | In COMMENT, DROP, SECURITY LABEL, and the new pg_get_object_address function, we were representing types as a list of names, same as other objects; but types are special objects that require their own representation to be totally accurate. In the original COMMENT code we had a note about fixing it which was lost in the course of c10575ff005. Change all those places to use TypeName instead, as suggested by that comment. Right now the original coding doesn't cause any bugs, so no backpatch. It is more problematic for proposed future code that operate with object addresses from the SQL interface; type details such as array-ness are lost when working with the degraded representation. Thanks to Petr Jelínek and Dimitri Fontaine for offlist help on finding a solution to a shift/reduce grammar conflict.
* Revert "Use a bitmask to represent role attributes"Alvaro Herrera2014-12-23
| | | | | | | | | This reverts commit 1826987a46d079458007b7b6bbcbbd852353adbb. The overall design was deemed unacceptable, in discussion following the previous commit message; we might find some parts of it still salvageable, but I don't want to be on the hook for fixing it, so let's wait until we have a new patch.
* Add SQL-callable pg_get_object_addressAlvaro Herrera2014-12-23
| | | | | | | | | | | | | | | | | This allows access to get_object_address from SQL, which is useful to obtain OID addressing information from data equivalent to that emitted by the parser. This is necessary infrastructure of a project to let replication systems propagate object dropping events to remote servers, where the schema might be different than the server originating the DROP. This patch also adds support for OBJECT_DEFAULT to get_object_address; that is, it is now possible to refer to a column's default value. Catalog version bumped due to the new function. Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres Freund, Abhijit Menon-Sen, Adam Brightwell.
* Use a bitmask to represent role attributesAlvaro Herrera2014-12-23
| | | | | | | | | | | | | The previous representation using a boolean column for each attribute would not scale as well as we want to add further attributes. Extra auxilliary functions are added to go along with this change, to make up for the lost convenience of access of the old representation. Catalog version bumped due to change in catalogs and the new functions. Author: Adam Brightwell, minor tweaks by Álvaro Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
* get_object_address: separate domain constraints from table constraintsAlvaro Herrera2014-12-23
| | | | | | | | | | | | Apart from enabling comments on domain constraints, this enables a future project to replicate object dropping to remote servers: with the current mechanism there's no way to distinguish between the two types of constraints, so there's no way to know what to drop. Also added support for the domain constraint comments in psql's \dd and pg_dump. Catalog version bumped due to the change in ObjectType enum.
* pg_event_trigger_dropped_objects: add behavior flagsAlvaro Herrera2014-12-19
| | | | | | | | | | | | | | | | | Add "normal" and "original" flags as output columns to the pg_event_trigger_dropped_objects() function. With this it's possible to distinguish which objects, among those listed, need to be explicitely referenced when trying to replicate a deletion. This is necessary so that the list of objects can be pruned to the minimum necessary to replicate the DROP command in a remote server that might have slightly different schema (for instance, TOAST tables and constraints with different names and such.) Catalog version bumped due to change of function definition. Reviewed by: Abhijit Menon-Sen, Stephen Frost, Heikki Linnakangas, Robert Haas.
* Use %u to print out BlockNumber variablesAlvaro Herrera2014-12-18
| | | | Per Tom Lane
* Have VACUUM log number of skipped pages due to pinsAlvaro Herrera2014-12-18
| | | | | | Author: Jim Nasby, some kibitzing by Heikki Linnankangas. Discussion leading to current behavior and precise wording fueled by thoughts from Robert Haas and Andres Freund.
* Improve hash_create's API for selecting simple-binary-key hash functions.Tom Lane2014-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate. Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely. In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before. For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c. Teodor Sigaev and Tom Lane
* Remove odd blank line in comment.Fujii Masao2014-12-18
| | | | Etsuro Fujita
* Allow CHECK constraints to be placed on foreign tables.Tom Lane2014-12-17
| | | | | | | | | | | | | | | As with NOT NULL constraints, we consider that such constraints are merely reports of constraints that are being enforced by the remote server (or other underlying storage mechanism). Their only real use is to allow planner optimizations, for example in constraint-exclusion checks. Thus, the code changes here amount to little more than removal of the error that was formerly thrown for applying CHECK to a foreign table. (In passing, do a bit of cleanup of the ALTER FOREIGN TABLE reference page, which had accumulated some weird decisions about ordering etc.) Shigeru Hanada and Etsuro Fujita, reviewed by Kyotaro Horiguchi and Ashutosh Bapat.
* Add CINE option for CREATE TABLE AS and CREATE MATERIALIZED VIEWAndrew Dunstan2014-12-13
| | | | Fabrízio de Royes Mello reviewed by Rushabh Lathia.
* Further changes to REINDEX SCHEMASimon Riggs2014-12-11
| | | | | | | | | | Ensure we reindex indexes built on Mat Views. Based on patch from Micheal Paquier Add thorough tests to check that indexes on tables, toast tables and mat views are reindexed. Simon Riggs
* Silence REINDEXSimon Riggs2014-12-09
| | | | | | | Previously REINDEX DATABASE and REINDEX SCHEMA produced a stream of NOTICE messages. Removing that since it is inconsistent for such a command to produce output without a VERBOSE option.
* REINDEX SCHEMASimon Riggs2014-12-09
| | | | | | | | Add new SCHEMA option to REINDEX and reindexdb. Sawada Masahiko Reviewed by Michael Paquier and Fabrízio de Royes Mello
* Event Trigger for table_rewriteSimon Riggs2014-12-08
| | | | | | | | | | | | | | Generate a table_rewrite event when ALTER TABLE attempts to rewrite a table. Provide helper functions to identify table and reason. Intended use case is to help assess or to react to schema changes that might hold exclusive locks for long periods. Dimitri Fontaine, triggering an edit by Simon Riggs Reviewed in detail by Michael Paquier
* Keep track of transaction commit timestampsAlvaro Herrera2014-12-03
| | | | | | | | | | | | | | | | | | | | | Transactions can now set their commit timestamp directly as they commit, or an external transaction commit timestamp can be fed from an outside system using the new function TransactionTreeSetCommitTsData(). This data is crash-safe, and truncated at Xid freeze point, same as pg_clog. This module is disabled by default because it causes a performance hit, but can be enabled in postgresql.conf requiring only a server restart. A new test in src/test/modules is included. Catalog version bumped due to the new subdirectory within PGDATA and a couple of new SQL functions. Authors: Álvaro Herrera and Petr Jelínek Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven Singer, Peter Eisentraut
* Rename pg_rowsecurity -> pg_policy and other fixesStephen Frost2014-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | As pointed out by Robert, we should really have named pg_rowsecurity pg_policy, as the objects stored in that catalog are policies. This patch fixes that and updates the column names to start with 'pol' to match the new catalog name. The security consideration for COPY with row level security, also pointed out by Robert, has also been addressed by remembering and re-checking the OID of the relation initially referenced during COPY processing, to make sure it hasn't changed under us by the time we finish planning out the query which has been built. Robert and Alvaro also commented on missing OCLASS and OBJECT entries for POLICY (formerly ROWSECURITY or POLICY, depending) in various places. This patch fixes that too, which also happens to add the ability to COMMENT on policies. In passing, attempt to improve the consistency of messages, comments, and documentation as well. This removes various incarnations of 'row-security', 'row-level security', 'Row-security', etc, in favor of 'policy', 'row level security' or 'row_security' as appropriate. Happy Thanksgiving!
* Add infrastructure to save and restore GUC values.Robert Haas2014-11-24
| | | | | | This is further infrastructure for parallelism. Amit Khandekar, Noah Misch, Robert Haas
* Add missing case for CustomScan.Tom Lane2014-11-20
| | | | | | | Per KaiGai Kohei. In passing improve formatting of some code added in commit 30d7ae3c, because otherwise pgindent will make a mess of it.
* Revamp the WAL record format.Heikki Linnakangas2014-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
* Fix relpersistence setting in reindex_indexAlvaro Herrera2014-11-17
| | | | | | | | | | | | | | Buildfarm members with CLOBBER_CACHE_ALWAYS advised us that commit 85b506bbfc2937 was mistaken in setting the relpersistence value of the index directly in the relcache entry, within reindex_index. The reason for the failure is that an invalidation message that comes after mucking with the relcache entry directly, but before writing it to the catalogs, would cause the entry to become rebuilt in place from catalogs with the old contents, losing the update. Fix by passing the correct persistence value to RelationSetNewRelfilenode instead; this routine also writes the updated tuple to pg_class, avoiding the problem. Suggested by Tom Lane.
* Emit msg re skipping ANALYZE for absent inh treeSimon Riggs2014-11-15
| | | | | | | | | When checking a table that has an inheritance tree marked, if no child tables remain, we skip ANALYZE. This patch emits a message to show that the action has been skipped. Author: Etsuro Fujita Reviewer: Furuya Osamu
* Get rid of SET LOGGED indexes persistence kludgeAlvaro Herrera2014-11-15
| | | | | | | | | | | | This removes ATChangeIndexesPersistence() introduced by f41872d0c1239d36 which was too ugly to live for long. Instead, the correct persistence marking is passed all the way down to reindex_index, so that the transient relation built to contain the index relfilenode can get marked correctly right from the start. Author: Fabrízio de Royes Mello Review and editorialization by Michael Paquier and Álvaro Herrera