aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Add a new reloption, user_catalog_table.Robert Haas2013-12-10
| | | | | | | | | | | | | | | When this reloption is set and wal_level=logical is configured, we'll record the CIDs stamped by inserts, updates, and deletes to the table just as we would for an actual catalog table. This will allow logical decoding to use historical MVCC snapshots to access such tables just as they access ordinary catalog tables. Replication solutions built around the logical decoding machinery will likely need to set this operation for their configuration tables; it might also be needed by extensions which perform table access in their output functions. Andres Freund, reviewed by myself and others.
* Add new wal_level, logical, sufficient for logical decoding.Robert Haas2013-12-10
| | | | | | | | | | | | | | | | | | | | | | | When wal_level=logical, we'll log columns from the old tuple as configured by the REPLICA IDENTITY facility added in commit 07cacba983ef79be4a84fcd0e0ca3b5fcb85dd65. This makes it possible a properly-configured logical replication solution to correctly follow table updates even if they change the chosen key columns, or, with REPLICA IDENTITY FULL, even if the table has no key at all. Note that updates which do not modify the replica identity column won't log anything extra, making the choice of a good key (i.e. one that will rarely be changed) important to performance when wal_level=logical is configured. Each insert, update, or delete to a catalog table will also log the CMIN and/or CMAX values of stamped by the current transaction. This is necessary because logical decoding will require access to historical snapshots of the catalog in order to decode some data types, and the CMIN/CMAX values that we may need in order to judge row visibility may have been overwritten by the time we need them. Andres Freund, reviewed in various versions by myself, Heikki Linnakangas, KONDO Mitsumasa, and many others.
* Fix possible crash with nested SubLinks.Tom Lane2013-12-10
| | | | | | | | | | | | | An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...) could produce an invalid plan that results in a crash at execution time, if the planner attempts to flatten the outer IN into a semi-join. This happens because convert_testexpr() was not expecting any nested SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging to the inner SubLink. (I think the comment denying that this case could happen was wrong when written; it's certainly been wrong for quite a long time, since very early versions of the semijoin flattening logic.) Per report from Teodor Sigaev. Back-patch to all supported branches.
* Rename TABLE() to ROWS FROM().Noah Misch2013-12-10
| | | | | | | SQL-standard TABLE() is a subset of UNNEST(); they deal with arrays and other collection types. This feature, however, deals with set-returning functions. Use a different syntax for this feature to keep open the possibility of implementing the standard TABLE().
* Fixups for dsm.c's file descriptor handling.Robert Haas2013-12-09
| | | | Per complaint from Tom Lane.
* SSL: Support ECDH key exchangePeter Eisentraut2013-12-07
| | | | | | | | | | | | | | | | | This sets up ECDH key exchange, when compiling against OpenSSL that supports EC. Then the ECDHE-RSA and ECDHE-ECDSA cipher suites can be used for SSL connections. The latter one means that EC keys are now usable. The reason for EC key exchange is that it's faster than DHE and it allows to go to higher security levels where RSA will be horribly slow. There is also new GUC option ssl_ecdh_curve that specifies the curve name used for ECDH. It defaults to "prime256v1", which is the most common curve in use in HTTPS. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
* SSL: Add configuration option to prefer server cipher orderPeter Eisentraut2013-12-07
| | | | | | | | | | | | | | | By default, OpenSSL (and SSL/TLS in general) lets the client cipher order take priority. This is OK for browsers where the ciphers were tuned, but few PostgreSQL client libraries make the cipher order configurable. So it makes sense to have the cipher order in postgresql.conf take priority over client defaults. This patch adds the setting "ssl_prefer_server_ciphers" that can be turned on so that server cipher order is preferred. Per discussion, this now defaults to on. From: Marko Kreen <markokr@gmail.com> Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
* Fix improper abort during update chain lockingAlvaro Herrera2013-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 247c76a98909, I added some code to do fine-grained checking of MultiXact status of locking/updating transactions when traversing an update chain. There was a thinko in that patch which would have the traversing abort, that is return HeapTupleUpdated, when the other transaction is a committed lock-only. In this case we should ignore it and return success instead. Of course, in the case where there is a committed update, HeapTupleUpdated is the correct return value. A user-visible symptom of this bug is that in REPEATABLE READ and SERIALIZABLE transaction isolation modes spurious serializability errors can occur: ERROR: could not serialize access due to concurrent update In order for this to happen, there needs to be a tuple that's key-share- locked and also updated, and the update must abort; a subsequent transaction trying to acquire a new lock on that tuple would abort with the above error. The reason is that the initial FOR KEY SHARE is seen as committed by the new locking transaction, which triggers this bug. (If the UPDATE commits, then the serialization error is correctly reported.) When running a query in READ COMMITTED mode, what happens is that the locking is aborted by the HeapTupleUpdated return value, then EvalPlanQual fetches the newest version of the tuple, which is then the only version that gets locked. (The second time the tuple is checked there is no misbehavior on the committed lock-only, because it's not checked by the code that traverses update chains; so no bug.) Only the newest version of the tuple is locked, not older ones, but this is harmless. The isolation test added by this commit illustrates the desired behavior, including the proper serialization errors that get thrown. Backpatch to 9.3.
* Clear retry flags properly in replacement OpenSSL sock_write function.Tom Lane2013-12-05
| | | | | | | | | | | | Current OpenSSL code includes a BIO_clear_retry_flags() step in the sock_write() function. Either we failed to copy the code correctly, or they added this since we copied it. In any case, lack of the clear step appears to be the cause of the server lockup after connection loss reported in bug #8647 from Valentine Gogichashvili. Assume that this is correct coding for all OpenSSL versions, and hence back-patch to all supported branches. Diagnosis and patch by Alexander Kukushkin.
* Avoid resetting Xmax when it's a multi with an aborted updateAlvaro Herrera2013-12-05
| | | | | | | | | | | | | | | | | | HeapTupleSatisfiesUpdate can very easily "forget" tuple locks while checking the contents of a multixact and finding it contains an aborted update, by setting the HEAP_XMAX_INVALID bit. This would lead to concurrent transactions not noticing any previous locks held by transactions that might still be running, and thus being able to acquire subsequent locks they wouldn't be normally able to acquire. This bug was introduced in commit 1ce150b7bb; backpatch this fix to 9.3, like that commit. This change reverts the change to the delete-abort-savept isolation test in 1ce150b7bb, because that behavior change was caused by this bug. Noticed by Andres Freund while investigating a different issue reported by Noah Misch.
* Don't include unused space in LOG_NEWPAGE records.Heikki Linnakangas2013-12-04
| | | | | This is the same trick we use when taking a full page image of a buffer passed to XLogInsert.
* Fix full-page writes of internal GIN pages.Heikki Linnakangas2013-12-03
| | | | | | | | | | | | | | | Insertion to a non-leaf GIN page didn't make a full-page image of the page, which is wrong. The code used to do it correctly, but was changed (commit 853d1c3103fa961ae6219f0281885b345593d101) because the redo-routine didn't track incomplete splits correctly when the page was restored from a full page image. Of course, that was not right way to fix it, the redo routine should've been fixed instead. The redo-routine was surreptitiously fixed in 2010 (commit 4016bdef8aded77b4903c457050622a5a1815c16), so all we need to do now is revert the code that creates the record to its original form. This doesn't change the format of the WAL record. Backpatch to all supported versions.
* Report exit code from external recovery commands properlyPeter Eisentraut2013-12-02
| | | | | | | | | | When an external recovery command such as restore_command or archive_cleanup_command fails, report the exit code properly, distinguishing signals and normal exists, using the existing wait_result_to_str() facility, instead of just reporting the return value from system(). Reviewed-by: Peter Geoghegan <pg@heroku.com>
* Fix crash in assign_collations_walker for EXISTS with empty SELECT list.Tom Lane2013-12-02
| | | | | We (I think I, actually) forgot about this corner case while coding collation resolution. Per bug #8648 from Arjen Nienhuis.
* Flag mmap implemenation of dynamic shared memory as resize-capable.Robert Haas2013-12-02
| | | | Error noted by Heikki Linnakangas
* Make NUM_TOCHAR_prepare and NUM_TOCHAR_finish macros declare "len".Robert Haas2013-12-02
| | | | | | | | Remove the variable from the enclosing scopes so that nothing can be relying on it. The net result of this refactoring is that we get rid of a few unnecessary strlen() calls. Original patch from Greg Jaskiewicz, substantially expanded by me.
* Avoid out-of-bounds read in errfinish if error_stack_depth < 0.Robert Haas2013-12-02
| | | | | | | | | If errordata_stack_depth < 0, we won't find that out and correct the problem until CHECK_STACK_DEPTH() is invoked. In the meantime, elevel will be set based on an invalid read. This is probably harmless in practice, but it seems cleaner this way. Xi Wang
* Translation updatesPeter Eisentraut2013-12-02
|
* Fix a couple of bugs in MultiXactId freezingAlvaro Herrera2013-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look into a multixact to check the members against cutoff_xid. This means that a very old Xid could survive hidden within a multi, possibly outliving its CLOG storage. In the distant future, this would cause clog lookup failures: ERROR: could not access status of transaction 3883960912 DETAIL: Could not open file "pg_clog/0E78": No such file or directory. This mostly was problematic when the updating transaction aborted, since in that case the row wouldn't get pruned away earlier in vacuum and the multixact could possibly survive for a long time. In many cases, data that is inaccessible for this reason way can be brought back heuristically. As a second bug, heap_freeze_tuple() didn't properly handle multixacts that need to be frozen according to cutoff_multi, but whose updater xid is still alive. Instead of preserving the update Xid, it just set Xmax invalid, which leads to both old and new tuple versions becoming visible. This is pretty rare in practice, but a real threat nonetheless. Existing corrupted rows, unfortunately, cannot be repaired in an automated fashion. Existing physical replicas might have already incorrectly frozen tuples because of different behavior than in master, which might only become apparent in the future once pg_multixact/ is truncated; it is recommended that all clones be rebuilt after upgrading. Following code analysis caused by bug report by J Smith in message CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com and privately by F-Secure. Backpatch to 9.3, where freezing of MultiXactIds was introduced. Analysis and patch by Andres Freund, with some tweaks by Álvaro.
* Don't TransactionIdDidAbort in HeapTupleGetUpdateXidAlvaro Herrera2013-11-29
| | | | | | | | | | | | | | | | It is dangerous to do so, because some code expects to be able to see what's the true Xmax even if it is aborted (particularly while traversing HOT chains). So don't do it, and instead rely on the callers to verify for abortedness, if necessary. Several race conditions and bugs fixed in the process. One isolation test changes the expected output due to these. This also reverts commit c235a6a589b, which is no longer necessary. Backpatch to 9.3, where this function was introduced. Andres Freund
* Truncate pg_multixact/'s contents during crash recoveryAlvaro Herrera2013-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9dc842f08 of 8.2 era prevented MultiXact truncation during crash recovery, because there was no guarantee that enough state had been setup, and because it wasn't deemed to be a good idea to remove data during crash recovery anyway. Since then, due to Hot-Standby, streaming replication and PITR, the amount of time a cluster can spend doing crash recovery has increased significantly, to the point that a cluster may even never come out of it. This has made not truncating the content of pg_multixact/ not defensible anymore. To fix, take care to setup enough state for multixact truncation before crash recovery starts (easy since checkpoints contain the required information), and move the current end-of-recovery actions to a new TrimMultiXact() function, analogous to TrimCLOG(). At some later point, this should probably done similarly to the way clog.c is doing it, which is to just WAL log truncations, but we can't do that for the back branches. Back-patch to 9.0. 8.4 also has the problem, but since there's no hot standby there, it's much less pressing. In 9.2 and earlier, this patch is simpler than in newer branches, because multixact access during recovery isn't required. Add appropriate checks to make sure that's not happening. Andres Freund
* Fix full-table-vacuum request mechanism for MultiXactIdsAlvaro Herrera2013-11-29
| | | | | | | | | | | | | | | | | While autovacuum dutifully launched anti-multixact-wraparound vacuums when the multixact "age" was reached, the vacuum code was not aware that it needed to make them be full table vacuums. As the resulting partial-table vacuums aren't capable of actually increasing relminmxid, autovacuum continued to launch anti-wraparound vacuums that didn't have the intended effect, until age of relfrozenxid caused the vacuum to finally be a full table one via vacuum_freeze_table_age. To fix, introduce logic for multixacts similar to that for plain TransactionIds, using the same GUCs. Backpatch to 9.3, where permanent MultiXactIds were introduced. Andres Freund, some cleanup by Álvaro
* Replace hardcoded 200000000 with autovacuum_freeze_max_ageAlvaro Herrera2013-11-29
| | | | | | | | | | | | Parts of the code used autovacuum_freeze_max_age to determine whether anti-multixact-wraparound vacuums are necessary, while others used a hardcoded 200000000 value. This leads to problems when autovacuum_freeze_max_age is set to a non-default value. Use the latter everywhere. Backpatch to 9.3, where vacuuming of multixacts was introduced. Andres Freund
* Be sure to release proc->backendLock after SetupLockInTable() failure.Tom Lane2013-11-29
| | | | | | | | | | | | | | The various places that transferred fast-path locks to the main lock table neglected to release the PGPROC's backendLock if SetupLockInTable failed due to being out of shared memory. In most cases this is no big deal since ensuing error cleanup would release all held LWLocks anyway. But there are some hot-standby functions that don't consider failure of FastPathTransferRelationLocks to be a hard error, and in those cases this oversight could lead to system lockup. For consistency, make all of these places look the same as FastPathTransferRelationLocks. Noted while looking for the cause of Dan Wood's bugs --- this wasn't it, but it's a bug anyway.
* Fix assorted race conditions in the new timeout infrastructure.Tom Lane2013-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent handle_sig_alarm from losing control partway through due to a query cancel (either an asynchronous SIGINT, or a cancel triggered by one of the timeout handler functions). That would at least result in failure to schedule any required future interrupt, and might result in actual corruption of timeout.c's data structures, if the interrupt happened while we were updating those. We could still lose control if an asynchronous SIGINT arrives just as the function is entered. This wouldn't break any data structures, but it would have the same effect as if the SIGALRM interrupt had been silently lost: we'd not fire any currently-due handlers, nor schedule any new interrupt. To forestall that scenario, forcibly reschedule any pending timer interrupt during AbortTransaction and AbortSubTransaction. We can avoid any extra kernel call in most cases by not doing that until we've allowed LockErrorCleanup to kill the DEADLOCK_TIMEOUT and LOCK_TIMEOUT events. Another hazard is that some platforms (at least Linux and *BSD) block a signal before calling its handler and then unblock it on return. When we longjmp out of the handler, the unblock doesn't happen, and the signal is left blocked indefinitely. Again, we can fix that by forcibly unblocking signals during AbortTransaction and AbortSubTransaction. These latter two problems do not manifest when the longjmp reaches postgres.c, because the error recovery code there kills all pending timeout events anyway, and it uses sigsetjmp(..., 1) so that the appropriate signal mask is restored. So errors thrown outside any transaction should be OK already, and cleaning up in AbortTransaction and AbortSubTransaction should be enough to fix these issues. (We're assuming that any code that catches a query cancel error and doesn't re-throw it will do at least a subtransaction abort to clean up; but that was pretty much required already by other subsystems.) Lastly, ProcSleep should not clear the LOCK_TIMEOUT indicator flag when disabling that event: if a lock timeout interrupt happened after the lock was granted, the ensuing query cancel is still going to happen at the next CHECK_FOR_INTERRUPTS, and we want to report it as a lock timeout not a user cancel. Per reports from Dan Wood. Back-patch to 9.3 where the new timeout handling infrastructure was introduced. We may at some point decide to back-patch the signal unblocking changes further, but I'll desist from that until we hear actual field complaints about it.
* Refine our definition of what constitutes a system relation.Robert Haas2013-11-28
| | | | | | | | | | | | | | | | | | | | | | | Although user-defined relations can't be directly created in pg_catalog, it's possible for them to end up there, because you can create them in some other schema and then use ALTER TABLE .. SET SCHEMA to move them there. Previously, such relations couldn't afterwards be manipulated, because IsSystemRelation()/IsSystemClass() rejected all attempts to modify objects in the pg_catalog schema, regardless of their origin. With this patch, they now reject only those objects in pg_catalog which were created at initdb-time, allowing most operations on user-created tables in pg_catalog to proceed normally. This patch also adds new functions IsCatalogRelation() and IsCatalogClass(), which is similar to IsSystemRelation() and IsSystemClass() but with a slightly narrower definition: only TOAST tables of system catalogs are included, rather than *all* TOAST tables. This is currently used only for making decisions about when invalidation messages need to be sent, but upcoming logical decoding patches will find other uses for this information. Andres Freund, with some modifications by me.
* Another gin_desc fix.Heikki Linnakangas2013-11-28
| | | | The number of items inserted was incorrectly printed as if it was a boolean.
* Fix gin_desc routine to match the WAL format.Heikki Linnakangas2013-11-28
| | | | | | | In the GIN incomplete-splits patch, I used BlockIdDatas to store the block number of left and right children, when inserting a downlink after a split to an internal page posting list page. But gin_desc thought they were stored as BlockNumbers.
* Fix latent(?) race condition in LockReleaseAll.Tom Lane2013-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We have for a long time checked the head pointer of each of the backend's proclock lists and skipped acquiring the corresponding locktable partition lock if the head pointer was NULL. This was safe enough in the days when proclock lists were changed only by the owning backend, but it is pretty questionable now that the fast-path patch added cases where backends add entries to other backends' proclock lists. However, we don't really wish to revert to locking each partition lock every time, because in simple transactions that would add a lot of useless lock/unlock cycles on already-heavily-contended LWLocks. Fortunately, the only way that another backend could be modifying our proclock list at this point would be if it was promoting a formerly fast-path lock of ours; and any such lock must be one that we'd decided not to delete in the previous loop over the locallock table. So it's okay if we miss seeing it in this loop; we'd just decide not to delete it again. However, once we've detected a non-empty list, we'd better re-fetch the list head pointer after acquiring the partition lock. This guards against possibly fetching a corrupt-but-non-null pointer if pointer fetch/store isn't atomic. It's not clear if any practical architectures are like that, but we've never assumed that before and don't wish to start here. In any case, the situation certainly deserves a code comment. While at it, refactor the partition traversal loop to use a for() construct instead of a while() loop with goto's. Back-patch, just in case the risk is real and not hypothetical.
* Unbreak buildfarmAlvaro Herrera2013-11-28
| | | | | I removed an intermediate commit before pushing and forgot to test the resulting tree :-(
* Use a more granular approach to follow update chainsAlvaro Herrera2013-11-28
| | | | | | | | Instead of simply checking the KEYS_UPDATED bit, we need to check whether each lock held on the future version of the tuple conflicts with the lock we're trying to acquire. Per bug report #8434 by Tomonari Katsumata
* Compare Xmin to previous Xmax when locking an update chainAlvaro Herrera2013-11-28
| | | | | | | | | | | | Not doing so causes us to traverse an update chain that has been broken by concurrent page pruning. All other code that traverses update chains uses this check as one of the cases in which to stop iterating, so replicate it here too. Failure to do so leads to erroneous CLOG, subtrans or multixact lookups. Per discussion following the bug report by J Smith in CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com as diagnosed by Andres Freund.
* Don't try to set InvalidXid as page pruning hintAlvaro Herrera2013-11-28
| | | | | | | | | | | | | | If a transaction updates/deletes a tuple just before aborting, and a concurrent transaction tries to prune the page concurrently, the pruner may see HeapTupleSatisfiesVacuum return HEAPTUPLE_DELETE_IN_PROGRESS, but a later call to HeapTupleGetUpdateXid() return InvalidXid. This would cause an assertion failure in development builds, but would be otherwise Mostly Harmless. Fix by checking whether the updater Xid is valid before trying to apply it as page prune point. Reported by Andres in 20131124000203.GA4403@alap2.anarazel.de
* Cope with heap_fetch failure while locking an update chainAlvaro Herrera2013-11-28
| | | | | | | | | | | The reason for the fetch failure is that the tuple was removed because it was dead; so the failure is innocuous and can be ignored. Moreover, there's no need for further work and we can return success to the caller immediately. EvalPlanQualFetch is doing something very similar to this already. Report and test case from Andres Freund in 20131124000203.GA4403@alap2.anarazel.de
* Fix stale-pointer problem in fast-path locking logic.Tom Lane2013-11-27
| | | | | | | | | | | | | | | | | | | | | | | | When acquiring a lock in fast-path mode, we must reset the locallock object's lock and proclock fields to NULL. They are not necessarily that way to start with, because the locallock could be left over from a failed lock acquisition attempt earlier in the transaction. Failure to do this led to all sorts of interesting misbehaviors when LockRelease tried to clean up no-longer-related lock and proclock objects in shared memory. Per report from Dan Wood. In passing, modify LockRelease to elog not just Assert if it doesn't find lock and proclock objects for a formerly fast-path lock, matching the code in FastPathGetRelationLockEntry and LockRefindAndRelease. This isn't a bug but it will help in diagnosing any future bugs in this area. Also, modify FastPathTransferRelationLocks and FastPathGetRelationLockEntry to break out of their loops over the fastpath array once they've found the sole matching entry. This was inconsistently done in some search loops and not others. Improve assorted related comments, too. Back-patch to 9.2 where the fast-path mechanism was introduced.
* Minor corrections in lmgr/README.Tom Lane2013-11-27
| | | | | | | | | | Correct an obsolete statement that no backend touches another backend's PROCLOCK lists. This was probably wrong even when written (the deadlock checker looks at everybody's lists), and it's certainly quite wrong now that fast-path locking can require creation of lock and proclock objects on behalf of another backend. Also improve some statements in the hot standby explanation, and do one or two other trivial bits of wordsmithing/ reformatting.
* Get rid of the post-recovery cleanup step of GIN page splits.Heikki Linnakangas2013-11-27
| | | | | | | | | | | | | | | | | | Replace it with an approach similar to what GiST uses: when a page is split, the left sibling is marked with a flag indicating that the parent hasn't been updated yet. When the parent is updated, the flag is cleared. If an insertion steps on a page with the flag set, it will finish split before proceeding with the insertion. The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. This also divides the responsibility of WAL-logging more clearly between the generic ginbtree.c code, and the parts specific to entry and posting trees. There is now a common WAL record format for insertions and deletions, which is written by ginbtree.c, followed by tree-specific payload, which is returned by the placetopage- and split- callbacks.
* More GIN refactoring.Heikki Linnakangas2013-11-27
| | | | | | | | | | | | Separate the insertion payload from the more static portions of GinBtree. GinBtree now only contains information related to searching the tree, and the information of what to insert is passed separately. Add root block number to GinBtree, instead of passing it around all the functions as argument. Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is responsible for finding the parent and inserting the downlink to it.
* Don't update relfrozenxid if any pages were skipped.Heikki Linnakangas2013-11-27
| | | | | | | | | | | | | | | | | | | | | Vacuum recognizes that it can update relfrozenxid by checking whether it has processed all pages of a relation. Unfortunately it performed that check after truncating the dead pages at the end of the relation, and used the new number of pages to decide whether all pages have been scanned. If the new number of pages happened to be smaller or equal to the number of pages scanned, it incorrectly decided that all pages were scanned. This can lead to relfrozenxid being updated, even though some pages were skipped that still contain old XIDs. That can lead to data loss due to xid wraparounds with some rows suddenly missing. This likely has escaped notice so far because it takes a large number (~2^31) of xids being used to see the effect, while a full-table vacuum before that would fix the issue. The incorrect logic was introduced by commit b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. Backpatch this fix down to 8.4, like that commit. Andres Freund, with some modifications by me.
* Implement information_schema.parameters.parameter_default columnPeter Eisentraut2013-11-26
| | | | | | Reviewed-by: Ali Dar <ali.munir.dar@gmail.com> Reviewed-by: Amit Khandekar <amit.khandekar@enterprisedb.com> Reviewed-by: Rodolfo Campero <rodolfo.campero@anachronics.com>
* Add missing entry for session_preload_libraries in sample config.Jeff Davis2013-11-25
| | | | The omission was apparently an oversight in the original patch.
* Change SET LOCAL/CONSTRAINTS/TRANSACTION and ABORT behaviorBruce Momjian2013-11-25
| | | | | | | Change SET LOCAL/CONSTRAINTS/TRANSACTION behavior outside of a transaction block from error (post-9.3) to warning. (Was nothing in <= 9.3.) Also change ABORT outside of a transaction block from notice to warning.
* Lessen library-loading log level.Jeff Davis2013-11-24
| | | | | | | | | | | | | | | Previously, messages were emitted at the LOG level every time a backend preloaded a library. That was acceptable (though unnecessary) for shared_preload_libraries; but it was excessive for local_preload_libraries and session_preload_libraries. Reduce to DEBUG1. Also, there was logic in the EXEC_BACKEND case to avoid repeated messages for shared_preload_libraries by demoting them to DEBUG2. DEBUG1 seems more appropriate there, as well, so eliminate that special case. Peter Geoghegan.
* Fix new and latent bugs with errno handling in secure_read/secure_write.Tom Lane2013-11-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | These functions must be careful that they return the intended value of errno to their callers. There were several scenarios where this might not happen: 1. The recent SSL renegotiation patch added a hunk of code that would execute after setting errno. In the first place, it's doubtful that we should consider renegotiation to be successfully completed after a failure, and in the second, there's no real guarantee that the called OpenSSL routines wouldn't clobber errno. Fix by not executing that hunk except during success exit. 2. errno was left in an unknown state in case of an unrecognized return code from SSL_get_error(). While this is a "can't happen" case, it seems like a good idea to be sure we know what would happen, so reset errno to ECONNRESET in such cases. (The corresponding code in libpq's fe-secure.c already did this.) 3. There was an (undocumented) assumption that client_read_ended() wouldn't change errno. While true in the current state of the code, this seems less than future-proof. Add explicit saving/restoring of errno to make sure that changes in the called functions won't break things. I see no need to back-patch, since #1 is new code and the other two issues are mostly hypothetical. Per discussion with Amit Kapila.
* Fix array slicing of int2vector and oidvector values.Tom Lane2013-11-23
| | | | | | | | | | | | | | | | | | | | | | | | The previous coding labeled expressions such as pg_index.indkey[1:3] as being of int2vector type; which is not right because the subscript bounds of such a result don't, in general, satisfy the restrictions of int2vector. To fix, implicitly promote the result of slicing int2vector to int2[], or oidvector to oid[]. This is similar to what we've done with domains over arrays, which is a good analogy because these types are very much like restricted domains of the corresponding regular-array types. A side-effect is that we now also forbid array-element updates on such columns, eg while "update pg_index set indkey[4] = 42" would have worked before if you were superuser (and corrupted your catalogs irretrievably, no doubt) it's now disallowed. This seems like a good thing since, again, some choices of subscripting would've led to results not satisfying the restrictions of int2vector. The case of an array-slice update was rejected before, though with a different error message than you get now. We could make these cases work in future if we added a cast from int2[] to int2vector (with a cast function checking the subscript restrictions) but it seems unlikely that there's any value in that. Per report from Ronan Dunklau. Back-patch to all supported branches because of the crash risks involved.
* Fix thinko in SPI_execute_plan() callsPeter Eisentraut2013-11-23
| | | | | | | | Two call sites were apparently thinking that the last argument of SPI_execute_plan() is the number of query parameters, but it is actually the row limit. Change the calls to 0, since we don't care about the limit there. The previous code didn't break anything, but it was still wrong.
* Avoid potential buffer overflow crashPeter Eisentraut2013-11-23
| | | | | | | | | | A pointer to a C string was treated as a pointer to a "name" datum and passed to SPI_execute_plan(). This pointer would then end up being passed through datumCopy(), which would try to copy the entire 64 bytes of name data, thus running past the end of the C string. Fix by converting the string to a proper name structure. Found by LLVM AddressSanitizer.
* Flatten join alias Vars before pulling up targetlist items from a subquery.Tom Lane2013-11-22
| | | | | | | | | | | | | | | | | | | | | | | | pullup_replace_vars()'s decisions about whether a pulled-up replacement expression needs to be wrapped in a PlaceHolderVar depend on the assumption that what looks like a Var behaves like a Var. However, if the Var is a join alias reference, later flattening of join aliases might replace the Var with something that's not a Var at all, and should have been wrapped. To fix, do a forcible pass of flatten_join_alias_vars() on the subquery targetlist before we start to copy items out of it. We'll re-run that processing on the pulled-up expressions later, but that's harmless. Per report from Ken Tanzer; the added regression test case is based on his example. This bug has been there since the PlaceHolderVar mechanism was invented, but has escaped detection because the circumstances that trigger it are fairly narrow. You need a flattenable query underneath an outer join, which contains another flattenable query inside a join of its own, with a dangerous expression (a constant or something else non-strict) in that one's targetlist. Having seen this, I'm wondering if it wouldn't be prudent to do all alias-variable flattening earlier, perhaps even in the rewriter. But that would probably not be a back-patchable change.
* Fix Hot-Standby initialization of clog and subtrans.Heikki Linnakangas2013-11-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These bugs can cause data loss on standbys started with hot_standby=on at the moment they start to accept read only queries, by marking committed transactions as uncommited. The likelihood of such corruptions is small unless the primary has a high transaction rate. 5a031a5556ff83b8a9646892715d7fef415b83c3 fixed bugs in HS's startup logic by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state was reached, missing the fact that both clog and subtrans are written to before that. This only failed to fail in common cases because the usage of ExtendCLOG in procarray.c was superflous since clog extensions are actually WAL logged. f44eedc3f0f347a856eea8590730769125964597/I then tried to fix the missing extensions of pg_subtrans due to the former commit's changes - which are not WAL logged - by performing the extensions when switching to a state > STANDBY_INITIALIZED and not performing xid assignments before that - again missing the fact that ExtendCLOG is unneccessary - but screwed up twice: Once because latestObservedXid wasn't updated anymore in that state due to the earlier commit and once by having an off-by-one error in the loop performing extensions. This means that whenever a CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed between the start of the checkpoint recovery started from and the first xl_running_xact record old transactions commit bits in pg_clog could be overwritten if they started and committed in that window. Fix this mess by not performing ExtendCLOG() in HS at all anymore since it's unneeded and evidently dangerous and by performing subtrans extensions even before reaching STANDBY_SNAPSHOT_PENDING. Analysis and patch by Andres Freund. Reported by Christophe Pettus. Backpatch down to 9.0, like the previous commit that caused this.
* Avoid acquiring spinlock when checking if recovery has finished, for speed.Heikki Linnakangas2013-11-22
| | | | | | | | | | | | RecoveryIsInProgress() can be called very frequently. During normal operation, it just checks a backend-local variable and returns quickly, but during hot standby, it checks a spinlock-protected shared variable. Those spinlock acquisitions can become a point of contention on a busy hot standby system. Replace the spinlock acquisition with a memory barrier. Per discussion with Andres Freund, Ants Aasma and Merlin Moncure.