aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Improve generation algorithm for database system identifier.Tom Lane2014-04-26
| | | | | | | | | | | | | As noted some time ago, the original coding had a typo ("|" for "^") that made the result less unique than intended. Even the intended behavior is obsolete since it was based on wanting to produce a usable value even if we didn't have int64 arithmetic --- a limitation we stopped supporting years ago. Instead, let's redefine the system identifier as tv_sec in the upper 32 bits (same as before), tv_usec in the next 20 bits, and the low 12 bits of getpid() in the remaining bits. This is still hardly guaranteed-universally-unique, but it's noticeably better than before. Per my proposal at <29019.1374535940@sss.pgh.pa.us>
* Record the proper typmod for an index expression column.Tom Lane2014-04-26
| | | | | | | | | | We should use exprTypmod() to extract the typmod of the expression, instead of just blindly storing -1. This seems to have been an aboriginal oversight in commit fc8d970cbcdd6f025475822a4cf01dfda0873226 which introduced general-expression indexes. The consequences are only cosmetic at present, since the index machinery doesn't really look at typmod for index columns; but still it seems best to describe the column type as precisely as we can. Per off-list complaint from Thomas Fanghaenel.
* Fix off-by-one bug in LWLockRegisterTranche().Tom Lane2014-04-25
| | | | | | | Original coding failed to enlarge the array as required if the requested tranche_id was equal to LWLockTranchesAllocated. In passing, fix poor style of not casting the result of (re)palloc.
* Fix race when updating a tuple concurrently locked by another processAlvaro Herrera2014-04-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a tuple is locked, and this lock is later upgraded either to an update or to a stronger lock, and in the meantime some other process tries to lock, update or delete the same tuple, it (the tuple) could end up being updated twice, or having conflicting locks held. The reason for this is that the second updater checks for a change in Xmax value, or in the HEAP_XMAX_IS_MULTI infomask bit, after noticing the first lock; and if there's a change, it restarts and re-evaluates its ability to update the tuple. But it neglected to check for changes in lock strength or in lock-vs-update status when those two properties stayed the same. This would lead it to take the wrong decision and continue with its own update, when in reality it shouldn't do so but instead restart from the top. This could lead to either an assertion failure much later (when a multixact containing multiple updates is detected), or duplicate copies of tuples. To fix, make sure to compare the other relevant infomask bits alongside the Xmax value and HEAP_XMAX_IS_MULTI bit, and restart from the top if necessary. Also, in the belt-and-suspenders spirit, add a check to MultiXactCreateFromMembers that a multixact being created does not have two or more members that are claimed to be updates. This should protect against other bugs that might cause similar bogus situations. Backpatch to 9.3, where the possibility of multixacts containing updates was introduced. (In prior versions it was possible to have the tuple lock upgraded from shared to exclusive, and an update would not restart from the top; yet we're protected against a bug there because there's always a sleep to wait for the locking transaction to complete before continuing to do anything. Really, the fact that tuple locks always conflicted with concurrent updates is what protected against bugs here.) Per report from Andrew Dunstan and Josh Berkus in thread at http://www.postgresql.org/message-id/534C8B33.9050807@pgexperts.com Bug analysis by Andres Freund.
* Reset pg_stat_activity.xact_start during PREPARE TRANSACTION.Tom Lane2014-04-24
| | | | | | | | | | | | | | | Once we've completed a PREPARE, our session is not running a transaction, so its entry in pg_stat_activity should show xact_start as null, rather than leaving the value as the start time of the now-prepared transaction. I think possibly this oversight was triggered by faulty extrapolation from the adjacent comment that says PrepareTransaction should not call AtEOXact_PgStat, so tweak the wording of that comment. Noted by Andres Freund while considering bug #10123 from Maxim Boguk, although this error doesn't seem to explain that report. Back-patch to all active branches.
* Allow polymorphic aggregates to have non-polymorphic state data types.Tom Lane2014-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before 9.4, such an aggregate couldn't be declared, because its final function would have to have polymorphic result type but no polymorphic argument, which CREATE FUNCTION would quite properly reject. The ordered-set-aggregate patch found a workaround: allow the final function to be declared as accepting additional dummy arguments that have types matching the aggregate's regular input arguments. However, we failed to notice that this problem applies just as much to regular aggregates, despite the fact that we had a built-in regular aggregate array_agg() that was known to be undeclarable in SQL because its final function had an illegal signature. So what we should have done, and what this patch does, is to decouple the extra-dummy-arguments behavior from ordered-set aggregates and make it generally available for all aggregate declarations. We have to put this into 9.4 rather than waiting till later because it slightly alters the rules for declaring ordered-set aggregates. The patch turned out a bit bigger than I'd hoped because it proved necessary to record the extra-arguments option in a new pg_aggregate column. I'd thought we could just look at the final function's pronargs at runtime, but that didn't work well for variadic final functions. It's probably just as well though, because it simplifies life for pg_dump to record the option explicitly. While at it, fix array_agg() to have a valid final-function signature, and add an opr_sanity test to notice future deviations from polymorphic consistency. I also marked the percentile_cont() aggregates as not needing extra arguments, since they don't.
* Update obsolete comments.Heikki Linnakangas2014-04-23
| | | | We no longer have a TLI field in the page header.
* Fix typos in comment.Heikki Linnakangas2014-04-23
|
* Cleanup of new b-tree page deletion code.Heikki Linnakangas2014-04-23
| | | | | | | | | | | | When marking a branch as half-dead, a pointer to the top of the branch is stored in the leaf block's hi-key. During normal operation, the high key was left in place, and the block number was just stored in the ctid field of the high key tuple, but in WAL replay, the high key was recreated as a truncated tuple with zero columns. For the sake of easier debugging, also truncate the tuple in normal operation, so that the page is identical after WAL replay. Also, rename the 'downlink' field in the WAL record to 'topparent', as that seems like a more descriptive name. And make sure it's set to invalid when unlinking the leaf page.
* Fix documentation of FmgrInfo.fn_nargs.Tom Lane2014-04-22
| | | | | | | Some ancient comments claimed that fn_nargs could be -1 to indicate a variable number of input arguments; but this was never implemented, and is at variance with what we ultimately did with "variadic" functions. Update the comments.
* Fix broken logic in logical_heap_rewrite_flush_mappings().Tom Lane2014-04-22
| | | | | It's blatantly obvious that commit 4d0d607a454ee832574afd52a3c515099cc85eb3 wasn't tested. The leak's real enough, though.
* revert 4d0d607a454ee832574afd52a3c515099cc85eb3Bruce Momjian2014-04-22
| | | | Revert due to contrib/test_decoding regression failure
* release memory used while flushing logical mappingsBruce Momjian2014-04-22
| | | | Patch by Ants Aasma
* Fix bug in the new B-tree incomplete-split code.Heikki Linnakangas2014-04-22
| | | | | | Forgot to update LSN of left sibling's page, when creating a new root. I fixed this for regular insertions and page splits earlier, but missed new root creation.
* Fix Gin README.Heikki Linnakangas2014-04-22
| | | | | | | The README incorrectly claimed that GIN posting tree pages contain an array of uncompressed items in addition to compressed posting lists. Earlier versions of the GIN posting list compression patch worked that way, but not the one that was committed.
* Fix bug in new B-tree page deletion code.Heikki Linnakangas2014-04-22
| | | | | When modifying a page, must hold an exclusive lock. A shared lock is obviously not good enough.
* Retain original physical order of tuples in redo of b-tree splits.Heikki Linnakangas2014-04-22
| | | | | It makes no difference to the system, but minimizing the differences between a master and standby makes debugging simpler.
* Fix rm_desc routine of b-tree page delete records.Heikki Linnakangas2014-04-22
| | | | A couple of typos from my refactoring of the page deletion patch.
* Avoid transient bogus page contents when creating a sequence.Heikki Linnakangas2014-04-22
| | | | | | | | | | | | Don't use simple_heap_insert to insert the tuple to a sequence relation. simple_heap_insert creates a heap insertion WAL record, and replaying that will create a regular heap page without the special area containing the sequence magic constant, which is wrong for a sequence. That was not a bug because we always created a sequence WAL record after that, and replaying that overwrote the bogus heap page, and the transient state could never be seen by another backend because it was only done when creating a new sequence relation. But it's simpler and cleaner to avoid that in the first place.
* Fix typo.Robert Haas2014-04-20
| | | | Etsuro Fujita
* Fix typoMagnus Hagander2014-04-18
| | | | Amit Langote
* report stat() error in trigger file checkBruce Momjian2014-04-17
| | | | | | | Permissions might prevent the existence of the trigger file from being checked. Per report from Andres Freund
* Set the all-visible flag on heap page before writing WAL record, not after.Heikki Linnakangas2014-04-17
| | | | | | | | | | | | | | | | If we set the all-visible flag after writing WAL record, and XLogInsert takes a full-page image of the page, the image would not include the flag. We will then proceed to set the VM bit, which would then be set without the corresponding all-visible flag on the heap page. Found by comparing page images on master and standby, after writing/replaying each WAL record. (There is still a discrepancy: the all-visible flag won't be set after replaying the HEAP_CLEAN record, even though it is set in the master. However, it will be set when replaying the HEAP2_VISIBLE record and setting the VM bit, so the all-visible flag and VM bit are always consistent on the standby, even though they are momentarily out-of-sync with master) Backpatch to 9.3 where this code was introduced.
* Rename EXPLAIN ANALYZE's "total runtime" output to "execution time".Tom Lane2014-04-16
| | | | | | | | | | | | | Now that EXPLAIN also outputs a "planning time" measurement, the use of "total" here seems rather confusing: it sounds like it might include the planning time which of course it doesn't. Majority opinion was that "execution time" is a better label, so we'll call it that. This should be noted as a backwards incompatibility for tools that examine EXPLAIN ANALYZE output. In passing, I failed to resist the temptation to do a little editing on the materialized-view example affected by this change.
* Fix object identities for text search objectsAlvaro Herrera2014-04-16
| | | | | | | We were neglecting to schema-qualify them. Backpatch to 9.3, where object identities were introduced as a concept by commit f8348ea32ec8.
* Use AF_UNSPEC not PF_UNSPEC in getaddrinfo calls.Tom Lane2014-04-16
| | | | | | | | | | | | | | | | | | According to the Single Unix Spec and assorted man pages, you're supposed to use the constants named AF_xxx when setting ai_family for a getaddrinfo call. In a few places we were using PF_xxx instead. Use of PF_xxx appears to be an ancient BSD convention that was not adopted by later standardization. On BSD and most later Unixen, it doesn't matter much because those constants have equivalent values anyway; but nonetheless this code is not per spec. In the same vein, replace PF_INET by AF_INET in one socket() call, which wasn't even consistent with the other socket() call in the same function let alone the remainder of our code. Per investigation of a Cygwin trouble report from Marco Atzeri. It's probably a long shot that this will fix his issue, but it's wrong in any case.
* Add to_regprocedure() and to_regoperator().Robert Haas2014-04-16
| | | | | | | | | These are natural complements to the functions added by commit 0886fc6a5c75b294544263ea979b9cf6195407d9, but they weren't included in the original patch for some reason. Add them. Patch by me, per a complaint by Tom Lane. Review by Tatsuo Ishii.
* Try to fix spurious DSM failures on Windows.Robert Haas2014-04-16
| | | | | | | | Apparently, Windows can sometimes return an error code even when the operation actually worked just fine. Rearrange the order of checks according to what appear to be the best practices in this area. Amit Kapila
* check socket creation errors against PGINVALID_SOCKETBruce Momjian2014-04-16
| | | | | | | | Previously, in some places, socket creation errors were checked for negative values, which is not true for Windows because sockets are unsigned. This masked socket creation errors on Windows. Backpatch through 9.0. 8.4 doesn't have the infrastructure to fix this.
* Use correctly-sized buffer when zero-filling a WAL file.Heikki Linnakangas2014-04-16
| | | | | | I mixed up BLCKSZ and XLOG_BLCKSZ when I changed the way the buffer is allocated a couple of weeks ago. With the default settings, they are both 8k, but they can be changed at compile-time.
* Set pd_lower on internal GIN posting tree pages.Heikki Linnakangas2014-04-14
| | | | | | | | | | | | | | | | This allows squeezing out the unused space in full-page writes. And more importantly, it can be a useful debugging aid. In hindsight we should've done this back when GIN was added - we wouldn't need the 'maxoff' field in the page opaque struct if we had used pd_lower and pd_upper like on normal pages. But as long as there can be pages in the index that have been binary-upgraded from pre-9.4 versions, we can't rely on that, and have to continue using 'maxoff'. Most of the code churn comes from renaming some macros, now that they're used on internal pages, too. This change is completely backwards-compatible, no effect on pg_upgrade.
* Fix bogus handling of bad strategy number in GIST consistent() functions.Tom Lane2014-04-14
| | | | | | | | | | | Make sure we throw an error instead of silently doing the wrong thing when fed a strategy number we don't recognize. Also, in the places that did already throw an error, spell the error message in a way more consistent with our message style guidelines. Per report from Paul Jones. Although this is a bug, it won't occur unless a superuser tries to do something he shouldn't, so it doesn't seem worth back-patching.
* Remove dead checks for invalid left page in ginDeletePage.Heikki Linnakangas2014-04-14
| | | | | In some places, the function assumes the left page is valid, and in others, it checks if it is valid. Remove all the checks.
* GIN entry pages follow the standard page layout - tell XLogInsert.Heikki Linnakangas2014-04-14
| | | | | | The entry B-tree pages all follow the standard page layout. The 9.3 code has this right. I inadvertently changed this at some point during the big refactorings in git master.
* Improve some O(N^2) behavior in window function evaluation.Tom Lane2014-04-13
| | | | | | | | | | | | | | | | | | | Repositioning the tuplestore seek pointer in window_gettupleslot() turns out to be a very significant expense when the window frame is sizable and the frame end can move. To fix, introduce a tuplestore function for skipping an arbitrary number of tuples in one call, parallel to the one we introduced for tuplesort objects in commit 8d65da1f. This reduces the cost of window_gettupleslot() to O(1) if the tuplestore has not spilled to disk. As in the previous commit, I didn't try to do any real optimization of tuplestore_skiptuples for the case where the tuplestore has spilled to disk. There is probably no practical way to get the cost to less than O(N) anyway, but perhaps someone can think of something later. Also fix PersistHoldablePortal() to make use of this API now that we have it. Based on a suggestion by Dean Rasheed, though this turns out not to look much like his patch.
* Make a dedicated AlterTblSpcStmt productionStephen Frost2014-04-13
| | | | | | | | | | Given that ALTER TABLESPACE has moved on from just existing for general purpose rename/owner changes, it deserves its own top-level production in the grammar. This also cleans up the RenameStmt to only ever be used for actual RENAMEs again- it really wasn't appropriate to hide non-RENAME productions under there. Noted by Alvaro.
* Provide moving-aggregate support for boolean aggregates.Tom Lane2014-04-13
| | | | David Rowley and Florian Pflug, reviewed by Dean Rasheed
* Make security barrier views automatically updatableStephen Frost2014-04-12
| | | | | | | | | | | | | | | | | Views which are marked as security_barrier must have their quals applied before any user-defined quals are called, to prevent user-defined functions from being able to see rows which the security barrier view is intended to prevent them from seeing. Remove the restriction on security barrier views being automatically updatable by adding a new securityQuals list to the RTE structure which keeps track of the quals from security barrier views at each level, independently of the user-supplied quals. When RTEs are later discovered which have securityQuals populated, they are turned into subquery RTEs which are marked as security_barrier to prevent any user-supplied quals being pushed down (modulo LEAKPROOF quals). Dean Rasheed, reviewed by Craig Ringer, Simon Riggs, KaiGai Kohei
* Provide moving-aggregate support for a bunch of numerical aggregates.Tom Lane2014-04-12
| | | | | | | | | | | | | First installment of the promised moving-aggregate support in built-in aggregates: count(), sum(), avg(), stddev() and variance() for assorted datatypes, though not for float4/float8. In passing, remove a 2001-vintage kluge in interval_accum(): interval array elements have been properly aligned since around 2003, but nobody remembered to take out this workaround. Also, fix a thinko in the opr_sanity tests for moving-aggregate catalog entries. David Rowley and Florian Pflug, reviewed by Dean Rasheed
* Create infrastructure for moving-aggregate optimization.Tom Lane2014-04-12
| | | | | | | | | | | | | | | | | | | Until now, when executing an aggregate function as a window function within a window with moving frame start (that is, any frame start mode except UNBOUNDED PRECEDING), we had to recalculate the aggregate from scratch each time the frame head moved. This patch allows an aggregate definition to include an alternate "moving aggregate" implementation that includes an inverse transition function for removing rows from the aggregate's running state. As long as this can be done successfully, runtime is proportional to the total number of input rows, rather than to the number of input rows times the average frame length. This commit includes the core infrastructure, documentation, and regression tests using user-defined aggregates. Follow-on commits will update some of the built-in aggregates to use this feature. David Rowley and Florian Pflug, reviewed by Dean Rasheed; additional hacking by me
* Fix bugs in GIN "fast scan" with partial match.Heikki Linnakangas2014-04-10
| | | | | | | | | | | | | | There were a couple of bugs here. First, if the fuzzy limit was exceeded, the loop in entryGetItem might drop out too soon if a whole block needs to be skipped because it's < advancePast ("continue" in a while-loop checks the loop condition too). Secondly, the loop checked when stepping to a new page that there is at least one offset on the page < advancePast, but we cannot rely on that on subsequent calls of entryGetItem, because advancePast might change in between. That caused the skipping loop to read bogus items in the TbmIterateResult's offset array. First item and fix by Alexander Korotkov, second bug pointed out by Fabrízio de Royes Mello, by a small variation of Alexander's test query.
* C comment: track_activity_query_size doesn't support memory unitsBruce Momjian2014-04-10
| | | | | | And explain why. Per report from Pavel Stehule
* Fix typo in comment.Heikki Linnakangas2014-04-10
| | | | Tomonari Katsumata
* Fix a few more misc typos in comments.Heikki Linnakangas2014-04-10
|
* Fix misc typos in comments.Heikki Linnakangas2014-04-09
|
* Add missing include.Robert Haas2014-04-09
| | | | | | This is more cleanup from commit 11a65eed1637a05b03e174700799b024e104bfb4. Amit Kapila
* Fix silly oversight in patch to remove dsm state file.Robert Haas2014-04-08
| | | | | I'm not sure if this is what's causing the Windows buildfarm members to get unhappy, but I don't think it can be helping anything...
* Add an in-core GiST index opclass for inet/cidr types.Tom Lane2014-04-08
| | | | | | | | | | | | | | | | | | | | | | This operator class can accelerate subnet/supernet tests as well as btree-equivalent ordered comparisons. It also handles a new network operator inet && inet (overlaps, a/k/a "is supernet or subnet of"), which is expected to be useful in exclusion constraints. Ideally this opclass would be the default for GiST with inet/cidr data, but we can't mark it that way until we figure out how to do a more or less graceful transition from the current situation, in which the really-completely-bogus inet/cidr opclasses in contrib/btree_gist are marked as default. Having the opclass in core and not default is better than not having it at all, though. While at it, add new documentation sections to allow us to officially document GiST/GIN/SP-GiST opclasses, something there was never a clear place to do before. I filled these in with some simple tables listing the existing opclasses and the operators they support, but there's certainly scope to put more information there. Emre Hasegeli, reviewed by Andreas Karlsson, further hacking by me
* Get rid of the dynamic shared memory state file.Robert Haas2014-04-08
| | | | | | | | | | | | | Instead of storing the ID of the dynamic shared memory control segment in a file within the data directory, store it in the main control segment. This avoids a number of nasty corner cases, most seriously that doing an online backup and then using it on the same machine (e.g. to fire up a standby) would result in the standby clobbering all of the master's dynamic shared memory segments. Per complaints from Heikki Linnakangas, Fujii Masao, and Tom Lane.
* Add new to_reg* functions for error-free OID lookups.Robert Haas2014-04-08
| | | | | | | | | These functions won't throw an error if the object doesn't exist, or if (for functions and operators) there's more than one matching object. Yugo Nagata and Nozomi Anzai, reviewed by Amit Khandekar, Marti Raudsepp, Amit Kapila, and me.