aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Clean up whitespace and indentation in parser and scanner filesPeter Eisentraut2011-11-01
| | | | These are not touched by pgindent, so clean them up a bit manually.
* Comment changes to show bgwriter no longer performs checkpoints.Simon Riggs2011-11-01
|
* Have checkpointer send stats once each processing loop.Simon Riggs2011-11-01
| | | | Noted by Fujii Masao
* Add new file for checkpointer.cSimon Riggs2011-11-01
|
* Split work of bgwriter between 2 processes: bgwriter and checkpointer.Simon Riggs2011-11-01
| | | | | | | | | | | | | bgwriter is now a much less important process, responsible for page cleaning duties only. checkpointer is now responsible for checkpoints and so has a key role in shutdown. Later patches will correct doc references to the now old idea that bgwriter performs checkpoints. Has beneficial effect on performance at high write rates, but mainly refactoring to more easily allow changes for power reduction by simplifying previously tortuous code around required to allow page cleaning and checkpointing to time slice in the same process. Patch by me, Review by Dickson Guedes
* Stop btree indexscans upon reaching nulls in either direction.Tom Lane2011-10-31
| | | | | | | | | | | The existing scan-direction-sensitive tests were overly complex, and failed to stop the scan in cases where it's perfectly legitimate to do so. Per bug #6278 from Maksym Boguk. Back-patch to 8.3, which is as far back as the patch applies easily. Doesn't seem worth sweating over a relatively minor performance issue in 8.2 at this late date. (But note that this was a performance regression from 8.1 and before, so 8.2 is being left as an outlier.)
* Support more locale-specific formatting options in cash_out().Tom Lane2011-10-30
| | | | | | | | | | | | | | | The POSIX spec defines locale fields for controlling the ordering of the value, sign, and currency symbol in monetary output, but cash_out only supported a small subset of these options. Fully implement p/n_sign_posn, p/n_cs_precedes, and p/n_sep_by_space per spec. Fix up cash_in so that it will accept all these format variants. Also, make sure that thousands_sep is only inserted to the left of the decimal point, as required by spec. Per bug #6144 from Eduard Kracmar and discussion of bug #6277. This patch includes some ideas from Alexander Lakhin's proposed patch, though it is very different in detail.
* Further improvement of make_greater_string.Tom Lane2011-10-30
| | | | | | | | | Make sure that it considers all the possibilities that the old code did, instead of trying only one possibility per character position. To keep the runtime in bounds, instead tweak the character incrementers to not try every possible multibyte character code. Remove unnecessary logic to restore the old character value on failure. Additional comment and formatting cleanup.
* Update visibilitymap.c header comments.Robert Haas2011-10-29
| | | | Recent work on index-only scans left this somewhat out of date.
* Fix assorted bogosities in cash_in() and cash_out().Tom Lane2011-10-29
| | | | | | | | | | | | | cash_out failed to handle multiple-byte thousands separators, as per bug #6277 from Alexander Law. In addition, cash_in didn't handle that either, nor could it handle multiple-byte positive_sign. Both routines failed to support multiple-byte mon_decimal_point, which I did not think was worth changing, but at least now they check for the possibility and fall back to using '.' rather than emitting invalid output. Also, make cash_in handle trailing negative signs, which formerly it would reject. Since cash_out generates trailing negative signs whenever the locale tells it to, this last omission represents a fail-to-reload-dumped-data bug. IMO that justifies patching this all the way back.
* Improve make_greater_string() with encoding-specific incrementers.Robert Haas2011-10-29
| | | | | | | | | This infrastructure doesn't in any way guarantee that the character we produce will sort before the one we incremented; but it does at least make it much more likely that we'll end up with something that is a valid character, which improves our chances. Kyotaro Horiguchi, with various adjustments by me.
* Allow hint bits to be set sooner for temporary and unlogged tables.Robert Haas2011-10-28
| | | | | | | | | | | We need not wait until the commit record is durably on disk, because in the event of a crash the page we're updating with hint bits will be gone anyway. Per off-list report from Heikki Linnakangas, this can significantly degrade the performance of unlogged tables; I was able to show a 2x speedup from this patch on a pgbench run with scale factor 15. In practice, this will mostly help small, heavily updated tables, because on larger tables you're unlikely to run into the same row again before the commit record makes it out to disk.
* Fix the number of lwlocks needed by the "fast path" lock patch. It needsHeikki Linnakangas2011-10-27
| | | | | | | | one lock per backend or auxiliary process - the need for a lock for each aux processes was not accounted for in NumLWLocks(). No-one noticed, because the three locks needed for the three aux processes fit into the few extra lwlocks we allocate for 3rd party modules that don't call RequestAddinLWLocks() (NUM_USER_DEFINED_LWLOCKS, 4 by default).
* Improve planner's ability to recognize cases where an IN's RHS is unique.Tom Lane2011-10-26
| | | | | | | | | | | | If the right-hand side of a semijoin is unique, then we can treat it like a normal join (or another way to say that is: we don't need to explicitly unique-ify the data before doing it as a normal join). We were recognizing such cases when the RHS was a sub-query with appropriate DISTINCT or GROUP BY decoration, but there's another way: if the RHS is a plain relation with unique indexes, we can check if any of the indexes prove the output is unique. Most of the infrastructure for that was there already in the join removal code, though I had to rearrange it a bit. Per reflection about a recent example in pgsql-performance.
* Change FK trigger naming convention to fix self-referential FKs.Tom Lane2011-10-26
| | | | | | | | | | | Use names like "RI_ConstraintTrigger_a_NNNN" for FK action triggers and "RI_ConstraintTrigger_c_NNNN" for FK check triggers. This ensures the action trigger fires first in self-referential cases where the very same row update fires both an action and a check trigger. This change provides a non-probabilistic solution for bug #6268, at the risk that it could break client code that is making assumptions about the exact names assigned to auto-generated FK triggers. Hence, change this in HEAD only. No need for forced initdb since old triggers continue to work fine.
* Change FK trigger creation order to better support self-referential FKs.Tom Lane2011-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | When a foreign-key constraint references another column of the same table, row updates will queue both the PK's ON UPDATE action and the FK's CHECK action in the same event. The ON UPDATE action must execute first, else the CHECK will check a non-final state of the row and possibly throw an inappropriate error, as seen in bug #6268 from Roman Lytovchenko. Now, the firing order of multiple triggers for the same event is determined by the sort order of their pg_trigger.tgnames, and the auto-generated names we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the trigger OID. So most of the time the firing order is the same as creation order, and so rearranging the creation order fixes it. This patch will fail to fix the problem if the OID counter wraps around or adds a decimal digit (eg, from 99999 to 100000) while we are creating the triggers for an FK constraint. Given the small odds of that, and the low usage of self-referential FKs, we'll live with that solution in the back branches. A better fix is to change the auto-generated names for FK triggers, but it seems unwise to do that in stable branches because there may be client code that depends on the naming convention. We'll fix it that way in HEAD in a separate patch. Back-patch to all supported branches, since this bug has existed for a long time.
* Make event_source visible on all platformsMagnus Hagander2011-10-25
| | | | | | On non-windows platform, we just ignore any value set there. Noted by Jaime Casanova
* Support configurable eventlog application names on WindowsMagnus Hagander2011-10-25
| | | | | | | | This allows different instances to use the eventlog with different identifiers, by setting the event_source GUC, similar to how syslog_ident works. Original patch by MauMau, heavily modified by Magnus Hagander
* Don't trust deferred-unique indexes for join removal.Tom Lane2011-10-23
| | | | | | | | | | | The uniqueness condition might fail to hold intra-transaction, and assuming it does can give incorrect query results. Per report from Marti Raudsepp, though this is not his proposed patch. Back-patch to 9.0, where both these features were introduced. In the released branches, add the new IndexOptInfo field to the end of the struct, to try to minimize ABI breakage for third-party code that may be examining that struct.
* Support synchronization of snapshots through an export/import procedure.Tom Lane2011-10-22
| | | | | | | | | | | | | | A transaction can export a snapshot with pg_export_snapshot(), and then others can import it with SET TRANSACTION SNAPSHOT. The data does not leave the server so there are not security issues. A snapshot can only be imported while the exporting transaction is still running, and there are some other restrictions. I'm not totally convinced that we've covered all the bases for SSI (true serializable) mode, but it works fine for lesser isolation modes. Joachim Wieland, reviewed by Marko Tiikkaja, and rather heavily modified by Tom Lane
* Fix overly-complicated usage of errcode_for_file_access().Heikki Linnakangas2011-10-22
| | | | | | | No need to do "errcode(errcode_for_file_access())", just "errcode_for_file_access()" is enough. The extra errcode() call is useless but harmless, so there's no user-visible bug here. Nevertheless, backpatch to 9.1 where this code were added.
* Code review for pgstat_get_crashed_backend_activity patch.Tom Lane2011-10-21
| | | | | | | Avoid possibly dumping core when pgstat_track_activity_query_size has a less-than-default value; avoid uselessly searching for the query string of a successfully-exited backend; don't bother putting out an ERRDETAIL if we don't have a query to show; some other minor stylistic improvements.
* More cleanup after failed reduced-lock-levels-for-DDL feature.Tom Lane2011-10-21
| | | | | | | | | | | | | | | | Turns out that use of ShareUpdateExclusiveLock or ShareRowExclusiveLock to protect DDL changes had gotten copied into several places that were not touched by either of Simon's original patches for the feature, and thus neither he nor I thought to revert them. (Indeed, it appears that two of these uses were committed *after* the reversion, which just goes to show that git merging is no panacea.) Change these places to use AccessExclusiveLock again. If we ever manage to resurrect that feature, we're going to have to think a bit harder about how to keep lock level usage in sync for DDL operations that aren't within the AlterTable infrastructure. Two of these bugs are only in HEAD, but one is in the 9.1 branch too. Alvaro found one of them, I found the other two.
* Try to log current the query string when a backend crashes.Robert Haas2011-10-21
| | | | | | | | | | | | | | | | To avoid minimize risk inside the postmaster, we subject this feature to a number of significant limitations. We very much wish to avoid doing any complex processing inside the postmaster, due to the posssibility that the crashed backend has completely corrupted shared memory. To that end, no encoding conversion is done; instead, we just replace anything that doesn't look like an ASCII character with a question mark. We limit the amount of data copied to 1024 characters, and carefully sanity check the source of that data. While these restrictions would doubtless be unacceptable in a general-purpose logging facility, even this limited facility seems like an improvement over the status quo ante. Marti Raudsepp, reviewed by PDXPUG and myself
* Fix DROP OPERATOR FAMILY IF EXISTS.Robert Haas2011-10-21
| | | | | | | | | | Essentially, the "IF EXISTS" portion was being ignored, and an error thrown anyway if the opfamily did not exist. I broke this in commit fd1843ff8979c0461fb3f1a9eab61140c977e32d; so backpatch to 9.1.X. Report and diagnosis by KaiGai Kohei.
* Simplify and improve ProcessStandbyHSFeedbackMessage logic.Tom Lane2011-10-20
| | | | | | | | | | | | | | There's no need to clamp the standby's xmin to be greater than GetOldestXmin's result; if there were any such need this logic would be hopelessly inadequate anyway, because it fails to account for within-database versus cluster-wide values of GetOldestXmin. So get rid of that, and just rely on sanity-checking that the xmin is not wrapped around relative to the nextXid counter. Also, don't reset the walsender's xmin if the current feedback xmin is indeed out of range; that just creates more problems than we already had. Lastly, don't bother to take the ProcArrayLock; there's no need to do that to set xmin. Also improve the comments about this in GetOldestXmin itself.
* Fix get_object_namespace() not to think extensions are "in" a schema.Robert Haas2011-10-20
| | | | | | | extnamespace means something altogether different in this context. Mostly by accident, this coding error (introduced in my commit 82a4a777d94bec965ab2f1d04b6e6a3f0447b377) broke the buildfarm instead of just silently doing the wrong thing.
* Add "skipping" to the NOTICE produced by DROP OPERATOR CLASS IF EXISTS.Robert Haas2011-10-19
| | | | | | | This makes this message consistent with all the other similar notices produced by other DROP IF EXISTS commands. Noted by KaiGai Kohei
* Consolidate DROP handling for some object types.Robert Haas2011-10-19
| | | | | | | This gets rid of a significant amount of duplicative code. KaiGai Kohei, reviewed in earlier versions by Dimitri Fontaine, with further review and cleanup by me.
* Suppress -Wunused-result warnings about write() and fwrite().Tom Lane2011-10-18
| | | | | | | This is merely an exercise in satisfying pedants, not a bug fix, because in every case we were checking for failure later with ferror(), or else there was nothing useful to be done about a failure anyway. Document the latter cases.
* Reject empty pg_hba.conf files.Tom Lane2011-10-18
| | | | | | | | | | | An empty HBA file is surely an error, since it means there is no way to connect to the server. We've not heard identifiable reports of people actually doing that, but this will also close off the case Thom Brown just complained of, namely pointing hba_file at a directory. (On at least some platforms with some directories, it will read as an empty file.) Perhaps this should be back-patched, but given the lack of previous complaints, I won't add extra work for the translators.
* Exclude postmaster.opts from base backupsMagnus Hagander2011-10-18
| | | | Noted by Fujii Masao
* Avoid assuming that index-only scan data matches the index's rowtype.Tom Lane2011-10-16
| | | | | | | | | | | | | | | | | | | | | | In general the data returned by an index-only scan should have the datatypes originally computed by FormIndexDatum. If the index opclasses use "storage" datatypes different from their input datatypes, the scan tuple will not have the same rowtype attributed to the index; but we had a hard-wired assumption that that was true in nodeIndexonlyscan.c. We'd already hacked around the issue for the one case where the types are different in btree indexes (btree name_ops), but this would definitely come back to bite us if we ever implement index-only scans in GiST. To fix, require the index AM to explicitly provide the tupdesc for the tuple it is returning. btree can just pass back the index's tupdesc, but GiST will have to work harder when and if it supports index-only scans. I had previously proposed fixing this by allowing the index AM to fill the scan tuple slot directly; but on reflection that seemed like a module layering violation, since TupleTableSlots are creatures of the executor. At least in the btree case, it would also be less efficient, since the tuple deconstruction work would occur even for rows later found to be invisible to the scan's snapshot.
* Teach btree to handle ScalarArrayOpExpr quals natively.Tom Lane2011-10-16
| | | | | This allows "indexedcol op ANY(ARRAY[...])" conditions to be used in plain indexscans, and particularly in index-only scans.
* Fix bugs in information_schema.referential_constraints view.Tom Lane2011-10-14
| | | | | | | | | | | | | | | | This view was being insufficiently careful about matching the FK constraint to the depended-on primary or unique key constraint. That could result in failure to show an FK constraint at all, or showing it multiple times, or claiming that it depended on a different constraint than the one it really does. Fix by joining via pg_depend to ensure that we find only the correct dependency. Back-patch, but don't bump catversion because we can't force initdb in back branches. The next minor-version release notes should explain that if you need to fix this in an existing installation, you can drop the information_schema schema then re-create it by sourcing $SHAREDIR/information_schema.sql in each database (as a superuser of course).
* Measure the number of all-visible pages for use in index-only scan costing.Tom Lane2011-10-14
| | | | | | | | | | | | | | | | | Add a column pg_class.relallvisible to remember the number of pages that were all-visible according to the visibility map as of the last VACUUM (or ANALYZE, or some other operations that update pg_class.relpages). Use relallvisible/relpages, instead of an arbitrary constant, to estimate how many heap page fetches can be avoided during an index-only scan. This is pretty primitive and will no doubt see refinements once we've acquired more field experience with the index-only scan mechanism, but it's way better than using a constant. Note: I had to adjust an underspecified query in the window.sql regression test, because it was changing answers when the plan changed to use an index-only scan. Some of the adjacent tests perhaps should be adjusted as well, but I didn't do that here.
* Avoid potential relcache leak in objectaddress.c.Robert Haas2011-10-14
| | | | | | | Nobody using the missing_ok flag yet, but let's speculate that this will be a better interface for future callers. KaiGai Kohei, with some adjustments by me.
* Remove all "traces" of trace_userlocks, because userlocks were removedBruce Momjian2011-10-13
| | | | in PG 8.2.
* Don't mark auto-generated types as extension members.Tom Lane2011-10-12
| | | | | | | | | | | | | | | | | | Relation rowtypes and automatically-generated array types do not need to have their own extension membership dependency entries. If we create such then it becomes more difficult to remove items from an extension, and it's also harder for an extension upgrade script to make sure it duplicates the dependencies created by the extension's regular installation script. I changed the code in such a way that this happened in commit 988cccc620dd8c16d77f88ede167b22056176324, I think because of worries about the shell-type-replacement case; but that cure was worse than the disease. It would only matter if one extension created a shell type that was replaced with an auto-generated type in another extension, which seems pretty far-fetched. Better to make this work unsurprisingly in normal cases. Report and patch by Robert Haas, comment adjustments by me.
* Modify RelationGetBufferForTuple() to use a typedef, rather than aBruce Momjian2011-10-12
| | | | struct, to help pgindent.
* Throw a useful error message if an extension script file is fed to psql.Tom Lane2011-10-12
| | | | | | | | | | | | | | | | We have seen one too many reports of people trying to use 9.1 extension files in the old-fashioned way of sourcing them in psql. Not only does that usually not work (due to failure to substitute for MODULE_PATHNAME and/or @extschema@), but if it did work they'd get a collection of loose objects not an extension. To prevent this, insert an \echo ... \quit line that prints a suitable error message into each extension script file, and teach commands/extension.c to ignore lines starting with \echo. That should not only prevent any adverse consequences of loading a script file the wrong way, but make it crystal clear to users that they need to do it differently now. Tom Lane, following an idea of Andrew Dunstan's. Back-patch into 9.1 ... there is not going to be much value in this if we wait till 9.2.
* Add comment on why pulling data from a "name" index column can't crash.Tom Lane2011-10-11
| | | | | | | | | | | It's been bothering me for several days that pretending that the cstring data stored in a btree name_ops column is really a "name" Datum could lead to reading past the end of memory. However, given the current memory layout used for index-only scans in the btree code, a crash is in fact not possible. Document that so we don't break it. I have not thought of any other solutions that aren't fairly ugly too, and most of them lose the functionality of index-only scans on name columns altogether, so this seems like the way to go.
* Generate index-only scan tuple descriptor from the plan node's indextlist.Tom Lane2011-10-11
| | | | | | | | | | Dept. of second thoughts: as long as we've got that tlist hanging around anyway, we can apply ExecTypeFromTL to it to get a suitable descriptor for the ScanTupleSlot. This is a nicer solution than the previous one because it eliminates some hard-wired knowledge about btree name_ops, and because it avoids the somewhat shaky assumption that we needn't set up the scan tuple descriptor in EXPLAIN_ONLY mode. It doesn't change what actually happens at run-time though, and I'm still a bit nervous about that.
* Consider index-only scans even when there is no matching qual or ORDER BY.Tom Lane2011-10-11
| | | | By popular demand.
* Rearrange the implementation of index-only scans.Tom Lane2011-10-11
| | | | | | | | | | | | | | | | | | | | | | This commit changes index-only scans so that data is read directly from the index tuple without first generating a faux heap tuple. The only immediate benefit is that indexes on system columns (such as OID) can be used in index-only scans, but this is necessary infrastructure if we are ever to support index-only scans on expression indexes. The executor is now ready for that, though the planner still needs substantial work to recognize the possibility. To do this, Vars in index-only plan nodes have to refer to index columns not heap columns. I introduced a new special varno, INDEX_VAR, to mark such Vars to avoid confusion. (In passing, this commit renames the two existing special varnos to OUTER_VAR and INNER_VAR.) This allows ruleutils.c to handle them with logic similar to what we use for subplan reference Vars. Since index-only scans are now fundamentally different from regular indexscans so far as their expression subtrees are concerned, I also chose to change them to have their own plan node type (and hence, their own executor source file).
* Replace hardcoded switch in object_exists() with a lookup table.Robert Haas2011-10-11
| | | | | | | | | | | | There's no particular advantage to this change on its face; indeed, it's possible that this might be slightly slower than the old way. But it makes this information more easily accessible to other functions, and therefore paves the way for future code consolidation. Performance isn't critical here, so there's no need to be smart about how we do the search. This is a heavily cut-down version of a patch from KaiGai Kohei, with several fixes by me. Additional review from Dimitri Fontaine.
* Repair breakage in VirtualXactLock.Robert Haas2011-10-11
| | | | | I broke this in commit 84e37126770dd6de903dad88ce150a49b63b5ef9. Report and fix by Fujii Masao.
* Mark GUC external_pid_file's default as '' in postgresql.conf, ratherBruce Momjian2011-10-10
| | | | than '(none)'.
* Fix ALTER TABLE ONLY .. DROP CONSTRAINT.Robert Haas2011-10-09
| | | | | | | | | | | | | | | | | | When I consolidated two copies of the HOT-chain search logic in commit 4da99ea4231e3d8bbf28b666748c1028e7b7d665, I introduced a behavior change: the old code wouldn't necessarily traverse the entire chain, if the most recently returned tuple were updated while the HOT chain traversal is in progress. The new behavior seems more correct, but unfortunately, the code here relies on a scan with SnapshotNow failing to see its own updates. That seems pretty shaky even with the old HOT chain traversal behavior, since there's no guarantee that these updates will always be HOT, but it's trivial to broke a failure with the new HOT search logic. Fix by updating just the first matching pg_constraint tuple, rather than all of them, since there should be only one anyway. But since nobody has reproduced this failure on older versions, no back-patch for now. Report and test case by Alex Hunsaker; tablecmds.c changes by me.
* Clean up a couple of box gist helper functions.Heikki Linnakangas2011-10-09
| | | | | | | | | | The original idea of this patch was to make box picksplit run faster, by eliminating unnecessary palloc() overhead, but that was obsoleted by the new double-sorting split algorithm that doesn't call these functions so heavily anymore. Nevertheless, the code looks better this way. Original patch by me, reviewed and tidied up after the double-sorting patch by Kevin Grittner.