aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Use %u to print out BlockNumber variablesAlvaro Herrera2014-12-18
| | | | Per Tom Lane
* Have VACUUM log number of skipped pages due to pinsAlvaro Herrera2014-12-18
| | | | | | Author: Jim Nasby, some kibitzing by Heikki Linnankangas. Discussion leading to current behavior and precise wording fueled by thoughts from Robert Haas and Andres Freund.
* Improve hash_create's API for selecting simple-binary-key hash functions.Tom Lane2014-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate. Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely. In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before. For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c. Teodor Sigaev and Tom Lane
* Change how first WAL segment on new timeline after promotion is created.Heikki Linnakangas2014-12-18
| | | | | | | | | | | | | | Two changes: 1. When copying a WAL segment from old timeline to create the first segment on the new timeline, only copy up to the point where the timeline switch happens, and zero-fill the rest. This avoids corner cases where we might think that the copied WAL from the previous timeline belong to the new timeline. 2. If the timeline switch happens at a segment boundary, don't copy the whole old segment to the new timeline. It's pointless, because it's 100% identical to the old segment.
* Add memory barriers for PgBackendStatus.st_changecount protocol.Fujii Masao2014-12-18
| | | | | | | | | | | | | | | | | | st_changecount protocol needs the memory barriers to ensure that the apparent order of execution is as it desires. Otherwise, for example, the CPU might rearrange the code so that st_changecount is incremented twice before the modification on a machine with weak memory ordering. This surprising result can lead to bugs. This commit introduces the macros to load and store st_changecount with the memory barriers. These are called before and after PgBackendStatus entries are modified or copied into private memory, in order to prevent CPU from reordering PgBackendStatus access. Per discussion on pgsql-hackers, we decided not to back-patch this to 9.4 or before until we get an actual bug report about this. Patch by me. Review by Robert Haas.
* Ensure variables live across calls in generate_series(numeric, numeric).Fujii Masao2014-12-18
| | | | | | | | | | | In generate_series_step_numeric(), the variables "start_num" and "stop_num" may be potentially freed until the next call. So they should be put in the location which can survive across calls. But previously they were not, and which could cause incorrect behavior of generate_series(numeric, numeric). This commit fixes this problem by copying them on multi_call_memory_ctx. Andrew Gierth
* Remove odd blank line in comment.Fujii Masao2014-12-18
| | | | Etsuro Fujita
* Fix (re-)starting from a basebackup taken off a standby after a failure.Andres Freund2014-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | When starting up from a basebackup taken off a standby extra logic has to be applied to compute the point where the data directory is consistent. Normal base backups use a WAL record for that purpose, but that isn't possible on a standby. That logic had a error check ensuring that the cluster's control file indicates being in recovery. Unfortunately that check was too strict, disregarding the fact that the control file could also indicate that the cluster was shut down while in recovery. That's possible when the a cluster starting from a basebackup is shut down before the backup label has been removed. When everything goes well that's a short window, but when either restore_command or primary_conninfo isn't configured correctly the window can get much wider. That's because inbetween reading and unlinking the label we restore the last checkpoint from WAL which can need additional WAL. To fix simply also allow starting when the control file indicates "shutdown in recovery". There's nicer fixes imaginable, but they'd be more invasive. Backpatch to 9.2 where support for taking basebackups from standbys was added.
* Allow CHECK constraints to be placed on foreign tables.Tom Lane2014-12-17
| | | | | | | | | | | | | | | As with NOT NULL constraints, we consider that such constraints are merely reports of constraints that are being enforced by the remote server (or other underlying storage mechanism). Their only real use is to allow planner optimizations, for example in constraint-exclusion checks. Thus, the code changes here amount to little more than removal of the error that was formerly thrown for applying CHECK to a foreign table. (In passing, do a bit of cleanup of the ALTER FOREIGN TABLE reference page, which had accumulated some weird decisions about ordering etc.) Shigeru Hanada and Etsuro Fujita, reviewed by Kyotaro Horiguchi and Ashutosh Bapat.
* Fix poorly worded error message.Tom Lane2014-12-17
| | | | Adam Brightwell, per report from Martín Marqués.
* Fix off-by-one loop count in MapArrayTypeName, and get rid of static array.Tom Lane2014-12-16
| | | | | | | | | | | | | | | | | | | | | MapArrayTypeName would copy up to NAMEDATALEN-1 bytes of the base type name, which of course is wrong: after prepending '_' there is only room for NAMEDATALEN-2 bytes. Aside from being the wrong result, this case would lead to overrunning the statically allocated work buffer. This would be a security bug if the function were ever used outside bootstrap mode, but it isn't, at least not in any currently supported branches. Aside from fixing the off-by-one loop logic, this patch gets rid of the static work buffer by having MapArrayTypeName pstrdup its result; the sole caller was already doing that, so this just requires moving the pstrdup call. This saves a few bytes but mainly it makes the API a lot cleaner. Back-patch on the off chance that there is some third-party code using MapArrayTypeName with less-secure input. Pushing pstrdup into the function should not cause any serious problems for such hypothetical code; at worst there might be a short term memory leak. Per Coverity scanning.
* Fix some jsonb issues found by Coverity in recent commits.Andrew Dunstan2014-12-16
| | | | | | | | | | | | Mostly these issues concern the non-use of function results. These have been changed to use (void) pushJsonbValue(...) instead of assigning the result to a variable that gets overwritten before it is used. There is a larger issue that we should possibly examine the API for pushJsonbValue(), so that instead of returning a value it modifies a state argument. The current idiom is rather clumsy. However, changing that requires quite a bit more work, so this change should do for the moment.
* Misc comment typo fixes.Heikki Linnakangas2014-12-16
| | | | | Backpatch the applicable parts, just to make backpatching future patches easier.
* Fix point <-> polygon code for zero-distance case.Tom Lane2014-12-15
| | | | | "PG_RETURN_FLOAT8(x)" is not "return x", except perhaps by accident on some platforms.
* Add point <-> polygon distance operator.Heikki Linnakangas2014-12-15
| | | | Alexander Korotkov, reviewed by Emre Hasegeli.
* Translation updatesPeter Eisentraut2014-12-15
|
* Add CINE option for CREATE TABLE AS and CREATE MATERIALIZED VIEWAndrew Dunstan2014-12-13
| | | | Fabrízio de Royes Mello reviewed by Rushabh Lathia.
* Repair corner-case bug in array version of percentile_cont().Tom Lane2014-12-13
| | | | | | | | The code for advancing through the input rows overlooked the case that we might already be past the first row of the row pair now being considered, in case the previous percentile also fell between the same two input rows. Report and patch by Andrew Gierth; logic rewritten a bit for clarity by me.
* Add several generator functions for jsonb that exist for json.Andrew Dunstan2014-12-12
| | | | | | | | | | | | | | | | The functions are: to_jsonb() jsonb_object() jsonb_build_object() jsonb_build_array() jsonb_agg() jsonb_object_agg() Also along the way some better logic is implemented in json_categorize_type() to match that in the newly implemented jsonb_categorize_type(). Andrew Dunstan, reviewed by Pavel Stehule and Alvaro Herrera.
* Add json_strip_nulls and jsonb_strip_nulls functions.Andrew Dunstan2014-12-12
| | | | | | | | The functions remove object fields, including in nested objects, that have null as a value. In certain cases this can lead to considerably smaller datums, with no loss of semantic information. Andrew Dunstan, reviewed by Pavel Stehule.
* Put the logic to decide which synchronous standby is active into a function.Heikki Linnakangas2014-12-12
| | | | | | This avoids duplicating the code. Michael Paquier, reviewed by Simon Riggs and me
* Fix planning of SELECT FOR UPDATE on child table with partial index.Tom Lane2014-12-11
| | | | | | | | | | | | | | | | | | | Ordinarily we can omit checking of a WHERE condition that matches a partial index's condition, when we are using an indexscan on that partial index. However, in SELECT FOR UPDATE we must include the "redundant" filter condition in the plan so that it gets checked properly in an EvalPlanQual recheck. The planner got this mostly right, but improperly omitted the filter condition if the index in question was on an inheritance child table. In READ COMMITTED mode, this could result in incorrectly returning just-updated rows that no longer satisfy the filter condition. The cause of the error is using get_parse_rowmark() when get_plan_rowmark() is what should be used during planning. In 9.3 and up, also fix the same mistake in contrib/postgres_fdw. It's currently harmless there (for lack of inheritance support) but wrong is wrong, and the incorrect code might get copied to someplace where it's more significant. Report and fix by Kyotaro Horiguchi. Back-patch to all supported branches.
* Fix corner case where SELECT FOR UPDATE could return a row twice.Tom Lane2014-12-11
| | | | | | | | | | | | | | | | In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo WHERE-clause checking on rows that have been updated since the SELECT's snapshot, it invokes EvalPlanQual processing to do that. If this first occurs within a non-first child table of an inheritance tree, the previous coding could accidentally re-return a matching row from an earlier, already-scanned child table. (And, to add insult to injury, I think this could make it miss returning a row that should have been returned, if the updated row that this happens on should still have passed the WHERE qual.) Per report from Kyotaro Horiguchi; the added isolation test is based on his test case. This has been broken for quite awhile, so back-patch to all supported branches.
* Further changes to REINDEX SCHEMASimon Riggs2014-12-11
| | | | | | | | | | Ensure we reindex indexes built on Mat Views. Based on patch from Micheal Paquier Add thorough tests to check that indexes on tables, toast tables and mat views are reindexed. Simon Riggs
* Fix assorted confusion between Oid and int32.Tom Lane2014-12-11
| | | | | | | | | | | In passing, also make some debugging elog's in pgstat.c a bit more consistently worded. Back-patch as far as applicable (9.3 or 9.4; none of these mistakes are really old). Mark Dilger identified and patched the type violations; the message rewordings are mine.
* Use correct macro for reltablespace.Heikki Linnakangas2014-12-11
| | | | | | | It's an OID. WRITE_UINT_FIELD is identical to WRITE_OID_FIELD, but let's be tidy. Mark Dilger
* Fix minor thinko in convertToJsonb().Tom Lane2014-12-10
| | | | | | | | | | The amount of space to reserve for the value's varlena header is VARHDRSZ, not sizeof(VARHDRSZ). The latter coding accidentally failed to fail because of the way the VARHDRSZ macro is currently defined; but if we ever change it to return size_t (as one might reasonably expect it to do), convertToJsonb() would have failed. Spotted by Mark Dilger.
* Silence REINDEXSimon Riggs2014-12-09
| | | | | | | Previously REINDEX DATABASE and REINDEX SCHEMA produced a stream of NOTICE messages. Removing that since it is inconsistent for such a command to produce output without a VERBOSE option.
* REINDEX SCHEMASimon Riggs2014-12-09
| | | | | | | | Add new SCHEMA option to REINDEX and reindexdb. Sawada Masahiko Reviewed by Michael Paquier and Fabrízio de Royes Mello
* Windows: use GetSystemTimePreciseAsFileTime if availableSimon Riggs2014-12-08
| | | | | | | | | | | | | | PostgreSQL on Windows 8 or Windows Server 2012 will now get high-resolution timestamps by dynamically loading the GetSystemTimePreciseAsFileTime function. It'll fall back to to GetSystemTimeAsFileTime if the higher precision variant isn't found, so the same binaries without problems on older Windows releases. No attempt is made to detect the Windows version. Only the presence or absence of the desired function is considered. Craig Ringer
* Remove duplicate code in heap_prune_chain()Simon Riggs2014-12-08
| | | | | | No need to set tuple tableOid twice Jim Nasby
* Event Trigger for table_rewriteSimon Riggs2014-12-08
| | | | | | | | | | | | | | Generate a table_rewrite event when ALTER TABLE attempts to rewrite a table. Provide helper functions to identify table and reason. Intended use case is to help assess or to react to schema changes that might hold exclusive locks for long periods. Dimitri Fontaine, triggering an edit by Simon Riggs Reviewed in detail by Michael Paquier
* Tweaks for recovery_target_actionSimon Riggs2014-12-07
| | | | | | | | | Rename parameter action_at_recovery_target to recovery_target_action suggested by Christoph Berg. Place into recovery.conf suggested by Fujii Masao, replacing (deprecating) earlier parameters, per Michael Paquier.
* Print new track_commit_timestamp in rm_desc of a parameter-change record.Heikki Linnakangas2014-12-05
| | | | Michael Paquier
* Print wal_log_hints in the rm_desc routing of a parameter-change record.Heikki Linnakangas2014-12-05
| | | | | | | | | It was an oversight in the original commit. Also note in the sample config file that changing wal_log_hints requires a restart. Michael Paquier. Backpatch to 9.4, where wal_log_hints was added.
* Don't dump core if pq_comm_reset() is called before pq_init().Robert Haas2014-12-04
| | | | | | | This can happen if an error occurs in a standalone backend. This bug was introduced by commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d. Reported by Álvaro Herrera.
* Keep track of transaction commit timestampsAlvaro Herrera2014-12-03
| | | | | | | | | | | | | | | | | | | | | Transactions can now set their commit timestamp directly as they commit, or an external transaction commit timestamp can be fed from an outside system using the new function TransactionTreeSetCommitTsData(). This data is crash-safe, and truncated at Xid freeze point, same as pg_clog. This module is disabled by default because it causes a performance hit, but can be enabled in postgresql.conf requiring only a server restart. A new test in src/test/modules is included. Catalog version bumped due to the new subdirectory within PGDATA and a couple of new SQL functions. Authors: Álvaro Herrera and Petr Jelínek Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven Singer, Peter Eisentraut
* Fix typosAlvaro Herrera2014-12-03
|
* Improve error messages for malformed array input strings.Tom Lane2014-12-02
| | | | | | | | | | | | | | | Make the error messages issued by array_in() uniformly follow the style ERROR: malformed array literal: "actual input string" DETAIL: specific complaint here and rewrite many of the specific complaints to be clearer. The immediate motivation for doing this is a complaint from Josh Berkus that json_to_record() produced an unintelligible error message when dealing with an array item, because it tries to feed the JSON-format array value to array_in(). Really it ought to be smart enough to perform JSON-to-Postgres array conversion, but that's a future feature not a bug fix. In the meantime, this change is something we agreed we could back-patch into 9.4, and it should help de-confuse things a bit.
* Don't skip SQL backends in logical decoding for visibility computation.Andres Freund2014-12-02
| | | | | | | | | | | | | | | | | | The logical decoding patchset introduced PROC_IN_LOGICAL_DECODING flag PGXACT flag, that allows such backends to be skipped when computing the xmin horizon/snapshots. That's fine and sensible for walsenders streaming out logical changes, but not at all fine for SQL backends doing logical decoding. If the latter set that flag any change they have performed outside of logical decoding will not be regarded as visible - which e.g. can lead to that change being vacuumed away. Note that not setting the flag for SQL backends isn't particularly bothersome - the SQL backend doesn't do streaming, so it only runs for a limited amount of time. Per buildfarm member 'tick' and Alvaro. Backpatch to 9.4, where logical decoding was introduced.
* Fix JSON aggregates to work properly when final function is re-executed.Tom Lane2014-12-02
| | | | | | | | | | | | Davide S. reported that json_agg() sometimes produced multiple trailing right brackets. This turns out to be because json_agg_finalfn() attaches the final right bracket, and was doing so by modifying the aggregate state in-place. That's verboten, though unfortunately it seems there's no way for nodeAgg.c to check for such mistakes. Fix that back to 9.3 where the broken code was introduced. In 9.4 and HEAD, likewise fix json_object_agg(), which had copied the erroneous logic. Make some cosmetic cleanups as well.
* Minor cleanup of function declarations for BRIN.Tom Lane2014-12-02
| | | | | | | Get rid of PG_FUNCTION_INFO_V1() macros, which are quite inappropriate for built-in functions (possibly leftovers from testing as a loadable module?). Also, fix gratuitous inconsistency between SQL-level and C-level names of the minmax support functions.
* Guard against bad "dscale" values in numeric_recv().Tom Lane2014-12-01
| | | | | | | | | | | | | | | | | | | | | | | | We were not checking to see if the supplied dscale was valid for the given digit array when receiving binary-format numeric values. While dscale can validly be more than the number of nonzero fractional digits, it shouldn't be less; that case causes fractional digits to be hidden on display even though they're there and participate in arithmetic. Bug #12053 from Tommaso Sala indicates that there's at least one broken client library out there that sometimes supplies an incorrect dscale value, leading to strange behavior. This suggests that simply throwing an error might not be the best response; it would lead to failures in applications that might seem to be working fine today. What seems the least risky fix is to truncate away any digits that would be hidden by dscale. This preserves the existing behavior in terms of what will be printed for the transmitted value, while preventing subsequent arithmetic from producing results inconsistent with that. In passing, throw a specific error for the case of dscale being outside the range that will fit into a numeric's header. Before you got "value overflows numeric format", which is a bit misleading. Back-patch to all supported branches.
* Fix hstore_to_json_loose's detection of valid JSON number values.Andrew Dunstan2014-12-01
| | | | | | | | | | We expose a function IsValidJsonNumber that internally calls the lexer for json numbers. That allows us to use the same test everywhere, instead of inventing a broken test for hstore conversions. The new function is also used in datum_to_json, replacing the code that is now moved to the new function. Backpatch to 9.3 where hstore_to_json_loose was introduced.
* Update transaction README for persistent multixactsAlvaro Herrera2014-11-28
| | | | | Multixacts are now maintained during recovery, but the README didn't get the memo. Backpatch to 9.3, where the divergence was introduced.
* Add bms_get_singleton_member(), and use it where appropriate.Tom Lane2014-11-28
| | | | | | | | | | This patch adds a function that replaces a bms_membership() test followed by a bms_singleton_member() call, performing both the test and the extraction of a singleton set's member in one scan of the bitmapset. The performance advantage over the old way is probably minimal in current usage, but it seems worthwhile on notational grounds anyway. David Rowley
* Add bms_next_member(), and use it where appropriate.Tom Lane2014-11-28
| | | | | | | | | | | | This patch adds a way of iterating through the members of a bitmapset nondestructively, unlike the old way with bms_first_member(). While bms_next_member() is very slightly slower than bms_first_member() (at least for typical-size bitmapsets), eliminating the need to palloc and pfree a temporary copy of the target bitmapset is a significant win. So this method should be preferred in all cases where a temporary copy would be necessary. Tom Lane, with suggestions from Dean Rasheed and David Rowley
* Improve performance of OverrideSearchPathMatchesCurrent().Tom Lane2014-11-28
| | | | | | | | | This function was initially coded on the assumption that it would not be performance-critical, but that turns out to be wrong in workloads that are heavily dependent on the speed of plpgsql functions. Speed it up by hard-coding the comparison rules, thereby avoiding palloc/pfree traffic from creating and immediately freeing an OverrideSearchPath object. Per report from Scott Marlowe.
* Improve typcache: cache negative lookup results, add invalidation logic.Tom Lane2014-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if the typcache had for example tried and failed to find a hash opclass for a given data type, it would nonetheless repeat the unsuccessful catalog lookup each time it was asked again. This can lead to a significant amount of useless bufmgr traffic, as in a recent report from Scott Marlowe. Like the catalog caches, typcache should be able to cache negative results. This patch arranges that by making use of separate flag bits to remember whether a particular item has been looked up, rather than treating a zero OID as an indicator that no lookup has been done. Also, install a credible invalidation mechanism, namely watching for inval events in pg_opclass. The sole advantage of the lack of negative caching was that the code would cope if operators or opclasses got added for a type mid-session; to preserve that behavior we have to be able to invalidate stale lookup results. Updates in pg_opclass should be pretty rare in production systems, so it seems sufficient to just invalidate all the dependent data whenever one happens. Adding proper invalidation also means that this code will now react sanely if an opclass is dropped mid-session. Arguably, that's a back-patchable bug fix, but in view of the lack of complaints from the field I'll refrain from back-patching. (Probably, in most cases where an opclass is dropped, the data type itself is dropped soon after, so that this misfeasance has no bad consequences.)
* Fix assertion failure at end of PITR.Heikki Linnakangas2014-11-28
| | | | | | | | | | | | | InitXLogInsert() cannot be called in a critical section, because it allocates memory. But CreateCheckPoint() did that, when called for the end-of-recovery checkpoint by the startup process. In the passing, fix the scratch space allocation in InitXLogInsert to go to the right memory context. Also update the comment at InitXLOGAccess, which hasn't been totally accurate since hot standby was introduced (in a hot standby backend, InitXLOGAccess isn't called at backend startup). Reported by Michael Paquier