aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/common
Commit message (Collapse)AuthorAge
* Stop the executor if no more tuples can be sent from worker to leader.Robert Haas2016-06-06
| | | | | | | | | | | | | | | | | | | | | If a Gather node has read as many tuples as it needs (for example, due to Limit) it may detach the queue connecting it to the worker before reading all of the worker's tuples. Rather than let the worker continue to generate and send all of the results, have it stop after sending the next tuple. More could be done here to stop the worker even quicker, but this is about as well as we can hope to do for 9.6. This is in response to a problem report from Andreas Seltenreich. Commit 44339b892a04e94bbb472235882dc6f7023bdc65 should be actually be sufficient to fix that example even without this change, but it seems better to do this, too, since we might otherwise waste quite a large amount of effort in one or more workers. Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com Amit Kapila
* Limit maximum parallel degree to 1024.Robert Haas2016-05-06
| | | | | | | | | | | | | | This new limit affects both the max_parallel_degree GUC and the parallel_degree reloption. There may some day be a use case for using more than 1024 CPUs for a single query, but that's surely not the case right now. Not only do not very many people have that many CPUs, but the code hasn't been tested at that kind of scale and is very unlikely to perform well, or even work at all, without a lot more work. The issue addressed by commit 06bd458cb812623c3f1fdd55216c4c08b06a8447 is probably just one problem of many. The idea of a more reasonable limit here was suggested by Tom Lane; the value of 1024 was suggested by Amit Kapila.
* Revert CREATE INDEX ... INCLUDING ...Teodor Sigaev2016-04-08
| | | | | | It's not ready yet, revert two commits 690c543550b0d2852060c18d270cdb534d339d9a - unstable test output 386e3d7609c49505e079c40c65919d99feb82505 - patch itself
* CREATE INDEX ... INCLUDING (column[, ...])Teodor Sigaev2016-04-08
| | | | | | | | | | Now indexes (but only B-tree for now) can contain "extra" column(s) which doesn't participate in index structure, they are just stored in leaf tuples. It allows to use index only scan by using single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
* Add a 'parallel_degree' reloption.Robert Haas2016-04-08
| | | | | | | | | | The code that estimates what parallel degree should be uesd for the scan of a relation is currently rather stupid, so add a parallel_degree reloption that can be used to override the planner's rather limited judgement. Julien Rouhaud, reviewed by David Rowley, James Sewell, Amit Kapila, and me. Some further hacking by me.
* Reduce lock level for altering fillfactorSimon Riggs2016-03-10
| | | | Fabrízio de Royes Mello and Simon Riggs
* Restructure index access method API to hide most of it at the C level.Tom Lane2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Re-Align *_freeze_max_age reloption limits with corresponding GUC limits.Andres Freund2015-10-05
| | | | | | | | | | | | | | In 020235a5754 I lowered the autovacuum_*freeze_max_age minimums to allow for easier testing of wraparounds. I did not touch the corresponding per-table limits. While those don't matter for the purpose of wraparound, it seems more consistent to lower them as well. It's noteworthy that the previous reloption lower limit for autovacuum_multixact_freeze_max_age was too high by one magnitude, even before 020235a5754. Discussion: 26377.1443105453@sss.pgh.pa.us Backpatch: back to 9.0 (in parts), like the prior patch
* Allow per-tablespace effective_io_concurrencyAlvaro Herrera2015-09-08
| | | | | | | | | | Per discussion, nowadays it is possible to have tablespaces that have wildly different I/O characteristics from others. Setting different effective_io_concurrency parameters for those has been measured to improve performance. Author: Julien Rouhaud Reviewed by: Andres Freund
* Fix misc typos.Heikki Linnakangas2015-09-05
| | | | Oskari Saarenmaa. Backpatch to stable branches where applicable.
* Reduce lock levels for ALTER TABLE SET autovacuum storage optionsSimon Riggs2015-08-14
| | | | | | | | | | | | Reduce lock levels down to ShareUpdateExclusiveLock for all autovacuum-related relation options when setting them using ALTER TABLE. Add infrastructure to allow varying lock levels for relation options in later patches. Setting multiple options together uses the highest lock level required for any option. Works for both main and toast tables. Fabrízio Mello, reviewed by Michael Paquier, mild edit and additional regression tests from myself
* Fix a number of places that produced XX000 errors in the regression tests.Tom Lane2015-08-02
| | | | | | | | | | | | | | | | | | | | It's against project policy to use elog() for user-facing errors, or to omit an errcode() selection for errors that aren't supposed to be "can't happen" cases. Fix all the violations of this policy that result in ERRCODE_INTERNAL_ERROR log entries during the standard regression tests, as errors that can reliably be triggered from SQL surely should be considered user-facing. I also looked through all the files touched by this commit and fixed other nearby problems of the same ilk. I do not claim to have fixed all violations of the policy, just the ones in these files. In a few places I also changed existing ERRCODE choices that didn't seem particularly appropriate; mainly replacing ERRCODE_SYNTAX_ERROR by something more specific. Back-patch to 9.5, but no further; changing ERRCODE assignments in stable branches doesn't seem like a good idea.
* Remove obsolete heap_formtuple/modifytuple/deformtuple functions.Heikki Linnakangas2015-07-02
| | | | | | | | | | These variants used the old-style 'n'/' ' NULL indicators. The new-style functions have been available since version 8.1. That should be long enough that if there is still any old external code using these functions, they can just switch to the new functions without worrying about backwards compatibility Peter Geoghegan
* Support "expanded" objects, particularly arrays, for better performance.Tom Lane2015-05-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array. The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well. In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.) Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win. Tom Lane, reviewed by Andres Freund
* Fix postgres_fdw to return the right ctid value in EvalPlanQual cases.Tom Lane2015-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a postgres_fdw foreign table is a non-locked source relation in an UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its ctid column, the wrong value would be returned if an EvalPlanQual recheck occurred. This happened because the foreign table's result row was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks just unconditionally set the reconstructed tuple's t_self to "invalid". To fix that, we can have EvalPlanQualFetchRowMarks copy the composite datum's t_ctid field, and be sure to initialize that along with t_self when postgres_fdw constructs a tuple to return. If we just did that much then EvalPlanQualFetchRowMarks would start returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps does not matter much, but then again maybe it might. The cause of that is that heap_form_tuple, which is the ultimate source of all composite datums, simply leaves t_ctid as zeroes in newly constructed tuples. That seems like a bad idea on general principles: a field that's really not been initialized shouldn't appear to have a valid value. So let's eat the trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))" in heap_form_tuple. This closes out our handling of Etsuro Fujita's report that tableoid and ctid weren't correctly set in postgres_fdw EvalPlanQual cases. Along the way we did a great deal of work to improve FDWs' ability to control row locking behavior; which was not wasted effort by any means, but it didn't end up being a fix for this problem because that feature would be too expensive for postgres_fdw to use all the time. Although the fix for the tableoid misbehavior was back-patched, I'm hesitant to do so here; it seems far less likely that people would care about remote ctid than tableoid, and even such a minor behavioral change as this in heap_form_tuple is perhaps best not back-patched. So commit to HEAD only, at least for the moment. Etsuro Fujita, with some adjustments by me
* Add log_min_autovacuum_duration per-table optionAlvaro Herrera2015-04-03
| | | | | | | | | This is useful to control autovacuum log volume, for situations where monitoring only a set of tables is necessary. Author: Michael Paquier Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa, Taiki Kondo, Huong Dangminh), Fujii Masao.
* Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].Tom Lane2015-02-21
| | | | | | | This requires changing quite a few places that were depending on sizeof(HeapTupleHeaderData), but it seems for the best. Michael Paquier, some adjustments by me
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Rename pending_list_cleanup_size to gin_pending_list_limit.Fujii Masao2014-11-13
| | | | | Since this parameter is only for GIN index, it's better to add "gin" to the parameter name for easier understanding.
* Add GUC and storage parameter to set the maximum size of GIN pending list.Fujii Masao2014-11-11
| | | | | | | | | | | | | | | | Previously the maximum size of GIN pending list was controlled only by work_mem. But the reasonable value of work_mem and the reasonable size of the list are basically not the same, so it was not appropriate to control both of them by only one GUC, i.e., work_mem. This commit separates new GUC, pending_list_cleanup_size, from work_mem to allow users to control only the size of the list. Also this commit adds pending_list_cleanup_size as new storage parameter to allow users to specify the size of the list per index. This is useful, for example, when users want to increase the size of the list only for the GIN index which can be updated heavily, and decrease it otherwise. Reviewed by Etsuro Fujita.
* BRIN: Block Range IndexesAlvaro Herrera2014-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs. As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information. For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax. Catalog version bumped due to new builtin catalog entries. There's more that could be done here, but this is a good step forwards. Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas. Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas. Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo. PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
* Revert "Allow units to be specified in relation option setting value."Fujii Masao2014-08-29
| | | | | | | | | | | This reverts commit e23014f3d40f7d2c23bc97207fd28efbe5ba102b. As the side effect of the reverted commit, when the unit is specified, the reloption was stored in the catalog with the unit. This broke pg_dump (specifically, it prevented pg_dump from outputting restorable backup regarding the reloption) and turned the buildfarm red. Revert the commit until the fixed version is ready.
* Allow units to be specified in relation option setting value.Fujii Masao2014-08-28
| | | | | | | | | This introduces an infrastructure which allows us to specify the units like ms (milliseconds) in integer relation option, like GUC parameter. Currently only autovacuum_vacuum_cost_delay reloption can accept the units. Reviewed by Michael Paquier
* revert "Throw error for ALTER TABLE RESET of an invalid option"Bruce Momjian2014-08-25
| | | | | | Reverts commits 73d78e11a0f7183c80b93eefbbb6026fe9664015 and b0488e5c4fbfdce8acc989bdc17d9f0ec09ac281. Also reverts pg_upgrade changes.
* Throw error for ALTER TABLE RESET of an invalid optionBruce Momjian2014-08-25
| | | | | | | Also adjust pg_upgrade to not use this method for optional TOAST table creation. Patch by Fabrízio de Royes Mello
* Prevent memory leaks in parseRelOptions().Tom Lane2014-08-13
| | | | | | | | | | | | | | | | parseRelOptions() tended to leak memory in the caller's context. Most of the time this doesn't really matter since the caller's context is at most query-lifespan, and the function won't be invoked very many times. However, when testing with CLOBBER_CACHE_RECURSIVELY, the same relcache entry can get rebuilt a *lot* of times in one query, leading to significant intraquery memory bloat if it has any reloptions. Noted while investigating a related report from Tomas Vondra. In passing, get rid of some Asserts that are redundant with the one done by deconstruct_array(). As with other patches to avoid leaks in CLOBBER_CACHE testing, it doesn't really seem worth back-patching this.
* Move view reloptions into their own varlena structAlvaro Herrera2014-07-14
| | | | | | | Per discussion after a gripe from me in http://www.postgresql.org/message-id/20140611194633.GH18688@eldon.alvh.no-ip.org Jaime Casanova
* Avoid copying index tuples when building an index.Robert Haas2014-07-01
| | | | | | | | | | | | The previous code, perhaps out of concern for avoid memory leaks, formed the tuple in one memory context and then copied it to another memory context. However, this doesn't appear to be necessary, since index_form_tuple and the functions it calls take precautions against leaking memory. In my testing, building the tuple directly inside the sort context shaves several percent off the index build time. Rearrange things so we do that. Patch by me. Review by Amit Kapila, Tom Lane, Andres Freund.
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Fix failure to detoast fields in composite elements of structured types.Tom Lane2014-05-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have an array of records stored on disk, the individual record fields cannot contain out-of-line TOAST pointers: the tuptoaster.c mechanisms are only prepared to deal with TOAST pointers appearing in top-level fields of a stored row. The same applies for ranges over composite types, nested composites, etc. However, the existing code only took care of expanding sub-field TOAST pointers for the case of nested composites, not for other structured types containing composites. For example, given a command such as UPDATE tab SET arraycol = ARRAY[(ROW(x,42)::mycompositetype] ... where x is a direct reference to a field of an on-disk tuple, if that field is long enough to be toasted out-of-line then the TOAST pointer would be inserted as-is into the array column. If the source record for x is later deleted, the array field value would become a dangling pointer, leading to errors along the line of "missing chunk number 0 for toast value ..." when the value is referenced. A reproducible test case for this was provided by Jan Pecek, but it seems likely that some of the "missing chunk number" reports we've heard in the past were caused by similar issues. Code-wise, the problem is that PG_DETOAST_DATUM() is not adequate to produce a self-contained Datum value if the Datum is of composite type. Seen in this light, the problem is not just confined to arrays and ranges, but could also affect some other places where detoasting is done in that way, for example form_index_tuple(). I tried teaching the array code to apply toast_flatten_tuple_attribute() along with PG_DETOAST_DATUM() when the array element type is composite, but this was messy and imposed extra cache lookup costs whether or not any TOAST pointers were present, indeed sometimes when the array element type isn't even composite (since sometimes it takes a typcache lookup to find that out). The idea of extending that approach to all the places that currently use PG_DETOAST_DATUM() wasn't attractive at all. This patch instead solves the problem by decreeing that composite Datum values must not contain any out-of-line TOAST pointers in the first place; that is, we expand out-of-line fields at the point of constructing a composite Datum, not at the point where we're about to insert it into a larger tuple. This rule is applied only to true composite Datums, not to tuples that are being passed around the system as tuples, so it's not as invasive as it might sound at first. With this approach, the amount of code that has to be touched for a full solution is greatly reduced, and added cache lookup costs are avoided except when there actually is a TOAST pointer that needs to be inlined. The main drawback of this approach is that we might sometimes dereference a TOAST pointer that will never actually be used by the query, imposing a rather large cost that wasn't there before. On the other side of the coin, if the field value is used multiple times then we'll come out ahead by avoiding repeat detoastings. Experimentation suggests that common SQL coding patterns are unaffected either way, though. Applications that are very negatively affected could be advised to modify their code to not fetch columns they won't be using. In future, we might consider reverting this solution in favor of detoasting only at the point where data is about to be stored to disk, using some method that can drill down into multiple levels of nested structured types. That will require defining new APIs for structured types, though, so it doesn't seem feasible as a back-patchable fix. Note that this patch changes HeapTupleGetDatum() from a macro to a function call; this means that any third-party code using that macro will not get protection against creating TOAST-pointer-containing Datums until it's recompiled. The same applies to any uses of PG_RETURN_HEAPTUPLEHEADER(). It seems likely that this is not a big problem in practice: most of the tuple-returning functions in core and contrib produce outputs that could not possibly be toasted anyway, and the same probably holds for third-party extensions. This bug has existed since TOAST was invented, so back-patch to all supported branches.
* Address ccvalid/ccnoinherit in TupleDesc support functions.Noah Misch2014-03-23
| | | | | | | | | equalTupleDescs() neglected both of these ConstrCheck fields, and CreateTupleDescCopyConstr() neglected ccnoinherit. At this time, the only known behavior defect resulting from these omissions is constraint exclusion disregarding a CHECK constraint validated by an ALTER TABLE VALIDATE CONSTRAINT statement issued earlier in the same transaction. Back-patch to 9.2, where these fields were introduced.
* Separate multixact freezing parameters from xid'sAlvaro Herrera2014-02-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we were piggybacking on transaction ID parameters to freeze multixacts; but since there isn't necessarily any relationship between rates of Xid and multixact consumption, this turns out not to be a good idea. Therefore, we now have multixact-specific freezing parameters: vacuum_multixact_freeze_min_age: when to remove multis as we come across them in vacuum (default to 5 million, i.e. early in comparison to Xid's default of 50 million) vacuum_multixact_freeze_table_age: when to force whole-table scans instead of scanning only the pages marked as not all visible in visibility map (default to 150 million, same as for Xids). Whichever of both which reaches the 150 million mark earlier will cause a whole-table scan. autovacuum_multixact_freeze_max_age: when for cause emergency, uninterruptible whole-table scans (default to 400 million, double as that for Xids). This means there shouldn't be more frequent emergency vacuuming than previously, unless multixacts are being used very rapidly. Backpatch to 9.3 where multixacts were made to persist enough to require freezing. To avoid an ABI break in 9.3, VacuumStmt has a couple of fields in an unnatural place, and StdRdOptions is split in two so that the newly added fields can go at the end. Patch by me, reviewed by Robert Haas, with additional input from Andres Freund and Tom Lane.
* Revert C comment change in slot_attisnull()Bruce Momjian2014-01-28
| | | | Revert 89774b58b0ea2874765cae10c094bb6aaf707feb
* Adjust C comment in slot_attisnull() regarding nulls.Bruce Momjian2014-01-25
|
* Allow use of "z" flag in our printf calls, and use it where appropriate.Tom Lane2014-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | Since C99, it's been standard for printf and friends to accept a "z" size modifier, meaning "whatever size size_t has". Up to now we've generally dealt with printing size_t values by explicitly casting them to unsigned long and using the "l" modifier; but this is really the wrong thing on platforms where pointers are wider than longs (such as Win64). So let's start using "z" instead. To ensure we can do that on all platforms, teach src/port/snprintf.c to understand "z", and add a configure test to force use of that implementation when the platform's version doesn't handle "z". Having done that, modify a bunch of places that were using the unsigned-long hack to use "z" instead. This patch doesn't pretend to have gotten everyplace that could benefit, but it catches many of them. I made an effort in particular to ensure that all uses of the same error message text were updated together, so as not to increase the number of translatable strings. It's possible that this change will result in format-string warnings from pre-C99 compilers. We might have to reconsider if there are any popular compilers that will warn about this; but let's start by seeing what the buildfarm thinks. Andres Freund, with a little additional work by me
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Revise documentation for new freezing method.Robert Haas2013-12-23
| | | | | | | | Commit 37484ad2aacef5ec794f4dd3d5cf814475180a78 invalidated a good chunk of documentation, so patch it up to reflect the new state of play. Along the way, patch remaining documentation references to FrozenXID to say instead FrozenTransactionId, so that they match the way we actually spell it in the code.
* Change the way we mark tuples as frozen.Robert Haas2013-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | Instead of changing the tuple xmin to FrozenTransactionId, the combination of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous proposals to freeze tuples opportunistically before vacuum_freeze_min_age is reached have foundered on the objection that replacing xmin by FrozenTransactionId might hinder debugging efforts when things in this area go awry; this patch is intended to solve that problem by keeping the XID around (but largely ignoring the value to which it is set). Third-party code that checks for HEAP_XMIN_INVALID on tuples where HEAP_XMIN_COMMITTED might be set will be broken by this change. To fix, use the new accessor macros in htup_details.h rather than consulting the bits directly. HeapTupleHeaderGetXmin has been modified to return FrozenTransactionId when the infomask bits indicate that the tuple is frozen; use HeapTupleHeaderGetRawXmin when you already know that the tuple isn't marked commited or frozen, or want the raw value anyway. We currently do this in routines that display the xmin for user consumption, in tqual.c where it's known to be safe and important for the avoidance of extra cycles, and in the function-caching code for various procedural languages, which shouldn't invalidate the cache just because the tuple gets frozen. Robert Haas and Andres Freund
* Add a new reloption, user_catalog_table.Robert Haas2013-12-10
| | | | | | | | | | | | | | | When this reloption is set and wal_level=logical is configured, we'll record the CIDs stamped by inserts, updates, and deletes to the table just as we would for an actual catalog table. This will allow logical decoding to use historical MVCC snapshots to access such tables just as they access ordinary catalog tables. Replication solutions built around the logical decoding machinery will likely need to set this operation for their configuration tables; it might also be needed by extensions which perform table access in their output functions. Andres Freund, reviewed by myself and others.
* Support multi-argument UNNEST(), and TABLE() syntax for multiple functions.Tom Lane2013-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ability to write TABLE( function1(), function2(), ...) as a single FROM-clause entry. The result is the concatenation of the first row from each function, followed by the second row from each function, etc; with NULLs inserted if any function produces fewer rows than others. This is believed to be a much more useful behavior than what Postgres currently does with multiple SRFs in a SELECT list. This syntax also provides a reasonable way to combine use of column definition lists with WITH ORDINALITY: put the column definition list inside TABLE(), where it's clear that it doesn't control the ordinality column as well. Also implement SQL-compliant multiple-argument UNNEST(), by turning UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)). The SQL standard specifies TABLE() with only a single function, not multiple functions, and it seems to require an implicit UNNEST() which is not what this patch does. There may be something wrong with that reading of the spec, though, because if it's right then the spec's TABLE() is just a pointless alternative spelling of UNNEST(). After further review of that, we might choose to adopt a different syntax for what this patch does, but in any case this functionality seems clearly worthwhile. Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and significantly revised by me
* Prevent memory leaks from accumulating across printtup() calls.Tom Lane2013-11-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically, printtup() has assumed that it could prevent memory leakage by pfree'ing the string result of each output function and manually managing detoasting of toasted values. This amounts to assuming that datatype output functions never leak any memory internally; an assumption we've already decided to be bogus elsewhere, for example in COPY OUT. range_out in particular is known to leak multiple kilobytes per call, as noted in bug #8573 from Godfried Vanluffelen. While we could go in and fix that leak, it wouldn't be very notationally convenient, and in any case there have been and undoubtedly will again be other leaks in other output functions. So what seems like the best solution is to run the output functions in a temporary memory context that can be reset after each row, as we're doing in COPY OUT. Some quick experimentation suggests this is actually a tad faster than the retail pfree's anyway. This patch fixes all the variants of printtup, except for debugtup() which is used in standalone mode. It doesn't seem worth worrying about query-lifespan leaks in standalone mode, and fixing that case would be a bit tedious since debugtup() doesn't currently have any startup or shutdown functions. While at it, remove manual detoast management from several other output-function call sites that had copied it from printtup(). This doesn't make a lot of difference right now, but in view of recent discussions about supporting "non-flattened" Datums, we're going to want that code gone eventually anyway. Back-patch to 9.2 where range_out was introduced. We might eventually decide to back-patch this further, but in the absence of known major leaks in older output functions, I'll refrain for now.
* Prevent using strncpy with src == dest in TupleDescInitEntry.Tom Lane2013-10-28
| | | | | | | | | | | | | | | | | The C and POSIX standards state that strncpy's behavior is undefined when source and destination areas overlap. While it remains dubious whether any implementations really misbehave when the pointers are exactly equal, some platforms are now starting to force the issue by complaining when an undefined call occurs. (In particular OS X 10.9 has been seen to dump core here, though the exact set of circumstances needed to trigger that remain elusive. Similar behavior can be expected to be optional on Linux and other platforms in the near future.) So tweak the code to explicitly do nothing when nothing need be done. Back-patch to all active branches. In HEAD, this also lets us get rid of an exception in valgrind.supp. Per discussion of a report from Matthias Schmitt.
* Add SQL Standard WITH ORDINALITY support for UNNEST (and any other SRF)Greg Stark2013-07-29
| | | | | Author: Andrew Gierth, David Fetter Reviewers: Dean Rasheed, Jeevan Chalke, Stephen Frost
* WITH CHECK OPTION support for auto-updatable VIEWsStephen Frost2013-07-18
| | | | | | | | | | | | | For simple views which are automatically updatable, this patch allows the user to specify what level of checking should be done on records being inserted or updated. For 'LOCAL CHECK', new tuples are validated against the conditionals of the view they are being inserted into, while for 'CASCADED CHECK' the new tuples are validated against the conditionals for all views involved (from the top down). This option is part of the SQL specification. Dean Rasheed, reviewed by Pavel Stehule
* Cooperate with the Valgrind instrumentation framework.Noah Misch2013-06-26
| | | | | | | | | | | | | | | | Valgrind "client requests" in aset.c and mcxt.c teach Valgrind and its Memcheck tool about the PostgreSQL allocator. This makes Valgrind roughly as sensitive to memory errors involving palloc chunks as it is to memory errors involving malloc chunks. Further client requests in PageAddItem() and printtup() verify that all bits being added to a buffer page or furnished to an output function are predictably-defined. Those tests catch failures of C-language functions to fully initialize the bits of a Datum, which in turn stymie optimizations that rely on _equalConst(). Define the USE_VALGRIND symbol in pg_config_manual.h to enable these additions. An included "suppression file" silences nominal errors we don't plan to fix. Reviewed in earlier versions by Peter Geoghegan and Korry Douglas.
* Add a materialized view relations.Kevin Grittner2013-03-03
| | | | | | | | | | | | | | | | | | | | | | A materialized view has a rule just like a view and a heap and other physical properties like a table. The rule is only used to populate the table, references in queries refer to the materialized data. This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements. It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized in place of references to underlying tables, but that requires the other above-mentioned features to be working first. Much of the documentation work by Robert Haas. Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja Security review by KaiGai Kohei, with a decision on how best to implement sepgsql still pending.
* Improve concurrency of foreign key lockingAlvaro Herrera2013-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces two additional lock modes for tuples: "SELECT FOR KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each other, in contrast with already existing "SELECT FOR SHARE" and "SELECT FOR UPDATE". UPDATE commands that do not modify the values stored in the columns that are part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety. Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this means the concurrency improvement applies to them, which is the whole point of this patch. The added tuple lock semantics require some rejiggering of the multixact module, so that the locking level that each transaction is holding can be stored alongside its Xid. Also, multixacts now need to persist across server restarts and crashes, because they can now represent not only tuple locks, but also tuple updates. This means we need more careful tracking of lifetime of pg_multixact SLRU files; since they now persist longer, we require more infrastructure to figure out when they can be removed. pg_upgrade also needs to be careful to copy pg_multixact files over from the old server to the new, or at least part of multixact.c state, depending on the versions of the old and new servers. Tuple time qualification rules (HeapTupleSatisfies routines) need to be careful not to consider tuples with the "is multi" infomask bit set as being only locked; they might need to look up MultiXact values (i.e. possibly do pg_multixact I/O) to find out the Xid that updated a tuple, whereas they previously were assured to only use information readily available from the tuple header. This is considered acceptable, because the extra I/O would involve cases that would previously cause some commands to block waiting for concurrent transactions to finish. Another important change is the fact that locking tuples that have previously been updated causes the future versions to be marked as locked, too; this is essential for correctness of foreign key checks. This causes additional WAL-logging, also (there was previously a single WAL record for a locked tuple; now there are as many as updated copies of the tuple there exist.) With all this in place, contention related to tuples being checked by foreign key rules should be much reduced. As a bonus, the old behavior that a subtransaction grabbing a stronger tuple lock than the parent (sub)transaction held on a given tuple and later aborting caused the weaker lock to be lost, has been fixed. Many new spec files were added for isolation tester framework, to ensure overall behavior is sane. There's probably room for several more tests. There were several reviewers of this patch; in particular, Noah Misch and Andres Freund spent considerable time in it. Original idea for the patch came from Simon Riggs, after a problem report by Joel Jacobson. Most code is from me, with contributions from Marti Raudsepp, Alexander Shulgin, Noah Misch and Andres Freund. This patch was discussed in several pgsql-hackers threads; the most important start at the following message-ids: AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com 1290721684-sup-3951@alvh.no-ip.org 1294953201-sup-2099@alvh.no-ip.org 1320343602-sup-2290@alvh.no-ip.org 1339690386-sup-8927@alvh.no-ip.org 4FE5FF020200002500048A3D@gw.wicourts.gov 4FEAB90A0200002500048B7D@gw.wicourts.gov
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera2012-08-30
| | | | | | | | | | | | This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.