aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Enable logical slots to follow timeline switchesAlvaro Herrera2016-03-30
| | | | | | | | | | | | | | | | | | When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines; otherwise, decoding fails after failover, because the archives are in the historical timeline. This is required to make "failover logical slots" possible; it currently has no other use, although theoretically it could be used by an extension that creates a slot on a standby and continues to replay from the slot when the standby is promoted. This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected. Author: Craig Ringer Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek
* XLogReader general code cleanupAlvaro Herrera2016-03-30
| | | | | | | | | Some minor tweaks and comment additions, for cleanliness sake and to avoid having the upcoming timeline-following patch be polluted with unrelated cleanup. Extracted from a larger patch by Craig Ringer, reviewed by Andres Freund, with some additions by myself.
* Introduce traversalValue for SP-GiST scanTeodor Sigaev2016-03-30
| | | | | | | | | | | | | During scan sometimes it would be very helpful to know some information about parent node or all ancestor nodes. Right now reconstructedValue could be used but it's not a right usage of it (range opclass uses that). traversalValue is arbitrary piece of memory in separate MemoryContext while reconstructedVale should have the same type as indexed column. Subsequent patches for range opclass and quad4d tree will use it. Author: Alexander Lebedev, Teodor Sigaev
* Add new replication mode synchronous_commit = 'remote_apply'.Robert Haas2016-03-29
| | | | | | | | | | | | | | | | | | | In this mode, the master waits for the transaction to be applied on the remote side, not just written to disk. That means that you can count on a transaction started on the standby to see all commits previously acknowledged by the master. To make this work, the standby sends a reply after replaying each commit record generated with synchronous_commit >= 'remote_apply'. This introduces a small inefficiency: the extra replies will be sent even by standbys that aren't the current synchronous standby. But previously-existing synchronous_commit levels make no attempt at all to optimize which replies are sent based on what the primary cares about, so this is no worse, and at least avoids any extra replies for people not using the feature at all. Thomas Munro, reviewed by Michael Paquier and by me. Some additional tweaks by me.
* Don't use !! but != 0/NULL to force boolean evaluation.Andres Freund2016-03-27
| | | | | | | I introduced several uses of !! to force bit arithmetic to be boolean, but per discussion the project prefers != 0/NULL. Discussion: CA+TgmoZP5KakLGP6B4vUjgMBUW0woq_dJYi0paOz-My0Hwt_vQ@mail.gmail.com
* Support CREATE ACCESS METHODAlvaro Herrera2016-03-23
| | | | | | | | | | | | | | | This enables external code to create access methods. This is useful so that extensions can add their own access methods which can be formally tracked for dependencies, so that DROP operates correctly. Also, having explicit support makes pg_dump work correctly. Currently only index AMs are supported, but we expect different types to be added in the future. Authors: Alexander Korotkov, Petr Jelínek Reviewed-By: Teodor Sigaev, Petr Jelínek, Jim Nasby Commitfest-URL: https://commitfest.postgresql.org/9/353/ Discussion: https://www.postgresql.org/message-id/CAPpHfdsXwZmojm6Dx+TJnpYk27kT4o7Ri6X_4OSWcByu1Rm+VA@mail.gmail.com
* Merge wal_level "archive" and "hot_standby" into new name "replica"Peter Eisentraut2016-03-18
| | | | | | | | | | | | | | | | | The distinction between "archive" and "hot_standby" existed only because at the time "hot_standby" was added, there was some uncertainty about stability. This is now a long time ago. We would like to move forward with simplifying the replication configuration, but this distinction is in the way, because a primary server cannot tell (without asking a standby or predicting the future) which one of these would be the appropriate level. Pick a new name for the combined setting to make it clearer that it covers all (non-logical) backup and replication uses. The old values are still accepted but are converted internally. Reviewed-by: Michael Paquier <michael.paquier@gmail.com> Reviewed-by: David Steele <david@pgmasters.net>
* Fix typos.Robert Haas2016-03-15
| | | | Oskari Saarenmaa
* Fix typos in commentsAlvaro Herrera2016-03-15
|
* Fix memory leak in repeated GIN index searches.Tom Lane2016-03-13
| | | | | | | | | | | | | | | | | | | | | | | | Commit d88976cfa1302e8d removed this code from ginFreeScanKeys(): - if (entry->list) - pfree(entry->list); evidently in the belief that that ItemPointer array is allocated in the keyCtx and so would be reclaimed by the following MemoryContextReset. Unfortunately, it isn't and it won't. It'd likely be a good idea for that to become so, but as a simple and back-patchable fix in the meantime, restore this code to ginFreeScanKeys(). Also, add a similar pfree to where startScanEntry() is about to zero out entry->list. I am not sure if there are any code paths where this change prevents a leak today, but it seems like cheap future-proofing. In passing, make the initial allocation of so->entries[] use palloc not palloc0. The code doesn't depend on unused entries being zero; if it did, the array-enlargement code in ginFillScanEntry() would be wrong. So using palloc0 initially can only serve to confuse readers about what the invariant is. Per report from Felipe de Jesús Molina Bravo, via Jaime Casanova in <CAJGNTeMR1ndMU2Thpr8GPDUfiHTV7idELJRFusA5UXUGY1y-eA@mail.gmail.com>
* Fix a typo, and remove unnecessary pgstat_report_wait_end().Robert Haas2016-03-11
| | | | Per Amit Kapila.
* Provide much better wait information in pg_stat_activity.Robert Haas2016-03-10
| | | | | | | | | | | | When a process is waiting for a heavyweight lock, we will now indicate the type of heavyweight lock for which it is waiting. Also, you can now see when a process is waiting for a lightweight lock - in which case we will indicate the individual lock name or the tranche, as appropriate - or for a buffer pin. Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful discussion and suggestions by many others, including Alexander Korotkov, Vladimir Borodin, and many others.
* Reduce size of two phase file headerSimon Riggs2016-03-10
| | | | | | | Previously 2PC header was fixed at 200 bytes, which in most cases wasted WAL space for a workload using 2PC heavily. Pavan Deolasee, reviewed by Petr Jelinek
* Reduce lock level for altering fillfactorSimon Riggs2016-03-10
| | | | Fabrízio de Royes Mello and Simon Riggs
* Avoid unlikely data-loss scenarios due to rename() without fsync.Andres Freund2016-03-09
| | | | | | | | | | | | | | | | | | | | | Renaming a file using rename(2) is not guaranteed to be durable in face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files. Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it. Reported-By: Tomas Vondra Author: Michael Paquier, Tomas Vondra, Andres Freund Discussion: 56583BDD.9060302@2ndquadrant.com Backpatch: All supported branches
* Fix incorrect handling of NULL index entries in indexed ROW() comparisons.Tom Lane2016-03-09
| | | | | | | | | | | | | | | | | | | | | | An index search using a row comparison such as ROW(a, b) > ROW('x', 'y') would stop upon reaching a NULL entry in the "b" column, ignoring the fact that there might be non-NULL "b" values associated with later values of "a". This happens because _bt_mark_scankey_required() marks the subsidiary scankey for "b" as required, which is just wrong: it's for a column after the one with the first inequality key (namely "a"), and thus can't be considered a required match. This bit of brain fade dates back to the very beginnings of our support for indexed ROW() comparisons, in 2006. Kind of astonishing that no one came across it before Glen Takahashi, in bug #14010. Back-patch to all supported versions. Note: the given test case doesn't actually fail in unpatched 9.1, evidently because the fix for bug #6278 (i.e., stopping at nulls in either scan direction) is required to make it fail. I'm sure I could devise a case that fails in 9.1 as well, perhaps with something involving making a cursor back up; but it doesn't seem worth the trouble.
* Add a generic command progress reporting facility.Robert Haas2016-03-09
| | | | | | | | | | | | | | | | | Using this facility, any utility command can report the target relation upon which it is operating, if there is one, and up to 10 64-bit counters; the intent of this is that users should be able to figure out what a utility command is doing without having to resort to ugly hacks like attaching strace to a backend. As a demonstration, this adds very crude reporting to lazy vacuum; we just report the target relation and nothing else. A forthcoming patch will make VACUUM report a bunch of additional data that will make this much more interesting. But this gets the basic framework in place. Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao, and Masanori Oyama.
* Add new flags argument for xl_heap_visible to heap2_desc.Robert Haas2016-03-08
| | | | Masahiko Sawada
* Department of second thoughts: remove PD_ALL_FROZEN.Robert Haas2016-03-08
| | | | | | | | | | | Commit a892234f830e832110f63fc0a2afce2fb21d1584 added a second bit per page to the visibility map, which still seems like a good idea, but it also added a second page-level bit alongside PD_ALL_VISIBLE to track whether the visibility map bit was set. That no longer seems like a clever plan, because we don't really need that bit for anything. We always clear both bits when the page is modified anyway. Patch by me, reviewed by Kyotaro Horiguchi and Masahiko Sawada.
* Ignore recovery_min_apply_delay until recovery has reached consistent stateFujii Masao2016-03-06
| | | | | | | | | | | | | | | | | | | Previously recovery_min_apply_delay was applied even before recovery had reached consistency. This could cause us to wait a long time unexpectedly for read-only connections to be allowed. It's problematic because the standby was useless during that wait time. This patch changes recovery_min_apply_delay so that it's applied once the database has reached the consistent state. That is, even if the delay is set, the standby tries to replay WAL records as fast as possible until it has reached consistency. Author: Michael Paquier Reviewed-By: Julien Rouhaud Reported-By: Greg Clough Backpatch: 9.4, where recovery_min_apply_delay was added Bug: #13770 Discussion: http://www.postgresql.org/message-id/20151111155006.2644.84564@wrigleys.postgresql.org
* Minor improvements to transaction manager README.Robert Haas2016-03-04
| | | | | | | | A simple SELECT is handled by PortalRunSelect, not ProcessQuery. Also, the previous indentation was unclear: change it so that a deeper level of indentation indicates that the outer function calls the inner one. Stas Kelvich
* Minor optimizations based on ParallelContext having nworkers_launched.Robert Haas2016-03-04
| | | | | | | | | | | Originally, we didn't have nworkers_launched, so code that used parallel contexts had to be preprared for the possibility that not all of the workers requested actually got launched. But now we can count on knowing the number of workers that were successfully launched, which can shave off a few cycles and simplify some code slightly. Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter Geoghegan.
* Revert buggy optimization of index scansSimon Riggs2016-03-03
| | | | | | | 606c0123d627 attempted to reduce cost of index scans using > and < strategies, though got that completely wrong in a few complex cases. Revert whole patch until we find a safe optimization.
* Change the format of the VM fork to add a second bit per page.Robert Haas2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | The new bit indicates whether every tuple on the page is already frozen. It is cleared only when the all-visible bit is cleared, and it can be set only when we vacuum a page and find that every tuple on that page is both visible to every transaction and in no need of any future vacuuming. A future commit will use this new bit to optimize away full-table scans that would otherwise be triggered by XID wraparound considerations. A page which is merely all-visible must still be scanned in that case, but a page which is all-frozen need not be. This commit does not attempt that optimization, although that optimization is the goal here. It seems better to get the basic infrastructure in place first. Per discussion, it's very desirable for pg_upgrade to automatically migrate existing VM forks from the old format to the new format. That, too, will be handled in a follow-on patch. Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit Kapila, Simon Riggs, Andres Freund, and others, and substantially revised by me.
* Correct StartupSUBTRANS for page wraparoundSimon Riggs2016-02-19
| | | | | | | | | | StartupSUBTRANS() incorrectly handled cases near the max pageid in the subtrans data structure, which in some cases could lead to errors in startup for Hot Standby. This patch wraps the pageids correctly, avoiding any such errors. Identified by exhaustive crash testing by Jeff Janes. Jeff Janes
* Allow the WAL writer to flush WAL at a reduced rate.Andres Freund2016-02-16
| | | | | | | | | | | | | | | | | | | | | | | | Commit 4de82f7d7 increased the WAL flush rate, mainly to increase the likelihood that hint bits can be set quickly. More quickly set hint bits can reduce contention around the clog et al. But unfortunately the increased flush rate can have a significant negative performance impact, I have measured up to a factor of ~4. The reason for this slowdown is that if there are independent writes to the underlying devices, for example because shared buffers is a lot smaller than the hot data set, or because a checkpoint is ongoing, the fdatasync() calls force cache flushes to be emitted to the storage. This is achieved by flushing WAL only if the last flush was longer than wal_writer_delay ago, or if more than wal_writer_flush_after (new GUC) unflushed blocks are pending. Based on some tests the default for wal_writer_delay is 1MB, which seems to work well both on SSD and rotational media. To avoid negative performance impact due to 4de82f7d7 an earlier commit (db76b1e) made SetHintBits() more likely to succeed; preventing performance regressions in the pgbench tests I performed. Discussion: 20160118163908.GW10941@awork2.anarazel.de
* Change delimiter used for display of NextXIDJoe Conway2016-02-12
| | | | | | | | | | | | | NextXID has been rendered in the form of a pg_lsn even though it really is not. This can cause confusion, so change the format from %u/%u to %u:%u, per discussion on hackers. Complaint by me, patch by me and Bruce, reviewed by Michael Paquier and Alvaro. Applied to HEAD only. Author: Joe Conway, Bruce Momjian Reviewed-by: Michael Paquier, Alvaro Herrera Backpatch-through: master
* Make builtin lwlock tranche names consistent.Robert Haas2016-02-12
| | | | | | Previously, we had a mix of styles. Amit Kapila
* Shift the responsibility for emitting "database system is shut down".Tom Lane2016-02-11
| | | | | | | | | | | | | | | | | | Historically this message has been emitted at the end of ShutdownXLOG(). That's not an insane place for it in a standalone backend, but in the postmaster environment we've grown a fair amount of stuff that happens later, including archiver/walsender shutdown, stats collector shutdown, etc. Recent buildfarm experimentation showed that on slower machines there could be many seconds' delay between finishing ShutdownXLOG() and actual postmaster exit. That's fairly confusing, both for testing purposes and for DBAs. Hence, move the code that prints this message into UnlinkLockFiles(), so that it comes out just after we remove the postmaster's pidfile. That is a more appropriate definition of "is shut down" from the point of view of "pg_ctl stop", for example. In general, removing the pidfile should be the last externally-visible action of either a postmaster or a standalone backend; compare commit d73d14c271653dff10c349738df79ea03b85236c for instance. So this seems like a reasonably future-proof approach.
* Revert "Temporarily make pg_ctl and server shutdown a whole lot chattier."Tom Lane2016-02-10
| | | | | | This reverts commit 3971f64843b02e4a55d854156bd53e46a0588e45 and a couple of followon debugging commits; I think we've learned what we can from them.
* Add more chattiness in server shutdown.Tom Lane2016-02-09
| | | | | | | Early returns from the buildfarm show that there's a bit of a gap in the logging I added in 3971f64843b02e4a: the portion of CreateCheckPoint() after CheckPointGuts() can take a fair amount of time. Add a few more log messages in that section of code. This too shall be reverted later.
* Temporarily make pg_ctl and server shutdown a whole lot chattier.Tom Lane2016-02-08
| | | | | | | | | | | | | This is a quick hack, due to be reverted when its purpose has been served, to try to gather information about why some of the buildfarm critters regularly fail with "postmaster does not shut down" complaints. Maybe they are just really overloaded, but maybe something else is going on. Hence, instrument pg_ctl to print the current time when it starts waiting for postmaster shutdown and when it gives up, and add a lot of logging of the current time in the server's checkpoint and shutdown code paths. No attempt has been made to make this pretty. I'm not even totally sure if it will build on Windows, but we'll soon find out.
* Introduce a new GUC force_parallel_mode for testing purposes.Robert Haas2016-02-07
| | | | | | | | | | | | | | | When force_parallel_mode = true, we enable the parallel mode restrictions for all queries for which this is believed to be safe. For the subset of those queries believed to be safe to run entirely within a worker, we spin up a worker and run the query there instead of running it in the original process. When force_parallel_mode = regress, make additional changes to allow the regression tests to run cleanly even though parallel workers have been injected under the hood. Taken together, this facilitates both better user testing and better regression testing of the parallelism code. Robert Haas, with help from Amit Kapila and Rushabh Lathia.
* Introduce group locking to prevent parallel processes from deadlocking.Robert Haas2016-02-07
| | | | | | | | | | | | | For locking purposes, we now regard heavyweight locks as mutually non-conflicting between cooperating parallel processes. There are some possible pitfalls to this approach that are not to be taken lightly, but it works OK for now and can be changed later if we find a better approach. Without this, it's very easy for parallel queries to silently self-deadlock if the user backend holds strong relation locks. Robert Haas, with help from Amit Kapila. Thanks to Noah Misch and Andres Freund for extensive discussion of possible issues with this approach.
* Fix lossy KNN GiST when ordering operator returns non-float8 value.Teodor Sigaev2016-02-02
| | | | | | | | | | | | | | | | | KNN GiST with recheck flag should return to executor the same type as ordering operator, GiST detects this type by looking to return type of function which implements ordering operator. But occasionally detecting code works after replacing ordering operator function to distance support function. Distance support function always returns float8, so, detecting code get float8 instead of actual return type of ordering operator. Built-in opclasses don't have ordering operator which doesn't return non-float8 value, so, tests are impossible here, at least now. Backpatch to 9.5 where lozzy KNN was introduced. Author: Alexander Korotkov Report by: Artur Zakirov
* Make all built-in lwlock tranche IDs fixed.Robert Haas2016-02-02
| | | | | | | This makes the values more stable, which seems like a good thing for anybody who needs to look at at them. Alexander Korotkov and Amit Kapila
* Fix misspelled function name in comment.Heikki Linnakangas2016-02-01
|
* Fix whitespacePeter Eisentraut2016-01-30
|
* Add gin_clean_pending_list function to clean up GIN pending listFujii Masao2016-01-28
| | | | | | | | | | | | | | | | This function cleans up the pending list of the GIN index by moving entries in it to the main GIN data structure in bulk. It returns the number of pages cleaned up from the pending list. This function is useful, for example, when the pending list needs to be cleaned up *quickly* to improve the performance of the search using GIN index. VACUUM can do the same thing, too, but it may take days to run on a large table. Jeff Janes, reviewed by Julien Rouhaud, Jaime Casanova, Alvaro Herrera and me. Discussion: CAMkU=1x8zFkpfnozXyt40zmR3Ub_kHu58LtRmwHUKRgQss7=iQ@mail.gmail.com
* Improve ResourceOwners' behavior for large numbers of owned objects.Tom Lane2016-01-26
| | | | | | | | | | | | | | | | | | The original coding was quite fast so long as objects were always released in reverse order of addition; otherwise, it degenerated into O(N^2) behavior due to searching for the array element to delete. Improve matters by switching to hashed storage when the number of objects of a given type exceeds 64. (The cutover point is open to discussion, of course, but some simple performance testing suggests that hashing has enough overhead to be a loser below there.) Also, refactor resowner.c so that we don't need N copies of the array management code. Since all the resource IDs the code currently needs to deal with are either pointers or integers, it seems sufficient to create a one-size-fits-all infrastructure in which everything is converted to a Datum for storage. Aleksander Alekseev, reviewed by Stas Kelvich, further fixes by me
* Suppress compiler warning.Tom Lane2016-01-21
| | | | | | | | Given the limited range of i, these shifts should not cause any problem, but that apparently doesn't stop some compilers from whining about them. David Rowley
* Improve index AMs' opclass validation procedures.Tom Lane2016-01-21
| | | | | | | | | | | | | | | | | | | | The amvalidate functions added in commit 65c5fcd353a859da were on the crude side. Improve them in a few ways: * Perform signature checking for operators and support functions. * Apply more thorough checks for missing operators and functions, where possible. * Instead of reporting problems as ERRORs, report most problems as INFO messages and make the amvalidate function return FALSE. This allows more than one problem to be discovered per run. * Report object names rather than OIDs, and work a bit harder on making the messages understandable. Also, remove a few more opr_sanity regression test queries that are now superseded by the amvalidate checks.
* Remove unused argument from ginInsertCleanup()Fujii Masao2016-01-22
| | | | It's an oversight in commit dc943ad.
* Refactor headers to split out standby defsSimon Riggs2016-01-20
| | | | Jeff Janes
* Speedup 2PC by skipping two phase state files in normal pathSimon Riggs2016-01-20
| | | | | | | | | | | | | | | | | 2PC state info is written only to WAL at PREPARE, then read back from WAL at COMMIT PREPARED/ABORT PREPARED. Prepared transactions that live past one bufmgr checkpoint cycle will be written to disk in the same form as previously. Crash recovery path is not altered. Measured performance gains of 50-100% for short 2PC transactions by completely avoiding writing files and fsyncing. Other optimizations still available, further patches in related areas expected. Stas Kelvich and heavily edited by Simon Riggs Based upon earlier ideas and patches by Michael Paquier and Heikki Linnakangas, a concrete example of how Postgres-XC has fed back ideas into PostgreSQL. Reviewed by Michael Paquier, Jeff Janes and Andres Freund Performance testing by Jesper Pedersen
* Refactor to create generic WAL page read callbackSimon Riggs2016-01-20
| | | | | | | | | | Previously we didn’t have a generic WAL page read callback function, surprisingly. Logical decoding has logical_read_local_xlog_page(), which was actually generic, so move that to xlogfunc.c and rename to read_local_xlog_page(). Maintain logical_read_local_xlog_page() so existing callers still work. As requested by Michael Paquier, Alvaro Herrera and Andres Freund
* Fix assorted inconsistencies in GiST opclass support function declarations.Tom Lane2016-01-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conventions specified by the GiST SGML documentation were widely ignored. For example, the strategy-number argument for "consistent" and "distance" functions is specified to be a smallint, but most of the built-in support functions declared it as an integer, and for that matter the core code passed it using Int32GetDatum not Int16GetDatum. None of that makes any real difference at runtime, but it's quite confusing for newcomers to the code, and it makes it very hard to write an amvalidate() function that checks support function signatures. So let's try to instill some consistency here. Another similar issue is that the "query" argument is not of a single well-defined type, but could have different types depending on the strategy (corresponding to search operators with different righthand-side argument types). Some of the functions threw up their hands and declared the query argument as being of "internal" type, which surely isn't right ("any" would have been more appropriate); but the majority position seemed to be to declare it as being of the indexed data type, corresponding to a search operator with both input types the same. So I've specified a convention that that's what to do always. Also, the result of the "union" support function actually must be of the index's storage type, but the documentation suggested declaring it to return "internal", and some of the functions followed that. Standardize on telling the truth, instead. Similarly, standardize on declaring the "same" function's inputs as being of the storage type, not "internal". Also, somebody had forgotten to add the "recheck" argument to both the documentation of the "distance" support function and all of their SQL declarations, even though the C code was happily using that argument. Clean that up too. Fix up some other omissions in the docs too, such as documenting that union's second input argument is vestigial. So far as the errors in core function declarations go, we can just fix pg_proc.h and bump catversion. Adjusting the erroneous declarations in contrib modules is more debatable: in principle any change in those scripts should involve an extension version bump, which is a pain. However, since these changes are purely cosmetic and make no functional difference, I think we can get away without doing that.
* Restructure index access method API to hide most of it at the C level.Tom Lane2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
* Add new user fn pg_current_xlog_flush_location()Simon Riggs2016-01-12
| | | | | Tomas Vondra, reviewed by Michael Paquier and Amit Kapila Minor edits by me
* Maintain local LogwrtResult consistentlySimon Riggs2016-01-12
| | | | | Teach GetFlushRecPtr() to update LogwrtResult cache as performed by all other functions in xlog.c