postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
...
*	Silence compiler warnings	Alvaro Herrera	2016-09-28
\| \| \| \|	Reported by Peter Eisentraut. Coding suggested by Tom Lane.
*	Rationalize format-picture caching logic in formatting.c.	Tom Lane	2016-09-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a validity flag to DCHCacheEntry and NUMCacheEntry entries, and do not set it true until after we've parsed the supplied format string. This allows dealing with possible errors while parsing the format without the baroque hack that was there before (which only covered errors within NUMDesc_prepare, anyway). We can get rid of the PG_TRY in NUMDesc_prepare, as well as last_NUMCacheEntry and NUM_cache_remove. (Essentially, this reverts commit ff783fbae in favor of a less fragile solution; the problems with that approach are well illustrated by later hacking such as 55f927a46.) In passing, define the size of these caches as DCH_CACHE_ENTRIES not DCH_CACHE_FIELDS + 1 (whoever thought that was a good definition?) and likewise for the NUM cache. Also const-ify format string parameters where convenient, and merge duplicated cache lookup logic. This is primarily driven by a proposed patch from Artur Zakirov, which introduced some ereport's into format string parsing for the datetime case. He proposed preventing the creation of invalid cache entries by parsing the format string first into a local-variable array, and then copying that to a cache entry. That seemed a bit ugly to me, and anyway randomly different from the way the identical problem had been solved for the numeric case. Let's make the two sets of code more similar not less so. I'm not sure whether we'll adopt the new error conditions Artur proposes, but this patch seems like good code cleanup and future-proofing in any case. The existing code is critically (and undocumented-ly) dependent on no elog being thrown out of several nontrivial functions, which is trouble waiting to happen, though it doesn't seem to be actively broken today. Discussion: <b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru>
*	Make to_timestamp() and to_date() range-check fields of their input.	Tom Lane	2016-09-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Historically, something like to_date('2009-06-40','YYYY-MM-DD') would return '2009-07-10' because there was no prohibition on out-of-range month or day numbers. This has been widely panned, and it also turns out that Oracle throws an error in such cases. Since these functions are nominally Oracle-compatibility features, let's change that. There's no particular restriction on year (modulo the fact that the scanner may not believe that more than 4 digits are year digits, a matter to be addressed separately if at all). But we now check month, day, hour, minute, second, and fractional-second fields, as well as day-of-year and second-of-day fields if those are used. Currently, no checks are made on ISO-8601-style week numbers or day numbers; it's not very clear what the appropriate rules would be there, and they're probably so little used that it's not worth sweating over. Artur Zakirov, reviewed by Amul Sul, further adjustments by me Discussion: <1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com> See-Also: <57786490.9010201@wars-nicht.de>
*	Remove dead line of code	Peter Eisentraut	2016-09-28
\|
*	Fix CRC check handling in get_controlfile	Peter Eisentraut	2016-09-28
\| \| \| \| \| \| \| \|	The previous patch broke this by returning NULL for a failed CRC check, which pg_controldata would then try to read. Fix by returning the result of the CRC check in a separate argument. Michael Paquier and myself
*	Fix dangling pointer problem in ReorderBufferSerializeChange.	Robert Haas	2016-09-28
\| \| \| \| \| \| \| \| \|	Commit 3fe3511d05127cc024b221040db2eeb352e7d716 introduced a new case into this function, but neglected to ensure that the "ondisk" pointer got updated after a possible reallocation as the code does in other cases. Stas Kelvich, per diagnosis by Konstantin Knizhnik.
*	Turn password_encryption GUC into an enum.	Heikki Linnakangas	2016-09-28
\| \| \| \| \| \| \| \| \| \| \| \| \|	This makes the parameter easier to extend, to support other password-based authentication protocols than MD5. (SCRAM is being worked on.) The GUC still accepts on/off as aliases for "md5" and "plain", although we may want to remove those once we actually add support for another password hash type. Michael Paquier, reviewed by David Steele, with some further edits by me. Discussion: <CAB7nPqSMXU35g=W9X74HVeQp0uvgJxvYOuA4A-A3M+0wfEBv-w@mail.gmail.com>
*	Disallow pushing volatile quals past set-returning functions.	Tom Lane	2016-09-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pushing an upper-level restriction clause into an unflattened subquery-in-FROM is okay when the subquery contains no SRFs in its targetlist, or when it does but the SRFs are unreferenced by the clause and the clause is not volatile. Otherwise, we're changing the number of times the clause is evaluated, which is bad for volatile quals, and possibly changing the result, since a volatile qual might succeed for some SRF output rows and not others despite not referencing any of the changing columns. (Indeed, if the clause is something like "random() > 0.5", the user is probably expecting exactly that behavior.) We had most of these restrictions down, but not the one about the upper clause not being volatile. Fix that, and add a regression test to illustrate the expected behavior. Although this is definitely a bug, it doesn't seem like back-patch material, since possibly some users don't realize that the broken behavior is broken and are relying on what happens now. Also, while the added test is quite cheap in the wake of commit a4c35ea1c, it would be much more expensive (or else messier) in older branches. Per report from Tom van Tilburg. Discussion: <CAP3PPDiucxYCNev52=YPVkrQAPVF1C5PFWnrQPT7iMzO1fiKFQ@mail.gmail.com>
*	Include <sys/select.h> where needed	Alvaro Herrera	2016-09-27
\| \| \| \| \| \| \| \| \| \| \| \|	<sys/select.h> is required by POSIX.1-2001 to get the prototype of select(2), but nearly no systems enforce that because older standards let you get away with including some other headers. Recent OpenBSD hacking has removed that frail touch of friendliness, however, which broke some compiles; fix all the way back to 9.1 by adding the required standard. Only vacuumdb.c was reported to fail, but it seems easier to fix the whole lot in a fell swoop. Per bug #14334 by Sean Farrell.
*	Replace the built-in GIN array opclasses with a single polymorphic opclass.	Tom Lane	2016-09-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We had thirty different GIN array opclasses sharing the same operators and support functions. That still didn't cover all the built-in types, nor did it cover arrays of extension-added types. What we want is a single polymorphic opclass for "anyarray". There were two missing features needed to make this possible: 1. We have to be able to declare the index storage type as ANYELEMENT when the opclass is declared to index ANYARRAY. This just takes a few more lines in index_create(). Although this currently seems of use only for GIN, there's no reason to make index_create() restrict it to that. 2. We have to be able to identify the proper GIN compare function for the index storage type. This patch proceeds by making the compare function optional in GIN opclass definitions, and specifying that the default btree comparison function for the index storage type will be looked up when the opclass omits it. Again, that seems pretty generically useful. Since the comparison function lookup is done in initGinState(), making use of the second feature adds an additional cache lookup to GIN index access setup. It seems unlikely that that would be very noticeable given the other costs involved, but maybe at some point we should consider making GinState data persist longer than it now does --- we could keep it in the index relcache entry, perhaps. Rather fortuitously, we don't seem to need to do anything to get this change to play nice with dump/reload or pg_upgrade scenarios: the new opclass definition is automatically selected to replace existing index definitions, and the on-disk data remains compatible. Also, if a user has created a custom opclass definition for a non-builtin type, this doesn't break that, since CREATE INDEX will prefer an exact match to opcintype over a match to ANYARRAY. However, if there's anyone out there with handwritten DDL that explicitly specifies _bool_ops or one of the other replaced opclass names, they'll need to adjust that. Tom Lane, reviewed by Enrique Meneses Discussion: <14436.1470940379@sss.pgh.pa.us>
*	Refer to OS X as "macOS", except for the port name which is still "darwin".	Tom Lane	2016-09-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We weren't terribly consistent about whether to call Apple's OS "OS X" or "Mac OS X", and the former is probably confusing to people who aren't Apple users. Now that Apple has rebranded it "macOS", follow their lead to establish a consistent naming pattern. Also, avoid the use of the ancient project name "Darwin", except as the port code name which does not seem desirable to change. (In short, this patch touches documentation and comments, but no actual code.) I didn't touch contrib/start-scripts/osx/, either. I suspect those are obsolete and due for a rewrite, anyway. I dithered about whether to apply this edit to old release notes, but those were responsible for quite a lot of the inconsistencies, so I ended up changing them too. Anyway, Apple's being ahistorical about this, so why shouldn't we be?
*	Remove useless code.	Tom Lane	2016-09-23
\| \| \| \| \| \| \| \| \|	Apparent copy-and-pasteo in standby_desc_invalidations() had two entries for msg->id == SHAREDINVALRELMAP_ID. Aleksander Alekseev Discussion: <20160923090814.GB1238@e733>
*	Don't trust CreateFileMapping() to clear the error code on success.	Tom Lane	2016-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \|	We must test GetLastError() even when CreateFileMapping() returns a non-null handle. If that value were left over from some previous system call, we might be fooled into thinking the segment already existed. Experimentation on Windows 7 suggests that CreateFileMapping() clears the error code on success, but it is not documented to do so, so let's not rely on that happening in all Windows releases. Amit Kapila Discussion: <20811.1474390987@sss.pgh.pa.us>
*	Avoid using PostmasterRandom() for DSM control segment ID.	Tom Lane	2016-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commits 470d886c3 et al intended to fix the problem that the postmaster selected the same "random" DSM control segment ID on every start. But using PostmasterRandom() for that destroys the intended property that the delay between random_start_time and random_stop_time will be unpredictable. (Said delay is probably already more predictable than we could wish, but that doesn't mean that reducing it by a couple orders of magnitude is OK.) Revert the previous patch and add a comment warning against misuse of PostmasterRandom. Fix the original problem by calling srandom() early in PostmasterMain, using a low-security seed that will later be overwritten by PostmasterRandom. Discussion: <20789.1474390434@sss.pgh.pa.us>
*	C comment: fix function header comment	Bruce Momjian	2016-09-22
\| \| \| \| \| \|	Fix for transformOnConflictClause(). Author: Tomonari Katsumata
*	Remove nearly-unused SizeOfIptrData macro.	Tom Lane	2016-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \|	Past refactorings have removed all but one reference to SizeOfIptrData (and that one place was in a pretty noncritical spot). Since nobody's complained, it seems probable that there are no supported compilers that don't think sizeof(ItemPointerData) is 6. If there are, we're wasting MAXALIGN per heap tuple anyway, so it's rather silly to worry about whether we can shave space in places like WAL records. Pavan Deolasee Discussion: <CABOikdOOawDda4hwLOT6zdA6MFfPLu3Z2YBZkX0JdayNS6JOeQ@mail.gmail.com>
*	Be sure to rewind the tuplestore read pointer in non-leader CTEScan nodes.	Tom Lane	2016-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ExecInitCteScan supposed that it didn't have to do anything to the extra tuplestore read pointer it gets from tuplestore_alloc_read_pointer. However, it needs this read pointer to be positioned at the start of the tuplestore, while tuplestore_alloc_read_pointer is actually defined as cloning the current position of read pointer 0. In normal situations that accidentally works because we initialize the whole plan tree at once, before anything gets read. But it fails in an EvalPlanQual recheck, as illustrated in bug #14328 from Dima Pavlov. To fix, just forcibly rewind the pointer after tuplestore_alloc_read_pointer. The cost of doing so is negligible unless the tuplestore is already in TSS_READFILE state, which wouldn't happen in normal cases. We could consider altering tuplestore's API to make that case cheaper, but that would make for a more invasive back-patch and it doesn't seem worth it. This has been broken probably for as long as we've had CTEs, so back-patch to all supported branches. Discussion: <32468.1474548308@sss.pgh.pa.us>
*	Delay updating control file to "in production"	Peter Eisentraut	2016-09-21
\| \| \| \| \| \| \| \| \| \|	Move the updating of the control file to "in production" status until the point where WAL writes are allowed. Before, there could be a significant gap between the control file update and write transactions actually being allowed. This makes it more reliable to use the control status to verify the end of a promotion. From: Michael Paquier <michael.paquier@gmail.com>
*	pg_ctl: Detect current standby state from pg_control	Peter Eisentraut	2016-09-21
\| \| \| \| \| \| \| \| \| \|	pg_ctl used to determine whether a server was in standby mode by looking for a recovery.conf file. With this change, it instead looks into pg_control, which is potentially more accurate. There are also occasional discussions about removing recovery.conf, so this removes one dependency. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
*	Use PostmasterRandom(), not random(), for DSM control segment ID.	Robert Haas	2016-09-20
\| \| \| \| \|	Otherwise, every startup gets the same "random" value, which is definitely not what was intended.
*	Retry DSM control segment creation if Windows indicates access denied.	Robert Haas	2016-09-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, attempts to run multiple postmasters running on the same machine may fail, because Windows sometimes returns ERROR_ACCESS_DENIED rather than ERROR_ALREADY_EXISTS when there is an existing segment. Hitting this bug is much more likely because of another defect not fixed by this patch, namely that dsm_postmaster_startup() uses random() which returns the same value every time. But that's not a reason not to fix this. Kyotaro Horiguchi and Amit Kapila, reviewed by Michael Paquier Discussion: <CAA4eK1JyNdMeF-dgrpHozDecpDfsRZUtpCi+1AbtuEkfG3YooQ@mail.gmail.com>
*	Fix outdated comments, GIST search queue is not an RBTree anymore.	Heikki Linnakangas	2016-09-20
\| \| \| \| \| \|	The GiST search queue is implemented as a pairing heap rather than as Red-Black Tree, since 9.5 (commit e7032610). I neglected these comments in that commit.
*	Add debugging aid "bmsToString(Bitmapset *bms)".	Tom Lane	2016-09-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This function has no direct callers at present, but it's convenient for manual use in a debugger, rather than having to inspect memory and do bit-counting in your head. In passing, get rid of useless outBitmapset() wrapper around _outBitmapset(); let's just export the function that does the work. Likewise for outToken(). Ashutosh Bapat, tweaked a bit by me Discussion: <CAFjFpRdiht8e1HTVirbubr4YzaON5iZTzFJjq909y4sU8M_6eA@mail.gmail.com>
*	Clarify policy on marking inherited constraints as valid.	Robert Haas	2016-09-15
\| \| \| \|	Amit Langote and Robert Haas
*	Fix building with LibreSSL.	Heikki Linnakangas	2016-09-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LibreSSL defines OPENSSL_VERSION_NUMBER to claim that it is version 2.0.0, but it doesn't have the functions added in OpenSSL 1.1.0. Add autoconf checks for the individual functions we need, and stop relying on OPENSSL_VERSION_NUMBER. Backport to 9.5 and 9.6, like the patch that broke this. In the back-branches, there are still a few OPENSSL_VERSION_NUMBER checks left, to check for OpenSSL 0.9.8 or 0.9.7. I left them as they were - LibreSSL has all those functions, so they work as intended. Per buildfarm member curculio. Discussion: <2442.1473957669@sss.pgh.pa.us>
*	Fix typo in comment.	Robert Haas	2016-09-15
\| \| \| \|	Amit Langote
*	Make min_parallel_relation_size's default value platform-independent.	Tom Lane	2016-09-15
\| \| \| \| \| \| \| \| \| \|	The documentation states that the default value is 8MB, but this was only true at BLCKSZ = 8kB, because the default was hard-coded as 1024. Make the code match the docs by computing the default as 8MB/BLCKSZ. Oversight in commit 75be66464, noted pursuant to a gripe from Peter E. Discussion: <90634e20-097a-e4fd-67d5-fb2c42f0dd71@2ndquadrant.com>
*	Support OpenSSL 1.1.0.	Heikki Linnakangas	2016-09-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changes needed to build at all: - Check for SSL_new in configure, now that SSL_library_init is a macro. - Do not access struct members directly. This includes some new code in pgcrypto, to use the resource owner mechanism to ensure that we don't leak OpenSSL handles, now that we can't embed them in other structs anymore. - RAND_SSLeay() -> RAND_OpenSSL() Changes that were needed to silence deprecation warnings, but were not strictly necessary: - RAND_pseudo_bytes() -> RAND_bytes(). - SSL_library_init() and OpenSSL_config() -> OPENSSL_init_ssl() - ASN1_STRING_data() -> ASN1_STRING_get0_data() - DH_generate_parameters() -> DH_generate_parameters() - Locking callbacks are not needed with OpenSSL 1.1.0 anymore. (Good riddance!) Also change references to SSLEAY_VERSION_NUMBER with OPENSSL_VERSION_NUMBER, for the sake of consistency. OPENSSL_VERSION_NUMBER has existed since time immemorial. Fix SSL test suite to work with OpenSSL 1.1.0. CA certificates must have the "CA:true" basic constraint extension now, or OpenSSL will refuse them. Regenerate the test certificates with that. The "openssl" binary, used to generate the certificates, is also now more picky, and throws an error if an X509 extension is specified in "req_extensions", but that section is empty. Backpatch to all supported branches, per popular demand. In back-branches, we still support OpenSSL 0.9.7 and above. OpenSSL 0.9.6 should still work too, but I didn't test it. In master, we only support 0.9.8 and above. Patch by Andreas Karlsson, with additional changes by me. Discussion: <20160627151604.GD1051@msg.df7cb.de>
*	Fix and clarify comments on replacement selection.	Heikki Linnakangas	2016-09-15
\| \| \| \| \|	These were modified by the patch to only use replacement selection for the first run in an external sort.
*	Add overflow checks to money type input function	Peter Eisentraut	2016-09-14
\| \| \| \| \| \| \| \| \| \|	The money type input function did not have any overflow checks at all. There were some regression tests that purported to check for overflow, but they actually checked for the overflow behavior of the int8 type before casting to money. Remove those unnecessary checks and add some that actually check the money input function. Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
*	Be pickier about converting between Name and Datum.	Tom Lane	2016-09-13
\| \| \| \| \| \| \| \| \| \| \| \| \|	We were misapplying NameGetDatum() to plain C strings in some places. This worked, because it was just a pointer cast anyway, but it's a type cheat in some sense. Use CStringGetDatum instead, and modify the NameGetDatum macro so it won't compile if applied to something that's not a pointer to NameData. This should result in no changes to generated code, but it is logically cleaner. Mark Dilger, tweaked a bit by me Discussion: <EFD8AC94-4C1F-40C1-A5EA-304080089C1B@gmail.com>
*	Fix executor/README to reflect disallowing SRFs in UPDATE.	Tom Lane	2016-09-13
\| \| \| \| \|	The parenthetical comment here is obsoleted by commit a4c35ea1c. Noted by Andres Freund.
*	Improve parser's and planner's handling of set-returning functions.	Tom Lane	2016-09-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Teach the parser to reject misplaced set-returning functions during parse analysis using p_expr_kind, in much the same way as we do for aggregates and window functions (cf commit eaccfded9). While this isn't complete (it misses nesting-based restrictions), it's much better than the previous error reporting for such cases, and it allows elimination of assorted ad-hoc expression_returns_set() error checks. We could add nesting checks later if it seems important to catch all cases at parse time. There is one case the parser will now throw error for although previous versions allowed it, which is SRFs in the tlist of an UPDATE. That never behaved sensibly (since it's ill-defined which generated row should be used to perform the update) and it's hard to see why it should not be treated as an error. It's a release-note-worthy change though. Also, add a new Query field hasTargetSRFs reporting whether there are any SRFs in the targetlist (including GROUP BY/ORDER BY expressions). The parser can now set that basically for free during parse analysis, and we can use it in a number of places to avoid expression_returns_set searches. (There will be more such checks soon.) In some places, this allows decontorting the logic since it's no longer expensive to check for SRFs in the tlist --- so I made the checks parallel to the handling of hasAggs/hasWindowFuncs wherever it seemed appropriate. catversion bump because adding a Query field changes stored rules. Andres Freund and Tom Lane Discussion: <24639.1473782855@sss.pgh.pa.us>
*	Have heapam.h include lockdefs.h rather than lock.h.	Robert Haas	2016-09-13
\| \| \| \| \| \| \| \| \|	lockdefs.h was only split from lock.h relatively recently, and represents a minimal subset of the old lock.h. heapam.h only needs that smaller subset, so adjust it to include only that. This requires some corresponding adjustments elsewhere. Peter Geoghegan
*	Fix copy/pasto in file identification	Simon Riggs	2016-09-12
\| \| \| \|	Daniel Gustafsson
*	Identify walsenders in pg_stat_activity	Simon Riggs	2016-09-12
\| \| \| \| \| \| \| \| \|	Following 8299471c37fff0b walsender procs are now visible in pg_stat_activity. Set query to ‘walsender’ for walsender procs to allow them to be identified. Discussion:CAB7nPqS8c76KPSufK_HSDeYrbtg+zZ7D0EEkjeM6txSEuCB_jA@mail.gmail.com Michael Paquier, issue raised by Fujii Masao, reviewed by Tom Lane
*	Raise max setting of checkpoint_timeout to 1d	Simon Riggs	2016-09-11
\| \| \| \| \| \| \|	Previously checkpoint_timeout was capped at 3600s New max setting is 86400s = 24h = 1d Discussion: 32558.1454471895@sss.pgh.pa.us
*	Allow CREATE EXTENSION to follow extension update paths.	Tom Lane	2016-09-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, to update an extension you had to produce both a version-update script and a new base installation script. It's become more and more obvious that that's tedious, duplicative, and error-prone. This patch attempts to improve matters by allowing the new base installation script to be omitted. CREATE EXTENSION will install a requested version if it can find a base script and a chain of update scripts that will get there. As in the existing update logic, shorter chains are preferred if there's more than one possibility, with an arbitrary tie-break rule for chains of equal length. Also adjust the pg_available_extension_versions view to show such versions as installable. While at it, refactor the code so that CASCADE processing works for extensions requested during ApplyExtensionUpdates(). Without this, addition of a new requirement in an updated extension would require creating a new base script, even if there was no other reason to do that. (It would be easy at this point to add a CASCADE option to ALTER EXTENSION UPDATE, to allow the same thing to happen during a manually-commanded version update, but I have not done that here.) Tom Lane, reviewed by Andres Freund Discussion: <20160905005919.jz2m2yh3und2dsuy@alap3.anarazel.de>
*	Implement binary heap replace-top operation in a smarter way.	Heikki Linnakangas	2016-09-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In external sort's merge phase, we maintain a binary heap holding the next tuple from each input tape. On each step, the topmost tuple is returned, and replaced with the next tuple from the same tape. We were doing the replacement by deleting the top node in one operation, and inserting the next tuple after that. However, you can do a "replace-top" operation more efficiently, in one "sift-up". A deletion will always walk the heap from top to bottom, but in a replacement, we can stop as soon as we find the right place for the new tuple. This is particularly helpful, if the tapes are not in completely random order, so that the next tuple from a tape is likely to land near the top of the heap. Peter Geoghegan, reviewed by Claudio Freire, with some editing by me. Discussion: <CAM3SWZRhBhiknTF_=NjDSnNZ11hx=U_SEYwbc5vd=x7M4mMiCw@mail.gmail.com>
*	Fix miserable coding in pg_stat_get_activity().	Tom Lane	2016-09-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit dd1a3bccc replaced a test on whether a subroutine returned a null pointer with a test on whether &pointer->backendStatus was null. This accidentally failed to fail, at least on common compilers, because backendStatus is the first field in the struct; but it was surely trouble waiting to happen. Commit f91feba87 then messed things up further, changing the logic to local_beentry = pgstat_fetch_stat_local_beentry(curr_backend); if (!local_beentry) continue; beentry = &local_beentry->backendStatus; if (!beentry) { where the second "if" is now dead code, so that the intended behavior of printing a row with "<backend information not available>" cannot occur. I suspect this is all moot because pgstat_fetch_stat_local_beentry will never actually return null in this function's usage, but it's still very poor coding. Repair back to 9.4 where the original problem was introduced.
*	Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.	Tom Lane	2016-09-09
\| \| \| \| \| \| \| \| \| \| \| \|	The full generality of deleting an arbitrary number of tuples is no longer needed, so let's save some code and cycles by replacing the original coding with an implementation based on PageIndexTupleDelete. We can always get back the old code from git if we need it again for new callers (though I don't care for its willingness to mess with line pointers it wasn't told to mess with). Discussion: <552.1473445163@sss.pgh.pa.us>
*	Convert PageAddItem into a macro to save a few cycles.	Tom Lane	2016-09-09
\| \| \| \| \| \| \| \| \| \|	Nowadays this is just a backwards-compatibility wrapper around PageAddItemExtended, so let's avoid the extra level of function call. In addition, because pretty much all callers are passing constants for the two bool arguments, compilers will be able to constant-fold the conversion to a flags bitmask. Discussion: <552.1473445163@sss.pgh.pa.us>
*	Invent PageIndexTupleOverwrite, and teach BRIN and GiST to use it.	Tom Lane	2016-09-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PageIndexTupleOverwrite performs approximately the same function as PageIndexTupleDelete (or PageIndexDeleteNoCompact) followed by PageAddItem targeting the same item pointer offset. But in the case where the new tuple is the same size as the old, it avoids shuffling other data around on the page, because the new tuple is placed where the old one was rather than being appended to the end of the page. This has been shown to provide a substantial speedup for some GiST use-cases. Also, this change allows some API simplifications: we can get rid of the rather klugy and error-prone PAI_ALLOW_FAR_OFFSET flag for PageAddItemExtended, since that was used only to cover a corner case for BRIN that's better expressed by using PageIndexTupleOverwrite. Note that this patch causes a rather subtle WAL incompatibility: the physical page content change represented by certain WAL records is now different than it was before, because while the tuples have the same itempointer line numbers, the tuples themselves are in different places. I have not bumped the WAL version number because I think it doesn't matter unless you are trying to do bitwise comparisons of original and replayed pages, and in any case we're early in a devel cycle and there will probably be more WAL changes before v10 gets out the door. There is probably room to make use of PageIndexTupleOverwrite in SP-GiST and GIN too, but that is left for a future patch. Andrey Borodin, reviewed by Anastasia Lubennikova, whacked around a bit by me Discussion: <CAJEAwVGQjGGOj6mMSgMwGvtFd5Kwe6VFAxY=uEPZWMDjzbn4VQ@mail.gmail.com>
*	Fix locking a tuple updated by an aborted (sub)transaction	Alvaro Herrera	2016-09-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When heap_lock_tuple decides to follow the update chain, it tried to also lock any version of the tuple that was created by an update that was subsequently rolled back. This is pointless, since for all intents and purposes that tuple exists no more; and moreover it causes misbehavior, as reported independently by Marko Tiikkaja and Marti Raudsepp: some SELECT FOR UPDATE/SHARE queries may fail to return the tuples, and assertion-enabled builds crash. Fix by having heap_lock_updated_tuple test the xmin and return success immediately if the tuple was created by an aborted transaction. The condition where tuples become invisible occurs when an updated tuple chain is followed by heap_lock_updated_tuple, which reports the problem as HeapTupleSelfUpdated to its caller heap_lock_tuple, which in turn propagates that code outwards possibly leading the calling code (ExecLockRows) to believe that the tuple exists no longer. Backpatch to 9.3. Only on 9.5 and newer this leads to a visible failure, because of commit 27846f02c176; before that, heap_lock_tuple skips the whole dance when the tuple is already locked by the same transaction, because of the ancient HeapTupleSatisfiesUpdate behavior. Still, the buggy condition may also exist in more convoluted scenarios involving concurrent transactions, so it seems safer to fix the bug in the old branches too. Discussion: https://www.postgresql.org/message-id/CABRT9RC81YUf1=jsmWopcKJEro=VoeG2ou6sPwyOUTx_qteRsg@mail.gmail.com https://www.postgresql.org/message-id/48d3eade-98d3-8b9a-477e-1a8dc32a724d@joh.to
*	In PageIndexTupleDelete, don't assume stored item lengths are MAXALIGNed.	Tom Lane	2016-09-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PageAddItem stores the item length as-is. It MAXALIGN's the amount of space actually allocated for each tuple, but not the stored length. PageRepairFragmentation, PageIndexMultiDelete, and PageIndexDeleteNoCompact are all on board with this and MAXALIGN item lengths after fetching them. But PageIndexTupleDelete expects the stored length to be a MAXALIGN multiple already. This accidentally works for existing index AMs because they all maxalign their tuple sizes internally; but we don't do that for heap tuples, and it shouldn't be a requirement for index tuples either. So, sync PageIndexTupleDelete with the rest of bufpage.c by having it maxalign the item size after fetching. Also add a check that pd_special is maxaligned, to ensure that the test "(offset + size) > phdr->pd_special" is still doing the right thing. (If offset and pd_special are aligned, it doesn't matter whether size is.) Again, this is in sync with the rest of the routines here, except for PageAddItem which doesn't test because it doesn't actually do anything for which pd_special alignment matters. This shouldn't have any immediate functional impact; it just adds the flexibility to use PageIndexTupleDelete on index tuples with non-aligned lengths. Discussion: <3814.1473366762@sss.pgh.pa.us>
*	Avoid reporting "cache lookup failed" for some user-reachable cases.	Tom Lane	2016-09-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have a not-terribly-thoroughly-enforced-yet project policy that internal errors with SQLSTATE XX000 (ie, plain elog) should not be triggerable from SQL. record_in, domain_in, and PL validator functions all failed to meet this standard, because they threw plain elog("cache lookup failed for XXX") errors on bad OIDs, and those are all invokable from SQL. For record_in, the best fix is to upgrade typcache.c (lookup_type_cache) to throw a user-facing error for this case. That seems consistent because it was more than halfway there already, having user-facing errors for shell types and non-composite types. Having done that, tweak domain_in to rely on the typcache to throw an appropriate error. (This costs little because InitDomainConstraintRef would fetch the typcache entry anyway.) For the PL validator functions, we already have a single choke point at CheckFunctionValidatorAccess, so just fix its error to be user-facing. Dilip Kumar, reviewed by Haribabu Kommi Discussion: <87wpxfygg9.fsf@credativ.de>
*	Fix corruption of 2PC recovery with subxacts	Simon Riggs	2016-09-09
\| \| \| \| \| \| \| \|	Reading 2PC state files during recovery was borked, causing corruptions during recovery. Effect limited to servers with 2PC, subtransactions and recovery/replication. Stas Kelvich, reviewed by Michael Paquier and Pavan Deolasee
*	Improve scalability of md.c for large relations.	Andres Freund	2016-09-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	So far md.c used a linked list of segments. That proved to be a problem when processing large relations, because every smgr.c/md.c level access to a page incurred walking through a linked list of all preceding segments. Thus making accessing pages O(#segments). Replace the linked list of segments hanging off SMgrRelationData with an array of opened segments. That allows O(1) access to individual segments, if they've previously been opened. Discussion: <20140331101001.GE13135@alap3.anarazel.de> Reviewed-By: Peter Geoghegan, Tom Lane (in an older version)
*	Faster PageIsVerified() for the all zeroes case.	Andres Freund	2016-09-08
\| \| \| \| \| \| \| \|	That's primarily useful for testing very large relations, using sparse files. Discussion: <20140331101001.GE13135@alap3.anarazel.de> Reviewed-By: Peter Geoghegan
*	Fix mdtruncate() to close fd.c handle of deleted segments.	Andres Freund	2016-09-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mdtruncate() forgot to FileClose() a segment's mdfd_vfd, when deleting it. That lead to a fd.c handle to a truncated file being kept open until backend exit. The issue appears to have been introduced way back in 1a5c450f3024ac5, before that the handle was closed inside FileUnlink(). The impact of this bug is limited - only VACUUM and ON COMMIT TRUNCATE for temporary tables, truncate files in place (i.e. TRUNCATE itself is not affected), and the relation has to be bigger than 1GB. The consequences of a leaked fd.c handle aren't severe either. Discussion: <20160908220748.oqh37ukwqqncbl3n@alap3.anarazel.de> Backpatch: all supported releases