aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
...
* Remove unnecessary cleanup code.Robert Haas2014-05-22
| | | | | | | This is all inside a block guarded by op == DSM_OP_ATTACH, so it can never be the case that op == DSM_OP_CREATE. Reported by Coverity.
* Fix typos in comments.Fujii Masao2014-05-22
|
* Fix typos in comments.Heikki Linnakangas2014-05-21
|
* Prevent auto_explain from changing the output of a user's EXPLAIN.Tom Lane2014-05-20
| | | | | | | | | | | | | | | | | | Commit af7914c6627bcf0b0ca614e9ce95d3f8056602bf, which introduced the EXPLAIN (TIMING) option, for some reason coded explain.c to look at planstate->instrument->need_timer rather than es->timing to decide whether to print timing info. However, the former flag might get set as a result of contrib/auto_explain wanting timing information. We certainly don't want activation of auto_explain to change user-visible statement behavior, so fix that. Also fix an independent bug introduced in the same patch: in the code path for a never-executed node with a machine-friendly output format, if timing was selected, it would fail to print the Actual Rows and Actual Loops items. Per bug #10404 from Tomonari Katsumata. Back-patch to 9.2 where the faulty code was introduced.
* Update obsolete comment.Tom Lane2014-05-19
| | | | Peter Geoghegan
* Fix backup-block numbering in redo of b-tree split.Heikki Linnakangas2014-05-19
| | | | | | | | | | | | I got the backup block numbers off-by-one in the commit that changed the way incomplete-splits are handled. I blame the comments, which said "backup block 1" and "backup block 2", even though the backup blocks are numbered starting from 0, in the macros and functions used in replay. Fix the comments and the code. Per Jeff Janes' bug report about corruption caused by torn page writes. The incorrect code is new in git master, but backpatch the comment change down to 9.0, where the numbering in the redo-side macros was changed.
* Ooops, I broke initdb with that last patch.Tom Lane2014-05-18
| | | | | | That's what I get for not fully retesting the final version of the patch. The replace_allowed cross-check needs an additional special case for bootstrapping.
* Fix two ancient memory-leak bugs in relcache.c.Tom Lane2014-05-18
| | | | | | | | | | | | | | | | | | | | | RelationCacheInsert() ignored the possibility that hash_search(HASH_ENTER) might find a hashtable entry already present for the same OID. However, that can in fact occur during recursive relcache load scenarios. When it did happen, we overwrote the pointer to the pre-existing Relation, causing a session-lifespan leakage of that entire structure. As far as is known, the pre-existing Relation would always have reference count zero by the time we arrive back at the outer insertion, so add code that deletes the pre-existing Relation if so. If by some chance its refcount is positive, elog a WARNING and allow the pre-existing Relation to be leaked as before. Also, AttrDefaultFetch() was sloppy about leaking the cstring form of the pg_attrdef.adbin value it's copying into the relcache structure. This is only a query-lifespan leakage, and normally not very significant, but it adds up during CLOBBER_CACHE testing. These bugs are of very ancient vintage, but I'll refrain from back-patching since there's no evidence that these leaks amount to anything in ordinary usage.
* Make fallback implementation of pg_memory_barrier() work.Tom Lane2014-05-17
| | | | | | | | | | | | | The fallback implementation involves acquiring and releasing a spinlock variable that is otherwise unreferenced --- not even to the extent of initializing it. This accidentally fails to fail on platforms where spinlocks should be initialized to zeroes, but elsewhere it results in a "stuck spinlock" failure during startup. I griped about this last July, and put in a hack that worked for gcc on HPPA, but didn't get around to fixing the general case. Per the discussion back then, the best thing to do seems to be to initialize dummy_spinlock in main.c.
* Fix a bunch of functions that were declared static then defined not-static.Tom Lane2014-05-17
| | | | Per testing with a compiler that whines about this.
* Fix unaligned accesses in DecodeUpdate().Tom Lane2014-05-17
| | | | | | | | | | | | The xl_heap_header_len structures in an XLOG_HEAP_UPDATE record aren't necessarily aligned adequately. The regular replay function for these records is aware of that, but decode.c didn't get the memo. I'm not sure why the buildfarm failed to catch this; the test_decoding test certainly blows up real good on my old HPPA box. Also, I'm pretty sure that the address arithmetic was wrong for the case of XLOG_HEAP_CONTAINS_OLD and not XLOG_HEAP_CONTAINS_NEW_TUPLE, though this apparently can't happen when logical decoding is active.
* Update README, we don't do post-recovery cleanup actions anymore.Heikki Linnakangas2014-05-17
| | | | | | | transam/README explained how B-tree incomplete splits were tracked and fixed after recovery, as an example of handling complex actions that need multiple WAL records, but that's not how it works anymore. Explain the new paradigm.
* Make sure chr(int) can't create invalid UTF8 sequences.Tom Lane2014-05-16
| | | | | | | | | | | | | | | | | | Several years ago we changed chr(int) so that if the database encoding is UTF8, it would interpret its argument as a Unicode code point and expand it into the appropriate multibyte sequence. However, we weren't sufficiently careful about checking validity of the input. According to RFC3629, UTF8 disallows code points above U+10FFFF (note that the predecessor standard RFC2279 was more liberal). Also, both versions of the UTF8 spec agree that Unicode surrogate-pair codes should never appear in UTF8. Because our encoding validity checks follow RFC3629, our failure to enforce these restrictions in chr() means it could be used to produce text strings that will be rejected when the database is dumped and reloaded. To ensure consistency with the input functions, let's actually apply pg_utf8_islegal() to the proposed output of chr(). Per discussion, this seems like too much of a behavioral change to back-patch, but it's not too late to squeeze it into 9.4.
* Fix thinko in logical decoding of commit-prepared records.Heikki Linnakangas2014-05-16
| | | | | | | The decoding of prepared transaction commits accidentally used the XID of the transaction performing the COMMIT PREPARED, not the XID of the prepared transaction. Before bb38fb0d43c8d that lead to those transactions not being decoded, afterwards to a assertion failure.
* Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.Heikki Linnakangas2014-05-16
| | | | | | | | | | | | | | | | Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed that prepared transaction commits reuse that struct. Fix that. Because those fields were left unitialized, replaying a commit prepared WAL record in a hot standby node would fail to remove the relcache init file. That can lead to "could not open file" errors on the standby. Relcache init file only needs to be removed when a system table/index is rewritten in the transaction using two phase commit, so that should be rare in practice. In HEAD, the incorrect dbId/tsId values are also used for filtering in logical replication code, causing the transaction to always be filtered out. Analysis and fix by Andres Freund. Backpatch to 9.0 where hot standby was introduced.
* Fix unportable setvbuf() usage in initdb.Tom Lane2014-05-15
| | | | | | | | | | | | In yesterday's commit 2dc4f011fd61501cce507be78c39a2677690d44b, I tried to force buffering of stdout/stderr in initdb to be what it is by default when the program is run interactively on Unix (since that's how most manual testing is done). This tripped over the fact that Windows doesn't support _IOLBF mode. We dealt with that a long time ago in syslogger.c by falling back to unbuffered mode on Windows. Export that solution in port.h and use it in initdb. Back-patch to 8.4, like the previous commit.
* Handle duplicate XIDs in txid_snapshot.Heikki Linnakangas2014-05-15
| | | | | | | | | | The proc array can contain duplicate XIDs, when a transaction is just being prepared for two-phase commit. To cope, remove any duplicates in txid_current_snapshot(). Also ignore duplicates in the input functions, so that if e.g. you have an old pg_dump file that already contains duplicates, it will be accepted. Report and fix by Jan Wieck. Backpatch to all supported versions.
* Fix race condition in preparing a transaction for two-phase commit.Heikki Linnakangas2014-05-15
| | | | | | | | | | | | | | | | | | | | To lock a prepared transaction's shared memory entry, we used to mark it with the XID of the backend. When the XID was no longer active according to the proc array, the entry was implicitly considered as not locked anymore. However, when preparing a transaction, the backend's proc array entry was cleared before transfering the locks (and some other state) to the prepared transaction's dummy PGPROC entry, so there was a window where another backend could finish the transaction before it was in fact fully prepared. To fix, rewrite the locking mechanism of global transaction entries. Instead of an XID, just have simple locked-or-not flag in each entry (we store the locking backend's backend id rather than a simple boolean, but that's just for debugging purposes). The backend is responsible for explicitly unlocking the entry, and to make sure that that happens, install a callback to unlock it on abort or process exit. Backpatch to all supported versions.
* Misc message style and doc fixes.Heikki Linnakangas2014-05-15
| | | | Euler Taveira
* Code review for recent changes in relcache.c.Tom Lane2014-05-14
| | | | | | | | | | | | | | | | | | | | | | | rd_replidindex should be managed the same as rd_oidindex, and rd_keyattr and rd_idattr should be managed like rd_indexattr. Omissions in this area meant that the bitmapsets computed for rd_keyattr and rd_idattr would be leaked during any relcache flush, resulting in a slow but permanent leak in CacheMemoryContext. There was also a tiny probability of relcache entry corruption if we ran out of memory at just the wrong point in RelationGetIndexAttrBitmap. Otherwise, the fields were not zeroed where expected, which would not bother the code any AFAICS but could greatly confuse anyone examining the relcache entry while debugging. Also, create an API function RelationGetReplicaIndex rather than letting non-relcache code be intimate with the mechanisms underlying caching of that value (we won't even mention the memory leak there). Also, fix a relcache flush hazard identified by Andres Freund: RelationGetIndexAttrBitmap must not assume that rd_replidindex stays valid across index_open. The aspects of this involving rd_keyattr date back to 9.3, so back-patch those changes.
* Fix harmless access to uninitialized memory.Heikki Linnakangas2014-05-13
| | | | | | | | | When cache invalidations arrive while ri_LoadConstraintInfo() is busy filling a new cache entry, InvalidateConstraintCacheCallBack() compares the - not yet initialized - oidHashValue field with the to-be-invalidated hash value. To fix, check whether the entry is already marked as invalid. Andres Freund
* Find postgresql.auto.conf in PGDATA even when postgresql.conf is elsewhere.Tom Lane2014-05-11
| | | | | | | | | | | | | | | | | | | | | | | | The original coding for ALTER SYSTEM made a fundamentally bogus assumption that postgresql.auto.conf could be sought relative to the main config file if we hadn't yet determined the value of data_directory. This fails for common arrangements with the config file elsewhere, as reported by Christoph Berg. The simplest fix is to not try to read postgresql.auto.conf until after SelectConfigFiles has chosen (and locked down) the data_directory setting. Because of the logic in ProcessConfigFile for handling resetting of GUCs that've been removed from the config file, we cannot easily read the main and auto config files separately; so this patch adopts a brute force approach of reading the main config file twice during postmaster startup. That's a tad ugly, but the actual time cost is likely to be negligible, and there's no time for a more invasive redesign before beta. With this patch, any attempt to set data_directory via ALTER SYSTEM will be silently ignored. It would probably be better to throw an error, but that can be dealt with later. This bug, however, would prevent any testing of ALTER SYSTEM by a significant fraction of the userbase, so it seems important to get it fixed before beta.
* Rename jsonb_hash_ops to jsonb_path_ops.Tom Lane2014-05-11
| | | | | | | | | | | | | There's no longer much pressure to switch the default GIN opclass for jsonb, but there was still some unhappiness with the name "jsonb_hash_ops", since hashing is no longer a distinguishing property of that opclass, and anyway it seems like a relatively minor detail. At the suggestion of Heikki Linnakangas, we'll use "jsonb_path_ops" instead; that captures the important characteristic that each index entry depends on the entire path from the document root to the indexed value. Also add a user-facing explanation of the implementation properties of these two opclasses.
* Translation updatesPeter Eisentraut2014-05-10
|
* Rename min_recovery_apply_delay to recovery_min_apply_delay.Tom Lane2014-05-10
| | | | | | | Per discussion, this seems like a more consistent choice of name. Fabrízio de Royes Mello, after a suggestion by Peter Eisentraut; some additional documentation wordsmithing by me
* Fix bug in lossy-page handling in GINHeikki Linnakangas2014-05-10
| | | | | | | | | | | When returning rows from a bitmap, as done with partial match queries, we would get stuck in an infinite loop if the bitmap contained a lossy page reference. This bug is new in master, it was introduced by the patch to allow skipping items refuted by other entries in GIN scans. Report and fix by Alexander Korotkov
* Fix broken allocation logic in recently-rewritten jsonb_util.c.Tom Lane2014-05-09
| | | | | | | | | | | | reserveFromBuffer() failed to consider the possibility that it needs to more-than-double the current buffer size. Beyond that, it seems likely that we'd someday need to worry about integer overflow of the buffer length variable. Rather than reinvent the logic that's already been debugged in stringinfo.c, let's go back to using that logic. We can still have the same targeted API, but we'll rely on stringinfo.c to manage reallocation. Per report from Alexander Korotkov.
* Get rid of bogus dependency on typcategory in to_json() and friends.Tom Lane2014-05-09
| | | | | | | | | | | | | | | | | | | These functions were relying on typcategory to identify arrays and composites, which is not reliable and not the normal way to do it. Using typcategory to identify boolean, numeric types, and json itself is also pretty questionable, though the code in those cases didn't seem to be at risk of anything worse than wrong output. Instead, use the standard lsyscache functions to identify arrays and composites, and rely on a direct check of the type OID for the other cases. In HEAD, also be sure to look through domains so that a domain is treated the same as its base type for conversions to JSON. However, this is a small behavioral change; given the lack of field complaints, we won't back-patch it. In passing, refactor so that there's only one copy of the code that decides which conversion strategy to apply, not multiple copies that could (and have) gotten out of sync.
* Code review for logical decoding patch.Robert Haas2014-05-09
| | | | | | | | | | Post-commit review identified a number of places where addition was used instead of multiplication or memory wasn't zeroed where it should have been. This commit also fixes one case where a structure member was mis-initialized, and moves another memory allocation closer to the place where the allocated storage is used for clarity. Andres Freund
* Remove overeager assertion in logical_heap_begin_rewrite.Robert Haas2014-05-09
| | | | | | | It's legal to configure wal_level=logical and max_replication_slots=0 simultaneously. Andres Freund
* Teach add_json() that jsonb is of TYPCATEGORY_JSON.Tom Lane2014-05-09
| | | | | | | This code really needs to be refactored so that there aren't so many copies that can diverge. Not to mention that this whole approach is probably wrong. But for the moment I'll just stick my finger in the dike. Per report from Michael Paquier.
* More jsonb cleanup.Heikki Linnakangas2014-05-09
| | | | | | | | | | | | | | | | Fix JSONB_MAX_ELEMS and JSONB_MAX_PAIRS macros to use CB_MASK in the calculation. JENTRY_POSMASK happens to have the same value at the moment, but that's just coincidental. Refactor jsonb iterator functions, for readability. Get rid of the JENTRY_ISFIRST flag. Whenever we handle JEntrys, we have access to the whole array and have enough context information to know which entry is the first. This frees up one bit in the JEntry header for future use. While we're at it, shuffle the JEntry bits so that boolean true and false go together, for aesthetic reasons. Bump catalog version as this changes the on-disk format slightly.
* Improve key representation for GIN jsonb_ops, and fix existence-search bug.Tom Lane2014-05-09
| | | | | | | | | | | | | | | | | | | | | | Change the key representation so that values that would exceed 127 bytes are hashed into short strings, and so that the original JSON datatype of each value is recorded in the index. The hashing rule eliminates the major objection to having this opclass be the default for jsonb, namely that it could fail for plausible input data (due to GIN's restrictions on maximum key length). Preserving datatype information doesn't really buy us much right now, but it requires no extra space compared to the previous way, and it might be useful later. Also, change the consistency-checking functions to request recheck for exists (jsonb ? text) and related operators. The original analysis that this is an exactly checkable query was incorrect, since the index does not preserve information about whether a key appears at top level in the indexed JSON object. Add a test case demonstrating the problem. Make some other, mostly cosmetic improvements to the code in jsonb_gin.c as well. catversion bump due to on-disk data format change in jsonb_ops indexes.
* Minor cleanup of jsonb_util.cHeikki Linnakangas2014-05-09
| | | | | | Move the functions around to group related functions together. Remove binequal argument from lengthCompareJsonbStringValue, moving that responsibility to lengthCompareJsonbPair. Fix typo in comment.
* Avoid some pnstrdup()s when constructing jsonbHeikki Linnakangas2014-05-09
| | | | | This speeds up text to jsonb parsing and hstore to jsonb conversions somewhat.
* Increase the default value of effective_cache_size to 4GB.Tom Lane2014-05-08
| | | | | | | | | | Per discussion, the old value of 128MB is ridiculously small on modern machines; in fact, it's not even any larger than the default value of shared_buffers, which it certainly should be. Increase to 4GB, which is unlikely to be any worse than the old default for anyone, and should be noticeably better for most. Eventually we might have an autotuning scheme for this setting, but the recent attempt crashed and burned, so for now just do this.
* Revert "Auto-tune effective_cache size to be 4x shared buffers"Tom Lane2014-05-08
| | | | | | | | | | | | | | This reverts commit ee1e5662d8d8330726eaef7d3110cb7add24d058, as well as a remarkably large number of followup commits, which were mostly concerned with the fact that the implementation didn't work terribly well. It still doesn't: we probably need some rather basic work in the GUC infrastructure if we want to fully support GUCs whose default varies depending on the value of another GUC. Meanwhile, it also emerged that there wasn't really consensus in favor of the definition the patch tried to implement (ie, effective_cache_size should default to 4 times shared_buffers). So whack it all back to where it was. In a followup commit, I'll do what was recently agreed to, which is to simply change the default to a higher value.
* Protect against torn pages when deleting GIN list pages.Heikki Linnakangas2014-05-08
| | | | | | | | | To-be-deleted list pages contain no useful information, as they are being deleted, but we must still protect the writes from being torn by a crash after a partial write. To do that, re-initialize the pages on WAL replay. Jeff Janes caught this with a test program to test partial writes. Backpatch to all supported versions.
* When a background worker exists with code 0, unregister it.Robert Haas2014-05-07
| | | | | | | The previous behavior was to restart immediately, which was generally viewed as less useful. Petr Jelinek, with some adjustments by me.
* When a bgworker exits, always call ReleasePostmasterChildSlot.Robert Haas2014-05-07
| | | | | Commit e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7 was insufficiently well thought out. Repair.
* Restart bgworkers immediately after a crash-and-restart cycle.Robert Haas2014-05-07
| | | | | | | Just as we would start bgworkers immediately after an initial startup of the server, we should restart them immediately when reinitializing. Petr Jelinek and Robert Haas
* Clean up jsonb code.Heikki Linnakangas2014-05-07
| | | | | | | | | | | | | | | | | | | | | | | The main target of this cleanup is the convertJsonb() function, but I also touched a lot of other things that I spotted into in the process. The new convertToJsonb() function uses an output buffer that's resized on demand, so the code to estimate of the size of JsonbValue is removed. The on-disk format was not changed, even though I refactored the structs used to handle it. The term "superheader" is replaced with "container". The jsonb_exists_any and jsonb_exists_all functions no longer sort the input array. That was a premature optimization, the idea being that if there are duplicates in the input array, you only need to check them once. Also, sorting the array saves some effort in the binary search used to find a key within an object. But there were drawbacks too: the sorting and deduplicating obviously isn't free, and in the typical case there are no duplicates to remove, and the gain in the binary search was minimal. Remove all that, which makes the code simpler too. This includes a bug-fix; the total length of the elements in a jsonb array or object mustn't exceed 2^28. That is now checked.
* Detach shared memory from bgworkers without shmem access.Robert Haas2014-05-07
| | | | | | | | Since the postmaster won't perform a crash-and-restart sequence for background workers which don't request shared memory access, we'd better make sure that they can't corrupt shared memory. Patch by me, review by Tom Lane.
* Fix failure to set ActiveSnapshot while rewinding a cursor.Tom Lane2014-05-07
| | | | | | | | | | | | | | ActiveSnapshot needs to be set when we call ExecutorRewind because some plan node types may execute user-defined functions during their ReScan calls (nodeLimit.c does so, at least). The wisdom of that is somewhat debatable, perhaps, but for now the simplest fix is to make sure the required context is valid. Failure to do this typically led to a null-pointer-dereference core dump, though it's possible that in more complex cases a function could be executed with the wrong snapshot leading to very subtle misbehavior. Per report from Leif Jensen. It's been broken for a long time, so back-patch to all active branches.
* Never crash-and-restart for bgworkers without shared memory access.Robert Haas2014-05-07
| | | | | | | | | | | | | | | The motivation for a crash and restart cycle when a backend dies is that it might have corrupted shared memory on the way down; and we can't recover reliably except by reinitializing everything. But that doesn't apply to processes that don't touch shared memory. Currently, there's nothing to prevent a background worker that doesn't request shared memory access from touching shared memory anyway, but that's a separate bug. Previous to this commit, the coding in postmaster.c was inconsistent: an exit status other than 0 or 1 didn't provoke a crash-and-restart, but failure to release the postmaster child slot did. This change makes those cases consistent.
* Fix some more confusion between uint32 and Datum.Tom Lane2014-05-06
|
* hash_any returns Datum, not uint32 (and definitely not "int").Tom Lane2014-05-06
| | | | | | | | | | The coding in JsonbHashScalarValue might have accidentally failed to fail given current representational choices, but the key word there would be "accidental". Insert the appropriate datatype conversion macro. And use the right conversion macro for hash_numeric's result, too. In passing make the code a bit cleaner and less repetitive by factoring out the xor step from the switch.
* Improve comment for tricky aspect of index-only scans.Jeff Davis2014-05-06
| | | | | | | | | Index-only scans avoid taking a lock on the VM buffer, which would cause a lot of contention. To be correct, that requires some intricate assumptions that weren't completely documented in the previous comment. Reviewed by Robert Haas.
* With ecpg exclusion removed, re-run pgindent for 9.4Bruce Momjian2014-05-06
| | | | Report by Tom Lane
* Fix logic bug in dsm_attach().Robert Haas2014-05-06
| | | | | | | The previous coding would potentially cause attaching to segment A to fail if segment B was at the same time in the process of going away. Andres Freund, with a comment tweak by me