path: root/src/backend
Commit message (Author, Date)
...
* Various Coverity-spotted fixes (Stephen Frost, 2014-03-01)
  A number of issues were identified by the Coverity scanner and are addressed in this patch. None of these appear to be security issues and many are mostly cosmetic changes. Short comments for each of the changes follow:
  - Correct the semicolon placement in be-secure.c regarding SSL retries.
  - Remove a useless comparison-to-NULL in proc.c (value is dereferenced prior to this check and therefore can't be NULL).
  - Add checking of chmod() return values to initdb.
  - Fix a couple of minor memory leaks in initdb.
  - Fix memory leak in pg_ctl; involves freeing the config file contents.
  - Use an int to capture the fgetc() return instead of an enum in pg_dump.
  - Fix minor memory leaks in pg_dump (note minor change to convertOperatorReference()'s API).
  - Check fclose()/remove() return codes in psql.
  - Check fstat(), find_my_exec() return codes in psql.
  - Various ECPG memory leak fixes.
  - Check find_my_exec() return in ECPG.
  - Explicitly ignore pqFlush return in libpq error path.
  - Change PQfnumber() to avoid doing an strdup() when no changes are required.
  - Remove a few useless check-against-NULLs (value deref'd beforehand).
  - Check rmtree(), malloc() results in pg_regress.
  - Also check get_alternative_expectfile() return in pg_regress.
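  Most of these follow the same defensive pattern: capture a return code that was previously ignored and act on it. A minimal, hypothetical sketch of what such a check might look like for the initdb chmod() case (function name, message wording, and error handling are illustrative, not the actual patch):

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <errno.h>
      #include <sys/stat.h>

      /* Illustrative only: fail loudly instead of ignoring chmod()'s result. */
      static void
      set_dir_permissions(const char *progname, const char *path)
      {
          if (chmod(path, S_IRWXU) != 0)
          {
              fprintf(stderr, "%s: could not change permissions of \"%s\": %s\n",
                      progname, path, strerror(errno));
              exit(1);
          }
      }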
* Allow regex operations to be terminated early by query cancel requests. (Tom Lane, 2014-03-01)
  The regex code didn't have any provision for query cancel, which is unsurprising given its non-Postgres origin, but still problematic since some operations can take a long time. Introduce a callback function to check for a pending query cancel or session termination request, and call it in a couple of strategic spots where we can make the regex code exit with an error indicator.
  If we ever actually split out the regex code as a standalone library, some additional work will be needed to let the cancel callback function be specified externally to the library. But that's straightforward (certainly so by comparison to putting the locale-dependent character classification logic on a similar arms-length basis), and there seems no need to do it right now.
  A bigger issue is that there may be more places than these two where we need to check for cancels. We can always add more checks later, now that the infrastructure is in place.
  Since there are known examples of not-terribly-long regexes that can lock up a backend for a long time, back-patch to all supported branches. I have hopes of fixing the known performance problems later, but adding query cancel ability seems like a good idea even if they were all fixed.
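  A hedged sketch of the kind of cancel-check callback described here; the function name is invented, but InterruptPending, QueryCancelPending, and ProcDiePending are the backend flags such a hook would plausibly consult:

      #include "postgres.h"
      #include "miscadmin.h"

      /*
       * Return nonzero if a query cancel or session termination has been
       * requested.  The regex code can call a hook like this at strategic
       * points and exit with an error indicator instead of running on.
       */
      static int
      regex_cancel_requested(void)
      {
          return InterruptPending && (QueryCancelPending || ProcDiePending);
      }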
* Remove bogus while-loop. (Heikki Linnakangas, 2014-02-28)
  Commit abf5c5c9a4f142b3343614746bb9e99a794f8e7b added a bogus while-statement after the for(;;)-loop. It went unnoticed in testing, because it was dead code.
  Report by KONDO Mitsumasa. Backpatch to 9.3. The commit that introduced this was also applied to 9.2, but not the bogus while-loop part, because the code in 9.2 looks quite different.
* Allow BASE_BACKUP to be throttled (Alvaro Herrera, 2014-02-27)
  A new MAX_RATE option allows imposing a limit on the network transfer rate from the server side. This is useful to limit the stress that taking a base backup puts on the server. pg_basebackup is now able to specify a value to the server, too.
  Author: Antonin Houska
  Patch reviewed by Stefan Radomski, Andres Freund, Zoltán Böszörményi, Fujii Masao, and Álvaro Herrera.
* Fix WAL replay of locking an updated tuple (Alvaro Herrera, 2014-02-27)
  We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on WAL replay of a tuple-lock operation, which is incorrect when the tuple is already updated.
  Back-patch to 9.3. The clearing of both header elements was there previously, but since no update could be present on a tuple that was being locked, it was harmless.
  Bug reported by Peter Geoghegan and Greg Stark in CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com and CAM-w4HPTOeMT4KP0OJK+mGgzgcTOtLRTvFZyvD0O4aH-7dxo3Q@mail.gmail.com respectively; diagnosis by Andres Freund.
* btbuild no longer calls _bt_doinsert(), update comment. (Heikki Linnakangas, 2014-02-26)
  Peter Geoghegan
* Fix crash in json_to_record(). (Jeff Davis, 2014-02-26)
  json_to_record() depends on get_call_result_type() for the tuple descriptor of the record that should be returned, but in some cases that cannot be determined. Add a guard to check if the tuple descriptor has been properly resolved, similar to other callers of get_call_result_type().
  Also add guards for two other callers of get_call_result_type() in jsonfuncs.c. Although json_to_record() is the only actual bug, it's a good idea to follow convention.
* Use SnapshotDirty rather than an active snapshot to probe index endpoints. (Tom Lane, 2014-02-25)
  If there are lots of uncommitted tuples at the end of the index range, get_actual_variable_range() ends up fetching each one and doing an MVCC visibility check on it, until it finally hits a visible tuple. This is bad enough in isolation, considering that we don't need an exact answer, only an approximate one. But because the tuples are not yet committed, each visibility check does a TransactionIdIsInProgress() test, which involves scanning the ProcArray. When multiple sessions do this concurrently, the ensuing contention results in horrid performance loss. 20X overall throughput loss on not-too-complicated queries is easy to demonstrate in the back branches (though someone's made it noticeably less bad in HEAD).
  We can dodge the problem fairly effectively by using SnapshotDirty rather than a normal MVCC snapshot. This will cause the index probe to take uncommitted tuples as good, so that we incur only one tuple fetch and test even if there are many such tuples. The extent to which this degrades the estimate is debatable: it's possible the result is actually a more accurate prediction than before, if the endmost tuple has become committed by the time we actually execute the query being planned. In any case, it's not very likely that it makes the estimate a lot worse.
  SnapshotDirty will still reject tuples that are known committed dead, so we won't give bogus answers if an invalid outlier has been deleted but not yet vacuumed from the index. (Because btrees know how to mark such tuples dead in the index, we shouldn't have a big performance problem in the case that there are many of them at the end of the range.) This consideration motivates not using SnapshotAny, which was also considered as a fix.
  Note: the back branches were using SnapshotNow instead of an MVCC snapshot, but the problem and solution are the same.
  Per performance complaints from Bartlomiej Romanski, Josh Berkus, and others. Back-patch to 9.0, where the issue was introduced (by commit 40608e7f949fb7e4025c0ddd5be01939adc79eec).
* Update a few comments to mention materialized views. (Robert Haas, 2014-02-25)
  Etsuro Fujita
* Show xid and xmin in pg_stat_activity and pg_stat_replication. (Robert Haas, 2014-02-25)
  Christian Kruse, reviewed by Andres Freund and myself, with further minor adjustments by me.
* pg_basebackup: Skip only the *contents* of pg_replslot. (Robert Haas, 2014-02-25)
  Include the directory itself.
  Fujii Masao
* Update and clarify ssl_ciphers default (Peter Eisentraut, 2014-02-24)
  - Write HIGH:MEDIUM instead of DEFAULT:!LOW:!EXP for clarity.
  - Order 3DES last to work around inappropriate OpenSSL default.
  - Remove !MD5 and @STRENGTH, because they are irrelevant.
  - Add clarifying documentation.
  Effectively, the new default is almost the same as the old one, but it is arguably easier to understand and modify.
  Author: Marko Kreen <markokr@gmail.com>
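  For reference, a plausible rendering of the resulting postgresql.conf setting (the exact default string is an assumption here; check the release notes of your version):

      # accept HIGH and MEDIUM ciphers, demote 3DES to the end, no anonymous auth
      ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'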
* Increase work_mem and maintenance_work_mem defaults by 4x (Bruce Momjian, 2014-02-24)
  New defaults are 4MB and 64MB.
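  In postgresql.conf terms, the new out-of-the-box settings are:

      work_mem = 4MB                  # previously 1MB
      maintenance_work_mem = 64MB     # previously 16MB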
* Allow single-point polygons to be converted to circles (Bruce Momjian, 2014-02-24)
  This allows finding the center of a single-point polygon and converting it to a point.
  Per report from Josef Grahn
* docs: document behavior of CHAR() comparisons with chars < space (Bruce Momjian, 2014-02-24)
  Space trimming rather than space-padding causes unusual behavior, which might not be standards-compliant.
  Also remove recently-added now-redundant C comment.
* Use pg_lsn data type in pg_stat_replication, too. (Robert Haas, 2014-02-24)
  Michael Paquier, per a suggestion from Andres Freund
* Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion. (Tom Lane, 2014-02-23)
  A large majority of the callers of pg_do_encoding_conversion were specifying the database encoding as either source or target of the conversion, meaning that we can use the less general functions pg_any_to_server/pg_server_to_any instead. The main advantage of using the latter functions is that they can make use of a cached conversion-function lookup in the common case that the other encoding is the current client_encoding. It's notationally cleaner too in most cases, not least because of the historical artifact that the latter functions use "char *" rather than "unsigned char *" in their APIs.
  Note that pg_any_to_server will apply an encoding verification step in some cases where pg_do_encoding_conversion would have just done nothing. This seems to me to be a good idea at most of these call sites, though it partially negates the performance benefit.
  Per discussion of bug #9210.
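  A minimal sketch of the substitution described above, assuming the common case where the destination of the conversion is the database encoding (the wrapper function is invented for illustration):

      #include "postgres.h"
      #include "mb/pg_wchar.h"

      /* Convert a client-supplied string into the server encoding. */
      static char *
      to_server_sketch(const char *src, int srclen, int src_encoding)
      {
          /*
           * Old style: fully general conversion, no cached-function fast path.
           *
           * return (char *) pg_do_encoding_conversion((unsigned char *) src,
           *                                           srclen, src_encoding,
           *                                           GetDatabaseEncoding());
           */

          /*
           * Preferred: the target is the database encoding, so the specialized
           * entry point applies; it can reuse a cached conversion function when
           * src_encoding is the current client_encoding, and it works in plain
           * "char *" terms.
           */
          return pg_any_to_server(src, srclen, src_encoding);
      }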
* Plug some more holes in encoding conversion. (Tom Lane, 2014-02-23)
  Various places assume that pg_do_encoding_conversion() and pg_server_to_any() will ensure encoding validity of their results; but they failed to do so in the case that the source encoding is SQL_ASCII while the destination is not. We cannot perform any actual "conversion" in that scenario, but we should still validate the string according to the destination encoding. Per bug #9210 from Digoal Zhou.
  Arguably this is a back-patchable bug fix, but on the other hand adding more enforcing of encoding checks might break existing applications that were being sloppy. On balance there doesn't seem to be much enthusiasm for a back-patch, so fix in HEAD only.
  While at it, remove some apparently-no-longer-needed provisions for letting pg_do_encoding_conversion() "work" outside a transaction --- if you consider it "working" to silently fail to do the requested conversion. Also, make a few cosmetic improvements in mbutils.c, notably removing some Asserts that are certainly dead code since the variables they assert aren't null are never null, even at process start. (I think this wasn't true at one time, but it is now.)
* Do ScalarArrayOp estimation correctly when array is a stable expression. (Tom Lane, 2014-02-21)
  Most estimation functions apply estimate_expression_value to see if they can reduce an expression to a constant; the key difference is that it allows evaluation of stable as well as immutable functions in hopes of ending up with a simple Const node. scalararraysel didn't get the memo though, and neither did gincost_opexpr/gincost_scalararrayopexpr. Fix that, and remove a now-unnecessary estimate_expression_value step in the subsidiary function scalararraysel_containment.
  Per complaint from Alexey Klyukin. Back-patch to 9.3. The problem goes back further, but I'm hesitant to change estimation behavior in long-stable release branches.
* Improve comment on setting data_checksum GUC. (Heikki Linnakangas, 2014-02-20)
  There was an extra space there, and "fixed" wasn't very descriptive.
* Switch various builtin functions to use pg_lsn instead of text. (Robert Haas, 2014-02-19)
  The functions in slotfuncs.c don't exist in any released version, but the changes to xlogfuncs.c represent backward-incompatibilities. Per discussion, we're hoping that the queries using these functions are few enough and simple enough that this won't cause too much breakage for users.
  Michael Paquier, reviewed by Andres Freund and further modified by me.
* Further code review for pg_lsn data type. (Robert Haas, 2014-02-19)
  Change input function error messages to be more consistent with what is done elsewhere. Remove a bunch of redundant type casts, so that the compiler will warn us if we screw up. Don't pass LSNs by value on platforms where a Datum is only 32 bits, per buildfarm. Move macros for packing and unpacking LSNs to pg_lsn.h so that we can include access/xlogdefs.h, to avoid an unsatisfied dependency on XLogRecPtr.
* pg_lsn macro naming and type behavior revisions. (Robert Haas, 2014-02-19)
  Change pg_lsn_mi so that it can return negative values when subtracting LSNs, and clean up some perhaps ill-considered macro names.
* Add a pg_lsn data type, to represent an LSN. (Robert Haas, 2014-02-19)
  Robert Haas and Michael Paquier
* Remove broken code that tried to handle OVERLAPS with a single argument. (Tom Lane, 2014-02-18)
  The SQL standard says that OVERLAPS should have a two-element row constructor on each side. The original coding of OVERLAPS support in our grammar attempted to extend that by allowing a single-element row constructor, which it internally duplicated ... or tried to, anyway. But that code has certainly not worked since our List infrastructure was rewritten in 2004, and I'm none too sure it worked before that. As it stands, it ends up building a List that includes itself, leading to assorted undesirable behaviors later in the parser.
  Even if it worked as intended, it'd be a bit evil because of the possibility of duplicate evaluation of a volatile function that the user had written only once. Given the lack of documentation, test cases, or complaints, let's just get rid of the idea and only support the standard syntax.
  While we're at it, improve the error cursor positioning for the wrong-number-of-arguments errors, and inline the makeOverlaps() function since it's only called in one place anyway.
  Per bug #9227 from Joshua Yanovski. Initial patch by Joshua Yanovski, extended a bit by me.
* Fix comment; checkpointer, not bgwriter, performs checkpoints since 9.2. (Heikki Linnakangas, 2014-02-18)
  Amit Langote
* Fix capitalization in README. (Robert Haas, 2014-02-17)
  Vik Fearing
* Prevent potential overruns of fixed-size buffers. (Tom Lane, 2014-02-17)
  Coverity identified a number of places in which it couldn't prove that a string being copied into a fixed-size buffer would fit. We believe that most, perhaps all of these are in fact safe, or are copying data that is coming from a trusted source so that any overrun is not really a security issue. Nonetheless it seems prudent to forestall any risk by using strlcpy() and similar functions.
  Fixes by Peter Eisentraut and Jozef Mlich based on Coverity reports.
  In addition, fix a potential null-pointer-dereference crash in contrib/chkpass. The crypt(3) function is defined to return NULL on failure, but chkpass.c didn't check for that before using the result. The main practical case in which this could be an issue is if libc is configured to refuse to execute unapproved hashing algorithms (e.g., "FIPS mode"). This ideally should've been a separate commit, but since it touches code adjacent to one of the buffer overrun changes, I included it in this commit to avoid last-minute merge issues. This issue was reported by Honza Horak.
  Security: CVE-2014-0065 for buffer overruns, CVE-2014-0066 for crypt()
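  The general shape of the hardening is to replace unbounded copies into fixed-size buffers with length-bounded ones; a simplified illustration (not the exact code Coverity flagged, and the buffer size is made up):

      #include "postgres.h"       /* pulls in PostgreSQL's strlcpy() declaration */

      #define NAMEBUF_SIZE 64

      /* dst must point to a buffer of at least NAMEBUF_SIZE bytes. */
      static void
      copy_name(char *dst, const char *src)
      {
          /*
           * Unsafe if src can exceed NAMEBUF_SIZE - 1 bytes:
           *     strcpy(dst, src);
           * Bounded copy: always NUL-terminates, truncates overlong input.
           */
          strlcpy(dst, src, NAMEBUF_SIZE);
      }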
* Predict integer overflow to avoid buffer overruns. (Noah Misch, 2014-02-17)
  Several functions, mostly type input functions, calculated an allocation size such that the calculation wrapped to a small positive value when arguments implied a sufficiently-large requirement. Writes past the end of the inadvertent small allocation followed shortly thereafter. Coverity identified the path_in() vulnerability; code inspection led to the rest. In passing, add check_stack_depth() to prevent stack overflow in related functions.
  Back-patch to 8.4 (all supported versions). The non-comment hstore changes touch code that did not exist in 8.4, so that part stops at 9.0.
  Noah Misch and Heikki Linnakangas, reviewed by Tom Lane.
  Security: CVE-2014-0064
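  The bug class is an allocation-size computation such as header + n * elemsize wrapping around to a small value; the fix pattern is to bound-check before multiplying. A generic, hypothetical sketch, not the literal code in any of the touched functions:

      #include "postgres.h"
      #include "utils/memutils.h"     /* MaxAllocSize */

      /*
       * Allocate room for a header plus n fixed-size elements, refusing any
       * request whose size computation would overflow or exceed MaxAllocSize.
       */
      static void *
      alloc_array_checked(Size headersize, int n, Size elemsize)
      {
          if (n < 0 || (Size) n > (MaxAllocSize - headersize) / elemsize)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("too many elements: %d", n)));

          return palloc(headersize + (Size) n * elemsize);
      }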
* Avoid repeated name lookups during table and index DDL. (Robert Haas, 2014-02-17)
  If the name lookups come to different conclusions due to concurrent activity, we might perform some parts of the DDL on a different table than other parts. At least in the case of CREATE INDEX, this can be used to cause the permissions checks to be performed against a different table than the index creation, allowing for a privilege escalation attack.
  This changes the calling convention for DefineIndex, CreateTrigger, transformIndexStmt, transformAlterTableStmt, CheckIndexCompatible (in 9.2 and newer), and AlterTable (in 9.1 and older). In addition, CheckRelationOwnership is removed in 9.2 and newer and the calling convention is changed in older branches. A field has also been added to the Constraint node (FkConstraint in 8.4). Third-party code calling these functions or using the Constraint node will require updating.
  Report by Andres Freund. Patch by Robert Haas and Andres Freund, reviewed by Tom Lane.
  Security: CVE-2014-0062
* Prevent privilege escalation in explicit calls to PL validators. (Noah Misch, 2014-02-17)
  The primary role of PL validators is to be called implicitly during CREATE FUNCTION, but they are also normal functions that a user can call explicitly. Add a permissions check to each validator to ensure that a user cannot use explicit validator calls to achieve things he could not otherwise achieve. Back-patch to 8.4 (all supported versions).
  Non-core procedural language extensions ought to make the same two-line change to their own validators.
  Andres Freund, reviewed by Tom Lane and Noah Misch.
  Security: CVE-2014-0061
* Shore up ADMIN OPTION restrictions. (Noah Misch, 2014-02-17)
  Granting a role without ADMIN OPTION is supposed to prevent the grantee from adding or removing members from the granted role. Issuing SET ROLE before the GRANT bypassed that, because the role itself had an implicit right to add or remove members. Plug that hole by recognizing that implicit right only when the session user matches the current role. Additionally, do not recognize it during a security-restricted operation or during execution of a SECURITY DEFINER function. The restriction on SECURITY DEFINER is not security-critical. However, it seems best for a user testing his own SECURITY DEFINER function to see the same behavior others will see. Back-patch to 8.4 (all supported versions).
  The SQL standards do not conflate roles and users as PostgreSQL does; only SQL roles have members, and only SQL users initiate sessions. An application using PostgreSQL users and roles as SQL users and roles will never attempt to grant membership in the role that is the session user, so the implicit right to add or remove members will never arise.
  The security impact was mostly that a role member could revoke access from others, contrary to the wishes of his own grantor. Unapproved role member additions are less notable, because the member can still largely achieve that by creating a view or a SECURITY DEFINER function.
  Reviewed by Andres Freund and Tom Lane. Reported, independently, by Jonas Sundman and Noah Misch.
  Security: CVE-2014-0060
* Fix unportable coding in BackgroundWorkerStateChange(). (Tom Lane, 2014-02-15)
  PIDs aren't necessarily ints; our usual practice for printing them is to explicitly cast to long. Per buildfarm member rover_firefly.
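  The portability idiom referred to here is simply casting pid_t to long for printf-style formatting, along these lines (the helper and message text are made up):

      #include <sys/types.h>
      #include "postgres.h"

      static void
      report_worker_pid(pid_t pid)
      {
          /* pid_t has no portable printf format; cast to long and use %ld */
          elog(DEBUG1, "background worker started with PID %ld", (long) pid);
      }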
* Fix unportable coding in DetermineSleepTime(). (Tom Lane, 2014-02-15)
  We should not assume that struct timeval.tv_sec is a long, because it ain't necessarily. (POSIX says that it's a time_t, which might well be 64 bits now or in the future; or for that matter might be 32 bits on machines with 64-bit longs.) Per buildfarm member panther.
  Back-patch to 9.3 where the dubious coding was introduced.
* Centralize getopt-related declarations in a new header file pg_getopt.h. (Tom Lane, 2014-02-15)
  We used to have externs for getopt() and its API variables scattered all over the place. Now that we find we're going to need to tweak the variable declarations for Cygwin, it seems like a good idea to have just one place to tweak.
  In this commit, the variables are declared "#ifndef HAVE_GETOPT_H". That may or may not work everywhere, but we'll soon find out.
  Andres Freund
* Change the order that pg_xlog and WAL archive are polled for WAL segments. (Heikki Linnakangas, 2014-02-14)
  If there is a WAL segment with the same ID but different TLI present in both the WAL archive and pg_xlog, prefer the one with the higher TLI. Before this patch, the archive was polled first, for all expected TLIs, and only if no file was found was pg_xlog scanned. This was a change in behavior from 9.2, which first scanned archive and pg_xlog for the highest TLI, then archive and pg_xlog for the next highest TLI and so forth. This patch reverts the behavior back to what it was in 9.2.
  The reason for this is that if, for example, you try to do archive recovery to timeline 2, which branched off timeline 1, but the WAL for timeline 2 is not archived yet, we would replay past the timeline switch point on timeline 1 using the archived files, before even looking at timeline 2's files in pg_xlog.
  Report and patch by Kyotaro Horiguchi. Backpatch to 9.3 where the behavior was changed.
* Add C comment about problems with CHAR() space trimming (Bruce Momjian, 2014-02-13)
* Separate multixact freezing parameters from xid's (Alvaro Herrera, 2014-02-13)
  Previously we were piggybacking on transaction ID parameters to freeze multixacts; but since there isn't necessarily any relationship between rates of Xid and multixact consumption, this turns out not to be a good idea.
  Therefore, we now have multixact-specific freezing parameters:
  - vacuum_multixact_freeze_min_age: when to remove multis as we come across them in vacuum (defaults to 5 million, i.e. early in comparison to Xid's default of 50 million)
  - vacuum_multixact_freeze_table_age: when to force whole-table scans instead of scanning only the pages marked as not all visible in the visibility map (defaults to 150 million, same as for Xids). Whichever of the two reaches the 150 million mark earlier will cause a whole-table scan.
  - autovacuum_multixact_freeze_max_age: when to force emergency, uninterruptible whole-table scans (defaults to 400 million, double that for Xids).
  This means there shouldn't be more frequent emergency vacuuming than previously, unless multixacts are being used very rapidly.
  Backpatch to 9.3 where multixacts were made to persist enough to require freezing. To avoid an ABI break in 9.3, VacuumStmt has a couple of fields in an unnatural place, and StdRdOptions is split in two so that the newly added fields can go at the end.
  Patch by me, reviewed by Robert Haas, with additional input from Andres Freund and Tom Lane.
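  Expressed as postgresql.conf settings, the new parameters with the defaults described above are:

      vacuum_multixact_freeze_min_age = 5000000           # 5 million
      vacuum_multixact_freeze_table_age = 150000000       # 150 million
      autovacuum_multixact_freeze_max_age = 400000000     # 400 million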
* Fix length checking for Unicode identifiers containing escapes (U&"..."). (Tom Lane, 2014-02-13)
  We used the length of the input string, not the de-escaped string, as the trigger for NAMEDATALEN truncation. AFAICS this would only result in sometimes printing a phony truncation warning; but it's just luck that there was no worse problem, since we were violating the API spec for truncate_identifier(). Per bug #9204 from Joshua Yanovski.
  This has been wrong since the Unicode-identifier support was added, so back-patch to all supported branches.
* Rename 'gmake' to 'make' in docs and recommended commands (Bruce Momjian, 2014-02-12)
  This simplifies the docs and makes it easier to cut/paste command lines.
* In XLogReadBufferExtended, don't assume P_NEW yields consecutive pages. (Tom Lane, 2014-02-12)
  In a database that's not yet reached consistency, it's possible that some segments of a relation are not full-size but are not the last ones either. Because of the way smgrnblocks() works, asking for a new page with P_NEW will fill in the last not-full-size segment --- and if that makes it full size, the apparent EOF of the relation will increase by more than one page, so that the next P_NEW request will yield a page past the next consecutive one. This breaks the relation-extension logic in XLogReadBufferExtended, possibly allowing a page update to be applied to some page far past where it was intended to go.
  This appears to be the explanation for reports of table bloat on replication slaves compared to their masters, and probably explains some corrupted-slave reports as well.
  Fix the loop to check the page number it actually got, rather than merely Assert()'ing that dead reckoning got it to the desired place. AFAICT, there are no other places that make assumptions about exactly which page they'll get from P_NEW.
  Problem identified by Greg Stark, though this is not the same as his proposed patch. It's been like this for a long time, so back-patch to all supported branches.
* Get rid of use of dlltool in Mingw builds. (Tom Lane, 2014-02-11)
  We are almost completely out of the dlltool game, if this works.
  Hiroshi Inoue
* Cygwin build fixes. (Tom Lane, 2014-02-11)
  Get rid of use of dlltool for linking the main postgres executable. dlltool is obsolete and we'd prefer to stop depending on it.
  Also, include $(LDAP_LIBS_FE) in $(libpq_pgport). (It's not clear that this is really needed, or why it's not a linker bug if it is needed. But reports are that it's needed on current Cygwin.)
  We might want to back-patch this if it works, but first let's see what the buildfarm thinks.
  Marco Atzeri
* Fix WakeupWaiters() to not wake up an exclusive locker unnecessarily. (Heikki Linnakangas, 2014-02-10)
  WakeupWaiters() is supposed to wake up all LW_WAIT_UNTIL_FREE waiters of the slot, but the loop incorrectly also woke up the first LW_EXCLUSIVE waiter, if there were no LW_WAIT_UNTIL_FREE waiters in the queue.
  Noted by Andres Freund. This code is new in 9.4, so no backpatching.
* Use memmove() instead of memcpy() for copying overlapping regions. (Heikki Linnakangas, 2014-02-10)
  In commit d2495f272cd164ff075bee5c4ce95aed11338a36, I fixed this bug in to_tsquery(), but missed the fact that plainto_tsquery() has the same bug.
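  The underlying rule: memcpy() has undefined behavior when the source and destination regions overlap, while memmove() handles overlap correctly. A tiny illustration of the kind of in-place shift involved (not the tsquery code itself):

      #include <string.h>

      /*
       * Remove element 'idx' from an array of n ints by shifting the tail left.
       * Source and destination overlap, so memmove() is required; memcpy()
       * would be undefined behavior here.
       */
      static void
      delete_element(int *arr, int n, int idx)
      {
          memmove(&arr[idx], &arr[idx + 1], (n - idx - 1) * sizeof(int));
      }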
* Mark some more variables as static or include the appropriate header (Peter Eisentraut, 2014-02-08)
  Detected by clang's -Wmissing-variable-declarations.
  From: Andres Freund <andres@anarazel.de>
* Initialize the entryRes array between each call to triConsistent. (Heikki Linnakangas, 2014-02-07)
  The shimTriConsistentFn, which calls the opclass's consistent function with all combinations of TRUE/FALSE for any MAYBE argument, modifies the entryRes array passed by the caller. Change startScanKey to re-initialize it between each call to accommodate that.
  It's actually a bad habit by shimTriConsistentFn to modify its argument. But the only caller that doesn't already re-initialize the entryRes array was startScanKey, and it's easy for startScanKey to do so. Add a comment to shimTriConsistentFn about that.
  Note: this does not give a free pass to opclass-provided consistent functions to modify the entryRes argument; shimTriConsistent assumes that they don't, even though it does it itself.
  While at it, refactor startScanKey to allocate the requiredEntries and additionalEntries after it knows exactly how large they need to be. Saves a little bit of memory, and looks nicer anyway.
  Per complaint by Tom Lane, buildfarm and the pg_trgm regression test.
* Speed up "rare & frequent" type GIN queries.Heikki Linnakangas2014-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | If you have a GIN query like "rare & frequent", we currently fetch all the items that match either rare or frequent, call the consistent function for each item, and let the consistent function filter out items that only match one of the terms. However, if we can deduce that "rare" must be present for the overall qual to be true, we can scan all the rare items, and for each rare item, skip over to the next frequent item with the same or greater TID. That greatly speeds up "rare & frequent" type queries. To implement that, introduce the concept of a tri-state consistent function, where the 3rd value is MAYBE, indicating that we don't know if that term is present. Operator classes only provide a boolean consistent function, so we simulate the tri-state consistent function by calling the boolean function several times, with the MAYBE arguments set to all combinations of TRUE and FALSE. Testing all combinations is only feasible for a small number of MAYBE arguments, but it is envisioned that we'll provide a way for operator classes to provide a native tri-state consistent function, which can be much more efficient. But that is not included in this patch. We were already using that trick to for lossy pages, calling the consistent function with the lossy entry set to TRUE and FALSE. Now that we have the tri-state consistent function, use it for lossy pages too. Alexander Korotkov, with fair amount of refactoring by me.
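  A standalone sketch of the shim idea described above: with only a boolean consistent function available, evaluate it under every TRUE/FALSE assignment of the MAYBE keys and combine the results. The type and function names are invented for illustration and do not match the GIN code:

      #include <stdbool.h>

      typedef enum { TS_FALSE, TS_TRUE, TS_MAYBE } TriState;

      #define MAX_KEYS   32
      #define MAX_MAYBE  6        /* at most 2^6 = 64 boolean calls */

      /*
       * Simulate a tri-state consistent function on top of a boolean one by
       * trying every TRUE/FALSE assignment for the MAYBE keys.  If all
       * assignments agree, the answer is definite; otherwise it is MAYBE.
       * Assumes nkeys <= MAX_KEYS.
       */
      static TriState
      tri_consistent_shim(bool (*consistent)(const bool *check, int nkeys),
                          const TriState *check, int nkeys)
      {
          bool        cur[MAX_KEYS];
          int         maybe[MAX_KEYS];
          int         nmaybe = 0;
          bool        sawtrue = false;
          bool        sawfalse = false;

          for (int i = 0; i < nkeys; i++)
          {
              cur[i] = (check[i] == TS_TRUE);
              if (check[i] == TS_MAYBE)
                  maybe[nmaybe++] = i;
          }

          /* Too many unknowns to enumerate cheaply: give the safe answer. */
          if (nmaybe > MAX_MAYBE)
              return TS_MAYBE;

          for (unsigned int mask = 0; mask < (1U << nmaybe); mask++)
          {
              for (int j = 0; j < nmaybe; j++)
                  cur[maybe[j]] = (mask & (1U << j)) != 0;

              if (consistent(cur, nkeys))
                  sawtrue = true;
              else
                  sawfalse = true;
          }

          if (sawtrue && sawfalse)
              return TS_MAYBE;
          return sawtrue ? TS_TRUE : TS_FALSE;
      }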
* Fix thinko in comment. (Heikki Linnakangas, 2014-02-07)
  Amit Langote
* In RelationClearRelation, postpone cache reload if !IsTransactionState(). (Tom Lane, 2014-02-06)
  We may process relcache flush requests during transaction startup or shutdown. In general it's not terribly safe to do catalog access at those times, so the code's habit of trying to immediately revalidate unflushable relcache entries is risky. Although there are no field trouble reports that are positively traceable to this, we have been able to demonstrate failure of the assertions recently added in RelationIdGetRelation() and SearchCatCache().
  On the other hand, it seems safe to just postpone revalidation of the cache entry until we're inside a valid transaction. The one case where this is questionable is where we're exiting a subtransaction and the outer transaction is holding the relcache entry open --- but if we made any significant changes to the rel inside such a subtransaction, we've got problems anyway. There are mechanisms in place to prevent that (to wit, locks for cross-session cases and CheckTableNotInUse() for intra-session cases), so let's trust to those mechanisms to keep us out of trouble.