aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
* Clean up password authentication code a bit.Heikki Linnakangas2016-12-08
| | | | | | | | | | | | | | | Commit fe0a0b59, which moved code to do MD5 authentication to a separate CheckMD5Auth() function, left behind a comment that really belongs inside the function, too. Also move the check for db_user_namespace inside the function, seems clearer that way. Now that the md5 salt is passed as argument to md5_crypt_verify, it's a bit silly that it peeks into the Port struct to see if MD5 authentication was used. Seems more straightforward to treat it as an MD5 authentication, if the md5 salt argument is given. And after that, md5_crypt_verify only used the Port argument to look at port->user_name, but that is redundant, because it is also passed as a separate 'role' argument. So remove the Port argument altogether.
* Fix accounting of memory needed for merge heap.Heikki Linnakangas2016-12-08
| | | | | | | | | | | | | | We allegedly allocated all remaining memory for the read buffers of the sort tapes, but we allocated the merge heap only after that. That means that the allocation of the merge heap was guaranteed to go over the memory limit. Fix by allocating the merge heap first. This makes little difference in practice, because the merge heap is tiny, but let's tidy. While we're at it, add a safeguard for the case that we are already over the limit when allocating the read buffers. That shouldn't happen, but better safe than sorry. The memory accounting error was reported off-list by Peter Geoghegan.
* Replace references to COLLATE "en_CA" with COLLATE "POSIX".Robert Haas2016-12-07
| | | | | Another attmempt to fix the tests which were added by commit f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63.
* Replace references to COLLATE "en_US" with COLLATE "C".Robert Haas2016-12-07
| | | | | Commit f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63 is turning the buildfarm red; let's try something hopefully more portable.
* Implement table partitioning.Robert Haas2016-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Table partitioning is like table inheritance and reuses much of the existing infrastructure, but there are some important differences. The parent is called a partitioned table and is always empty; it may not have indexes or non-inherited constraints, since those make no sense for a relation with no data of its own. The children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint. Multiple inheritance is not allowed, and partitioning and inheritance can't be mixed. Partitions can't have extra columns and may not allow nulls unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so tuple-routing ON INSERT triggers are not needed. Tuple routing isn't yet supported for partitions which are foreign tables, and it doesn't handle updates that cross partition boundaries. Currently, tables can be range-partitioned or list-partitioned. List partitioning is limited to a single column, but range partitioning can involve multiple columns. A partitioning "column" can be an expression. Because table partitioning is less general than table inheritance, it is hoped that it will be easier to reason about properties of partitions, and therefore that this will serve as a better foundation for a variety of possible optimizations, including query planner optimizations. The tuple routing based which this patch does based on the implicit partitioning constraints is an example of this, but it seems likely that many other useful optimizations are also possible. Amit Langote, reviewed and tested by Robert Haas, Ashutosh Bapat, Amit Kapila, Rajkumar Raghuwanshi, Corey Huinker, Jaime Casanova, Rushabh Lathia, Erik Rijkers, among others. Minor revisions by me.
* Restore psql's SIGPIPE setting if popen() fails.Tom Lane2016-12-07
| | | | | | Ancient oversight in PageOutput(): if popen() fails, we'd better reset the SIGPIPE handler before returning stdout, because ClosePager() won't. Noticed while fixing the empty-PAGER issue.
* Handle empty or all-blank PAGER setting more sanely in psql.Tom Lane2016-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the PAGER environment variable is set but contains an empty string, psql would pass it to "sh" which would silently exit, causing whatever query output we were printing to vanish entirely. This is quite mystifying; it took a long time for us to figure out that this was the cause of Joseph Brenner's trouble report. Rather than allowing that to happen, we should treat this as another way to specify "no pager". (We could alternatively treat it as selecting the default pager, but it seems more likely that the former is what the user meant to achieve by setting PAGER this way.) Nonempty, but all-white-space, PAGER values have the same behavior, and it's pretty easy to test for that, so let's handle that case the same way. Most other cases of faulty PAGER values will result in the shell printing some kind of complaint to stderr, which should be enough to diagnose the problem, so we don't need to work harder than this. (Note that there's been an intentional decision not to be very chatty about apparent failure returns from the pager process, since that may happen if, eg, the user quits the pager with control-C or some such. I'd just as soon not start splitting hairs about which exit codes might merit making our own report.) libpq's old PQprint() function was already on board with ignoring empty PAGER values, but for consistency, make it ignore all-white-space values as well. It's been like this a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAFfgvXWLOE2novHzYjmQK8-J6TmHz42G8f3X0SORM44+stUGmw@mail.gmail.com
* Fix query cancellation.Heikki Linnakangas2016-12-07
| | | | | | | | | In commit fe0a0b59, the datatype used for MyCancelKey and other variables that store cancel keys were changed from long to uint32, but I missed this one. That broke query cancellation on platforms where long is wider than 32 bits. Report by Andres Freund, fix by Michael Paquier.
* Fix whitespace.Heikki Linnakangas2016-12-07
| | | | Thomas Munro
* Silence compiler warningsStephen Frost2016-12-06
| | | | | | | | | | | | | | | | Rearrange a bit of code to ensure that 'mode' in LWLockRelease is obviously always set, which seems a bit cleaner and avoids a compiler warning (thanks to Robert for the suggestion!). In GetCachedPlan(), initialize 'plan' to silence a compiler warning, but also add an Assert() to make sure we don't ever actually fall through with 'plan' still being set to NULL, since we are about to dereference it. Neither of these appear to be live bugs but at least gcc 5.4.0-6ubuntu1~16.04.4 doesn't quite have the smarts to realize that. Discussion: https://www.postgresql.org/message-id/20161129152102.GR13284%40tamriel.snowman.net
* Fix unsafe assumption that struct timeval.tv_sec is a "long".Tom Lane2016-12-06
| | | | | | | | | | | | It typically is a "long", but it seems possible that on some platforms it wouldn't be. In any case, this silences a compiler warning on OpenBSD (cf buildfarm member curculio). While at it, use snprintf not sprintf. This format string couldn't possibly overrun the supplied buffer, but that doesn't seem like a good reason not to use the safer style. Oversight in commit f828654e1. Back-patch to 9.6 where that came in.
* Fix interaction of parallel query with prepared statements.Robert Haas2016-12-06
| | | | | | | | | | | | | | | | | Previously, a prepared statement created via a Parse message could get a parallel plan, but one created with a PREPARE statement could not. This state of affairs was due to confusion on my (rhaas) part: I erroneously believed that a CREATE TABLE .. AS EXECUTE statement could only be performed with a prepared statement by PREPARE, but in fact one created by a Prepare message works just as well. Therefore, it makes no sense to allow parallel query in one case but not the other. To fix, allow parallel query with all prepared statements, but run the parallel plan serially (i.e. without workers) in the case of CREATE TABLE .. AS EXECUTE. Also, document this. Amit Kapila and Tobias Bussman, plus an extra sentence of documentation by me.
* Bump catversion for restrictive RLS changesStephen Frost2016-12-06
| | | | | | Mea culpa. Pointed out by Andres.
* Remove extraneous semicolon from uses of relptr_declare().Tom Lane2016-12-05
| | | | | | | | | | | | If we're going to write a semicolon after calls of relptr_declare(), then we don't need one inside the macro, and removing it suppresses "empty declaration" warnings from pickier compilers (eg pademelon). While at it, we might as well use relptr() inside relptr_declare(), because otherwise that macro would likely go unused altogether. Also improve the comment, which I for one found unclear, and provide a specific example of intended usage.
* Ensure gatherstate->nextreader is properly initialized.Robert Haas2016-12-05
| | | | | | | | | The previously code worked OK as long as a Gather node was never rescanned, or if it was rescanned, as long as it got at least as many workers on rescan as it had originally. But if the number of workers ever decreased on a rescan, then it could crash. Andreas Seltenreich
* Add support for restrictive RLS policiesStephen Frost2016-12-05
| | | | | | | | | | | | | | | | We have had support for restrictive RLS policies since 9.5, but they were only available through extensions which use the appropriate hooks. This adds support into the grammer, catalog, psql and pg_dump for restrictive RLS policies, thus reducing the cases where an extension is necessary. In passing, also move away from using "AND"d and "OR"d in comments. As pointed out by Alvaro, it's not really appropriate to attempt to make verbs out of "AND" and "OR", so reword those comments which attempted to. Reviewed By: Jeevan Chalke, Dean Rasheed Discussion: https://postgr.es/m/20160901063404.GY4028@tamriel.snowman.net
* dsa: Cope with the possibility that SIZE_MAX is not defined.Robert Haas2016-12-05
| | | | Per buildfarm member gaur and Tom Lane.
* libpq: Fix another bug in 721f7bd3cbccaf8c07cad2707826b83f84694832.Robert Haas2016-12-05
| | | | | | | | If we failed to connect to one or more hosts, and then afterwards we find one that fails to be read-write, the latter error message was clobbering any earlier ones. Repair. Mithun Cy, slightly revised by me.
* Fix race introduced by 6d46f4783efe457f74816a75173eb23ed8930020.Robert Haas2016-12-05
| | | | | | | | | It's possible for the metapage contents to change after we release the lock, so we must read them before releasing the lock. Amit Kapila. Submitted in response to a trouble report from Andreas Seltenreich, though it is not certain this fixes the problem.
* Reduce the default for max_worker_processes back to 8.Robert Haas2016-12-05
| | | | | | | | Commit b460f5d6693103076dc554aa7cbb96e1e53074f9 -- at my suggestion -- increased the default value of max_worker_processes from 8 to 16, on the theory that this would be harmless and convenient for users. Unfortunately, this caused some buildfarm machines with low connection limits to start failing, so apparently it's not harmless after all.
* Fix more DSA problems uncovered by the buildfarm.Robert Haas2016-12-05
| | | | | | | | | | On 32-bit systems, don't try to use 64-bit DSA pointers, because the computation of DSA_MAX_SEGMENT_SIZE overflows Size. Cast 1 to Size before shifting it, so that the compiler doesn't produce a result of the wrong width. In passing, change one use of size_t to Size.
* Try to fix some DSA-related compiler warnings.Robert Haas2016-12-05
| | | | | | | | | Commit 13df76a537cca3b8884911d8fdf7c89a457a8dd3 was overconfident about how portable %016lx is. Some compilers complain because they need %016llx, while platforms where DSA pointers are only 32 bits get unhappy about using a 64-bit format for a 32-bit quantity. Thomas Munro, per an off-list suggestion from me.
* Replace PostmasterRandom() with a stronger source, second attempt.Heikki Linnakangas2016-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new routine, pg_strong_random() for generating random bytes, for use in both frontend and backend. At the moment, it's only used in the backend, but the upcoming SCRAM authentication patches need strong random numbers in libpq as well. pg_strong_random() is based on, and replaces, the existing implementation in pgcrypto. It can acquire strong random numbers from a number of sources, depending on what's available: - OpenSSL RAND_bytes(), if built with OpenSSL - On Windows, the native cryptographic functions are used - /dev/urandom Unlike the current pgcrypto function, the source is chosen by configure. That makes it easier to test different implementations, and ensures that we don't accidentally fall back to a less secure implementation, if the primary source fails. All of those methods are quite reliable, it would be pretty surprising for them to fail, so we'd rather find out by failing hard. If no strong random source is available, we fall back to using erand48(), seeded from current timestamp, like PostmasterRandom() was. That isn't cryptographically secure, but allows us to still work on platforms that don't have any of the above stronger sources. Because it's not very secure, the built-in implementation is only used if explicitly requested with --disable-strong-random. This replaces the more complicated Fortuna algorithm we used to have in pgcrypto, which is unfortunate, but all modern platforms have /dev/urandom, so it doesn't seem worth the maintenance effort to keep that. pgcrypto functions that require strong random numbers will be disabled with --disable-strong-random. Original patch by Magnus Hagander, tons of further work by Michael Paquier and me. Discussion: https://www.postgresql.org/message-id/CAB7nPqRy3krN8quR9XujMVVHYtXJ0_60nqgVc6oUk8ygyVkZsA@mail.gmail.com Discussion: https://www.postgresql.org/message-id/CAB7nPqRWkNYRRPJA7-cF+LfroYV10pvjdz6GNvxk-Eee9FypKA@mail.gmail.com
* Fix incorrect output from gin_desc().Fujii Masao2016-12-05
| | | | | | | | | | | | | | | | | | | | | Previously gin_desc() displayed incorrect output "unknown action 0" for XLOG_GIN_INSERT and XLOG_GIN_VACUUM_DATA_LEAF_PAGE records with valid actions. The cause of this problem was that gin_desc() wrongly used XLogRecGetData() to extract data from those records. Since they were registered by XLogRegisterBufData(), gin_desc() should have used XLogRecGetBlockData(), instead, like gin_redo(). Also there were other differences about how to treat XLOG_GIN_INSERT record between gin_desc() and gin_redo(). This commit fixes gin_desc() routine so that it treats those records in the same way as gin_redo(). Batch-patch to 9.5 where WAL record format was revamped and XLogRegisterBufData() was added. Reported-By: Andres Freund Reviewed-By: Tom Lane Discussion: <20160509194645.7lewnpw647zegx2m@alap3.anarazel.de>
* Don't mess up pstate->p_next_resno in transformOnConflictClause().Tom Lane2016-12-04
| | | | | | | | | | | | | | | | | | | | | | | | | transformOnConflictClause incremented p_next_resno while generating the phony targetlist for the EXCLUDED pseudo-rel. Then that field got incremented some more during transformTargetList, possibly leading to free_parsestate concluding that we'd overrun the allowed length of a tlist, as reported by Justin Pryzby. We could fix this by resetting p_next_resno to 1 after using it for the EXCLUDED pseudo-rel tlist, but it seems easier and less coupled to other places if we just don't use that field at all in this loop. (Note that this doesn't change anything about the resnos that end up appearing in the main target list, because those are all replaced with target-column numbers by updateTargetListEntry.) In passing, fix incorrect type OID assigned to the whole-row Var for "EXCLUDED.*" (somehow this escaped having any bad consequences so far, but it's certainly wrong); remove useless assignment to var->location; pstrdup the column names in case of a relcache flush; and improve nearby comments. Back-patch to 9.5 where ON CONFLICT was introduced. Report: https://postgr.es/m/20161204163237.GA8030@telsasoft.com
* Document recipe for testing compatibility with old Perl.Noah Misch2016-12-04
| | | | Craig Ringer, reviewed by Kyotaro HORIGUCHI and Michael Paquier.
* Make pgwin32_putenv() probe every known CRT, regardless of compiler.Noah Misch2016-12-04
| | | | | | | This extends to MinGW builds the provision for MSVC-built libraries to see putenv() effects. Doing so repairs, for example, the handling of the krb_server_keyfile parameter when linked with MSVC-built MIT Kerberos. Like the previous commit, no back-patch.
* Make pgwin32_putenv() follow DLL loading and unloading.Noah Misch2016-12-03
| | | | | | | | | | | | | | | | | | | | | | | Until now, the first putenv() call of a given postgres.exe process would cache the set of loaded CRTs. If a CRT unloaded after that call, the next putenv() would crash. That risk was largely theoretical, because the first putenv() precedes all PostgreSQL-initiated module loading. However, this might explain bad interactions with antivirus and other software that injects threads asynchronously. If an additional CRT loaded after the first putenv(), pgwin32_putenv() would not discover it. That CRT would have all environment changes predating its load, but it would not receive later PostgreSQL-initiated changes. An additional CRT loading concurrently with the first putenv() might miss that change in addition to missing later changes. Fix all those problems. This removes the cache mechanism from pgwin32_putenv(); the cost, less than 100 μs per backend startup, is negligible. No resulting misbehavior was known to be user-visible given the core distribution alone, but one can readily construct an affected extension module. No back-patch given the lack of complaints and the potential for behavior changes in non-PostgreSQL code running in the backend. Christian Ullrich, reviewed by Michael Paquier.
* Make pgwin32_putenv() visit debug CRTs.Noah Misch2016-12-03
| | | | | | | | | | This has no effect in the most conventional case, where no relevant DLL uses a debug build. For an example where it does matter, given a debug build of MIT Kerberos, the krb_server_keyfile parameter usually had no effect. Since nobody wants a Heisenbug, back-patch to 9.2 (all supported versions). Christian Ullrich, reviewed by Michael Paquier.
* Remove wrong CloseHandle() call.Noah Misch2016-12-03
| | | | | | | | | | | In accordance with its own documentation, invoke CloseHandle() only when directed in the documentation for the function that furnished the handle. GetModuleHandle() does not so direct. We have been issuing this call only in the rare event that a CRT DLL contains no "_putenv" symbol, so lack of bug reports is uninformative. Back-patch to 9.2 (all supported versions). Christian Ullrich, reviewed by Michael Paquier.
* Refine win32env.c cosmetics.Noah Misch2016-12-03
| | | | | | | | | Replace use of plain 0 as a null pointer constant. In comments, update terminology and lessen redundancy. Back-patch to 9.2 (all supported versions) for the convenience of back-patching the next two commits. Christian Ullrich and Noah Misch, reviewed (in earlier versions) by Michael Paquier.
* Fix broken wait-for-previous-process-to-exit loop in regression test.Tom Lane2016-12-02
| | | | | | Must do pg_stat_clear_snapshot() inside test's loop, or our snapshot of pg_stat_activity will never change :-(. Thinko in b3427dade -- evidently my workstation never really iterated the loop in testing. Per buildfarm.
* Fix thinko in b3427dade14cc31eb48740bc9ea98b5954470b24.Robert Haas2016-12-02
|
* Delete deleteWhatDependsOn() in favor of more performDeletion() flag bits.Tom Lane2016-12-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | deleteWhatDependsOn() had grown an uncomfortably large number of assumptions about what it's used for. There are actually only two minor differences between what it does and what a regular performDeletion() call can do, so let's invent additional bits in performDeletion's existing flags argument that specify those behaviors, and get rid of deleteWhatDependsOn() as such. (We'd probably have done it this way from the start, except that performDeletion didn't originally have a flags argument, IIRC.) Also, add a SKIP_EXTENSIONS flag bit that prevents ever recursing to an extension, and use that when dropping temporary objects at session end. This provides a more general solution to the problem addressed in a hacky way in commit 08dd23cec: if an extension script creates temp objects and forgets to remove them again, the whole extension went away when its contained temp objects were deleted. The previous solution only covered temp relations, but this solves it for all object types. These changes require minor additions in dependency.c to pass the flags to subroutines that previously didn't get them, but it's still a net savings of code, and it seems cleaner than before. Having done this, revert the special-case code added in 08dd23cec that prevented addition of pg_depend records for temp table extension membership, because that caused its own oddities: dropping an extension that had created such a table didn't automatically remove the table, leading to a failure if the table had another dependency on the extension (such as use of an extension data type), or to a duplicate-name failure if you then tried to recreate the extension. But we keep the part that prevents the pg_temp_nnn schema from becoming an extension member; we never want that to happen. Add a regression test case covering these behaviors. Although this fixes some arguable bugs, we've heard few field complaints, and any such problems are easily worked around by explicitly dropping temp objects at the end of extension scripts (which seems like good practice anyway). So I won't risk a back-patch. Discussion: https://postgr.es/m/e51f4311-f483-4dd0-1ccc-abec3c405110@BlueTreble.com
* Introduce dynamic shared memory areas.Robert Haas2016-12-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | Programmers discovered decades ago that it was useful to have a simple interface for allocating and freeing memory, which is why malloc() and free() were invented. Unfortunately, those handy tools don't work with dynamic shared memory segments because those are specific to PostgreSQL and are not necessarily mapped at the same address in every cooperating process. So invent our own allocator instead. This makes it possible for processes cooperating as part of parallel query execution to allocate and free chunks of memory without having to reserve them prior to the start of execution. It could also be used for longer lived objects; for example, we could consider storing data for pg_stat_statements or the stats collector in shared memory using these interfaces, rather than writing them to files. Basically, anything that needs shared memory but can't predict in advance how much it's going to need might find this useful. Thomas Munro and Robert Haas. The original code (of mine) on which Thomas based his work was actually designed to be a new backend-local memory allocator for PostgreSQL, but that hasn't gone anywhere - or not yet, anyway. Thomas took that work and performed major refactoring and extensive modifications to make it work with dynamic shared memory, including the addition of appropriate locking. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
* Management of free memory pages.Robert Haas2016-12-02
| | | | | | | | | | | | | | | | | This is intended as infrastructure for a full-fledged allocator for dynamic shared memory. The interface looks a bit like a real allocator, but only supports allocating and freeing memory in multiples of the 4kB page size. Further, to free memory, you must know the size of the span you wish to free, in pages. While these are make it unsuitable as an allocator in and of itself, it still serves as very useful scaffolding for a full-fledged allocator. Robert Haas and Thomas Munro. This code is mostly the same as my 2014 submission, but Thomas fixed quite a few bugs and made some changes to the interface. Discussion: CA+TgmobkeWptGwiNa+SGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A@mail.gmail.com Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
* Add a crude facility for dealing with relative pointers.Robert Haas2016-12-02
| | | | | | | | | | | | | | | | | | C doesn't have any sort of built-in understanding of a pointer relative to some arbitrary base address, but dynamic shared memory segments can be mapped at different addresses in different processes, so any sort of shared data structure stored within a dynamic shared memory segment can't use absolute pointers. We could use something like Size to represent a relative pointer, but then the compiler provides no type-checking. Use stupid macro tricks to get some type-checking. Patch originally by me. Concept suggested by Andres Freund. Recently resubmitted as part of Thomas Munro's work on dynamic shared memory allocation. Discussion: 20131205144434.GG12398@alap2.anarazel.de Discussion: CAEepm=1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA@mail.gmail.com
* Fix outdated commentsAlvaro Herrera2016-12-02
| | | | | | | Commit 597a87ccc9a6f neglected to update some comments; fix. Report and patch by Thomas Munro. Reviewed by Petr Jelínek.
* Add max_parallel_workers GUC.Robert Haas2016-12-02
| | | | | | | | | | Increase the default value of the existing max_worker_processes GUC from 8 to 16, and add a new max_parallel_workers GUC with a maximum of 8. This way, even if the maximum amount of parallel query is happening, there is still room for background workers that do other things, as originally envisioned when max_worker_processes was added. Julien Rouhaud, reviewed by Amit Kapila and by revised by me.
* Fix Windows build for 78c8c814390fAlvaro Herrera2016-12-02
| | | | Author: Petr Jelínek
* Permit dump/reload of not-too-large >1GB tuplesAlvaro Herrera2016-12-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our documentation states that our maximum field size is 1 GB, and that our maximum row size of 1.6 TB. However, while this might be attainable in theory with enough contortions, it is not workable in practice; for starters, pg_dump fails to dump tables containing rows larger than 1 GB, even if individual columns are well below the limit; and even if one does manage to manufacture a dump file containing a row that large, the server refuses to load it anyway. This commit enables dumping and reloading of such tuples, provided two conditions are met: 1. no single column is larger than 1 GB (in output size -- for bytea this includes the formatting overhead) 2. the whole row is not larger than 2 GB There are three related changes to enable this: a. StringInfo's API now has two additional functions that allow creating a string that grows beyond the typical 1GB limit (and "long" string). ABI compatibility is maintained. We still limit these strings to 2 GB, though, for reasons explained below. b. COPY now uses long StringInfos, so that pg_dump doesn't choke trying to emit rows longer than 1GB. c. heap_form_tuple now uses the MCXT_ALLOW_HUGE flag in its allocation for the input tuple, which means that large tuples are accepted on input. Note that at this point we do not apply any further limit to the input tuple size. The main reason to limit to 2 GB is that the FE/BE protocol uses 32 bit length words to describe each row; and because the documentation is ambiguous on its signedness and libpq does consider it signed, we cannot use the highest-order bit. Additionally, the StringInfo API uses "int" (which is 4 bytes wide in most platforms) in many places, so we'd need to change that API too in order to improve, which has lots of fallout. Backpatch to 9.5, which is the oldest that has MemoryContextAllocExtended, a necessary piece of infrastructure. We could apply to 9.4 with very minimal additional effort, but any further than that would require backpatching "huge" allocations too. This is the largest set of changes we could find that can be back-patched without breaking compatibility with existing systems. Fixing a bigger set of problems (for example, dumping tuples bigger than 2GB, or dumping fields bigger than 1GB) would require changing the FE/BE protocol and/or changing the StringInfo API in an ABI-incompatible way, neither of which would be back-patchable. Authors: Daniel Vérité, Álvaro Herrera Reviewed by: Tomas Vondra Discussion: https://postgr.es/m/20160229183023.GA286012@alvherre.pgsql
* Refactor libpqwalreceiverPeter Eisentraut2016-12-01
| | | | | | | | | | | | | | | | The whole walreceiver API is now wrapped into a struct, like most of our other loadable module APIs. The libpq connection is no longer a global variable in libpqwalreceiver. Instead, it is encapsulated into a struct that is passed around the functions. This allows multiple walreceivers to run at the same time. Add some rudimentary support for logical replication connections to libpqwalreceiver. These changes are mostly cosmetic and are going to be useful for the future logical replication patches. From: Petr Jelinek <petr@2ndquadrant.com>
* Use latch instead of select() in walreceiverPeter Eisentraut2016-12-01
| | | | | | | | | Replace use of poll()/select() by WaitLatchOrSocket(), which is more portable and flexible. Also change walreceiver to use its procLatch instead of a custom latch. From: Petr Jelinek <petr@2ndquadrant.com>
* Add aggregate_with_argtypes and use it consistentlyPeter Eisentraut2016-12-01
| | | | | | | | This works like function_with_argtypes, but aggregates allow slightly different arguments. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
* Move function_with_argtypes to a better locationPeter Eisentraut2016-12-01
| | | | | | | | It was apparently added for use by GRANT/REVOKE, but move it closer to where other function signature related things are kept. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
* Use grammar symbol function_with_argtypes consistentlyPeter Eisentraut2016-12-01
| | | | | | | | | Instead of sometimes referring to a function signature like func_name func_args, use the existing function_with_argtypes symbol, which combines the two. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
* libpq: Fix inadvertent change in PQhost() behavior.Robert Haas2016-12-01
| | | | | | | | | | | Commit 274bb2b3857cc987cfa21d14775cae9b0dababa5 caused PQhost() to return the value of the hostaddr parameter rather than the relevant host when the latter parameter was specified. That's wrong. Commit 9a1d0af4ad2cbd419115b453d811c141b80d872b then amplified the damage by using PQhost() in more places, so that the SSL test suite started failing. Report by Andreas Karlsson; patch by me.
* User narrower representative tuples in the hash-agg hashtable.Andres Freund2016-11-30
| | | | | | | | | | | | | | | | | | | So far the hashtable stored representative tuples in the form of its input slot, with all columns in the hashtable that are not needed (i.e. not grouped upon or functionally dependent) set to NULL. Thats good for saving memory, but it turns out that having tuples full of NULL isn't free. slot_deform_tuple is faster if there's no NULL bitmap even if no NULLs are encountered, and skipping over leading NULLs isn't free. So compute a separate tuple descriptor that only contains the needed columns. As columns have already been moved in/out the slot for the hashtable that does not imply additional per-row overhead. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de
* Perform one only projection to compute agg arguments.Andres Freund2016-11-30
| | | | | | | | | | | | | | | | Previously we did a ExecProject() for each individual aggregate argument. That turned out to be a performance bottleneck in queries with multiple aggregates. Doing all the argument computations in one ExecProject() is quite a bit cheaper because ExecProject's fastpath can do the work at once in a relatively tight loop, and because it can get all the required columns with a single slot_getsomeattr and save some other redundant setup costs. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de
* Improve hash index bucket split behavior.Robert Haas2016-11-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the right to split a bucket was represented by a heavyweight lock on the page number of the primary bucket page. Unfortunately, this meant that every scan needed to take a heavyweight lock on that bucket also, which was bad for concurrency. Instead, use a cleanup lock on the primary bucket page to indicate the right to begin a split, so that scans only need to retain a pin on that page, which is they would have to acquire anyway, and which is also much cheaper. In addition to reducing the locking cost, this also avoids locking out scans and inserts for the entire lifetime of the split: while the new bucket is being populated with copies of the appropriate tuples from the old bucket, scans and inserts can happen in parallel. There are minor concurrency improvements for vacuum operations as well, though the situation there is still far from ideal. This patch also removes the unworldly assumption that a split will never be interrupted. With the new code, a split is done in a series of small steps and the system can pick up where it left off if it is interrupted prior to completion. While this patch does not itself add write-ahead logging for hash indexes, it is clearly a necessary first step, since one of the things that could interrupt a split is the removal of electrical power from the machine performing it. Amit Kapila. I wrote the original design on which this patch is based, and did a good bit of work on the comments and README through multiple rounds of review, but all of the code is Amit's. Also reviewed by Jesper Pedersen, Jeff Janes, and others. Discussion: http://postgr.es/m/CAA4eK1LfzcZYxLoXS874Ad0+S-ZM60U9bwcyiUZx9mHZ-KCWhw@mail.gmail.com