path: root/src
...
* Allow emit_log_hook to see original message text (Simon Riggs, 2016-03-11)

  emit_log_hook could only see the translated text, making it harder to identify which message was being sent. Pass original text to allow the exact message to be identified, whichever language is used for logging.

  Discussion: 20160216.184755.59721141.horiguchi.kyotaro@lab.ntt.co.jp
  Author: Kyotaro Horiguchi

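  A minimal sketch of a hook using the new capability, assuming the untranslated text is exposed as an ErrorData field named message_id; the matched message string is illustrative:

      #include "postgres.h"
      #include "fmgr.h"
      #include "utils/elog.h"

      PG_MODULE_MAGIC;

      static emit_log_hook_type prev_emit_log_hook = NULL;

      static void
      my_emit_log_hook(ErrorData *edata)
      {
          /* Compare against the original (untranslated) message text, so
           * the match works whatever lc_messages is set to. */
          if (edata->message_id != NULL &&
              strcmp(edata->message_id,
                     "canceling statement due to statement timeout") == 0)
              edata->output_to_server = true;

          if (prev_emit_log_hook)
              prev_emit_log_hook(edata);
      }

      void
      _PG_init(void)
      {
          prev_emit_log_hook = emit_log_hook;
          emit_log_hook = my_emit_log_hook;
      }
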
* Simplify GetLockNameFromTagType. (Robert Haas, 2016-03-10)

  The old code is wrong, because it returns a pointer to an automatic variable. And it's also more clever than we really need to be, considering that the case it's worrying about should never happen.

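  The bug pattern in question, shown generically (not the actual PostgreSQL code):

      #include <stdio.h>

      /* Returning a pointer to an automatic variable: the buffer dies when
       * the function returns, so the caller gets a dangling pointer. */
      static const char *
      lock_name_broken(int tag)
      {
          char buf[32];                 /* automatic storage */

          snprintf(buf, sizeof(buf), "unknown lock type %d", tag);
          return buf;                   /* wrong: buf is gone after return */
      }

  The fix is simply to return pointers to string constants (or static storage) instead.
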
* Blindly try to fix dtrace enabled builds, broken in 9cd00c45. (Andres Freund, 2016-03-10)

  Reported-By: Peter Eisentraut
  Discussion: 56E2239E.1050607@gmx.net

* Checkpoint sorting and balancing. (Andres Freund, 2016-03-10)

  Up to now the buffers written during a checkpoint were written in the order they appear in the BufferDescriptors. That's nearly random in a lot of cases, which performs badly on rotating media, but even on SSDs it causes slowdowns. To avoid that, sort the buffers before writing them out. We currently sort by tablespace, relfilenode, fork and block number.

  One of the major reasons that previously wasn't done was fear of imbalance between tablespaces. To address that, writes are now balanced between tablespaces. The other prime concern was that the relatively large allocation needed to sort the buffers might fail, preventing checkpoints from happening. Thus pre-allocate the required memory in shared memory, at server startup.

  This particularly makes it more efficient to have checkpoint flushing enabled, because that'll often result in a lot of writes that can be coalesced into one flush.

  Discussion: alpine.DEB.2.10.1506011320000.28433@sto
  Author: Fabien Coelho and Andres Freund

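  A sketch of the sort order described above, assuming PostgreSQL's Oid and BlockNumber typedefs; the struct layout is an illustration, not necessarily the exact one added by the patch:

      #include "postgres.h"
      #include "storage/block.h"

      typedef struct CkptSortItem
      {
          Oid         tsId;           /* tablespace */
          Oid         relNode;        /* relfilenode */
          int         forkNum;        /* fork */
          BlockNumber blockNum;       /* block number */
      } CkptSortItem;

      /* qsort(3) comparator: tablespace, then relfilenode, fork, block */
      static int
      ckpt_buforder_comparator(const void *pa, const void *pb)
      {
          const CkptSortItem *a = (const CkptSortItem *) pa;
          const CkptSortItem *b = (const CkptSortItem *) pb;

          if (a->tsId != b->tsId)
              return (a->tsId < b->tsId) ? -1 : 1;
          if (a->relNode != b->relNode)
              return (a->relNode < b->relNode) ? -1 : 1;
          if (a->forkNum != b->forkNum)
              return (a->forkNum < b->forkNum) ? -1 : 1;
          if (a->blockNum != b->blockNum)
              return (a->blockNum < b->blockNum) ? -1 : 1;
          return 0;
      }
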
* Allow to trigger kernel writeback after a configurable number of writes. (Andres Freund, 2016-03-10)

  Currently writes to the main data files of postgres all go through the OS page cache. This means that some operating systems can end up collecting a large number of dirty buffers in their respective page caches. When these dirty buffers are flushed to storage rapidly, be it because of fsync(), timeouts, or dirty ratios, latency for other reads and writes can increase massively. This is the primary reason for regular massive stalls observed in real world scenarios and artificial benchmarks; on rotating disks stalls on the order of hundreds of seconds have been observed.

  On Linux it is possible to control this by reducing the global dirty limits significantly, which mitigates the above problem. But global configuration is rather problematic because it'll affect other applications; also PostgreSQL itself doesn't always want this behavior, e.g. for temporary files it's undesirable.

  Several operating systems allow some control over the kernel page cache. Linux has sync_file_range(2); several POSIX systems have msync(2) and posix_fadvise(2). sync_file_range(2) is preferable because it requires no special setup, whereas msync() requires the to-be-flushed range to be mmap'ed. For the purpose of flushing dirty data posix_fadvise(2) is the worst alternative, as flushing dirty data is just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages from the page cache. Thus the feature is enabled by default only on Linux, but can be enabled on all systems that have any of the above APIs.

  While desirable and likely possible, this patch does not contain an implementation for Windows.

  With the infrastructure added, writes made via checkpointer, bgwriter and normal user backends can be flushed after a configurable number of writes. Each of these sources of writes is controlled by a separate GUC: checkpointer_flush_after, bgwriter_flush_after and backend_flush_after respectively. They're separate because the number of writes after which flushing is beneficial differs for each, and because the performance considerations of controlled flushing differ for each of them.

  A later patch will add checkpoint sorting - after that, flushes from the checkpoint will almost always be desirable. Bgwriter flushes are most of the time going to be random, which are slow on lots of storage hardware. Flushing in backends works well if the storage and bgwriter can keep up, but if not it can have negative consequences. This patch is likely to have negative performance consequences without checkpoint sorting, but unfortunately so has sorting without flush control.

  Discussion: alpine.DEB.2.10.1506011320000.28433@sto
  Author: Fabien Coelho and Andres Freund

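  A hedged sketch of the Linux path; the wrapper name is invented for illustration:

      #define _GNU_SOURCE
      #include <fcntl.h>

      /* Ask the kernel to start writing back a range of an already-written
       * file, without blocking and without evicting it from the page cache. */
      static void
      flush_file_range(int fd, off_t offset, off_t nbytes)
      {
          /* SYNC_FILE_RANGE_WRITE initiates writeback only; it does not
           * wait for completion the way fsync() would. */
          (void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
      }
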
* Give pull_var_clause() reject/recurse/return behavior for WindowFuncs too. (Tom Lane, 2016-03-10)

  All along, this function should have treated WindowFuncs in a manner similar to Aggrefs, ie with an option whether or not to recurse into them. By not considering the case, it was always recursing, which is OK for most callers (although I suspect that the case in prepare_sort_from_pathkeys might represent a bug). But now we need return-without-recursing behavior as well. There are also more than a few callers that should never see a WindowFunc, and now we'll get some error checking on that.

* Don't vacuum all-frozen pages. (Robert Haas, 2016-03-10)

  Commit a892234f830e832110f63fc0a2afce2fb21d1584 gave us enough infrastructure to avoid vacuuming pages where every tuple on the page is already frozen. So, replace the notion of a scan_all or whole-table vacuum with the less onerous notion of an "aggressive" vacuum, which will still scan pages that are all-visible but skip those that are all-frozen.

  This should greatly reduce the cost of anti-wraparound vacuuming on large clusters where the majority of data is never touched between one cycle and the next, because we'll no longer have to read all of those pages only to find out that we don't need to do anything with them.

  Patch by me, reviewed by Masahiko Sawada.

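  The skipping decision, sketched; the function and flag names follow the visibility-map API added by a892234, but this is not the literal vacuumlazy.c code:

      #include "postgres.h"
      #include "access/visibilitymap.h"
      #include "utils/rel.h"

      static bool
      can_skip_page(Relation rel, BlockNumber blkno, Buffer *vmbuffer,
                    bool aggressive)
      {
          uint8 vmstatus = visibilitymap_get_status(rel, blkno, vmbuffer);

          if (aggressive)
              /* must still visit all-visible pages, to freeze them */
              return (vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0;
          else
              /* a regular vacuum can skip anything all-visible */
              return (vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0;
      }
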
* Refactor pull_var_clause's API to make it less tedious to extend. (Tom Lane, 2016-03-10)

  In commit 1d97c19a0f748e94 and later c1d9579dd8bf3c92, we extended pull_var_clause's API by adding enum-type arguments. That's sort of a pain to maintain, though, because it means every time we add a new behavior we must touch every last one of the call sites, even if there's a reasonable default behavior that most of them could use. Let's switch over to using a bitmask of flags, instead; that seems more maintainable and might save a nanosecond or two as well. This commit changes no behavior in itself, though I'm going to follow it up with one that does add a new behavior.

  In passing, remove flatten_tlist(), which has not been used since 9.1 and would otherwise need the same API changes.

  Removing these enums means that optimizer/tlist.h no longer needs to depend on optimizer/var.h. Changing that caused a number of C files to need addition of #include "optimizer/var.h" (probably we can thank old runs of pgrminclude for that); but on balance it seems like a good change anyway.

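  Under the bitmask API a caller simply ORs together the behaviors it wants; flag names follow the PVC_* convention this commit introduces (the WindowFunc flag arrives with the follow-up commit above), and node here stands for some expression tree in scope:

      /* Collect Vars, recursing into aggregates and window functions,
       * and including PlaceHolderVars in the result. */
      List *vars = pull_var_clause(node,
                                   PVC_RECURSE_AGGREGATES |
                                   PVC_RECURSE_WINDOWFUNCS |
                                   PVC_INCLUDE_PLACEHOLDERS);
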
* Rework wait for AccessExclusiveLocks on Hot Standby (Simon Riggs, 2016-03-10)

  The earlier version, committed in 9.0, caused spurious waits in some cases. The new infrastructure for lock waits added in 9.3 is used to correct and improve this.

  Jeff Janes, based upon a proposal by Simon Riggs, who also reviewed.
  Additional review comments from Amit Kapila.

* Provide much better wait information in pg_stat_activity. (Robert Haas, 2016-03-10)

  When a process is waiting for a heavyweight lock, we will now indicate the type of heavyweight lock for which it is waiting. Also, you can now see when a process is waiting for a lightweight lock - in which case we will indicate the individual lock name or the tranche, as appropriate - or for a buffer pin.

  Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful discussion and suggestions by many others, including Alexander Korotkov and Vladimir Borodin.

* Avoid crash on old Windows with AVX2-capable CPU for VS2013 builds (Magnus Hagander, 2016-03-10)

  The Visual Studio 2013 CRT generates invalid code when it makes a 64-bit build that is later used on a CPU that supports AVX2 instructions, using a version of Windows before 7SP1/2008R2SP1. Detect this combination, and in those cases turn off the generation of FMA3, per recommendation from the Visual Studio team.

  The bug is actually in the CRT shipping with Visual Studio 2013, but Microsoft have stated they're only fixing it in newer major versions. The fix is therefore conditioned specifically on being built with this version of Visual Studio, and not previous or later versions.

  Author: Christian Ullrich

* Reduce size of two phase file header (Simon Riggs, 2016-03-10)

  Previously the 2PC header was fixed at 200 bytes, which in most cases wasted WAL space for a workload using 2PC heavily.

  Pavan Deolasee, reviewed by Petr Jelinek

* Reduce lock level for altering fillfactor (Simon Riggs, 2016-03-10)

  Fabrízio de Royes Mello and Simon Riggs

* Code review for b6fb6471f6afaf649e52f38269fd8c5c60647669. (Robert Haas, 2016-03-10)

  Reports by Tomas Vondra, Vinayak Pokale, and Aleksander Alekseev. Patch by Amit Langote.

* Remove a couple of useless pstrdup() calls. (Tom Lane, 2016-03-09)

  There's no point in pstrdup'ing the result of TextDatumGetCString, since that's necessarily already a freshly-palloc'd C string. These particular calls are unlikely to be of any consequence performance-wise, but still they're a bad precedent that can confuse future patch authors.

  Noted by Chapman Flack.

* Avoid unlikely data-loss scenarios due to rename() without fsync. (Andres Freund, 2016-03-09)

  Renaming a file using rename(2) is not guaranteed to be durable in the face of crashes. Use the previously added durable_rename()/durable_link_or_rename() in various places where we previously just renamed files.

  Most of the changed call sites are arguably not critical, but it seems better to err on the side of too much durability. The most prominent known case where the previously missing fsyncs could cause data loss is crashes at the end of a checkpoint. After the actual checkpoint has been performed, old WAL files are recycled. When they're filled, their contents are fdatasynced, but we did not fsync the containing directory. An OS/hardware crash in an unfortunate moment could then end up leaving that file with its old name, but new content; WAL replay would thus not replay it.

  Reported-By: Tomas Vondra
  Author: Michael Paquier, Tomas Vondra, Andres Freund
  Discussion: 56583BDD.9060302@2ndquadrant.com
  Backpatch: All supported branches

* Introduce durable_rename() and durable_link_or_rename(). (Andres Freund, 2016-03-09)

  Renaming a file using rename(2) is not guaranteed to be durable in the face of crashes, especially on filesystems like xfs and ext4 when mounted with data=writeback. To be certain that a rename() atomically replaces the previous file contents in the face of crashes and different filesystems, one has to fsync the old filename, rename the file, fsync the new filename, and fsync the containing directory; a sketch of the sequence appears below. This sequence is not generally adhered to currently, which exposes us to data loss risks. To avoid having to repeat this arduous sequence, introduce durable_rename(), which wraps all that.

  Also add durable_link_or_rename(). Several places use link() (with a fallback to rename()) to rename a file, trying to avoid replacing the target file out of paranoia. Some of those rename sequences need to be durable as well. There seems little reason to extend several copies of the same logic, so centralize the link() callers.

  This commit does not yet make use of the new functions; they're used in a followup commit.

  Author: Michael Paquier, Andres Freund
  Discussion: 56583BDD.9060302@2ndquadrant.com
  Backpatch: All supported branches

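  The sequence in plain POSIX calls, as a sketch; the real durable_rename() additionally reports errors via ereport and handles platform quirks:

      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>

      static int
      fsync_path(const char *path, int flags)
      {
          int fd = open(path, flags);

          if (fd < 0)
              return -1;
          if (fsync(fd) != 0)
          {
              close(fd);
              return -1;
          }
          return close(fd);
      }

      static int
      durable_rename_sketch(const char *oldfile, const char *newfile,
                            const char *dir)
      {
          if (fsync_path(oldfile, O_RDWR) != 0)    /* 1: flush old file */
              return -1;
          if (rename(oldfile, newfile) != 0)       /* 2: rename */
              return -1;
          if (fsync_path(newfile, O_RDWR) != 0)    /* 3: flush new name */
              return -1;
          return fsync_path(dir, O_RDONLY);        /* 4: flush directory */
      }
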
* PostgresNode: add backup_fs_hot and backup_fs_cold (Alvaro Herrera, 2016-03-09)

  These simple methods rely on RecursiveCopy to create a filesystem-level backup of a server. They aren't currently used anywhere yet, but will be useful for future tests.

  Author: Craig Ringer
  Reviewed-By: Michael Paquier, Salvador Fandino, Álvaro Herrera
  Commitfest-URL: https://commitfest.postgresql.org/9/569/

* Add filter capability to RecursiveCopy::copypath (Alvaro Herrera, 2016-03-09)

  This allows skipping copying certain files and subdirectories in tests. This is useful in some circumstances such as copying a data directory; future tests want this feature.

  Also POD-ify the module.

  Authors: Craig Ringer, Pallavi Sontakke
  Reviewed-By: Álvaro Herrera

* Fix incorrect handling of NULL index entries in indexed ROW() comparisons. (Tom Lane, 2016-03-09)

  An index search using a row comparison such as ROW(a, b) > ROW('x', 'y') would stop upon reaching a NULL entry in the "b" column, ignoring the fact that there might be non-NULL "b" values associated with later values of "a". This happens because _bt_mark_scankey_required() marks the subsidiary scankey for "b" as required, which is just wrong: it's for a column after the one with the first inequality key (namely "a"), and thus can't be considered a required match.

  This bit of brain fade dates back to the very beginnings of our support for indexed ROW() comparisons, in 2006. Kind of astonishing that no one came across it before Glen Takahashi, in bug #14010.

  Back-patch to all supported versions.

  Note: the given test case doesn't actually fail in unpatched 9.1, evidently because the fix for bug #6278 (i.e., stopping at nulls in either scan direction) is required to make it fail. I'm sure I could devise a case that fails in 9.1 as well, perhaps with something involving making a cursor back up; but it doesn't seem worth the trouble.

* Re-pgindent vacuumlazy.c. (Robert Haas, 2016-03-09)

* pgbench: When -T is used, don't wait for transactions beyond end of run. (Robert Haas, 2016-03-09)

  At low rates, this can lead to pgbench taking significantly longer to terminate than the user might expect. Repair.

  Fabien Coelho, reviewed by Aleksander Alekseev, Álvaro Herrera, and me.

* Add a generic command progress reporting facility. (Robert Haas, 2016-03-09)

  Using this facility, any utility command can report the target relation upon which it is operating, if there is one, and up to 10 64-bit counters; the intent of this is that users should be able to figure out what a utility command is doing without having to resort to ugly hacks like attaching strace to a backend.

  As a demonstration, this adds very crude reporting to lazy vacuum; we just report the target relation and nothing else. A forthcoming patch will make VACUUM report a bunch of additional data that will make this much more interesting. But this gets the basic framework in place.

  Vinayak Pokale, Rahila Syed, Amit Langote, Robert Haas, reviewed by Kyotaro Horiguchi, Jim Nasby, Thom Brown, Masahiko Sawada, Fujii Masao, and Masanori Oyama.

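  Usage from a utility command looks roughly like this; the counter slot and value are illustrative, not what lazy vacuum actually reports:

      #include "postgres.h"
      #include "pgstat.h"
      #include "utils/rel.h"

      void
      vacuum_with_progress(Relation rel)
      {
          /* advertise which relation we're working on */
          pgstat_progress_start_command(PROGRESS_COMMAND_VACUUM,
                                        RelationGetRelid(rel));

          /* do work, periodically updating one of the 10 counters */
          pgstat_progress_update_param(0, 42);   /* slot 0, sample value */

          pgstat_progress_end_command();
      }
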
* Fix incorrect tlist generation in create_gather_plan(). (Tom Lane, 2016-03-09)

  This function is written as though Gather doesn't project; but it does. Even if it did not project, though, we must use build_path_tlist to ensure that the output columns receive correct sortgroupref labeling.

  Per report from Amit Kapila.

* Fix copy-and-pasteo in comment. (Tom Lane, 2016-03-09)

  Wensheng Zhang

* Improve handling of pathtargets in planner.c. (Tom Lane, 2016-03-09)

  Refactor so that the internal APIs in planner.c deal in PathTargets not targetlists, and establish a more regular structure for deriving the targets needed for successive steps.

  There is more that could be done here; calculating the eval costs of each successive target independently is both inefficient and wrong in detail, since we won't actually recompute values available from the input node's tlist. But it's no worse than what happened before the pathification rewrite. In any case this seems like a good starting point for considering how to handle Konstantin Knizhnik's function-evaluation-postponement patch.

* Add valgrind suppressions for python code. (Andres Freund, 2016-03-08)

  Python's allocator does some low-level tricks for efficiency; unfortunately they trigger valgrind errors. Those tricks can be disabled, making instrumentation easier; but few people testing postgres will have such a build of python. So add broad suppressions of the resulting errors.

  See also https://svn.python.org/projects/python/trunk/Misc/README.valgrind

  This possibly will suppress valid errors, but without it it's basically impossible to use valgrind with plpython code.

  Author: Andres Freund
  Backpatch: 9.4, where we started to maintain valgrind suppressions

* Add valgrind suppressions for bootstrap related code. (Andres Freund, 2016-03-08)

  Author: Andres Freund
  Backpatch: 9.4, where we started to maintain valgrind suppressions

* Improve handling of group-column indexes in GroupingSetsPath. (Tom Lane, 2016-03-08)

  Instead of having planner.c compute a groupColIdx array and store it in GroupingSetsPaths, make create_groupingsets_plan() find the grouping columns by searching in the child plan node's tlist. Although that's probably a bit slower for create_groupingsets_plan(), it's more like the way every other plan node type does this, and it provides positive confirmation that we know which child output columns we're supposed to be grouping on. (Indeed, looking at this now, I'm not at all sure that it wasn't broken before, because create_groupingsets_plan() isn't demanding an exact tlist match from its child node.)

  Also, this allows substantial simplification in planner.c, because it no longer needs to compute the groupColIdx array at all; no other cases were using it.

  I'd intended to put off this refactoring until later (like 9.7), but in view of the likely bug fix and the need to rationalize planner.c's tlist handling so we can do something sane with Konstantin Knizhnik's function-evaluation-postponement patch, I think it can't wait.

* Handle invalid libpq sockets in more places (Peter Eisentraut, 2016-03-08)

  Also, make error messages consistent.

  From: Michael Paquier <michael.paquier@gmail.com>

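  The pattern being applied, sketched with libpq's PQsocket(), which returns -1 for an invalid socket; the error message wording here is illustrative:

      #include <stdio.h>
      #include "libpq-fe.h"

      static int
      get_connection_socket(PGconn *conn)
      {
          int sock = PQsocket(conn);

          if (sock < 0)
          {
              fprintf(stderr, "invalid socket\n");
              return -1;
          }
          /* safe to hand sock to select()/poll() from here on */
          return sock;
      }
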
* Suppress GCC 6 warning about self-comparison (Peter Eisentraut, 2016-03-08)

  Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>

* psql: Fix some strange code in SQL help creation (Peter Eisentraut, 2016-03-08)

  Struct QL_HELP used to be defined as static in the sql_help.h header file, which is included in sql_help.c and help.c, thus creating two separate instances of the struct. This causes a warning from GCC 6, because the struct is not used in sql_help.c.

  Instead, declare the struct as extern in the header file and define it in sql_help.c. This also allows making a bunch of functions static because they are no longer needed outside of sql_help.c.

  Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>

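  A generic illustration of the fix, with the field layout simplified (the real generated struct has more members):

      /* sql_help.h -- before, a static definition here gave every including
       * translation unit its own copy; now just declare it: */
      struct _helpStruct
      {
          const char *cmd;
          const char *help;
      };
      extern const struct _helpStruct QL_HELP[];

      /* sql_help.c -- define it exactly once: */
      #include "sql_help.h"

      const struct _helpStruct QL_HELP[] = {
          {"ABORT", "abort the current transaction"},
      };
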
* ecpg: Fix typo (Peter Eisentraut, 2016-03-08)

  GCC 6 points out the redundant conditions, which were apparently typos.

  Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>

* Fix minor thinko in pathification code. (Tom Lane, 2016-03-08)

  I passed the wrong "root" struct to create_pathtarget in build_minmax_path. Since the subroot is a clone of the outer root, this would not cause any serious problems, but it would waste some cycles because set_pathtarget_cost_width would not have access to Var width estimates set up while running query_planner on the subroot.

* plperl: Correctly handle empty arrays in plperl_ref_from_pg_array. (Andres Freund, 2016-03-08)

  plperl_ref_from_pg_array() didn't consider the case that postgres arrays can have 0 dimensions (when they're empty) and accessed the first dimension without a check. Fix that by special-casing the empty array case.

  Author: Alex Hunsaker
  Reported-By: Andres Freund / valgrind / buildfarm animal skink
  Discussion: 20160308063240.usnzg6bsbjrne667@alap3.anarazel.de
  Backpatch: 9.1-

* Finish refactoring make_foo() functions in createplan.c. (Tom Lane, 2016-03-08)

  This patch removes some redundant cost calculations that I left for later cleanup in commit 3fc6e2d7f5b652b4. There's now a uniform policy that the make_foo() convenience functions don't do any cost calculations. Most of their callers copy costs from the source Path node, and for those that don't, the calculation in the make_foo() function wasn't necessarily right anyhow. (make_result() was particularly a mess, as it was serving multiple callers using cost calcs designed for only the first one or two that had ever existed.)

  Aside from saving a few cycles, this ensures that what EXPLAIN prints matches the costs we used for planning purposes. It does not change any planner decisions, since the decisions are already made.

* Comment update for fdw_recheck_quals. (Robert Haas, 2016-03-08)

  Commit 5fc4c26db5120bd90348b6ee3101fcddfdf54800 could've done a better job updating these comments.

  Etsuro Fujita

* Add new flags argument for xl_heap_visible to heap2_desc. (Robert Haas, 2016-03-08)

  Masahiko Sawada

* Fix parallel query on standby servers. (Robert Haas, 2016-03-08)

  Without this fix, it inevitably bombs out with "ERROR: failed to initialize transaction_read_only to 0". Repair.

  Ashutosh Sharma; comments adjusted by me.

* Add some functions to fd.c for the convenience of extensions. (Robert Haas, 2016-03-08)

  For example, if you want to perform an ioctl() on a file descriptor opened through the fd.c routines, there's no way to do that without being able to get at the underlying fd.

  KaiGai Kohei

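  A hedged sketch of what that enables, assuming an accessor along the lines of FileGetRawDesc() is among the new functions:

      #include <sys/ioctl.h>
      #include "postgres.h"
      #include "storage/fd.h"

      static int
      extension_ioctl(File vfd, unsigned long request, void *arg)
      {
          /* Fetch the kernel-level descriptor behind a virtual File ... */
          int rawfd = FileGetRawDesc(vfd);

          /* ... and perform the ioctl() the fd.c API doesn't itself expose. */
          return ioctl(rawfd, request, arg);
      }
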
* Department of second thoughts: remove PD_ALL_FROZEN. (Robert Haas, 2016-03-08)

  Commit a892234f830e832110f63fc0a2afce2fb21d1584 added a second bit per page to the visibility map, which still seems like a good idea, but it also added a second page-level bit alongside PD_ALL_VISIBLE to track whether the visibility map bit was set. That no longer seems like a clever plan, because we don't really need that bit for anything. We always clear both bits when the page is modified anyway.

  Patch by me, reviewed by Kyotaro Horiguchi and Masahiko Sawada.

* pg_upgrade: Remove converter plugin facility. (Robert Haas, 2016-03-08)

  We've not found a use for this so far, and the current need, which is to convert the visibility map to a new format, does not suit the existing design anyway. So just rip it out.

  Author: Masahiko Sawada, slightly revised by me.
  Discussion: 20160215211313.GB31273@momjian.us

* Spell "parallel" correctly. (Tom Lane, 2016-03-07)

  Per David Rowley.

* Fix uninstall target in tsearch Makefile (Peter Eisentraut, 2016-03-07)

  Artur Zakirov

* Make get_controlfile() error logging consistent with src/common (Joe Conway, 2016-03-07)

  As originally committed, get_controlfile() used a non-standard approach to error logging. Make it consistent with the majority of error logging done in src/common.

  Applies to master only.

* Further improvements to c8f621c43. (Andres Freund, 2016-03-07)

  Coverity and inspection for the issue addressed in fd45d16f found some questionable code.

  Specifically coverity noticed that the wrong length was added in ReorderBufferSerializeChange() - without immediate negative consequences as the variable isn't used afterwards. During code-review and testing I noticed that a bit of space was wasted when allocating tuple bufs in several places. Thirdly, the debug memset()s in ReorderBufferGetTupleBuf() reduce the error checking valgrind can do.

  Backpatch: 9.4, like c8f621c43.

* Make the upper part of the planner work by generating and comparing Paths. (Tom Lane, 2016-03-07)

  I've been saying we needed to do this for more than five years, and here it finally is. This patch removes the ever-growing tangle of spaghetti logic that grouping_planner() used to use to try to identify the best plan for post-scan/join query steps. Now, there is (nearly) independent consideration of each execution step, and entirely separate construction of Paths to represent each of the possible ways to do that step. We choose the best Path or set of Paths using the same add_path() logic that's been used inside query_planner() for years.

  In addition, this patch removes the old restriction that subquery_planner() could return only a single Plan. It now returns a RelOptInfo containing a set of Paths, just as query_planner() does, and the parent query level can use each of those Paths as the basis of a SubqueryScanPath at its level. This allows finding some optimizations that we missed before, wherein a subquery was capable of returning presorted data and thereby avoiding a sort in the parent level, making the overall cost cheaper even though delivering sorted output was not the cheapest plan for the subquery in isolation. (A couple of regression test outputs change in consequence of that. However, there is very little change in visible planner behavior overall, because the point of this patch is not to get immediate planning benefits but to create the infrastructure for future improvements.)

  There is a great deal left to do here. This patch unblocks a lot of planner work that was basically impractical in the old code structure, such as allowing FDWs to implement remote aggregation, or rewriting plan_set_operations() to allow consideration of multiple implementation orders for set operations. (The latter will likely require a full rewrite of plan_set_operations(); what I've done here is only to fix it to return Paths not Plans.) I have also left unfinished some localized refactoring in createplan.c and planner.c, because it was not necessary to get this patch to a working state.

  Thanks to Robert Haas, David Rowley, and Amit Kapila for review.

* Fix backwards test for Windows service-ness in pg_ctl. (Tom Lane, 2016-03-07)

  A thinko in a96761391 caused pg_ctl to get it exactly backwards when deciding whether to report problems to the Windows eventlog or to stderr.

  Per bug #14001 from Manuel Mathar, who also identified the fix. Like the previous patch, back-patch to all supported branches.

* Re-fix broken definition for function name in pgbench's exprscan.l. (Tom Lane, 2016-03-06)

  Wups, my first try wasn't quite right either. Too focused on fixing the existing bug, not enough on not introducing new ones.

* Fix broken definition for function name in pgbench's exprscan.l. (Tom Lane, 2016-03-06)

  As written, this would accept e.g. 123e9 as a function name. Aside from being mildly astonishing, that would come back to haunt us if we ever try to add float constants to the expression syntax. Insist that function names start with letters (or at least non-digits).

  In passing, reset yyline as well as yycol when starting a new expression. This variable is useless since it's used nowhere, but if we're going to have it we should have it act sanely.