path: root/src/backend
Commit message (Author, Age)
...
* Fix relcache for policies, and doc updates (Stephen Frost, 2014-09-26)
  Andres pointed out that there was an extra ';' in equalPolicies, which made me realize that my prior testing with CLOBBER_CACHE_ALWAYS was insufficient (it didn't always catch the issue, just most of the time). Thanks to that, a different issue was discovered, specifically in equalRSDescs. This change corrects equalRSDescs to return 'true' once all policies have been confirmed logically identical. After stepping through both functions to ensure correct behavior, I ran this for about 12 hours of CLOBBER_CACHE_ALWAYS runs of the regression tests with no failures.

  In addition, correct a few typos in the documentation which were pointed out by Thom Brown (thanks!) and improve the policy documentation further by adding a fleshed-out usage example based on a Unix passwd file. Lastly, clean up a few comments in the regression tests and pg_dump.h.
* Fix whitespace (Peter Eisentraut, 2014-09-26)
* Add a basic atomic ops API abstracting away platform/architecture details. (Andres Freund, 2014-09-25)
  Several upcoming performance/scalability improvements require atomic operations. This new API avoids the need to splatter compiler- and architecture-dependent code over all the locations employing atomic ops.

  For several of the potential usages it'd be problematic to maintain both an implementation using atomics and one using spinlocks or similar. In all likelihood one of the implementations would not get tested regularly under concurrency. To avoid that scenario the new API provides an automatic fallback of atomic operations to spinlocks. All properties of atomic operations are maintained. This fallback obviously isn't as fast as just using atomic ops, but it's not bad either. For one of the future users, the atomics-on-top-of-spinlocks implementation was actually slightly faster than the old purely spinlock-based implementation. That's important because it reduces the fear of regressing older platforms when improving the scalability for new ones.

  The API, loosely modeled after the C11 atomics support, currently provides 'atomic flags' and 32-bit unsigned integers. If the platform efficiently supports atomic 64-bit unsigned integers, those are also provided.

  To implement atomics support for a platform/architecture/compiler for a type of atomics, 32-bit compare-and-exchange needs to be implemented. If available and more efficient, native support for flags, 32-bit atomic addition, and the corresponding 64-bit operations may also be provided. Additional useful atomic operations are implemented generically on top of these.

  The implementations for various versions of gcc, msvc and Sun Studio have been tested. Additional existing stub implementations for
    * Intel icc
    * HP-UX acc
    * IBM xlc
  are included but have never been tested. These will likely require fixes based on buildfarm and user feedback.

  As atomic operations also require barriers for some operations, the existing barrier support has been moved into the atomics code.

  Author: Andres Freund with contributions from Oskari Saarenmaa
  Reviewed-By: Amit Kapila, Robert Haas, Heikki Linnakangas and Álvaro Herrera
  Discussion: CA+TgmoYBW+ux5-8Ja=Mcyuy8=VXAnVRHp3Kess6Pn3DMXAPAEA@mail.gmail.com, 20131015123303.GH5300@awork2.anarazel.de, 20131028205522.GI20248@awork2.anarazel.de
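The commit describes building further atomic operations generically on top of a 32-bit compare-and-exchange. The sketch below illustrates that idea with C11 `<stdatomic.h>` rather than PostgreSQL's own atomics API; the `sketch_*` names are hypothetical, and a real port would substitute its platform-specific compare-and-exchange for the C11 call.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical primitive: the one operation a platform port must supply. */
static bool
sketch_compare_exchange_u32(_Atomic uint32_t *ptr, uint32_t *expected,
                            uint32_t newval)
{
    /* On failure, *expected is overwritten with the value currently stored. */
    return atomic_compare_exchange_strong(ptr, expected, newval);
}

/* Generic fetch-add built only from compare-and-exchange. */
static uint32_t
sketch_fetch_add_u32(_Atomic uint32_t *ptr, uint32_t add)
{
    uint32_t old = atomic_load(ptr);

    /* Retry until no other thread changed *ptr between our read and the CAS. */
    while (!sketch_compare_exchange_u32(ptr, &old, old + add))
        ;
    return old;                 /* value before the addition */
}
```

The loop is needed because another thread may modify the value between the load and the compare-and-exchange; each failed attempt refreshes `old`, so the next try uses the current value.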
* Remove ill-conceived ban on zero length json object keys. (Andrew Dunstan, 2014-09-25)
  We removed a similar ban on this in json_object recently, but the ban in datum_to_json was left, which generated spurious errors in other json generators, notably json_build_object.

  Along the way, add an assertion that datum_to_json is not passed a null key. All current callers comply with this rule, but the assertion will catch any possible future misbehaviour.
* Change locking regimen around buffer replacement. (Robert Haas, 2014-09-25)
  Previously, we used an lwlock that was held from the time we began seeking a candidate buffer until the time when we found and pinned one, which is disastrous for concurrency. Instead, use a spinlock which is held just long enough to pop the freelist or advance the clock sweep hand, and then released. If we need to advance the clock sweep further, we reacquire the spinlock once per buffer.

  This represents a significant increase in atomic operations around buffer eviction, but it still wins on many workloads. On others, it may result in no gain, or even cause a regression, unless the number of buffer mapping locks is also increased. However, that seems like material for a separate commit. We may also need to consider other methods of mitigating contention on this spinlock, such as splitting it into multiple locks or jumping the clock sweep hand more than one buffer at a time, but those, too, seem like separate improvements.

  Patch by me, inspired by a much larger patch from Amit Kapila. Reviewed by Andres Freund.
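A simplified, hypothetical sketch of the clock sweep described above, with a C11 `atomic_flag` standing in for the spinlock. The lock covers only a single advance of the hand, so a backend that needs to inspect several buffers re-acquires it once per buffer. Buffer descriptors are reduced to a pin count and a usage count; in the real buffer manager those fields have their own per-buffer header lock, which this sketch omits, and the freelist is not modeled.

```c
#include <stdatomic.h>

#define SKETCH_NBUFFERS 4

typedef struct
{
    int refcount;       /* pins currently held on the buffer */
    int usage_count;    /* bumped on access, decayed by the sweep */
} SketchBuffer;

static SketchBuffer buffers[SKETCH_NBUFFERS];
static int next_victim = 0;                       /* the clock hand */
static atomic_flag sweep_lock = ATOMIC_FLAG_INIT; /* stands in for the spinlock */

/* Advance the hand by exactly one buffer, holding the lock only briefly. */
static int
tick_clock_hand(void)
{
    int victim;

    while (atomic_flag_test_and_set_explicit(&sweep_lock, memory_order_acquire))
        ;                                   /* spin */
    victim = next_victim;
    next_victim = (next_victim + 1) % SKETCH_NBUFFERS;
    atomic_flag_clear_explicit(&sweep_lock, memory_order_release);
    return victim;
}

/* Return an unpinned buffer with zero usage, or -1 if all are pinned. */
static int
clock_sweep_get_buffer(void)
{
    int trycounter = SKETCH_NBUFFERS;

    for (;;)
    {
        int id = tick_clock_hand();         /* lock re-acquired per buffer */
        SketchBuffer *buf = &buffers[id];

        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return id;                  /* found a victim */
            buf->usage_count--;             /* give it another lap */
            trycounter = SKETCH_NBUFFERS;   /* progress was made */
        }
        else if (--trycounter == 0)
            return -1;                      /* a full lap found only pinned buffers */
    }
}
```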
* Fix VPATH builds of the replication parser from git for some !gcc compilers. (Andres Freund, 2014-09-25)
  Some compilers don't automatically search the current directory for included files. 9cc2c182fc2 fixed that for builds from tarballs by adding an include to the source directory. But that doesn't work when the scanner is generated in the VPATH directory. Use the same search path as the other parsers in the tree. One compiler that definitely was affected is Solaris' Sun cc.

  Backpatch to 9.1, which introduced using an actual parser for replication commands.
* Return NULL from json_object_agg if it gets no rows. (Andrew Dunstan, 2014-09-25)
  This makes it consistent with the docs and with all other builtin aggregates apart from count().
* Copy-editing of row security (Stephen Frost, 2014-09-24)
  Address a few typos in the row security update, pointed out off-list by Adam Brightwell. Also include 'ALL' in the list of commands supported, for completeness.
* Code review for row security. (Stephen Frost, 2014-09-24)
  Buildfarm member tick identified an issue where the policies in the relcache for a relation were being replaced underneath a running query, leading to segfaults while processing the policies to be added to a query. Similar to how TupleDescs and RuleLocks are handled, add an equalRSDesc() function to check if the policies have actually changed and, if not, swap back the rsdesc field (using the original instead of the temporarily built one; the whole structure is swapped and then specific fields swapped back). This now passes a CLOBBER_CACHE_ALWAYS run for me and should resolve the buildfarm error.

  In addition to addressing this, add a new chapter in Data Definition under Privileges which explains row security and provides examples of its usage; change \d to always list policies (even if row security is disabled, but note that it is disabled, or enabled with no policies); rework check_role_for_policy (it really didn't need the entire policy, but it did need to be using has_privs_of_role()); and change the field in pg_class to relrowsecurity from relhasrowsecurity, based on Heikki's suggestion. Also from Heikki: only issue SET ROW_SECURITY in pg_restore when talking to a 9.5+ server, list Bypass RLS in \du, and document --enable-row-security options for pg_dump and pg_restore.

  Lastly, fix a number of minor whitespace and typo issues from Heikki and Dimitri, add a missing #include per Peter E, fix a few minor variable-assigned-but-not-used and resource leak issues from Coverity, and add tab completion for role attribute bypassrls as well.
* Fix bogus variable-mangling in security_barrier_replace_vars(). (Tom Lane, 2014-09-24)
  This function created new Vars with varno different from varnoold, which is a condition that should never prevail before setrefs.c does the final variable-renumbering pass. The created Vars could not be seen as equal() to normal Vars, which among other things broke equivalence-class processing for them. The consequences of this were indeed visible in the regression tests, in the form of failure to propagate constants as one would expect.

  I stumbled across it while poking at bug #11457 --- after intentionally disabling join equivalence processing, the security-barrier regression tests started falling over with fun errors like "could not find pathkey item to sort", because of failure to match the corrupted Vars to normal ones.
* Fix incorrect search for "x?" style matches in creviterdissect(). (Tom Lane, 2014-09-23)
  When the number of allowed iterations is limited (either a "?" quantifier or a bound expression), the last sub-match has to reach to the end of the target string. The previous coding here first tried the shortest possible match (one character, usually) and then gave up and back-tracked if that didn't work, typically leading to failure to match overall, as shown in bug #11478 from Christoph Berg.

  The minimum change to fix that would be to not decrement k before "goto backtrack"; but that would be a pretty stupid solution, because we'd laboriously try each possible sub-match length before finally discovering that only ending at the end can work. Instead, force the sub-match endpoint limit up to the end for even the first shortest() call if we cannot have any more sub-matches after this one.

  Bug introduced in my rewrite that added the iterdissect logic, commit 173e29aa5deefd9e71c183583ba37805c8102a72. The shortest-first search code was too closely modeled on the longest-first code, which hasn't got this issue since it tries a match reaching to the end to start with anyway.

  Back-patch to all affected branches.
* Log ALTER SYSTEM statements as DDL (Stephen Frost, 2014-09-22)
  Per discussion in bug #11350, log ALTER SYSTEM commands at the log_statement=ddl level, rather than at the log_statement=all level.

  Pointed out by Tomonari Katsumata.

  Back-patch to 9.4, where ALTER SYSTEM was introduced.
* Process withCheckOption exprs in setrefs.c (Stephen Frost, 2014-09-22)
  While withCheckOption exprs had been handled in many cases by happenstance, they need to be handled during set_plan_references and more specifically down in set_plan_refs for ModifyTable plan nodes. This is to ensure that the opfuncids are set for operators referenced in the withCheckOption exprs.

  Identified as an issue by Thom Brown.
  Patch by Dean Rasheed.
  Back-patch to 9.4, where withCheckOption was introduced.
* Remove most volatile qualifiers from xlog.c (Andres Freund, 2014-09-22)
  For the reason outlined in df4077cda2e, also remove volatile qualifiers from xlog.c. Some of these uses of volatile have been added after noticing problems back when spinlocks didn't imply compiler barriers. So they are a good test; in fact, removing the volatiles breaks things when done without the barriers in spinlocks present.

  Several uses of volatile remain where they are explicitly used to access shared memory without locks. These locations are ok with slightly out-of-date data, but removing the volatile might lead to the variables never being reread from memory. These uses could also be replaced by barriers, but that's a separate change of doubtful value.
* Remove volatile qualifiers from lwlock.c. (Robert Haas, 2014-09-22)
  Now that spinlocks (hopefully!) act as compiler barriers, as of commit 0709b7ee72e4bc71ad07b7120acd117265ab51d0, this should be safe. This serves as a demonstration of the new coding style, and may be optimized better on some machines as well.
* Fix compiler warning. (Robert Haas, 2014-09-22)
  It is meaningless to declare a pass-by-value return type const.
* Fix mishandling of CreateEventTrigStmt's eventname field. (Robert Haas, 2014-09-22)
  It's a string, not a scalar.

  Petr Jelinek
* Remove postgres --help blurb about the removed -A option. (Andres Freund, 2014-09-22)
  I missed this in 3bdcf6a5a755503.

  Noticed by Merlin Moncure
  Discussion: CAHyXU0yC7uPeeVzQROwtnrOP9dxTEUPYjB0og4qUnbipMEV57w@mail.gmail.com
* Improve code around the recently added rm_identify rmgr callback. (Andres Freund, 2014-09-22)
  There are four weaknesses in 728f152e07f998d2cb4fe5f24ec8da2c3bda98f2:
    * append_init() in heapdesc.c was ugly and required that rm_identify return values are only valid till the next call. Instead just add a couple more switch() cases for the INIT_PAGE cases. Now the returned value will always be valid.
    * A couple of rm_identify() callbacks missed masking xl_info with ~XLR_INFO_MASK.
    * pg_xlogdump didn't map a NULL rm_identify to UNKNOWN or a similar string.
    * append_init() was called when id=NULL, which should never actually happen. But it's better to be careful.
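To illustrate the points above, here is a sketch of an rm_identify-style callback that masks `xl_info`, handles the INIT_PAGE variants with extra `switch` cases so that it can return static strings, and a display helper that maps `NULL` to `UNKNOWN`. The record-type constants and the mask value (0x0F) are assumptions for illustration, not copied from the PostgreSQL headers, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_XLR_INFO_MASK 0x0F   /* assumed: low bits reserved for xlog.c */

/* Hypothetical record types for an imaginary resource manager. */
#define SKETCH_XLOG_INSERT    0x00
#define SKETCH_XLOG_DELETE    0x10
#define SKETCH_XLOG_INIT_PAGE 0x80

/* Returns a static string (or NULL), so the result never goes stale. */
static const char *
sketch_identify(uint8_t xl_info)
{
    const char *id = NULL;

    /* Mask off the bits this resource manager does not own. */
    switch (xl_info & ~SKETCH_XLR_INFO_MASK)
    {
        case SKETCH_XLOG_INSERT:
            id = "INSERT";
            break;
        case SKETCH_XLOG_INSERT | SKETCH_XLOG_INIT_PAGE:
            id = "INSERT+INIT";     /* extra case instead of appending at runtime */
            break;
        case SKETCH_XLOG_DELETE:
            id = "DELETE";
            break;
        default:
            break;                  /* unrecognized: leave id NULL */
    }
    return id;
}

/* Display side: never print a NULL identification. */
static const char *
sketch_display_name(uint8_t xl_info)
{
    const char *id = sketch_identify(xl_info);

    return id != NULL ? id : "UNKNOWN";
}
```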
* Add a fast pre-check for equality of equal-length strings. (Robert Haas, 2014-09-19)
  Testing reveals that doing a memcmp() before the strcoll() costs practically nothing, at least on the systems we tested, and it speeds up sorts containing many equal strings significantly.

  Peter Geoghegan. Review by myself and Heikki Linnakangas. Comments rewritten by me.
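A minimal sketch of the pre-check, assuming NUL-terminated inputs whose lengths are already known (as they are for text values). `sketch_collate_cmp` is a hypothetical name; the point is only that byte-identical strings of equal length can skip the comparatively expensive `strcoll()` call, since `strcoll()` necessarily reports such strings as equal.

```c
#include <stddef.h>
#include <string.h>

/* Locale-aware comparison with a cheap equality pre-check.
 * Assumes a[alen] == '\0' and b[blen] == '\0'. */
static int
sketch_collate_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    /* Equal length and identical bytes: equal under any collation. */
    if (alen == blen && memcmp(a, b, alen) == 0)
        return 0;

    /* Otherwise pay for the full locale-aware comparison. */
    return strcoll(a, b);
}
```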
* Row-Level Security Policies (RLS) (Stephen Frost, 2014-09-19)
  Building on the updatable security-barrier views work, add the ability to define policies on tables to limit the set of rows which are returned from a query and which are allowed to be added to a table. Expressions defined by the policy for filtering are added to the security barrier quals of the query, while expressions defined to check records being added to a table are added to the with-check options of the query.

  New top-level commands are CREATE/ALTER/DROP POLICY and are controlled by the table owner. Row security can be enabled and disabled by the owner on a per-table basis using ALTER TABLE .. ENABLE/DISABLE ROW SECURITY. Per discussion, ROW SECURITY is disabled on tables by default and must be enabled for policies on the table to be used. If no policies exist on a table with ROW SECURITY enabled, a default-deny policy is used and no records will be visible.

  By default, row security is applied at all times except for the table owner and the superuser. A new GUC, row_security, is added which can be set to ON, OFF, or FORCE. When set to FORCE, row security will be applied even for the table owner and superusers. When set to OFF, row security will be disabled when allowed and an error will be thrown if the user does not have rights to bypass row security.

  Per discussion, pg_dump sets row_security = OFF by default to ensure that exports and backups will have all data in the table or will error if there are insufficient privileges to bypass row security. A new option has been added to pg_dump, --enable-row-security, to ask pg_dump to export with row security enabled.

  A new role capability, BYPASSRLS, which can only be set by the superuser, is added to allow other users to bypass row security using row_security = OFF.

  Many thanks to the various individuals who have helped with the design, particularly Robert Haas for his feedback.

  Authors include Craig Ringer, KaiGai Kohei, Adam Brightwell, Dean Rasheed, with additional changes and rework by me. Reviewers have included all of the above, Greg Smith, Jeff McCormick, and Robert Haas.
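The interaction between the per-table setting, the row_security GUC, and the bypass rules described above can be summarized as a small decision function. This is a hypothetical sketch derived only from the text of this commit message; the enum and function names are illustrative, not PostgreSQL's, and the policies themselves are not modeled (when the result is `RLS_APPLIED` and the table has no policies, the default-deny policy hides every row).

```c
#include <stdbool.h>

/* Values of the row_security GUC as described in the commit message. */
typedef enum { ROW_SECURITY_OFF, ROW_SECURITY_ON, ROW_SECURITY_FORCE } SketchRowSecurity;

/* Hypothetical outcome of checking one table for one user. */
typedef enum { RLS_NOT_APPLIED, RLS_APPLIED, RLS_ERROR } SketchRlsResult;

static SketchRlsResult
sketch_check_rls(bool table_rls_enabled, SketchRowSecurity guc,
                 bool owner_or_superuser, bool has_bypassrls)
{
    if (!table_rls_enabled)
        return RLS_NOT_APPLIED;         /* ROW SECURITY is off by default */

    switch (guc)
    {
        case ROW_SECURITY_FORCE:
            return RLS_APPLIED;         /* even for owners and superusers */
        case ROW_SECURITY_OFF:
            /* Bypass only if allowed; otherwise error out rather than
             * silently returning a filtered subset (e.g. in pg_dump). */
            return (owner_or_superuser || has_bypassrls) ? RLS_NOT_APPLIED
                                                         : RLS_ERROR;
        case ROW_SECURITY_ON:
        default:
            /* Applied to everyone except the owner and superusers. */
            return owner_or_superuser ? RLS_NOT_APPLIED : RLS_APPLIED;
    }
}
```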
* Add rmgr callback to name xlog record types for display purposes. (Andres Freund, 2014-09-19)
  This is primarily useful for the upcoming pg_xlogdump --stats feature, but also allows some duplicated code in the rmgr_desc routines to be removed. Due to the separation and harmonization, the output of displayed records changes somewhat. But since this isn't end-user-oriented content, that's ok.

  It's potentially desirable to further change pg_xlogdump's display of records. It previously wasn't possible to show the record type separately from the description, forcing it to be in the last column. But that's better done in a separate commit.

  Author: Abhijit Menon-Sen, slightly editorialized by me
  Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas
  Discussion: 20140604104716.GA3989@toroid.org
* Fix pointer type in size passed to memset. (Heikki Linnakangas, 2014-09-14)
  Pointers are all the same size, so it makes no practical difference, but let's be tidy.

  Found by Coverity, noted off-list by Tom Lane.
* Invent PGC_SU_BACKEND and mark log_connections/log_disconnections that way. (Tom Lane, 2014-09-13)
  This new GUC context option allows GUC parameters to have the combined properties of PGC_BACKEND and PGC_SUSET, ie, they don't change after session start and non-superusers can't change them. This is a more appropriate choice for log_connections and log_disconnections than their previous context of PGC_BACKEND, because we don't want non-superusers to be able to affect whether their sessions get logged.

  Note: the behavior for log_connections is still a bit odd, in that when a superuser attempts to set it from PGOPTIONS, the setting takes effect but it's too late to enable or suppress connection startup logging. It's debatable whether that's worth fixing, and in any case there is a reasonable argument for PGC_SU_BACKEND to exist.

  In passing, re-pgindent the files touched by this commit.

  Fujii Masao, reviewed by Joe Conway and Amit Kapila
* Revert f68dc5d86b9f287f80f4417f5a24d876eb13771d (Bruce Momjian, 2014-09-12)
  Renaming will have to be more comprehensive, so I need approval.
* More formatting.c variable renaming, for clarity (Bruce Momjian, 2014-09-12)
* Change NTUP_PER_BUCKET to 1 to improve hash join lookup speed. (Robert Haas, 2014-09-12)
  Since this makes the bucket headers use ~10x as much memory, properly account for that memory when we figure out whether everything fits in work_mem. This might result in some cases that previously used only a single batch getting split into multiple batches, but it's unclear as yet whether we need defenses against that case, and if so, what the shape of those defenses should be.

  It's worth noting that even in these edge cases, users should still be no worse off than they would have been last week, because commit 45f6240a8fa9d35548eb2ef23dba2c11540aa02a saved a big pile of memory on exactly the same workloads.

  Tomas Vondra, reviewed and somewhat revised by me.
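A rough sketch of why the bucket headers grow: with about ntuples / NTUP_PER_BUCKET buckets and one pointer-sized header per bucket, going from 10 tuples per bucket to 1 multiplies the header memory by roughly ten. The sketch assumes the bucket count is rounded up to a power of two, which can make the ratio differ from exactly 10x; the function names are hypothetical.

```c
#include <stddef.h>

/* Smallest power of two >= n (returns 1 for n == 0). */
static size_t
next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes for the bucket array: one pointer-sized header per bucket,
 * with about ntuples / ntup_per_bucket buckets (rounded up). */
static size_t
bucket_array_bytes(size_t ntuples, size_t ntup_per_bucket)
{
    size_t nbuckets = next_pow2((ntuples + ntup_per_bucket - 1) / ntup_per_bucket);

    return nbuckets * sizeof(void *);
}
```

For example, with 1,000,000 tuples this gives 2^20 buckets at one tuple per bucket versus 2^17 at ten, an 8x difference after the power-of-two rounding.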
* Add GUC to enable logging of replication commands. (Fujii Masao, 2014-09-13)
  Previously, replication commands like IDENTIFY_SYSTEM were not logged even when log_statement is set to all. Some users who want to audit all types of statements were not satisfied with this situation. To address the problem, this commit adds a new GUC, log_replication_commands. If it's enabled, all replication commands are logged in the server log.

  There are many ways to enable that logging. For example, we could extend log_statement so that replication commands are logged when it's set to all. But per discussion in the community, we reached consensus to add a separate GUC for that.

  Reviewed by Ian Barwick, Robert Haas and Heikki Linnakangas.
* Fix GIN data page split ratio calculation. (Heikki Linnakangas, 2014-09-12)
  The code that tried to split a page at 75/25 ratio, when appending to the end of an index, was buggy in two ways. First, there was a silly typo that caused it to just fill the left page as full as possible. But the logic as it was intended wasn't correct either, and would actually have given a ratio closer to 60/40 than 75/25.

  Gaetano Mendola spotted the typo.

  Backpatch to 9.4, where this code was added.
* Fix power_var_int() for large integer exponents. (Tom Lane, 2014-09-11)
  The code for raising a NUMERIC value to an integer power wasn't very careful about large powers. It got an outright wrong answer for an exponent of INT_MIN, due to failure to consider overflow of the Abs(exp) operation, which is fixable by using an unsigned rather than signed exponent value after that point. Also, even though the number of iterations of the power-computation loop is pretty limited, it's easy for the repeated squarings to result in ridiculously enormous intermediate values, which can take unreasonable amounts of time/memory to process, or even overflow the internal "weight" field and so produce a wrong answer. We can forestall misbehaviors of that sort by bailing out as soon as the weight value exceeds what will fit in int16, since then the final answer must overflow (if exp > 0) or underflow (if exp < 0) the packed numeric format.

  Per off-list report from Pavel Stehule.

  Back-patch to all supported branches.
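The unsigned-exponent fix can be illustrated on `double` instead of NUMERIC. In this sketch the magnitude of the exponent is computed as an `unsigned int`, so INT_MIN no longer overflows, and the power is then computed by repeated squaring. It does not reproduce numeric.c's early bail-out on the weight field; with `double`, runaway intermediate values simply become infinity. `sketch_power_int` is a hypothetical name.

```c
/* Raise base to an int exponent by repeated squaring. */
static double
sketch_power_int(double base, int exp)
{
    /* -exp overflows for INT_MIN; unsigned negation is well defined. */
    unsigned int mask = (exp < 0) ? 0u - (unsigned int) exp : (unsigned int) exp;
    double result = 1.0;

    while (mask != 0)
    {
        if (mask & 1)
            result *= base;     /* include this power of two of base */
        mask >>= 1;
        if (mask != 0)
            base *= base;       /* may grow huge: numeric.c bails out here */
    }
    return (exp < 0) ? 1.0 / result : result;
}
```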
* Add 'ignore_nulls' option to row_to_json (Stephen Frost, 2014-09-11)
  Provide an option to skip NULL values in a row when generating a JSON object from that row with row_to_json. This can reduce the size of the JSON object in cases where columns are NULL without really reducing the information in the JSON object.

  This also makes row_to_json into a single function with default values, rather than having multiple functions. In passing, change array_to_json to also be a single function with default values (we don't add an 'ignore_nulls' option yet; it's not clear that there is a sensible use-case there, and it hasn't been asked for in any case).

  Pavel Stehule
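A sketch of what `ignore_nulls` changes in the output: NULL-valued columns are omitted entirely instead of being emitted as `null`. The field representation (pre-rendered JSON values, column names that need no escaping) and the `sketch_*` names are assumptions for illustration, not the actual row_to_json implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
    const char *name;    /* column name; assumed to need no JSON escaping */
    const char *value;   /* pre-rendered JSON value, or NULL for SQL NULL */
} SketchField;

/* Append s if it fits; pieces that do not fit are dropped, so size buf generously. */
static void
sketch_append(char *buf, size_t bufsize, size_t *len, const char *s)
{
    size_t slen = strlen(s);

    if (*len + slen < bufsize)
    {
        memcpy(buf + *len, s, slen + 1);   /* copies the terminating NUL too */
        *len += slen;
    }
}

/* Render fields as a JSON object; bufsize must be at least 1. */
static void
sketch_row_to_json(const SketchField *fields, size_t n, bool ignore_nulls,
                   char *buf, size_t bufsize)
{
    size_t len = 0;
    bool first = true;

    buf[0] = '\0';
    sketch_append(buf, bufsize, &len, "{");
    for (size_t i = 0; i < n; i++)
    {
        if (fields[i].value == NULL && ignore_nulls)
            continue;                      /* drop the key entirely */
        if (!first)
            sketch_append(buf, bufsize, &len, ",");
        sketch_append(buf, bufsize, &len, "\"");
        sketch_append(buf, bufsize, &len, fields[i].name);
        sketch_append(buf, bufsize, &len, "\":");
        sketch_append(buf, bufsize, &len,
                      fields[i].value != NULL ? fields[i].value : "null");
        first = false;
    }
    sketch_append(buf, bufsize, &len, "}");
}
```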
* Remove dead InRecovery check. (Heikki Linnakangas, 2014-09-11)
  With the new B-tree incomplete split handling in 9.4, _bt_insert_parent is never called in recovery.
* improve hash creation warning message (Bruce Momjian, 2014-09-11)
  This improves the wording of commit 84aa8ba128a08e6fdebb2497c7a79ebf18093e12.

  Report by Kevin Grittner
* Add missing volatile qualifier. (Robert Haas, 2014-09-11)
  Yet another silly mistake in 0709b7ee72e4bc71ad07b7120acd117265ab51d0, again found by buildfarm member castoroides.
* Silence compiler warning on Windows. (Heikki Linnakangas, 2014-09-11)
  David Rowley.
* Implement mxid_age() to compute multi-xid age (Bruce Momjian, 2014-09-10)
  Report by Josh Berkus
* Issue a warning during the creation of hash indexes (Bruce Momjian, 2014-09-10)
* Pack tuples in a hash join batch densely, to save memory. (Heikki Linnakangas, 2014-09-10)
  Instead of palloc'ing each HashJoinTuple individually, allocate 32kB chunks and pack the tuples densely in the chunks. This avoids the AllocChunk header overhead, and the space wasted by the standard allocator's habit of rounding sizes up to the nearest power of two.

  This doesn't contain any planner changes, because the planner's estimate of memory usage ignores the palloc overhead. Now that the overhead is smaller, the planner's estimates are in fact more accurate.

  Tomas Vondra, reviewed by Robert Haas.
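A minimal sketch of dense packing, assuming a singly linked list of 32kB chunks: each request is rounded up to the platform's maximum alignment and carved off the current chunk, and requests larger than a chunk get a dedicated chunk. The `Sketch*` types and functions are hypothetical, and memory is released only for the whole arena at once, which is presumably acceptable when a batch's tuples are discarded together.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_CHUNK_SIZE (32 * 1024)   /* 32kB, as in the commit message */

typedef struct SketchChunk
{
    struct SketchChunk *next;           /* chunks form a list for freeing */
    size_t used;                        /* bytes handed out so far */
    size_t size;                        /* usable bytes in data[] */
    _Alignas(max_align_t) char data[];  /* tuples packed back to back */
} SketchChunk;

typedef struct
{
    SketchChunk *chunks;                /* most recently allocated first */
} SketchDenseArena;

/* Round n up so every tuple starts on a max-aligned boundary. */
static size_t
sketch_align(size_t n)
{
    size_t a = _Alignof(max_align_t);

    return (n + a - 1) & ~(a - 1);
}

static void *
sketch_dense_alloc(SketchDenseArena *arena, size_t size)
{
    SketchChunk *c = arena->chunks;

    size = sketch_align(size);
    if (c == NULL || c->size - c->used < size)
    {
        /* Oversized tuples get a chunk of their own; any space left in
         * the previous chunk is simply not reused in this sketch. */
        size_t cap = size > SKETCH_CHUNK_SIZE ? size : SKETCH_CHUNK_SIZE;

        c = malloc(offsetof(SketchChunk, data) + cap);
        if (c == NULL)
            return NULL;
        c->next = arena->chunks;
        c->used = 0;
        c->size = cap;
        arena->chunks = c;
    }
    c->used += size;                    /* no per-tuple header at all */
    return c->data + c->used - size;
}

static void
sketch_dense_free_all(SketchDenseArena *arena)
{
    while (arena->chunks != NULL)
    {
        SketchChunk *next = arena->chunks->next;

        free(arena->chunks);
        arena->chunks = next;
    }
}
```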
* Preserve AND/OR flatness while extracting restriction OR clauses. (Tom Lane, 2014-09-09)
  The code I added in commit f343a880d5555faf1dad0286c5632047c8f599ad was careless about preserving AND/OR flatness: it could create a structure with an OR node directly underneath another one. That breaks an assumption that's fairly important for planning efficiency, not to mention triggering various Asserts (as reported by Benjamin Smith). Add a trifle more logic to handle the case properly.
* Change the spinlock primitives to function as compiler barriers. (Robert Haas, 2014-09-09)
  Previously, they functioned as barriers against CPU reordering but not compiler reordering, an odd API that required extensive use of volatile everywhere that spinlocks are used. That's error-prone and has negative implications for performance, so change it.

  In theory, this makes it safe to remove many of the uses of volatile that we currently have in our code base, but we may find that there are some bugs in this effort when we do. In the long run, though, this should make for much more maintainable code.

  Patch by me. Review by Andres Freund.
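The practical effect of a spinlock that is also a compiler barrier can be sketched with C11 atomics: acquire ordering on lock and release ordering on unlock stop both the compiler and the CPU from moving accesses out of the critical section, so the protected field can be an ordinary, non-volatile variable. This uses `atomic_flag` rather than PostgreSQL's TAS/S_UNLOCK macros; the `sketch_*` names are hypothetical.

```c
#include <stdatomic.h>

typedef struct
{
    atomic_flag lock;
    int counter;        /* protected by lock; no volatile qualifier needed */
} SketchShared;

static void
sketch_spin_lock(atomic_flag *lock)
{
    /* Acquire: later loads and stores cannot be hoisted above this point. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;               /* spin */
}

static void
sketch_spin_unlock(atomic_flag *lock)
{
    /* Release: earlier loads and stores cannot sink below this point. */
    atomic_flag_clear_explicit(lock, memory_order_release);
}

static void
sketch_increment(SketchShared *s)
{
    sketch_spin_lock(&s->lock);
    s->counter++;       /* plain access is safe inside the critical section */
    sketch_spin_unlock(&s->lock);
}
```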
* Add width_bucket(anyelement, anyarray). (Tom Lane, 2014-09-09)
  This provides a convenient method of classifying input values into buckets that are not necessarily equal-width. It works on any sortable data type.

  The choice of function name is a bit debatable, perhaps, but showing that there's a relationship to the SQL standard's width_bucket() function seems more attractive than the other proposals.

  Petr Jelinek, reviewed by Pavel Stehule
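Assuming the documented semantics of the array form (the thresholds array is sorted and lists the lower bounds of the buckets), the result is the number of thresholds less than or equal to the operand, so 0 means "below the first threshold". A binary-search sketch on `int` values follows; the real function is polymorphic over any sortable type, and `sketch_width_bucket` is a hypothetical name.

```c
#include <stddef.h>

/* Count of thresholds <= value; thresholds must be sorted ascending. */
static size_t
sketch_width_bucket(int value, const int *thresholds, size_t n)
{
    size_t lo = 0, hi = n;

    /* Find the first threshold strictly greater than value. */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (thresholds[mid] <= value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;          /* 0 .. n */
}
```

Binary search keeps each classification at O(log n) comparisons, which matters because the buckets need not be equal-width and so cannot be computed arithmetically.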
* Allow empty content in xml type (Peter Eisentraut, 2014-09-09)
  The xml type previously rejected "content" that is empty or consists only of spaces. But the SQL/XML standard allows that, so change that. The accepted values for XML "documents" are not changed.

  Reviewed-by: Ali Akbar <the.apaan@gmail.com>
* Move ALTER ... ALL IN to ProcessUtilitySlow (Stephen Frost, 2014-09-09)
  Now that ALTER TABLE .. ALL IN TABLESPACE has replaced the previous ALTER TABLESPACE approach, it makes sense to move the calls down into ProcessUtilitySlow, where the rest of ALTER TABLE is handled. This also means that event triggers will support ALTER TABLE .. ALL (which was the impetus for the original change, though it has other good qualities also).

  Álvaro Herrera

  Back-patch to 9.4 as the original rework was.
* Fix spinlock implementation for some !solaris sparc platforms. (Andres Freund, 2014-09-09)
  Some SPARC CPUs can be run in various coherence models, ranging from RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always run CPUs in TSO mode while in userland, but Linux didn't use to and the various *BSDs still don't. Unfortunately, the SPARC TAS/S_UNLOCK were only correct under TSO. Fix that by adding the necessary memory barrier instructions. On sparcv8+, which should be all relevant CPUs, these are treated as NOPs if the current consistency model doesn't require the barriers.

  Discussion: 20140630222854.GW26930@awork2.anarazel.de

  Will be backpatched to all released branches once a few buildfarm cycles haven't shown up problems. As I've no access to SPARC, this is blindly written.
* Rename C variables in formatting.c, for clarity (Bruce Momjian, 2014-09-05)
  Also add C comments. This should help future debugging of this notorious file.
* Assorted message fixes and improvements (Peter Eisentraut, 2014-09-05)
* Fix segmentation fault that an empty prepared statement could cause. (Fujii Masao, 2014-09-05)
  Back-patch to all supported branches.

  Per bug #11335 from Haruka Takatsuka
* Issue clearer notice when inherited merged columns are moved (Bruce Momjian, 2014-09-03)
  CREATE TABLE INHERIT moves user-specified columns to the location of the inherited column.

  Report by Fatal Majid
* Refactor per-page logic common to all redo routines to a new function. (Heikki Linnakangas, 2014-09-02)
  Every redo routine uses the same idiom to determine what to do to a page: check if there's a backup block for it, and if not, read the buffer if the block exists and check its LSN. Refactor that into a common function, XLogReadBufferForRedo, making all the redo routines shorter and more readable.

  This has no user-visible effect, and makes no changes to the WAL format.

  Reviewed by Andres Freund, Alvaro Herrera, Michael Paquier.
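The decision that the commit centralizes can be sketched as a pure function over the facts each redo routine used to check by hand. The enum and function names here are hypothetical, and the sketch only classifies the outcome; the real helper also has to restore the backup block and read the buffer, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SketchLSN;

/* Hypothetical outcomes a caller needs to distinguish. */
typedef enum
{
    SKETCH_BLK_RESTORED,    /* full-page image applied; nothing more to do */
    SKETCH_BLK_NOTFOUND,    /* block no longer exists */
    SKETCH_BLK_DONE,        /* page already reflects this record */
    SKETCH_BLK_NEEDS_REDO   /* caller must replay the record on the page */
} SketchRedoAction;

static SketchRedoAction
sketch_read_buffer_for_redo(bool has_backup_block, bool block_exists,
                            SketchLSN page_lsn, SketchLSN record_lsn)
{
    if (has_backup_block)
        return SKETCH_BLK_RESTORED;     /* the image supersedes the record */
    if (!block_exists)
        return SKETCH_BLK_NOTFOUND;
    if (record_lsn <= page_lsn)
        return SKETCH_BLK_DONE;         /* page is already newer: skip */
    return SKETCH_BLK_NEEDS_REDO;
}
```

With this in one place, each redo routine only needs to act on `SKETCH_BLK_NEEDS_REDO` instead of repeating the three checks itself.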
* Support ALTER SYSTEM RESET command. (Fujii Masao, 2014-09-02)
  This patch allows us to execute ALTER SYSTEM RESET command to remove the configuration entry from postgresql.auto.conf.

  Vik Fearing, reviewed by Amit Kapila and me.