| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
The previous code, perhaps out of concern for avoid memory leaks, formed
the tuple in one memory context and then copied it to another memory
context. However, this doesn't appear to be necessary, since
index_form_tuple and the functions it calls take precautions against
leaking memory. In my testing, building the tuple directly inside the
sort context shaves several percent off the index build time.
Rearrange things so we do that.
Patch by me. Review by Amit Kapila, Tom Lane, Andres Freund.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When reading large amounts of preexisting WAL during logical decoding
using the SQL interface we possibly could fail to check interrupts in
due time. Similarly the same could happen on systems with a very high
WAL volume while creating a new logical replication slot, independent
of the used interface.
Previously these checks where only performed in xlogreader's read_page
callbacks, while waiting for new WAL to be produced. That's not
sufficient though, if there's never a need to wait. Walsender's send
loop already contains a interrupt check.
Backpatch to 9.4 where the logical decoding feature was introduced.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assertion failed if WAL_DEBUG or LWLOCK_STATS was enabled; fix that by
using separate memory contexts for the allocations made within those code
blocks.
This patch introduces a mechanism for marking any memory context as allowed
in a critical section. Previously ErrorContext was exempt as a special case.
Instead of a blanket exception of the checkpointer process, only exempt the
memory context used for the pending ops hash table.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "false" case was really quite useless since all it did was to throw
an error; a definition not helped in the least by making it the default.
Instead let's just have the "true" case, which emits nested objects and
arrays in JSON syntax. We might later want to provide the ability to
emit sub-objects in Postgres record or array syntax, but we'd be best off
to drive that off a check of the target field datatype, not a separate
argument.
For the functions newly added in 9.4, we can just remove the flag arguments
outright. We can't do that for json_populate_record[set], which already
existed in 9.3, but we can ignore the argument and always behave as if it
were "true". It helps that the flag arguments were optional and not
documented in any useful fashion anyway.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When running several postgres clusters on one OS instance it's often
inconveniently hard to identify which "postgres" process belongs to
which postgres instance.
Add the cluster_name GUC, whose value will be included as part of the
process titles if set. With that processes can more easily identified
using tools like 'ps'.
To avoid problems with encoding mismatches between postgresql.conf,
consoles, and individual databases replace non-ASCII chars in the name
with question marks. The length is limited to NAMEDATALEN to make it
less likely to truncate important information at the end of the
status.
Thomas Munro, with some adjustments by me and review by a host of people.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Support for running postgres on Alpha hasn't been tested for a long
while. Due to Alpha's uniquely lax cache coherency model it's a hard
to develop for platform (especially blindly!) and thought to be
unlikely to currently work correctly.
As Alpha is the only supported architecture for Tru64 drop support for
it as well. Tru64's support has ended 2012 and it has been in
maintenance-only mode for much longer.
Also remove stray references to __ksr__ and ultrix defines.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can allow this even without any specific knowledge of the semantics
of the window function, so long as pushed-down quals will either accept
every row in a given window partition, or reject every such row. Because
window functions act only within a partition, such a case can't result
in changing the window functions' outputs for any surviving row.
Eliminating entire partitions in this way obviously can reduce the cost
of the window-function computations substantially.
The fly in the ointment is that it's hard to be entirely sure whether
this is true for an arbitrary qual condition. This patch allows pushdown
if (a) the qual references only partitioning columns, and (b) the qual
contains no volatile functions. We are at risk of incorrect results if
the qual can produce different answers for values that the partitioning
equality operator sees as equal. While it's not hard to invent cases
for which that can happen, it seems to seldom be a problem in practice,
since no one has complained about a similar assumption that we've had
for many years with respect to DISTINCT. The potential performance
gains seem to be worth the risk.
David Rowley, reviewed by Vik Fearing; some credit is due also to
Thomas Mayer who did considerable preliminary investigation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of truncating pg_multixact at vacuum time, do it only at
checkpoint time. The reason for doing it this way is twofold: first, we
want it to delete only segments that we're certain will not be required
if there's a crash immediately after the removal; and second, we want to
do it relatively often so that older files are not left behind if
there's an untimely crash.
Per my proposal in
http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org
we now execute the truncation in the checkpointer process rather than as
part of vacuum. Vacuum is in only charge of maintaining in shared
memory the value to which it's possible to truncate the files; that
value is stored as part of checkpoints also, and so upon recovery we can
reuse the same value to re-execute truncate and reset the
oldest-value-still-safe-to-use to one known to remain after truncation.
Per bug reported by Jeff Janes in the course of his tests involving
bug #8673.
While at it, update some comments that hadn't been updated since
multixacts were changed.
Backpatch to 9.3, where persistency of pg_multixact files was
introduced by commit 0ac5ad5134f2.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were allowing a table's pg_class.relminmxid value to move backwards
when heaps were swapped by VACUUM FULL or CLUSTER. There is a
similar protection against relfrozenxid going backwards, which we
neglected to clone when the multixact stuff was rejiggered by commit
0ac5ad5134f276.
Backpatch to 9.3, where relminmxid was introduced.
As reported by Heikki in
http://www.postgresql.org/message-id/52401AEA.9000608@vmware.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't assert MultiXactIdIsRunning if the multi came from a tuple that
had been share-locked and later copied over to the new cluster by
pg_upgrade. Doing that causes an error to be raised unnecessarily:
MultiXactIdIsRunning is not open to the possibility that its argument
came from a pg_upgraded tuple, and all its other callers are already
checking; but such multis cannot, obviously, have transactions still
running, so the assert is pointless.
Noticed while investigating the bogus pg_multixact/offsets/0000 file
left over by pg_upgrade, as reported by Andres Freund in
http://www.postgresql.org/message-id/20140530121631.GE25431@alap3.anarazel.de
Backpatch to 9.3, as the commit that introduced the buglet.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A WHERE clause applied to the output of a subquery with DISTINCT should
theoretically be applied only once per distinct row; but if we push it
into the subquery then it will be evaluated at each row before duplicate
elimination occurs. If the qual is volatile this can give rise to
observably wrong results, so don't do that.
While at it, refactor a little bit to allow subquery_is_pushdown_safe
to report more than one kind of restrictive condition without indefinitely
expanding its argument list.
Although this is a bug fix, it seems unwise to back-patch it into released
branches, since it might de-optimize plans for queries that aren't giving
any trouble in practice. So apply to 9.4 but not further back.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I noticed that the functions in jsonfuncs.c sometimes printed error
messages that claimed I'd called some other function. Investigation showed
that this was from repurposing code into "worker" functions without taking
much care as to whether it would mention the right SQL-level function if it
threw an error. Moreover, there was a weird mismash of messages that
contained a fixed function name, messages that used %s for a function name,
and messages that constructed a function name out of spare parts, like
"json%s_populate_record" (which, quite aside from being ugly as sin, wasn't
even sufficient to cover all the cases). This would put an undue burden on
our long-suffering translators. Standardize on inserting the SQL function
name with %s so as to reduce the number of translatable strings, and pass
function names around as needed to make sure we can report the right one.
Fix up some gratuitous variations in wording, too.
|
|
|
|
|
|
| |
Re-pgindent, remove a lot of random vertical whitespace, remove useless
(if not counterproductive) inline markings, get rid of unnecessary
zero-padding of strings for hashtable searches. No functional changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
populate_recordset_object_start() improperly created a new hash table
(overwriting the link to the existing one) if called at nest levels
greater than one. This resulted in previous fields not appearing in
the final output, as reported by Matti Hameister in bug #10728.
In 9.4 the problem also affects json_to_recordset.
This perhaps missed detection earlier because the default behavior is to
throw an error for nested objects: you have to pass use_json_as_text = true
to see the problem.
In addition, fix query-lifespan leakage of the hashtable created by
json_populate_record(). This is pretty much the same problem recently
fixed in dblink: creating an intended-to-be-temporary context underneath
the executor's per-tuple context isn't enough to make it go away at the
end of the tuple cycle, because MemoryContextReset is not
MemoryContextResetAndDeleteChildren.
Michael Paquier and Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The syntax doesn't let you specify "WITH OIDS" for foreign tables, but it
was still possible with default_with_oids=true. But the rest of the system,
including pg_dump, isn't prepared to handle foreign tables with OIDs
properly.
Backpatch down to 9.1, where foreign tables were introduced. It's possible
that there are databases out there that already have foreign tables with
OIDs. There isn't much we can do about that, but at least we can prevent
them from being created in the future.
Patch by Etsuro Fujita, reviewed by Hadi Moshayedi.
|
|
|
|
|
|
|
|
| |
Normally, this won't matter too much; but if I/O is really slow, for
example because the system is overloaded, we might write many pages
before checking for interrupts. A single toast insertion might
write up to 1GB of data, and a multi-insert could write hundreds
of tuples (and their corresponding TOAST data).
|
|
|
|
|
| |
The record header was not copied correctly to the buffer that was passed
to the rm_desc function. Broken by my rm_desc signature refactoring patch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The catcache code is effectively assuming this already, so let's insist
that the catalog and index are actually declared that way.
Having done that, the comments in indexing.h about non-unique indexes
not being used for catcaches are completely redundant not just mostly so;
and we didn't have such a comment for every such index anyway. So let's
get rid of them.
Per discussion of whether we should identify primary keys for catalogs.
We might or might not take that further step, but this change in itself
will allow quicker detection of misdeclared catcaches, so it seems worth
doing in any case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since fdf9e21196a lazy_vacuum_page() rechecks the all-visible status
of pages in the second pass over the heap. It does so inside a
critical section, but both visibilitymap_test() and
heap_page_is_all_visible() perform operations that should not happen
inside one. The former potentially performs IO and both potentially do
memory allocations.
To fix, simply move all the all-visible handling outside the critical
section. Doing so means that the PD_ALL_VISIBLE on the page won't be
included in the full page image of the HEAP2_CLEAN record anymore. But
that's fine, the flag will be set by the HEAP2_VISIBLE logged later.
Backpatch to 9.3 where the problem was introduced. The bug only came
to light due to the assertion added in 4a170ee9 and isn't likely to
cause problems in production scenarios. The worst outcome is a
avoidable PANIC restart.
This also gets rid of the difference in the order of operations
between master and standby mentioned in 2a8e1ac5.
Per reports from David Leverton and Keith Fiske in bug #10533.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existance of the assert_enabled variable (backing the
debug_assertions GUC) reduced the amount of knowledge some static code
checkers (like coverity and various compilers) could infer from the
existance of the assertion. That could have been solved by optionally
removing the assertion_enabled variable from the Assert() et al macros
at compile time when some special macro is defined, but the resulting
complication doesn't seem to be worth the gain from having
debug_assertions. Recompiling is fast enough.
The debug_assertions GUC is still available, but readonly, as it's
useful when diagnosing problems. The commandline/client startup option
-A, which previously also allowed to enable/disable assertions, has
been removed as it doesn't serve a purpose anymore.
While at it, reduce code duplication in bufmgr.c and localbuf.c
assertions checking for spurious buffer pins. That code had to be
reindented anyway to cope with the assert_enabled removal.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM
in the query-lifespan memory context. This is insignificant in simple
cases where the function relation is scanned only once; but if the function
is in a sub-SELECT or is on the inside of a nested loop, any memory
consumed during argument evaluation can add up quickly. (The potential for
trouble here had been foreseen long ago, per existing comments; but we'd
not previously seen a complaint from the field about it.) To fix, create
an additional temporary context just for this purpose.
Per an example from MauMau. Back-patch to all active branches.
|
|
|
|
|
|
|
|
|
|
| |
data_directory could be set both in postgresql.conf and postgresql.auto.conf so far.
This could cause some problematic situations like circular definition. To avoid such
situations, this commit forbids a user to set data_directory in postgresql.auto.conf.
Backpatch this to 9.4 where ALTER SYSTEM command was introduced.
Amit Kapila, reviewed by Abhijit Menon-Sen, with minor adjustments by me.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Arrange for postmaster child processes to respond to two environment
variables, PG_OOM_ADJUST_FILE and PG_OOM_ADJUST_VALUE, to determine whether
they reset their OOM score adjustments and if so to what. This is superior
to the previous design involving #ifdef's in several ways. The behavior is
now available in a default build, and both ends of the adjustment --- the
original adjustment of the postmaster's level and the subsequent
readjustment by child processes --- can now be controlled in one place,
namely the postmaster launch script. So it's no longer necessary for the
launch script to act on faith that the server was compiled with the
appropriate options. In addition, if someone wants to use an OOM score
other than zero for the child processes, that doesn't take a recompile
anymore; and we no longer have to cater separately to the two different
historical kernel APIs for this adjustment.
Gurjeet Singh, somewhat revised by me
|
|
|
|
|
|
|
| |
The check was confusing and is a condition that should never in fact
happen.
Per gripe from Dmitry Dolgov.
|
|
|
|
| |
Seems to have been introduced in 1a3458b6d8d202715a83c88474a1b63726d0929e.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This SQL-standard feature allows a sub-SELECT yielding multiple columns
(but only one row) to be used to compute the new values of several columns
to be updated. While the same results can be had with an independent
sub-SELECT per column, such a workaround can require a great deal of
duplicated computation.
The standard actually says that the source for a multi-column assignment
could be any row-valued expression. The implementation used here is
tightly tied to our existing sub-SELECT support and can't handle other
cases; the Bison grammar would have some issues with them too. However,
I don't feel too bad about this since other cases can be converted into
sub-SELECTs. For instance, "SET (a,b,c) = row_valued_function(x)" could
be written "SET (a,b,c) = (SELECT * FROM row_valued_function(x))".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since most of the system thinks AND and OR are N-argument expressions
anyway, let's have the grammar generate a representation of that form when
dealing with input like "x AND y AND z AND ...", rather than generating
a deeply-nested binary tree that just has to be flattened later by the
planner. This avoids stack overflow in parse analysis when dealing with
queries having more than a few thousand such clauses; and in any case it
removes some rather unsightly inconsistencies, since some parts of parse
analysis were generating N-argument ANDs/ORs already.
It's still possible to get a stack overflow with weirdly parenthesized
input, such as "x AND (y AND (z AND ( ... )))", but such cases are not
mainstream usage. The maximum depth of parenthesization is already
limited by Bison's stack in such cases, anyway, so that the limit is
probably fairly platform-independent.
Patch originally by Gurjeet Singh, heavily revised by me
|
|
|
|
| |
Just feels more natural, and is more consistent with rm_redo.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have for a long time been able to prove implications and refutations
between clauses structured like "expr op const" with the same subexpression
and btree-related operators; for example that "x < 4" implies "x <= 5".
The implication machinery is needed to detect usability of partial indexes,
and the refutation machinery is needed to implement constraint exclusion.
This patch extends that machinery to make proofs for operator expressions
involving the same two immutable-but-not-necessarily-just-Const input
expressions, ie does "expr1 op1 expr2" prove or refute "expr1 op2 expr2" or
"expr2 op2 expr1"? An important example is that we can now prove "x = y"
given "y = x", which formerly the code could not deduce unless x or y was a
constant. We can make use of the system's knowledge of operator commutator
and negator pairs, and can also make use of btree opclass relationships,
for example "x < y" implies "x <= y" and refutes "x > y" (notice that
neither of these could be proven just from commutator or negator links).
Inspired by a gripe from Brian Dunavant. This seems more like a new
feature than a bug fix, though, so no back-patch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We should report the errno when we get a failure from functions like
BufFileWrite. "ERROR: write failed" is unreasonably taciturn for a
case that's well within the realm of possibility; I've seen it a
couple times in the buildfarm recently, in situations that were
probably out-of-disk-space, but it'd be good to see the errno
to confirm it.
I think this code was originally written without assuming that
the buffile.c functions would return useful errno; but most other
callers *are* assuming that, and a quick look at the buffile code
gives no reason to suppose otherwise.
Also, a couple of the old messages were phrased on the assumption
that a short read might indicate a logic bug in tuplestore itself;
but that code's pretty well tested by now, so a filesystem-level
problem seems much more likely.
|
|
|
|
|
| |
I thought I could get away with hardcoded int4 here, but the buildfarm
says differently.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous naming broke the query that libpq's lo_initialize() uses
to collect the OIDs of the server-side functions it requires, because
that query effectively assumes that there is only one function named
lo_create in the pg_catalog schema (and likewise only one lo_open, etc).
While we should certainly make libpq more robust about this, the naive
query will remain in use in the field for the foreseeable future, so it
seems the only workable choice is to use a different name for the new
function. lo_from_bytea() won a small straw poll.
Back-patch into 9.4 where the new function was introduced.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a sub-select-in-FROM gets flattened into the upper query, then we
naturally get rid of any output columns that are defined in the sub-select
text but not actually used in the upper query. However, this doesn't
happen when it's not possible to flatten the subquery, for example because
it contains GROUP BY, LIMIT, etc. Allowing the subquery to compute useless
output columns is often fairly harmless, but sometimes it has significant
performance cost: the unused output might be an expensive expression,
or it might be a Var from a relation that we could remove entirely (via
the join-removal logic) if only we realized that we didn't really need
that Var. Situations like this are common when expanding views, so it
seems worth taking the trouble to detect and remove unused outputs.
Because the upper query's Var numbering for subquery references depends on
positions in the subquery targetlist, we don't want to renumber the items
we leave behind. Instead, we can implement "removal" by replacing the
unwanted expressions with simple NULL constants. This wastes a few cycles
at runtime, but not enough to justify more work in the planner.
|
|
|
|
|
|
|
|
| |
Change the order of checks in similar functions to be the same; remove
a parameter that's not needed anymore; rename a memory context and
expand a couple of comments.
Per review comments from Amit Kapila
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we grabbed this file off the Snowball project's website, we mistakenly
supposed that it was in LATIN1 encoding, but evidently it was actually in
LATIN2. This resulted in ő (o-double-acute, U+0151, which is code 0xF5 in
LATIN2) being misconverted into õ (o-tilde, U+00F5), as complained of in
bug #10589 from Zoltán Sörös. We'd have messed up u-double-acute too,
but there aren't any of those in the file. Other characters used in the
file have the same codes in LATIN1 and LATIN2, which no doubt helped hide
the problem for so long.
The error is not only ours: the Snowball project also was confused about
which encoding is required for Hungarian. But dealing with that will
require source-code changes that I'm not at all sure we'll wish to
back-patch. Fixing the stopword file seems reasonably safe to back-patch
however.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the code used a node label of zero both for strings that
contain no bytes beyond the inner tuple's prefix, and for cases where an
"allTheSame" inner tuple has to be split to allow a string with a different
next byte to be inserted into it. Failing to distinguish these cases meant
that if a string ending with the current prefix needed to be inserted into
an allTheSame tuple, we got into an infinite loop, because after splitting
the tuple we'd descend into the child allTheSame tuple and then find we
need to split again.
To fix, instead use -1 and -2 as the node labels for these two cases.
This requires widening the node label type from "char" to int2, but
fortunately SPGiST stores all pass-by-value node label types in their
Datum representation, which means that this change is transparently upward
compatible so far as the on-disk representation goes. We continue to
recognize zero as a dummy node label for reading purposes, but will not
attempt to push new index entries down into such a label, so that the loop
won't occur even when dealing with an existing index.
Per report from Teodor Sigaev. Back-patch to 9.2 where the faulty
code was introduced.
|
|
|
|
|
|
|
|
| |
In a50d97625497b7 I already changed this, but got it wrong for the case
where the number of members is larger than the number of entries that
fit in the last page of the last segment.
As reported by Serge Negodyuck in a followup to bug #8673.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A ReorderBufferTransaction's end_lsn, the sentPtr advocated by
walsender keepalive messages, and the end location remembered by the
decoding get_*changes* SQL functions all use the location of the last
read record + 1. I.e. the LSN points to the beginning of the next
record. That cannot realistically be changed without changing the
replication protocol because that's how keepalive messages have worked
since 9.0.
The bug is that the logic inside the snapshot builder, which decides
whether a transaction's contents should be decoded, assumed the start
location would point towards the last byte of the last record. The
reason this didn't actually cause visible problems is that currently
that decision is only made for commit records. Since interesting
transactions always have at least one additional record - containing
actual data - we'd never skip a transaction.
But if there ever were transactions, or other events, with just one
record containing important information, we'd skip them after stopping
and restarting logical decoding.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's critical that the backend's idea of LOBLKSIZE match the way data has
actually been divided up in pg_largeobject. While we don't provide any
direct way to adjust that value, doing so is a one-line source code change
and various people have expressed interest recently in changing it. So,
just as with TOAST_MAX_CHUNK_SIZE, it seems prudent to record the value in
pg_control and cross-check that the backend's compiled-in setting matches
the on-disk data.
Also tweak the code in inv_api.c so that fetches from pg_largeobject
explicitly verify that the length of the data field is not more than
LOBLKSIZE. Formerly we just had Asserts() for that, which is no protection
at all in production builds. In some of the call sites an overlength data
value would translate directly to a security-relevant stack clobber, so it
seems worth one extra runtime comparison to be sure.
In the back branches, we can't change the contents of pg_control; but we
can still make the extra checks in inv_api.c, which will offer some amount
of protection against running with the wrong value of LOBLKSIZE.
|
|
|
|
|
|
|
|
|
|
|
| |
Previously there's been a mix between 'slotname' and 'slot_name'. It's
not nice to be unneccessarily inconsistent in a new feature. As a post
beta1 initdb now is required in the wake of eeca4cd35e, fix the
inconsistencies.
Most the changes won't affect usage of replication slots because the
majority of changes is around function parameter names. The prominent
exception to that is that the recovery.conf parameter
'primary_slotname' is now named 'primary_slot_name'.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way the code was written, the padding was copied from uninitialized
memory areas.. Because the structs are local variables in the code where
the WAL records are constructed, making them larger and zeroing the padding
bytes would not make the code very pretty, so rather than fixing this
directly by zeroing out the padding bytes, it seems more clear to not try to
align the tuples in the WAL records. The redo functions are taught to copy
the tuple header to a local variable to avoid unaligned access.
Stable-branches have the same problem, but we can't change the WAL format
there, so fix in master only. Reading a few random extra bytes at the stack
is harmless in practice, so it's not worth crafting a different
back-patchable fix.
Per reports from Kevin Grittner and Andres Freund, using clang static
analyzer and Valgrind, respectively.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is needed to allow ORDER BY, DISTINCT, etc to work as expected for
pg_lsn values.
We had previously decided to put this off for 9.5, but in view of commit
eeca4cd35e284c72b2ea1b4494e64e7738896e81 there's no reason to avoid a
catversion bump for 9.4beta2, and this does make a pretty significant
usability difference for pg_lsn.
Michael Paquier, with fixes from Andres Freund and Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
HeapTupleSatisfiesVacuum() didn't properly discern between
DELETE_IN_PROGRESS and INSERT_IN_PROGRESS for rows that have been
inserted in the current transaction and deleted in a aborted
subtransaction of the current backend. At the very least that caused
problems for CLUSTER and CREATE INDEX in transactions that had
aborting subtransactions producing rows, leading to warnings like:
WARNING: concurrent delete in progress within table "..."
possibly in an endless, uninterruptible, loop.
Instead of treating *InProgress xmins the same as *IsCurrent ones,
treat them as being distinct like the other visibility routines. As
implemented this separatation can cause a behaviour change for rows
that have been inserted and deleted in another, still running,
transaction. HTSV will now return INSERT_IN_PROGRESS instead of
DELETE_IN_PROGRESS for those. That's both, more in line with the other
visibility routines and arguably more correct. The latter because a
INSERT_IN_PROGRESS will make callers look at/wait for xmin, instead of
xmax.
The only current caller where that's possibly worse than the old
behaviour is heap_prune_chain() which now won't mark the page as
prunable if a row has concurrently been inserted and deleted. That's
harmless enough.
As a cautionary measure also insert a interrupt check before the gotos
in IndexBuildHeapScan() that lead to the uninterruptible loop. There
are other possible causes, like a row that several sessions try to
update and all fail, for repeated loops and the cost of doing so in
the retry case is low.
As this bug goes back all the way to the introduction of
subtransactions in 573a71a5da backpatch to all supported releases.
Reported-By: Sandro Santilli
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shutdown.
187492b6c2e8cafc5b39063ca3b67846e8155d24 changed pgstat.c so that
the stats files were saved into $PGDATA/pg_stat directory when the server
was shutdowned. But it accidentally forgot to change the location of
pg_stat_statements permanent stats file. This commit fixes pg_stat_statements
so that its stats file is also saved into $PGDATA/pg_stat at shutdown.
Since this fix changes the file layout, we don't back-patch it to 9.3
where this oversight was introduced.
|
|
|
|
|
|
|
|
|
|
| |
Per gripe from Peter Eisentraut and Tom Lane.
The output is slightly different, but still ISO 8601 compliant: to_char
doesn't output the minutes when time zone offset is an integer number of
hours, while EncodeDateTime outputs ":00".
The code is slightly adapted from code in xml.c
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, any backslash in text being escaped for JSON was doubled so
that the result was still valid JSON. However, this led to some perverse
results in the case of Unicode sequences, These are now detected and the
initial backslash is no longer escaped. All other backslashes are
still escaped. No validity check is performed, all that is looked for is
\uXXXX where X is a hexidecimal digit.
This is a change from the 9.2 and 9.3 behaviour as noted in the Release
notes.
Per complaint from Teodor Sigaev.
|
|
|
|
|
|
|
|
|
|
|
| |
Many JSON processors require timestamp strings in ISO 8601 format in
order to convert the strings. When converting a timestamp, with or
without timezone, to a JSON datum we therefore now use such a format
rather than the type's default text output, in functions such as
to_json().
This is a change in behaviour from 9.2 and 9.3, as noted in the release
notes.
|