| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls
had typos in the context-sizing parameters. While none of these led to
especially significant problems, they did create minor inefficiencies,
and it's now clear that expecting people to copy-and-paste those calls
accurately is not a great idea. Let's reduce the risk of future errors
by introducing single macros that encapsulate the common use-cases.
Three such macros are enough to cover all but two special-purpose contexts;
those two calls can be left as-is, I think.
While this patch doesn't in itself improve matters for third-party
extensions, it doesn't break anything for them either, and they can
gradually adopt the simplified notation over time.
In passing, change TopMemoryContext to use the default allocation
parameters. Formerly it could only be extended 8K at a time. That was
probably reasonable when this code was written; but nowadays we create
many more contexts than we did then, so that it's not unusual to have a
couple hundred K in TopMemoryContext, even without considering various
dubious code that sticks other things there. There seems no good reason
not to let it use growing blocks like most other contexts.
Back-patch to 9.6, mostly because that's still close enough to HEAD that
it's easy to do so, and keeping the branches in sync can be expected to
avoid some future back-patching pain. The bugs fixed by these changes
don't seem to be significant enough to justify fixing them further back.
Discussion: <21072.1472321324@sss.pgh.pa.us>
|
|
|
|
|
|
| |
It's not ready yet, revert two commits
690c543550b0d2852060c18d270cdb534d339d9a - unstable test output
386e3d7609c49505e079c40c65919d99feb82505 - patch itself
|
|
|
|
|
|
|
|
|
|
| |
Now indexes (but only B-tree for now) can contain "extra" column(s) which
doesn't participate in index structure, they are just stored in leaf
tuples. It allows to use index only scan by using single index instead
of two or more indexes.
Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a process is waiting for a heavyweight lock, we will now indicate
the type of heavyweight lock for which it is waiting. Also, you can
now see when a process is waiting for a lightweight lock - in which
case we will indicate the individual lock name or the tranche, as
appropriate - or for a buffer pin.
Amit Kapila, Ildus Kurbangaliev, reviewed by me. Lots of helpful
discussion and suggestions by many others, including Alexander
Korotkov, Vladimir Borodin, and many others.
|
|
|
|
|
|
|
| |
overriden -> overridden
The misspelling in create_extension.sgml was introduced in b67aaf2,
so no need to backpatch.
|
|
|
|
| |
Backpatch certain files through 9.1
|
| |
|
|
|
|
|
|
| |
Catalog version bumped
Kyotaro HORIGUCHI
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new type has the scope of whole the database cluster so it doesn't
behave the same as the existing OID alias types which have database
scope,
concerning object dependency. To avoid confusion constants of the new
type are prohibited from appearing where dependencies are made involving
it.
Also, add a note to the docs about possible MVCC violation and
optimization issues, which are general over the all reg* types.
Kyotaro Horiguchi
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Multixact member files are subject to early wraparound overflow and
removal: if the average multixact size is above a certain threshold (see
note below) the protections against offset overflow are not enough:
during multixact truncation at checkpoint time, some
pg_multixact/members files would be removed because the server considers
them to be old and not needed anymore. This leads to loss of files that
are critical to interpret existing tuples's Xmax values.
To protect against this, since we don't have enough info in pg_control
and we can't modify it in old branches, we maintain shared memory state
about the oldest value that we need to keep; we use this during new
multixact creation to abort if an old still-needed file would get
overwritten. This value is kept up to date by checkpoints, which makes
it not completely accurate but should be good enough. We start emitting
warnings sometime earlier, so that the eventual multixact-shutdown
doesn't take DBAs completely by surprise (more precisely: once 20
members SLRU segments are remaining before shutdown.)
On troublesome average multixact size: The threshold size depends on the
multixact freeze parameters. The oldest age is related to the greater of
multixact_freeze_table_age and multixact_freeze_min_age: anything
older than that should be removed promptly by autovacuum. If autovacuum
is keeping up with multixact freezing, the troublesome multixact average
size is
(2^32-1) / Max(freeze table age, freeze min age)
or around 28 members per multixact. Having an average multixact size
larger than that will eventually cause new multixact data to overwrite
the data area for older multixacts. (If autovacuum is not able to keep
up, or there are errors in vacuuming, the actual maximum is
multixact_freeeze_max_age instead, at which point multixact generation
is stopped completely. The default value for this limit is 400 million,
which means that the multixact size that would cause trouble is about 10
members).
Initial bug report by Timothy Garnett, bug #12990
Backpatch to 9.3, where the problem was introduced.
Authors: Álvaro Herrera, Thomas Munro
Reviews: Thomas Munro, Amit Kapila, Robert Haas, Kevin Grittner
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bootstrap determines whether a column is null based on simple builtin
rules. Those work surprisingly well, but nonetheless a few existing
columns aren't set correctly. Additionally there is at least one patch
sent to hackers where forcing the nullness of a column would be helpful.
The boostrap format has gained FORCE [NOT] NULL for this, which will be
emitted by genbki.pl when BKI_FORCE_(NOT_)?NULL is specified for a
column in a catalog header.
This patch doesn't change the marking of any existing columns.
Discussion: 20150215170014.GE15326@awork2.anarazel.de
|
|
|
|
|
|
| |
Sometimes it's useful for a background worker to be able to initialize
its database connection by OID rather than by name, so provide a way
to do that.
|
|
|
|
|
|
|
|
|
| |
Since commit 626eb021988a2 has introduced the auxiliary process
infrastructure, bootstrap_signals() was never used when forked from
postmaster.
Remove the IsUnderPostmaster specific code, and add a appropriate
assertion.
|
|
|
|
|
|
|
|
|
| |
Move common code, that was duplicated in every postmaster child/every
standalone process, into two functions in miscinit.c. Not only does
that already result in a fair amount of net code reduction but it also
makes it much easier to remove more duplication in the future. The
prime motivation wasn't code deduplication though, but easier addition
of new common code.
|
|
|
|
| |
Backpatch certain files through 9.0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MapArrayTypeName would copy up to NAMEDATALEN-1 bytes of the base type
name, which of course is wrong: after prepending '_' there is only room for
NAMEDATALEN-2 bytes. Aside from being the wrong result, this case would
lead to overrunning the statically allocated work buffer. This would be a
security bug if the function were ever used outside bootstrap mode, but it
isn't, at least not in any currently supported branches.
Aside from fixing the off-by-one loop logic, this patch gets rid of the
static work buffer by having MapArrayTypeName pstrdup its result; the sole
caller was already doing that, so this just requires moving the pstrdup
call. This saves a few bytes but mainly it makes the API a lot cleaner.
Back-patch on the off chance that there is some third-party code using
MapArrayTypeName with less-secure input. Pushing pstrdup into the function
should not cause any serious problems for such hypothetical code; at worst
there might be a short term memory leak.
Per Coverity scanning.
|
|
|
|
|
| |
This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not effect backpatching.
|
|
|
|
|
|
|
|
|
|
|
|
| |
We used to have externs for getopt() and its API variables scattered
all over the place. Now that we find we're going to need to tweak the
variable declarations for Cygwin, it seems like a good idea to have
just one place to tweak.
In this commit, the variables are declared "#ifndef HAVE_GETOPT_H".
That may or may not work everywhere, but we'll soon find out.
Andres Freund
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Per reports from Andres Freund and Luke Campbell, a server failure during
set_pglocale_pgservice results in a segfault rather than a useful error
message, because the infrastructure needed to use ereport hasn't been
initialized; specifically, MemoryContextInit hasn't been called.
One known cause of this is starting the server in a directory it
doesn't have permission to read.
We could try to prevent set_pglocale_pgservice from using anything that
depends on palloc or elog, but that would be messy, and the odds of future
breakage seem high. Moreover there are other things being called in main.c
that look likely to use palloc or elog too --- perhaps those things
shouldn't be there, but they are there today. The best solution seems to
be to move the call of MemoryContextInit to very early in the backend's
real main() function. I've verified that an elog or ereport occurring
immediately after that is now capable of sending something useful to
stderr.
I also added code to elog.c to print something intelligible rather than
just crashing if MemoryContextInit hasn't created the ErrorContext.
This could happen if MemoryContextInit itself fails (due to malloc
failure), and provides some future-proofing against someone trying to
sneak in new code even earlier in server startup.
Back-patch to all supported branches. Since we've only heard reports of
this type of failure recently, it may be that some recent change has made
it more likely to see a crash of this kind; but it sure looks like it's
broken all the way back.
|
|
|
|
|
| |
Update all files in head, and files COPYRIGHT and legal.sgml in all back
branches.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Just as backends must clean up their shared memory state (releasing
lwlocks, buffer pins, etc.) before exiting, they must also perform
any similar cleanups related to dynamic shared memory segments they
have mapped before unmapping those segments. So add a mechanism to
ensure that.
Existing on_shmem_exit hooks include both "user level" cleanup such
as transaction abort and removal of leftover temporary relations and
also "low level" cleanup that forcibly released leftover shared
memory resources. On-detach callbacks should run after the first
group but before the second group, so create a new before_shmem_exit
function for registering the early callbacks and keep on_shmem_exit
for the regular callbacks. (An earlier draft of this patch added an
additional argument to on_shmem_exit, but that had a much larger
footprint and probably a substantially higher risk of breaking third
party code for no real gain.)
Patch by me, reviewed by KaiGai Kohei and Andres Freund.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Historically, printtup() has assumed that it could prevent memory leakage
by pfree'ing the string result of each output function and manually
managing detoasting of toasted values. This amounts to assuming that
datatype output functions never leak any memory internally; an assumption
we've already decided to be bogus elsewhere, for example in COPY OUT.
range_out in particular is known to leak multiple kilobytes per call, as
noted in bug #8573 from Godfried Vanluffelen. While we could go in and fix
that leak, it wouldn't be very notationally convenient, and in any case
there have been and undoubtedly will again be other leaks in other output
functions. So what seems like the best solution is to run the output
functions in a temporary memory context that can be reset after each row,
as we're doing in COPY OUT. Some quick experimentation suggests this is
actually a tad faster than the retail pfree's anyway.
This patch fixes all the variants of printtup, except for debugtup()
which is used in standalone mode. It doesn't seem worth worrying
about query-lifespan leaks in standalone mode, and fixing that case
would be a bit tedious since debugtup() doesn't currently have any
startup or shutdown functions.
While at it, remove manual detoast management from several other
output-function call sites that had copied it from printtup(). This
doesn't make a lot of difference right now, but in view of recent
discussions about supporting "non-flattened" Datums, we're going to
want that code gone eventually anyway.
Back-patch to 9.2 where range_out was introduced. We might eventually
decide to back-patch this further, but in the absence of known major
leaks in older output functions, I'll refrain for now.
|
|
|
|
|
|
|
|
|
| |
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition. Replacement implementations taken from
NetBSD.
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
|
| |
|
| |
|
|
|
|
|
| |
Previously set_default_effective_cache_size() could not handle fork,
non-fork, and bootstrap cases.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
|
|
|
|
|
| |
This is the first run of the Perl-based pgindent script. Also update
pgindent instructions.
|
|
|
|
|
|
| |
The value is not used anywhere in code, but will
allow future changes to the checksum version
should that become necessary in the future.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again. Hint bit setting will
require full page write when block is dirtied, which causes various
infrastructure changes. Extensive comments, docs and README.
WARNING message thrown if checksum fails on non-all zeroes page;
ERROR thrown but can be disabled with ignore_checksum_failure = on.
Feature enabled by an initdb option, since transition from option off
to option on is long and complex and has not yet been implemented.
Default is not to use checksums.
Checksum used is WAL CRC-32 truncated to 16-bits.
Simon Riggs, Jeff Davis, Greg Smith
Wide input and assistance from many community members. Thank you.
|
| |
|
|
|
|
|
| |
Fully update git head, and update back branches in ./COPYRIGHT and
legal.sgml files.
|
|
|
|
| |
rather than just storing a pointer.
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.
I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h. In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
mdinit() was misusing IsBootstrapProcessingMode() to decide whether to
create an fsync pending-operations table in the current process. This led
to creating a table not only in the startup and checkpointer processes as
intended, but also in the bgwriter process, not to mention other auxiliary
processes such as walwriter and walreceiver. Creation of the table in the
bgwriter is fatal, because it absorbs fsync requests that should have gone
to the checkpointer; instead they just sit in bgwriter local memory and are
never acted on. So writes performed by the bgwriter were not being fsync'd
which could result in data loss after an OS crash. I think there is no
live bug with respect to walwriter and walreceiver because those never
perform any writes of shared buffers; but the potential is there for
future breakage in those processes too.
To fix, make AuxiliaryProcessMain() export the current process's
AuxProcType as a global variable, and then make mdinit() test directly for
the types of aux process that should have a pendingOpsTable. Having done
that, we might as well also get rid of the random bool flags such as
am_walreceiver that some of the aux processes had grown. (Note that we
could not have fixed the bug by examining those variables in mdinit(),
because it's called from BaseInit() which is run by AuxiliaryProcessMain()
before entering any of the process-type-specific code.)
Back-patch to 9.2, where the problem was introduced by the split-up of
bgwriter and checkpointer processes. The bogus pendingOpsTable exists
in walwriter and walreceiver processes in earlier branches, but absent
any evidence that it causes actual problems there, I'll leave the older
branches alone.
|
| |
|
|
|
|
|
| |
Startup process now has its own dedicated file, just like all other
special/background processes. Reduces role and size of xlog.c
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bgwriter is now a much less important process, responsible for page
cleaning duties only. checkpointer is now responsible for checkpoints
and so has a key role in shutdown. Later patches will correct doc
references to the now old idea that bgwriter performs checkpoints.
Has beneficial effect on performance at high write rates, but mainly
refactoring to more easily allow changes for power reduction by
simplifying previously tortuous code around required to allow page
cleaning and checkpointing to time slice in the same process.
Patch by me, Review by Dickson Guedes
|
|
|
|
|
|
|
|
|
|
|
| |
We were doing some amazingly complicated things in order to avoid running
the very expensive identify_system_timezone() procedure during GUC
initialization. But there is an obvious fix for that, which is to do it
once during initdb and have initdb install the system-specific default into
postgresql.conf, as it already does for most other GUC variables that need
system-environment-dependent defaults. This means that the timezone (and
log_timezone) settings no longer have any magic behavior in the server.
Per discussion.
|
|
|
|
| |
Found by gcc -Wlogical-op
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
walsender.h should depend on xlog.h, not vice versa. (Actually, the
inclusion was circular until a couple hours ago, which was even sillier;
but Bruce broke it in the expedient rather than logically correct
direction.) Because of that poor decision, plus blind application of
pgrminclude, we had a situation where half the system was depending on
xlog.h to include such unrelated stuff as array.h and guc.h. Clean up
the header inclusion, and manually revert a lot of what pgrminclude had
done so things build again.
This episode reinforces my feeling that pgrminclude should not be run
without adult supervision. Inclusion changes in header files in particular
need to be reviewed with great care. More generally, it'd be good if we
had a clearer notion of module layering to dictate which headers can sanely
include which others ... but that's a big task for another day.
|
| |
|
|
|
|
|
| |
This lets us stop including rel.h into execnodes.h, which is a widely
used header.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There can never be a need to push the indcheckxmin horizon forward, since
any HOT chains that are actually broken with respect to the index must
pre-date its original creation. So we can just avoid changing pg_index
altogether during a REINDEX operation.
This offers a cleaner solution than my previous patch for the problem
found a few days ago that we mustn't try to update pg_index while we are
reindexing it. System catalog indexes will always be created with
indcheckxmin = false during initdb, and with this modified code we should
never try to change their pg_index entries. This avoids special-casing
system catalogs as the former patch did, and should provide a performance
benefit for many cases where REINDEX formerly caused an index to be
considered unusable for a short time.
Back-patch to 8.3 to cover all versions containing HOT. Note that this
patch changes the API for index_build(), but I believe it is unlikely that
any add-on code is calling that directly.
|
|
|
|
|
|
|
|
| |
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the code underlying pg_get_expr() is not secure against malformed
input, and can't practically be made so, we need to prevent miscreants
from feeding arbitrary data to it. We can do this securely by declaring
pg_get_expr() to take a new datatype "pg_node_tree" and declaring the
system catalog columns that hold nodeToString output to be of that type.
There is no way at SQL level to create a non-null value of type pg_node_tree.
Since the backend-internal operations that fill those catalog columns
operate below the SQL level, they are oblivious to the datatype relabeling
and don't need any changes.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
those process types that go through InitPostgres; in particular, bootstrap
and standalone-backend cases. This ensures that we have set up a PGPROC
and done some other basic initialization steps (corresponding to the
if (IsUnderPostmaster) block in AuxiliaryProcessMain) before we attempt to
run WAL recovery in a standalone backend. As was discovered last September,
this is necessary for some corner-case code paths during WAL recovery,
particularly end-of-WAL cleanup.
Moving the bootstrap case here too is not necessary for correctness, but it
seems like a good idea since it reduces the number of distinct code paths.
|
| |
|