| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
|
|
|
|
| |
This function for bit and bytea counts the set bits in the bit or byte
string. Internally, we use the existing popcount functionality.
For the name, after some discussion, we settled on bit_count, which
also exists with this meaning in MySQL, Java, and Python.
Author: David Fetter <david@fetter.org>
Discussion: https://www.postgresql.org/message-id/flat/20201230105535.GJ13234@fetter.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Once a relation's autovacuum is completed, the logs include more
information about this relation state if the threshold of
log_autovacuum_min_duration (or its relation option) is reached, with
for example contents about the statistics of the VACUUM operation for
the relation, WAL and system usage.
This commit adds more information about the statistics of the relation's
indexes, with one line of logs generated for each index. The index
stats were already calculated, but not printed in the context of
autovacuum yet. While on it, some refactoring is done to keep track of
the index statistics directly within LVRelStats, simplifying some
routines related to parallel VACUUMs.
Author: Masahiko Sawada
Reviewed-by: Michael Paquier, Euler Taveira
Discussion: https://postgr.es/m/CAD21AoAy6SxHiTivh5yAPJSUE4S=QRPpSZUdafOSz0R+fRcM6Q@mail.gmail.com
|
|
|
|
|
|
|
| |
We can't access the entry after it is removed from dynahash.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Ps-pL++f6CJwPx2+vUqXuew=Xt-9Bi-6kCyxn+Fwi2M7w@mail.gmail.com
|
|
|
|
|
|
|
|
| |
A couple error messages and comments used 'statistic kind', not the
correct 'statistics kind'. Fix and backpatch all the way back to 10,
where extended statistics were introduced.
Backpatch-through: 10
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the type of this wait event was Client. But while this
wait event is being reported, walreceiver process is waiting for
the startup process to set initial data for streaming replication.
It's not waiting for any activity on a socket connected to a user
application or walsender. So this commit changes the type for
WalReceiverWaitStart wait event to IPC.
Author: Fujii Masao
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/cdacc27c-37ff-f1a4-20e2-ce19933abfcc@oss.nttdata.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, to check relation permanence, the Relation's Form_pg_class
structure member relpersistence was compared to the value
RELPERSISTENCE_PERMANENT ("p"). This commit adds the macro
RelationIsPermanent() and is used in appropirate places to simplify the
code. This matches other RelationIs* macros.
This macro will be used in more places in future cluster file encryption
patches.
Discussion: https://postgr.es/m/20210318153134.GH20766@tamriel.snowman.net
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bringetbitmap function allocates memory for various purposes, which
may be quite expensive, depending on the number of scan keys. Instead of
allocating them separately, allocate one bit chunk of memory an carve it
into smaller pieces as needed - all the pieces have the same lifespan,
and it saves quite a bit of CPU and memory overhead.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The handling of IS [NOT] NULL clauses is independent of an opclass, and
most of the code was exactly the same in both minmax and inclusion. So
instead move the code from support procedures to the AM.
This simplifies the code - especially the support procedures - quite a
bit, as they don't need to care about NULL values and flags at all. It
also means the IS [NOT] NULL clauses can be evaluated without invoking
the support procedure.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: Masahiko Sawada <masahiko.sawada@enterprisedb.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit changes how we pass scan keys to BRIN consistent function.
Instead of passing them one by one, we now pass all scan keys for a
given attribute at once. That makes the consistent function a bit more
complex, as it has to loop through the keys, but it does allow more
elaborate opclasses that can use multiple keys to eliminate ranges much
more effectively.
The existing BRIN opclasses (minmax, inclusion) don't really benefit
from this change. The primary purpose is to allow future opclases to
benefit from seeing all keys at once.
This does change the BRIN API, because the signature of the consistent
function changes (a new parameter with number of scan keys). So this
breaks existing opclasses, and will require supporting two variants of
the code for different PostgreSQL versions. We've considered supporting
two variants of the consistent, but we've decided not to do that.
Firstly, there's another patch that moves handling of NULL values from
the opclass, which means the opclasses need to be updated anyway.
Secondly, we're not aware of any out-of-core BRIN opclasses, so it does
not seem worth the extra complexity.
Bump catversion, because of pg_proc changes.
Author: Tomas Vondra <tomas.vondra@postgresql.org>
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com
|
|
|
|
|
|
| |
Until now the bsearch_arg function was used only in extended statistics
code, so it was defined in that code. But we already have qsort_arg in
src/port, so let's move it next to it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
substring(), and perhaps other callers, isn't careful to pass a
slice length that is no more than the datum's true size. Since
toast_decompress_datum_slice's children will palloc the requested
slice length, this can waste memory. Also, close study of the liblz4
documentation suggests that it is dependent on the caller to not ask
for more than the correct amount of decompressed data; this squares
with observed misbehavior with liblz4 1.8.3. Avoid these problems
by switching to the normal full-decompression code path if the
slice request is >= datum's decompressed size.
Tom Lane and Dilip Kumar
Discussion: https://postgr.es/m/507597.1616370729@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The authors of bbe0a81db hadn't quite got the idea that macros named
like SOMETHING_4B_C were only meant for internal endianness-related
details in postgres.h. Choose more legible names for macros that are
intended to be used elsewhere. Rearrange postgres.h a bit to clarify
the separation between those internal macros and ones intended for
wider use.
Also, avoid using the term "rawsize" for true decompressed size;
we've used "extsize" for that, because "rawsize" generally denotes
total Datum size including header. This choice seemed particularly
unfortunate in tests that were comparing one of these meanings to
the other.
This patch includes a couple of not-purely-cosmetic changes: be
sure that the shifts aligning compression methods are unsigned
(not critical today, but will be when compression method 2 exists),
and fix broken definition of VARATT_EXTERNAL_GET_COMPRESSION (now
VARATT_EXTERNAL_GET_COMPRESS_METHOD), whose callers worked only
accidentally.
Discussion: https://postgr.es/m/574197.1616428079@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
The previous coding treated an invalid compression method name as
equivalent to the default, which is certainly not right.
Justin Pryzby
Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
|
|
|
|
|
|
|
|
|
| |
Remove unused macro. Fix confusion about whether a TOAST compression
method is identified by an OID or a char.
Justin Pryzby
Discussion: http://postgr.es/m/20210321235544.GD4203@telsasoft.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit is mostly a revert of aaa3aed, that switched the routine
doing the internal renaming of recycled WAL segments to use on Windows a
combination of CreateHardLinkA() plus unlink() instead of rename(). As
reported by several users of Postgres 13, this is causing concurrency
issues when manipulating WAL segments, mostly in the shape of the
following error:
LOG: could not rename file "pg_wal/000000XX000000YY000000ZZ":
Permission denied
This moves back to a logic where a single rename() (well, pgrename() for
Windows) is used. This issue has proved to be hard to hit when I tested
it, facing it only once with an archive_command that was not able to do
its work, so it is environment-sensitive. The reporters of this issue
have been able to confirm that the situation improved once we switched
back to a single rename(). In order to check things, I have provided to
the reporters a patched build based on 13.2 with aaa3aed reverted, to
test if the error goes away, and an unpatched build of 13.2 to test if
the error still showed up (just to make sure that I did not mess up my
build process).
Extra thanks to Fujii Masao for pointing out what looked like the
culprit commit, and to all the reporters for taking the time to test
what I have sent them.
Reported-by: Andrus, Guy Burgess, Yaroslav Pashinsky, Thomas Trenz
Reviewed-by: Tom Lane, Andres Freund
Discussion: https://postgr.es/m/3861ff1e-0923-7838-e826-094cc9bef737@hot.ee
Discussion: https://postgr.es/m/16874-c3eecd319e36a2bf@postgresql.org
Discussion: https://postgr.es/m/095ccf8d-7f58-d928-427c-b17ace23cae6@burgess.co.nz
Discussion: https://postgr.es/m/16927-67c570d968c99567%40postgresql.org
Discussion: https://postgr.es/m/YFBcRbnBiPdGZvfW@paquier.xyz
Backpatch-through: 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any transactions found as still prepared by a checkpoint have their
state data read from the WAL records generated by PREPARE TRANSACTION
before being moved into their new location within pg_twophase/. While
reading such records, the WAL reader uses the callback
read_local_xlog_page() to read a page, that is shared across various
parts of the system. This callback, since 1148e22a, has introduced an
update of ThisTimeLineID when reading a record while in recovery, which
is potentially helpful in the context of cascading WAL senders.
This update of ThisTimeLineID interacts badly with the checkpointer if a
promotion happens while some 2PC data is read from its record, as, by
changing ThisTimeLineID, any follow-up WAL records would be written to
an timeline older than the promoted one. This results in consistency
issues. For instance, a subsequent server restart would cause a failure
in finding a valid checkpoint record, resulting in a PANIC, for
instance.
This commit changes the code reading the 2PC data to reset the timeline
once the 2PC record has been read, to prevent messing up with the static
state of the checkpointer. It would be tempting to do the same thing
directly in read_local_xlog_page(). However, based on the discussion
that has led to 1148e22a, users may rely on the updates of
ThisTimeLineID when a WAL record page is read in recovery, so changing
this callback could break some cases that are working currently.
A TAP test reproducing the issue is added, relying on a PITR to
precisely trigger a promotion with a prepared transaction still
tracked.
Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao
and myself.
Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10
|
|
|
|
|
|
|
|
|
| |
It's not okay to scribble directly on a syscache entry.
Nor to continue accessing said entry after releasing it.
Also get rid of not-used local variables.
Per valgrind testing.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Maintain a simple array of metadata about pages that were deleted during
nbtree VACUUM's current btvacuumscan() call. Use this metadata at the
end of btvacuumscan() to attempt to place newly deleted pages in the FSM
without further delay. It might not yet be safe to place any of the
pages in the FSM by then (they may not be deemed recyclable), but we
have little to lose and plenty to gain by trying. In practice there is
a very good chance that this will work out when vacuuming larger
indexes, where scanning the index naturally takes quite a while.
This commit doesn't change the page recycling invariants; it merely
improves the efficiency of page recycling within the confines of the
existing design. Recycle safety is a part of nbtree's implementation of
what Lanin & Shasha call "the drain technique". The design happens to
use transaction IDs (they're stored in deleted pages), but that in
itself doesn't align the cutoff for recycle safety to any of the
XID-based cutoffs used by VACUUM (e.g., OldestXmin). All that matters
is whether or not _other_ backends might be able to observe various
inconsistencies in the tree structure (that they cannot just detect and
recover from by moving right). Recycle safety is purely a question of
maintaining the consistency (or the apparent consistency) of a physical
data structure.
Note that running a simple serial test case involving a large range
DELETE followed by a VACUUM VERBOSE will probably show that any newly
deleted nbtree pages are not yet reusable/recyclable. This is expected
in the absence of even one concurrent XID assignment. It is an old
implementation restriction. In practice it's unlikely to be the thing
that makes recycling remain unsafe, at least with larger indexes, where
recycling newly deleted pages during the same VACUUM actually matters.
An important high-level goal of this commit (as well as related recent
commits e5d8a999 and 9f3665fb) is to make expensive deferred cleanup
operations in index AMs rare in general. If index vacuuming frequently
depends on the next VACUUM operation finishing off work that the current
operation started, then the general behavior of index vacuuming is hard
to predict. This is relevant to ongoing work that adds a vacuumlazy.c
mechanism to skip index vacuuming in certain cases. Anything that makes
the real world behavior of index vacuuming simpler and more linear will
also make top-down modeling in vacuumlazy.c more robust.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wzk76_P=67iUscb1UN44-gyZL-KgpsXbSxq_bdcMa7Q+wQ@mail.gmail.com
|
|
|
|
|
|
|
|
| |
Compilers that don't understand that elog(ERROR) doesn't return
issued warnings here. In the cases in libpq_pipeline.c, we were
not exactly helping things by failing to mark pg_fatal() as noreturn.
Per buildfarm.
|
|
|
|
|
|
|
|
|
|
| |
The documentation specifically states that lwlock-release fires before
any released waiters have been awakened. It worked that way until
ab5194e6f617a9a9e7aadb3dd1cee948a42d0755, where is seems to have been
misplaced accidentally. Move it back where it belongs.
Author: Craig Ringer <craig.ringer@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAGRY4nwxKUS_RvXFW-ugrZBYxPFFM5kjwKT5O+0+Stuga5b4+Q@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When compressing the BRIN summary, we can't simply use the compression
method from the indexed attribute. The summary may use a different data
type, e.g. fixed-length attribute may have varlena summary, leading to
compression failures. For the built-in BRIN opclasses this happens to
work, because the summary uses the same data type as the attribute.
When the data types match, we can inherit use the compression method
specified for the attribute (it's copied into the index descriptor).
Otherwise we don't have much choice and have to use the default one.
Author: Tomas Vondra
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Discussion: https://postgr.es/m/e0367f27-392c-321a-7411-a58e1a7e4817%40enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While back-patching e0e569e1d, I noted that there were some other
places where we ought to be applying DH_free(); namely, where we
load some DH parameters from a file and then reject them as not
being sufficiently secure. While it seems really unlikely that
anybody would hit these code paths in production, let alone do
so repeatedly, let's fix it for consistency.
Back-patch to v10 where this code was introduced.
Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RestoreGUCState applied InitializeOneGUCOption to already-live
GUC entries, causing any malloc'd subsidiary data to be forgotten.
We do want the effect of resetting the GUC to its compiled-in
default, and InitializeOneGUCOption seems like the best way to do
that, so add code to free any existing subsidiary data beforehand.
The interaction between can_skip_gucvar, SerializeGUCState, and
RestoreGUCState is way more subtle than their opaque comments
would suggest to an unwary reader. Rewrite and enlarge the
comments to try to make it clearer what's happening.
Remove a long-obsolete assertion in read_nondefault_variables: the
behavior of set_config_option hasn't depended on IsInitProcessingMode
since f5d9698a8 installed a better way of controlling it.
Although this is fixing a clear memory leak, the leak is quite unlikely
to involve any large amount of data, and it can only happen once in the
lifetime of a worker process. So it seems unnecessary to take any
risk of back-patching.
Discussion: https://postgr.es/m/4105247.1616174862@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 2ce439f3 we have opened every file in the data directory
and called fsync() at the start of crash recovery. This can be very
slow if there are many files, leading to field complaints of systems
taking minutes or even hours to begin crash recovery.
Provide an alternative method, for Linux only, where we call syncfs() on
every possibly different filesystem under the data directory. This is
equivalent, but avoids faulting in potentially many inodes from
potentially slow storage.
The new mode comes with some caveats, described in the documentation, so
the default value for the new setting is "fsync", preserving the older
behavior.
Reported-by: Michael Brown <michael.brown@discourse.org>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Paul Guo <guopa@vmware.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://postgr.es/m/11bc2bb7-ecb5-3ad0-b39f-df632734cd81%40discourse.org
Discussion: https://postgr.es/m/CAEET0ZHGnbXmi8yF3ywsDZvb3m9CbdsGZgfTXscQ6agcbzcZAw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function added in be45be9c33 is comparing integer lists (IntList) by
length and contents, but there were two bugs. Firstly, it used intVal()
to extract the value, but that's for Value nodes, not for extracting int
values from IntList. Secondly, it called it directly on the ListCell,
without doing lfirst(). So just do lfirst_int() instead.
Interestingly enough, this did not cause any crashes on the buildfarm,
but valgrind rightfully complained about it.
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
|
|
|
|
|
|
| |
Introduced by commit bbe0a81db69bd10bd166907c3701492a29aca294.
Per buildfarm member prion.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is now a per-column COMPRESSION option which can be set to pglz
(the default, and the only option in up until now) or lz4. Or, if you
like, you can set the new default_toast_compression GUC to lz4, and
then that will be the default for new table columns for which no value
is specified. We don't have lz4 support in the PostgreSQL code, so
to use lz4 compression, PostgreSQL must be built --with-lz4.
In general, TOAST compression means compression of individual column
values, not the whole tuple, and those values can either be compressed
inline within the tuple or compressed and then stored externally in
the TOAST table, so those properties also apply to this feature.
Prior to this commit, a TOAST pointer has two unused bits as part of
the va_extsize field, and a compessed datum has two unused bits as
part of the va_rawsize field. These bits are unused because the length
of a varlena is limited to 1GB; we now use them to indicate the
compression type that was used. This means we only have bit space for
2 more built-in compresison types, but we could work around that
problem, if necessary, by introducing a new vartag_external value for
any further types we end up wanting to add. Hopefully, it won't be
too important to offer a wide selection of algorithms here, since
each one we add not only takes more coding but also adds a build
dependency for every packager. Nevertheless, it seems worth doing
at least this much, because LZ4 gets better compression than PGLZ
with less CPU usage.
It's possible for LZ4-compressed datums to leak into composite type
values stored on disk, just as it is for PGLZ. It's also possible for
LZ4-compressed attributes to be copied into a different table via SQL
commands such as CREATE TABLE AS or INSERT .. SELECT. It would be
expensive to force such values to be decompressed, so PostgreSQL has
never done so. For the same reasons, we also don't force recompression
of already-compressed values even if the target table prefers a
different compression method than was used for the source data. These
architectural decisions are perhaps arguable but revisiting them is
well beyond the scope of what seemed possible to do as part of this
project. However, it's relatively cheap to recompress as part of
VACUUM FULL or CLUSTER, so this commit adjusts those commands to do
so, if the configured compression method of the table happens not to
match what was used for some column value stored therein.
Dilip Kumar. The original patches on which this work was based were
written by Ildus Kurbangaliev, and those were patches were based on
even earlier work by Nikita Glukhov, but the design has since changed
very substantially, since allow a potentially large number of
compression methods that could be added and dropped on a running
system proved too problematic given some of the architectural issues
mentioned above; the choice of which specific compression method to
add first is now different; and a lot of the code has been heavily
refactored. More recently, Justin Przyby helped quite a bit with
testing and reviewing and this version also includes some code
contributions from him. Other design input and review from Tomas
Vondra, Álvaro Herrera, Andres Freund, Oleg Bartunov, Alexander
Korotkov, and me.
Discussion: http://postgr.es/m/20170907194236.4cefce96%40wp.localdomain
Discussion: http://postgr.es/m/CAFiTN-uUpX3ck%3DK0mLEk-G_kUQY%3DSNOTeqdaNRR9FMdQrHKebw%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 86c23a6eb2 changed the option to specify that postgres will
stop all other server processes by sending the signal SIGSTOP,
from -s to -T. But previously there were comments incorrectly
explaining that SIGSTOP behavior is set by -s option. This commit
fixes them.
Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/20210316.165141.1400441966284654043.horikyota.ntt@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
We leaked the error report from PQconninfoParse, when there was
one. It seems unlikely that real usage patterns would repeat
the failure often enough to create serious bloat, but let's
back-patch anyway to keep the code similar in all branches.
Found via valgrind testing.
Back-patch to v10 where this code was added.
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because guc.c prefers to keep all its string values in malloc'd
not palloc'd storage, it has to be more careful than usual to
avoid leaks. Error exits out of string GUC hook checks failed
to clear the proposed value string, and error exits out of
ProcessGUCArray() failed to clear the malloc'd results of
ParseLongOption().
Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context. However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc. Hence, we leaked
compiled regex(es) any time we dropped such a cache entry. That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes. Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.
Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some code paths in this function perform syscache lookups, which
can lead to table accesses and possibly leakage of cruft into
the caller's context. If said context is CacheMemoryContext,
we eventually will have visible bloat. But fixing this is no
harder than moving one memory context switch step. (The other
callers don't have a problem.)
Andres Freund and I independently found this via valgrind testing.
Back-patch to v12 where this code was added.
Discussion: https://postgr.es/m/20210317023101.anvejcfotwka6gaa@alap3.anarazel.de
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
|
|
| |
Although these lists are usually NIL, and even when not empty
are unlikely to be large, constant relcache update traffic could
eventually result in visible bloat of CacheMemoryContext.
Found via valgrind testing.
Back-patch to v10 where this field was added.
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
|
|
|
|
|
|
|
|
|
| |
Our coding convention requires this macro's result to be assigned
back to the original List variable. In this usage, since the
List could not become empty, there was no actual bug --- but
some compilers warned about it. Oversight in be45be9c3.
Discussion: https://postgr.es/m/35077b31-2d62-1e31-0e2e-ddb52d590b73@enterprisedb.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With grouping sets, it's possible that some of the grouping sets are
duplicate. This is especially common with CUBE and ROLLUP clauses. For
example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to
GROUP BY GROUPING SETS (
(a, b, c),
(a, b, c),
(a, b, c),
(a, b),
(a, b),
(a, b),
(a),
(a),
(a),
(c, a),
(c, a),
(c, a),
(c),
(b, c),
(b),
()
)
Some of the grouping sets are calculated multiple times, which is mostly
unnecessary. This commit implements a new GROUP BY DISTINCT feature, as
defined in the SQL standard, which eliminates the duplicate sets.
Author: Vik Fearing
Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra
Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After a crash of a backend using temporary files, the files used to be
left behind, on the basis that it might be useful for debugging. But we
don't have any reports of anyone actually doing that, and it means the
disk usage may grow over time due to repeated backend failures (possibly
even hitting ENOSPC). So this behavior is a bit unfortunate, and fixing
it required either manual cleanup (deleting files, which is error-prone)
or restart of the instance (i.e. service disruption).
This implements automatic cleanup of temporary files, controled by a new
GUC remove_temp_files_after_crash. By default the files are removed, but
it can be disabled to restore the old behavior if needed.
Author: Euler Taveira
Reviewed-by: Tomas Vondra, Michael Paquier, Anastasia Lubennikova, Thomas Munro
Discussion: https://postgr.es/m/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
pg_read_file() is the function that's in core, pg_file_read() is in
adminpack. But when using pg_file_read() in adminpack it calls the *C*
level function pg_read_file() in core, which probably threw the original
author off. But the error hint should be about the SQL function.
Reported-By: Sergei Kornilov
Backpatch-through: 11
Discussion: https://postgr.es/m/373021616060475@mail.yandex.ru
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.
A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.
In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.
Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When accessing replication slot stats, introduced in 98681675002d,
pgstat_read_statsfiles() reads the data into newly allocated
memory. Unfortunately the current memory context at that point is the
callers, leading to leaks and use-after-free dangers.
The fix is trivial, explicitly use pgStatLocalContext. There's some
potential for further improvements, but that's outside of the scope of
this bugfix.
No backpatch necessary, feature is only in HEAD.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210317230447.c7uc4g3vbs4wi32i@alap3.anarazel.de
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While looking at Robert Foggia's report, I noticed a passel of
other issues in the same area:
* The scheme for backslash-quoting newlines in pathnames is just
wrong; it will misbehave if the last ordinary character in a pathname
is a backslash. I'm not sure why we're bothering to allow newlines
in tablespace paths, but if we're going to do it we should do it
without introducing other problems. Hence, backslashes themselves
have to be backslashed too.
* The author hadn't read the sscanf man page very carefully, because
this code would drop any leading whitespace from the path. (I doubt
that a tablespace path with leading whitespace could happen in
practice; but if we're bothering to allow newlines in the path, it
sure seems like leading whitespace is little less implausible.) Using
sscanf for the task of finding the first space is overkill anyway.
* While I'm not 100% sure what the rationale for escaping both \r and
\n is, if the idea is to allow Windows newlines in the file then this
code failed, because it'd throw an error if it saw \r followed by \n.
* There's no cross-check for an incomplete final line in the map file,
which would be a likely apparent symptom of the improper-escaping
bug.
On the generation end, aside from the escaping issue we have:
* If needtblspcmapfile is true then do_pg_start_backup will pass back
escaped strings in tablespaceinfo->path values, which no caller wants
or is prepared to deal with. I'm not sure if there's a live bug from
that, but it looks like there might be (given the dubious assumption
that anyone actually has newlines in their tablespace paths).
* It's not being very paranoid about the possibility of random stuff
in the pg_tblspc directory. IMO we should ignore anything without an
OID-like name.
The escaping rule change doesn't seem back-patchable: it'll require
doubling of backslashes in the tablespace_map file, which is basically
a basebackup format change. The odds of that causing trouble are
considerably more than the odds of the existing bug causing trouble.
The rest of this seems somewhat unlikely to cause problems too,
so no back-patch.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Robert Foggia of Trustwave reported that read_tablespace_map()
fails to prevent an overrun of its on-stack input buffer.
Since the tablespace map file is presumed trustworthy, this does
not seem like an interesting security vulnerability, but still
we should fix it just in the name of robustness.
While here, document that pg_basebackup's --tablespace-mapping option
doesn't work with tar-format output, because it doesn't. To make it
work, we'd have to modify the tablespace_map file within the tarball
sent by the server, which might be possible but I'm not volunteering.
(Less-painful solutions would require changing the basebackup protocol
so that the source server could adjust the map. That's not very
appetizing either.)
|
|
|
|
|
|
|
| |
This reverts commit 378802e3713c6c0fce31d2390c134cd5d7c30157.
This reverts commit 3b8981b6e1a2aea0f18384c803e21e9391de669a.
Discussion: https://postgr.es/m/CA%2BhUKGJmcqAE3MZeDCLLXa62cWM0AJbKmp2JrJYaJ86bz36LFA%40mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
578b229, that removed support for WITH OIDS, has changed
CatalogTupleInsert() to not return an Oid, but one comment was still
mentioning that.
Author: Vik Fearing
Discussion: https://postgr.es/m/fef01975-ed10-3601-7b9e-80ecef72d00b@postgresfriends.org
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 3048898e dropped -ING from some wait event names that correspond
to barrier phases. Update the phases' names to match.
While we're here making cosmetic changes, also rename "DONE" to "FREE".
That pairs better with "ALLOCATE", and describes the activity that
actually happens in that phase (as we do for the other phases) rather
than describing a state. The distinction is clearer after bugfix commit
3b8981b6 split the phase into two. As for the growth barriers, rename
their "ALLOCATE" phase to "REALLOCATE", which is probably a better
description of what happens then. Also improve the comments about
the phases a bit.
Discussion: https://postgr.es/m/CA%2BhUKG%2BMDpwF2Eo2LAvzd%3DpOh81wUTsrwU1uAwR-v6OGBB6%2B7g%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With very unlucky timing and parallel_leader_participation off, PHJ
could attempt to access per-batch state just as it was being freed.
There was code intended to prevent that by checking for a cleared
pointer, but it was buggy.
Fix, by introducing an extra barrier phase. The new phase
PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
The last to detach will free the array of per-batch state as before, but
now it will also atomically advance the phase at the same time, so that
late attachers can avoid the hazard, without the data race. This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).
Revealed by a one-off build farm failure, where BarrierAttach() failed a
sanity check assertion, because the memory had been clobbered by
dsa_free().
Back-patch to 11, where the code arrived.
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
Subscription to drop such slots. However, it is possible that before
tablesync worker could get the acknowledgment of slot creation, drop
subscription stops it and that can lead to a dangling slot on the
publisher. Prevent cancel/die interrupts while creating a slot in the
tablesync worker.
Reported-by: Thomas Munro as per buildfarm
Author: Amit Kapila
Reviewed-by: Vignesh C, Takamichi Osumi
Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass CURSOR_OPT_PARALLEL_OK to pg_plan_query() so that parallel plans
are considered when running the underlying SELECT query. This wasn't
done in commit e9baa5e9, which did this for CREATE MATERIALIZED VIEW,
because it wasn't yet known to be safe.
Since REFRESH always inserts into a freshly created table before later
merging or swapping the data into place with separate operations, we can
enable such plans here too.
Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Reviewed-by: Hou, Zhijie <houzj.fnst@cn.fujitsu.com>
Reviewed-by: Luc Vlaming <luc@swarm64.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CALj2ACXg-4hNKJC6nFnepRHYT4t5jJVstYvri%2BtKQHy7ydcr8A%40mail.gmail.com
|
|
|
|
| |
Oversight in commit d168b666823, which added bottom-up index deletion.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After reading the root cert list from the ssl_ca_file, immediately
install it as client CA list of the new SSL context. That gives the
SSL context ownership of the list, so that SSL_CTX_free will free it.
This avoids a permanent memory leak if we fail further down in
be_tls_init(), which could happen if bogus CRL data is offered.
The leak could only amount to something if the CRL parameters get
broken after server start (else we'd just quit) and then the server
is SIGHUP'd many times without fixing the CRL data. That's rather
unlikely perhaps, but it seems worth fixing, if only because the
code is clearer this way.
While we're here, add some comments about the memory management
aspects of this logic.
Noted by Jelte Fennema and independently by Andres Freund.
Back-patch to v10; before commit de41869b6 it doesn't matter,
since we'd not re-execute this code during SIGHUP.
Discussion: https://postgr.es/m/16160-18367e56e9a28264@postgresql.org
|