author | Tom Lane <tgl@sss.pgh.pa.us> | 2024-07-29 12:17:24 -0400
---|---|---
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2024-07-29 12:17:24 -0400
commit | 81db073a287842cbf0a17cb32108b214a335670b |
tree | de12585a2db63394d6097fe195db890d6532ac2a |
parent | 2fa989e6a3407b9da625e1524c8694bc028e25ba |
Count individual SQL commands in pg_restore's --transaction-size mode.
The initial implementation in commit 959b38d77 counted one action
per TOC entry (except for some special cases for multi-blob BLOBS
entries). This assumes that TOC entries are all about equally
complex, but it turns out that that assumption doesn't hold up very
well in binary-upgrade mode. For example, even after the previous
commit I was able to cause backend bloat with tables having many
inherited constraints. There may be other cases too. (Since no
serious problems have been reported with --single-transaction mode,
we can conclude that the backend copes well with psql's regular
restore scripts; but before 959b38d77 we never ran binary-upgrade
restores with multi-command transactions.)
To fix, count multi-command TOC entries as N actions, allowing the
transaction size to be scaled down when we hit a complex TOC entry.
Rather than add a SQL parser to pg_restore, approximate "multi-command"
by counting semicolons in the TOC entry's defn string.
This will be fooled by semicolons appearing in string literals ---
but the error is in the conservative direction, so it doesn't seem
worth working harder. The biggest risk is with function/procedure
TOC entries, but we can just explicitly skip those.
(This is undoubtedly a hack, and maybe someday we'll be able to
revert it after fixing the backend's bloat issues or rethinking
what pg_dump emits in binary upgrade mode. But that surely isn't
a project for v17.)
Thanks to Alexander Korotkov for the let's-count-semicolons idea.
Per report from Justin Pryzby. Back-patch to v17, where txn_size mode
was introduced.
Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023
-rw-r--r-- | src/bin/pg_dump/pg_backup_archiver.c | 28 |
1 file changed, 25 insertions, 3 deletions
```diff
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 68e321212d9..8c20c263c4b 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -3827,10 +3827,32 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 	{
 		IssueACLPerBlob(AH, te);
 	}
-	else
+	else if (te->defn && strlen(te->defn) > 0)
 	{
-		if (te->defn && strlen(te->defn) > 0)
-			ahprintf(AH, "%s\n\n", te->defn);
+		ahprintf(AH, "%s\n\n", te->defn);
+
+		/*
+		 * If the defn string contains multiple SQL commands, txn_size mode
+		 * should count it as N actions not one. But rather than build a full
+		 * SQL parser, approximate this by counting semicolons. One case
+		 * where that tends to be badly fooled is function definitions, so
+		 * ignore them. (restore_toc_entry will count one action anyway.)
+		 */
+		if (ropt->txn_size > 0 &&
+			strcmp(te->desc, "FUNCTION") != 0 &&
+			strcmp(te->desc, "PROCEDURE") != 0)
+		{
+			const char *p = te->defn;
+			int nsemis = 0;
+
+			while ((p = strchr(p, ';')) != NULL)
+			{
+				nsemis++;
+				p++;
+			}
+			if (nsemis > 1)
+				AH->txnCount += nsemis - 1;
+		}
 	}
 
 	/*
```