From 9e257a181cc1dc5e19eb5d770ce09cc98f470f5f Mon Sep 17 00:00:00 2001 From: Andrew Dunstan Date: Sun, 24 Mar 2013 11:27:20 -0400 Subject: Add parallel pg_dump option. New infrastructure is added which creates a set number of workers (threads on Windows, forked processes on Unix). Jobs are then handed out to these workers by the master process as needed. pg_restore is adjusted to use this new infrastructure in place of the old setup, which created a new worker for each step on the fly. Where available, parallel dumps acquire a snapshot clone in order to stay consistent. The parallel option is selected by the -j / --jobs command-line parameter of pg_dump. Joachim Wieland, lightly editorialized by Andrew Dunstan. --- doc/src/sgml/backup.sgml | 18 +++++++++ doc/src/sgml/perform.sgml | 9 +++++ doc/src/sgml/ref/pg_dump.sgml | 89 ++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 111 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index c4215bed986..e444b1cde3d 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -310,6 +310,24 @@ pg_restore -d dbname + + Use <application>pg_dump</>'s parallel dump feature. + + To speed up the dump of a large database, you can use pg_dump's parallel mode. This will dump multiple tables at the same time. You can control the degree of parallelism with the -j parameter. Parallel dumps are only supported for the "directory" archive format. + +pg_dump -j num -F d -f out.dir dbname + + + You can use pg_restore -j to restore a dump in parallel. This will work for any archive in either the "custom" or the "directory" archive format, whether or not it was created with pg_dump -j.
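The master/worker dispatch scheme the commit message describes can be sketched as follows. This is purely an illustration of the pattern (a fixed pool of workers, with the master handing out one dump job at a time); the table names and function names are made up, and the real implementation is C code using threads on Windows and forked processes on Unix.

```python
# Toy sketch of the dispatch pattern: a fixed pool of njobs workers,
# with pending jobs (tables to dump) handed to whichever worker is idle.
from concurrent.futures import ThreadPoolExecutor

def dump_table(table):
    # A real worker would open its own database connection and write
    # one file of the directory-format archive.
    return f"dumped {table}"

def parallel_dump(tables, njobs):
    with ThreadPoolExecutor(max_workers=njobs) as pool:
        # map() dispatches the next pending table as each worker frees up;
        # results come back in the original order.
        return list(pool.map(dump_table, tables))

if __name__ == "__main__":
    print(parallel_dump(["accounts", "orders", "events"], njobs=2))
```

The key property mirrored here is that the number of workers is fixed up front, unlike the pre-patch pg_restore behavior of spawning a new worker per step.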
+ + diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index 1e7544afeb4..34eace35b6e 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1433,6 +1433,15 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; base backup. + + + Experiment with the parallel dump and restore modes of both pg_dump and pg_restore and find the optimal number of concurrent jobs to use. Dumping and restoring in parallel by means of the -j option can significantly reduce the total time required. + Consider whether the whole dump should be restored as a single diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 6d0f214d423..0186ce0938b 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -73,10 +73,12 @@ PostgreSQL documentation transfer mechanism. pg_dump can be used to back up an entire database, then pg_restore can be used to examine the archive and/or select which parts of the - database are to be restored. The most flexible output file format is - the custom format (). It allows - for selection and reordering of all archived items, and is compressed - by default. + database are to be restored. The most flexible output file formats are + the custom format () and the + directory format (). They allow + for selection and reordering of all archived items, support parallel + restoration, and are compressed by default. The directory + format is the only format that supports parallel dumps. @@ -251,7 +253,8 @@ PostgreSQL documentation can read. A directory format archive can be manipulated with standard Unix tools; for example, files in an uncompressed archive can be compressed with the gzip tool. - This format is compressed by default. + This format is compressed by default and also supports parallel + dumps. @@ -285,6 +288,62 @@ PostgreSQL documentation + + + + + + Run the dump in parallel by dumping njobs tables simultaneously. This option reduces the time of the dump but also increases the load on the database server.
You can only use this option with the + directory output format because this is the only output format where multiple processes + can write their data at the same time. + + + pg_dump will open njobs + + 1 connections to the database, so make sure your + setting is high enough to accommodate all connections. + + + Requesting exclusive locks on database objects while running a parallel dump could + cause the dump to fail. The reason is that the pg_dump master process + requests shared locks on the objects that the worker processes are going to dump later, + in order to make sure that nobody deletes them while the dump is running. + If another client then requests an exclusive lock on a table, that lock will not be + granted but will be queued waiting for the shared lock of the master process to be + released. Consequently, any other access to the table will not be granted either and + will queue after the exclusive lock request. This includes the worker process trying + to dump the table. Without any precautions this would be a classic deadlock situation. + To detect this conflict, the pg_dump worker process requests another + shared lock using the NOWAIT option. If the worker process is not granted + this shared lock, somebody else must have requested an exclusive lock in the meantime + and there is no way to continue with the dump, so pg_dump has no choice + but to abort the dump. + + + For a consistent backup, the database server needs to support synchronized snapshots, + a feature that was introduced in PostgreSQL 9.2. With this + feature, database clients can ensure they see the same dataset even though they use + different connections. pg_dump -j uses multiple database + connections; it connects to the database once with the master process and + once again for each worker job.
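The consistency guarantee a synchronized snapshot provides can be illustrated with a small model. This is purely a sketch of the semantics, not pg_dump code: in PostgreSQL the master's connection exports the snapshot and each worker's connection imports it, whereas here a frozen copy of some made-up tables stands in for the exported snapshot.

```python
# Toy model of a synchronized snapshot: the master freezes one view of
# the data, every worker reads from that same frozen view, and writes
# that happen after the export are invisible to all workers alike.
import copy

live = {"accounts": [1, 2], "orders": [10]}   # made-up tables
snapshot = copy.deepcopy(live)                # master "exports" the snapshot
live["orders"].append(11)                     # a concurrent write mid-dump
# each worker "imports" the snapshot and dumps from it, not from live data
dump = {name: rows for name, rows in snapshot.items()}
```

Because every worker reads the same frozen state, the concurrent insert into `orders` never appears in the dump, no matter which worker handles that table.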
Without the synchronized snapshot feature, the + different worker jobs wouldn't be guaranteed to see the same data in each connection, + which could lead to an inconsistent backup. + + + If you want to run a parallel dump of a pre-9.2 server, you need to make sure that the + database content doesn't change between the time the master connects to the + database and the time the last worker job has connected. The easiest way to + do this is to halt any data-modifying processes (DDL and DML) accessing the database + before starting the backup. You also need to specify the + parameter when running + pg_dump -j against a pre-9.2 PostgreSQL + server. + + + + @@ -690,6 +749,17 @@ PostgreSQL documentation + + + + + This option allows running pg_dump -j against a pre-9.2 + server; see the documentation of the parameter + for more details. + + + + @@ -1082,6 +1152,15 @@ CREATE DATABASE foo WITH TEMPLATE template0; + + To dump a database into a directory-format archive in parallel with + 5 worker jobs: + + +$ pg_dump -Fd mydb -j 5 -f dumpdir + + + To reload an archive file into a (freshly created) database named newdb:
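The NOWAIT deadlock check described under the -j option can likewise be modeled with a toy lock table. This is illustrative only (the real lock queue lives in the server); the class and method names are invented for the sketch.

```python
# Toy model of the worker's NOWAIT probe: a new shared-lock request must
# queue behind any pending exclusive request, so a non-waiting probe
# fails as soon as another client has asked for an exclusive lock.
class ToyLockTable:
    def __init__(self):
        self.pending_exclusive = set()

    def request_exclusive(self, obj):
        # Queued behind the master's shared lock, not yet granted.
        self.pending_exclusive.add(obj)

    def try_shared_nowait(self, obj):
        # The worker aborts the whole dump when this returns False.
        return obj not in self.pending_exclusive

locks = ToyLockTable()
ok_before = locks.try_shared_nowait("accounts")   # no conflict: worker proceeds
locks.request_exclusive("accounts")               # e.g. a concurrent DROP TABLE
ok_after = locks.try_shared_nowait("accounts")    # probe fails: dump must abort
```

Aborting immediately is the point of the probe: if the worker instead waited for the shared lock, it would wait on the exclusive requester, which in turn waits on the master's shared lock, completing the deadlock cycle the text describes.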