diff options
Diffstat (limited to 'doc/src/sgml/ref/pgbench.sgml')
-rw-r--r-- | doc/src/sgml/ref/pgbench.sgml | 1176 |
1 files changed, 1176 insertions, 0 deletions
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml new file mode 100644 index 00000000000..a8085463a5e --- /dev/null +++ b/doc/src/sgml/ref/pgbench.sgml @@ -0,0 +1,1176 @@ +<!-- doc/src/sgml/ref/pgbench.sgml --> + +<refentry id="pgbench"> + <indexterm zone="pgbench"> + <primary>pgbench</primary> + </indexterm> + + <refmeta> + <refentrytitle><application>pgbench</application></refentrytitle> + <manvolnum>1</manvolnum> + <refmiscinfo>Application</refmiscinfo> + </refmeta> + + <refnamediv> + <refname>pgbench</refname> + <refpurpose>run a benchmark test on <productname>PostgreSQL</productname></refpurpose> + </refnamediv> + + <refsynopsisdiv> + <cmdsynopsis> + <command>pgbench</command> + <arg choice="plain"><option>-i</option></arg> + <arg rep="repeat"><replaceable>option</replaceable></arg> + <arg choice="opt"><replaceable>dbname</replaceable></arg> + </cmdsynopsis> + <cmdsynopsis> + <command>pgbench</command> + <arg rep="repeat"><replaceable>option</replaceable></arg> + <arg choice="opt"><replaceable>dbname</replaceable></arg> + </cmdsynopsis> + </refsynopsisdiv> + + <refsect1> + <title>Description</title> + <para> + <application>pgbench</application> is a simple program for running benchmark + tests on <productname>PostgreSQL</>. It runs the same sequence of SQL + commands over and over, possibly in multiple concurrent database sessions, + and then calculates the average transaction rate (transactions per second). + By default, <application>pgbench</application> tests a scenario that is + loosely based on TPC-B, involving five <command>SELECT</>, + <command>UPDATE</>, and <command>INSERT</> commands per transaction. + However, it is easy to test other cases by writing your own transaction + script files. + </para> + + <para> + Typical output from pgbench looks like: + +<screen> +transaction type: TPC-B (sort of) +scaling factor: 10 +query mode: simple +number of clients: 10 +number of threads: 1 +number of transactions per client: 1000 +number of transactions actually processed: 10000/10000 +tps = 85.184871 (including connections establishing) +tps = 85.296346 (excluding connections establishing) +</screen> + + The first six lines report some of the most important parameter + settings. The next line reports the number of transactions completed + and intended (the latter being just the product of number of clients + and number of transactions per client); these will be equal unless the run + failed before completion. (In <option>-T</> mode, only the actual + number of transactions is printed.) + The last two lines report the number of transactions per second, + figured with and without counting the time to start database sessions. + </para> + + <para> + The default TPC-B-like transaction test requires specific tables to be + set up beforehand. <application>pgbench</> should be invoked with + the <option>-i</> (initialize) option to create and populate these + tables. (When you are testing a custom script, you don't need this + step, but will instead need to do whatever setup your test needs.) + Initialization looks like: + +<programlisting> +pgbench -i <optional> <replaceable>other-options</> </optional> <replaceable>dbname</> +</programlisting> + + where <replaceable>dbname</> is the name of the already-created + database to test in. (You may also need <option>-h</>, + <option>-p</>, and/or <option>-U</> options to specify how to + connect to the database server.) + </para> + + <caution> + <para> + <literal>pgbench -i</> creates four tables <structname>pgbench_accounts</>, + <structname>pgbench_branches</>, <structname>pgbench_history</>, and + <structname>pgbench_tellers</>, + destroying any existing tables of these names. + Be very careful to use another database if you have tables having these + names! + </para> + </caution> + + <para> + At the default <quote>scale factor</> of 1, the tables initially + contain this many rows: +<screen> +table # of rows +--------------------------------- +pgbench_branches 1 +pgbench_tellers 10 +pgbench_accounts 100000 +pgbench_history 0 +</screen> + You can (and, for most purposes, probably should) increase the number + of rows by using the <option>-s</> (scale factor) option. The + <option>-F</> (fillfactor) option might also be used at this point. + </para> + + <para> + Once you have done the necessary setup, you can run your benchmark + with a command that doesn't include <option>-i</>, that is + +<programlisting> +pgbench <optional> <replaceable>options</> </optional> <replaceable>dbname</> +</programlisting> + + In nearly all cases, you'll need some options to make a useful test. + The most important options are <option>-c</> (number of clients), + <option>-t</> (number of transactions), <option>-T</> (time limit), + and <option>-f</> (specify a custom script file). + See below for a full list. + </para> + </refsect1> + + <refsect1> + <title>Options</title> + + <para> + The following is divided into three subsections: Different options are used + during database initialization and while running benchmarks, some options + are useful in both cases. + </para> + + <refsect2 id="pgbench-init-options"> + <title>Initialization Options</title> + + <para> + <application>pgbench</application> accepts the following command-line + initialization arguments: + + <variablelist> + + <varlistentry> + <term><option>-i</option></term> + <term><option>--initialize</option></term> + <listitem> + <para> + Required to invoke initialization mode. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-F</option> <replaceable>fillfactor</></term> + <term><option>--fillfactor=</option><replaceable>fillfactor</></term> + <listitem> + <para> + Create the <structname>pgbench_accounts</>, + <structname>pgbench_tellers</> and + <structname>pgbench_branches</> tables with the given fillfactor. + Default is 100. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-n</option></term> + <term><option>--no-vacuum</option></term> + <listitem> + <para> + Perform no vacuuming after initialization. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-q</option></term> + <term><option>--quiet</option></term> + <listitem> + <para> + Switch logging to quiet mode, producing only one progress message per 5 + seconds. The default logging prints one message each 100000 rows, which + often outputs many lines per second (especially on good hardware). + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-s</option> <replaceable>scale_factor</></term> + <term><option>--scale=</option><replaceable>scale_factor</></term> + <listitem> + <para> + Multiply the number of rows generated by the scale factor. + For example, <literal>-s 100</> will create 10,000,000 rows + in the <structname>pgbench_accounts</> table. Default is 1. + When the scale is 20,000 or larger, the columns used to + hold account identifiers (<structfield>aid</structfield> columns) + will switch to using larger integers (<type>bigint</type>), + in order to be big enough to hold the range of account + identifiers. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--foreign-keys</option></term> + <listitem> + <para> + Create foreign key constraints between the standard tables. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--index-tablespace=<replaceable>index_tablespace</replaceable></option></term> + <listitem> + <para> + Create indexes in the specified tablespace, rather than the default + tablespace. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--tablespace=<replaceable>tablespace</replaceable></option></term> + <listitem> + <para> + Create tables in the specified tablespace, rather than the default + tablespace. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--unlogged-tables</option></term> + <listitem> + <para> + Create all tables as unlogged tables, rather than permanent tables. + </para> + </listitem> + </varlistentry> + + </variablelist> + </para> + + </refsect2> + + <refsect2 id="pgbench-run-options"> + <title>Benchmarking Options</title> + + <para> + <application>pgbench</application> accepts the following command-line + benchmarking arguments: + + <variablelist> + + <varlistentry> + <term><option>-c</option> <replaceable>clients</></term> + <term><option>--client=</option><replaceable>clients</></term> + <listitem> + <para> + Number of clients simulated, that is, number of concurrent database + sessions. Default is 1. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-C</option></term> + <term><option>--connect</option></term> + <listitem> + <para> + Establish a new connection for each transaction, rather than + doing it just once per client session. + This is useful to measure the connection overhead. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-d</option></term> + <term><option>--debug</option></term> + <listitem> + <para> + Print debugging output. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-D</option> <replaceable>varname</><literal>=</><replaceable>value</></term> + <term><option>--define=</option><replaceable>varname</><literal>=</><replaceable>value</></term> + <listitem> + <para> + Define a variable for use by a custom script (see below). + Multiple <option>-D</> options are allowed. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-f</option> <replaceable>filename</></term> + <term><option>--file=</option><replaceable>filename</></term> + <listitem> + <para> + Read transaction script from <replaceable>filename</>. + See below for details. + <option>-N</option>, <option>-S</option>, and <option>-f</option> + are mutually exclusive. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-j</option> <replaceable>threads</></term> + <term><option>--jobs=</option><replaceable>threads</></term> + <listitem> + <para> + Number of worker threads within <application>pgbench</application>. + Using more than one thread can be helpful on multi-CPU machines. + The number of clients must be a multiple of the number of threads, + since each thread is given the same number of client sessions to manage. + Default is 1. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-l</option></term> + <term><option>--log</option></term> + <listitem> + <para> + Write the time taken by each transaction to a log file. + See below for details. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-L</option> <replaceable>limit</></term> + <term><option>--latency-limit=</option><replaceable>limit</></term> + <listitem> + <para> + Transaction which last more than <replaceable>limit</> milliseconds + are counted and reported separately, as <firstterm>late</>. + </para> + <para> + When throttling is used (<option>--rate=...</>), transactions that + lag behind schedule by more than <replaceable>limit</> ms, and thus + have no hope of meeting the latency limit, are not sent to the server + at all. They are counted and reported separately as + <firstterm>skipped</>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-M</option> <replaceable>querymode</></term> + <term><option>--protocol=</option><replaceable>querymode</></term> + <listitem> + <para> + Protocol to use for submitting queries to the server: + <itemizedlist> + <listitem> + <para><literal>simple</>: use simple query protocol.</para> + </listitem> + <listitem> + <para><literal>extended</>: use extended query protocol.</para> + </listitem> + <listitem> + <para><literal>prepared</>: use extended query protocol with prepared statements.</para> + </listitem> + </itemizedlist> + The default is simple query protocol. (See <xref linkend="protocol"> + for more information.) + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-n</option></term> + <term><option>--no-vacuum</option></term> + <listitem> + <para> + Perform no vacuuming before running the test. + This option is <emphasis>necessary</> + if you are running a custom test scenario that does not include + the standard tables <structname>pgbench_accounts</>, + <structname>pgbench_branches</>, <structname>pgbench_history</>, and + <structname>pgbench_tellers</>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-N</option></term> + <term><option>--skip-some-updates</option></term> + <listitem> + <para> + Do not update <structname>pgbench_tellers</> and + <structname>pgbench_branches</>. + This will avoid update contention on these tables, but + it makes the test case even less like TPC-B. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-P</option> <replaceable>sec</></term> + <term><option>--progress=</option><replaceable>sec</></term> + <listitem> + <para> + Show progress report every <literal>sec</> seconds. The report + includes the time since the beginning of the run, the tps since the + last report, and the transaction latency average and standard + deviation since the last report. Under throttling (<option>-R</>), + the latency is computed with respect to the transaction scheduled + start time, not the actual transaction beginning time, thus it also + includes the average schedule lag time. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-r</option></term> + <term><option>--report-latencies</option></term> + <listitem> + <para> + Report the average per-statement latency (execution time from the + perspective of the client) of each command after the benchmark + finishes. See below for details. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-R</option> <replaceable>rate</></term> + <term><option>--rate=</option><replaceable>rate</></term> + <listitem> + <para> + Execute transactions targeting the specified rate instead of running + as fast as possible (the default). The rate is given in transactions + per second. If the targeted rate is above the maximum possible rate, + the rate limit won't impact the results. + </para> + <para> + The rate is targeted by starting transactions along a + Poisson-distributed schedule time line. The expected start time + schedule moves forward based on when the client first started, not + when the previous transaction ended. That approach means that when + transactions go past their original scheduled end time, it is + possible for later ones to catch up again. + </para> + <para> + When throttling is active, the transaction latency reported at the + end of the run is calculated from the scheduled start times, so it + includes the time each transaction had to wait for the previous + transaction to finish. The wait time is called the schedule lag time, + and its average and maximum are also reported separately. The + transaction latency with respect to the actual transaction start time, + i.e. the time spent executing the transaction in the database, can be + computed by subtracting the schedule lag time from the reported + latency. + </para> + + <para> + If <option>--latency-limit</> is used together with <option>--rate</>, + a transaction can lag behind so much that it is already over the + latency limit when the previous transaction ends, because the latency + is calculated from the scheduled start time. Such transactions are + not sent to the server, but are skipped altogether and counted + separately. + </para> + + <para> + A high schedule lag time is an indication that the system cannot + process transactions at the specified rate, with the chosen number of + clients and threads. When the average transaction execution time is + longer than the scheduled interval between each transaction, each + successive transaction will fall further behind, and the schedule lag + time will keep increasing the longer the test run is. When that + happens, you will have to reduce the specified transaction rate. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-s</option> <replaceable>scale_factor</></term> + <term><option>--scale=</option><replaceable>scale_factor</></term> + <listitem> + <para> + Report the specified scale factor in <application>pgbench</>'s + output. With the built-in tests, this is not necessary; the + correct scale factor will be detected by counting the number of + rows in the <structname>pgbench_branches</> table. However, when testing + custom benchmarks (<option>-f</> option), the scale factor + will be reported as 1 unless this option is used. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-S</option></term> + <term><option>--select-only</option></term> + <listitem> + <para> + Perform select-only transactions instead of TPC-B-like test. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-t</option> <replaceable>transactions</></term> + <term><option>--transactions=</option><replaceable>transactions</></term> + <listitem> + <para> + Number of transactions each client runs. Default is 10. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-T</option> <replaceable>seconds</></term> + <term><option>--time=</option><replaceable>seconds</></term> + <listitem> + <para> + Run the test for this many seconds, rather than a fixed number of + transactions per client. <option>-t</option> and + <option>-T</option> are mutually exclusive. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-v</option></term> + <term><option>--vacuum-all</option></term> + <listitem> + <para> + Vacuum all four standard tables before running the test. + With neither <option>-n</> nor <option>-v</>, pgbench will vacuum the + <structname>pgbench_tellers</> and <structname>pgbench_branches</> + tables, and will truncate <structname>pgbench_history</>. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--aggregate-interval=<replaceable>seconds</></option></term> + <listitem> + <para> + Length of aggregation interval (in seconds). May be used only together + with <application>-l</application> - with this option, the log contains + per-interval summary (number of transactions, min/max latency and two + additional fields useful for variance estimation). + </para> + <para> + This option is not currently supported on Windows. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>--sampling-rate=<replaceable>rate</></option></term> + <listitem> + <para> + Sampling rate, used when writing data into the log, to reduce the + amount of log generated. If this option is given, only the specified + fraction of transactions are logged. 1.0 means all transactions will + be logged, 0.05 means only 5% of the transactions will be logged. + </para> + <para> + Remember to take the sampling rate into account when processing the + log file. For example, when computing tps values, you need to multiply + the numbers accordingly (e.g. with 0.01 sample rate, you'll only get + 1/100 of the actual tps). + </para> + </listitem> + </varlistentry> + + </variablelist> + </para> + + </refsect2> + + <refsect2 id="pgbench-common-options"> + <title>Common Options</title> + + <para> + <application>pgbench</application> accepts the following command-line + common arguments: + + <variablelist> + + <varlistentry> + <term><option>-h</option> <replaceable>hostname</></term> + <term><option>--host=</option><replaceable>hostname</></term> + <listitem> + <para> + The database server's host name + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-p</option> <replaceable>port</></term> + <term><option>--port=</option><replaceable>port</></term> + <listitem> + <para> + The database server's port number + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-U</option> <replaceable>login</></term> + <term><option>--username=</option><replaceable>login</></term> + <listitem> + <para> + The user name to connect as + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-V</></term> + <term><option>--version</></term> + <listitem> + <para> + Print the <application>pgbench</application> version and exit. + </para> + </listitem> + </varlistentry> + + <varlistentry> + <term><option>-?</></term> + <term><option>--help</></term> + <listitem> + <para> + Show help about <application>pgbench</application> command line + arguments, and exit. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + </refsect2> + </refsect1> + + <refsect1> + <title>Notes</title> + + <refsect2> + <title>What is the <quote>Transaction</> Actually Performed in pgbench?</title> + + <para> + The default transaction script issues seven commands per transaction: + </para> + + <orderedlist> + <listitem><para><literal>BEGIN;</literal></para></listitem> + <listitem><para><literal>UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;</literal></para></listitem> + <listitem><para><literal>SELECT abalance FROM pgbench_accounts WHERE aid = :aid;</literal></para></listitem> + <listitem><para><literal>UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;</literal></para></listitem> + <listitem><para><literal>UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;</literal></para></listitem> + <listitem><para><literal>INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);</literal></para></listitem> + <listitem><para><literal>END;</literal></para></listitem> + </orderedlist> + + <para> + If you specify <option>-N</>, steps 4 and 5 aren't included in the + transaction. If you specify <option>-S</>, only the <command>SELECT</> is + issued. + </para> + </refsect2> + + <refsect2> + <title>Custom Scripts</title> + + <para> + <application>pgbench</application> has support for running custom + benchmark scenarios by replacing the default transaction script + (described above) with a transaction script read from a file + (<option>-f</option> option). In this case a <quote>transaction</> + counts as one execution of a script file. You can even specify + multiple scripts (multiple <option>-f</option> options), in which + case a random one of the scripts is chosen each time a client session + starts a new transaction. + </para> + + <para> + The format of a script file is one SQL command per line; multiline + SQL commands are not supported. Empty lines and lines beginning with + <literal>--</> are ignored. Script file lines can also be + <quote>meta commands</>, which are interpreted by <application>pgbench</> + itself, as described below. + </para> + + <para> + There is a simple variable-substitution facility for script files. + Variables can be set by the command-line <option>-D</> option, + explained above, or by the meta commands explained below. + In addition to any variables preset by <option>-D</> command-line options, + there are a few variables that are preset automatically, listed in + <xref linkend="pgbench-automatic-variables">. A value specified for these + variables using <option>-D</> takes precedence over the automatic presets. + Once set, a variable's + value can be inserted into a SQL command by writing + <literal>:</><replaceable>variablename</>. When running more than + one client session, each session has its own set of variables. + </para> + + <table id="pgbench-automatic-variables"> + <title>Automatic variables</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Variable</entry> + <entry>Description</entry> + </row> + </thead> + + <tbody> + <row> + <entry> <literal>scale</literal> </entry> + <entry>current scale factor</entry> + </row> + + <row> + <entry> <literal>client_id</literal> </entry> + <entry>unique number identifying the client session (starts from zero)</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + Script file meta commands begin with a backslash (<literal>\</>). + Arguments to a meta command are separated by white space. + These meta commands are supported: + </para> + + <variablelist> + <varlistentry> + <term> + <literal>\set <replaceable>varname</> <replaceable>expression</></literal> + </term> + + <listitem> + <para> + Sets variable <replaceable>varname</> to an integer value calculated + from <replaceable>expression</>. + The expression may contain integer constants such as <literal>5432</>, + references to variables <literal>:</><replaceable>variablename</>, + and expressions composed of unary (<literal>-</>) or binary operators + (<literal>+</>, <literal>-</>, <literal>*</>, <literal>/</>, <literal>%</>) + with their usual associativity, and parentheses. + </para> + + <para> + Examples: +<programlisting> +\set ntellers 10 * :scale +\set aid (1021 * :aid) % (100000 * :scale) + 1 +</programlisting></para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <literal>\setrandom <replaceable>varname</> <replaceable>min</> <replaceable>max</> [ uniform | { gaussian | exponential } <replaceable>threshold</> ]</literal> + </term> + + <listitem> + <para> + Sets variable <replaceable>varname</> to a random integer value + between the limits <replaceable>min</> and <replaceable>max</> inclusive. + Each limit can be either an integer constant or a + <literal>:</><replaceable>variablename</> reference to a variable + having an integer value. + </para> + + <para> + By default, or when <literal>uniform</> is specified, all values in the + range are drawn with equal probability. Specifying <literal>gaussian</> + or <literal>exponential</> options modifies this behavior; each + requires a mandatory threshold which determines the precise shape of the + distribution. + </para> + + <para> + For a Gaussian distribution, the interval is mapped onto a standard + normal distribution (the classical bell-shaped Gaussian curve) truncated + at <literal>-threshold</> on the left and <literal>+threshold</> + on the right. + To be precise, if <literal>PHI(x)</> is the cumulative distribution + function of the standard normal distribution, with mean <literal>mu</> + defined as <literal>(max + min) / 2.0</>, then value <replaceable>i</> + between <replaceable>min</> and <replaceable>max</> inclusive is drawn + with probability: + <literal> + (PHI(2.0 * threshold * (i - min - mu + 0.5) / (max - min + 1)) - + PHI(2.0 * threshold * (i - min - mu - 0.5) / (max - min + 1))) / + (2.0 * PHI(threshold) - 1.0)</>. + Intuitively, the larger the <replaceable>threshold</>, the more + frequently values close to the middle of the interval are drawn, and the + less frequently values close to the <replaceable>min</> and + <replaceable>max</> bounds. + About 67% of values are drawn from the middle <literal>1.0 / threshold</> + and 95% in the middle <literal>2.0 / threshold</>; for instance, if + <replaceable>threshold</> is 4.0, 67% of values are drawn from the middle + quarter and 95% from the middle half of the interval. + The minimum <replaceable>threshold</> is 2.0 for performance of + the Box-Muller transform. + </para> + + <para> + For an exponential distribution, the <replaceable>threshold</> + parameter controls the distribution by truncating a quickly-decreasing + exponential distribution at <replaceable>threshold</>, and then + projecting onto integers between the bounds. + To be precise, value <replaceable>i</> between <replaceable>min</> and + <replaceable>max</> inclusive is drawn with probability: + <literal>(exp(-threshold*(i-min)/(max+1-min)) - + exp(-threshold*(i+1-min)/(max+1-min))) / (1.0 - exp(-threshold))</>. + Intuitively, the larger the <replaceable>threshold</>, the more + frequently values close to <replaceable>min</> are accessed, and the + less frequently values close to <replaceable>max</> are accessed. + The closer to 0 the threshold, the flatter (more uniform) the access + distribution. + A crude approximation of the distribution is that the most frequent 1% + values in the range, close to <replaceable>min</>, are drawn + <replaceable>threshold</>% of the time. + The <replaceable>threshold</> value must be strictly positive. + </para> + + <para> + Example: +<programlisting> +\setrandom aid 1 :naccounts gaussian 5.0 +</programlisting></para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <literal>\sleep <replaceable>number</> [ us | ms | s ]</literal> + </term> + + <listitem> + <para> + Causes script execution to sleep for the specified duration in + microseconds (<literal>us</>), milliseconds (<literal>ms</>) or seconds + (<literal>s</>). If the unit is omitted then seconds are the default. + <replaceable>number</> can be either an integer constant or a + <literal>:</><replaceable>variablename</> reference to a variable + having an integer value. + </para> + + <para> + Example: +<programlisting> +\sleep 10 ms +</programlisting></para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <literal>\setshell <replaceable>varname</> <replaceable>command</> [ <replaceable>argument</> ... ]</literal> + </term> + + <listitem> + <para> + Sets variable <replaceable>varname</> to the result of the shell command + <replaceable>command</>. The command must return an integer value + through its standard output. + </para> + + <para><replaceable>argument</> can be either a text constant or a + <literal>:</><replaceable>variablename</> reference to a variable of + any types. If you want to use <replaceable>argument</> starting with + colons, you need to add an additional colon at the beginning of + <replaceable>argument</>. + </para> + + <para> + Example: +<programlisting> +\setshell variable_to_be_assigned command literal_argument :variable ::literal_starting_with_colon +</programlisting></para> + </listitem> + </varlistentry> + + <varlistentry> + <term> + <literal>\shell <replaceable>command</> [ <replaceable>argument</> ... ]</literal> + </term> + + <listitem> + <para> + Same as <literal>\setshell</literal>, but the result is ignored. + </para> + + <para> + Example: +<programlisting> +\shell command literal_argument :variable ::literal_starting_with_colon +</programlisting></para> + </listitem> + </varlistentry> + </variablelist> + + <para> + As an example, the full definition of the built-in TPC-B-like + transaction is: + +<programlisting> +\set nbranches :scale +\set ntellers 10 * :scale +\set naccounts 100000 * :scale +\setrandom aid 1 :naccounts +\setrandom bid 1 :nbranches +\setrandom tid 1 :ntellers +\setrandom delta -5000 5000 +BEGIN; +UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid; +SELECT abalance FROM pgbench_accounts WHERE aid = :aid; +UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid; +UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid; +INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP); +END; +</programlisting> + + This script allows each iteration of the transaction to reference + different, randomly-chosen rows. (This example also shows why it's + important for each client session to have its own variables — + otherwise they'd not be independently touching different rows.) + </para> + + </refsect2> + + <refsect2> + <title>Per-Transaction Logging</title> + + <para> + With the <option>-l</> option but without the <option>--aggregate-interval</option>, + <application>pgbench</> writes the time taken by each transaction + to a log file. The log file will be named + <filename>pgbench_log.<replaceable>nnn</></filename>, where + <replaceable>nnn</> is the PID of the pgbench process. + If the <option>-j</> option is 2 or higher, creating multiple worker + threads, each will have its own log file. The first worker will use the + same name for its log file as in the standard single worker case. + The additional log files for the other workers will be named + <filename>pgbench_log.<replaceable>nnn</>.<replaceable>mmm</></filename>, + where <replaceable>mmm</> is a sequential number for each worker starting + with 1. + </para> + + <para> + The format of the log is: + +<synopsis> +<replaceable>client_id</> <replaceable>transaction_no</> <replaceable>time</> <replaceable>file_no</> <replaceable>time_epoch</> <replaceable>time_us</> <optional><replaceable>schedule_lag</replaceable></optional> +</synopsis> + + where <replaceable>time</> is the total elapsed transaction time in microseconds, + <replaceable>file_no</> identifies which script file was used + (useful when multiple scripts were specified with <option>-f</>), + and <replaceable>time_epoch</>/<replaceable>time_us</> are a + UNIX epoch format timestamp and an offset + in microseconds (suitable for creating an ISO 8601 + timestamp with fractional seconds) showing when + the transaction completed. + Field <replaceable>schedule_lag</> is the difference between the + transaction's scheduled start time, and the time it actually started, in + microseconds. It is only present when the <option>--rate</> option is used. + The last field <replaceable>skipped_transactions</> reports the number of + transactions skipped because they were too far behind schedule. It is only + present when both options <option>--rate</> and <option>--latency-limit</> + are used. + </para> + + <para> + Here is a snippet of the log file generated: +<screen> + 0 199 2241 0 1175850568 995598 + 0 200 2465 0 1175850568 998079 + 0 201 2513 0 1175850569 608 + 0 202 2038 0 1175850569 2663 +</screen> + + Another example with --rate=100 and --latency-limit=5 (note the additional + <replaceable>schedule_lag</> column): +<screen> + 0 81 4621 0 1412881037 912698 3005 + 0 82 6173 0 1412881037 914578 4304 + 0 83 skipped 0 1412881037 914578 5217 + 0 83 skipped 0 1412881037 914578 5099 + 0 83 4722 0 1412881037 916203 3108 + 0 84 4142 0 1412881037 918023 2333 + 0 85 2465 0 1412881037 919759 740 +</screen> + In this example, transaction 82 was late, because it's latency (6.173 ms) was + over the 5 ms limit. The next two transactions were skipped, because they + were already late before they were even started. + </para> + + <para> + When running a long test on hardware that can handle a lot of transactions, + the log files can become very large. The <option>--sampling-rate</> option + can be used to log only a random sample of transactions. + </para> + </refsect2> + + <refsect2> + <title>Aggregated Logging</title> + + <para> + With the <option>--aggregate-interval</option> option, the logs use a bit different format: + +<synopsis> +<replaceable>interval_start</> <replaceable>num_of_transactions</> <replaceable>latency_sum</> <replaceable>latency_2_sum</> <replaceable>min_latency</> <replaceable>max_latency</> <optional><replaceable>lag_sum</> <replaceable>lag_2_sum</> <replaceable>min_lag</> <replaceable>max_lag</> <optional><replaceable>skipped_transactions</></optional></optional> +</synopsis> + + where <replaceable>interval_start</> is the start of the interval (UNIX epoch + format timestamp), <replaceable>num_of_transactions</> is the number of transactions + within the interval, <replaceable>latency_sum</replaceable> is a sum of latencies + (so you can compute average latency easily). The following two fields are useful + for variance estimation - <replaceable>latency_sum</> is a sum of latencies and + <replaceable>latency_2_sum</> is a sum of 2nd powers of latencies. The last two + fields are <replaceable>min_latency</> - a minimum latency within the interval, and + <replaceable>max_latency</> - maximum latency within the interval. A transaction is + counted into the interval when it was committed. The fields in the end, + <replaceable>lag_sum</>, <replaceable>lag_2_sum</>, <replaceable>min_lag</>, + and <replaceable>max_lag</>, are only present if the <option>--rate</> + option is used. The very last one, <replaceable>skipped_transactions</>, + is only present if the option <option>--latency-limit</> is present, too. + They are calculated from the time each transaction had to wait for the + previous one to finish, i.e. the difference between each transaction's + scheduled start time and the time it actually started. + </para> + + <para> + Here is example outputs: +<screen> +1345828501 5601 1542744 483552416 61 2573 +1345828503 7884 1979812 565806736 60 1479 +1345828505 7208 1979422 567277552 59 1391 +1345828507 7685 1980268 569784714 60 1398 +1345828509 7073 1979779 573489941 236 1411 +</screen></para> + + <para> + Notice that while the plain (unaggregated) log file contains index + of the custom script files, the aggregated log does not. Therefore if + you need per script data, you need to aggregate the data on your own. + </para> + + </refsect2> + + <refsect2> + <title>Per-Statement Latencies</title> + + <para> + With the <option>-r</> option, <application>pgbench</> collects + the elapsed transaction time of each statement executed by every + client. It then reports an average of those values, referred to + as the latency for each statement, after the benchmark has finished. + </para> + + <para> + For the default script, the output will look similar to this: +<screen> +starting vacuum...end. +transaction type: TPC-B (sort of) +scaling factor: 1 +query mode: simple +number of clients: 10 +number of threads: 1 +number of transactions per client: 1000 +number of transactions actually processed: 10000/10000 +tps = 618.764555 (including connections establishing) +tps = 622.977698 (excluding connections establishing) +statement latencies in milliseconds: + 0.004386 \set nbranches 1 * :scale + 0.001343 \set ntellers 10 * :scale + 0.001212 \set naccounts 100000 * :scale + 0.001310 \setrandom aid 1 :naccounts + 0.001073 \setrandom bid 1 :nbranches + 0.001005 \setrandom tid 1 :ntellers + 0.001078 \setrandom delta -5000 5000 + 0.326152 BEGIN; + 0.603376 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid; + 0.454643 SELECT abalance FROM pgbench_accounts WHERE aid = :aid; + 5.528491 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid; + 7.335435 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid; + 0.371851 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP); + 1.212976 END; +</screen> + </para> + + <para> + If multiple script files are specified, the averages are reported + separately for each script file. + </para> + + <para> + Note that collecting the additional timing information needed for + per-statement latency computation adds some overhead. This will slow + average execution speed and lower the computed TPS. The amount + of slowdown varies significantly depending on platform and hardware. + Comparing average TPS values with and without latency reporting enabled + is a good way to measure if the timing overhead is significant. + </para> + </refsect2> + + <refsect2> + <title>Good Practices</title> + + <para> + It is very easy to use <application>pgbench</> to produce completely + meaningless numbers. Here are some guidelines to help you get useful + results. + </para> + + <para> + In the first place, <emphasis>never</> believe any test that runs + for only a few seconds. Use the <option>-t</> or <option>-T</> option + to make the run last at least a few minutes, so as to average out noise. + In some cases you could need hours to get numbers that are reproducible. + It's a good idea to try the test run a few times, to find out if your + numbers are reproducible or not. + </para> + + <para> + For the default TPC-B-like test scenario, the initialization scale factor + (<option>-s</>) should be at least as large as the largest number of + clients you intend to test (<option>-c</>); else you'll mostly be + measuring update contention. There are only <option>-s</> rows in + the <structname>pgbench_branches</> table, and every transaction wants to + update one of them, so <option>-c</> values in excess of <option>-s</> + will undoubtedly result in lots of transactions blocked waiting for + other transactions. + </para> + + <para> + The default test scenario is also quite sensitive to how long it's been + since the tables were initialized: accumulation of dead rows and dead space + in the tables changes the results. To understand the results you must keep + track of the total number of updates and when vacuuming happens. If + autovacuum is enabled it can result in unpredictable changes in measured + performance. + </para> + + <para> + A limitation of <application>pgbench</> is that it can itself become + the bottleneck when trying to test a large number of client sessions. + This can be alleviated by running <application>pgbench</> on a different + machine from the database server, although low network latency will be + essential. It might even be useful to run several <application>pgbench</> + instances concurrently, on several client machines, against the same + database server. + </para> + </refsect2> + </refsect1> +</refentry> |