aboutsummaryrefslogtreecommitdiff
path: root/doc/src/sgml/pgtesttiming.sgml
blob: 86af4e28dc0a7a54a62f2e6a4b080404380ce972 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
<!-- doc/src/sgml/pgtesttiming.sgml -->

<sect1 id="pgtesttiming" xreflabel="pg_test_timing">
 <title>pg_test_timing</title>

 <indexterm zone="pgtesttiming">
  <primary>pg_test_timing</primary>
 </indexterm>

 <para>
  <application>pg_test_timing</> is a tool to measure the timing overhead
  on your system and confirm that the system time never moves backwards.
  Systems that are slow to collect timing data can give less accurate 
  <command>EXPLAIN ANALYZE</command> results.
 </para>

 <sect2>
  <title>Usage</title>

<synopsis>
pg_test_timing [options]
</synopsis>

   <para>
    <application>pg_test_timing</application> accepts the following
    command-line options:

    <variablelist>

     <varlistentry>
      <term><option>-d <replaceable class="parameter">duration</replaceable></option></term>
      <term><option>--duration=<replaceable class="parameter">duration</replaceable></option></term>
      <listitem>
       <para>
        Specifies the test duration, in seconds. Longer durations
        give slightly better accuracy, and are more likely to discover
        problems with the system clock moving backwards. The default
        test duration is 3 seconds.
       </para>
      </listitem>
     </varlistentry>

    </variablelist>
   </para>

 </sect2>

 <sect2>
  <title>Interpreting results</title>

  <para>
        Good results will show most (>90%) individual timing calls take less
        than one microsecond. Average per loop overhead will be even lower,
        below 100 nanoseconds. This example from an Intel i7-860 system using
        a TSC clock source shows excellent performance:
  </para>

<screen>
Testing timing overhead for 3 seconds.
Per loop time including overhead: 35.96 nsec
Histogram of timing durations:
   < usec:      count   percent
       16:          2  0.00000%
        8:         13  0.00002%
        4:        126  0.00015%
        2:    2999652  3.59518%
        1:   80435604 96.40465%
</screen>

  <para>
        Note that different units are used for the per loop time than the
        histogram. The loop can have resolution within a few nanoseconds
        (nsec), while the individual timing calls can only resolve down to
        one microsecond (usec).
  </para>

 </sect2>
 <sect2>
  <title>Measuring executor timing overhead</title>

  <para>
        When the query executor is running a statement using
        <command>EXPLAIN ANALYZE</command>, individual operations are
        timed as well as showing a summary.  The overhead of your system
        can be checked by counting rows with the psql program:
  </para>

<screen>
CREATE TABLE t AS SELECT * FROM generate_series(1,100000);
\timing
SELECT COUNT(*) FROM t;
EXPLAIN ANALYZE SELECT COUNT(*) FROM t;
</screen>

  <para>
        The i7-860 system measured runs the count query in 9.8 ms while
        the <command>EXPLAIN ANALYZE</command> version takes 16.6 ms,
        each processing just over 100,000 rows.  That 6.8 ms difference
        means the timing overhead per row is 68 ns, about twice what
        pg_test_timing estimated it would be.  Even that relatively
        small amount of overhead is making the fully timed count statement
        take almost 70% longer.  On more substantial queries, the
        timing overhead would be less problematic.
  </para>

 </sect2>
 <sect2>
  <title>Changing time sources</title>
  <para>
        On some newer Linux systems, it's possible to change the clock
        source used to collect timing data at any time.  A second example
        shows the slowdown possible from switching to the slower acpi_pm
        time source, on the same system used for the fast results above:
  </para>

<screen>
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
tsc hpet acpi_pm
# echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
# pg_test_timing
Per loop time including overhead: 722.92 nsec
Histogram of timing durations:
   < usec:      count   percent
       16:          3  0.00007%
        8:        563  0.01357%
        4:       3241  0.07810%
        2:    2990371 72.05956%
        1:    1155682 27.84870%
</screen>

  <para>
       In this configuration, the sample <command>EXPLAIN ANALYZE</command>
       above takes 115.9 ms.  That's 1061 nsec of timing overhead, again
       a small multiple of what's measured directly by this utility.
       That much timing overhead means the actual query itself is only
       taking a tiny fraction of the accounted for time, most of it
       is being consumed in overhead instead.  In this configuration,
       any <command>EXPLAIN ANALYZE</command> totals involving many
       timed operations would be inflated significantly by timing overhead.
  </para>

  <para>
       FreeBSD also allows changing the time source on the fly, and
       it logs information about the timer selected during boot:
  </para>

<screen>
dmesg | grep "Timecounter"       
sysctl kern.timecounter.hardware=TSC
</screen>

  <para>
       Other systems may only allow setting the time source on boot.
       On older Linux systems the "clock" kernel setting is the only way
       to make this sort of change.  And even on some more recent ones,
       the only option you'll see for a clock source is "jiffies".  Jiffies
       are the older Linux software clock implementation, which can have
       good resolution when it's backed by fast enough timing hardware,
       as in this example:
  </para>       

<screen>
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource 
jiffies 
$ dmesg | grep time.c
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2400.153 MHz processor.
$ ./pg_test_timing 
Testing timing overhead for 3 seconds.
Per timing duration including loop overhead: 97.75 ns
Histogram of timing durations:
   < usec:      count   percent
       32:          1  0.00000%
       16:          1  0.00000%
        8:         22  0.00007%
        4:       3010  0.00981%
        2:    2993204  9.75277%
        1:   27694571 90.23734%
</screen>

 </sect2>
 <sect2>
  <title>Clock hardware and timing accuracy</title>

  <para>
       Collecting accurate timing information is normally done on computers
       using hardware clocks with various levels of accuracy.  With some
       hardware the operating systems can pass the system clock time almost
       directly to programs.  A system clock can also be derived from a chip
       that simply provides timing interrupts, periodic ticks at some known
       time interval.  In either case, operating system kernels provide
       a clock source that hides these details.  But the accuracy of that
       clock source and how quickly it can return results varies based
       on the underlying hardware.
  </para>

  <para>
        Inaccurate time keeping can result in system instability.  Test
        any change to the clock source very carefully.  Operating system
        defaults are sometimes made to favor reliability over best
        accuracy. And if you are using a virtual machine, look into the
        recommended time sources compatible with it.  Virtual hardware
        faces additional difficulties when emulating timers, and there
        are often per operating system settings suggested by vendors.
  </para>

  <para>
        The Time Stamp Counter (TSC) clock source is the most accurate one
        available on current generation CPUs. It's the preferred way to track
        the system time when it's supported by the operating system and the
        TSC clock is reliable. There are several ways that TSC can fail
        to provide an accurate timing source, making it unreliable. Older
        systems can have a TSC clock that varies based on the CPU
        temperature, making it unusable for timing. Trying to use TSC on some
        older multi-core CPUs can give a reported time that's inconsistent
        among multiple cores. This can result in the time going backwards, a
        problem this program checks for.  And even the newest systems can
        fail to provide accurate TSC timing with very aggressive power saving
        configurations.
  </para>

  <para>
        Newer operating systems may check for the known TSC problems and
        switch to a slower, more stable clock source when they are seen. 
        If your system supports TSC time but doesn't default to that, it
        may be disabled for a good reason.  And some operating systems may
        not detect all the possible problems correctly, or will allow using
        TSC even in situations where it's known to be inaccurate.
  </para>

  <para>
        The High Precision Event Timer (HPET) is the preferred timer on
        systems where it's available and TSC is not accurate.  The timer
        chip itself is programmable to allow up to 100 nanosecond resolution,
        but you may not see that much accuracy in your system clock.
  </para>

  <para>
        Advanced Configuration and Power Interface (ACPI) provides a
        Power Management (PM) Timer, which Linux refers to as the acpi_pm.
        The clock derived from acpi_pm will at best provide 300 nanosecond
        resolution.
  </para>

  <para>
        Timers used on older PC hardware including the 8254 Programmable
        Interval Timer (PIT), the real-time clock (RTC), the Advanced
        Programmable Interrupt Controller (APIC) timer, and the Cyclone
        timer.  These timers aim for millisecond resolution.
  </para>
 </sect2>

 <sect2>
  <title>Author</title>

  <para>
   Ants Aasma <email>ants.aasma@eesti.ee</email>
  </para>
 </sect2>

</sect1>