
<chapter id="page">

<title>Page Files</title>

<abstract>
<para>
A description of the default page format of database files.
</para>
</abstract>

<para>
This chapter provides an overview of the page format used by <productname>PostgreSQL</productname>
tables and indexes.  User-defined access methods need not use this page format.
</para>

<para>
In the following explanation, a
<firstterm>byte</firstterm>
is assumed to contain 8 bits.  In addition, the term
<firstterm>item</firstterm>
refers to data that is stored in <productname>PostgreSQL</productname> tables.
</para>

<para>

<xref linkend="page-table"> shows how pages are structured in both ordinary
 <productname>PostgreSQL</productname> tables and
 <productname>PostgreSQL</productname> indexes (e.g., a B-tree index).
The same structure is also used for TOAST tables and sequences.
Each page has five parts.

</para>

<table tocentry="1" id="page-table">
<title>Sample Page Layout</title>
<titleabbrev>Page Layout</titleabbrev>
<tgroup cols="2">
<thead>
<row>
<entry>
Item
</entry>
<entry>Description</entry>
</row>
</thead>

<tbody>

<row>
 <entry>PageHeaderData</entry>
 <entry>20 bytes long. Contains general information about the page that is needed to access it.</entry>
</row>

<row>
<entry>ItemIdData</entry>
<entry>List of (offset,length) pairs pointing to the actual items.</entry>
</row>

<row>
<entry>Free space</entry>
<entry>The unallocated space. New item identifiers are allocated from the start of this area, new items from the end.</entry>
</row>

<row>
<entry>items</entry>
<entry>The actual items themselves. Different access methods store different data here.</entry>
</row>

<row>
<entry>Special Space</entry>
<entry>Access-method-specific data. Different access methods store different data here. Unused by ordinary tables.</entry>
</row>

</tbody>
</tgroup>
</table>

 <para>

  The first 20 bytes of each page consist of a page header
  (PageHeaderData). Its format is detailed in <xref
  linkend="pageheaderdata-table">. The first two fields hold information
  related to the write-ahead log (WAL). They are followed by three 2-byte integer fields
  (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and
  <firstterm>special</firstterm>). These are byte offsets to the start
  of unallocated space, to the end of unallocated space, and to the start of
  the special space, respectively.
  
 </para>
 
 <table tocentry="1" id="pageheaderdata-table">
 <title>PageHeaderData Layout</title>
 <titleabbrev>PageHeaderData Layout</titleabbrev>
 <tgroup cols="4">   
 <thead>
  <row> 
   <entry>Field</entry>
   <entry>Type</entry>
   <entry>Length</entry>
   <entry>Description</entry>
  </row>
 </thead>
 <tbody>
  <row>
   <entry>pd_lsn</entry>
   <entry>XLogRecPtr</entry>
   <entry>8 bytes</entry>
   <entry>LSN: next byte after the last byte of the xlog record for the last change to this page</entry>
  </row>
  <row>
   <entry>pd_sui</entry>
   <entry>StartUpID</entry>
   <entry>4 bytes</entry>
   <entry>SUI of the last change (currently used only by the heap access method)</entry>
  </row>
  <row>
   <entry>pd_lower</entry>
   <entry>LocationIndex</entry>
   <entry>2 bytes</entry>
   <entry>Offset to start of free space.</entry>
  </row>
  <row>
   <entry>pd_upper</entry>
   <entry>LocationIndex</entry>
   <entry>2 bytes</entry>
   <entry>Offset to end of free space.</entry>
  </row>
  <row>
   <entry>pd_special</entry>
   <entry>LocationIndex</entry>
   <entry>2 bytes</entry>
   <entry>Offset to start of special space.</entry>
  </row>
  <row>
   <entry>pd_opaque</entry>
   <entry>OpaqueData</entry>
   <entry>2 bytes</entry>
   <entry>AM-generic information. Currently just stores the page size.</entry>
  </row>
 </tbody>
 </tgroup>
 </table>
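
 <para>
  As a rough illustration of the header layout, the following Python sketch
  decodes the first 20 bytes of a raw page image.  It is not taken from the
  <productname>PostgreSQL</productname> sources.  It assumes that pd_lsn
  occupies 8 bytes, so that the fields add up to the stated 20-byte header,
  and that the page was written in little-endian byte order.  The helper
  names parse_page_header, free_space, and has_special_space are
  hypothetical.
 </para>
<programlisting><![CDATA[
```python
import struct

# Assumed layout: 8-byte LSN, 4-byte StartUpID, then four 2-byte fields.
# "<" selects little-endian with no padding; a real page uses the byte
# order of the machine that wrote it.
PAGE_HEADER = struct.Struct("<8sIHHHH")
assert PAGE_HEADER.size == 20  # matches the header length given in the text


def parse_page_header(page):
    """Decode the page header fields into a dict (hypothetical helper)."""
    lsn, sui, lower, upper, special, opaque = PAGE_HEADER.unpack_from(page, 0)
    return {
        "pd_lsn": lsn,          # kept as raw bytes, not interpreted here
        "pd_sui": sui,
        "pd_lower": lower,      # offset to start of free space
        "pd_upper": upper,      # offset to end of free space
        "pd_special": special,  # offset to start of special space
        "pd_opaque": opaque,
    }


def free_space(header):
    """Bytes of unallocated space between pd_lower and pd_upper."""
    return header["pd_upper"] - header["pd_lower"]


def has_special_space(header, page_size):
    """Ordinary tables set pd_special to the page size (no special space)."""
    return header["pd_special"] < page_size
```
]]></programlisting>
 <para>
  The page size is passed in explicitly rather than read from pd_opaque,
  because the exact encoding of that field is not described here.
 </para>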

 <para>  
  Special space is a region at the end of the page that is allocated at page
  initialization time and contains information specific to an access method.
  The last 2 bytes of the page header, <firstterm>opaque</firstterm>,
  currently store only the page size.  The page size is reportedly stored in
  each page because frames in the buffer pool may be subdivided into
  equal-sized pages on a frame-by-frame basis within a table, although this
  rationale has not been confirmed.

 </para>

 <para>

  Following the page header are item identifiers
  (<firstterm>ItemIdData</firstterm>).  New item identifiers are allocated
  from the first four bytes of unallocated space.  Because an item
  identifier is never moved until it is freed, its index may be used to
  indicate the location of an item on a page.  In fact, every pointer to an
  item (<firstterm>ItemPointer</firstterm>, also known as
  <firstterm>CTID</firstterm>) created by
  <productname>PostgreSQL</productname> consists of a frame number and an
  index of an item identifier.  An item identifier contains a byte offset to
  the start of an item, its length in bytes, and a set of attribute bits
  that affect its interpretation.

 </para>
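
 <para>
  The following Python sketch shows how an item identifier might be used to
  locate an item on a raw page image.  The 20-byte header and 4-byte
  identifier sizes come from the text.  The bit layout (a 15-bit offset in
  the low bits, then 2 attribute bits, then a 15-bit length, in a
  little-endian word) is an assumption of this sketch, since the actual
  packing depends on the compiler and platform.  The function names and the
  0-based index are hypothetical.
 </para>
<programlisting><![CDATA[
```python
HEADER_SIZE = 20   # page header length stated in the text
ITEM_ID_SIZE = 4   # each item identifier occupies four bytes


def item_id_count(pd_lower):
    """Number of item identifiers between the header and the free space."""
    return (pd_lower - HEADER_SIZE) // ITEM_ID_SIZE


def decode_item_id(page, index):
    """Return (offset, length, flags) for the identifier at a 0-based index.

    Assumed packing: bits 0-14 offset, bits 15-16 flags, bits 17-31 length.
    """
    start = HEADER_SIZE + index * ITEM_ID_SIZE
    word = int.from_bytes(page[start:start + ITEM_ID_SIZE], "little")
    offset = word & 0x7FFF
    flags = (word >> 15) & 0x3
    length = (word >> 17) & 0x7FFF
    return offset, length, flags


def read_item(page, index):
    """Return the raw bytes of the item that the identifier points to."""
    offset, length, _flags = decode_item_id(page, index)
    return bytes(page[offset:offset + length])
```
]]></programlisting>
 <para>
  Because items are reached only through their identifiers, an item can be
  moved within the page by updating the offset stored in its identifier,
  without changing the index that other structures use to refer to it.
 </para>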

 <para>
 
  The items themselves are stored in space allocated backwards from the end
  of unallocated space.  The exact structure varies depending on what the
  table is to contain. Sequences and tables both use a structure named
  <firstterm>HeapTupleHeaderData</firstterm>, described below.

 </para>
 
 <para>
 
  The final section is the "special section", which may contain anything the
  access method wishes to store. Ordinary tables do not use this at all
  (indicated by setting the offset to the special space equal to the page size).
  
 </para>
 
 <para>

  All tuples are structured the same way: a header of around 31 bytes,
  followed by an optional null bitmask and then the data. The header is detailed
  below in <xref linkend="heaptupleheaderdata-table">.  The null bitmask is
  only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in
  <firstterm>t_infomask</firstterm>. If it is present, it occupies the space
  between the end of the header and the beginning of the data, as indicated
  by the <firstterm>t_hoff</firstterm> field. In this bitmask, a 1 bit
  indicates a non-null value and a 0 bit indicates a null.
  
 </para>
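
 <para>
  A minimal Python sketch of testing the null bitmask follows.  The meaning
  of the bits (1 for not-null, 0 for null) comes from the text.  The bit
  order within each byte (attribute 0 in the least significant bit of the
  first byte), the 0-based attribute numbering, and the value used for
  HEAP_HASNULL are assumptions of this sketch, and the function names are
  hypothetical.
 </para>
<programlisting><![CDATA[
```python
HEAP_HASNULL = 0x0001  # assumed flag value; check the headers for the real one


def has_null_bitmask(t_infomask):
    """The bitmask is only present when HEAP_HASNULL is set in t_infomask."""
    return bool(t_infomask & HEAP_HASNULL)


def bitmask_length(natts):
    """Bytes needed to hold one bit per attribute."""
    return (natts + 7) // 8


def is_null(bitmask, attnum):
    """Return True if the attribute (0-based) is null.

    A 1 bit means the attribute is present, a 0 bit means it is null.
    """
    byte = bitmask[attnum >> 3]
    return not (byte & (1 << (attnum & 7)))
```
]]></programlisting>
 <para>
  When has_null_bitmask returns false, no bitmask is stored at all and every
  attribute can be treated as not null.
 </para>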
 
 <table tocentry="1" id="heaptupleheaderdata-table">
 <title>HeapTupleHeaderData Layout</title>
 <titleabbrev>HeapTupleHeaderData Layout</titleabbrev>
 <tgroup cols="4">   
 <thead>
  <row> 
   <entry>Field</entry>
   <entry>Type</entry>
   <entry>Length</entry>
   <entry>Description</entry>
  </row>
 </thead>
 <tbody>
  <row>
   <entry>t_oid</entry>
   <entry>Oid</entry>
   <entry>4 bytes</entry>
   <entry>OID of this tuple</entry>
  </row>
  <row>
   <entry>t_cmin</entry>
   <entry>CommandId</entry>
   <entry>4 bytes</entry>
   <entry>insert CID stamp</entry>
  </row>
  <row>
   <entry>t_cmax</entry>
   <entry>CommandId</entry>
   <entry>4 bytes</entry>
   <entry>delete CID stamp</entry>
  </row>
  <row>
   <entry>t_xmin</entry>
   <entry>TransactionId</entry>
   <entry>4 bytes</entry>
   <entry>insert XID stamp</entry>
  </row>
  <row>
   <entry>t_xmax</entry>
   <entry>TransactionId</entry>
   <entry>4 bytes</entry>
   <entry>delete XID stamp</entry>
  </row>
  <row>
   <entry>t_ctid</entry>
   <entry>ItemPointerData</entry>
   <entry>6 bytes</entry>
   <entry>current TID of this or newer tuple</entry>
  </row>
  <row>
   <entry>t_natts</entry>
   <entry>int16</entry>
   <entry>2 bytes</entry>
   <entry>number of attributes</entry>
  </row>
  <row>
   <entry>t_infomask</entry>
   <entry>uint16</entry>
   <entry>2 bytes</entry>
   <entry>Various flags</entry>
  </row>
  <row>
   <entry>t_hoff</entry>
   <entry>uint8</entry>
   <entry>1 byte</entry>
   <entry>length of tuple header. Also offset of data.</entry>
  </row>
 </tbody>
 </tgroup>
 </table>

 <para>
 
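  All the details may be found in <filename>src/include/storage/bufpage.h</filename>
  (page layout) and <filename>src/include/access/htup.h</filename> (tuple header).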
  
 </para>

 <para>
 
  Interpreting the actual data can only be done with information obtained
  from other tables, mostly <firstterm>pg_attribute</firstterm>. The
  relevant columns are <firstterm>attlen</firstterm> and
  <firstterm>attalign</firstterm>. There is no way to locate a
  particular attribute directly, except when there are only fixed-width fields and no
  NULLs. All of this logic is wrapped up in the functions
  <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>,
  and <firstterm>heap_getsysattr</firstterm>.
  
 </para>
 <para>

  To read the data, you need to examine each attribute in turn. First check
  whether the field is NULL according to the null bitmap. If it is, go on to
  the next attribute. Otherwise, advance to the correct alignment boundary
  for the field.  If the field is fixed-width, its bytes are simply stored in
  place. If it is a variable-length field (attlen == -1), it is a bit more
  complicated: the data uses the variable-length structure
  <firstterm>varattrib</firstterm>.
  Depending on the flags, the data may be inline, compressed, or stored in
  another table (TOAST).
  
 </para>
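
 <para>
  The attribute walk described above can be sketched in Python for the
  fixed-width case.  This is an illustration only, not the logic of
  heap_getattr.  It assumes that the data area starts on a maximally aligned
  boundary, that attalign takes the values c, s, i, and d for 1-, 2-, 4-, and
  8-byte alignment, and that the null bitmap stores attribute 0 in the least
  significant bit of its first byte.  Variable-length fields are
  deliberately left out, and the function names are hypothetical.
 </para>
<programlisting><![CDATA[
```python
# Assumed mapping from pg_attribute.attalign codes to byte boundaries.
ALIGNMENT = {"c": 1, "s": 2, "i": 4, "d": 8}


def align_up(offset, boundary):
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary


def attribute_offsets(attrs, null_bitmap=None):
    """Locate each attribute within the tuple's data area.

    attrs is a list of (attlen, attalign) pairs in column order.
    Returns one entry per attribute: (offset, length), or None if null.
    Offsets are relative to the start of the data (that is, to t_hoff).
    """
    result = []
    pos = 0
    for i, (attlen, attalign) in enumerate(attrs):
        # A 0 bit in the bitmap marks a null, which occupies no space.
        if null_bitmap is not None and not (null_bitmap[i >> 3] & (1 << (i & 7))):
            result.append(None)
            continue
        if attlen < 0:
            # Variable-length data needs the varattrib header; not handled.
            raise NotImplementedError("variable-length fields are not handled")
        pos = align_up(pos, ALIGNMENT[attalign])
        result.append((pos, attlen))
        pos += attlen
    return result
```
]]></programlisting>
 <para>
  The sketch also shows why an attribute cannot in general be located
  directly: the offset of each field depends on which earlier fields are
  null and on the padding inserted to satisfy alignment.
 </para>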
</chapter>