1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
|
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/xindex.sgml,v 1.17 2001/08/31 04:17:13 ishii Exp $
Postgres documentation
-->
<chapter id="xindex">
<title>Interfacing Extensions To Indexes</title>
<para>
The procedures described thus far let you define a new type, new
functions and new operators. However, we cannot yet define a secondary
index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
hash access method) over a new type or its operators.
</para>
<para>
Look back at
<xref linkend="EXTEND-CATALOGS">.
The right half shows the catalogs that we must modify in order to tell
<productname>Postgres</productname> how to use a user-defined type and/or
user-defined operators with an index (i.e., <filename>pg_am, pg_amop,
pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
Unfortunately, there is no simple command to do this. We will demonstrate
how to modify these catalogs through a running example: a new operator
class for the <acronym>B-tree</acronym> access method that stores and
sorts complex numbers in ascending absolute value order.
</para>
<para>
The <filename>pg_am</filename> table contains one row for every index
access method. Support for the heap access method is built into
<productname>Postgres</productname>, but every other access method is
described in <filename>pg_am</filename>. The schema is
<table tocentry="1">
<title>Index Access Method Schema</title>
<tgroup cols="2">
<thead>
<row>
<entry>Column</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>amname</entry>
<entry>name of the access method</entry>
</row>
<row>
<entry>amowner</entry>
<entry>user id of the owner</entry>
</row>
<row>
<entry>amstrategies</entry>
<entry>number of strategies for this access method (see below)</entry>
</row>
<row>
<entry>amsupport</entry>
<entry>number of support routines for this access method (see below)</entry>
</row>
<row>
<entry>amorderstrategy</entry>
<entry>zero if the index offers no sort order, otherwise the strategy
number of the strategy operator that describes the sort order</entry>
</row>
<row>
<entry>amcanunique</entry>
<entry>does AM support UNIQUE indexes?</entry>
</row>
<row>
<entry>amcanmulticol</entry>
<entry>does AM support multi-column indexes?</entry>
</row>
<row>
<entry>amindexnulls</entry>
<entry>does AM support NULL index entries?</entry>
</row>
<row>
<entry>amconcurrent</entry>
<entry>does AM support concurrent updates?</entry>
</row>
<row>
<entry>amgettuple</entry>
</row>
<row>
<entry>aminsert</entry>
</row>
<row>
<entry>...</entry>
<entry>procedure identifiers for interface routines to the access
method. For example, regproc ids for opening, closing, and
getting rows from the access method appear here.</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
<para>
The <acronym>object ID</acronym> of the row in
<filename>pg_am</filename> is used as a foreign key in a lot of other
tables. You do not need to add a new row to this table; all that
you are interested in is the <acronym>object ID</acronym> of the access
method you want to extend:
<programlisting>
SELECT oid FROM pg_am WHERE amname = 'btree';
oid
-----
403
(1 row)
</programlisting>
We will use that <command>SELECT</command> in a <command>WHERE</command>
clause later.
</para>
<para>
The <filename>amstrategies</filename> column exists to standardize
comparisons across data types. For example, <acronym>B-tree</acronym>s
impose a strict ordering on keys, lesser to greater. Since
<productname>Postgres</productname> allows the user to define operators,
<productname>Postgres</productname> cannot look at the name of an operator
(e.g., ">" or "<") and tell what kind of comparison it is. In fact,
some access methods don't impose any ordering at all. For example,
<acronym>R-tree</acronym>s express a rectangle-containment relationship,
whereas a hashed data structure expresses only bitwise similarity based
on the value of a hash function. <productname>Postgres</productname>
needs some consistent way of taking a qualification in your query,
looking at the operator and then deciding if a usable index exists. This
implies that <productname>Postgres</productname> needs to know, for
example, that the "<=" and ">" operators partition a
<acronym>B-tree</acronym>. <productname>Postgres</productname>
uses strategies to express these relationships between
operators and the way they can be used to scan indexes.
</para>
<para>
Defining a new set of strategies is beyond the scope of this discussion,
but we'll explain how <acronym>B-tree</acronym> strategies work because
you'll need to know that to add a new B-tree operator class. In the
<filename>pg_am</filename> table, the amstrategies column is the
number of strategies defined for this access method. For
<acronym>B-tree</acronym>s, this number is 5. These strategies
correspond to
<table tocentry="1">
<title>B-tree Strategies</title>
<titleabbrev>B-tree</titleabbrev>
<tgroup cols="2">
<thead>
<row>
<entry>Operation</entry>
<entry>Index</entry>
</row>
</thead>
<tbody>
<row>
<entry>less than</entry>
<entry>1</entry>
</row>
<row>
<entry>less than or equal</entry>
<entry>2</entry>
</row>
<row>
<entry>equal</entry>
<entry>3</entry>
</row>
<row>
<entry>greater than or equal</entry>
<entry>4</entry>
</row>
<row>
<entry>greater than</entry>
<entry>5</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
<para>
The idea is that you'll need to add operators corresponding to the
comparisons above to the <filename>pg_amop</filename> relation (see below).
The access method code can use these strategy numbers, regardless of data
type, to figure out how to partition the <acronym>B-tree</acronym>,
compute selectivity, and so on. Don't worry about the details of adding
operators yet; just understand that there must be a set of these
operators for <filename>int2, int4, oid,</filename> and every other
data type on which a <acronym>B-tree</acronym> can operate.
</para>
<para>
Sometimes, strategies aren't enough information for the system to figure
out how to use an index. Some access methods require additional support
routines in order to work. For example, the <acronym>B-tree</acronym>
access method must be able to compare two keys and determine whether one
is greater than, equal to, or less than the other. Similarly, the
<acronym>R-tree</acronym> access method must be able to compute
intersections, unions, and sizes of rectangles. These
operations do not correspond to operators used in qualifications in
SQL queries; they are administrative routines used by
the access methods, internally.
</para>
<para>
In order to manage diverse support routines consistently across all
<productname>Postgres</productname> access methods,
<filename>pg_am</filename> includes a column called
<filename>amsupport</filename>. This column records the number of
support routines used by an access method. For <acronym>B-tree</acronym>s,
this number is one -- the routine to take two keys and return -1, 0, or
+1, depending on whether the first key is less than, equal
to, or greater than the second.
<note>
<para>
Strictly speaking, this routine can return a negative
number (< 0), zero, or a non-zero positive number (> 0).
</para>
</note>
</para>
<para>
The <filename>amstrategies</filename> entry in <filename>pg_am</filename>
is just the number
of strategies defined for the access method in question. The operators
for less than, less equal, and so on don't appear in
<filename>pg_am</filename>. Similarly, <filename>amsupport</filename>
is just the number of support routines required by the access
method. The actual routines are listed elsewhere.
</para>
<para>
By the way, the <filename>amorderstrategy</filename> entry tells whether
the access method supports ordered scan. Zero means it doesn't; if it
does, <filename>amorderstrategy</filename> is the number of the strategy
routine that corresponds to the ordering operator. For example, B-tree
has <filename>amorderstrategy</filename> = 1 which is its
"less than" strategy number.
</para>
<para>
The next table of interest is <filename>pg_opclass</filename>. This table
defines operator class names and input data types for each of the operator
classes supported by a given index access method. The same class name
can be used for several different access methods (for example, both B-tree
and hash access methods have operator classes named
<filename>oid_ops</filename>), but a separate
<filename>pg_opclass</filename> row must appear for each access method.
The <filename>oid</filename> of the <filename>pg_opclass</filename> row is
used as a foreign
key in other tables to associate specific operators and support routines
with the operator class.
</para>
<para>
You need to add a row with your opclass name (for example,
<filename>complex_abs_ops</filename>) to
<filename>pg_opclass</filename>:
<programlisting>
INSERT INTO pg_opclass (opcamid, opcname, opcintype, opcdefault, opckeytype)
VALUES (
(SELECT oid FROM pg_am WHERE amname = 'btree'),
'complex_abs_ops',
(SELECT oid FROM pg_type WHERE typname = 'complex'),
true,
0);
SELECT oid, *
FROM pg_opclass
WHERE opcname = 'complex_abs_ops';
oid | opcamid | opcname | opcintype | opcdefault | opckeytype
--------+---------+-----------------+-----------+------------+------------
277975 | 403 | complex_abs_ops | 277946 | t | 0
(1 row)
</programlisting>
Note that the oid for your <filename>pg_opclass</filename> row will
be different! Don't worry about this though. We'll get this number
from the system later just like we got the oid of the type here.
</para>
<para>
The above example assumes that you want to make this new opclass the
default B-tree opclass for the <filename>complex</filename> datatype.
If you don't, just set <filename>opcdefault</filename> to false instead.
<filename>opckeytype</filename> is not described here; it should always
be zero for B-tree opclasses.
</para>
<para>
So now we have an access method and an operator class.
We still need a set of operators. The procedure for
defining operators was discussed earlier in this manual.
For the <filename>complex_abs_ops</filename> operator class on B-trees,
the operators we require are:
<programlisting>
absolute value less-than
absolute value less-than-or-equal
absolute value equal
absolute value greater-than-or-equal
absolute value greater-than
</programlisting>
</para>
<para>
Suppose the code that implements these functions
is stored in the file
<filename>PGROOT/src/tutorial/complex.c</filename>.
</para>
<para>
Part of the C code looks like this: (note that we will only show the
equality operator for the rest of the examples. The other four
operators are very similar. Refer to <filename>complex.c</filename>
or <filename>complex.source</filename> for the details.)
<programlisting>
#define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)
bool
complex_abs_eq(Complex *a, Complex *b)
{
double amag = Mag(a), bmag = Mag(b);
return (amag==bmag);
}
</programlisting>
</para>
<para>
We make the function known to Postgres like this:
<programlisting>
CREATE FUNCTION complex_abs_eq(complex, complex)
RETURNS bool
AS 'PGROOT/tutorial/obj/complex.so'
LANGUAGE 'c';
</programlisting>
</para>
<para>
There are some important things that are happening here.
</para>
<para>
First, note that operators for less-than, less-than-or-equal, equal,
greater-than-or-equal, and greater-than for <filename>complex</filename>
are being defined. We can only have one operator named, say, = and
taking type <filename>complex</filename> for both operands. In this case
we don't have any other operator = for <filename>complex</filename>,
but if we were building a practical datatype we'd probably want = to
be the ordinary equality operation for complex numbers. In that case,
we'd need to use some other operator name for complex_abs_eq.
</para>
<para>
Second, although Postgres can cope with operators having
the same name as long as they have different input datatypes, C can only
cope with one global routine having a given name, period. So we shouldn't
name the C function something simple like <filename>abs_eq</filename>.
Usually it's a good practice to include the datatype name in the C
function name, so as not to conflict with functions for other datatypes.
</para>
<para>
Third, we could have made the Postgres name of the function
<filename>abs_eq</filename>, relying on Postgres to distinguish it
by input datatypes from any other Postgres function of the same name.
To keep the example simple, we make the function have the same names
at the C level and Postgres level.
</para>
<para>
Finally, note that these operator functions return Boolean values.
In practice, all operators defined as index access method strategies
must return Boolean, since they must appear at the top level of a WHERE
clause to be used with an index.
(On the other
hand, the support function returns whatever the particular access method
expects -- in this case, a signed integer.)
</para>
<para>
The final routine in the
file is the "support routine" mentioned when we discussed the amsupport
column of the <filename>pg_am</filename> table. We will use this
later on. For now, ignore it.
</para>
<para>
Now we are ready to define the operators:
<programlisting>
CREATE OPERATOR = (
leftarg = complex, rightarg = complex,
procedure = complex_abs_eq,
restrict = eqsel, join = eqjoinsel
);
</programlisting>
The important
things here are the procedure names (which are the <acronym>C</acronym>
functions defined above) and the restriction and join selectivity
functions. You should just use the selectivity functions used in
the example (see <filename>complex.source</filename>).
Note that there
are different such functions for the less-than, equal, and greater-than
cases. These must be supplied, or the optimizer will be unable to
make effective use of the index.
</para>
<para>
The next step is to add entries for these operators to
the <filename>pg_amop</filename> relation. To do this,
we'll need the <filename>oid</filename>s of the operators we just
defined. We'll look up the names of all the operators that take
two <filename>complex</filename>es, and pick ours out:
<programlisting>
SELECT o.oid AS opoid, o.oprname
INTO TEMP TABLE complex_ops_tmp
FROM pg_operator o, pg_type t
WHERE o.oprleft = t.oid and o.oprright = t.oid
and t.typname = 'complex';
opoid | oprname
--------+---------
277963 | +
277970 | <
277971 | <=
277972 | =
277973 | >=
277974 | >
(6 rows)
</programlisting>
(Again, some of your <filename>oid</filename> numbers will almost
certainly be different.) The operators we are interested in are those
with <filename>oid</filename>s 277970 through 277974. The values you
get will probably be different, and you should substitute them for the
values below. We will do this with a select statement.
</para>
<para>
Now we are ready to insert entries into <filename>pg_amop</filename> for
our new operator class. These entries must associate the correct
B-tree strategy numbers with each of the operators we need.
The command to insert the less-than operator looks like:
<programlisting>
INSERT INTO pg_amop (amopclaid, amopstrategy, amopreqcheck, amopopr)
SELECT opcl.oid, 1, false, c.opoid
FROM pg_opclass opcl, complex_ops_tmp c
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
c.oprname = '<';
</programlisting>
Now do this for the other operators substituting for the "1" in the
second line above and the "<" in the last line. Note the order:
"less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
than or equal" is 4, and "greater than" is 5.
</para>
<para>
The field <filename>amopreqcheck</filename> is not discussed here; it
should always be false for B-tree operators.
</para>
<para>
The final step is registration of the "support routine" previously
described in our discussion of <filename>pg_am</filename>. The
<filename>oid</filename> of this support routine is stored in the
<filename>pg_amproc</filename> table, keyed by the operator class
<filename>oid</filename> and the support routine number.
First, we need to register the function in
<productname>Postgres</productname> (recall that we put the
<acronym>C</acronym> code that implements this routine in the bottom of
the file in which we implemented the operator routines):
<programlisting>
CREATE FUNCTION complex_abs_cmp(complex, complex)
RETURNS int4
AS 'PGROOT/tutorial/obj/complex.so'
LANGUAGE 'c';
SELECT oid, proname FROM pg_proc
WHERE proname = 'complex_abs_cmp';
oid | proname
--------+-----------------
277997 | complex_abs_cmp
(1 row)
</programlisting>
(Again, your <filename>oid</filename> number will probably be different.)
We can add the new row as follows:
<programlisting>
INSERT INTO pg_amproc (amopclaid, amprocnum, amproc)
SELECT opcl.oid, 1, p.oid
FROM pg_opclass opcl, pg_proc p
WHERE
opcamid = (SELECT oid FROM pg_am WHERE amname = 'btree') AND
opcname = 'complex_abs_ops' AND
p.proname = 'complex_abs_cmp';
</programlisting>
</para>
<para>
And we're done! (Whew.) It should now be possible to create
and use B-tree indexes on <filename>complex</filename> columns.
</para>
</chapter>
<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End:
-->
|