aboutsummaryrefslogtreecommitdiff
path: root/doc/TODO.detail/typeconv
blob: 52cc1ff00d3835bbf81983d0f451f27fcc5be634 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
From pgsql-hackers-owner+M1833@hub.org Sat May 13 22:49:26 2000
Received: from news.tht.net (news.hub.org [216.126.91.242])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id WAA07394
	for <pgman@candle.pha.pa.us>; Sat, 13 May 2000 22:49:24 -0400 (EDT)
Received: from hub.org (majordom@hub.org [216.126.84.1])
	by news.tht.net (8.9.3/8.9.3) with ESMTP id WAB99859;
	Sat, 13 May 2000 22:44:15 -0400 (EDT)
	(envelope-from pgsql-hackers-owner+M1833@hub.org)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by hub.org (8.9.3/8.9.3) with ESMTP id WAA51058
	for <pgsql-hackers@postgreSQL.org>; Sat, 13 May 2000 22:41:16 -0400 (EDT)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id WAA18343
	for <pgsql-hackers@postgreSQL.org>; Sat, 13 May 2000 22:40:38 -0400 (EDT)
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Proposal for fixing numeric type-resolution issues
Date: Sat, 13 May 2000 22:40:38 -0400
Message-ID: <18340.958272038@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: ORr

We've got a collection of problems that are related to the parser's
inability to make good type-resolution choices for numeric constants.
In some cases you get a hard error; for example "NumericVar + 4.4"
yields
ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
        You will have to retype this query using an explicit cast
because "4.4" is initially typed as float8 and the system can't figure
out whether to use numeric or float8 addition.  A more subtle problem
is that a query like "... WHERE Int2Var < 42" is unable to make use of
an index on the int2 column: 42 is resolved as int4, so the operator
is int24lt, which works but is not in the opclass of an int2 index.

Here is a proposal for fixing these problems.  I think we could get this
done for 7.1 if people like it.

The basic problem is that there's not enough smarts in the type resolver
about the interrelationships of the numeric datatypes.  All it has is
a concept of a most-preferred type within the category of numeric types.
(We are abusing the most-preferred-type mechanism, BTW, because both
FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
category!  This is in fact why the resolver can't make a choice for
"numeric+float8".)  We need more intelligence than that.

I propose that we set up a strictly-ordered hierarchy of numeric
datatypes, running from least preferred to most preferred:
	int2, int4, int8, numeric, float4, float8.
Rather than simply considering coercions to the most-preferred type,
the type resolver should use the following rules:

1. No value will be down-converted (eg int4 to int2) except by an
explicit conversion.

2. If there is not an exact matching operator, numeric values will be
up-converted to the highest numeric datatype present among the operator
or function's arguments.  For example, given "int2 + int8" we'd up-
convert the int2 to int8 and apply int8 addition.

The final piece of the puzzle is that the type initially assigned to
an undecorated numeric constant should be NUMERIC if it contains a
decimal point or exponent, and otherwise the smallest of int2, int4,
int8, NUMERIC that will represent it.  This is a considerable change
from the current lexer behavior, where you get either int4 or float8.

For example, given "NumericVar + 4.4", the constant 4.4 will initially
be assigned type NUMERIC, we will resolve the operator as numeric plus,
and everything's fine.  Given "Float8Var + 4.4", the constant is still
initially numeric, but will be up-converted to float8 so that float8
addition can be used.  The end result is the same as in traditional
Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
is initially typed as int2, since it fits, and we end up selecting
int2lt, thereby allowing use of an int2 index.  (On the other hand,
given "Int2Var < 100000", we'd end up using int4lt, which is correct
to avoid overflow.)

A couple of crucial subtleties here:

1. We are assuming that the parser or optimizer will constant-fold
any conversion functions that are introduced.  Thus, in the
"Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
time execution begins, so there's no performance loss.

2. We cannot lose precision by initially representing a constant as
numeric and later converting it to float.  Nor can we exceed NUMERIC's
range (the default 1000-digit limit is more than the range of IEEE
float8 data).  It would not work as well to start out by representing
a constant as float and then converting it to numeric.

Presently, the pg_proc and pg_operator tables contain a pretty fair
collection of cross-datatype numeric operators, such as int24lt,
float48pl, etc.  We could perhaps leave these in, but I believe that
it is better to remove them.  For example, if int42lt is left in place,
then it would capture cases like "Int4Var < 42", whereas we need that
to be translated to int4lt so that an int4 index can be used.  Removing
these operators will eliminate some code bloat and system-catalog bloat
to boot.

As far as I can tell, this proposal is almost compatible with the rules
given in SQL92: in particular, SQL92 specifies that an operator having
both "approximate numeric" (float) and "exact numeric" (int or numeric)
inputs should deliver an approximate-numeric result.  I propose
deviating from SQL92 in a single respect: SQL92 specifies that a
constant containing an exponent (eg 1.2E34) is approximate numeric,
which implies that the result of an operator using it is approximate
even if the other operand is exact.  I believe it's better to treat
such a constant as exact (ie, type NUMERIC) and only convert it to
float if the other operand is float.  Without doing that, an assignment
like
	UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
will not work as desired because the constant will be prematurely
coerced to float, causing precision loss.

Comments?

			regards, tom lane

From tgl@sss.pgh.pa.us Sun May 14 17:30:56 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA05808
	for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 17:30:52 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2]) by renoir.op.net (o1/$Revision: 1.4 $) with ESMTP id RAA16657 for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 17:29:52 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id RAA20914;
	Sun, 14 May 2000 17:29:30 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
Subject: Re: [HACKERS] type conversion discussion 
In-reply-to: <200005141950.PAA04636@candle.pha.pa.us> 
References: <200005141950.PAA04636@candle.pha.pa.us>
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
	message dated "Sun, 14 May 2000 15:50:20 -0400"
Date: Sun, 14 May 2000 17:29:30 -0400
Message-ID: <20911.958339770@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> As some point, it seems we need to get all the PostgreSQL minds together
> to discuss type conversion issues.  These problems continue to come up
> from release to release.  We are getting better, but it seems a full
> discussion could help solidify our strategy.

OK, here are a few things that bug me about the current type-resolution
code:

1. Poor choice of type to attribute to numeric literals.  (A possible
   solution is sketched in my earlier message, but do we need similar
   mechanisms for other type categories?)

2. Tensions between treating string literals as "unknown" type and
   as "text" type, per this thread so far.

3. IS_BINARY_COMPATIBLE seems like a bogus concept.  Do we really want a
   fully symmetrical ring of types in each group?  I'd prefer to see a
   one-way equivalence, which allows eg. OID to be silently converted
   to INT4, but *not* vice versa (except perhaps by specific user cast).
   This'd be more like a traditional "is-a" or inheritance relationship
   between datatypes, which has well-understood semantics.

4. I'm also concerned that the behavior of IS_BINARY_COMPATIBLE isn't
   very predictable because it will happily go either way.  For example,
   if I do 
	select * from pg_class where oid = 1234;
   it's unclear whether I will get an oideq or an int4eq operator ---
   and that's a rather critical point since only one of them can exploit
   an index on the oid column.  Currently, there is some klugery in the
   planner that works around this by overriding the parser's choice of
   operator to substitute one that is compatible with an available index.
   That's a pretty ugly solution ... I'm not sure I know a better one,
   but as long as we're discussing type resolution issues ...

5. Lack of extensibility.  There's way too much knowledge hard-wired
   into the parser about type categories, preferred types, binary
   compatibility, etc.  All of it falls down when faced with
   user-defined datatypes.  If we do something like I suggested with
   a hardwired hierarchy of numeric datatypes, it'll get even worse.
   All this stuff ought to be driven off fields in pg_type rather than
   be hardwired into the code, so that the same concepts can be extended
   to user-defined types.

I don't have worked-out proposals for any of these but the first,
but they've all been bothering me for a while.

			regards, tom lane

From tgl@sss.pgh.pa.us Sun May 14 21:02:31 2000
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA07700
	for <pgman@candle.pha.pa.us>; Sun, 14 May 2000 21:02:28 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id VAA21261;
	Sun, 14 May 2000 21:03:17 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
Subject: Re: [HACKERS] type conversion discussion 
In-reply-to: <20911.958339770@sss.pgh.pa.us> 
References: <200005141950.PAA04636@candle.pha.pa.us> <20911.958339770@sss.pgh.pa.us>
Comments: In-reply-to Tom Lane <tgl@sss.pgh.pa.us>
	message dated "Sun, 14 May 2000 17:29:30 -0400"
Date: Sun, 14 May 2000 21:03:17 -0400
Message-ID: <21258.958352597@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR

Here are the results of some further thoughts about type-conversion
issues.  This is not a complete proposal yet, but a sketch of an
approach that might solve several of the gripes in my previous proposal.

While thinking about this, I realized that my numeric-types proposal
of yesterday would break at least a few cases that work nicely now.
For example, I frequently do things like
	select * from pg_class where oid = 1234;
whilst poking around in system tables and querytree dumps.  If that
constant is initially resolved as int2, as I suggested yesterday,
then we have "oid = int2" for which there is no operator.  To succeed
we must decide to promote the constant to int4 --- but with no int4
visible among the operands of the "=", it will not work to just "promote
numerics to the highest type seen in the operands" as I suggested
yesterday.  So there has to be some more interaction in there.

Anyway, I was complaining about the looseness of the concept of
binary-compatible types and the fact that the parser's type conversion
knowledge is mostly hardwired.  These might be resolved by generalizing
the numeric type hierarchy idea into a "type promotion lattice", which
would work like this:

* Add a "typpromote" column to pg_type, which contains either zero or
  the OID of another type that the parser is allowed to promote this
  type to when searching for usable functions/operators.  For example,
  my numeric-types hierarchy of yesterday would be expressed by making
  int2 promote to int4, int4 to int8, int8 to numeric, numeric to
  float4, and float4 to float8.  The promotion idea also replaces the
  current concept of binary-compatible types: for example, OID would
  link to int4 and varchar would link to text (but not vice versa!).

* Also add a "typpromotebin" boolean column to pg_type, which contains
  't' if the type conversion indicated by typpromote is "free", ie,
  no conversion function need be executed before regarding a value as
  belonging to the promoted type.  This distinguishes binary-compatible
  from non-binary-compatible cases.  If "typpromotebin" is 'f' and the
  parser decides it needs to apply the conversion, then it has to look
  up the appropriate conversion function in pg_proc.  (More about this
  below.)

Now, if the parser fails to find an exact match for a given function
or operator name and the exact set of input data types, it proceeds by
chasing up the promotion chains for the input data types and trying to
locate a set of types for which there is a matching function/operator.
If there are multiple possibilities, we choose the one which is the
"least promoted" by some yet-to-be-determined metric.  (This metric
would probably favor "free" conversions over non-free ones, but other
than that I'm not quite sure how it should work.  The metric would
replace a whole bunch of ad-hoc heuristics that are currently applied
in the type resolver, so even if it seems rather ad-hoc it'd still be
cleaner than what we have ;-).)

In a situation like the "oid = int2" example above, this mechanism would
presumably settle on "int4 = int4" as being the least-promoted
equivalent operator.  (It could not find "oid = oid" since there is
no promotion path from int2 to oid.)  That looks bad since it isn't
compatible with an oidops index --- but I have a solution for that!
I don't think we need the oid opclass at all; why shouldn't indexes
on oid be expressed as int4 indexes to begin with?  In general, if
two types are considered binary-equivalent under the old scheme, then
the one that is considered the subtype probably shouldn't have separate
index operators under this new scheme.  Instead it should just rely on
the index operators of the promoted type.

The point of the proposed typpromotebin field is to save a pg_proc
lookup when trying to determine whether a particular promotion is "free"
or not.  We could save even more lookups if we didn't store the boolean
but instead the actual OID of the conversion function, or zero if the
promotion is "free".  The trouble with that is that it creates a
circularity problem when trying to define a new user type --- you can't
define the conversion function if its input type doesn't exist yet.
In any case, we want the parser to do a function lookup if we've
advanced more than one step in the promotion hierarchy: if we've decided
to promote int4 to float8 (which will be a four-step chain through int8,
numeric, float4) we sure want the thing to use a direct int4tofloat8
conversion function if available, not a chain of four conversion
functions.  So on balance I think we want to look in pg_proc once we've
decided which conversion to perform.  The only reason for having
typpromotebin is that the promotion metric will want to know which
conversions are free, and we don't want to have to do a lookup in
pg_proc for each alternative we consider, only the ones that are finally
selected to be used.

I can think of at least one special case that still isn't cleanly
handled under this scheme, and that is bpchar vs. varchar comparison.
Currently, we have

regression=# select 'a'::bpchar = 'a '::bpchar;
 ?column?
----------
 t
(1 row)

This is correct since trailing blanks are insignificant in bpchar land,
so the two values should be considered equal.  If we try

regression=# select 'a'::bpchar = 'a '::varchar;
ERROR:  Unable to identify an operator '=' for types 'bpchar' and 'varchar'
        You will have to retype this query using an explicit cast

which is pretty bogus but at least it saves the system from making some
random choice about whether bpchar or varchar comparison rules apply.
On the other hand,

regression=# select 'a'::bpchar = 'a '::text;
 ?column?
----------
 f
(1 row)

Here the bpchar value has been promoted to text and then text comparison
(where trailing blanks *are* significant) is applied.  I'm not sure that
we can really justify doing this in this case when we reject the bpchar
vs varchar case, but maybe someone wants to argue that that's correct.

The natural setup in my type-promotion scheme would be that both bpchar
and varchar link to 'text' as their promoted type.  If we do nothing
special then text-style comparison would be used in a bpchar vs varchar
comparison, which is arguably wrong.

One way to deal with this without introducing kluges into the type
resolver is to provide a full set of bpchar vs text and text vs bpchar
operators, and make sure that the promotion metric is such that these
will be used in place of text vs text operators if they apply (which
should hold, I think, for any reasonable metric).  This is probably
the only way to get the "right" behavior in any case --- I think that
the "right" behavior for such comparisons is to strip trailing blanks
from the bpchar side but not the text/varchar side.  (I haven't checked
to see if SQL92 agrees, though.)

Another issue is how to fit resolution of "unknown" literals into this
scheme.  We could probably continue to handle them more or less as we
do now, but they might complicate the promotion metric.

I am not clear yet on whether we'd still need the concept of "type
categories" as they presently exist in the resolver.  It's possible
that we wouldn't, which would be a nice simplification.  (If we do
still need them, we should have a column in pg_type that defines the
category of a type, instead of hard-wiring category assignments.)

			regards, tom lane

From e99re41@DoCS.UU.SE Mon May 15 07:39:03 2000
Received: from meryl.it.uu.se (root@meryl.it.uu.se [130.238.12.42])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id HAA10251
	for <pgman@candle.pha.pa.us>; Mon, 15 May 2000 07:39:01 -0400 (EDT)
Received: from Zebra.DoCS.UU.SE (e99re41@Zebra.DoCS.UU.SE [130.238.9.158])
	by meryl.it.uu.se (8.8.5/8.8.5) with ESMTP id NAA10849;
	Mon, 15 May 2000 13:39:45 +0200 (MET DST)
Received: from localhost (e99re41@localhost) by Zebra.DoCS.UU.SE (8.6.12/8.6.12) with ESMTP id NAA26523; Mon, 15 May 2000 13:39:44 +0200
X-Authentication-Warning: Zebra.DoCS.UU.SE: e99re41 owned process doing -bs
Date: Mon, 15 May 2000 13:39:44 +0200 (MET DST)
From: Peter Eisentraut <e99re41@DoCS.UU.SE>
Reply-To: Peter Eisentraut <peter_e@gmx.net>
To: Tom Lane <tgl@sss.pgh.pa.us>
cc: Bruce Momjian <pgman@candle.pha.pa.us>,
        PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] type conversion discussion 
In-Reply-To: <20911.958339770@sss.pgh.pa.us>
Message-ID: <Pine.GSO.4.02A.10005151309020.26399-100000@Zebra.DoCS.UU.SE>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by candle.pha.pa.us id HAA10251
Status: OR

On Sun, 14 May 2000, Tom Lane wrote:

> 1. Poor choice of type to attribute to numeric literals.  (A possible
>    solution is sketched in my earlier message, but do we need similar
>    mechanisms for other type categories?)

I think your plan looks good for the numerical land. (I'll ponder the oid
issues in a second.) For other type categories, perhaps not. Should a line
be promoted to a polygon so you can check if it contains a point? Or a
polygon to a box? Higher dimensions? :-)


> 2. Tensions between treating string literals as "unknown" type and
>    as "text" type, per this thread so far.

Yes, while we're at it, let's look at this in detail. I claim that
something of the form 'xxx' should always be text (or char or whatever),
period. Let's consider the cases were this could potentially clash with
the current behaviour:

a) The target type is unambiguously clear, e.g., UPDATE ... SET. Then you
cast text to the target type. The effect is identical.

b) The target type is completely unspecified, e.g. CREATE TABLE AS SELECT
'xxx'; This will currently create an "unknown" column. It should arguably
create a "text" column.

Function argument resolution:

c) There is only one function and it has a "text" argument. No-brainer.

d) There is only one function and it has an argument other than text. Try
to cast text to that type. (This is what's done in general, isn't it?)

e) The function is overloaded for many types, amongst which is text. Then
call the text version. I believe this would currently fail, which I'd
consider a deficiency.

f) The function is overloaded for many types, none of which is text. In
that case you have to cast anyway, so you don't lose anything.

On thing to also keep in mind regarding required casting for (b) and (f)
is that SQL never allowed literals of "fancy" types (e.g., DATE) to have
undecorated 'yyyy-mm-dd' constants, you always have to say DATE
'yyyy-mm-dd'. What Postgres allows is a convencience where DATE would be
obvious or implied. In the end it's a win-win situation: you tell the
system what you want, and your code is clearer.

 
> 3. IS_BINARY_COMPATIBLE seems like a bogus concept.

At least it's bogus when used for types which are not actually binary
compatible, e.g. int4 and oid. The result of the current implementation is
that you can perfectly happily insert and retrieve negative numbers from
oid fields.

I'm not so sure about the value of this particular equivalency anyway.
AFAICS the only functions that make sense for oids are comparisons (incl.
min, max), adding integers to them, subtracting one oid from another.
Silent mangling with int4 means that you can multiply them, square them,
add floating point numbers to them (doesn't really work in practice
though), all things that have no business with oids.

I'd say define the operators that are useful for oids explicitly for oids
and require casts for all others, so the users know what they're doing.
The fact that an oid is also a number should be an implementation detail.

In my mind oids are like pointers in C. Indiscriminate mangling of
pointers and integers in C has long been dismissed as questionable coding.


Of course I'd be very willing to consider counterexamples to these
theories ...

-- 
Peter Eisentraut                  Sernanders v�g 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden


From tgl@sss.pgh.pa.us Tue Jun 13 04:58:20 2000
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA24281
	for <pgman@candle.pha.pa.us>; Tue, 13 Jun 2000 03:58:18 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id DAA02571;
	Tue, 13 Jun 2000 03:58:43 -0400 (EDT)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal for fixing numeric type-resolution issues 
In-reply-to: <200006130741.DAA23502@candle.pha.pa.us> 
References: <200006130741.DAA23502@candle.pha.pa.us>
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
	message dated "Tue, 13 Jun 2000 03:41:56 -0400"
Date: Tue, 13 Jun 2000 03:58:43 -0400
Message-ID: <2568.960883123@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: OR

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Again, anything to add to the TODO here?

IIRC, there was some unhappiness with the proposal you quote, so I'm
not sure we've quite agreed what to do... but clearly something must
be done.

			regards, tom lane


>> We've got a collection of problems that are related to the parser's
>> inability to make good type-resolution choices for numeric constants.
>> In some cases you get a hard error; for example "NumericVar + 4.4"
>> yields
>> ERROR:  Unable to identify an operator '+' for types 'numeric' and 'float8'
>> You will have to retype this query using an explicit cast
>> because "4.4" is initially typed as float8 and the system can't figure
>> out whether to use numeric or float8 addition.  A more subtle problem
>> is that a query like "... WHERE Int2Var < 42" is unable to make use of
>> an index on the int2 column: 42 is resolved as int4, so the operator
>> is int24lt, which works but is not in the opclass of an int2 index.
>> 
>> Here is a proposal for fixing these problems.  I think we could get this
>> done for 7.1 if people like it.
>> 
>> The basic problem is that there's not enough smarts in the type resolver
>> about the interrelationships of the numeric datatypes.  All it has is
>> a concept of a most-preferred type within the category of numeric types.
>> (We are abusing the most-preferred-type mechanism, BTW, because both
>> FLOAT8 and NUMERIC claim to be the most-preferred type in the numeric
>> category!  This is in fact why the resolver can't make a choice for
>> "numeric+float8".)  We need more intelligence than that.
>> 
>> I propose that we set up a strictly-ordered hierarchy of numeric
>> datatypes, running from least preferred to most preferred:
>> int2, int4, int8, numeric, float4, float8.
>> Rather than simply considering coercions to the most-preferred type,
>> the type resolver should use the following rules:
>> 
>> 1. No value will be down-converted (eg int4 to int2) except by an
>> explicit conversion.
>> 
>> 2. If there is not an exact matching operator, numeric values will be
>> up-converted to the highest numeric datatype present among the operator
>> or function's arguments.  For example, given "int2 + int8" we'd up-
>> convert the int2 to int8 and apply int8 addition.
>> 
>> The final piece of the puzzle is that the type initially assigned to
>> an undecorated numeric constant should be NUMERIC if it contains a
>> decimal point or exponent, and otherwise the smallest of int2, int4,
>> int8, NUMERIC that will represent it.  This is a considerable change
>> from the current lexer behavior, where you get either int4 or float8.
>> 
>> For example, given "NumericVar + 4.4", the constant 4.4 will initially
>> be assigned type NUMERIC, we will resolve the operator as numeric plus,
>> and everything's fine.  Given "Float8Var + 4.4", the constant is still
>> initially numeric, but will be up-converted to float8 so that float8
>> addition can be used.  The end result is the same as in traditional
>> Postgres: you get float8 addition.  Given "Int2Var < 42", the constant
>> is initially typed as int2, since it fits, and we end up selecting
>> int2lt, thereby allowing use of an int2 index.  (On the other hand,
>> given "Int2Var < 100000", we'd end up using int4lt, which is correct
>> to avoid overflow.)
>> 
>> A couple of crucial subtleties here:
>> 
>> 1. We are assuming that the parser or optimizer will constant-fold
>> any conversion functions that are introduced.  Thus, in the
>> "Float8Var + 4.4" case, the 4.4 is represented as a float8 4.4 by the
>> time execution begins, so there's no performance loss.
>> 
>> 2. We cannot lose precision by initially representing a constant as
>> numeric and later converting it to float.  Nor can we exceed NUMERIC's
>> range (the default 1000-digit limit is more than the range of IEEE
>> float8 data).  It would not work as well to start out by representing
>> a constant as float and then converting it to numeric.
>> 
>> Presently, the pg_proc and pg_operator tables contain a pretty fair
>> collection of cross-datatype numeric operators, such as int24lt,
>> float48pl, etc.  We could perhaps leave these in, but I believe that
>> it is better to remove them.  For example, if int42lt is left in place,
>> then it would capture cases like "Int4Var < 42", whereas we need that
>> to be translated to int4lt so that an int4 index can be used.  Removing
>> these operators will eliminate some code bloat and system-catalog bloat
>> to boot.
>> 
>> As far as I can tell, this proposal is almost compatible with the rules
>> given in SQL92: in particular, SQL92 specifies that an operator having
>> both "approximate numeric" (float) and "exact numeric" (int or numeric)
>> inputs should deliver an approximate-numeric result.  I propose
>> deviating from SQL92 in a single respect: SQL92 specifies that a
>> constant containing an exponent (eg 1.2E34) is approximate numeric,
>> which implies that the result of an operator using it is approximate
>> even if the other operand is exact.  I believe it's better to treat
>> such a constant as exact (ie, type NUMERIC) and only convert it to
>> float if the other operand is float.  Without doing that, an assignment
>> like
>> UPDATE tab SET NumericVar = 1.234567890123456789012345E34;
>> will not work as desired because the constant will be prematurely
>> coerced to float, causing precision loss.
>> 
>> Comments?
>> 
>> regards, tom lane
>> 


> -- 
>   Bruce Momjian                        |  http://www.op.net/~candle
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From tgl@sss.pgh.pa.us Mon Jun 12 14:09:45 2000
Received: from sss2.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA01993
	for <pgman@candle.pha.pa.us>; Mon, 12 Jun 2000 13:09:43 -0400 (EDT)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss2.sss.pgh.pa.us (8.9.3/8.9.3) with ESMTP id NAA01515;
	Mon, 12 Jun 2000 13:10:01 -0400 (EDT)
To: Peter Eisentraut <peter_e@gmx.net>
cc: Bruce Momjian <pgman@candle.pha.pa.us>,
        "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>,
        PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Adding time to DATE type 
In-reply-to: <Pine.LNX.4.21.0006110322150.9195-100000@localhost.localdomain> 
References: <Pine.LNX.4.21.0006110322150.9195-100000@localhost.localdomain>
Comments: In-reply-to Peter Eisentraut <peter_e@gmx.net>
	message dated "Sun, 11 Jun 2000 13:41:24 +0200"
Date: Mon, 12 Jun 2000 13:10:00 -0400
Message-ID: <1512.960829800@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: ORr

Peter Eisentraut <peter_e@gmx.net> writes:
> Bruce Momjian writes:
>> Can someone give me a TODO summary for this issue?

> * make 'text' constants default to text type (not unknown)

> (I think not everyone's completely convinced on this issue, but I don't
> recall anyone being firmly opposed to it.)

It would be a mistake to eliminate the distinction between unknown and
text.  See for example my just-posted response to John Cochran on
pgsql-general about why 'BOULEVARD'::text behaves differently from
'BOULEVARD'::char.  If string literals are immediately assigned type
text then we will have serious problems with char(n) fields.

I think it's fine to assign string literals a type of 'unknown'
initially.  What we need to do is add a phase of type resolution that
considers treating them as text, but only after the existing logic fails
to deduce a type.

(BTW it might be better to treat string literals as defaulting to char(n)
instead of text, allowing the normal promotion rules to replace char(n)
with text if necessary.  Not sure if that would make things more or less
confusing for operations that intermix fixed- and variable-width char
types.)

			regards, tom lane

From pgsql-hackers-owner+M1936@postgresql.org Sun Dec 10 13:17:54 2000
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA20676
	for <pgman@candle.pha.pa.us>; Sun, 10 Dec 2000 13:17:54 -0500 (EST)
Received: from mail.postgresql.org (webmail.postgresql.org [216.126.85.28])
	by mail.postgresql.org (8.11.1/8.11.1) with SMTP id eBAIGvZ40566;
	Sun, 10 Dec 2000 13:16:57 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M1936@postgresql.org)
Received: from sss.pgh.pa.us (sss.pgh.pa.us [209.114.132.154])
	by mail.postgresql.org (8.11.1/8.11.1) with ESMTP id eBAI8HZ39820
	for <pgsql-hackers@postgreSQL.org>; Sun, 10 Dec 2000 13:08:17 -0500 (EST)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss.pgh.pa.us (8.11.1/8.11.1) with ESMTP id eBAI82o28682;
	Sun, 10 Dec 2000 13:08:02 -0500 (EST)
To: Thomas Lockhart <lockhart@alumni.caltech.edu>
cc: pgsql-hackers@postgresql.org
Subject: [HACKERS] Unknown-type resolution rules, redux
Date: Sun, 10 Dec 2000 13:08:02 -0500
Message-ID: <28679.976471682@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

parse_coerce.c contains the following conversation --- I believe the
first XXX comment is from me and the second from you:

    /*
     * Still too many candidates? Try assigning types for the unknown
     * columns.
     *
     * We do this by examining each unknown argument position to see if all
     * the candidates agree on the type category of that slot.  If so, and
     * if some candidates accept the preferred type in that category,
     * eliminate the candidates with other input types.  If we are down to
     * one candidate at the end, we win.
     *
     * XXX It's kinda bogus to do this left-to-right, isn't it?  If we
     * eliminate some candidates because they are non-preferred at the
     * first slot, we won't notice that they didn't have the same type
     * category for a later slot.
     * XXX Hmm. How else would you do this? These candidates are here because
     * they all have the same number of matches on arguments with explicit
     * types, so from here on left-to-right resolution is as good as any.
     * Need a counterexample to see otherwise...
     */

The comment is out of date anyway because it fails to mention the new
rule about preferring STRING category.  But to answer your request for
a counterexample: consider

	SELECT foo('bar', 'baz')

First, suppose the available candidates are

	foo(float8, int4)
	foo(float8, point)

In this case, we examine the first argument position, see that all the
candidates agree on NUMERIC category, so we consider resolving the first
unknown input to float8.  That eliminates neither candidate so we move
on to the second argument position.  Here there is a conflict of
categories so we can't eliminate anything, and we decide the call is
ambiguous.  That's correct (or at least Operating As Designed ;-)).

But now suppose we have

	foo(float8, int4)
	foo(float4, point)

Here, at the first position we will still see that all candidates agree
on NUMERIC category, and then we will eliminate candidate 2 because it
isn't the preferred type in that category.  Now when we come to the
second argument position, there's only one candidate left so there's
no category conflict.  Result: this call is considered non-ambiguous.

This means there is a left-to-right bias in the algorithm.  For example,
the exact same call *would* be considered ambiguous if the candidates'
argument orders were reversed:

	foo(int4, float8)
	foo(point, float4)

I do not like that.  You could maybe argue that earlier arguments are
more important than later ones for functions, but it's harder to make
that case for binary operators --- and in any case this behavior is
extremely difficult to explain in prose.

To fix this, I think we need to split the loop into two passes.
The first pass does *not* remove any candidates.  What it does is to
look separately at each UNKNOWN-argument position and attempt to deduce
a probable category for it, using the following rules:

* If any candidate has an input type of STRING category, use STRING
category; else if all candidates agree on the category, use that
category; else fail because no resolution can be made.

* The first pass must also remember whether any candidates are of a
preferred type within the selected category.

The probable categories and exists-preferred-type booleans are saved in
local arrays.  (Note this has to be done this way because
IsPreferredType currently allows more than one type to be considered
preferred in a category ... so the first pass cannot try to determine a
unique type, only a category.)

If we find a category for every UNKNOWN arg, then we enter a second loop
in which we discard candidates.  In this pass we discard a candidate if
(a) it is of the wrong category, or (b) it is of the right category but
is not of preferred type in that category, *and* we found candidate(s)
of preferred type at this slot.

If we end with exactly one candidate then we win.

It is clear in this algorithm that there is no order dependency: the
conditions for keeping or discarding a candidate are fixed before we
start the second pass, and do not vary depending on which other
candidates were discarded before it.

Comments?

			regards, tom lane

From pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org Sat Dec 29 15:47:47 2001
Return-path: <pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org>
Received: from rs.postgresql.org (server1.pgsql.org [64.39.15.238] (may be forged))
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id fBTKlkT05111
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 15:47:46 -0500 (EST)
Received: from postgresql.org (postgresql.org [64.49.215.8])
	by rs.postgresql.org (8.11.6/8.11.6) with ESMTP id fBTKhZN74322
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 14:43:35 -0600 (CST)
	(envelope-from pgsql-general-owner+M18949=candle.pha.pa.us=pgman@postgresql.org)
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id fBTKaem38452
	for <pgsql-general@postgresql.org>; Sat, 29 Dec 2001 15:36:40 -0500 (EST)
	(envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
	by candle.pha.pa.us (8.11.6/8.10.1) id fBTKaTg04256;
	Sat, 29 Dec 2001 15:36:29 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200112292036.fBTKaTg04256@candle.pha.pa.us>
Subject: Re: [GENERAL] Casting Varchar to Numeric
In-Reply-To: <20011206150158.O28880-100000@megazone23.bigpanda.com>
To: Stephan Szabo <sszabo@megazone23.bigpanda.com>
Date: Sat, 29 Dec 2001 15:36:29 -0500 (EST)
cc: Andy Marden <amarden@usa.net>, pgsql-general@postgresql.org
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

> On Mon, 3 Dec 2001, Andy Marden wrote:
> 
> > Martijn,
> >
> > It does work (believe it or not). I've now tried the method you mention
> > below - that also works and is much nicer. I can't believe that PostgreSQL
> > can't work this out. Surely implementing an algorithm that understands that
> > if you can go from a ->b and b->c then you can certainly go from a->c. If
> 
> It's more complicated than that (and postgres does some of this but not
> all), for example the cast text->float8->numeric potentially loses
> precision and should probably not be an automatic cast for that reason.
> 
> > this is viewed as too complex a task for the internals - at least a diagram
> > or some way of understanding how you should go from a->c would be immensely
> > helpful wouldn't it! Daunting for anyone picking up the database and trying
> > to do something simple(!)
> 
> There may be a need for documentation on this.  Would you like to write
> some ;)

OK, I ran some tests:
	
	test=> create table test (x text);
	CREATE
	test=> insert into test values ('323');
	INSERT 5122745 1
	test=> select cast (x as numeric) from test;
	ERROR:  Cannot cast type 'text' to 'numeric'

I can see problems with automatically casting numeric to text because
you have to guess the desired format, but going from text to numeric
seems quite easy to do.  Is there a reason we don't do it?

I can cast to integer and float8 fine:
	
	test=> select cast ( x as integer) from test;
	 ?column? 
	----------
	      323
	(1 row)

	test=> select cast ( x as float8) from test;
	 ?column? 
	----------
	      323
	(1 row)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org Sat Dec 29 19:10:38 2001
Return-path: <pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org>
Received: from west.navpoint.com (west.navpoint.com [207.106.42.13])
	by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id fBU0AbT23972
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 19:10:37 -0500 (EST)
Received: from rs.postgresql.org (server1.pgsql.org [64.39.15.238] (may be forged))
	by west.navpoint.com (8.11.6/8.10.1) with ESMTP id fBTNVj008959
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 18:31:45 -0500 (EST)
Received: from postgresql.org (postgresql.org [64.49.215.8])
	by rs.postgresql.org (8.11.6/8.11.6) with ESMTP id fBTNQrN78655
	for <pgman@candle.pha.pa.us>; Sat, 29 Dec 2001 17:26:53 -0600 (CST)
	(envelope-from pgsql-general-owner+M18951=candle.pha.pa.us=pgman@postgresql.org)
Received: from sss.pgh.pa.us ([192.204.191.242])
	by postgresql.org (8.11.3/8.11.4) with ESMTP id fBTN8Fm47978
	for <pgsql-general@postgresql.org>; Sat, 29 Dec 2001 18:08:15 -0500 (EST)
	(envelope-from tgl@sss.pgh.pa.us)
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
	by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id fBTN7vg20245;
	Sat, 29 Dec 2001 18:07:57 -0500 (EST)
To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: Stephan Szabo <sszabo@megazone23.bigpanda.com>,
   Andy Marden <amarden@usa.net>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] Casting Varchar to Numeric 
In-Reply-To: <200112292036.fBTKaTg04256@candle.pha.pa.us> 
References: <200112292036.fBTKaTg04256@candle.pha.pa.us>
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
	message dated "Sat, 29 Dec 2001 15:36:29 -0500"
Date: Sat, 29 Dec 2001 18:07:57 -0500
Message-ID: <20242.1009667277@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Precedence: bulk
Sender: pgsql-general-owner@postgresql.org
Status: OR

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I can see problems with automatically casting numeric to text because
> you have to guess the desired format, but going from text to numeric
> seems quite easy to do.  Is there a reason we don't do it?

I do not think it's a good idea to have implicit casts between text and
everything under the sun, because that essentially destroys the type
checking system.  What we need (see previous discussion) is a flag in
pg_proc that says whether a type conversion function may be invoked
implicitly or not.  I've got no problem with offering text(numeric) and
numeric(text) functions that are invoked by explicit function calls or
casts --- I just don't want the system trying to use them to make
sense of a bogus query.

> I can cast to integer and float8 fine:

I don't believe that those should be available as implicit casts either.
They are, at the moment:

regression=# select 33 || 44.0;
 ?column?
----------
 3344
(1 row)

Ugh.

			regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org