From pgsql-hackers-owner+M174@hub.org Sun Mar 12 22:31:11 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id XAA25886
	for <pgman@candle.pha.pa.us>; Sun, 12 Mar 2000 23:31:10 -0500 (EST)
Received: from news.tht.net (news.hub.org [216.126.91.242]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id XAA04589 for <pgman@candle.pha.pa.us>; Sun, 12 Mar 2000 23:19:33 -0500 (EST)
Received: from hub.org (hub.org [216.126.84.1])
	by news.tht.net (8.9.3/8.9.3) with SMTP id XAA42854;
	Sun, 12 Mar 2000 23:05:05 -0500 (EST)
	(envelope-from pgsql-hackers-owner+M174@hub.org)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
	by hub.org (8.9.3/8.9.3) with ESMTP id XAA95917
	for <pgsql-hackers@postgreSQL.org>; Sun, 12 Mar 2000 23:00:56 -0500 (EST)
	(envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
	by candle.pha.pa.us (8.9.0/8.9.0) id WAA25403
	for pgsql-hackers@postgreSQL.org; Sun, 12 Mar 2000 22:59:56 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-Id: <200003130359.WAA25403@candle.pha.pa.us>
Subject: [HACKERS] Fix for RENAME
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
Date: Sun, 12 Mar 2000 22:59:56 -0500 (EST)
X-Mailer: ELM [version 2.4ME+ PL72 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: OR

I have thought about the issue with ALTER TABLE RENAME and keeping the
file system in sync with the database.

It seems there are three commands that can cause these to get out of
sync:

	CREATE TABLE/INDEX
	DROP TABLE/INDEX
	ALTER TABLE RENAME

Now, if we had file names based only on the oid, we could eliminate file
renaming for RENAME, but the others are still a problem.

Seems there are three ways to get out of sync:

	ABORT transaction
	backend crash
	OS crash

The last two are the same, except the backend crash restarts the
postmaster, while the OS crash has the postmaster starting up normally.

Here is my idea.  Create a C List of file names to unlink on transaction
commit or abort.  For CREATE, unlink created files on transaction ABORT.
For DROP, unlink dropped files on COMMIT.  For RENAME, create a hard
link for the new table linked to old table, and unlink the old file name
on COMMIT or the new file on ABORT.
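
A minimal sketch of the commit/abort list described above, written in
Python for brevity rather than backend C. The class name, method names,
and the demo helper are hypothetical and are not PostgreSQL code; the
sketch only assumes a filesystem that supports hard links.

```python
import os
import tempfile


class PendingFileActions:
    """Per-transaction lists of files to unlink at COMMIT or at ABORT."""

    def __init__(self):
        self.unlink_on_commit = []
        self.unlink_on_abort = []

    def create(self, path):
        # CREATE: the new file must disappear if the transaction aborts.
        open(path, "wb").close()
        self.unlink_on_abort.append(path)

    def drop(self, path):
        # DROP: keep the file until COMMIT so an ABORT leaves it in place.
        self.unlink_on_commit.append(path)

    def rename(self, old_path, new_path):
        # RENAME: hard-link the new name to the old file; exactly one of
        # the two names is removed when the transaction ends.
        os.link(old_path, new_path)
        self.unlink_on_commit.append(old_path)
        self.unlink_on_abort.append(new_path)

    def _finish(self, doomed):
        for path in list(doomed):
            if os.path.exists(path):
                os.unlink(path)
        self.unlink_on_commit.clear()
        self.unlink_on_abort.clear()

    def commit(self):
        self._finish(self.unlink_on_commit)

    def abort(self):
        self._finish(self.unlink_on_abort)


def demo_rename(commit):
    """Rename t_old to t_new in a scratch directory, then commit or abort."""
    with tempfile.TemporaryDirectory() as d:
        old, new = os.path.join(d, "t_old"), os.path.join(d, "t_new")
        open(old, "wb").close()
        txn = PendingFileActions()
        txn.rename(old, new)
        if commit:
            txn.commit()
        else:
            txn.abort()
        return sorted(os.listdir(d))
```

Between the link and the end of the transaction both names point at the
same file, so a crash at that point leaves an extra name behind rather
than losing data; that leftover is what the recovery pass is for.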

That takes care of COMMIT and ABORT.  For backend crash or OS crash, add
a postgres command-line flag for recovery.  Have the postmaster on
startup or shared memory refresh start up a postgres backend on every
database with the recovery flag set.  Have the postgres backend find all
the oids in the pg_class table, and have it go through every file in the
database directory and remove all files that don't match the oids/names
in pg_class.  Also, remove all old sort, noname, and temp files at the
same time.  Seems we should be doing this anyway.
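
The recovery pass could look roughly like the following sketch, again in
Python rather than backend C. It assumes the caller has already read from
pg_class the complete set of file names that must survive (including any
non-relation files kept in the directory); the function names and the
sample file names in the demo are hypothetical.

```python
import os
import tempfile


def find_orphans(db_dir, known_files):
    """Return, sorted, the plain files in db_dir not listed in known_files."""
    return sorted(
        name
        for name in os.listdir(db_dir)
        if os.path.isfile(os.path.join(db_dir, name))
        and name not in known_files
    )


def recover(db_dir, known_files):
    """Unlink every orphaned file and report what was removed."""
    removed = find_orphans(db_dir, known_files)
    for name in removed:
        os.unlink(os.path.join(db_dir, name))
    return removed


def demo():
    """Build a scratch directory with one leftover file and clean it up."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("pg_class", "test", "leftover_sort_file"):
            open(os.path.join(d, name), "wb").close()
        removed = recover(d, {"pg_class", "test"})
        return removed, sorted(os.listdir(d))
```

A real implementation would also have to account for multi-segment
relation files and would need to run before any other backend can create
files in the directory; the sketch ignores both.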

Care would have to be taken that a corrupted database that caused a
postgres crash on connection would not get the postmaster startup into
an infinite loop.

Comments?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

From reedstrm@wallace.ece.rice.edu Tue Mar 14 12:33:31 2000
Received: from wallace.ece.rice.edu (root@wallace.ece.rice.edu [128.42.12.154])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id NAA23826
	for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 13:33:29 -0500 (EST)
Received: by wallace.ece.rice.edu
	via sendmail from stdin
	id <m12Uw8K-000LELC@wallace.ece.rice.edu> (Debian Smail3.2.0.102)
	for pgman@candle.pha.pa.us; Tue, 14 Mar 2000 12:33:32 -0600 (CST) 
Date: Tue, 14 Mar 2000 12:33:32 -0600
From: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>
To: Hiroshi Inoue <Inoue@tpf.co.jp>
Cc: Bruce Momjian <pgman@candle.pha.pa.us>,
        PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fix for RENAME
Message-ID: <20000314123331.A6094@rice.edu>
References: <200003140317.WAA27733@candle.pha.pa.us> <000c01bf8d75$a0016800$2801007e@tpf.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.0i
In-Reply-To: <000c01bf8d75$a0016800$2801007e@tpf.co.jp>; from Inoue@tpf.co.jp on Tue, Mar 14, 2000 at 02:24:52PM +0900
Status: OR

Hiroshi -
I've just about finished working up a patch to store the physical
file name in the pg_class table. There are only two places that 
require a Rule for generating the filename, and one of them is
only used for bootstrapping. For the initial cut, I used the rule:

The filename consists of the TABLENAME, an underscore, and the OID.
If this is longer than NAMEDATALEN, shorten the TABLENAME.
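
A rough illustration of that rule, in Python rather than C. This is plain
truncation and is not the behavior of makeObjectName; the value 32 for
NAMEDATALEN and the reservation of one byte for the terminating NUL are
assumptions, and the function name is hypothetical.

```python
NAMEDATALEN = 32  # assumed value of the compile-time constant


def physical_file_name(relname, oid, namedatalen=NAMEDATALEN):
    """Build '<relname>_<oid>', shortening relname so the result fits."""
    suffix = "_%d" % oid
    max_len = namedatalen - 1  # assumed: one byte reserved for the NUL
    keep = max_len - len(suffix)
    if keep < 1:
        raise ValueError("no room left for any part of the table name")
    # Only the table-name part is shortened; the OID is kept whole so
    # the result stays unique within the database.
    return relname[:keep] + suffix
```

For a short name such as "test" with OID 1231 the result is simply
"test_1231"; only names that would overflow lose trailing characters.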

I implemented this rule by exporting Tom's makeObjectName function
from analyze.c, which is used to make other system-generated names
that are required to be human readable. Replacing this
rule with any other in the future would be straightforward, except
for bootstrap. There are a number of places in bootstrap that need to
know the filename. I've factored them out into yet another set of 
#defines (in catname.h) to make that easier.


I'm working through the regression tests right now: this is a relatively
extensive change, since it modifies the low-level access routines and the
buffer cache (which I indexed on physical filename, rather than relname,
as it is now). Hopefully, I caught all the places that assume relname ==
filename == unique name within a single database (see, I want schemas...)

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005





On Tue, Mar 14, 2000 at 02:24:52PM +0900, Hiroshi Inoue wrote:
> > -----Original Message-----
> > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> > 
> > > > They use the existing table file.  It is only when
> > > > adding/removing/renaming file system files that this
> > > > out-of-sync problem happens.
> > > >
> > 
> > Not sure.  I was going to get the CREATE/DROP/RENAME working as it
> > should then as we add more features, we can implement this solution for
> > them too.
> >
> 
> Hmm, is a general solution difficult?
> Is a more flexible naming rule bad?
> 
> This is the 3rd or 4th time that I have mentioned the following.
> 
> PostgreSQL doesn't itself keep the information about where tables are
> allocated. So we need a naming rule to find where existing tables
> are allocated.  Don't you wonder about the spec?
> 
> Regards.
> 
> Hiroshi Inoue
> Inoue@tpf.co.jp
>   
> 

From mascarm@mascari.com Tue Mar 14 16:34:04 2000
Received: from corvette.mascari.com (dhcp26136016.columbus.rr.com [24.26.136.16])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id RAA04395
	for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 17:32:14 -0500 (EST)
Received: from mascari.com (ferrari.mascari.com [192.168.2.1])
	by corvette.mascari.com (8.9.3/8.9.3) with ESMTP id RAA09562;
	Tue, 14 Mar 2000 17:27:22 -0500
Message-ID: <38CEBD0A.52ADB37E@mascari.com>
Date: Tue, 14 Mar 2000 17:28:26 -0500
From: Mike Mascari <mascarm@mascari.com>
X-Mailer: Mozilla 4.7 [en] (Win95; I)
X-Accept-Language: en
MIME-Version: 1.0
To: Bruce Momjian <pgman@candle.pha.pa.us>
CC: Hiroshi Inoue <Inoue@tpf.co.jp>,
        PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Fix for RENAME
References: <200003141545.KAA17518@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > Hmm, is a general solution difficult?
> > Is a more flexible naming rule bad?
> >
> > This is the 3rd or 4th time that I have mentioned the following.
> 
> That's because I didn't understand.
> 
> >
> > PostgreSQL doesn't itself keep the information about where tables are
> > allocated. So we need a naming rule to find where existing tables
> > are allocated.  Don't you wonder about the spec?
> 
> How does naming the files in the database help our DROP/CREATE problem?
> It would help RENAME a little bit.  Not sure about the others because
> currently they don't have a problem.

I've been thinking about this somewhat, and I think the first
step necessary in correctly supporting ROLLBACK-able DDL
statements in transactions is the change to <relname>_<oid>.
Imagine the scenario:

CREATE TABLE test (key int4);

a) Session #1:

BEGIN;

b) Session #2:

BEGIN;
DROP TABLE test;
CREATE TABLE test (value varchar(32));

c) Session #1:

DROP TABLE test;
COMMIT;

d) Session #2:

COMMIT;

What's clear to me is that, if DDL statements are to be
ROLLBACK-able, either (1) an AccessExclusive lock is held on the
relation until transaction commit (like Phillip Warner stated was
Dec/Rdb's behavior) or (2) PostgreSQL must be capable of
supporting "multi-versioned schema" as well as tuples. Before
step 'c' is executed, both tables must simultaneously exist in
the database with the same name, which works fine in the catalog
thanks to MVCC, but requires that, on disk, there exist:

test_01231  - Session #1's table, available for ROLLBACK
test_13421  - Session #2's table, available for COMMIT

Now, I believe it was Andreas who suggested that VACUUM be
modified to perform cleanup. I agree with this. VACUUM will need
to check for aborted relation tuples in pg_class and remove the
associated file from the filesystem in the event, for example,
that Session #2 aborted -or- Session #1 aborted leaving the
original pg_class tuple the "active" one and Session #2 attempted
to COMMIT, which violates the UNIQUE constraint on the relname of
pg_class. In addition, for "active" relation entries, VACUUM
should verify the filename is <relname>_<oid> for the given oid.
If it is not, it should rename the file on the filesystem. Again,
this is purely cosmetic, for administrative purposes only, but
would allow for lack of atomicity only with respect to the label
of the relation file, until the next VACUUM is run.

For the case of ALTER TABLE RENAME, ALTER TABLE DROP COLUMN,
etc., the same functionality would apply. But, as in previous
discussions regarding ALTER TABLE DROP COLUMN, PostgreSQL MUST be
capable of allowing multiple tuples with different attribute
counts and types within the same relation:

CREATE TABLE test (key int4);

a) Session #1:

BEGIN;

b) Session #2:

BEGIN;
ALTER TABLE test ADD COLUMN value int4;
INSERT INTO test values (1, 1);

c) Session #1:

INSERT INTO test values (0);
COMMIT;

d) Session #2:

COMMIT;

This also means that Hiroshi's plan to suppress the visibility of
attributes for ALTER TABLE DROP COLUMN would be required anyway,
to allow for "multi-versioning" of attributes within a single
tuple (i.e., like multi-versioning of tuples within relations),
an attribute is either visible or not, but the tuple should
always grow, until, of course, the next VACUUM.

So, to support rollback-able DDL statements ("multi-versioning
schema", if you will), PostgreSQL needs:

1) relation file names of the form <relname>_<oid>
2) support for "multi-versioning" of attributes within a single tuple
3) VACUUM modified to:

  A) Remove filesystem files whose pg_class tuples are no longer
     valid
  B) Rename filesystem files to the relname in pg_class when the
     <relname>_<oid> doesn't match
  C) Reconstruct relations after attributes have been
     added/dropped.

4) All DDL statements should perform their non-create filesystem
functions in the now infamous "post-transaction-commit" trigger.
If the backend should crash between the time the transaction
committed and the rename() or unlink(), no adverse effects would
be encountered with the database WRT data, VACUUM would clean up
the rename() problem, and, worst-case scenario, an old
<relname>_<oid> file would lie around unused. But at least it
would no longer prohibit the creation of a table by the same
name....
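
Items 3A and 3B above can be sketched as a planning step that only decides
what to remove and what to rename, without touching the disk. The sketch
is in Python rather than backend C and assumes the set of live pg_class
entries is already available as a mapping from oid to relname; the
function name is hypothetical, and truncation of long names is ignored.

```python
def plan_vacuum_cleanup(live_relations, file_names):
    """Return (files_to_remove, renames) for '<relname>_<oid>' files.

    live_relations maps oid -> current relname for valid pg_class tuples.
    """
    to_remove, to_rename = [], []
    for name in sorted(file_names):
        base, sep, oid_text = name.rpartition("_")
        if not sep or not oid_text.isdigit():
            continue  # not named by this rule, so leave it alone
        oid = int(oid_text)
        if oid not in live_relations:
            # 3A: no valid pg_class tuple refers to this file any more.
            to_remove.append(name)
        elif base != live_relations[oid]:
            # 3B: the label lags behind a committed RENAME.
            to_rename.append((name, "%s_%d" % (live_relations[oid], oid)))
    return to_remove, to_rename
```

Keeping the decision separate from the unlink() and rename() calls makes
the crash case easy to reason about: rerunning the plan after an
interrupted cleanup simply yields whatever work is still left.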

Just my humble opinion, 

Mike Mascari

From Inoue@tpf.co.jp Tue Mar 14 20:31:35 2000
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA08792
	for <pgman@candle.pha.pa.us>; Tue, 14 Mar 2000 21:30:35 -0500 (EST)
Received: from cadzone ([126.0.1.40] (may be forged))
          by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
   id LAA00515; Wed, 15 Mar 2000 11:29:09 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>,
        "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: RE: [HACKERS] Fix for RENAME
Date: Wed, 15 Mar 2000 11:35:46 +0900
Message-ID: <000c01bf8e27$2b3c3ce0$2801007e@tpf.co.jp>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
In-Reply-To: <20000314123331.A6094@rice.edu>
Importance: Normal
Status: ORr

> -----Original Message-----
> From: Ross J. Reedstrom [mailto:reedstrm@wallace.ece.rice.edu]
> 
> Hiroshi -
> I've just about finished working up a patch to store the physical
> file name in the pg_class table. There are only two places that 
> require a Rule for generating the filename, and one of them is
> only used for bootstrapping.

Thanks for trying it.
It's nice that only two places require a naming rule.

I'm not attached to any one naming rule.
The only requirement is uniqueness, and the rule
could be changed according to the situation.
For example, we could change the naming rule according to
the kind of relation, such as system/user relations.

I'm now inclined to introduce a new system relation to store
the physical path name. It could also have table(data)space
information in the (near ?) future. 
It seems better to separate it from pg_class because table(data?)
space may change the concept of table allocation.

Comments ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


From Inoue@tpf.co.jp Wed Mar 15 02:00:58 2000
Received: from renoir.op.net (root@renoir.op.net [207.29.195.4])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id DAA17887
	for <pgman@candle.pha.pa.us>; Wed, 15 Mar 2000 03:00:57 -0500 (EST)
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34]) by renoir.op.net (o1/$Revision: 1.1 $) with ESMTP id CAA02974 for <pgman@candle.pha.pa.us>; Wed, 15 Mar 2000 02:54:44 -0500 (EST)
Received: from cadzone ([126.0.1.40] (may be forged))
          by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
   id QAA00734; Wed, 15 Mar 2000 16:53:56 +0900
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>,
        "PostgreSQL-development" <pgsql-hackers@postgresql.org>
Subject: RE: [HACKERS] Fix for RENAME
Date: Wed, 15 Mar 2000 17:00:35 +0900
Message-ID: <001101bf8e54$8b941cc0$2801007e@tpf.co.jp>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
In-Reply-To: <200003150433.XAA13256@candle.pha.pa.us>
Importance: Normal
Status: ORr

> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> 
> > I'm now inclined to introduce a new system relation to store
> > the physical path name. It could also have table(data)space
> > information in the (near ?) future. 
> > It seems better to separate it from pg_class because table(data?)
> > space may change the concept of table allocation.
> 
> Why not just put it in pg_class?
>

Not sure, it's only my feeling.
Comments please, everyone.

We have taken a practical approach in this thread that doesn't break the
file-per-table assumption, and it wouldn't be so difficult to implement.
In fact Ross has already tried it.

However, there was a discussion about data(table)space months
ago, and a new discussion is under way now.
Judging from the previous discussion, I can't expect it
to reach a practical consensus (there were so many opinions).
We can take a practical step toward the future by encapsulating
the information about table allocation. Separating table allocation
info from pg_class seems to be one way of doing that.
There may be more essential things to encapsulate.

Comments ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp