Replace our traditional initial-catalog-data format with a better design.

Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong.

Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone.

Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns.

Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches.

Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.)

The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns.

Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.)

Add SGML documentation explaining what the new data format is and how to work with it.

Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all.

John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera

Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com
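
For illustration only, here is a minimal Perl sketch of the idea described in the message: an abbreviated "key => value" entry is evaluated as a Perl hash literal, and any omitted column is then filled in from a per-column default. It is not the code in Catalog.pm. The schema, the column defaults, the OID 9999, and the helper name fill_defaults are all hypothetical and chosen only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical column schema: the names and defaults are illustrative
# assumptions, not copied from any real catalog header.
my @schema = (
	{ name => 'typname' },
	{ name => 'typlen' },
	{ name => 'typbyval', default => 'f' },
	{ name => 'typdelim', default => ',' },
);

# One abbreviated entry in "key => value" form, written as a Perl hash
# literal.  typbyval and typdelim are omitted because they match defaults.
my $entry = q({ oid => '9999', typname => 'example', typlen => '4' });

# Evaluate the literal into a live hash reference.
my $row;
eval '$row = ' . $entry;
die "could not parse entry: $@" if !ref $row;

# Fill in every column the entry omitted; fail if a column has neither
# an explicit value nor a default.
sub fill_defaults
{
	my ($row, $schema) = @_;
	my @missing;

	foreach my $column (@$schema)
	{
		my $attname = $column->{name};
		next if defined $row->{$attname};

		if (defined $column->{default})
		{
			$row->{$attname} = $column->{default};
		}
		else
		{
			push @missing, $attname;
		}
	}

	die "Missing values for: " . join(', ', @missing) . "\n" if @missing;
	return $row;
}

fill_defaults($row, \@schema);

# Print the expanded row in schema column order.
print join(' ', map { $_->{name} . '=' . $row->{ $_->{name} } } @schema),
  "\n";
```

Because the entry names each column explicitly, adding a new column with a default to the schema would not require touching existing entries, which is the "lowers the barrier to adding new columns" point made above.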
author: Tom Lane <tgl@sss.pgh.pa.us> 2018-04-08 13:16:50 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2018-04-08 13:17:27 -0400
commit: 372728b0d49552641f0ea83d9d2e08817de038fa (patch)
tree: 5beca037d3fdfeaa09467c8b559c83eab5030878 /src/backend
parent: 02f3e558f21c0fbec9f94d5de9ad34f321eb0e57 (diff)
8 files changed, 670 insertions, 432 deletions
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 42a0748ade5..82a59eac2d7 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -53,7 +53,7 @@ endif
 
 ##########################################################################
 
-all: submake-libpgport submake-schemapg postgres $(POSTGRES_IMP)
+all: submake-libpgport submake-catalog-headers postgres $(POSTGRES_IMP)
 
 ifneq ($(PORTNAME), cygwin)
 ifneq ($(PORTNAME), win32)
@@ -151,19 +151,17 @@ utils/errcodes.h: utils/generate-errcodes.pl utils/errcodes.txt
 utils/fmgrprotos.h: utils/fmgroids.h
 	touch $@
 
-utils/fmgroids.h: utils/Gen_fmgrtab.pl catalog/Catalog.pm $(top_srcdir)/src/include/catalog/pg_proc.h
+utils/fmgroids.h: utils/Gen_fmgrtab.pl catalog/Catalog.pm $(top_srcdir)/src/include/catalog/pg_proc.dat $(top_srcdir)/src/include/access/transam.h
 	$(MAKE) -C utils fmgroids.h fmgrprotos.h
 
 utils/probes.h: utils/probes.d
 	$(MAKE) -C utils probes.h
 
 # run this unconditionally to avoid needing to know its dependencies here:
-catalog/schemapg.h: | submake-schemapg
+submake-catalog-headers:
+	$(MAKE) -C catalog distprep generated-header-symlinks
 
-submake-schemapg:
-	$(MAKE) -C catalog schemapg.h
-
-.PHONY: submake-schemapg
+.PHONY: submake-catalog-headers
 
 # Make symlinks for these headers in the include directory. That way
 # we can cut down on the -I options. Also, a symlink is automatically
@@ -178,18 +176,13 @@ submake-schemapg:
 
 .PHONY: generated-headers
 
-generated-headers: $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/catalog/schemapg.h $(top_builddir)/src/include/storage/lwlocknames.h $(top_builddir)/src/include/utils/errcodes.h $(top_builddir)/src/include/utils/fmgroids.h $(top_builddir)/src/include/utils/fmgrprotos.h $(top_builddir)/src/include/utils/probes.h
+generated-headers: $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/storage/lwlocknames.h $(top_builddir)/src/include/utils/errcodes.h $(top_builddir)/src/include/utils/fmgroids.h $(top_builddir)/src/include/utils/fmgrprotos.h $(top_builddir)/src/include/utils/probes.h submake-catalog-headers
 
 $(top_builddir)/src/include/parser/gram.h: parser/gram.h
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
 	  cd '$(dir $@)' && rm -f $(notdir $@) && \
 	  $(LN_S) "$$prereqdir/$(notdir $<)" .
 
-$(top_builddir)/src/include/catalog/schemapg.h: catalog/schemapg.h
-	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
-	  cd '$(dir $@)' && rm -f $(notdir $@) && \
-	  $(LN_S) "$$prereqdir/$(notdir $<)" .
-
 $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
 	  cd '$(dir $@)' && rm -f $(notdir $@) && \
@@ -225,7 +218,7 @@ utils/probes.o: utils/probes.d $(SUBDIROBJS)
 distprep:
 	$(MAKE) -C parser	gram.c gram.h scan.c
 	$(MAKE) -C bootstrap	bootparse.c bootscanner.c
-	$(MAKE) -C catalog	schemapg.h postgres.bki postgres.description postgres.shdescription
+	$(MAKE) -C catalog	distprep
 	$(MAKE) -C replication	repl_gram.c repl_scanner.c syncrep_gram.c syncrep_scanner.c
 	$(MAKE) -C storage/lmgr	lwlocknames.h lwlocknames.c
 	$(MAKE) -C utils	fmgrtab.c fmgroids.h fmgrprotos.h errcodes.h
@@ -327,13 +320,7 @@ endif
 ##########################################################################
 
 clean:
-	rm -f $(LOCALOBJS) postgres$(X) $(POSTGRES_IMP) \
-		$(top_builddir)/src/include/parser/gram.h \
-		$(top_builddir)/src/include/catalog/schemapg.h \
-		$(top_builddir)/src/include/storage/lwlocknames.h \
-		$(top_builddir)/src/include/utils/fmgroids.h \
-		$(top_builddir)/src/include/utils/fmgrprotos.h \
-		$(top_builddir)/src/include/utils/probes.h
+	rm -f $(LOCALOBJS) postgres$(X) $(POSTGRES_IMP)
 ifeq ($(PORTNAME), cygwin)
 	rm -f postgres.dll libpostgres.a
 endif
@@ -345,15 +332,12 @@ distclean: clean
 	rm -f port/tas.s port/dynloader.c port/pg_sema.c port/pg_shmem.c
 
 maintainer-clean: distclean
+	$(MAKE) -C catalog $@
 	rm -f bootstrap/bootparse.c \
 	      bootstrap/bootscanner.c \
 	      parser/gram.c \
 	      parser/gram.h \
 	      parser/scan.c \
-	      catalog/schemapg.h \
-	      catalog/postgres.bki \
-	      catalog/postgres.description \
-	      catalog/postgres.shdescription \
 	      replication/repl_gram.c \
 	      replication/repl_scanner.c \
 	      replication/syncrep_gram.c \
diff --git a/src/backend/catalog/.gitignore b/src/backend/catalog/.gitignore
index 557af3c0e5e..9abe91d6e64 100644
--- a/src/backend/catalog/.gitignore
+++ b/src/backend/catalog/.gitignore
@@ -2,3 +2,5 @@
 /postgres.description
 /postgres.shdescription
 /schemapg.h
+/pg_*_d.h
+/bki-stamp
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 9ced1547f6b..3b3bb6bc6ca 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -1,7 +1,7 @@
 #----------------------------------------------------------------------
 #
 # Catalog.pm
-#    Perl module that extracts info from catalog headers into Perl
+#    Perl module that extracts info from catalog files into Perl
 #    data structures
 #
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
@@ -16,12 +16,11 @@ package Catalog;
 use strict;
 use warnings;
 
-# Call this function with an array of names of header files to parse.
-# Returns a nested data structure describing the data in the headers.
-sub Catalogs
+# Parses a catalog header file into a data structure describing the schema
+# of the catalog.
+sub ParseHeader
 {
-	my (%catalogs, $catname, $declaring_attributes, $most_recent);
-	$catalogs{names} = [];
+	my $input_file = shift;
 
 	# There are a few types which are given one name in the C source, but a
 	# different name at the SQL level.  These are enumerated here.
@@ -34,108 +33,68 @@ sub Catalogs
 		'TransactionId' => 'xid',
 		'XLogRecPtr'    => 'pg_lsn');
 
-	foreach my $input_file (@_)
-	{
 		my %catalog;
+		my $declaring_attributes = 0;
 		my $is_varlen     = 0;
+		my $is_client_code = 0;
 
 		$catalog{columns} = [];
-		$catalog{data}    = [];
+		$catalog{toasting} = [];
+		$catalog{indexing} = [];
+		$catalog{client_code} = [];
 
 		open(my $ifh, '<', $input_file) || die "$input_file: $!";
 
-		my ($filename) = ($input_file =~ m/(\w+)\.h$/);
-		my $natts_pat = "Natts_$filename";
-
 		# Scan the input file.
 		while (<$ifh>)
 		{
 
-			# Strip C-style comments.
-			s;/\*(.|\n)*\*/;;g;
-			if (m;/\*;)
-			{
-
-				# handle multi-line comments properly.
-				my $next_line = <$ifh>;
-				die "$input_file: ends within C-style comment\n"
-				  if !defined $next_line;
-				$_ .= $next_line;
-				redo;
-			}
-
-			# Remember input line number for later.
-			my $input_line_number = $.;
-
-			# Strip useless whitespace and trailing semicolons.
-			chomp;
-			s/^\s+//;
-			s/;\s*$//;
-			s/\s+/ /g;
-
-			# Push the data into the appropriate data structure.
-			if (/$natts_pat\s+(\d+)/)
-			{
-				$catalog{natts} = $1;
-			}
-			elsif (
-				/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
-			{
-				check_natts($filename, $catalog{natts}, $3, $input_file,
-					$input_line_number);
-
-				push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
-			}
-			elsif (/^DESCR\(\"(.*)\"\)$/)
+			# Set appropriate flag when we're in certain code sections.
+			if (/^#/)
 			{
-				$most_recent = $catalog{data}->[-1];
-
-				# this tests if most recent line is not a DATA() statement
-				if (ref $most_recent ne 'HASH')
-				{
-					die "DESCR() does not apply to any catalog ($input_file)";
-				}
-				if (!defined $most_recent->{oid})
-				{
-					die "DESCR() does not apply to any oid ($input_file)";
-				}
-				elsif ($1 ne '')
+				$is_varlen = 1 if /^#ifdef\s+CATALOG_VARLEN/;
+				if (/^#ifdef\s+EXPOSE_TO_CLIENT_CODE/)
 				{
-					$most_recent->{descr} = $1;
+					$is_client_code = 1;
+					next;
 				}
+				next if !$is_client_code;
 			}
-			elsif (/^SHDESCR\(\"(.*)\"\)$/)
-			{
-				$most_recent = $catalog{data}->[-1];
 
-				# this tests if most recent line is not a DATA() statement
-				if (ref $most_recent ne 'HASH')
-				{
-					die
-					  "SHDESCR() does not apply to any catalog ($input_file)";
-				}
-				if (!defined $most_recent->{oid})
-				{
-					die "SHDESCR() does not apply to any oid ($input_file)";
-				}
-				elsif ($1 ne '')
+			if (!$is_client_code)
+			{
+				# Strip C-style comments.
+				s;/\*(.|\n)*\*/;;g;
+				if (m;/\*;)
 				{
-					$most_recent->{shdescr} = $1;
+
+					# handle multi-line comments properly.
+					my $next_line = <$ifh>;
+					die "$input_file: ends within C-style comment\n"
+					  if !defined $next_line;
+					$_ .= $next_line;
+					redo;
 				}
+
+				# Strip useless whitespace and trailing semicolons.
+				chomp;
+				s/^\s+//;
+				s/;\s*$//;
+				s/\s+/ /g;
 			}
-			elsif (/^DECLARE_TOAST\(\s*(\w+),\s*(\d+),\s*(\d+)\)/)
+
+			# Push the data into the appropriate data structure.
+			if (/^DECLARE_TOAST\(\s*(\w+),\s*(\d+),\s*(\d+)\)/)
 			{
-				$catname = 'toasting';
 				my ($toast_name, $toast_oid, $index_oid) = ($1, $2, $3);
-				push @{ $catalog{data} },
+				push @{ $catalog{toasting} },
 				  "declare toast $toast_oid $index_oid on $toast_name\n";
 			}
 			elsif (/^DECLARE_(UNIQUE_)?INDEX\(\s*(\w+),\s*(\d+),\s*(.+)\)/)
 			{
-				$catname = 'indexing';
 				my ($is_unique, $index_name, $index_oid, $using) =
 				  ($1, $2, $3, $4);
-				push @{ $catalog{data} },
+				push @{ $catalog{indexing} },
 				  sprintf(
 					"declare %sindex %s %s %s\n",
 					$is_unique ? 'unique ' : '',
@@ -143,37 +102,51 @@ sub Catalogs
 			}
 			elsif (/^BUILD_INDICES/)
 			{
-				push @{ $catalog{data} }, "build indices\n";
+				push @{ $catalog{indexing} }, "build indices\n";
 			}
-			elsif (/^CATALOG\(([^,]*),(\d+)\)/)
+			elsif (/^CATALOG\((\w+),(\d+),(\w+)\)/)
 			{
-				$catname = $1;
+				$catalog{catname} = $1;
 				$catalog{relation_oid} = $2;
-
-				# Store pg_* catalog names in the same order we receive them
-				push @{ $catalogs{names} }, $catname;
+				$catalog{relation_oid_macro} = $3;
 
 				$catalog{bootstrap} = /BKI_BOOTSTRAP/ ? ' bootstrap' : '';
 				$catalog{shared_relation} =
 				  /BKI_SHARED_RELATION/ ? ' shared_relation' : '';
 				$catalog{without_oids} =
 				  /BKI_WITHOUT_OIDS/ ? ' without_oids' : '';
-				$catalog{rowtype_oid} =
-				  /BKI_ROWTYPE_OID\((\d+)\)/ ? " rowtype_oid $1" : '';
+				if (/BKI_ROWTYPE_OID\((\d+),(\w+)\)/)
+				{
+					$catalog{rowtype_oid} = $1;
+					$catalog{rowtype_oid_clause} = " rowtype_oid $1";
+					$catalog{rowtype_oid_macro} = $2;
+				}
+				else
+				{
+					$catalog{rowtype_oid} = '';
+					$catalog{rowtype_oid_clause} = '';
+					$catalog{rowtype_oid_macro} = '';
+				}
 				$catalog{schema_macro} = /BKI_SCHEMA_MACRO/ ? 1 : 0;
 				$declaring_attributes = 1;
 			}
-			elsif ($declaring_attributes)
+			elsif ($is_client_code)
 			{
-				next if (/^{|^$/);
-				if (/^#/)
+				if (/^#endif/)
 				{
-					$is_varlen = 1 if /^#ifdef\s+CATALOG_VARLEN/;
-					next;
+					$is_client_code = 0;
+				}
+				else
+				{
+					push @{ $catalog{client_code} }, $_;
 				}
+			}
+			elsif ($declaring_attributes)
+			{
+				next if (/^{|^$/);
 				if (/^}/)
 				{
-					undef $declaring_attributes;
+					$declaring_attributes = 0;
 				}
 				else
 				{
@@ -208,10 +181,17 @@ sub Catalogs
 						{
 							$column{forcenotnull} = 1;
 						}
-						elsif ($attopt =~ /BKI_DEFAULT\((\S+)\)/)
+						# We use quotes for values like \0 and \054, to
+						# make sure all compilers and syntax highlighters
+						# can recognize them properly.
+						elsif ($attopt =~ /BKI_DEFAULT\(['"]?([^'"]+)['"]?\)/)
 						{
 							$column{default} = $1;
 						}
+						elsif ($attopt =~ /BKI_LOOKUP\((\w+)\)/)
+						{
+							$column{lookup} = $1;
+						}
 						else
 						{
 							die
@@ -227,41 +207,89 @@ sub Catalogs
 				}
 			}
 		}
-		$catalogs{$catname} = \%catalog;
 		close $ifh;
-	}
-	return \%catalogs;
+	return \%catalog;
 }
 
-# Split a DATA line into fields.
-# Call this on the bki_values element of a DATA item returned by Catalogs();
-# it returns a list of field values.  We don't strip quoting from the fields.
-# Note: it should be safe to assign the result to a list of length equal to
-# the nominal number of catalog fields, because check_natts already checked
-# the number of fields.
-sub SplitDataLine
+# Parses a file containing Perl data structure literals, returning live data.
+#
+# The parameter $preserve_formatting needs to be set for callers that want
+# to work with non-data lines in the data files, such as comments and blank
+# lines. If a caller just wants to consume the data, leave it unset.
+sub ParseData
 {
-	my $bki_values = shift;
-
-	# This handling of quoted strings might look too simplistic, but it
-	# matches what bootscanner.l does: that has no provision for quote marks
-	# inside quoted strings, either.  If we don't have a quoted string, just
-	# snarf everything till next whitespace.  That will accept some things
-	# that bootscanner.l will see as erroneous tokens; but it seems wiser
-	# to do that and let bootscanner.l complain than to silently drop
-	# non-whitespace characters.
-	my @result = $bki_values =~ /"[^"]*"|\S+/g;
-
-	return @result;
+	my ($input_file, $schema, $preserve_formatting) = @_;
+
+	open(my $ifd, '<', $input_file) || die "$input_file: $!";
+	$input_file =~ /(\w+)\.dat$/
+	  or die "Input file needs to be a .dat file.\n";
+	my $catname = $1;
+	my $data = [];
+
+	# Scan the input file.
+	while (<$ifd>)
+	{
+		my $hash_ref;
+
+		if (/{/)
+		{
+			# Capture the hash ref
+			# NB: Assumes that the next hash ref can't start on the
+			# same line where the present one ended.
+			# Not foolproof, but we shouldn't need a full parser,
+			# since we expect relatively well-behaved input.
+
+			# Quick hack to detect when we have a full hash ref to
+			# parse. We can't just use a regex because of values in
+			# pg_aggregate and pg_proc like '{0,0}'.
+			my $lcnt = tr/{//;
+			my $rcnt = tr/}//;
+
+			if ($lcnt == $rcnt)
+			{
+				eval '$hash_ref = ' . $_;
+				if (!ref $hash_ref)
+				{
+					die "Error parsing $_\n$!";
+				}
+
+				# Expand tuples to their full representation.
+				AddDefaultValues($hash_ref, $schema, $catname);
+			}
+			else
+			{
+				my $next_line = <$ifd>;
+				die "$input_file: ends within Perl hash\n"
+				  if !defined $next_line;
+				$_ .= $next_line;
+				redo;
+			}
+		}
+
+		# If we found a hash reference, keep it
+		# and annotate the line number.
+		# Only keep non-data strings if we
+		# are told to preserve formatting.
+		if (defined $hash_ref)
+		{
+			$hash_ref->{line_number} = $.;
+			push @$data, $hash_ref;
+		}
+		elsif ($preserve_formatting)
+		{
+			push @$data, $_;
+		}
+	}
+	close $ifd;
+	return $data;
 }
 
-# Fill in default values of a record using the given schema. It's the
-# caller's responsibility to specify other values beforehand.
+# Fill in default values of a record using the given schema.
+# It's the caller's responsibility to specify other values beforehand.
 sub AddDefaultValues
 {
-	my ($row, $schema) = @_;
+	my ($row, $schema, $catname) = @_;
 	my @missing_fields;
-	my $msg;
 
 	foreach my $column (@$schema)
 	{
@@ -276,6 +304,13 @@ sub AddDefaultValues
 		{
 			$row->{$attname} = $column->{default};
 		}
+		elsif ($catname eq 'pg_proc' && $attname eq 'pronargs' &&
+			   defined($row->{proargtypes}))
+		{
+			# pg_proc.pronargs can be derived from proargtypes.
+			my @proargtypes = split /\s+/, $row->{proargtypes};
+			$row->{$attname} = scalar(@proargtypes);
+		}
 		else
 		{
 			# Failed to find a value.
@@ -285,14 +320,15 @@ sub AddDefaultValues
 
 	if (@missing_fields)
 	{
-		$msg = "Missing values for: " . join(', ', @missing_fields);
-		$msg .= "\nShowing other values for context:\n";
+		my $msg = "Failed to form full tuple for $catname\n";
+		$msg .= "Missing values for: " . join(', ', @missing_fields);
+		$msg .= "\nOther values for row:\n";
 		while (my($key, $value) = each %$row)
 		{
 			$msg .= "$key => $value, ";
 		}
+		die $msg;
 	}
-	return $msg;
 }
 
 # Rename temporary files to final names.
@@ -308,7 +344,6 @@ sub RenameTempFile
 	rename($temp_name, $final_name) || die "rename: $temp_name: $!";
 }
 
-
 # Find a symbol defined in a particular header file and extract the value.
 #
 # The include path has to be passed as a reference to an array.
@@ -340,22 +375,18 @@ sub FindDefinedSymbol
 	die "$catalog_header: not found in any include directory\n";
 }
 
-
-# verify the number of fields in the passed-in DATA line
-sub check_natts
+# Similar to FindDefinedSymbol, but looks in the bootstrap metadata.
+sub FindDefinedSymbolFromData
 {
-	my ($catname, $natts, $bki_val, $file, $line) = @_;
-
-	die
-"Could not find definition for Natts_${catname} before start of DATA() in $file\n"
-	  unless defined $natts;
-
-	my $nfields = scalar(SplitDataLine($bki_val));
-
-	die sprintf
-"Wrong number of attributes in DATA() entry at %s:%d (expected %d but got %d)\n",
-	  $file, $line, $natts, $nfields
-	  unless $natts == $nfields;
+	my ($data, $symbol) = @_;
+	foreach my $row (@{ $data })
+	{
+		if ($row->{oid_symbol} eq $symbol)
+		{
+			return $row->{oid};
+		}
+	}
+	die "no definition found for $symbol\n";
 }
 
 1;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 30ca5095347..d25d98a40b8 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -2,6 +2,9 @@
 #
 # Makefile for backend/catalog
 #
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
+#
 # src/backend/catalog/Makefile
 #
 #-------------------------------------------------------------------------
@@ -22,13 +25,11 @@ BKIFILES = postgres.bki postgres.description postgres.shdescription
 
 include $(top_srcdir)/src/backend/common.mk
 
-all: $(BKIFILES) schemapg.h
-
-# Note: there are some undocumented dependencies on the ordering in which
-# the catalog header files are assembled into postgres.bki.  In particular,
-# indexing.h had better be last, and toasting.h just before it.
-
-POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
+# Note: the order of this list determines the order in which the catalog
+# header files are assembled into postgres.bki.  BKI_BOOTSTRAP catalogs
+# must appear first, and there are reputedly other, undocumented ordering
+# dependencies.
+CATALOG_HEADERS := \
 	pg_proc.h pg_type.h pg_attribute.h pg_class.h \
 	pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
 	pg_opfamily.h pg_opclass.h pg_am.h pg_amop.h pg_amproc.h \
@@ -45,34 +46,63 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\
 	pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
 	pg_collation.h pg_partitioned_table.h pg_range.h pg_transform.h \
 	pg_sequence.h pg_publication.h pg_publication_rel.h pg_subscription.h \
-	pg_subscription_rel.h \
-	toasting.h indexing.h \
-    )
+	pg_subscription_rel.h
+
+GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h) schemapg.h
+
+# In the list of headers used to assemble postgres.bki, indexing.h needs
+# be last, and toasting.h just before it.  This ensures we don't try to
+# create indexes or toast tables before their catalogs exist.
+POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
+	$(CATALOG_HEADERS) toasting.h indexing.h \
+	)
+
+# The .dat files we need can just be listed alphabetically.
+POSTGRES_BKI_DATA = $(addprefix $(top_srcdir)/src/include/catalog/,\
+	pg_aggregate.dat pg_am.dat pg_amop.dat pg_amproc.dat pg_authid.dat \
+	pg_cast.dat pg_class.dat pg_collation.dat \
+	pg_database.dat pg_language.dat \
+	pg_namespace.dat pg_opclass.dat pg_operator.dat pg_opfamily.dat \
+	pg_pltemplate.dat pg_proc.dat pg_range.dat pg_tablespace.dat \
+	pg_ts_config.dat pg_ts_config_map.dat pg_ts_dict.dat pg_ts_parser.dat \
+	pg_ts_template.dat pg_type.dat \
+	)
 
 # location of Catalog.pm
 catalogdir = $(top_srcdir)/src/backend/catalog
 
-# locations of headers that genbki.pl needs to read
-pg_includes = -I$(top_srcdir)/src/include/catalog -I$(top_builddir)/src/include/catalog
+all: distprep generated-header-symlinks
 
-# see explanation in ../parser/Makefile
-postgres.description: postgres.bki ;
+distprep: bki-stamp
 
-postgres.shdescription: postgres.bki ;
+.PHONY: generated-header-symlinks
 
-schemapg.h: postgres.bki ;
+generated-header-symlinks: $(top_builddir)/src/include/catalog/header-stamp
 
-# Technically, this should depend on Makefile.global, but then
-# postgres.bki would need to be rebuilt after every configure run,
-# even in distribution tarballs.  So this is cheating a bit, but it
-# will achieve the goal of updating the version number when it
-# changes.
-postgres.bki: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(top_srcdir)/configure $(top_srcdir)/src/include/catalog/duplicate_oids
+# Technically, this should depend on Makefile.global which supplies
+# $(MAJORVERSION); but then postgres.bki would need to be rebuilt after every
+# configure run, even in distribution tarballs.  So depending on configure.in
+# instead is cheating a bit, but it will achieve the goal of updating the
+# version number when it changes.
+bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in $(top_srcdir)/src/include/catalog/duplicate_oids
 	cd $(top_srcdir)/src/include/catalog && $(PERL) ./duplicate_oids
-	$(PERL) -I $(catalogdir) $< $(pg_includes) --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)
-
+	$(PERL) -I $(catalogdir) $< --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)
+	touch $@
+
+# The generated headers must all be symlinked into builddir/src/include/,
+# using absolute links for the reasons explained in src/backend/Makefile.
+# We use header-stamp to record that we've done this because the symlinks
+# themselves may appear older than bki-stamp.
+$(top_builddir)/src/include/catalog/header-stamp: bki-stamp
+	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
+	cd '$(dir $@)' && for file in $(GENERATED_HEADERS); do \
+	  rm -f $$file && $(LN_S) "$$prereqdir/$$file" . ; \
+	done
+	touch $@
+
+# Note: installation of generated headers is handled elsewhere
 .PHONY: install-data
-install-data: $(BKIFILES) installdirs
+install-data: bki-stamp installdirs
 	$(INSTALL_DATA) $(call vpathsearch,postgres.bki) '$(DESTDIR)$(datadir)/postgres.bki'
 	$(INSTALL_DATA) $(call vpathsearch,postgres.description) '$(DESTDIR)$(datadir)/postgres.description'
 	$(INSTALL_DATA) $(call vpathsearch,postgres.shdescription) '$(DESTDIR)$(datadir)/postgres.shdescription'
@@ -87,9 +117,10 @@ installdirs:
 uninstall-data:
 	rm -f $(addprefix '$(DESTDIR)$(datadir)'/, $(BKIFILES) system_views.sql information_schema.sql sql_features.txt)
 
-# postgres.bki, postgres.description, postgres.shdescription, and schemapg.h
-# are in the distribution tarball, so they are not cleaned here.
+# postgres.bki, postgres.description, postgres.shdescription,
+# and the generated headers are in the distribution tarball,
+# so they are not cleaned here.
 clean:
 
 maintainer-clean: clean
-	rm -f $(BKIFILES)
+	rm -f bki-stamp $(BKIFILES) $(GENERATED_HEADERS)
diff --git a/src/backend/catalog/README b/src/backend/catalog/README
deleted file mode 100644
index 7e0ddf312dd..00000000000
--- a/src/backend/catalog/README
+++ /dev/null
@@ -1,111 +0,0 @@
-src/backend/catalog/README
-
-System Catalog
-==============
-
-This directory contains .c files that manipulate the system catalogs;
-src/include/catalog contains the .h files that define the structure
-of the system catalogs.
-
-When the compile-time scripts (Gen_fmgrtab.pl and genbki.pl)
-execute, they grep the DATA statements out of the .h files and munge
-these in order to generate the postgres.bki file.  The .bki file is then
-used as input to initdb (which is just a wrapper around postgres
-running single-user in bootstrapping mode) in order to generate the
-initial (template) system catalog relation files.
-
------------------------------------------------------------------
-
-People who are going to hose around with the .h files should be aware
-of the following facts:
-
-- It is very important that the DATA statements be properly formatted
-(e.g., no broken lines, proper use of white-space and _null_).  The
-scripts are line-oriented and break easily.  In addition, the only
-documentation on the proper format for them is the code in the
-bootstrap/ directory.  Just be careful when adding new DATA
-statements.
-
-- Some catalogs require that OIDs be preallocated to tuples because
-of cross-references from other pre-loaded tuples.  For example, pg_type
-contains pointers into pg_proc (e.g., pg_type.typinput), and pg_proc
-contains back-pointers into pg_type (pg_proc.proargtypes).  For such
-cases, the OID assigned to a tuple may be explicitly set by use of the
-"OID = n" clause of the .bki insert statement.  If no such pointers are
-required to a given tuple, then the OID = n clause may be omitted
-(then the system generates an OID in the usual way, or leaves it 0 in a
-catalog that has no OIDs).  In practice we usually preassign OIDs
-for all or none of the pre-loaded tuples in a given catalog, even if only
-some of them are actually cross-referenced.
-
-- We also sometimes preallocate OIDs for catalog tuples whose OIDs must
-be known directly in the C code.  In such cases, put a #define in the
-catalog's .h file, and use the #define symbol in the C code.  Writing
-the actual numeric value of any OID in C code is considered very bad form.
-Direct references to pg_proc OIDs are common enough that there's a special
-mechanism to create the necessary #define's automatically: see
-backend/utils/Gen_fmgrtab.pl.  We also have standard conventions for setting
-up #define's for the pg_class OIDs of system catalogs and indexes.  For all
-the other system catalogs, you have to manually create any #define's you
-need.
-
-- If you need to find a valid OID for a new predefined tuple,
-use the unused_oids script.  It generates inclusive ranges of
-*unused* OIDs (e.g., the line "45-900" means OIDs 45 through 900 have
-not been allocated yet).  Currently, OIDs 1-9999 are reserved for manual
-assignment; the unused_oids script simply looks through the include/catalog
-headers to see which ones do not appear in "OID =" clauses in DATA lines.
-(As of Postgres 8.1, it also looks at CATALOG and DECLARE_INDEX lines.)
-You can also use the duplicate_oids script to check for mistakes.
-
-- The OID counter starts at 10000 at bootstrap.  If a catalog row is in a
-table that requires OIDs, but no OID was preassigned by an "OID =" clause,
-then it will receive an OID of 10000 or above.
-
-- To create a "BOOTSTRAP" table you have to do a lot of extra work: these
-tables are not created through a normal CREATE TABLE operation, but spring
-into existence when first written to during initdb.  Therefore, you must
-manually create appropriate entries for them in the pre-loaded contents of
-pg_class, pg_attribute, and pg_type.  Avoid making new catalogs be bootstrap
-catalogs if at all possible; generally, only tables that must be written to
-in order to create a table should be bootstrapped.
-
-- Certain BOOTSTRAP tables must be at the start of the Makefile
-POSTGRES_BKI_SRCS variable, as these cannot be created through the standard
-heap_create_with_catalog process, because it needs these tables to exist
-already.  The list of files this currently includes is:
-	pg_proc.h pg_type.h pg_attribute.h pg_class.h
-Within this list, pg_type.h must come before pg_attribute.h.
-Also, indexing.h must be last, since the indexes can't be created until all
-the tables are in place, and toasting.h should probably be next-to-last
-(or at least after all the tables that need toast tables).  There are
-reputedly some other order dependencies in the .bki list, too.
-
------------------------------------------------------------------
-
-When munging the .c files, you should be aware of certain conventions:
-
-- The system catalog cache code (and most catalog-munging code in
-general) assumes that the fixed-length portions of all system catalog
-tuples are in fact present, because it maps C struct declarations onto
-them.  Thus, the variable-length fields must all be at the end, and
-only the variable-length fields of a catalog tuple are permitted to be
-NULL.  For example, if you set pg_type.typrelid to be NULL, a
-piece of code will likely perform "typetup->typrelid" (or, worse,
-"typetup->typelem", which follows typrelid).  This will result in
-random errors or even segmentation violations.  Hence, do NOT insert
-catalog tuples that contain NULL attributes except in their
-variable-length portions!  (The bootstrapping code is fairly good about
-marking NOT NULL each of the columns that can legally be referenced via
-C struct declarations ... but those markings won't be enforced against
-DATA commands, so you must get it right in a DATA line.)
-
-- Modification of the catalogs must be performed with the proper
-updating of catalog indexes!  That is, most catalogs have indexes
-on them; when you munge them using the executor, the executor will
-take care of doing the index updates, but if you make direct access
-method calls to insert new or modified tuples into a heap, you must
-also make the calls to insert the tuple into ALL of its indexes!  If
-not, the new tuple will generally be "invisible" to the system because
-most of the accesses to the catalogs in question will be through the
-associated indexes.
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index b4abbff101f..56312ded8a0 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -3,9 +3,9 @@
 #
 # genbki.pl
 #    Perl script that generates postgres.bki, postgres.description,
-#    postgres.shdescription, and schemapg.h from specially formatted
-#    header files.  The .bki files are used to initialize the postgres
-#    template database.
+#    postgres.shdescription, and symbol definition headers from specially
+#    formatted header files and data files.  The BKI files are used to
+#    initialize the postgres template database.
 #
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
@@ -20,7 +20,6 @@ use strict;
 use warnings;
 
 my @input_files;
-my @include_path;
 my $output_path = '';
 my $major_version;
 
@@ -36,10 +35,6 @@ while (@ARGV)
 	{
 		$output_path = length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
 	}
-	elsif ($arg =~ /^-I/)
-	{
-		push @include_path, length($arg) > 2 ? substr($arg, 2) : shift @ARGV;
-	}
 	elsif ($arg =~ /^--set-version=(.*)$/)
 	{
 		$major_version = $1;
@@ -53,8 +48,7 @@ while (@ARGV)
 }
 
 # Sanity check arguments.
-die "No input files.\n"                                     if !@input_files;
-die "No include path; you must specify -I at least once.\n" if !@include_path;
+die "No input files.\n" if !@input_files;
 die "--set-version must be specified.\n" if !defined $major_version;
 
 # Make sure output_path ends in a slash.
@@ -78,25 +72,151 @@ my $shdescrfile = $output_path . 'postgres.shdescription';
 open my $shdescr, '>', $shdescrfile . $tmpext
   or die "can't open $shdescrfile$tmpext: $!";
 
+# Read all the files into internal data structures. Not all catalogs
+# will have a data file.
+my @catnames;
+my %catalogs;
+my %catalog_data;
+my @toast_decls;
+my @index_decls;
+foreach my $header (@input_files)
+{
+	$header =~ /(.+)\.h$/
+	  or die "Input files need to be header files.\n";
+	my $datfile = "$1.dat";
+
+	my $catalog = Catalog::ParseHeader($header);
+	my $catname = $catalog->{catname};
+	my $schema  = $catalog->{columns};
+
+	if (defined $catname)
+	{
+		push @catnames, $catname;
+		$catalogs{$catname} = $catalog;
+	}
+
+	if (-e $datfile)
+	{
+		$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 0);
+	}
+
+	foreach my $toast_decl (@{ $catalog->{toasting} })
+	{
+		push @toast_decls, $toast_decl;
+	}
+	foreach my $index_decl (@{ $catalog->{indexing} })
+	{
+		push @index_decls, $index_decl;
+	}
+}
+
 # Fetch some special data that we will substitute into the output file.
 # CAUTION: be wary about what symbols you substitute into the .bki file here!
 # It's okay to substitute things that are expected to be really constant
 # within a given Postgres release, such as fixed OIDs.  Do not substitute
 # anything that could depend on platform or configuration.  (The right place
 # to handle those sorts of things is in initdb.c's bootstrap_template1().)
-# NB: make sure that the files used here are known to be part of the .bki
-# file's dependencies by src/backend/catalog/Makefile.
-my $BOOTSTRAP_SUPERUSERID =
-  Catalog::FindDefinedSymbol('pg_authid.h', \@include_path,
-							 'BOOTSTRAP_SUPERUSERID');
-my $PG_CATALOG_NAMESPACE =
-  Catalog::FindDefinedSymbol('pg_namespace.h', \@include_path,
-							 'PG_CATALOG_NAMESPACE');
+my $BOOTSTRAP_SUPERUSERID = Catalog::FindDefinedSymbolFromData(
+	$catalog_data{pg_authid}, 'BOOTSTRAP_SUPERUSERID');
+my $PG_CATALOG_NAMESPACE  = Catalog::FindDefinedSymbolFromData(
+	$catalog_data{pg_namespace}, 'PG_CATALOG_NAMESPACE');
+
+
+# Build lookup tables for OID macro substitutions and for pg_attribute
+# copies of pg_type values.
 
-# Read all the input header files into internal data structures
-my $catalogs = Catalog::Catalogs(@input_files);
+# index access method OID lookup
+my %amoids;
+foreach my $row (@{ $catalog_data{pg_am} })
+{
+	$amoids{ $row->{amname} } = $row->{oid};
+}
 
-# Generate postgres.bki, postgres.description, and postgres.shdescription
+# opclass OID lookup
+my %opcoids;
+foreach my $row (@{ $catalog_data{pg_opclass} })
+{
+	# There is no unique name, so we need to combine access method
+	# and opclass name.
+	my $key = sprintf "%s/%s",
+	  $row->{opcmethod}, $row->{opcname};
+	$opcoids{$key} = $row->{oid};
+}
+
+# operator OID lookup
+my %operoids;
+foreach my $row (@{ $catalog_data{pg_operator} })
+{
+	# There is no unique name, so we need to invent one that contains
+	# the relevant type names.
+	my $key = sprintf "%s(%s,%s)",
+	  $row->{oprname}, $row->{oprleft}, $row->{oprright};
+	$operoids{$key} = $row->{oid};
+}
+
+# opfamily OID lookup
+my %opfoids;
+foreach my $row (@{ $catalog_data{pg_opfamily} })
+{
+	# There is no unique name, so we need to combine access method
+	# and opfamily name.
+	my $key = sprintf "%s/%s",
+	  $row->{opfmethod}, $row->{opfname};
+	$opfoids{$key} = $row->{oid};
+}
+
+# procedure OID lookup
+my %procoids;
+foreach my $row (@{ $catalog_data{pg_proc} })
+{
+	# Generate an entry under just the proname (corresponds to regproc lookup)
+	my $prokey = $row->{proname};
+	if (defined $procoids{$prokey})
+	{
+		$procoids{$prokey} = 'MULTIPLE';
+	}
+	else
+	{
+		$procoids{$prokey} = $row->{oid};
+	}
+	# Also generate an entry using proname(proargtypes).  This is not quite
+	# identical to regprocedure lookup because we don't worry much about
+	# special SQL names for types etc; we just use the names in the source
+	# proargtypes field.  These *should* be unique, but do a multiplicity
+	# check anyway.
+	$prokey .= '(' . join(',', split(/\s+/, $row->{proargtypes})) . ')';
+	if (defined $procoids{$prokey})
+	{
+		$procoids{$prokey} = 'MULTIPLE';
+	}
+	else
+	{
+		$procoids{$prokey} = $row->{oid};
+	}
+}
+
+# type lookups
+my %typeoids;
+my %types;
+foreach my $row (@{ $catalog_data{pg_type} })
+{
+	$typeoids{ $row->{typname} } = $row->{oid};
+	$types{ $row->{typname} } = $row;
+}
+
+# Map catalog name to OID lookup.
+my %lookup_kind = (
+	pg_am       => \%amoids,
+	pg_opclass  => \%opcoids,
+	pg_operator => \%operoids,
+	pg_opfamily => \%opfoids,
+	pg_proc     => \%procoids,
+	pg_type     => \%typeoids
+);
+
+
+# Generate postgres.bki, postgres.description, postgres.shdescription,
+# and pg_*_d.h headers.
 
 # version marker for .bki file
 print $bki "# PostgreSQL $major_version\n";
@@ -104,32 +224,69 @@ print $bki "# PostgreSQL $major_version\n";
 # vars to hold data needed for schemapg.h
 my %schemapg_entries;
 my @tables_needing_macros;
-my %regprocoids;
-my %types;
 
 # produce output, one catalog at a time
-foreach my $catname (@{ $catalogs->{names} })
+foreach my $catname (@catnames)
 {
+	my $catalog = $catalogs{$catname};
+
+	# Create one definition header with macro definitions for each catalog.
+	my $def_file = $output_path . $catname . '_d.h';
+	open my $def, '>', $def_file . $tmpext
+	  or die "can't open $def_file$tmpext: $!";
+
+	# Opening boilerplate for pg_*_d.h
+	printf $def <<EOM, $catname, $catname, uc $catname, uc $catname;
+/*-------------------------------------------------------------------------
+ *
+ * %s_d.h
+ *    Macro definitions for %s
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * NOTES
+ *  ******************************
+ *  *** DO NOT EDIT THIS FILE! ***
+ *  ******************************
+ *
+ *  It has been GENERATED by src/backend/catalog/genbki.pl
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef %s_D_H
+#define %s_D_H
+
+EOM
+
+	# Emit OID macros for catalog's OID and rowtype OID, if wanted
+	printf $def "#define %s %s\n",
+	  $catalog->{relation_oid_macro}, $catalog->{relation_oid}
+	  if $catalog->{relation_oid_macro};
+	printf $def "#define %s %s\n",
+	  $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid}
+	  if $catalog->{rowtype_oid_macro};
+	print $def "\n";
 
 	# .bki CREATE command for this catalog
-	my $catalog = $catalogs->{$catname};
 	print $bki "create $catname $catalog->{relation_oid}"
 	  . $catalog->{shared_relation}
 	  . $catalog->{bootstrap}
 	  . $catalog->{without_oids}
-	  . $catalog->{rowtype_oid} . "\n";
+	  . $catalog->{rowtype_oid_clause};
 
-	my @attnames;
 	my $first = 1;
 
-	print $bki " (\n";
+	print $bki "\n (\n";
 	my $schema = $catalog->{columns};
+	my $attnum = 0;
 	foreach my $column (@$schema)
 	{
+		$attnum++;
 		my $attname = $column->{name};
 		my $atttype = $column->{type};
-		push @attnames, $attname;
 
+		# Emit column definitions
 		if (!$first)
 		{
 			print $bki " ,\n";
@@ -146,10 +303,23 @@ foreach my $catname (@{ $catalogs->{names} })
 		{
 			print $bki " FORCE NULL";
 		}
+
+		# Emit Anum_* constants
+		print $def
+		  sprintf("#define Anum_%s_%s %s\n", $catname, $attname, $attnum);
 	}
 	print $bki "\n )\n";
 
-	# Open it, unless bootstrap case (create bootstrap does this
+	# Emit Natts_* constant
+	print $def "\n#define Natts_$catname $attnum\n\n";
+
+	# Emit client code copied from source header
+	foreach my $line (@{ $catalog->{client_code} })
+	{
+		print $def $line;
+	}
+
+	# Open it, unless it's a bootstrap catalog (create bootstrap does this
 	# automatically)
 	if (!$catalog->{bootstrap})
 	{
@@ -157,21 +327,15 @@ foreach my $catname (@{ $catalogs->{names} })
 	}
 
 	# For pg_attribute.h, we generate data entries ourselves.
-	# NB: pg_type.h must come before pg_attribute.h in the input list
-	# of catalog names, since we use info from pg_type.h here.
 	if ($catname eq 'pg_attribute')
 	{
-		gen_pg_attribute($schema, @attnames);
+		gen_pg_attribute($schema);
 	}
 
-	# Ordinary catalog with DATA line(s)
-	foreach my $row (@{ $catalog->{data} })
+	# Ordinary catalog with a data file
+	foreach my $row (@{ $catalog_data{$catname} })
 	{
-
-		# Split line into tokens without interpreting their meaning.
-		my %bki_values;
-		@bki_values{@attnames} =
-		  Catalog::SplitDataLine($row->{bki_values});
+		my %bki_values = %$row;
 
 		# Perform required substitutions on fields
 		foreach my $column (@$schema)
@@ -184,71 +348,102 @@ foreach my $catname (@{ $catalogs->{names} })
 			$bki_values{$attname} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
 			$bki_values{$attname} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
 
-			# Replace regproc columns' values with OIDs.
-			# If we don't have a unique value to substitute,
-			# just do nothing (regprocin will complain).
-			if ($atttype eq 'regproc')
+			# Replace OID synonyms with OIDs per the appropriate lookup rule.
+			#
+			# If the column type is oidvector or oid[], we have to replace
+			# each element of the array as per the lookup rule.
+			if ($column->{lookup})
 			{
-				my $procoid = $regprocoids{ $bki_values{$attname} };
-				$bki_values{$attname} = $procoid
-				  if defined($procoid) && $procoid ne 'MULTIPLE';
+				my $lookup = $lookup_kind{ $column->{lookup} };
+				my @lookupnames;
+				my @lookupoids;
+
+				die "unrecognized BKI_LOOKUP type " . $column->{lookup}
+				  if !defined($lookup);
+
+				if ($atttype eq 'oidvector')
+				{
+					@lookupnames = split /\s+/, $bki_values{$attname};
+					@lookupoids = lookup_oids($lookup, $catname,
+											  \%bki_values, @lookupnames);
+					$bki_values{$attname} = join(' ', @lookupoids);
+				}
+				elsif ($atttype eq 'oid[]')
+				{
+					if ($bki_values{$attname} ne '_null_')
+					{
+						$bki_values{$attname} =~ s/[{}]//g;
+						@lookupnames = split /,/, $bki_values{$attname};
+						@lookupoids = lookup_oids($lookup, $catname,
+												  \%bki_values, @lookupnames);
+						$bki_values{$attname} =
+							sprintf "{%s}", join(',', @lookupoids);
+					}
+				}
+				else
+				{
+					$lookupnames[0] = $bki_values{$attname};
+					@lookupoids = lookup_oids($lookup, $catname,
+											  \%bki_values, @lookupnames);
+					$bki_values{$attname} = $lookupoids[0];
+				}
 			}
 		}
 
-		# Save pg_proc oids for use in later regproc substitutions.
-		# This relies on the order we process the files in!
-		if ($catname eq 'pg_proc')
+		# Special hack to generate OID symbols for pg_type entries
+		# that lack one.
+		if ($catname eq 'pg_type' and !exists $bki_values{oid_symbol})
 		{
-			if (defined($regprocoids{ $bki_values{proname} }))
-			{
-				$regprocoids{ $bki_values{proname} } = 'MULTIPLE';
-			}
-			else
-			{
-				$regprocoids{ $bki_values{proname} } = $row->{oid};
-			}
-		}
-
-		# Save pg_type info for pg_attribute processing below
-		if ($catname eq 'pg_type')
-		{
-			my %type = %bki_values;
-			$type{oid} = $row->{oid};
-			$types{ $type{typname} } = \%type;
+			my $symbol = form_pg_type_symbol($bki_values{typname});
+			$bki_values{oid_symbol} = $symbol
+			  if defined $symbol;
 		}
 
 		# Write to postgres.bki
-		my $oid = $row->{oid} ? "OID = $row->{oid} " : '';
-		printf $bki "insert %s( %s )\n", $oid,
-		  join(' ', @bki_values{@attnames});
+		print_bki_insert(\%bki_values, $schema);
 
 		# Write comments to postgres.description and
 		# postgres.shdescription
-		if (defined $row->{descr})
+		if (defined $bki_values{descr})
 		{
-			printf $descr "%s\t%s\t0\t%s\n",
-			  $row->{oid}, $catname, $row->{descr};
+			if ($catalog->{shared_relation})
+			{
+				printf $shdescr "%s\t%s\t%s\n",
+				  $bki_values{oid}, $catname, $bki_values{descr};
+			}
+			else
+			{
+				printf $descr "%s\t%s\t0\t%s\n",
+				  $bki_values{oid}, $catname, $bki_values{descr};
+			}
 		}
-		if (defined $row->{shdescr})
+
+		# Emit OID symbol
+		if (defined $bki_values{oid_symbol})
 		{
-			printf $shdescr "%s\t%s\t%s\n",
-			  $row->{oid}, $catname, $row->{shdescr};
+			printf $def "#define %s %s\n",
+			  $bki_values{oid_symbol}, $bki_values{oid};
 		}
 	}
 
 	print $bki "close $catname\n";
+	print $def sprintf("\n#endif\t\t\t\t\t\t\t/* %s_D_H */\n", uc $catname);
+
+	# Close and rename definition header
+	close $def;
+	Catalog::RenameTempFile($def_file, $tmpext);
 }
 
 # Any information needed for the BKI that is not contained in a pg_*.h header
 # (i.e., not contained in a header with a CATALOG() statement) comes here
 
 # Write out declare toast/index statements
-foreach my $declaration (@{ $catalogs->{toasting}->{data} })
+foreach my $declaration (@toast_decls)
 {
 	print $bki $declaration;
 }
 
-foreach my $declaration (@{ $catalogs->{indexing}->{data} })
+foreach my $declaration (@index_decls)
 {
 	print $bki $declaration;
 }
@@ -288,7 +483,7 @@ foreach my $table_name (@tables_needing_macros)
 }
 
 # Closing boilerplate for schemapg.h
-print $schemapg "\n#endif /* SCHEMAPG_H */\n";
+print $schemapg "\n#endif\t\t\t\t\t\t\t/* SCHEMAPG_H */\n";
 
 # We're done emitting data
 close $bki;
@@ -314,11 +509,16 @@ exit 0;
 sub gen_pg_attribute
 {
 	my $schema = shift;
-	my @attnames = @_;
 
-	foreach my $table_name (@{ $catalogs->{names} })
+	my @attnames;
+	foreach my $column (@$schema)
+	{
+		push @attnames, $column->{name};
+	}
+
+	foreach my $table_name (@catnames)
 	{
-		my $table = $catalogs->{$table_name};
+		my $table = $catalogs{$table_name};
 
 		# Currently, all bootstrapped relations also need schemapg.h
 		# entries, so skip if the relation isn't to be in schemapg.h.
@@ -341,7 +541,7 @@ sub gen_pg_attribute
 			$priornotnull &= ($row{attnotnull} eq 't');
 
 			# If it's bootstrapped, put an entry in postgres.bki.
-			print_bki_insert(\%row, @attnames) if $table->{bootstrap};
+			print_bki_insert(\%row, $schema) if $table->{bootstrap};
 
 			# Store schemapg entries for later.
 			morph_row_for_schemapg(\%row, $schema);
@@ -377,7 +577,7 @@ sub gen_pg_attribute
 					  && $attr->{name} eq 'oid';
 
 				morph_row_for_pgattr(\%row, $schema, $attr, 1);
-				print_bki_insert(\%row, @attnames);
+				print_bki_insert(\%row, $schema);
 			}
 		}
 	}
@@ -441,21 +641,54 @@ sub morph_row_for_pgattr
 		$row->{attnotnull} = 'f';
 	}
 
-	my $error = Catalog::AddDefaultValues($row, $pgattr_schema);
-	if ($error)
-	{
-		die "Failed to form full tuple for pg_attribute: ", $error;
-	}
+	Catalog::AddDefaultValues($row, $pgattr_schema, 'pg_attribute');
 }
 
-# Write a pg_attribute entry to postgres.bki
+# Write an entry to postgres.bki. Adding quotes here allows us to keep
+# most double quotes out of the catalog data files for readability. See
+# bootscanner.l for what tokens need quoting.
 sub print_bki_insert
 {
-	my $row        = shift;
-	my @attnames   = @_;
-	my $oid        = $row->{oid} ? "OID = $row->{oid} " : '';
-	my $bki_values = join ' ', @{$row}{@attnames};
-	printf $bki "insert %s( %s )\n", $oid, $bki_values;
+	my $row    = shift;
+	my $schema = shift;
+
+	my @bki_values;
+	my $oid = $row->{oid} ? "OID = $row->{oid} " : '';
+
+	foreach my $column (@$schema)
+	{
+		my $attname   = $column->{name};
+		my $atttype   = $column->{type};
+		my $bki_value = $row->{$attname};
+
+		# Fold backslash-zero to empty string if it's the entire string,
+		# since that represents a NUL char in C code.
+		$bki_value = '' if $bki_value eq '\0';
+
+		$bki_value = sprintf(qq'"%s"', $bki_value)
+		  if  $bki_value ne '_null_'
+		  and $bki_value !~ /^"[^"]+"$/
+		  and ( length($bki_value) == 0       # Empty string
+				or $bki_value =~ /\s/         # Contains whitespace
+
+				# To preserve historical formatting, operator names are
+				# always quoted. Likewise for values of multi-element types,
+				# even if they only contain a single element.
+				or $attname eq 'oprname'
+				or $atttype eq 'oidvector'
+				or $atttype eq 'int2vector'
+				or $atttype =~ /\[\]$/
+
+				# Quote strings that have non-word characters. We make
+				# exceptions for values that are octals or negative numbers,
+				# for the same historical reason as above.
+				or (    $bki_value =~ /\W/
+					and $bki_value !~ /^\\\d{3}$/
+					and $bki_value !~ /^-\d*$/));
+
+		push @bki_values, $bki_value;
+	}
+	printf $bki "insert %s( %s )\n", $oid, join(' ', @bki_values);
 }
 
 # Given a row reference, modify it so that it becomes a valid entry for
@@ -481,8 +714,7 @@ sub morph_row_for_schemapg
 		}
 		elsif ($atttype eq 'char')
 		{
-			# Replace empty string by zero char constant; add single quotes
-			$row->{$attname} = '\0' if $row->{$attname} eq q|""|;
+			# Add single quotes
 			$row->{$attname} = sprintf("'%s'", $row->{$attname});
 		}
 
@@ -501,18 +733,66 @@ sub morph_row_for_schemapg
 	}
 }
 
+# Perform OID lookups on an array of OID names.
+# If we don't have a unique value to substitute, warn and
+# leave the entry unchanged.
+sub lookup_oids
+{
+	my ($lookup, $catname, $bki_values, @lookupnames) = @_;
+
+	my @lookupoids;
+	foreach my $lookupname (@lookupnames)
+	{
+		my $lookupoid = $lookup->{$lookupname};
+		if (defined($lookupoid) and $lookupoid ne 'MULTIPLE')
+		{
+			push @lookupoids, $lookupoid;
+		}
+		else
+		{
+			push @lookupoids, $lookupname;
+			warn sprintf "unresolved OID reference \"%s\" in %s.dat line %s",
+				$lookupname, $catname, $bki_values->{line_number}
+				if $lookupname ne '-' and $lookupname ne '0';
+		}
+	}
+	return @lookupoids;
+}
+
+# Determine canonical pg_type OID #define symbol from the type name.
+sub form_pg_type_symbol
+{
+	my $typename = shift;
+
+	# Skip for rowtypes of bootstrap tables, since they have their
+	# own naming convention defined elsewhere.
+	return
+	  if $typename eq 'pg_type'
+	    or $typename eq 'pg_proc'
+	    or $typename eq 'pg_attribute'
+	    or $typename eq 'pg_class';
+
+	# Transform like so:
+	#  foo_bar  ->  FOO_BAROID
+	# _foo_bar  ->  FOO_BARARRAYOID
+	$typename =~ /(_)?(.+)/;
+	my $arraystr = $1 ? 'ARRAY' : '';
+	my $name = uc $2;
+	return $name . $arraystr . 'OID';
+}
+
 sub usage
 {
 	die <<EOM;
 Usage: genbki.pl [options] header...
 
 Options:
-    -I               path to include files
     -o               output path
     --set-version    PostgreSQL version number for initdb cross-check
 
-genbki.pl generates BKI files from specially formatted
-header files.  These BKI files are used to initialize the
+genbki.pl generates BKI files and symbol definition
+headers from specially formatted header files and .dat
+files.  The BKI files are used to initialize the
 postgres template database.
 
 Report bugs to <pgsql-bugs\@postgresql.org>.
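
The symbolic OID resolution that this genbki.pl change introduces can be illustrated outside Perl. What follows is a minimal Python sketch, not the script's actual code: it mirrors the shape of the `%procoids` table and of `lookup_oids` as they appear in the diff above, but the function `build_proc_lookup`, the sample rows, and their OIDs are hypothetical. Instead of calling `warn`, the sketch returns the unresolved names so the behavior can be checked.

```python
# Sketch of symbolic-OID resolution, loosely following genbki.pl's
# %procoids table and lookup_oids(). Sample rows and OIDs are made up.

MULTIPLE = "MULTIPLE"  # marker for a name that matches more than one row


def build_proc_lookup(rows):
    """Map both 'proname' and 'proname(argtypes)' to an OID."""
    lookup = {}
    for row in rows:
        bare = row["proname"]
        args = ",".join(row["proargtypes"].split())
        for key in (bare, f"{bare}({args})"):
            # A second row under the same key makes that key ambiguous.
            lookup[key] = MULTIPLE if key in lookup else row["oid"]
    return lookup


def lookup_oids(lookup, names):
    """Resolve names to OIDs; names that cannot be resolved pass through."""
    resolved, unresolved = [], []
    for name in names:
        oid = lookup.get(name)
        if oid is not None and oid != MULTIPLE:
            resolved.append(oid)
        else:
            resolved.append(name)
            # '-' and '0' mean "no reference", so they are not reported.
            if name not in ("-", "0"):
                unresolved.append(name)
    return resolved, unresolved


rows = [
    {"oid": "9001", "proname": "demo_in", "proargtypes": "cstring"},
    {"oid": "9002", "proname": "demo_add", "proargtypes": "int4 int4"},
    {"oid": "9003", "proname": "demo_add", "proargtypes": "int8 int8"},
]

procoids = build_proc_lookup(rows)
resolved, unresolved = lookup_oids(
    procoids, ["demo_in", "demo_add", "demo_add(int8,int8)", "-"]
)
print(resolved)
print(unresolved)
```

The bare name `demo_add` is overloaded in the sample data, so it stays unresolved, while the regprocedure-style key `demo_add(int8,int8)` resolves uniquely. This is the ambiguity the commit message describes for regproc-style references.
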
diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index 4ae86df1f71..3b112c69d8e 100644
--- a/src/backend/utils/Gen_fmgrtab.pl
+++ b/src/backend/utils/Gen_fmgrtab.pl
@@ -3,7 +3,7 @@
 #
 # Gen_fmgrtab.pl
 #    Perl script that generates fmgroids.h, fmgrprotos.h, and fmgrtab.c
-#    from pg_proc.h
+#    from pg_proc.dat
 #
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
@@ -20,7 +20,7 @@ use strict;
 use warnings;
 
 # Collect arguments
-my $infile;    # pg_proc.h
+my @input_files;
 my $output_path = '';
 my @include_path;
 
@@ -29,7 +29,7 @@ while (@ARGV)
 	my $arg = shift @ARGV;
 	if ($arg !~ /^-/)
 	{
-		$infile = $arg;
+		push @input_files, $arg;
 	}
 	elsif ($arg =~ /^-o/)
 	{
@@ -52,38 +52,50 @@ if ($output_path ne '' && substr($output_path, -1) ne '/')
 }
 
 # Sanity check arguments.
-die "No input files.\n"                                     if !$infile;
+die "No input files.\n"                                     if !@input_files;
 die "No include path; you must specify -I at least once.\n" if !@include_path;
 
-my $FirstBootstrapObjectId =
-	Catalog::FindDefinedSymbol('access/transam.h', \@include_path, 'FirstBootstrapObjectId');
-my $INTERNALlanguageId =
-	Catalog::FindDefinedSymbol('catalog/pg_language.h', \@include_path, 'INTERNALlanguageId');
+# Read all the input files into internal data structures.
+# Note: We pass data file names as arguments and then look for matching
+# headers to parse the schema from. This is backwards from genbki.pl,
+# but the Makefile dependencies look more sensible this way.
+my %catalogs;
+my %catalog_data;
+foreach my $datfile (@input_files)
+{
+	$datfile =~ /(.+)\.dat$/
+	  or die "Input files need to be data (.dat) files.\n";
 
-# Read all the data from the include/catalog files.
-my $catalogs = Catalog::Catalogs($infile);
+	my $header = "$1.h";
+	die "There is no header file corresponding to $datfile"
+	  if ! -e $header;
 
-# Collect the raw data from pg_proc.h.
-my @fmgr = ();
-my @attnames;
-foreach my $column (@{ $catalogs->{pg_proc}->{columns} })
-{
-	push @attnames, $column->{name};
+	my $catalog = Catalog::ParseHeader($header);
+	my $catname = $catalog->{catname};
+	my $schema  = $catalog->{columns};
+
+	$catalogs{$catname} = $catalog;
+	$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 0);
 }
 
-my $data = $catalogs->{pg_proc}->{data};
-foreach my $row (@$data)
-{
+# Fetch some values for later.
+my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
+	'access/transam.h', \@include_path, 'FirstBootstrapObjectId');
+my $INTERNALlanguageId = Catalog::FindDefinedSymbolFromData(
+	$catalog_data{pg_language}, 'INTERNALlanguageId');
+
+# Collect certain fields from pg_proc.dat.
+my @fmgr = ();
 
-	# Split line into tokens without interpreting their meaning.
-	my %bki_values;
-	@bki_values{@attnames} = Catalog::SplitDataLine($row->{bki_values});
+foreach my $row (@{ $catalog_data{pg_proc} })
+{
+	my %bki_values = %$row;
 
 	# Select out just the rows for internal-language procedures.
 	next if $bki_values{prolang} ne $INTERNALlanguageId;
 
 	push @fmgr,
-	  { oid    => $row->{oid},
+	  { oid    => $bki_values{oid},
 		strict => $bki_values{proisstrict},
 		retset => $bki_values{proretset},
 		nargs  => $bki_values{pronargs},
@@ -281,10 +293,10 @@ Catalog::RenameTempFile($tabfile,    $tmpext);
 sub usage
 {
 	die <<EOM;
-Usage: perl -I [directory of Catalog.pm] Gen_fmgrtab.pl [path to pg_proc.h]
+Usage: perl -I [directory of Catalog.pm] Gen_fmgrtab.pl -I [include path] [path to pg_proc.dat]
 
 Gen_fmgrtab.pl generates fmgroids.h, fmgrprotos.h, and fmgrtab.c from
-pg_proc.h
+pg_proc.dat
 
 Report bugs to <pgsql-bugs\@postgresql.org>.
 EOM
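
The reworked input handling in Gen_fmgrtab.pl (derive the header name from each `.dat` argument, look up the internal language OID from the parsed data, then keep only internal-language `pg_proc` rows) can be sketched in Python as below. This is an illustration under stated assumptions rather than a port. The helper names `header_for`, `find_symbol_oid`, and `internal_functions` are hypothetical, and so are the sample rows. `Catalog::FindDefinedSymbolFromData` is not shown in this diff, so `find_symbol_oid` only assumes that it finds the row whose `oid_symbol` matches and returns that row's `oid`.

```python
import re


def header_for(datfile):
    """Derive the catalog header path from a .dat path, as the script does."""
    match = re.fullmatch(r"(.+)\.dat", datfile)
    if match is None:
        raise ValueError("Input files need to be data (.dat) files.")
    return match.group(1) + ".h"


def find_symbol_oid(rows, symbol):
    """Assumed behavior: return the oid of the row carrying this oid_symbol."""
    for row in rows:
        if row.get("oid_symbol") == symbol:
            return row["oid"]
    raise LookupError(f"no definition found for {symbol}")


def internal_functions(proc_rows, internal_lang_oid):
    """Keep only internal-language rows and the fields the fmgr table needs."""
    return [
        {
            "oid": row["oid"],
            "strict": row["proisstrict"],
            "retset": row["proretset"],
            "nargs": row["pronargs"],
        }
        for row in proc_rows
        if row["prolang"] == internal_lang_oid
    ]


# Made-up stand-ins for parsed pg_language.dat and pg_proc.dat contents.
language_rows = [
    {"oid": "12", "oid_symbol": "INTERNALlanguageId", "lanname": "internal"},
    {"oid": "14", "lanname": "sql"},
]
proc_rows = [
    {"oid": "9001", "prolang": "12", "proisstrict": "t",
     "proretset": "f", "pronargs": "1"},
    {"oid": "9002", "prolang": "14", "proisstrict": "f",
     "proretset": "f", "pronargs": "0"},
]

internal_oid = find_symbol_oid(language_rows, "INTERNALlanguageId")
fmgr = internal_functions(proc_rows, internal_oid)
print(header_for("catalog/pg_proc.dat"), fmgr)
```

Reading `INTERNALlanguageId` from the data rows instead of scanning a header for a `#define` is why the Makefile change below also passes `pg_language.dat` to the script.
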
diff --git a/src/backend/utils/Makefile b/src/backend/utils/Makefile
index 163c81a1c22..343637af858 100644
--- a/src/backend/utils/Makefile
+++ b/src/backend/utils/Makefile
@@ -1,8 +1,13 @@
+#-------------------------------------------------------------------------
 #
-# Makefile for utils
+# Makefile for backend/utils
+#
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
 #
 # src/backend/utils/Makefile
 #
+#-------------------------------------------------------------------------
 
 subdir = src/backend/utils
 top_builddir = ../../..
@@ -20,6 +25,10 @@ all: errcodes.h fmgroids.h fmgrprotos.h probes.h
 
 $(SUBDIRS:%=%-recursive): fmgroids.h fmgrprotos.h
 
+FMGR_DATA := $(addprefix $(top_srcdir)/src/include/catalog/,\
+	pg_language.dat pg_proc.dat \
+	)
+
 # see notes in src/backend/parser/Makefile
 fmgrprotos.h: fmgroids.h
 	touch $@
@@ -27,8 +36,8 @@ fmgrprotos.h: fmgroids.h
 fmgroids.h: fmgrtab.c
 	touch $@
 
-fmgrtab.c: Gen_fmgrtab.pl $(catalogdir)/Catalog.pm $(top_srcdir)/src/include/catalog/pg_proc.h
-	$(PERL) -I $(catalogdir) $< -I $(top_srcdir)/src/include/ $(top_srcdir)/src/include/catalog/pg_proc.h
+fmgrtab.c: Gen_fmgrtab.pl $(catalogdir)/Catalog.pm $(FMGR_DATA) $(top_srcdir)/src/include/access/transam.h
+	$(PERL) -I $(catalogdir) $< -I $(top_srcdir)/src/include/ $(FMGR_DATA)
 
 errcodes.h: $(top_srcdir)/src/backend/utils/errcodes.txt generate-errcodes.pl
 	$(PERL) $(srcdir)/generate-errcodes.pl $< > $@
author	Tom Lane <tgl@sss.pgh.pa.us>	2018-04-08 13:16:50 -0400
committer	Tom Lane <tgl@sss.pgh.pa.us>	2018-04-08 13:17:27 -0400
commit	372728b0d49552641f0ea83d9d2e08817de038fa
tree	5beca037d3fdfeaa09467c8b559c83eab5030878 /src/backend
parent	02f3e558f21c0fbec9f94d5de9ad34f321eb0e57