aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--doc/src/sgml/ref/copy.sgml127
1 files changed, 40 insertions, 87 deletions
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 48fa5562934..f60388bb05b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -1,5 +1,5 @@
<!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.44 2003/04/20 01:52:55 momjian Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.45 2003/05/07 22:23:27 tgl Exp $
PostgreSQL documentation
-->
@@ -119,7 +119,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
<term><literal>BINARY</literal></term>
<listitem>
<para>
- Forces all data to be stored or read in binary format rather
+ Causes all data to be stored or read in binary format rather
than as text. You cannot specify the <option>DELIMITER</option>
or <option>NULL</option> options in binary mode.
</para>
@@ -193,17 +193,18 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
</para>
<para>
- The <literal>BINARY</literal> key word will force all data to be
+ The <literal>BINARY</literal> key word causes all data to be
stored/read as binary format rather than as text. It is
- somewhat faster than the normal text mode, but a binary format
- file is not portable across machine architectures.
+ somewhat faster than the normal text mode, but a binary-format
+ file is less portable across machine architectures and
+ <productname>PostgreSQL</productname> versions.
</para>
<para>
- You must have select privilege on any table
+ You must have select privilege on the table
whose values are read by <command>COPY TO</command>, and
- insert privilege on a table into which values
- are being inserted by <command>COPY FROM</command>.
+ insert privilege on the table into which values
+ are inserted by <command>COPY FROM</command>.
</para>
<para>
@@ -279,8 +280,8 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
End of data can be represented by a single line containing just
backslash-period (<literal>\.</>). An end-of-data marker is
not necessary when reading from a file, since the end of file
- serves perfectly well; but an end marker must be provided when copying
- data to or from a client application.
+ serves perfectly well; it is needed only when copying data to or from
+ client applications using pre-3.0 client protocol.
</para>
<para>
@@ -358,6 +359,9 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
possible to represent a data carriage return by a backslash and carriage
return, and to represent a data newline by a backslash and newline.
However, these representations might not be accepted in future releases.
+ They are also highly vulnerable to corruption if the COPY file is
+ transferred across different machines (for example, from Unix to Windows
+ or vice versa).
</para>
<para>
@@ -374,7 +378,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
<para>
The file format used for <command>COPY BINARY</command> changed in
- <application>PostgreSQL</application> 7.1. The new format consists
+ <application>PostgreSQL</application> 7.4. The new format consists
of a file header, zero or more tuples containing the row data, and
a file trailer.
</para>
@@ -383,7 +387,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
<title>File Header</title>
<para>
- The file header consists of 24 bytes of fixed fields, followed
+ The file header consists of 15 bytes of fixed fields, followed
by a variable-length header extension area. The fixed fields are:
<variablelist>
@@ -391,7 +395,7 @@ COPY <replaceable class="parameter">table</replaceable> [ ( <replaceable class="
<term>Signature</term>
<listitem>
<para>
-12-byte sequence <literal>PGBCOPY\n\377\r\n\0</> --- note that the zero byte
+11-byte sequence <literal>PGCOPY\n\377\r\n\0</> --- note that the zero byte
is a required part of the signature. (The signature is designed to allow
easy identification of files that have been munged by a non-8-bit-clean
transfer. This signature will be changed by end-of-line-translation
@@ -401,23 +405,13 @@ filters, dropped zero bytes, dropped high bits, or parity changes.)
</varlistentry>
<varlistentry>
- <term>Integer layout field</term>
- <listitem>
- <para>
-32-bit integer constant 0x01020304 in source's byte order. Potentially, a reader
-could engage in byte-flipping of subsequent fields if the wrong byte
-order is detected here.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
<term>Flags field</term>
<listitem>
<para>
-32-bit integer bit mask to denote important aspects of the file format. Bits are
-numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>) --- note that this field is stored
-with source's endianness, as are all subsequent integer fields. Bits
+32-bit integer bit mask to denote important aspects of the file format. Bits
+are numbered from 0 (<acronym>LSB</>) to 31 (<acronym>MSB</>). Note that
+this field is stored in network byte order (most significant byte first),
+as are all the integer fields used in the file format. Bits
16-31 are reserved to denote critical file format issues; a reader
should abort if it finds an unexpected bit set in this range. Bits 0-15
are reserved to signal backwards-compatible format issues; a reader
@@ -471,72 +465,28 @@ is left for a later release.
<title>Tuples</title>
<para>
Each tuple begins with a 16-bit integer count of the number of fields in the
-tuple. (Presently, all tuples in a table will have the same count, but
-that might not always be true.) Then, repeated for each field in the
-tuple, there is a 16-bit integer <structfield>typlen</> word possibly followed by field data.
-The <structfield>typlen</> field is interpreted thus:
-
- <variablelist>
- <varlistentry>
- <term>Zero</term>
- <listitem>
- <para>
- Field is null. No data follows.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>&gt; 0</term>
- <listitem>
- <para>
- Field is a fixed-length data type. Exactly that many
- bytes of data follow the <structfield>typlen</> word.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>-1</term>
- <listitem>
- <para>
- Field is a <literal>varlena</> data type. The next four
- bytes are the <literal>varlena</> header, which contains
- the total value length including the header itself.
- </para>
- </listitem>
- </varlistentry>
-
- <varlistentry>
- <term>&lt; -1</term>
- <listitem>
- <para>
- Reserved for future use.
- </para>
- </listitem>
- </varlistentry>
- </variablelist>
+tuple. (Presently, all tuples in a table will have the same count, but that
+might not always be true.) Then, repeated for each field in the tuple, there
+is a 32-bit length word followed by that many bytes of field data. (The
+length word does not include itself, and can be zero.) As a special case,
+-1 indicates a NULL field value. No value bytes follow in the NULL case.
</para>
<para>
-For nonnull fields, the reader can check that the <structfield>typlen</> matches the
-expected <structfield>typlen</> for the destination column. This provides a simple
-but very useful check that the data is as expected.
+There is no alignment padding or any other extra data between fields.
</para>
<para>
-There is no alignment padding or any other extra data between fields.
-Note also that the format does not distinguish whether a data type is
-pass-by-reference or pass-by-value. Both of these provisions are
-deliberate: they might help improve portability of the files (although
-of course endianness and floating-point-format issues can still keep
-you from moving a binary file across machines).
+Presently, all data values in a <command>COPY BINARY</command> file are
+assumed to be in binary format (format code one). It is anticipated that a
+future extension may add a header field that allows per-column format codes
+to be specified.
</para>
<para>
If OIDs are included in the file, the OID field immediately follows the
field-count word. It is a normal field except that it's not included
-in the field-count. In particular it has a <structfield>typlen</> --- this will allow
+in the field-count. In particular it has a length word --- this will allow
handling of 4-byte vs. 8-byte OIDs without too much pain, and will allow
OIDs to be shown as null if that ever proves desirable.
</para>
@@ -546,8 +496,8 @@ OIDs to be shown as null if that ever proves desirable.
<title>File Trailer</title>
<para>
- The file trailer consists of an 16-bit integer word containing -1. This is
- easily distinguished from a tuple's field-count word.
+ The file trailer consists of a 16-bit integer word containing -1. This
+ is easily distinguished from a tuple's field-count word.
</para>
<para>
@@ -579,20 +529,23 @@ COPY country FROM '/usr1/proj/bray/sql/country_data';
<para>
Here is a sample of data suitable for copying into a table from
- <literal>STDIN</literal> (so it must have the termination sequence on the
- last line):
+ <literal>STDIN</literal>:
<programlisting>
AF AFGHANISTAN
AL ALBANIA
DZ ALGERIA
ZM ZAMBIA
ZW ZIMBABWE
-\.
</programlisting>
Note that the white space on each line is actually a tab character.
</para>
<para>
+ XXX the following example is OBSOLETE and needs to be updated for the
+ 7.4 binary format:
+ </para>
+
+ <para>
The following is the same data, output in binary format on a
Linux/i586 machine. The data is shown after filtering through the
Unix utility <command>od -c</command>. The table has three columns;