Diffstat (limited to 'doc/src/sgml/charset.sgml')
-rw-r--r-- doc/src/sgml/charset.sgml 60
1 files changed, 31 insertions, 29 deletions
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 287cabc33b4..eeef7a22c43 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.95 2009/05/18 08:59:28 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.96 2010/02/03 17:25:05 momjian Exp $ -->
<chapter id="charset">
<title>Localization</>
@@ -6,8 +6,8 @@
<para>
This chapter describes the available localization features from the
point of view of the administrator.
- <productname>PostgreSQL</productname> supports localization with
- two approaches:
+ <productname>PostgreSQL</productname> supports two localization
+ facilities:
<itemizedlist>
<listitem>
@@ -67,10 +67,10 @@ initdb --locale=sv_SE
(<literal>sv</>) as spoken
in Sweden (<literal>SE</>). Other possibilities might be
<literal>en_US</> (U.S. English) and <literal>fr_CA</> (French
- Canadian). If more than one character set can be useful for a
+ Canadian). If more than one character set can be used for a
locale then the specifications look like this:
- <literal>cs_CZ.ISO8859-2</>. What locales are available under what
- names on your system depends on what was provided by the operating
+ <literal>cs_CZ.ISO8859-2</>. What locales are available on your
+ system under what names depends on what was provided by the operating
system vendor and what was installed. On most Unix systems, the command
<literal>locale -a</> will provide a list of available locales.
Windows uses more verbose locale names, such as <literal>German_Germany</>
@@ -80,8 +80,8 @@ initdb --locale=sv_SE
<para>
Occasionally it is useful to mix rules from several locales, e.g.,
use English collation rules but Spanish messages. To support that, a
- set of locale subcategories exist that control only a certain
- aspect of the localization rules:
+ set of locale subcategories exist that control only certain
+ aspects of the localization rules:
<informaltable>
<tgroup cols="2">
@@ -127,13 +127,13 @@ initdb --locale=sv_SE
</para>
<para>
- The nature of some locale categories is that their value has to be
+ Some locale categories must have their values
fixed when the database is created. You can use different settings
for different databases, but once a database is created, you cannot
change them for that database anymore. <literal>LC_COLLATE</literal>
- and <literal>LC_CTYPE</literal> are these categories. They affect
+ and <literal>LC_CTYPE</literal> are these type of categories. They affect
the sort order of indexes, so they must be kept fixed, or indexes on
- text columns will become corrupt. The default values for these
+ text columns would become corrupt. The default values for these
categories are determined when <command>initdb</command> is run, and
those values are used when new databases are created, unless
specified otherwise in the <command>CREATE DATABASE</command> command.
@@ -146,7 +146,7 @@ initdb --locale=sv_SE
linkend="runtime-config-client-format"> for details). The values
that are chosen by <command>initdb</command> are actually only written
into the configuration file <filename>postgresql.conf</filename> to
- serve as defaults when the server is started. If you delete these
+ serve as defaults when the server is started. If you disable these
assignments from <filename>postgresql.conf</filename> then the
server will inherit the settings from its execution environment.
</para>
@@ -178,7 +178,7 @@ initdb --locale=sv_SE
settings for the purpose of setting the language of messages. If
in doubt, please refer to the documentation of your operating
system, in particular the documentation about
- <application>gettext</>, for more information.
+ <application>gettext</>.
</para>
</note>
@@ -320,8 +320,9 @@ initdb --locale=sv_SE
<para>
An important restriction, however, is that each database's character set
- must be compatible with the database's <envar>LC_CTYPE</> and
- <envar>LC_COLLATE</> locale settings. For <literal>C</> or
+ must be compatible with the database's <envar>LC_CTYPE</> (character
+ classification) and <envar>LC_COLLATE</> (string sort order) locale
+ settings. For <literal>C</> or
<literal>POSIX</> locale, any character set is allowed, but for other
locales there is only one character set that will work correctly.
(On Windows, however, UTF-8 encoding can be used with any locale.)
@@ -543,7 +544,7 @@ initdb --locale=sv_SE
<entry>LATIN1 with Euro and accents</entry>
<entry>Yes</entry>
<entry>1</entry>
- <entry>ISO885915</entry>
+ <entry><literal>ISO885915</></entry>
</row>
<row>
<entry><literal>LATIN10</literal></entry>
@@ -694,7 +695,7 @@ initdb --locale=sv_SE
</table>
<para>
- Not all <acronym>API</>s support all the listed character sets. For example, the
+ Not all client <acronym>API</>s support all the listed character sets. For example, the
<productname>PostgreSQL</>
JDBC driver does not support <literal>MULE_INTERNAL</>, <literal>LATIN6</>,
<literal>LATIN8</>, and <literal>LATIN10</>.
@@ -710,7 +711,7 @@ initdb --locale=sv_SE
much a declaration that a specific encoding is in use, as a declaration
of ignorance about the encoding. In most cases, if you are
working with any non-ASCII data, it is unwise to use the
- <literal>SQL_ASCII</> setting, because
+ <literal>SQL_ASCII</> setting because
<productname>PostgreSQL</productname> will be unable to help you by
converting or validating non-ASCII characters.
</para>
@@ -720,17 +721,17 @@ initdb --locale=sv_SE
<title>Setting the Character Set</title>
<para>
- <command>initdb</> defines the default character set
+ <command>initdb</> defines the default character set (encoding)
for a <productname>PostgreSQL</productname> cluster. For example,
<screen>
initdb -E EUC_JP
</screen>
- sets the default character set (encoding) to
+ sets the default character set to
<literal>EUC_JP</literal> (Extended Unix Code for Japanese). You
can use <option>--encoding</option> instead of
- <option>-E</option> if you prefer to type longer option strings.
+ <option>-E</option> if you prefer longer option strings.
If no <option>-E</> or <option>--encoding</option> option is
given, <command>initdb</> attempts to determine the appropriate
encoding to use based on the specified or default locale.
@@ -762,8 +763,8 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE=
<para>
The encoding for a database is stored in the system catalog
<literal>pg_database</literal>. You can see it by using the
- <option>-l</option> option or the <command>\l</command> command
- of <command>psql</command>.
+ <command>psql</command> <option>-l</option> option or the
+ <command>\l</command> command.
<screen>
$ <userinput>psql -l</userinput>
@@ -784,11 +785,11 @@ $ <userinput>psql -l</userinput>
<important>
<para>
On most modern operating systems, <productname>PostgreSQL</productname>
- can determine which character set is implied by an <envar>LC_CTYPE</>
+ can determine which character set is implied by the <envar>LC_CTYPE</>
setting, and it will enforce that only the matching database encoding is
used. On older systems it is your responsibility to ensure that you use
the encoding expected by the locale you have selected. A mistake in
- this area is likely to lead to strange misbehavior of locale-dependent
+ this area is likely to lead to strange behavior of locale-dependent
operations such as sorting.
</para>
@@ -1190,9 +1191,9 @@ RESET client_encoding;
<para>
If the conversion of a particular character is not possible
&mdash; suppose you chose <literal>EUC_JP</literal> for the
- server and <literal>LATIN1</literal> for the client, then some
- Japanese characters do not have a representation in
- <literal>LATIN1</literal> &mdash; then an error is reported.
+ server and <literal>LATIN1</literal> for the client, and some
+ Japanese characters are returned that do not have a representation in
+ <literal>LATIN1</literal> &mdash; an error is reported.
</para>
<para>
@@ -1249,7 +1250,8 @@ RESET client_encoding;
<listitem>
<para>
- <acronym>UTF</acronym>-8 is defined here.
+ <acronym>UTF</acronym>-8 (8-bit UCS/Unicode Transformation
+ Format) is defined here.
</para>
</listitem>
</varlistentry>
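The last substantive paragraph changed above says that PostgreSQL reports an error when a character stored under the server encoding (for example <literal>EUC_JP</literal>) has no representation in the client encoding (for example <literal>LATIN1</literal>). The following is a minimal sketch of that behavior, using Python's built-in codecs (<code>euc_jp</code>, <code>latin_1</code>) as a stand-in. It is not PostgreSQL's conversion code, which lives in the server's C conversion routines. The helper name <code>to_client_encoding</code> is hypothetical.

```python
def to_client_encoding(data: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Re-encode bytes stored in the server encoding into the client encoding.

    Hypothetical helper: Python codecs stand in for PostgreSQL's own
    conversion functions. Raises ValueError when some character has no
    representation in the client encoding, analogous to the error
    PostgreSQL reports in that situation.
    """
    # Decode using the encoding the server stored the data in.
    text = data.decode(server_encoding)
    try:
        # Re-encode for the client; fails on characters it cannot represent.
        return text.encode(client_encoding)
    except UnicodeEncodeError as exc:
        bad = text[exc.start:exc.end]
        raise ValueError(
            f"character(s) {bad!r} have no equivalent in {client_encoding}"
        ) from exc
```

ASCII-only data converts cleanly in both directions. A value containing Japanese characters, stored as <code>"日本".encode("euc_jp")</code>, raises <code>ValueError</code> when converted to <code>latin_1</code>. The sketch raises instead of substituting a replacement character because the documentation describes an error being reported, not silent lossy output.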