Diffstat (limited to 'doc/src/sgml/charset.sgml')
-rw-r--r-- | doc/src/sgml/charset.sgml | 60 |
1 files changed, 31 insertions, 29 deletions
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 287cabc33b4..eeef7a22c43 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.95 2009/05/18 08:59:28 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.96 2010/02/03 17:25:05 momjian Exp $ -->
 <chapter id="charset">
  <title>Localization</>
@@ -6,8 +6,8 @@
  <para>
  This chapter describes the available localization features from the
  point of view of the administrator.
- <productname>PostgreSQL</productname> supports localization with
- two approaches:
+ <productname>PostgreSQL</productname> supports two localization
+ facilities:
  <itemizedlist>
  <listitem>
@@ -67,10 +67,10 @@ initdb --locale=sv_SE
  (<literal>sv</>) as spoken
  in Sweden (<literal>SE</>). Other possibilities might be
  <literal>en_US</> (U.S. English) and <literal>fr_CA</> (French
- Canadian). If more than one character set can be useful for a
+ Canadian). If more than one character set can be used for a
  locale then the specifications look like this:
- <literal>cs_CZ.ISO8859-2</>. What locales are available under what
- names on your system depends on what was provided by the operating
+ <literal>cs_CZ.ISO8859-2</>. What locales are available on your
+ system under what names depends on what was provided by the operating
  system vendor and what was installed. On most Unix systems, the command
  <literal>locale -a</> will provide a list of available locales.
  Windows uses more verbose locale names, such as <literal>German_Germany</>
@@ -80,8 +80,8 @@ initdb --locale=sv_SE
  <para>
  Occasionally it is useful to mix rules from several locales, e.g.,
  use English collation rules but Spanish messages. To support that, a
- set of locale subcategories exist that control only a certain
- aspect of the localization rules:
+ set of locale subcategories exist that control only certain
+ aspects of the localization rules:
  <informaltable>
  <tgroup cols="2">
@@ -127,13 +127,13 @@ initdb --locale=sv_SE
  </para>
  <para>
- The nature of some locale categories is that their value has to be
+ Some locale categories must have their values
  fixed when the database is created. You can use different settings
  for different databases, but once a database is created, you cannot
  change them for that database anymore. <literal>LC_COLLATE</literal>
- and <literal>LC_CTYPE</literal> are these categories. They affect
+ and <literal>LC_CTYPE</literal> are these type of categories. They affect
  the sort order of indexes, so they must be kept fixed, or indexes on
- text columns will become corrupt. The default values for these
+ text columns would become corrupt. The default values for these
  categories are determined when <command>initdb</command> is run, and
  those values are used when new databases are created, unless
  specified otherwise in the <command>CREATE DATABASE</command> command.
@@ -146,7 +146,7 @@ initdb --locale=sv_SE
  linkend="runtime-config-client-format"> for details). The values
  that are chosen by <command>initdb</command> are actually only written
  into the configuration file <filename>postgresql.conf</filename> to
- serve as defaults when the server is started. If you delete these
+ serve as defaults when the server is started. If you disable these
  assignments from <filename>postgresql.conf</filename> then the
  server will inherit the settings from its execution environment.
  </para>
@@ -178,7 +178,7 @@ initdb --locale=sv_SE
  settings for the purpose of setting the language of messages. If
  in doubt, please refer to the documentation of your operating
  system, in particular the documentation about
- <application>gettext</>, for more information.
+ <application>gettext</>.
  </para>
  </note>
@@ -320,8 +320,9 @@ initdb --locale=sv_SE
  <para>
  An important restriction, however, is that each database's character set
- must be compatible with the database's <envar>LC_CTYPE</> and
- <envar>LC_COLLATE</> locale settings. For <literal>C</> or
+ must be compatible with the database's <envar>LC_CTYPE</> (character
+ classification) and <envar>LC_COLLATE</> (string sort order) locale
+ settings. For <literal>C</> or
  <literal>POSIX</> locale, any character set is allowed, but for other
  locales there is only one character set that will work correctly.
  (On Windows, however, UTF-8 encoding can be used with any locale.)
@@ -543,7 +544,7 @@ initdb --locale=sv_SE
  <entry>LATIN1 with Euro and accents</entry>
  <entry>Yes</entry>
  <entry>1</entry>
- <entry>ISO885915</entry>
+ <entry><literal>ISO885915</></entry>
  </row>
  <row>
  <entry><literal>LATIN10</literal></entry>
@@ -694,7 +695,7 @@ initdb --locale=sv_SE
  </table>
  <para>
- Not all <acronym>API</>s support all the listed character sets. For example, the
+ Not all client <acronym>API</>s support all the listed character sets. For example, the
  <productname>PostgreSQL</>
  JDBC driver does not support <literal>MULE_INTERNAL</>, <literal>LATIN6</>,
  <literal>LATIN8</>, and <literal>LATIN10</>.
@@ -710,7 +711,7 @@ initdb --locale=sv_SE
  much a declaration that a specific encoding is in use, as a declaration
  of ignorance about the encoding. In most cases, if you are
  working with any non-ASCII data, it is unwise to use the
- <literal>SQL_ASCII</> setting, because
+ <literal>SQL_ASCII</> setting because
  <productname>PostgreSQL</productname> will be unable to help you by
  converting or validating non-ASCII characters.
  </para>
@@ -720,17 +721,17 @@ initdb --locale=sv_SE
  <title>Setting the Character Set</title>
  <para>
- <command>initdb</> defines the default character set
+ <command>initdb</> defines the default character set (encoding)
  for a <productname>PostgreSQL</productname> cluster. For example,
 <screen>
 initdb -E EUC_JP
 </screen>
- sets the default character set (encoding) to
+ sets the default character set to
  <literal>EUC_JP</literal> (Extended Unix Code for Japanese). You
  can use <option>--encoding</option> instead of
- <option>-E</option> if you prefer to type longer option strings.
+ <option>-E</option> if you prefer longer option strings.
  If no <option>-E</> or <option>--encoding</option> option is
  given, <command>initdb</> attempts to determine the appropriate
  encoding to use based on the specified or default locale.
@@ -762,8 +763,8 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE=
  <para>
  The encoding for a database is stored in the system catalog
  <literal>pg_database</literal>. You can see it by using the
- <option>-l</option> option or the <command>\l</command> command
- of <command>psql</command>.
+ <command>psql</command> <option>-l</option> option or the
+ <command>\l</command> command.
 <screen>
 $ <userinput>psql -l</userinput>
@@ -784,11 +785,11 @@ $ <userinput>psql -l</userinput>
  <important>
  <para>
  On most modern operating systems, <productname>PostgreSQL</productname>
- can determine which character set is implied by an <envar>LC_CTYPE</>
+ can determine which character set is implied by the <envar>LC_CTYPE</>
  setting, and it will enforce that only the matching database encoding is
  used. On older systems it is your responsibility to ensure that you use
  the encoding expected by the locale you have selected. A mistake in
- this area is likely to lead to strange misbehavior of locale-dependent
+ this area is likely to lead to strange behavior of locale-dependent
  operations such as sorting.
  </para>
@@ -1190,9 +1191,9 @@ RESET client_encoding;
  <para>
  If the conversion of a particular character is not possible
  &mdash; suppose you chose <literal>EUC_JP</literal> for the
- server and <literal>LATIN1</literal> for the client, then some
- Japanese characters do not have a representation in
- <literal>LATIN1</literal> &mdash; then an error is reported.
+ server and <literal>LATIN1</literal> for the client, and some
+ Japanese characters are returned that do not have a representation in
+ <literal>LATIN1</literal> &mdash; an error is reported.
  </para>
  <para>
@@ -1249,7 +1250,8 @@ RESET client_encoding;
  <listitem>
  <para>
- <acronym>UTF</acronym>-8 is defined here.
+ <acronym>UTF</acronym>-8 (8-bit UCS/Unicode Transformation
+ Format) is defined here.
  </para>
  </listitem>
  </varlistentry>