Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/admin.sgml | 4
-rw-r--r-- doc/src/sgml/charset.sgml | 310
-rw-r--r-- doc/src/sgml/installation.sgml | 12
-rw-r--r-- doc/src/sgml/postgres.sgml | 4
-rw-r--r-- doc/src/sgml/runtime.sgml | 122
5 files changed, 299 insertions, 153 deletions
diff --git a/doc/src/sgml/admin.sgml b/doc/src/sgml/admin.sgml index 3fa9da921af..de2a85ab61a 100644 --- a/doc/src/sgml/admin.sgml +++ b/doc/src/sgml/admin.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.26 2000/09/12 05:37:07 thomas Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.27 2000/09/30 16:58:20 petere Exp $ Postgres Administrator's Guide. Derived from postgres.sgml. @@ -98,9 +98,9 @@ Derived from postgres.sgml. &intro-ag; &installation; &installw; - &charset; &runtime; &client-auth; + &charset; &manage-ag; &user-manag; &backup; diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 7f2d4f73a17..cb76535c1c6 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -1,44 +1,235 @@ - <chapter id="charset"> - <title>Character Sets</title> +<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.3 2000/09/30 16:58:20 petere Exp $ --> - <abstract> - <para> - Describes the available language and character set support in - <productname>Postgres</productname>. - </para> - </abstract> +<chapter id="charset"> + <title>Localization</> + + <abstract> + <para> + Describes the available localization features from the point of + view of the administrator. + </para> + </abstract> <para> - <productname>Postgres</productname> supports non-ASCII character - sets with two approaches: + <productname>Postgres</productname> supports localization with + three approaches: <itemizedlist> <listitem> <para> - Using locale features in underlying - system libraries. This allows single-byte character sets to be - configured with a locale-specific collation order, provided that - the underlying system supports the required locale. This - technique supports only one character set per server, and can - not support multi-byte character sets. + Using the locale features of the operating system to provide + locale-specific collation order, number formatting, and other + aspects. 
</para> </listitem> <listitem> <para> Using explicit multiple-byte character sets defined in the - <productname>Postgres</productname> server. These character sets - are also known to some client libraries. The number of character - sets is fixed at the time the server is compiled, and internal - operations such as string comparisons require expansion of each - character into a 32-bit word. + <productname>Postgres</productname> server to support languages + that require more characters than will fit into a single byte, + and to provide character set recoding between client and server. + The number of supported character sets is fixed at the time the + server is compiled, and internal operations such as string + comparisons require expansion of each character into a 32-bit + word. + </para> + </listitem> + + <listitem> + <para> + Single byte character recoding provides a more light-weight + solution for users of multiple, yet single-byte character sets. </para> </listitem> </itemizedlist> </para> + + <sect1 id="locale"> + <title>Locale Support</title> + + <para> + <firstterm>Locale</> support refers to an application respecting + cultural preferences regarding alphabets, sorting, number + formatting, etc. <productname>PostgreSQL</> uses the standard ISO + C and POSIX-like locale facilities provided by the server operating + system. For additional information refer the documentation of your + system. + </para> + + <sect2> + <title>Overview</> + + <para> + Locale support is not build into <productname>PostgreSQL</> by + default; to enable it, supply the <option>--enable-locale</> option + to the <filename>configure</> script: +<informalexample> +<screen> +<prompt>$ </><userinput>./configure --enable-locale</> +</screen> +</informalexample> + Locale support only affects the server; all clients are compatible + with servers with or without locale support. 
+ </para> + + <para> + The information about which particular cultural rules to use is + determined by standard environment variables. If you are getting + localized behavior from other programs you probably have them set + up already. The simplest way to set the localization information + is the <envar>LANG</> variable, for example: +<programlisting> +export LANG=sv_SE +</programlisting> + This sets the locale to Swedish (<literal>sv</>) as spoken in + Sweden (<literal>SE</>). Other possibilities might be + <literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canada, + French). If more than one character set can be useful for a locale + then the specifications look like this: + <literal>cs_CZ.ISO8859-2</>. What locales are available under what + names on your system depends on what was provided by the operating + system vendor and what was installed. + </para> + + <para> + Occasionally it is useful to mix rules from several locales, e.g., + use U.S. rules but Spanish messages. To do that a set of + environment variables exist that override the default of + <envar>LANG</> for a particular category: + + <informaltable> + <tgroup cols="2"> + <tbody> + <row> + <entry>LC_COLLATE</> + <entry>String sort order</> + </row> + <row> + <entry>LC_CTYPE</> + <entry>Character classification (What is a letter? What is the upper-case equivalent of this letter?)</> + </row> + <row> + <entry>LC_MESSAGES</> + <entry>Language of messages</> + </row> + <row> + <entry>LC_MONETARY</> + <entry>Formatting of currency amounts</> + </row> + <row> + <entry>LC_NUMERIC</> + <entry>Formatting of numbers</> + </row> + <row> + <entry>LC_TIME</> + <entry>Formatting of dates and times</> + </row> + </tbody> + </tgroup> + </informaltable> + + <envar>LC_MESSAGES</> only affects the messages that come from the + operating system, not <productname>PostgreSQL</>. 
+ </para> + + <para> + If you want the system to behave as if it had no locale support, + use the special locale <literal>C</> or <literal>POSIX</>, or + simply unset all locale related variables. + </para> + + <para> + Once you have chosen a set of localization rules this way you must + keep them fixed for any particular database cluster. That means + that the locales that were active when you ran <filename>initdb</> + must be kept the same when you start the postmaster. Otherwise, + the changed sort order can corrupt indexes or make your data + disappear mysteriously. It is currently not possible to change the + locales after database initialization or to use more than one set + of locales for a given database cluster. + </para> + </sect2> + + <sect2> + <title>Benefits</> + + <para> + Locale support influences in particular the following features: + + <itemizedlist> + <listitem> + <para> + Sort order in <command>ORDER BY</> queries. + </para> + </listitem> + + <listitem> + <para> + The <function>to_char</> family of functions + </para> + </listitem> + + <listitem> + <para> + The <literal>LIKE</> and <literal>~</> operators for pattern + matching + </para> + </listitem> + </itemizedlist> + </para> + + <para> + The only severe drawback of using the locale support in + <productname>PostgreSQL</> is its speed. So use locale only if you + actually need it. + </para> + </sect2> + + <sect2> + <title>Problems</> + + <para> + If locale support doesn't work in spite of the explanation above, + check that the locale support in your operating system is okay. + To check whether a given locale is installed and functional you + can use <application>Perl</>, for example. Perl has also support + for locales and if a locale is broken <command>perl -v</> will + complain something like this: +<screen> +<prompt>$</> <userinput>export LC_CTYPE='not_exist'</> +<prompt>$</> <userinput>perl -v</> +<computeroutput> +perl: warning: Setting locale failed. 
+perl: warning: Please check that your locale settings: +LC_ALL = (unset), +LC_CTYPE = "not_exist", +LANG = (unset) +are supported and installed on your system. +perl: warning: Falling back to the standard locale ("C"). +</computeroutput> +</screen> + </para> + + <para> + Check that your locale files are in the right location. Possible + locations include: <filename>/usr/lib/locale</filename> (Linux, + Solaris), <filename>/usr/share/locale</filename> (Linux), + <filename>/usr/lib/nls/loc</filename> (DUX 4.0). Check the locale + man page of your system if you are not sure. + </para> + + <para> + The directory <filename>src/test/locale</> contains a test suite + for <productname>PostgreSQL</>'s locale support. + </para> + </sect2> + </sect1> + + <sect1 id="multibyte"> - <title>Multi-byte Support</title> + <title>Multibyte Support</title> <note> <title>Author</title> @@ -53,7 +244,7 @@ </note> <para> - Multi-byte (<acronym>MB</acronym>) support is intended to allow + Multibyte (<acronym>MB</acronym>) support is intended to allow <productname>Postgres</productname> to handle multiple-byte character sets such as EUC (Extended Unix Code), Unicode and Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte @@ -680,7 +871,78 @@ SET CLIENT_ENCODING = 'WIN1250'; </procedure> </sect2> </sect1> - </chapter> + + + <sect1 id="recode"> + <title>Single-byte character set recoding</> +<!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> --> + + <para> + You can set up this feature with the <option>--enable-recode</> option + to <filename>configure</>. This option was formerly described as + <quote>Cyrillic recode support</> which doesn't express all its + power. It can be used for <emphasis>any</> single-byte character + set recoding. + </para> + + <para> + This method uses a file <filename>charset.conf</> file located in + the database directory (<envar>PGDATA</>). 
It's a typical + configuration text file where spaces and newlines separate items + and records and # specifies comments. Three keywords with the + following syntax are recognized here: +<synopsis> +BaseCharset <replaceable>server_charset</> +RecodeTable <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</> +HostCharset <replaceable>host_spec</> <replaceable>host_charset</> +</synopsis> + </para> + + <para> + <token>BaseCharset</> defines the encoding of the database server. + All character set names are only used for mapping inside of + <filename>charset.conf</> so you can freely use typing-friendly + names. + </para> + + <para> + <token>RecodeTable</> records specify translation tables between + server and client. The file name is relative to the + <envar>PGDATA</> directory. The table file format is very + simple. There are no keywords and characters are represented by a + pair of decimal or hexadecimal (0x prefixed) values on single + lines: +<synopsis> +<replaceable>char_value</> <replaceable>translated_char_value</> +</synopsis> + </para> + + <para> + <token>HostCharset</> records define the client character set by IP + address. You can use a single IP address, an IP mask range starting + from the given address or an IP interval (e.g., 127.0.0.1, + 192.168.1.100/24, 192.168.1.20-192.168.1.40). + </para> + + <para> + The <filename>charset.conf</> file is always processed up to the + end, so you can easily specify exceptions from the previous + rules. In the src/data you will find charset.conf example and a few + recoding tables. + </para> + + <para> + As this solution is based on the client's IP address and character + set mapping there are obviously some restrictions as well. You + cannot use different encodings on the same host at the same + time. It is also inconvenient when you boot your client hosts into + more operating systems. 
Nevertheless, when these restrictions are + not limiting and you do not need multi-byte characters than it is a + simple and effective solution. + </para> + </sect1> + +</chapter> <!-- Keep this comment at the end of the file Local variables: diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 87797543853..e637969f360 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -1,4 +1,4 @@ -<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.21 2000/09/29 20:21:34 petere Exp $ --> +<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.22 2000/09/30 16:58:20 petere Exp $ --> <chapter id="installation"> <title><![%flattext-install-include[<productname>PostgreSQL</> ]]>Installation Instructions</title> @@ -447,8 +447,9 @@ su - postgres <term>--enable-recode</term> <listitem> <para> - Enables character set recode support. See - <filename>doc/README.Charsets</> for details on this feature. + Enables single-byte character set recode support. See + <![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]> + <![%flattext-install-ignore[<xref linkend="recode">]]> about this feature. </para> </listitem> </varlistentry> @@ -459,7 +460,10 @@ su - postgres <para> Allows the use of multibyte character encodings. This is primarily for languages like Japanese, Korean, and Chinese. - Read <filename>doc/README.mb</> for details. + Read + <![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]> + <![%flattext-install-ignore[<xref linkend="multibyte">]]> + for details. 
</para> </listitem> </varlistentry> diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 19f93c5aae3..f3fa3912d4f 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 thomas Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.42 2000/09/30 16:58:20 petere Exp $ --> <!doctype set PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [ @@ -173,9 +173,9 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 th --> &installation; &installw; - &charset; &runtime; &client-auth; + &charset; &manage-ag; &user-manag; &backup; diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 5ae49bfdd5b..b27b13294af 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -1,5 +1,5 @@ <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.25 2000/09/29 20:21:34 petere Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.26 2000/09/30 16:58:20 petere Exp $ --> <Chapter Id="runtime"> @@ -1553,126 +1553,6 @@ set semsys:seminfo_semmsl=32 </sect1> - <sect1 id="locale"> - <title>Locale Support</title> - - <note> - <title>Acknowledgement</title> - <para> - Written by Oleg Bartunov. See <ulink - url="http://www.sai.msu.su/~megera/postgres/">Oleg's web - page</ulink> for additional information on locale and Russian - language support. - </para> - </note> - - <para> - While doing a project for a company in Moscow, Russia, I - encountered the problem that <productname>Postgres</> had no - support of national alphabets. After looking for possible - workarounds I decided to develop support of locale myself. I'm not - a C programmer but already had some experience with locale - programming when I work with <productname>Perl</> (debugging) and - <productname>Glimpse</>. 
After several days of digging through the - <productname>Postgres</> source tree I made very minor corections - to <filename>src/backend/utils/adt/varlena.c</> and - <filename>src/backend/main/main.c</> and got what I needed! I did - support only for <envar>LC_CTYPE</envar> and - <envar>LC_COLLATE</envar>, but later <envar>LC_MONETARY</envar> was - added by others. I got many messages from people about this patch - so I decided to send it to developers and (to my surprise) it was - incorporated into the <productname>Postgres</> distribution. - </para> - - <para> - People often complain that locale doesn't work for them. There are - several common mistakes: - - <itemizedlist> - <listitem> - <para> - Didn't properly configure <productname>Postgres</> before - compilation. You must run <filename>configure</> with the - <option>--enable-locale</> option to enable locale support. - </para> - </listitem> - - <listitem> - <para> - Didn't setup environment correctly when starting postmaster. You - must define environment variables <envar>LC_CTYPE</envar> and - <envar>LC_COLLATE</envar> before running postmaster because - backend gets information about locale from environment. I use - following shell script: -<programlisting> -#!/bin/sh - -export LC_CTYPE=koi8-r -export LC_COLLATE=koi8-r -postmaster -B 1024 -S -D/usr/local/pgsql/data/ -o '-Fe' -</programlisting> - </para> - </listitem> - - <listitem> - <para> - Broken locale support in the operating system (for example, - locale support in libc under Linux several times has changed and - this caused a lot of problems). Perl has also support of locale - and if locale is broken <command>perl -v</> will complain - something like: -<screen> -<prompt>$</> <userinput>export LC_CTYPE='not_exist'</> -<prompt>$</> <userinput>perl -v</> -<computeroutput> -perl: warning: Setting locale failed. 
-perl: warning: Please check that your locale settings: -LC_ALL = (unset), -LC_CTYPE = "not_exist", -LANG = (unset) -are supported and installed on your system. -perl: warning: Falling back to the standard locale ("C"). -</computeroutput> -</screen> - </para> - </listitem> - - <listitem> - <para> - Wrong location of locale files. Possible locations include: - <filename>/usr/lib/locale</filename> (Linux, Solaris), - <filename>/usr/share/locale</filename> (Linux), - <filename>/usr/lib/nls/loc</filename> (DUX 4.0). - - Check <command>man locale</command> to find the correct - location. Under Linux I made a symbolic link between - <filename>/usr/lib/locale</filename> and - <filename>/usr/share/locale</filename> to be sure that the next - libc will not break my locale. - </para> - </listitem> - </itemizedlist> - </para> - - <formalpara> - <title>What are the Benefits?</title> - <para> - You can use ~* and order by operators for strings contain - characters from national alphabets. Non-english users definitely - need that. - </para> - </formalpara> - - <formalpara> - <title>What are the Drawbacks?</title> - <para> - There is one evident drawback of using locale - its speed! So, use - locale only if you really need it. - </para> - </formalpara> - </sect1> - - <sect1 id="postmaster-shutdown"> <title>Shutting down the server</title> |