Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/catalogs.sgml            | 16
-rw-r--r-- | doc/src/sgml/charset.sgml             | 73
-rw-r--r-- | doc/src/sgml/indices.sgml             |  6
-rw-r--r-- | doc/src/sgml/ref/create_database.sgml | 45
-rw-r--r-- | doc/src/sgml/ref/initdb.sgml          | 41
-rw-r--r-- | doc/src/sgml/ref/pg_controldata.sgml  |  4
-rw-r--r-- | doc/src/sgml/ref/pg_resetxlog.sgml    | 14
-rw-r--r-- | doc/src/sgml/ref/select.sgml          |  5
-rw-r--r-- | doc/src/sgml/ref/show.sgml            | 10
-rw-r--r-- | doc/src/sgml/runtime.sgml             | 13
-rw-r--r-- | doc/src/sgml/textsearch.sgml          |  4
11 files changed, 137 insertions, 94 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 40f1ce568ed..bf1ac314f73 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.175 2008/09/19 19:03:40 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.176 2008/09/23 09:20:33 heikki Exp $ -->
 <!--
  Documentation of the system catalogs, directed toward PostgreSQL developers
  -->
@@ -2150,6 +2150,20 @@
  </row>

  <row>
+ <entry><structfield>datcollate</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>LC_COLLATE for this database</entry>
+ </row>
+
+ <row>
+ <entry><structfield>datctype</structfield></entry>
+ <entry><type>name</type></entry>
+ <entry></entry>
+ <entry>LC_CTYPE for this database</entry>
+ </row>
+
+ <row>
  <entry><structfield>datistemplate</structfield></entry>
  <entry><type>bool</type></entry>
  <entry></entry>
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1f4866b203c..c012294ef81 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.87 2008/07/15 17:45:03 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.88 2008/09/23 09:20:34 heikki Exp $ -->

 <chapter id="charset">
  <title>Localization</>
@@ -130,23 +130,23 @@ initdb --locale=sv_SE

  <para>
  The nature of some locale categories is that their value has to be
- fixed for the lifetime of a database cluster. That is, once
- <command>initdb</command> has run, you cannot change them anymore.
- <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> are
- those categories. They affect the sort order of indexes, so they
- must be kept fixed, or indexes on text columns will become corrupt.
- <productname>PostgreSQL</productname> enforces this by recording
- the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</> that are
- seen by <command>initdb</>. The server automatically adopts
- those two values when it is started.
+ fixed when the database is created. You can use different settings
+ for different databases, but once a database is created, you cannot
+ change them for that database anymore. <literal>LC_COLLATE</literal>
+ and <literal>LC_CTYPE</literal> are those categories. They affect
+ the sort order of indexes, so they must be kept fixed, or indexes on
+ text columns will become corrupt. The default values for these
+ categories are defined when <command>initdb</command> is run, and
+ those values are used when new databases are created, unless
+ specified otherwise in the <command>CREATE DATABASE</command> command.
  </para>

  <para>
  The other locale categories can be changed as desired whenever the
  server is running by setting the run-time configuration variables
  that have the same name as the locale categories (see <xref
- linkend="runtime-config-client-format"> for details). The defaults that are
- chosen by <command>initdb</command> are actually only written into
+ linkend="runtime-config-client-format"> for details). The defaults
+ that are chosen by <command>initdb</command> are actually only written into
  the configuration file <filename>postgresql.conf</filename> to
  serve as defaults when the server is started. If you delete these
  assignments from <filename>postgresql.conf</filename> then the
@@ -261,7 +261,7 @@ initdb --locale=sv_SE

  <para>
  Check that <productname>PostgreSQL</> is actually using the locale
- that you think it is. <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+ that you think it is. The default <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
  settings are determined at <command>initdb</> time and cannot
  be changed without repeating <command>initdb</>. Other locale
  settings including <envar>LC_MESSAGES</> and <envar>LC_MONETARY</>
@@ -319,17 +319,11 @@ initdb --locale=sv_SE
  </para>

  <para>
- An important restriction, however, is that each database character set
- must be compatible with the server's <envar>LC_CTYPE</> setting.
+ An important restriction, however, is that each database's character set
+ must be compatible with the database's <envar>LC_CTYPE</> setting.
  When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any
  character set is allowed, but for other settings of <envar>LC_CTYPE</>
  there is only one character set that will work correctly.
- Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the
- apparent flexibility to use different encodings in different databases
- of a cluster is more theoretical than real, except when you select
- <literal>C</> or <literal>POSIX</> locale (thus disabling any real locale
- awareness). It is likely that these mechanisms will be revisited in future
- versions of <productname>PostgreSQL</productname>.
  </para>

  <sect2 id="multibyte-charset-supported">
@@ -734,19 +728,19 @@ initdb -E EUC_JP
  </para>

  <para>
- If you have selected <literal>C</> or <literal>POSIX</> locale,
- you can create a database with a different character set:
+ You can specify a non-default encoding at database creation time,
+ provided that the encoding is compatible with the selected locale:

 <screen>
-createdb -E EUC_KR korean
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
 </screen>

  This will create a database named <literal>korean</literal> that
- uses the character set <literal>EUC_KR</literal>. Another way to
- accomplish this is to use this SQL command:
+ uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>.
+ Another way to accomplish this is to use this SQL command:

 <programlisting>
-CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0;
 </programlisting>

  The encoding for a database is stored in the system catalog
@@ -756,20 +750,17 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';

 <screen>
 $ <userinput>psql -l</userinput>
-            List of databases
-   Database    |  Owner  |   Encoding
----------------+---------+---------------
- euc_cn        | t-ishii | EUC_CN
- euc_jp        | t-ishii | EUC_JP
- euc_kr        | t-ishii | EUC_KR
- euc_tw        | t-ishii | EUC_TW
- mule_internal | t-ishii | MULE_INTERNAL
- postgres      | t-ishii | EUC_JP
- regression    | t-ishii | SQL_ASCII
- template1     | t-ishii | EUC_JP
- test          | t-ishii | EUC_JP
- utf8          | t-ishii | UTF8
-(9 rows)
+                                         List of databases
+   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
+-----------+----------+-----------+-------------+-------------+-------------------------------------
+ clocaledb | hlinnaka | SQL_ASCII | C           | C           |
+ englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
+ japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
+ korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
+ postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
+ template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+ template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+(7 rows)
 </screen>
  </para>

diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 2ab713c39be..0993a8be03f 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.74 2008/07/11 21:06:28 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.75 2008/09/23 09:20:34 heikki Exp $ -->

 <chapter id="indexes">
  <title id="indexes-title">Indexes</title>
@@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id);
  <emphasis>if</emphasis> the pattern is a constant and is anchored to
  the beginning of the string &mdash; for example, <literal>col LIKE
  'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
- <literal>col LIKE '%bar'</literal>. However, if your server does not
+ <literal>col LIKE '%bar'</literal>. However, if your database does not
  use the C locale you will need to create the index with a special
  operator class to support indexing of pattern-matching queries. See
  <xref linkend="indexes-opclass"> below. It is also possible to use
@@ -922,7 +922,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
  according to the locale-specific collation rules. This makes
  these operator classes suitable for use by queries involving
  pattern matching expressions (<literal>LIKE</literal> or POSIX
- regular expressions) when the server does not use the standard
+ regular expressions) when the database does not use the standard
  <quote>C</quote> locale. As an example, you might index a
  <type>varchar</type> column like this:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index b1b13332456..5e72768981c 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.48 2007/09/28 22:25:49 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.49 2008/09/23 09:20:34 heikki Exp $
 PostgreSQL documentation
 -->

@@ -24,6 +24,8 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
  [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ]
  [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ]
  [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ]
+ [ COLLATE [=] <replaceable class="parameter">collate</replaceable> ]
+ [ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ]
  [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ]
  [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ]
 </synopsis>
@@ -113,6 +115,29 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
  </listitem>
  </varlistentry>
  <varlistentry>
+ <term><replaceable class="parameter">collate</replaceable></term>
+ <listitem>
+ <para>
+ Collation order (<literal>LC_COLLATE</>) to use in the new database.
+ This affects the sort order applied to strings, e.g in queries with
+ ORDER BY, as well as the order used in indexes on text columns.
+ The default is to use the collation order of the template database.
+ See below for additional restrictions.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><replaceable class="parameter">ctype</replaceable></term>
+ <listitem>
+ <para>
+ Character classification (<literal>LC_CTYPE</>) to use in the new
+ database. This affects the categorization of characters, e.g. lower,
+ upper and digit. The default is to use the character classification of
+ the template database. See below for additional restrictions.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
  <term><replaceable class="parameter">tablespace</replaceable></term>
  <listitem>
  <para>
@@ -180,13 +205,11 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
  </para>

  <para>
- Any character set encoding specified for the new database must be
- compatible with the server's <envar>LC_CTYPE</> locale setting.
+ The character set encoding specified for the new database must be
+ compatible with the chosen COLLATE and CTYPE settings.
  If <envar>LC_CTYPE</> is <literal>C</> (or equivalently
  <literal>POSIX</>), then all encodings are allowed, but for other
- locale settings there is only one encoding that will work properly,
- and so the apparent freedom to specify an encoding is illusory if
- you didn't initialize the database cluster in <literal>C</> locale.
+ locale settings there is only one encoding that will work properly.
  <command>CREATE DATABASE</> will allow superusers to specify
  <literal>SQL_ASCII</> encoding regardless of the locale setting,
  but this choice is deprecated and may result in misbehavior of
@@ -195,6 +218,16 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
  </para>

  <para>
+ The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+ those of the template database, except when template0 is used as
+ template. This is because <literal>COLLATE</> and <literal>CTYPE</>
+ affects the ordering in indexes, so that any indexes copied from the
+ template database would be invalid in the new database with different
+ settings. <literal>template0</literal>, however, is known to not
+ contain any indexes that would be affected.
+ </para>
+
+ <para>
  The <literal>CONNECTION LIMIT</> option is only enforced approximately;
  if two new sessions start at about the same time when just one
  connection <quote>slot</> remains for the database, it is possible that
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 312da7085a9..110c21eb8c5 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/initdb.sgml,v 1.43 2007/03/26 17:23:36 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/initdb.sgml,v 1.44 2008/09/23 09:20:34 heikki Exp $
 PostgreSQL documentation
 -->

@@ -76,25 +76,34 @@ PostgreSQL documentation

  <para>
  <command>initdb</command> initializes the database cluster's default
- locale and character set encoding. The collation order
- (<literal>LC_COLLATE</>) and character set classes
- (<literal>LC_CTYPE</>, e.g. upper, lower, digit) are fixed for all
- databases and cannot be changed. Collation orders other than
- <literal>C</> or <literal>POSIX</> also have a performance penalty.
- For these reasons it is important to choose the right locale when
- running <command>initdb</command>. The remaining locale categories
- can be changed later when the server is started. All server locale
- values (<literal>lc_*</>) can be displayed via <command>SHOW ALL</>.
+ locale and character set encoding. The character set encoding,
+ collation order (<literal>LC_COLLATE</>) and character set classes
+ (<literal>LC_CTYPE</>, e.g. upper, lower, digit) can be set separately
+ for a database when it is created. <command>initdb</command> determines
+ those settings for the <literal>template1</literal> database, which will
+ serve as the default for all other databases.
+ </para>
+
+ <para>
+ To alter the default collation order or character set classes, use the
+ <option>--lc-collate</option> and <option>--lc-ctype</option> options.
+ Collation orders other than <literal>C</> or <literal>POSIX</> also have
+ a performance penalty. For these reasons it is important to choose the
+ right locale when running <command>initdb</command>.
+ </para>
+
+ <para>
+ The remaining locale categories can be changed later when the server
+ is started. You can also use <option>--locale</option> to set the
+ default for all locale categories, including collation order and
+ character set classes. All server locale values (<literal>lc_*</>) can
+ be displayed via <command>SHOW ALL</>.
  More details can be found in <xref linkend="locale">.
  </para>

  <para>
- The character set encoding can be set separately for a database when
- it is created. <command>initdb</command> determines the encoding for
- the <literal>template1</literal> database, which will serve as the
- default for all other databases. To alter the default encoding use
- the <option>--encoding</option> option. More details can be found in
- <xref linkend="multibyte">.
+ To alter the default encoding, use the <option>--encoding</option>.
+ More details can be found in <xref linkend="multibyte">.
  </para>

  </refsect1>
diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml
index 466c03e2244..62695963e2b 100644
--- a/doc/src/sgml/ref/pg_controldata.sgml
+++ b/doc/src/sgml/ref/pg_controldata.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/pg_controldata.sgml,v 1.10 2007/02/20 18:10:58 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/pg_controldata.sgml,v 1.11 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->

@@ -30,7 +30,7 @@ PostgreSQL documentation
  <title>Description</title>
  <para>
  <command>pg_controldata</command> prints information initialized during
- <command>initdb</>, such as the catalog version and server locale.
+ <command>initdb</>, such as the catalog version.
  It also shows information about write-ahead logging and checkpoint
  processing. This information is cluster-wide, and not specific to any one
  database.
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 588ff38c1bb..a9d34298e4c 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/pg_resetxlog.sgml,v 1.20 2007/01/31 23:26:04 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/pg_resetxlog.sgml,v 1.21 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->

@@ -62,14 +62,10 @@ PostgreSQL documentation
  by specifying the <literal>-f</> (force) switch. In this case plausible
  values will be substituted for the missing data. Most of the fields can be
  expected to match, but manual assistance might be needed for the next OID,
- next transaction ID and epoch, next multitransaction ID and offset,
- WAL starting address, and database locale fields.
- The first six of these can be set using the switches discussed below.
- <command>pg_resetxlog</command>'s own environment is the source for its
- guess at the locale fields; take care that <envar>LANG</> and so forth
- match the environment that <command>initdb</> was run in.
- If you are not able to determine correct values for all these fields,
- <literal>-f</> can still be used, but
+ next transaction ID and epoch, next multitransaction ID and offset, and
+ WAL starting address fields. These fields can be set using the switches
+ discussed below. If you are not able to determine correct values for all
+ these fields, <literal>-f</> can still be used, but
  the recovered database must be treated with even more suspicion than
  usual: an immediate dump and reload is imperative. <emphasis>Do not</>
  execute any data-modifying operations in the database before you dump,
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 000b5614dd2..d8ed7aef9c6 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.103 2008/02/15 22:17:06 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.104 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->

@@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code;

  <para>
  Character-string data is sorted according to the locale-specific
- collation order that was established when the database cluster
- was initialized.
+ collation order that was established when the database was created.
  </para>
  </refsect2>

diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index ebd1acee35a..fdc348053ea 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/show.sgml,v 1.45 2008/01/03 21:23:15 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/show.sgml,v 1.46 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->

@@ -82,8 +82,8 @@ SHOW ALL
  <para>
  Shows the database's locale setting for collation (text
  ordering). At present, this parameter can be shown but not
- set, because the setting is determined at
- <command>initdb</> time.
+ set, because the setting is determined at database creation
+ time.
  </para>
  </listitem>
  </varlistentry>
@@ -94,8 +94,8 @@ SHOW ALL
  <para>
  Shows the database's locale setting for character
  classification. At present, this parameter can be shown but
- not set, because the setting is determined at
- <command>initdb</> time.
+ not set, because the setting is determined at database creation
+ time.
  </para>
  </listitem>
  </varlistentry>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 75c6d266e9d..adde49e1a39 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.416 2008/04/26 22:47:40 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.417 2008/09/23 09:20:34 heikki Exp $ -->

 <chapter Id="runtime">
  <title>Operating System Environment</title>
@@ -145,11 +145,12 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
  Normally, it will just take the locale settings in the environment
  and apply them to the initialized database. It is possible to
  specify a different locale for the database; more information about
- that can be found in <xref linkend="locale">. The sort order used
- within a particular database cluster is set by
- <command>initdb</command> and cannot be changed later, short of
- dumping all data, rerunning <command>initdb</command>, and reloading
- the data. There is also a performance impact for using locales
+ that can be found in <xref linkend="locale">. The default sort order used
+ within the particular database cluster is set by
+ <command>initdb</command>, and while you can create new databases using
+ different sort order, the order used in the template databases that initdb
+ creates cannot be changed without dropping and recreating them.
+ There is also a performance impact for using locales
  other than <literal>C</> or <literal>POSIX</>. Therefore, it is
  important to make this choice correctly the first time.
  </para>
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 41db566b6cc..45a9f5a389f 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.44 2008/05/16 16:31:01 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.45 2008/09/23 09:20:34 heikki Exp $ -->

 <chapter id="textsearch">
  <title id="textsearch-title">Full Text Search</title>
@@ -1896,7 +1896,7 @@ LIMIT 10;

  <note>
  <para>
- The parser's notion of a <quote>letter</> is determined by the server's
+ The parser's notion of a <quote>letter</> is determined by the database's
  locale setting, specifically <varname>lc_ctype</>. Words containing
  only the basic ASCII letters are reported as a separate token type,
  since it is sometimes useful to distinguish them. In most European