11 files changed, 137 insertions, 94 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 40f1ce568ed..bf1ac314f73 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.175 2008/09/19 19:03:40 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.176 2008/09/23 09:20:33 heikki Exp $ -->
 <!--
  Documentation of the system catalogs, directed toward PostgreSQL developers
  -->
@@ -2150,6 +2150,20 @@
      </row>
 
      <row>
+      <entry><structfield>datcollate</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry></entry>
+      <entry>LC_COLLATE for this database</entry>
+     </row>
+
+     <row>
+      <entry><structfield>datctype</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry></entry>
+      <entry>LC_CTYPE for this database</entry>
+     </row>
+
+     <row>
       <entry><structfield>datistemplate</structfield></entry>
       <entry><type>bool</type></entry>
       <entry></entry>
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1f4866b203c..c012294ef81 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.87 2008/07/15 17:45:03 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/charset.sgml,v 2.88 2008/09/23 09:20:34 heikki Exp $ -->
 
 <chapter id="charset">
  <title>Localization</>
@@ -130,23 +130,23 @@ initdb --locale=sv_SE
 
    <para>
     The nature of some locale categories is that their value has to be
-    fixed for the lifetime of a database cluster.  That is, once
-    <command>initdb</command> has run, you cannot change them anymore.
-    <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> are
-    those categories.  They affect the sort order of indexes, so they
-    must be kept fixed, or indexes on text columns will become corrupt.
-    <productname>PostgreSQL</productname> enforces this by recording
-    the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</> that are
-    seen by <command>initdb</>.  The server automatically adopts
-    those two values when it is started.
+    fixed when the database is created.  You can use different settings
+    for different databases, but once a database is created, you cannot
+    change them for that database anymore. <literal>LC_COLLATE</literal>
+    and <literal>LC_CTYPE</literal> are those categories.  They affect
+    the sort order of indexes, so they must be kept fixed, or indexes on
+    text columns will become corrupt.  The default values for these
+    categories are defined when <command>initdb</command> is run, and
+    those values are used when new databases are created, unless
+    specified otherwise in the <command>CREATE DATABASE</command> command.
    </para>
 
    <para>
     The other locale categories can be changed as desired whenever the
     server is running by setting the run-time configuration variables
     that have the same name as the locale categories (see <xref
-    linkend="runtime-config-client-format"> for details).  The defaults that are
-    chosen by <command>initdb</command> are actually only written into
+    linkend="runtime-config-client-format"> for details).  The defaults
+    that are chosen by <command>initdb</command> are actually only written into
     the configuration file <filename>postgresql.conf</filename> to
     serve as defaults when the server is started.  If you delete these
     assignments from <filename>postgresql.conf</filename> then the
@@ -261,7 +261,7 @@ initdb --locale=sv_SE
 
    <para>
     Check that <productname>PostgreSQL</> is actually using the locale
-    that you think it is.  <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+    that you think it is.  The default <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
     settings are determined at <command>initdb</> time and cannot be
     changed without repeating <command>initdb</>.  Other locale
     settings including <envar>LC_MESSAGES</> and <envar>LC_MONETARY</>
@@ -319,17 +319,11 @@ initdb --locale=sv_SE
   </para>
 
   <para>
-   An important restriction, however, is that each database character set
-   must be compatible with the server's <envar>LC_CTYPE</> setting.
+   An important restriction, however, is that each database's character set
+   must be compatible with the database's <envar>LC_CTYPE</> setting.
    When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any
    character set is allowed, but for other settings of <envar>LC_CTYPE</>
    there is only one character set that will work correctly.
-   Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the
-   apparent flexibility to use different encodings in different databases
-   of a cluster is more theoretical than real, except when you select
-   <literal>C</> or <literal>POSIX</> locale (thus disabling any real locale
-   awareness).  It is likely that these mechanisms will be revisited in future
-   versions of <productname>PostgreSQL</productname>.
   </para>
 
    <sect2 id="multibyte-charset-supported">
@@ -734,19 +728,19 @@ initdb -E EUC_JP
     </para>
 
     <para>
-     If you have selected <literal>C</> or <literal>POSIX</> locale,
-     you can create a database with a different character set:
+     You can specify a non-default encoding at database creation time,
+     provided that the encoding is compatible with the selected locale:
 
 <screen>
-createdb -E EUC_KR korean
+createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
 </screen>
 
      This will create a database named <literal>korean</literal> that
-     uses the character set <literal>EUC_KR</literal>.  Another way to
-     accomplish this is to use this SQL command:
+     uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>.
+     Another way to accomplish this is to use this SQL command:
 
 <programlisting>
-CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0;
 </programlisting>
 
      The encoding for a database is stored in the system catalog
@@ -756,20 +750,17 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR';
 
 <screen>
 $ <userinput>psql -l</userinput>
-            List of databases
-   Database    |  Owner  |   Encoding    
----------------+---------+---------------
- euc_cn        | t-ishii | EUC_CN
- euc_jp        | t-ishii | EUC_JP
- euc_kr        | t-ishii | EUC_KR
- euc_tw        | t-ishii | EUC_TW
- mule_internal | t-ishii | MULE_INTERNAL
- postgres      | t-ishii | EUC_JP
- regression    | t-ishii | SQL_ASCII
- template1     | t-ishii | EUC_JP
- test          | t-ishii | EUC_JP
- utf8          | t-ishii | UTF8
-(9 rows)
+                                         List of databases
+   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges          
+-----------+----------+-----------+-------------+-------------+-------------------------------------
+ clocaledb | hlinnaka | SQL_ASCII | C           | C           | 
+ englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  | 
+ japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  | 
+ korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr | 
+ postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | 
+ template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+ template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
+(7 rows)
 </screen>
     </para>
 
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 2ab713c39be..0993a8be03f 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.74 2008/07/11 21:06:28 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/indices.sgml,v 1.75 2008/09/23 09:20:34 heikki Exp $ -->
 
 <chapter id="indexes">
  <title id="indexes-title">Indexes</title>
@@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id);
    <emphasis>if</emphasis> the pattern is a constant and is anchored to
    the beginning of the string &mdash; for example, <literal>col LIKE
    'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
-   <literal>col LIKE '%bar'</literal>. However, if your server does not
+   <literal>col LIKE '%bar'</literal>. However, if your database does not
    use the C locale you will need to create the index with a special
    operator class to support indexing of pattern-matching queries. See
    <xref linkend="indexes-opclass"> below. It is also possible to use
@@ -922,7 +922,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
       according to the locale-specific collation rules.  This makes
       these operator classes suitable for use by queries involving
       pattern matching expressions (<literal>LIKE</literal> or POSIX
-      regular expressions) when the server does not use the standard
+      regular expressions) when the database does not use the standard
       <quote>C</quote> locale.  As an example, you might index a
       <type>varchar</type> column like this:
 <programlisting>
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index b1b13332456..5e72768981c 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.48 2007/09/28 22:25:49 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/create_database.sgml,v 1.49 2008/09/23 09:20:34 heikki Exp $
 PostgreSQL documentation
 -->
 
@@ -24,6 +24,8 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
     [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ]
            [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ]
            [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ]
+           [ COLLATE [=] <replaceable class="parameter">collate</replaceable> ]
+           [ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ]
            [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ]
            [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ]
 </synopsis>
@@ -113,6 +115,29 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
       </listitem>
      </varlistentry>
      <varlistentry>
+      <term><replaceable class="parameter">collate</replaceable></term>
+      <listitem>
+       <para>
+        Collation order (<literal>LC_COLLATE</>) to use in the new database.
+        This affects the sort order applied to strings, e.g in queries with
+        ORDER BY, as well as the order used in indexes on text columns.
+        The default is to use the collation order of the template database.
+        See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
+      <term><replaceable class="parameter">ctype</replaceable></term>
+      <listitem>
+       <para>
+        Character classification (<literal>LC_CTYPE</>) to use in the new
+        database. This affects the categorization of characters, e.g. lower,
+        upper and digit. The default is to use the character classification of
+        the template database. See below for additional restrictions.
+       </para>
+      </listitem>
+     </varlistentry>
+     <varlistentry>
       <term><replaceable class="parameter">tablespace</replaceable></term>
       <listitem>
        <para>
@@ -180,13 +205,11 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
   </para>
 
   <para>
-   Any character set encoding specified for the new database must be
-   compatible with the server's <envar>LC_CTYPE</> locale setting.
+   The character set encoding specified for the new database must be
+   compatible with the chosen COLLATE and CTYPE settings.
    If <envar>LC_CTYPE</> is <literal>C</> (or equivalently
    <literal>POSIX</>), then all encodings are allowed, but for other
-   locale settings there is only one encoding that will work properly,
-   and so the apparent freedom to specify an encoding is illusory if
-   you didn't initialize the database cluster in <literal>C</> locale.
+   locale settings there is only one encoding that will work properly.
    <command>CREATE DATABASE</> will allow superusers to specify
    <literal>SQL_ASCII</> encoding regardless of the locale setting,
    but this choice is deprecated and may result in misbehavior of
@@ -195,6 +218,16 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable>
   </para>
 
   <para>
+   The <literal>COLLATE</> and <literal>CTYPE</> settings must match
+   those of the template database, except when template0 is used as
+   template. This is because <literal>COLLATE</> and <literal>CTYPE</>
+   affects the ordering in indexes, so that any indexes copied from the
+   template database would be invalid in the new database with different
+   settings. <literal>template0</literal>, however, is known to not
+   contain any indexes that would be affected.
+  </para>
+
+  <para>
    The <literal>CONNECTION LIMIT</> option is only enforced approximately;
    if two new sessions start at about the same time when just one
    connection <quote>slot</> remains for the database, it is possible that
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 312da7085a9..110c21eb8c5 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/initdb.sgml,v 1.43 2007/03/26 17:23:36 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/initdb.sgml,v 1.44 2008/09/23 09:20:34 heikki Exp $
 PostgreSQL documentation
 -->
 
@@ -76,25 +76,34 @@ PostgreSQL documentation
 
   <para>
    <command>initdb</command> initializes the database cluster's default
-   locale and character set encoding. The collation order
-   (<literal>LC_COLLATE</>) and character set classes
-   (<literal>LC_CTYPE</>, e.g. upper, lower, digit) are fixed for all
-   databases and cannot be changed. Collation orders other than
-   <literal>C</> or <literal>POSIX</> also have a performance penalty.
-   For these reasons it is important to choose the right locale when
-   running <command>initdb</command>. The remaining locale categories
-   can be changed later when the server is started. All server locale
-   values (<literal>lc_*</>) can be displayed via <command>SHOW ALL</>.
+   locale and character set encoding. The character set encoding,
+   collation order (<literal>LC_COLLATE</>) and character set classes
+   (<literal>LC_CTYPE</>, e.g. upper, lower, digit) can be set separately
+   for a database when it is created. <command>initdb</command> determines
+   those settings for the <literal>template1</literal> database, which will
+   serve as the default for all other databases.
+  </para>
+
+  <para>
+   To alter the default collation order or character set classes, use the
+   <option>--lc-collate</option> and <option>--lc-ctype</option> options.
+   Collation orders other than <literal>C</> or <literal>POSIX</> also have
+   a performance penalty.  For these reasons it is important to choose the
+   right locale when running <command>initdb</command>. 
+  </para>
+
+  <para>
+   The remaining locale categories can be changed later when the server
+   is started.  You can also use <option>--locale</option> to set the
+   default for all locale categories, including collation order and
+   character set classes. All server locale values (<literal>lc_*</>) can
+   be displayed via <command>SHOW ALL</>.
    More details can be found in <xref linkend="locale">.
   </para>
 
   <para>
-   The character set encoding can be set separately for a database when
-   it is created. <command>initdb</command> determines the encoding for
-   the <literal>template1</literal> database, which will serve as the
-   default for all other databases. To alter the default encoding use
-   the <option>--encoding</option> option. More details can be found in
-   <xref linkend="multibyte">.
+   To alter the default encoding, use the <option>--encoding</option>.
+   More details can be found in <xref linkend="multibyte">.
   </para>
 
  </refsect1>
diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml
index 466c03e2244..62695963e2b 100644
--- a/doc/src/sgml/ref/pg_controldata.sgml
+++ b/doc/src/sgml/ref/pg_controldata.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/pg_controldata.sgml,v 1.10 2007/02/20 18:10:58 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/pg_controldata.sgml,v 1.11 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->
 
@@ -30,7 +30,7 @@ PostgreSQL documentation
   <title>Description</title>
   <para>
    <command>pg_controldata</command> prints information initialized during
-   <command>initdb</>, such as the catalog version and server locale.
+   <command>initdb</>, such as the catalog version.
    It also shows information about write-ahead logging and checkpoint 
    processing.  This information is cluster-wide, and not specific to any one
    database.
diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml
index 588ff38c1bb..a9d34298e4c 100644
--- a/doc/src/sgml/ref/pg_resetxlog.sgml
+++ b/doc/src/sgml/ref/pg_resetxlog.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/pg_resetxlog.sgml,v 1.20 2007/01/31 23:26:04 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/pg_resetxlog.sgml,v 1.21 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->
 
@@ -62,14 +62,10 @@ PostgreSQL documentation
    by specifying the <literal>-f</> (force) switch.  In this case plausible
    values will be substituted for the missing data.  Most of the fields can be
    expected to match, but manual assistance might be needed for the next OID,
-   next transaction ID and epoch, next multitransaction ID and offset,
-   WAL starting address, and database locale fields.
-   The first six of these can be set using the switches discussed below.
-   <command>pg_resetxlog</command>'s own environment is the source for its
-   guess at the locale fields; take care that <envar>LANG</> and so forth
-   match the environment that <command>initdb</> was run in.
-   If you are not able to determine correct values for all these fields,
-   <literal>-f</> can still be used, but
+   next transaction ID and epoch, next multitransaction ID and offset, and
+   WAL starting address fields. These fields can be set using the switches
+   discussed below. If you are not able to determine correct values for all
+   these fields, <literal>-f</> can still be used, but
    the recovered database must be treated with even more suspicion than
    usual: an immediate dump and reload is imperative.  <emphasis>Do not</>
    execute any data-modifying operations in the database before you dump,
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 000b5614dd2..d8ed7aef9c6 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.103 2008/02/15 22:17:06 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.104 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->
 
@@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code;
 
    <para>
     Character-string data is sorted according to the locale-specific
-    collation order that was established when the database cluster
-    was initialized.
+    collation order that was established when the database was created.
    </para>
   </refsect2>
 
diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index ebd1acee35a..fdc348053ea 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/show.sgml,v 1.45 2008/01/03 21:23:15 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/show.sgml,v 1.46 2008/09/23 09:20:35 heikki Exp $
 PostgreSQL documentation
 -->
 
@@ -82,8 +82,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for collation (text
           ordering).  At present, this parameter can be shown but not
-          set, because the setting is determined at
-          <command>initdb</> time.
+          set, because the setting is determined at database creation
+          time.
          </para>
         </listitem>
        </varlistentry>
@@ -94,8 +94,8 @@ SHOW ALL
          <para>
           Shows the database's locale setting for character
           classification.  At present, this parameter can be shown but
-          not set, because the setting is determined at
-          <command>initdb</> time.
+          not set, because the setting is determined at database creation
+          time.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 75c6d266e9d..adde49e1a39 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.416 2008/04/26 22:47:40 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/runtime.sgml,v 1.417 2008/09/23 09:20:34 heikki Exp $ -->
 
 <chapter Id="runtime">
  <title>Operating System Environment</title>
@@ -145,11 +145,12 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
    Normally, it will just take the locale settings in the environment
    and apply them to the initialized database.  It is possible to
    specify a different locale for the database; more information about
-   that can be found in <xref linkend="locale">.  The sort order used
-   within a particular database cluster is set by
-   <command>initdb</command> and cannot be changed later, short of
-   dumping all data, rerunning <command>initdb</command>, and reloading
-   the data. There is also a performance impact for using locales
+   that can be found in <xref linkend="locale">.  The default sort order used
+   within the particular database cluster is set by
+   <command>initdb</command>, and while you can create new databases using
+   different sort order, the order used in the template databases that initdb
+   creates cannot be changed without dropping and recreating them.
+   There is also a performance impact for using locales
    other than <literal>C</> or <literal>POSIX</>. Therefore, it is
    important to make this choice correctly the first time.
   </para>
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 41db566b6cc..45a9f5a389f 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.44 2008/05/16 16:31:01 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.45 2008/09/23 09:20:34 heikki Exp $ -->
 
 <chapter id="textsearch">
  <title id="textsearch-title">Full Text Search</title>
@@ -1896,7 +1896,7 @@ LIMIT 10;
 
   <note>
    <para>
-    The parser's notion of a <quote>letter</> is determined by the server's
+    The parser's notion of a <quote>letter</> is determined by the database's
     locale setting, specifically <varname>lc_ctype</>.  Words containing
     only the basic ASCII letters are reported as a separate token type,
     since it is sometimes useful to distinguish them.  In most European