doc/src/sgml/README.non-ASCII


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38

<!-- doc/src/sgml/README.non-ASCII -->

Representation of non-ASCII characters
--------------------------------------

Find non-ASCII characters using:

        grep --recursive --color='auto' -P '[\x80-\xFF]' .

Convert to HTML4 named entity (&) escapes
-----------------------------------------

We support several output formats:

*  html (supports all Unicode characters)
*  man (supports all Unicode characters)
*  pdf (supports only Latin-1 characters)
*  info

While some output formatting tools support all Unicode characters,
others only support Latin-1 characters.  Specifically, the PDF rendering
engine can only display Latin-1 characters;  non-Latin-1 Unicode
characters are displayed as "###".

Therefore, in the SGML files, we can only use Latin-1 characters.  We
can use UTF8 representations of Latin-1 characters, or HTML entities of
Latin-1 characters, e.g., &Aacute;lvaro.

Do not use UTF numeric character escapes (&#nnn;).

When building the PDF docs, problem characters will appear as warnings.

HTML entities
        official:      http://www.w3.org/TR/html4/sgml/entities.html
        one page:      http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
        other lists:   http://www.zipcon.net/~swhite/docs/computers/browsers/entities.html
                       http://www.zipcon.net/~swhite/docs/computers/browsers/entities_page.html
                       https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references