aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/mb/Unicode/utf8_to_win1258.map
Commit message (Collapse)AuthorAge
* Update headers of generated filesPeter Eisentraut2018-02-24
| | | | | The scripts were changed in c98c35cd084a25c6cf9b08c76de8b89facd75fe7, but the output files were not updated to reflect the script changes.
* Include array size in forward declaration.Heikki Linnakangas2017-03-13
| | | | | Some compilers require it. At least Visual Studio, according to the buildfarm, and gcc with the -pedantic flag.
* Use radix tree for character encoding conversions.Heikki Linnakangas2017-03-13
| | | | | | | | | | | | | | | | | | | Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller. The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files. Patch by Kyotaro Horiguchi, with lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson. Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
* Auto-generate file header comments in Unicode mapping files.Tom Lane2015-11-27
| | | | | | | | | | | | | | | Some of the Unicode/*.map files had identification comments added to them, evidently by hand. Others did not. Modify the generating scripts to produce these comments automatically, and update the generated files that lacked them. This is just minor cleanup as a by-product of trying to verify that the *.map files can indeed be reproduced from authoritative data. There are a depressingly large number that fail to reproduce from the claimed sources. I have not touched those in this commit, except for the JIS 2004-related files which required only a single comment update to match. Since this only affects comments, no need to consider a back-patch.
* Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions.Tom Lane2015-05-14
| | | | | | | | | | | | | | | | | | | | | Until now, these functions have only supported encoding conversions using lookup tables, which is fine as long as there's not too many code points to convert. However, GB18030 expects all 1.1 million Unicode code points to be convertible, which would require a ridiculously-sized lookup table. Fortunately, a large fraction of those conversions can be expressed through arithmetic, ie the conversions are one-to-one in certain defined ranges. To support that, provide a callback function that is used after consulting the lookup tables. (This patch doesn't actually change anything about the GB18030 conversion behavior, just provide infrastructure for fixing it.) Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway, take the opportunity to rearrange their argument lists into what seems to me a saner order. And beautify the call sites by using lengthof() instead of error-prone sizeof() arithmetic. In passing, also mark all the lookup tables used by these calls "const". This moves an impressive amount of stuff into the text segment, at least on my machine, and is safer anyhow.
* Add support for Windows codepages 1253, 1254, 1255, and 1257 and cleanPeter Eisentraut2006-02-18
| | | | | | | | | | | | | | | | | | | | | up a bunch of the support utilities. In src/backend/utils/mb/Unicode remove nearly duplicate copies of the UCS_to_XXX perl script and replace with one version to handle all generic files. Update the Makefile so that it knows about all the map files. This produces a slight difference in some of the map files, using a uniform naming convention and not mapping the null character. In src/backend/utils/mb/conversion_procs create a master utf8<->win codepage function like the ISO 8859 versions instead of having a separate handler for each conversion. There is an externally visible change in the name of the win1258 to utf8 conversion. According to the documentation notes, it was named incorrectly and this changes it to a standard name. Running the Unicode mapping perl scripts has shown some additional mapping changes in koi8r and iso8859-7.
* Rename canonical encodings, per Peter:Bruce Momjian2005-03-07
UNICODE => UTF8 ALT => WIN866 WIN => WIN1251 TCVN => WIN1258 The old codes continue to work.