path: root/src/include/common/unicode_combining_table.h
Commit message | Author | Age
* Treat Unicode codepoints of category "Format" as non-spacing | John Naylor | 2022-09-13
  Commit d8594d123 updated the list of non-spacing codepoints used for calculating display width, but in doing so inadvertently removed some, since the script used for that commit only considered combining characters. For complete coverage of zero-width characters, include codepoints in the category Cf (Format). To reflect the wider purpose, also rename files and update comments that referred specifically to combining characters.
  Some of these ranges have been missing since v12, but due to lack of field complaints it was determined not important enough to justify adding special-case logic to the back branches.
  Kyotaro Horiguchi
  Report by Pavel Stehule
  Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRBE8yvpQ0FSkPCoe0Ny1jAAsAQ6j3qMgVwWvkqAoaaNmQ%40mail.gmail.com
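The rule in the commit above can be sketched as a lookup in a sorted interval table: a codepoint that is a combining mark (assumed here to mean category Mn) or a Format character (Cf) gets a display width of zero. The struct name follows the mbinterval type mentioned elsewhere in this log, but its layout, the short hand-picked table excerpt, and the helpers in_table() and display_width() are assumptions for illustration, not PostgreSQL's code.

```c
#include <stddef.h>

/* Assumed layout for illustration; named after the mbinterval type the log mentions. */
struct mbinterval
{
	unsigned int first;			/* first codepoint in the range */
	unsigned int last;			/* last codepoint in the range, inclusive */
};

/* Illustrative excerpt only, sorted and non-overlapping; not the generated table. */
static const struct mbinterval nonspacing[] = {
	{0x0300, 0x036F},			/* Mn: combining diacritical marks */
	{0x200B, 0x200F},			/* Cf: zero width space .. right-to-left mark */
	{0xFEFF, 0xFEFF},			/* Cf: zero width no-break space */
	{0xE0001, 0xE0001},			/* Cf: language tag */
	{0xE0020, 0xE007F},			/* Cf: tag characters */
};

/* Binary search: return 1 if ucs lies inside one of the n intervals, else 0. */
static int
in_table(unsigned int ucs, const struct mbinterval *table, size_t n)
{
	size_t		lo = 0;
	size_t		hi = n;			/* search the half-open index range [lo, hi) */

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (ucs > table[mid].last)
			lo = mid + 1;
		else if (ucs < table[mid].first)
			hi = mid;
		else
			return 1;
	}
	return 0;
}

/* Simplified rule: non-spacing codepoints take 0 columns, all others 1 (wide chars ignored). */
static int
display_width(unsigned int ucs)
{
	return in_table(ucs, nonspacing, sizeof(nonspacing) / sizeof(nonspacing[0])) ? 0 : 1;
}
```

For instance, display_width(0x0301) (COMBINING ACUTE ACCENT) and display_width(0x200D) (ZERO WIDTH JOINER) both evaluate to 0, while display_width(0x0041) evaluates to 1. Without the Cf ranges in the table, U+200D would be counted as one column.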
* Update Unicode data to Unicode 14.0.0 | Peter Eisentraut | 2021-09-15
* Extend collection of Unicode combining characters to beyond the BMP | John Naylor | 2021-08-26
  The former limit was perhaps a carryover from an older hand-coded table. Since commit bab982161 we have enough space in mbinterval to store larger codepoints, so collect all combining characters.
  Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com
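The log does not say what the earlier field width was. Assuming a table limited to 16-bit fields, the sketch below shows why such a table cannot hold combining characters beyond the BMP: converting a larger codepoint to a 16-bit field keeps only its low 16 bits. The struct names and to_narrow() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical BMP-only layout: 16-bit fields, assumed for illustration. */
struct narrow_interval
{
	uint16_t	first;
	uint16_t	last;
};

/* Hypothetical wider layout: 32-bit fields hold any codepoint up to U+10FFFF. */
struct wide_interval
{
	uint32_t	first;
	uint32_t	last;
};

/* U+1D167..U+1D169: combining musical tremolo marks (category Mn), outside the BMP. */
static const struct wide_interval tremolo = {0x1D167, 0x1D169};

/* Copy a range into 16-bit fields; the casts keep only the low 16 bits (modulo 2^16). */
static struct narrow_interval
to_narrow(struct wide_interval w)
{
	struct narrow_interval n = {(uint16_t) w.first, (uint16_t) w.last};

	return n;
}
```

With 16-bit fields, the range U+1D167..U+1D169 becomes U+D167..U+D169, which lies in the Hangul Syllables block (U+AC00..U+D7A3), so ordinary Hangul syllables would wrongly be treated as zero-width.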
* Revert "Rename unicode_combining_table to unicode_width_table" | John Naylor | 2021-08-26
  This reverts commit eb0d0d2c7300c9c5c22b35975c11265aa4becc84.
  After I had committed eb0d0d2c7 and 78ab944cd, I decided to add a sanity check for a "can't happen" scenario just to be cautious. It turned out that the scenario already occurs in the official Unicode source data: a character can be both wide and a combining character. This fact renders the aforementioned commits unnecessary, so revert both of them.
  Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
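The sanity check itself is not shown in this log. One way to express the idea is to treat the wide characters and the combining characters as two sorted interval lists and test them for intersection with a two-pointer walk, as sketched below. The struct, find_overlap(), and the sample ranges are assumptions; the ranges are simplified rather than copied from the Unicode data files, and are chosen to resemble the case the commit describes, such as the ideographic tone marks U+302A..U+302D, which UnicodeData.txt lists as nonspacing marks and EastAsianWidth.txt lists as wide.

```c
#include <stddef.h>

struct interval
{
	unsigned int first;
	unsigned int last;			/* inclusive */
};

/*
 * Hypothetical sanity check: walk two sorted, non-overlapping interval lists
 * in step and report the first codepoint range present in both.
 * Returns 1 and fills *overlap if one is found, otherwise 0.
 */
static int
find_overlap(const struct interval *a, size_t na,
			 const struct interval *b, size_t nb,
			 struct interval *overlap)
{
	size_t		i = 0;
	size_t		j = 0;

	while (i < na && j < nb)
	{
		unsigned int lo = a[i].first > b[j].first ? a[i].first : b[j].first;
		unsigned int hi = a[i].last < b[j].last ? a[i].last : b[j].last;

		if (lo <= hi)
		{
			overlap->first = lo;
			overlap->last = hi;
			return 1;
		}
		/* No intersection: advance past whichever interval ends first. */
		if (a[i].last < b[j].last)
			i++;
		else
			j++;
	}
	return 0;
}

/* Simplified, illustrative ranges; not copied from the Unicode data files. */
static const struct interval wide_sample[] = {
	{0x1100, 0x115F},			/* Hangul Jamo leading consonants */
	{0x3000, 0x303E},			/* CJK symbols and punctuation (simplified) */
};
static const struct interval combining_sample[] = {
	{0x0300, 0x036F},			/* combining diacritical marks */
	{0x302A, 0x302D},			/* ideographic tone marks */
	{0x3099, 0x309A},			/* combining kana voiced sound marks */
};
```

With these samples, find_overlap(wide_sample, 2, combining_sample, 3, &ov) returns 1 and sets ov to U+302A..U+302D: a "can't happen" overlap that turns out to exist in the data.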
* Rename unicode_combining_table to unicode_width_table | John Naylor | 2021-08-25
  No functional changes. A future commit will use this table for purposes other than combining characters.
* Update Unicode data to Unicode 13.0.0 and CLDR 37 | Peter Eisentraut | 2020-04-24
* Add support for automatically updating Unicode derived files | Peter Eisentraut | 2020-01-09
  We currently have several sets of files generated from data provided by Unicode. These all have ad hoc rules and instructions for updating when new Unicode versions appear, and it's not done consistently.
  This patch centralizes and automates the process and makes it part of the release checklist. The Unicode and CLDR versions are specified in Makefile.global.in. There is a new make target "update-unicode" that downloads all the relevant files and runs the generation script.
  There is also a new script for generating the table of combining characters for ucs_wcwidth(). That table is now in a separate include file rather than hardcoded into the middle of other code. This is based on the script that was used for generating d8594d123c155aeecd47fc2450f62f5100b2fbf0, but the script itself wasn't committed at that time.
  Reviewed-by: John Naylor <john.naylor@2ndquadrant.com>
  Discussion: https://www.postgresql.org/message-id/flat/c8d05f42-443e-6c23-819b-05b31759a37c@2ndquadrant.com
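The generation script is not shown in this log. As a rough illustration of what generating such a table involves, the C sketch below reads lines in the semicolon-separated UnicodeData.txt format (field 0 is the hexadecimal codepoint, field 2 the general category), keeps codepoints whose category is Mn or Me, and merges consecutive codepoints into ranges. The real script is not necessarily written in C, and the category selection, the cp_range struct, and the function names are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct cp_range
{
	unsigned int first;
	unsigned int last;			/* inclusive */
};

/* Assumption: Mn (nonspacing mark) and Me (enclosing mark) count as combining. */
static int
is_combining_category(const char *gc)
{
	return strcmp(gc, "Mn") == 0 || strcmp(gc, "Me") == 0;
}

/*
 * Scan UnicodeData.txt-style lines, which are sorted by codepoint, and merge
 * runs of consecutive combining codepoints into ranges. Returns the number of
 * ranges stored in out; ranges beyond max_out are dropped.
 */
static size_t
collect_ranges(const char *const *lines, size_t nlines,
			   struct cp_range *out, size_t max_out)
{
	size_t		n = 0;

	for (size_t i = 0; i < nlines; i++)
	{
		const char *line = lines[i];
		const char *end0 = strchr(line, ';');	/* end of codepoint field */
		const char *end1 = end0 ? strchr(end0 + 1, ';') : NULL;	/* end of name field */
		char		gc[3];
		unsigned int cp;

		/* The category field must be exactly two characters followed by ';'. */
		if (end1 == NULL || end1[1] == '\0' || end1[2] == '\0' || end1[3] != ';')
			continue;
		gc[0] = end1[1];
		gc[1] = end1[2];
		gc[2] = '\0';
		if (!is_combining_category(gc))
			continue;

		cp = (unsigned int) strtoul(line, NULL, 16);
		if (n > 0 && out[n - 1].last + 1 == cp)
			out[n - 1].last = cp;	/* extend the current run */
		else if (n < max_out)
		{
			out[n].first = cp;	/* start a new run */
			out[n].last = cp;
			n++;
		}
	}
	return n;
}

/* Print ranges as C initializer rows, the shape a generated header might use. */
static void
print_ranges(FILE *fp, const struct cp_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		fprintf(fp, "\t{0x%04X, 0x%04X},\n", r[i].first, r[i].last);
}
```

Given the UnicodeData.txt lines for U+0041 (Lu), U+0300..U+0302 (Mn), and U+0488..U+0489 (Me), collect_ranges() yields two ranges, which print_ranges() writes as {0x0300, 0x0302}, and {0x0488, 0x0489},.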