diff options
author | Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> | 2021-04-15 14:39:51 +0200 |
---|---|---|
committer | Qt Cherry-pick Bot <cherrypick_bot@qt-project.org> | 2021-04-16 22:01:45 +0000 |
commit | a3f75608f0e8339ae2a6ca4b34955d69fdbacc84 (patch) | |
tree | c4f4eb367486f8d97dfcf6d2c4fd73d90e915023 /examples | |
parent | b3fed16e9bf62d511f3c40e9c3dca01332ae8160 (diff) |
Unicode: fix the extended grapheme cluster algorithm
UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
Although Qt has upgraded the Unicode tables all the way up to
Unicode 13, the algorithm has never been adapted; in other words,
it has been working by chance for years. Luckily, MOST
of the cases were dealt with correctly, but emoji handling
actually manages to break it.
This commit:
* Adds parsing of emoji-data.txt into the unicode table generator.
That is necessary to extract the Extended_Pictographic property,
which is used by the EGC algorithm.
* Regenerates the tables.
* Removes some obsoleted grapheme cluster break properties, and
adds the ones added in the meanwhile.
* Rewrites the EGC algorithm according to Unicode 13. This is
done by simplifying a lot the lookup table. Some rules (GB11,
GB12, GB13) can't be done by the table alone so some hand-rolled
code is necessary in that case.
* Thanks to these fixes, the complete upstream GraphemeBreakTest
now passes. Remove the "edited" version that ignored some rows
(because they were failing).
Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
Fixes: QTBUG-92822
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
(cherry picked from commit a794c5e287381bd056008b20ae55f9b1e0acf138)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
Diffstat (limited to 'examples')
0 files changed, 0 insertions, 0 deletions