path: root/util/android
author: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> 2021-04-15 14:39:51 +0200
committer: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org> 2021-04-19 18:40:50 +0000
commit: cf00353f303de3956a4116f4cec826e16aa6181f
tree: 116d7b4be1c8dd77ad135649c18415e1d5b6ea7b /util/android
parent: d232de70055fba74ce7d4baecc5fa3b6d53e400f
Unicode: fix the extended grapheme cluster algorithm
UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
Although Qt has upgraded the Unicode tables all the way up to Unicode 13,
the algorithm has never been adapted; in other words, it has been working
by chance for years. Luckily, MOST of the cases were dealt with correctly,
but emoji handling actually manages to break it.

This commit:

* Adds parsing of emoji-data.txt into the unicode table generator. That is
  necessary to extract the Extended_Pictographic property, which is used by
  the EGC algorithm.

* Regenerates the tables.

* Removes some obsoleted grapheme cluster break properties, and adds the
  ones added in the meanwhile.

* Rewrites the EGC algorithm according to Unicode 13. This is done by
  simplifying a lot the lookup table. Some rules (GB11, GB12, GB13) can't
  be done by the table alone so some hand-rolled code is necessary in that
  case.

* Thanks to these fixes, the complete upstream GraphemeBreakTest now
  passes. Remove the "edited" version that ignored some rows (because they
  were failing).

Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
Fixes: QTBUG-92822
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
(cherry picked from commit a794c5e287381bd056008b20ae55f9b1e0acf138)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
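
The commit message notes that GB11, GB12 and GB13 cannot be expressed by a pairwise lookup table alone. The sketch below illustrates why: those rules depend on what precedes the pair, so some running state is needed. It is not Qt's implementation. The `GB` enum and the `clusterBreaks` function are hypothetical names, the class set is reduced to what these rules need, and only GB9, GB11, GB12/GB13 and the default "break everywhere" rule are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced set of grapheme cluster break classes (not Qt's enum).
enum class GB { Other, Extend, ZWJ, RegionalIndicator, ExtendedPictographic };

// For each adjacent pair (classes[i], classes[i + 1]) the result holds true
// if a cluster boundary falls between them. Simplified: only GB9, GB11,
// GB12/GB13 and the default rule (GB999) are handled.
std::vector<bool> clusterBreaks(const std::vector<GB> &classes)
{
    std::vector<bool> breaks;
    bool inPictSequence = false; // seen ExtPict Extend* so far (state for GB11)
    bool zwjAfterPict = false;   // previous class is the ZWJ of ExtPict Extend* ZWJ
    std::size_t riRun = 0;       // length of the current Regional_Indicator run

    for (std::size_t i = 0; i + 1 < classes.size(); ++i) {
        const GB prev = classes[i];
        const GB next = classes[i + 1];

        // Update the state that a two-class table lookup cannot capture.
        if (prev == GB::ExtendedPictographic) {
            inPictSequence = true;
            zwjAfterPict = false;
        } else if (prev == GB::Extend && inPictSequence) {
            zwjAfterPict = false; // still inside ExtPict Extend*
        } else if (prev == GB::ZWJ && inPictSequence) {
            zwjAfterPict = true;
            inPictSequence = false;
        } else {
            inPictSequence = false;
            zwjAfterPict = false;
        }
        riRun = (prev == GB::RegionalIndicator) ? riRun + 1 : 0;

        bool isBreak = true; // GB999: otherwise, break everywhere
        if (next == GB::Extend || next == GB::ZWJ) {
            isBreak = false; // GB9: no break before Extend or ZWJ
        } else if (zwjAfterPict && next == GB::ExtendedPictographic) {
            isBreak = false; // GB11: ExtPict Extend* ZWJ x ExtPict
        } else if (next == GB::RegionalIndicator && riRun % 2 == 1) {
            isBreak = false; // GB12/GB13: pair up regional indicators
        }
        breaks.push_back(isBreak);
    }
    return breaks;
}
```

With this sketch, an emoji ZWJ sequence (ExtPict, ZWJ, ExtPict) stays in one cluster, whereas the same ZWJ and ExtPict following an ordinary character are split; four regional indicators in a row form two clusters of two. In both cases the decision for a given pair depends on earlier classes, which is the part a pairwise table cannot encode.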
Diffstat (limited to 'util/android')
0 files changed, 0 insertions, 0 deletions