path: root/util/unicode/data
Commit message | Author | Date | Files | Lines
* unicode: Import version 15.1 (UCD version 32) | Ievgenii Meshcheriakov | 2024-02-08 | 17 | -6214/+12324

    Add enumerator for the new Unicode version to QChar::UnicodeVersion.

    Remap new line breaking classes to their Unicode 15.0 values:

    * AK, AP and AS to AL,
    * VI and VF to CM.

    These are classes for new line breaking support for Indic scripts
    that require more work.

    Blacklist failing tests for now:

    * tst_QUrlUts46::idnaTestV2
    * tst_QTextBoundaryFinder::lineBoundariesDefault
    * tst_QTextBoundaryFinder::graphemeBoundariesDefault

    Regenerate the source files.

    Task-number: QTBUG-121529
    Change-Id: I869cc9fbaa53765d8ae6265c22cdbef9f19d05bf
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
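The line breaking class remapping described in the Unicode 15.1 import can be pictured with a minimal Python sketch. This is not the code from Qt's table generator: the names `REMAP_TO_15_0` and `remap_line_break_class` are hypothetical, and only the five class mappings themselves come from the commit message.

```python
# Hypothetical sketch: fold the line breaking classes introduced in
# Unicode 15.1 back onto the classes named in the commit message.
REMAP_TO_15_0 = {
    "AK": "AL",
    "AP": "AL",
    "AS": "AL",
    "VI": "CM",
    "VF": "CM",
}


def remap_line_break_class(lb_class: str) -> str:
    """Return the remapped class, or the input unchanged if no remap applies."""
    return REMAP_TO_15_0.get(lb_class, lb_class)


if __name__ == "__main__":
    for name in ("AK", "VI", "AL", "BA"):
        print(name, "->", remap_line_break_class(name))
```

Classes that are not in the dictionary pass through untouched, so the remap can be applied to every class read from the data file without special cases.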
* Update UCD to Revision 30 | Ievgenii Meshcheriakov | 2022-10-11 | 17 | -236/+962

    This corresponds to Unicode version 15.0.0.

    Added the following scripts:

    * Kawi
    * Nag Mundari

    Full support of these scripts requires harfbuzz version 5.2.0,
    this version adds support for Unicode 15.0:
    https://github.com/harfbuzz/harfbuzz/releases/tag/5.2.0

    Fixes: QTBUG-106810
    Change-Id: Ib06c526e49b0f01ef9f21123bcf875c6b19f2601
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Unicode: Extract EastAsianWidth property | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -0/+2587

    This property is needed to properly implement the line breaking
    algorithm from UAX #14.

    Task-number: QTBUG-97537
    Pick-to: 6.3
    Change-Id: Ia83cc553c9ef19fae33560721630849d2a95af84
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Update UCD to Revision 28 | Ievgenii Meshcheriakov | 2021-10-18 | 16 | -317/+1989

    This corresponds to Unicode version 14.0.0.

    Added the following scripts:

    * CyproMinoan
    * OldUyghur
    * Tangsa
    * Toto
    * Vithkuqi

    Full support of these scripts requires harfbuzz version 3.0.0,
    this version adds support for Unicode 14.0:
    https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0

    With this release 10 test cases in tst_qurluts46 were fixed,
    one additional test case is failing in tst_qtextboundaryfinder
    and is commented out. In total 62 line break test cases and
    44 word break test cases are failing.

    A comment in src/corelib/text/qt_attribution.json was updated to
    include the URL of the page containing UCD version number.

    Fixes: QTBUG-94359
    Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* unicode: Generate tables for IDNA/UTS #46 | Ievgenii Meshcheriakov | 2021-08-26 | 1 | -0/+8727

    Update the Unicode data processing tool to generate properties and
    mapping tables needed to implement UTS #46
    (https://unicode.org/reports/tr46/).

    The implementation extends the standard to allow usage of
    underscores in URLs. This is done for compatibility with DNS-SD and
    SMB protocols.

    The data file needed to generate the new properties was taken from
    https://www.unicode.org/Public/idna/13.0.0/IdnaMappingTable.txt

    Task-number: QTBUG-85323
    Change-Id: I2c303bf8a08aefb18a7491fb9b55385563bfa219
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
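To give a feel for the input that the UTS #46 table generation works from, here is a rough Python sketch that parses lines in the semicolon-separated style of IdnaMappingTable.txt (code point or range, status, optional mapping, trailing `#` comment). It is not Qt's generator: `IdnaEntry`, `parse_idna_line` and `status_with_underscore` are hypothetical names, and the underscore handling is only one assumed way the extension mentioned in the commit message could be expressed.

```python
from typing import NamedTuple, Optional


class IdnaEntry(NamedTuple):
    first: int    # first code point of the range
    last: int     # last code point of the range (inclusive)
    status: str   # status field as written in the file, e.g. "valid"
    mapping: str  # replacement text, empty if the line has no mapping


def parse_idna_line(line: str) -> Optional[IdnaEntry]:
    """Parse one data line; return None for blank or comment-only lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    first, _, last = fields[0].partition("..")
    first_cp = int(first, 16)
    last_cp = int(last, 16) if last else first_cp
    mapping = ""
    if len(fields) > 2 and fields[2]:
        # The mapping field is a space-separated list of hex code points.
        mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
    return IdnaEntry(first_cp, last_cp, fields[1], mapping)


def status_with_underscore(entry: IdnaEntry) -> str:
    """Assumption: treat LOW LINE (U+005F) as valid whatever the file says."""
    if entry.first == entry.last == 0x5F:
        return "valid"
    return entry.status


if __name__ == "__main__":
    # Sample lines written in the style of the data file, for illustration.
    samples = [
        "0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A",
        "005F          ; disallowed_STD3_valid                  # LOW LINE",
        "0061..007A    ; valid                                  # LATIN SMALL LETTER A..Z",
    ]
    for sample in samples:
        entry = parse_idna_line(sample)
        print(entry, status_with_underscore(entry))
```

A real generator would go on to pack these entries into compact lookup tables; the sketch stops at the parsed records.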
* Unicode: fix the extended grapheme cluster algorithm | Giuseppe D'Angelo | 2021-04-16 | 1 | -0/+1261

    UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
    Although Qt has upgraded the Unicode tables all the way up to
    Unicode 13, the algorithm has never been adapted; in other words,
    it has been working by chance for years. Luckily, MOST of the cases
    were dealt with correctly, but emoji handling actually manages to
    break it.

    This commit:

    * Adds parsing of emoji-data.txt into the unicode table generator.
      That is necessary to extract the Extended_Pictographic property,
      which is used by the EGC algorithm.

    * Regenerates the tables.

    * Removes some obsoleted grapheme cluster break properties, and
      adds the ones added in the meanwhile.

    * Rewrites the EGC algorithm according to Unicode 13. This is done
      by simplifying a lot the lookup table. Some rules (GB11, GB12,
      GB13) can't be done by the table alone so some hand-rolled code
      is necessary in that case.

    * Thanks to these fixes, the complete upstream GraphemeBreakTest
      now passes. Remove the "edited" version that ignored some rows
      (because they were failing).

    Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
    Pick-to: 6.1 6.0 5.15
    Fixes: QTBUG-92822
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
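Rules such as GB12 and GB13 resist a pure lookup table because a table indexed by the two adjacent break classes cannot see how many Regional_Indicator characters came before. The Python sketch below illustrates that parity check in isolation; it is not the code from Qt, and the function names are hypothetical.

```python
def is_regional_indicator(ch: str) -> bool:
    """Regional indicator symbols occupy U+1F1E6..U+1F1FF."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def break_allowed_between_ri(text: str, pos: int) -> bool:
    """Apply GB12/GB13 at the boundary before text[pos].

    Assumes text[pos - 1] and text[pos] are both regional indicators.
    Regional indicators pair up from the start of their run, so the
    boundary is a break only when an even number of them precedes it.
    """
    count = 0
    i = pos - 1
    while i >= 0 and is_regional_indicator(text[i]):
        count += 1
        i -= 1
    # An odd count means text[pos - 1] opens a pair that text[pos] completes.
    return count % 2 == 0


if __name__ == "__main__":
    # Four regional indicators: D, E, F, R (two flag pairs).
    flags = "\U0001F1E9\U0001F1EA\U0001F1EB\U0001F1F7"
    for pos in range(1, len(flags)):
        print(pos, break_allowed_between_ri(flags, pos))
```

For the four-character sample, a break is allowed only in the middle, between the two pairs. The backwards scan is the part that a table keyed on just the previous and current class cannot express.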
* Update UCD to Revision 26 | Edward Welbourne | 2020-03-14 | 14 | -182/+1487

    Include WordBreakTest.html, since a test uses sample strings from
    it, albeit without actually reading the file. Had to comment out
    more of the new tests, as at Revision 24, pending an update to
    harfbuzz and the text boundary detection code.

    Task-number: QTBUG-79631
    Task-number: QTBUG-79418
    Task-number: QTBUG-82747
    Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f
    Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Update UCD data to Unicode 12.1.0's Revision 24 | Edward Welbourne | 2019-10-30 | 14 | -596/+2560

    Had to teach the update program to accept category Lm as for
    Joining_Transparent, for the sake of a new ArabicShaping.txt entry.

    Added three new Unicode versions, several new scripts and a new
    word-break class.

    Updated UCD's test data for tst_QTextBoundaryFinder. This left 57
    tests failing; I have commented out the data rows for those tests,
    pending someone with more knowledge addressing this.

    Task-number: QTBUG-79631
    Task-number: QTBUG-79418
    Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201
    Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
* Update Text segmentation and line break data to Unicode 10.0 | Lars Knoll | 2018-01-03 | 4 | -138/+744

    Also adjusted the text segmentation and line break algorithms so
    that they can handle the new data, and pass the test suite.

    Change-Id: Ib727fd80003e34e96458d7a681996de3fa3691e7
    Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
* Update most Unicode data to version 10.0 | Lars Knoll | 2018-01-03 | 10 | -125/+3077

    The text segmentation data is not being updated in this change, as
    it requires additional code changes. Updating those will come in a
    follow-up commit.

    Change-Id: I5d6b6bc96044e8dd0c25cf6f79756e7f68bf6e7c
    Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
    Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
* Update Unicode data files to v8.0 | Konstantin Ritt | 2015-11-05 | 14 | -287/+2839

    Change-Id: I0aa368cb07353924031a9af4f0bdc33692eb1053
    Reviewed-by: Lars Knoll <lars.knoll@theqtcompany.com>
* Update UCD source files to v7.0 | Konstantin Ritt | 2015-03-27 | 14 | -25478/+7127

    Change-Id: I47277963c926128ad0c4ac5141835e767bb440a7
    Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Update UCD source files up to Unicode 6.3.0 | Konstantin Ritt | 2014-01-14 | 14 | -110/+381

    Change-Id: I9ab58a659af1e758b172a24aa95bce1fea89c33d
    Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Update the Unicode Data and Algorithms up to Unicode 6.2 | Konstantin Ritt | 2012-10-09 | 14 | -996/+1016

    Version 6.2 of the Unicode Standard is a special release dedicated
    to the early publication of the newly encoded Turkish lira sign.
    In addition, there are some significant changes to the Unicode
    algorithms for text segmentation and line breaking to improve
    breaking for emoji symbols.

    For more details, see http://www.unicode.org/versions/Unicode6.2.0/

    Change-Id: I21cfd4f307e41b41a19d36cce87f7a44c2661bc2
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Make QUnicodeTables::script() support SMP code points | Konstantin Ritt | 2012-06-14 | 2 | -0/+0

    Instead of expanding the scripts table with script values for the
    code points >= 0x10000, it has been merged with the properties
    table in order to increase performance of the script itemization
    code (not affected yet).
    (Stats: the properties table grew by 97428 - 89800 = 7628 bytes;
    the old scripts table was 7680 bytes in size)

    The outdated ScriptsInitial.txt and ScriptsCorrections.txt files
    have been removed (they were just empty, the "corrigendum" script
    corrections should be applied to Scripts.txt directly,
    *no customization allowed*!).

    More script test cases have been added - at least one per
    supported script.

    Task-number: QTBUG-6530
    Change-Id: I40a9e76f681e2dd552fd4c61af0808d043962e79
    Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Update the Unicode data files up to v6.1.0 | Konstantin Ritt | 2012-06-10 | 14 | -1317/+24169

    Change-Id: I20b94634b1f4ebff10757c2348cfdbbd906e8797
    Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* UCD-5.0: apply Corrigendum #6 | Konstantin Ritt | 2012-04-15 | 2 | -25/+14

    http://unicode.org/versions/corrigendum6.html:
    > in Unicode 5.0, the list of characters with the Bidi_Mirrored property
    > was made consistent for brackets and quotation marks, in preparation for
    > new constraints on bidi mirroring. However, after publication of
    > Unicode 5.0.0 it was discovered that this change adversely affected
    > several quotation mark characters in deployed data.

    Task-number: QTBUG-25169
    Change-Id: Id49caf401af2d5a1e6dbcc32b2f350aa20b7f901
    Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Initial import from the monolithic Qt. | Qt by Nokia | 2011-04-27 | 16 | -0/+47207

    This is the beginning of revision history for this module. If you
    want to look at revision history older than this, please refer to
    the Qt Git wiki for how to use Git history grafting. At the time of
    writing, this wiki is located here:

    http://qt.gitorious.org/qt/pages/GitIntroductionWithQt

    If you have already performed the grafting and you don't see any
    history beyond this commit, try running "git log" with the
    "--follow" argument.

    Branched from the monolithic repo, Qt master branch, at commit
    896db169ea224deb96c59ce8af800d019de63f12