summaryrefslogtreecommitdiffstats
path: root/util/unicode/README
Commit message (Collapse)AuthorAgeFilesLines
* tests: Add test for UTS #46 implementation using Unicode dataIevgenii Meshcheriakov2021-09-011-0/+2
| | | | | | | | | | | | The new test is called tst_qurluts46. It verifies QUrl::{to,from}Ace() functionality using the data from IdnaTestV2.txt supplied by Unicode. The file was downloaded from https://www.unicode.org/Public/idna/13.0.0/IdnaTestV2.txt Task-Id: QTBUG-85371 Change-Id: I4c6a4942ef6018dafc90cb84ef73f6b2614566d7 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* unicode: Generate tables for IDNA/UTS #46Ievgenii Meshcheriakov2021-08-261-0/+2
| | | | | | | | | | | | | | | Update the Unicode data processing tool to generate properties and mapping tables needed to implement UTS #46 (https://unicode.org/reports/tr46/). The implementation extends the standard to allow usage of underscores in URLs. This is done for compatibility with DNS-SD and SMB protocols. The data file needed to generate the new properties was taken from https://www.unicode.org/Public/idna/13.0.0/IdnaMappingTable.txt Task-number: QTBUG-85323 Change-Id: I2c303bf8a08aefb18a7491fb9b55385563bfa219 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Unicode: fix the extended grapheme cluster algorithmGiuseppe D'Angelo2021-04-161-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UAX #29 in Unicode 11 changed the EGC algorithm to its current form. Although Qt has upgraded the Unicode tables all the way up to Unicode 13, the algorithm has never been adapted; in other words, it has been working by chance for years. Luckily, MOST of the cases were dealt with correctly, but emoji handling actually manages to break it. This commit: * Adds parsing of emoji-data.txt into the unicode table generator. That is necessary to extract the Extended_Pictographic property, which is used by the EGC algorithm. * Regenerates the tables. * Removes some obsoleted grapheme cluster break properties, and adds the ones added in the meanwhile. * Rewrites the EGC algorithm according to Unicode 13. This is done by simplifying a lot the lookup table. Some rules (GB11, GB12, GB13) can't be done by the table alone so some hand-rolled code is necessary in that case. * Thanks to these fixes, the complete upstream GraphemeBreakTest now passes. Remove the "edited" version that ignored some rows (because they were failing). Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b Pick-to: 6.1 6.0 5.15 Fixes: QTBUG-92822 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Update UCD to Revision 26Edward Welbourne2020-03-141-5/+7
| | | | | | | | | | | | | | Include WordBreakTest.html, since a test uses sample strings from it, albeit without actually reading the file. Had to comment out more of the new tests, as at Revision 24, pending an update to harfbuzz and the text boundary detection code. Task-number: QTBUG-79631 Task-number: QTBUG-79418 Task-number: QTBUG-82747 Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Add missing docs for UCD additions at 5.15Edward Welbourne2019-11-281-0/+3
| | | | | | | | | Also remove two stray commas pointed out in code-review and some others noticed on checking for similar. This amends commit c3eb521a0f10112df6b61d2592351c4eef2e1f9b. Change-Id: If20c5146b740defe8d25ff61d399031b5c66ded1 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Update UCD data to Unicode 12.1.0's Revision 24Edward Welbourne2019-10-301-7/+22
| | | | | | | | | | | | | | | | Had to teach the update program to accept category Lm as for Joining_Transparent, for the sake of a new ArabicShaping.txt entry. Added three new Unicode versions, several new scripts and a new word-break class. Updated UCD's test data for tst_QTextBoundaryFinder. This left 57 tests failing; I have commented out the data rows for those tests, pending someone with more knowledge addressing this. Task-number: QTBUG-79631 Task-number: QTBUG-79418 Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201 Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
* Correct some references to corelib/tools/ to say corelib/text/Edward Welbourne2019-10-251-2/+2
| | | | | | | | | | The Unicode data tables moved with QString and friends. So did the locale data generated from CLDR. This amends commit a9aa206b7b8ac4e69f8c46233b4080e00e845ff5. Change-Id: If12f0420b559dcb78993adc00e9f39751bca684a Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Move text-related code out of corelib/tools/ to corelib/text/Edward Welbourne2019-07-101-1/+1
| | | | | | | | This includes byte array, string, char, unicode, locale, collation and regular expressions. Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Clean up and update Unicode character data 3rd-party infrastructureEdward Welbourne2018-11-111-0/+31
| | | | | | | | | | | | | | Document how to do an update, fix the bit-rot that had crept into main.cpp since last it was compiled, correct the qt_attribution.json to use the actual version number of UCD (its Revision number) instead of the (admittedly correlated) Unicode release number. Updated to Release 22 (which came with Unicode 11.0.0) in the process; but this doesn't change our actual qunicodetables.cpp (so is incidental). Task-number: QTBUG-71281 Change-Id: Ieb7a6e1a4d49f639993f76ff82c8f12a572db3c3 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Initial import from the monolithic Qt.Qt by Nokia2011-04-271-0/+1
This is the beginning of revision history for this module. If you want to look at revision history older than this, please refer to the Qt Git wiki for how to use Git history grafting. At the time of writing, this wiki is located here: http://qt.gitorious.org/qt/pages/GitIntroductionWithQt If you have already performed the grafting and you don't see any history beyond this commit, try running "git log" with the "--follow" argument. Branched from the monolithic repo, Qt master branch, at commit 896db169ea224deb96c59ce8af800d019de63f12