summaryrefslogtreecommitdiffstats
path: root/src/corelib/text/qunicodetables_p.h
Commit message (Collapse)AuthorAgeFilesLines
* Unicode: Extract EastAsianWidth propertyIevgenii Meshcheriakov2022-05-241-1/+15
| | | | | | | | | | This property is needed to properly implement the line breaking algorithm from UAX #14. Task-number: QTBUG-97537 Pick-to: 6.3 Change-Id: Ia83cc553c9ef19fae33560721630849d2a95af84 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Unicode: Remove obsolete word break classesIevgenii Meshcheriakov2022-05-241-4/+0
| | | | | | | | | | Remove E_Base, Glue_After_Zwj, E_Base_GAZ, and E_Modifier obsoleted by UTS #29, version 33 (Unicode 11.0.0). Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: If5dc36ae17cd8746bbe81b73bbcc0863181e4a7a Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Use SPDX license identifiersLucie Gérard2022-05-161-38/+2
| | | | | | | | | | | | | Replace the current license disclaimer in files by a SPDX-License-Identifier. Files that have to be modified by hand are modified. License files are organized under LICENSES directory. Task-number: QTBUG-67283 Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1 Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
* Update UCD to Revision 28Ievgenii Meshcheriakov2021-10-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This corresponds to Unicode version 14.0.0. Added the following scripts: * CyproMinoan * OldUyghur * Tangsa * Toto * Vithkuqi Full support of these scripts requires harfbuzz version 3.0.0, this version adds support for Unicode 14.0: https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0 With this release 10 test cases in tst_qurluts46 were fixed, one additional test case is failing in tst_qtextboundaryfinder and is commented out. In total 62 line break test cases and 44 word break test cases are failing. A comment in src/corelib/text/qt_attribution.json was updated to include the URL of the page containing UCD version number. Fixes: QTBUG-94359 Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* unicode: Regenerate qunicodetables{.cpp,_p.h}Ievgenii Meshcheriakov2021-09-031-2/+2
| | | | | | | | | | Run unicode utility to regenerate the Unicode tables. This reduces size of the IDNA mapping tables. Adjust the QUrl client code to use the new API. Task-number: QTBUG-85323 Change-Id: Iaa8d6932e611f7aa4009a3fae2972de87b875cf8 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* unicode: Regenerate Unicode tablesIevgenii Meshcheriakov2021-08-261-1/+18
| | | | | | | | | Re-run unicode utility to update the Unicode tables. This adds properties and mappings needed to implement UTS #46 (IDNA). Task-number: QTBUG-85323 Change-Id: Id1de91caddd82095f8f8f2301bfd7bb2ee3fcafd Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Unicode: fix the extended grapheme cluster algorithmGiuseppe D'Angelo2021-04-161-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UAX #29 in Unicode 11 changed the EGC algorithm to its current form. Although Qt has upgraded the Unicode tables all the way up to Unicode 13, the algorithm has never been adapted; in other words, it has been working by chance for years. Luckily, MOST of the cases were dealt with correctly, but emoji handling actually manages to break it. This commit: * Adds parsing of emoji-data.txt into the unicode table generator. That is necessary to extract the Extended_Pictographic property, which is used by the EGC algorithm. * Regenerates the tables. * Removes some obsoleted grapheme cluster break properties, and adds the ones added in the meanwhile. * Rewrites the EGC algorithm according to Unicode 13. This is done by simplifying a lot the lookup table. Some rules (GB11, GB12, GB13) can't be done by the table alone so some hand-rolled code is necessary in that case. * Thanks to these fixes, the complete upstream GraphemeBreakTest now passes. Remove the "edited" version that ignored some rows (because they were failing). Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b Pick-to: 6.1 6.0 5.15 Fixes: QTBUG-92822 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Port Q_STATIC_ASSERT(_X) to static_assertGiuseppe D'Angelo2020-06-191-1/+1
| | | | | | | | | | | | | | | | | There is no reason for keep using our macro now that we have C++17. The macro itself is left in for the moment being, as well as its detection logic, because it's needed for C code (not everything supports C11 yet). A few more cleanups will arrive in the next few patches. Note that this is a mere search/replace; some places were using double braces to work around the presence of commas in a macro, no attempt has been done to fix those. tst_qglobal had just some minor changes to keep testing the macro. Change-Id: I1c1c397d9f3e63db3338842bf350c9069ea57639 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* QUnicodeTables: port to charNN_tMarc Mutz2020-04-271-6/+6
| | | | | | | | | | | This makes existing calls passing uint or ushort ambiguous, so fix all the callers. There do not appear to be callers outside QtBase. In fact, the ...BreakClass() functions appear to be utterly unused. Change-Id: I1c2251920beba48d4909650bc1d501375c6a3ecf Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Update UCD to Revision 26Edward Welbourne2020-03-141-3/+3
| | | | | | | | | | | | | | Include WordBreakTest.html, since a test uses sample strings from it, albeit without actually reading the file. Had to comment out more of the new tests, as at Revision 24, pending an update to harfbuzz and the text boundary detection code. Task-number: QTBUG-79631 Task-number: QTBUG-79418 Task-number: QTBUG-82747 Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Unicode tables: minor prettificationEdward Welbourne2019-11-281-0/+4
| | | | | | | | | | | Put blank lines before the final Num*Classes entries in enums, to set them off visibly from the "real" members. Moved some oddly placed commas to the ends of preceding lines, so that later additions can just add lines (with comma on end) without having to modify the preceding line while doing so. Change-Id: I5188dc25af9e4c17a1882fd9dab070e88013060b Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Add missing docs for UCD additions at 5.15Edward Welbourne2019-11-281-2/+2
| | | | | | | | | Also remove two stray commas pointed out in code-review and some others noticed on checking for similar. This amends commit c3eb521a0f10112df6b61d2592351c4eef2e1f9b. Change-Id: If20c5146b740defe8d25ff61d399031b5c66ded1 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Update UCD data to Unicode 12.1.0's Revision 24Edward Welbourne2019-10-301-3/+4
| | | | | | | | | | | | | | | | Had to teach the update program to accept category Lm as for Joining_Transparent, for the sake of a new ArabicShaping.txt entry. Added three new Unicode versions, several new scripts and a new word-break class. Updated UCD's test data for tst_QTextBoundaryFinder. This left 57 tests failing; I have commented out the data rows for those tests, pending someone with more knowledge addressing this. Task-number: QTBUG-79631 Task-number: QTBUG-79418 Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201 Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
* QUnicodeTables: use array for case folding tablesMarc Mutz2019-09-041-42/+15
| | | | | | | | | | | | | | | | | Instead of four pairs of :1 :15 bit fields, use an array of four :1, :15 structs. This allows to replace the case folding traits classes with a simple enum that indexes into said array. I don't know what the WASM #ifdef'ed code is supposed to effect (a :0 bit-field is only useful to separate adjacent bit-field into separate memory locations for multi-threading), but I thought it safer to leave it in, and that means the array must be a 64-bit block of its own, so I had to move two fields around. Saves ~4.5KiB in text size on optimized GCC 10 LTO Linux AMD64 builds. Change-Id: Ib52cd7706342d5227b50b57545d073829c45da9a Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* QUnicodeTables: pack Properties structMarc Mutz2019-09-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | GCC doesn't like the sequence : 5 : 5 : 8 : 6 : 8 and inserts a :6 padding between the :5 and the :8 and a :2 padding between the :6 and the :8, growing the bitfield by 8 bits of embedded padding and another byte to bring the struct back to sizeof % 2 == 0. Fix by reshuffling the elements and adding a static_assert for the next round. Saves ~5KiB in QtCore executable size. Change-Id: I4758a6f48ba389abc2aee92f60997d42ebb0e5b8 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Move text-related code out of corelib/tools/ to corelib/text/Edward Welbourne2019-07-101-0/+232
This includes byte array, string, char, unicode, locale, collation and regular expressions. Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>