summaryrefslogtreecommitdiffstats
path: root/tests/auto/corelib/text/qtextboundaryfinder/data
Commit message (Collapse)AuthorAgeFilesLines
* unicode: Import version 15.1 (UCD version 32)Ievgenii Meshcheriakov2024-02-085-471/+3698
| | | | | | | | | | | | | | | | | | | | | | Add enumerator for the new Unicode version to QChar::UnicodeVersion. Remap new line breaking classes to their Unicode 15.0 values: * AK, AP and AS to AL, * VI and VF to CM. These are classes for new line breaking support for Indic scripts that require more work. Blacklist failing tests for now: * tst_QUrlUts46::idnaTestV2 * tst_QTextBoundaryFinder::lineBoundariesDefault * tst_QTextBoundaryFinder::graphemeBoundariesDefault Regenerate the source files. Task-number: QTBUG-121529 Change-Id: I869cc9fbaa53765d8ae6265c22cdbef9f19d05bf Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io> Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Update UCD to Revision 30Ievgenii Meshcheriakov2022-10-115-25/+23
| | | | | | | | | | | | | | | | | | This corresponds to Unicode version 15.0.0. Added the following scripts: * Kawi * Nag Mundari Full support of these scripts requires harfbuzz version 5.2.0, this version adds support for Unicode 15.0: https://github.com/harfbuzz/harfbuzz/releases/tag/5.2.0 Fixes: QTBUG-106810 Change-Id: Ib06c526e49b0f01ef9f21123bcf875c6b19f2601 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* tst_qtextboundaryfinder: Remove full copies of data filesIevgenii Meshcheriakov2022-05-243-9541/+0
| | | | | | | | | | There are no commented out test cases remaining, so the normal test vectors are identical to full test vectors. Fixes: QTBUG-97537 Pick-to: 6.3 Change-Id: I987f178f192e1c8e2d998d36499fdce84f237e77 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix line breaking before open parenthesesIevgenii Meshcheriakov2022-05-241-24/+24
| | | | | | | | | | | | | | UAX #14, revision 45 (Unicode 13) has changed rule LB30 to only trigger if the open parentheses is non-wide: (AL | HL | NU) × [OP-[\p{ea=F}\p{ea=W}\p{ea=H}]] This fixes the remaining 24 line break tests. Task-number: QTBUG-97537 Pick-to: 6.3 Change-Id: I9870588c04bf0f6ae0a98289739bef8490f67f69 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix line breaking for potential emojisIevgenii Meshcheriakov2022-05-241-1/+1
| | | | | | | | | | | | | | Implement part of LB30b introduced by UAX #14, revision 47 (Unicode 14.0.0): [\p{Extended_Pictographic}&\p{Cn}] × EM This fixes one line breaking test. Task-number: QTBUG-97537 Pick-to: 6.3 Change-Id: I3fd2372a057b7391d8846e9c146f69a54686ea61 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix interactions of WB3d and WB4 rulesIevgenii Meshcheriakov2022-05-241-1/+1
| | | | | | | | | | | Word breaking rule WB3d should not be affected by WB4. This fixes the remaining word break test. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I99aee831d7c54fafcd2a9d526a3e078b12c5bfad Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Handle WB3c word break ruleIevgenii Meshcheriakov2022-05-241-9/+9
| | | | | | | | | | | | | | | Adjust handling of WB3c rule to UAX #29, revision 33 (Unicode 11.0.0). The rule reads: ZWJ × \p{Extended_Pictographic} This fixes 9 word break tests. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I818d4048828e6663d5c090aa372d83f5099fdffe Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Adjust properties of WSegSpace word break classIevgenii Meshcheriakov2022-05-241-34/+34
| | | | | | | | | | | | | | | | | | Disable break between sequences of WSegSpace characters (rule WB3d, introduced in UAX #29, version 33, Unicode 11.0.0). Also disable breaks between WSegSpace and (Extend | Format | ZWJ) due to rule WB4. Adjust "words4" test to take the above changes into account (space character belongs to WSegSpace). Mention the full class name in a comment inside the word break table. This fixes 34 word break tests. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I7dfe8367e45c86913bb7d7fe2adb053711978487 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix handling of LB22 line break ruleIevgenii Meshcheriakov2022-05-241-28/+28
| | | | | | | | | | | | | | This rule was simplified in version UTS #14 version 45 (Unicode 13.0.0) to read: × IN Re-enabled 28 fixed line break tests. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I1c5565a8c1633428c22379917215d4e424ff0055 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix handling of ZWJ for line breaksIevgenii Meshcheriakov2022-05-241-9/+9
| | | | | | | | | | | | | | | Adjust implementation of rule LB8a of UAX #14. The rule was changed in version 41 (corresponding to Unicode 11.0.0): ZWJ × (ID | EB | EM) ⇒ ZWJ × Fixing this rule fixes 9 line break tests. Those are re-enabled. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I1570719590a46ae28c98ed7d5053e72b12915db7 Reviewed-by: Øystein Heskestad <oystein.heskestad@qt.io> Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Update UCD to Revision 28Ievgenii Meshcheriakov2021-10-187-73/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This corresponds to Unicode version 14.0.0. Added the following scripts: * CyproMinoan * OldUyghur * Tangsa * Toto * Vithkuqi Full support of these scripts requires harfbuzz version 3.0.0, this version adds support for Unicode 14.0: https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0 With this release 10 test cases in tst_qurluts46 were fixed, one additional test case is failing in tst_qtextboundaryfinder and is commented out. In total 62 line break test cases and 44 word break test cases are failing. A comment in src/corelib/text/qt_attribution.json was updated to include the URL of the page containing UCD version number. Fixes: QTBUG-94359 Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Unicode: fix the extended grapheme cluster algorithmGiuseppe D'Angelo2021-04-162-634/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UAX #29 in Unicode 11 changed the EGC algorithm to its current form. Although Qt has upgraded the Unicode tables all the way up to Unicode 13, the algorithm has never been adapted; in other words, it has been working by chance for years. Luckily, MOST of the cases were dealt with correctly, but emoji handling actually manages to break it. This commit: * Adds parsing of emoji-data.txt into the unicode table generator. That is necessary to extract the Extended_Pictographic property, which is used by the EGC algorithm. * Regenerates the tables. * Removes some obsoleted grapheme cluster break properties, and adds the ones added in the meanwhile. * Rewrites the EGC algorithm according to Unicode 13. This is done by simplifying a lot the lookup table. Some rules (GB11, GB12, GB13) can't be done by the table alone so some hand-rolled code is necessary in that case. * Thanks to these fixes, the complete upstream GraphemeBreakTest now passes. Remove the "edited" version that ignored some rows (because they were failing). Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b Pick-to: 6.1 6.0 5.15 Fixes: QTBUG-92822 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Update UCD to Revision 26Edward Welbourne2020-03-148-1666/+2587
| | | | | | | | | | | | | | Include WordBreakTest.html, since a test uses sample strings from it, albeit without actually reading the file. Had to comment out more of the new tests, as at Revision 24, pending an update to harfbuzz and the text boundary detection code. Task-number: QTBUG-79631 Task-number: QTBUG-79418 Task-number: QTBUG-82747 Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Update UCD data to Unicode 12.1.0's Revision 24Edward Welbourne2019-10-308-1276/+10651
| | | | | | | | | | | | | | | | Had to teach the update program to accept category Lm as for Joining_Transparent, for the sake of a new ArabicShaping.txt entry. Added three new Unicode versions, several new scripts and a new word-break class. Updated UCD's test data for tst_QTextBoundaryFinder. This left 57 tests failing; I have commented out the data rows for those tests, pending someone with more knowledge addressing this. Task-number: QTBUG-79631 Task-number: QTBUG-79418 Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201 Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
* Move text-related code out of corelib/tools/ to corelib/text/Edward Welbourne2019-07-104-0/+10809
This includes byte array, string, char, unicode, locale, collation and regular expressions. Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>