path: root/tests/auto/corelib/text/qtextboundaryfinder
Commit message | Author | Age | Files | Lines
* tst_qtextboundaryfinder: Remove full copies of data files | Ievgenii Meshcheriakov | 2022-05-24 | 3 | -9541/+0
  There are no commented-out test cases remaining, so the normal test vectors are identical to the full test vectors.

  Fixes: QTBUG-97537
  Pick-to: 6.3
  Change-Id: I987f178f192e1c8e2d998d36499fdce84f237e77
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix line breaking before open parentheses | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -24/+24
  UAX #14, revision 45 (Unicode 13) changed rule LB30 to trigger only if the open parenthesis is non-wide:

      (AL | HL | NU) × [OP-[\p{ea=F}\p{ea=W}\p{ea=H}]]

  This fixes the remaining 24 line break tests.

  Task-number: QTBUG-97537
  Pick-to: 6.3
  Change-Id: I9870588c04bf0f6ae0a98289739bef8490f67f69
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
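
The LB30 change above can be sketched in a few lines of C++. This is only an illustration of the quoted half of the rule, not the code in QUnicodeTools: the enums `LineBreakClass` and `EastAsianWidth` and the function names are hypothetical stand-ins, far smaller than the real property tables.

```cpp
#include <cassert>

// Hypothetical, reduced stand-ins for the real Unicode property tables.
enum class LineBreakClass { AL, HL, NU, OP, Other };
enum class EastAsianWidth { Narrow, Neutral, Ambiguous, Fullwidth, Wide, Halfwidth };

// True for the widths that the rule excludes: ea=F, ea=W and ea=H.
constexpr bool isExcludedWidth(EastAsianWidth w)
{
    return w == EastAsianWidth::Fullwidth || w == EastAsianWidth::Wide
        || w == EastAsianWidth::Halfwidth;
}

// Returns true when (AL | HL | NU) × [OP-[\p{ea=F}\p{ea=W}\p{ea=H}]]
// forbids a break between `before` and `after`.
constexpr bool lb30ForbidsBreak(LineBreakClass before, LineBreakClass after,
                                EastAsianWidth afterWidth)
{
    const bool alnum = before == LineBreakClass::AL || before == LineBreakClass::HL
        || before == LineBreakClass::NU;
    return alnum && after == LineBreakClass::OP && !isExcludedWidth(afterWidth);
}

// Usage example: a narrow "(" stays attached to a preceding letter,
// a wide opening bracket no longer does.
inline void lb30Example()
{
    assert(lb30ForbidsBreak(LineBreakClass::AL, LineBreakClass::OP, EastAsianWidth::Narrow));
    assert(!lb30ForbidsBreak(LineBreakClass::AL, LineBreakClass::OP, EastAsianWidth::Wide));
}
```

Before revision 45 the width check was absent, so the rule also applied to wide opening punctuation.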
* QUnicodeTools: Fix line breaking for potential emojis | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -1/+1
  Implement part of LB30b introduced by UAX #14, revision 47 (Unicode 14.0.0):

      [\p{Extended_Pictographic}&\p{Cn}] × EM

  This fixes one line breaking test.

  Task-number: QTBUG-97537
  Pick-to: 6.3
  Change-Id: I3fd2372a057b7391d8846e9c146f69a54686ea61
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix interactions of WB3d and WB4 rules | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -1/+1
  Word breaking rule WB3d should not be affected by WB4. This fixes the remaining word break test.

  Task-number: QTBUG-97537
  Pick-to: 6.2 6.3
  Change-Id: I99aee831d7c54fafcd2a9d526a3e078b12c5bfad
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
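
One way to picture the WB3d/WB4 interaction is the sketch below. It assumes that WB3d is tested on the literally adjacent pair of characters, before WB4 gets a chance to skip over Extend, Format or ZWJ. The enum `WordBreakClass` and the function `breakAllowed` are hypothetical and cover only these two rules; they are not taken from QUnicodeTools.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily reduced set of word-break classes.
enum class WordBreakClass { WSegSpace, Extend, Format, ZWJ, Other };

constexpr bool isIgnorable(WordBreakClass c)
{
    return c == WordBreakClass::Extend || c == WordBreakClass::Format
        || c == WordBreakClass::ZWJ;
}

// Decides whether a break is allowed between classes[pos - 1] and classes[pos].
// Only WB3d and WB4 are modelled; everything else falls through to "break".
inline bool breakAllowed(const std::vector<WordBreakClass> &classes, std::size_t pos)
{
    assert(pos > 0 && pos < classes.size());
    const WordBreakClass before = classes[pos - 1];
    const WordBreakClass after = classes[pos];

    // WB3d: WSegSpace × WSegSpace, checked on the raw neighbours only.
    if (before == WordBreakClass::WSegSpace && after == WordBreakClass::WSegSpace)
        return false;

    // WB4: X (Extend | Format | ZWJ)* → X, so never break before an ignorable.
    if (isIgnorable(after))
        return false;

    return true; // stand-in for the remaining word break rules
}
```

With the sequence WSegSpace, Extend, WSegSpace this sketch keeps the Extend attached to the first space but allows a break before the second one, because the two spaces are not directly adjacent.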
* QUnicodeTools: Handle WB3c word break rule | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -9/+9
  Adjust handling of the WB3c rule to UAX #29, revision 33 (Unicode 11.0.0). The rule reads:

      ZWJ × \p{Extended_Pictographic}

  This fixes 9 word break tests.

  Task-number: QTBUG-97537
  Pick-to: 6.2 6.3
  Change-Id: I818d4048828e6663d5c090aa372d83f5099fdffe
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
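
As a minimal sketch of the WB3c rule quoted above: the check needs the word-break class of the character before the boundary and the Extended_Pictographic property of the character after it. The names below are hypothetical, and the property is passed in as a plain `bool` rather than looked up in a table.

```cpp
#include <cassert>

// Hypothetical two-value stand-in for the word-break class of a character.
enum class WordBreakClass { ZWJ, Other };

// Returns true when ZWJ × \p{Extended_Pictographic} forbids a break.
constexpr bool wb3cForbidsBreak(WordBreakClass before, bool afterIsExtendedPictographic)
{
    return before == WordBreakClass::ZWJ && afterIsExtendedPictographic;
}

// Usage example: a ZWJ glues a following pictograph to what came before,
// but has no such effect on an ordinary character.
inline void wb3cExample()
{
    assert(wb3cForbidsBreak(WordBreakClass::ZWJ, true));
    assert(!wb3cForbidsBreak(WordBreakClass::ZWJ, false));
}
```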
* QUnicodeTools: Adjust properties of WSegSpace word break class | Ievgenii Meshcheriakov | 2022-05-24 | 2 | -37/+37
  Disable breaks within sequences of WSegSpace characters (rule WB3d, introduced in UAX #29, revision 33, Unicode 11.0.0). Also disable breaks between WSegSpace and (Extend | Format | ZWJ) due to rule WB4.

  Adjust the "words4" test to take the above changes into account (the space character belongs to WSegSpace).

  Mention the full class name in a comment inside the word break table.

  This fixes 34 word break tests.

  Task-number: QTBUG-97537
  Pick-to: 6.2 6.3
  Change-Id: I7dfe8367e45c86913bb7d7fe2adb053711978487
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix handling of LB22 line break rule | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -28/+28
  This rule was simplified in UAX #14, revision 45 (Unicode 13.0.0) to read:

      × IN

  Re-enabled 28 fixed line break tests.

  Task-number: QTBUG-97537
  Pick-to: 6.2 6.3
  Change-Id: I1c5565a8c1633428c22379917215d4e424ff0055
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix handling of ZWJ for line breaks | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -9/+9
  Adjust implementation of rule LB8a of UAX #14. The rule was changed in version 41 (corresponding to Unicode 11.0.0):

      ZWJ × (ID | EB | EM) ⇒ ZWJ ×

  Fixing this rule fixes 9 line break tests. Those are re-enabled.

  Task-number: QTBUG-97537
  Pick-to: 6.2 6.3
  Change-Id: I1570719590a46ae28c98ed7d5053e72b12915db7
  Reviewed-by: Øystein Heskestad <oystein.heskestad@qt.io>
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
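
The two forms of LB8a mentioned above differ only in whether the class after the ZWJ matters. The following sketch contrasts them; the enum and both function names are hypothetical and exist only for this comparison.

```cpp
#include <cassert>

// Hypothetical, reduced set of line-break classes.
enum class LineBreakClass { ZWJ, ID, EB, EM, AL, Other };

// Older form of the rule: ZWJ × (ID | EB | EM)
constexpr bool lb8aOldForbidsBreak(LineBreakClass before, LineBreakClass after)
{
    return before == LineBreakClass::ZWJ
        && (after == LineBreakClass::ID || after == LineBreakClass::EB
            || after == LineBreakClass::EM);
}

// Current form of the rule: ZWJ × (no break after ZWJ, whatever follows)
constexpr bool lb8aForbidsBreak(LineBreakClass before)
{
    return before == LineBreakClass::ZWJ;
}

// Usage example: ZWJ followed by AL was breakable under the old rule
// and is not breakable under the current one.
inline void lb8aExample()
{
    assert(!lb8aOldForbidsBreak(LineBreakClass::ZWJ, LineBreakClass::AL));
    assert(lb8aForbidsBreak(LineBreakClass::ZWJ));
}
```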
* Use SPDX license identifiers | Lucie Gérard | 2022-05-16 | 1 | -27/+2
  Replace the current license disclaimer in files with an SPDX-License-Identifier. Files that have to be modified by hand are modified. License files are organized under the LICENSES directory.

  Task-number: QTBUG-67283
  Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
  Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
  Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
* Cleanup tests that add test data to resources explicitly | Alexey Edelev | 2022-02-11 | 1 | -21/+4
  Remove Integrity- and Android-specific code that explicitly adds test data to the resource files. The qt_internal_add_test function implicitly adds test data to resources for the Android and Integrity platforms by default.

  Change-Id: Ia1d58755b47442e1953462e38606f70fec262368
  Reviewed-by: Assam Boudjelthia <assam.boudjelthia@qt.io>
  Reviewed-by: Alexandru Croitor <alexandru.croitor@qt.io>
  Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
* Remove unused .qrc files | Joerg Bornemann | 2022-01-17 | 1 | -8/+0
  Task-number: QTBUG-94446
  Change-Id: I136d8b4ab070a832866aa50b5701fc6bd863df8a
  Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
  Reviewed-by: Alexey Edelev <alexey.edelev@qt.io>
  Reviewed-by: Alexandru Croitor <alexandru.croitor@qt.io>
* Update UCD to Revision 28 | Ievgenii Meshcheriakov | 2021-10-18 | 7 | -73/+77
  This corresponds to Unicode version 14.0.0.

  Added the following scripts:

  * CyproMinoan
  * OldUyghur
  * Tangsa
  * Toto
  * Vithkuqi

  Full support of these scripts requires harfbuzz version 3.0.0, which adds support for Unicode 14.0: https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0

  With this release 10 test cases in tst_qurluts46 were fixed; one additional test case is failing in tst_qtextboundaryfinder and is commented out. In total 62 line break test cases and 44 word break test cases are failing.

  A comment in src/corelib/text/qt_attribution.json was updated to include the URL of the page containing the UCD version number.

  Fixes: QTBUG-94359
  Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Remove conditioning on Android embedded | Edward Welbourne | 2021-09-17 | 1 | -1/+1
  It is no longer handled separately from Android. This effectively reverts commit 6d50f746fe05a7008b63818e77784dd0c99270a1

  Change-Id: Ic2d75b8c5a09895810913311ab2fe3355d4d2983
  Reviewed-by: Assam Boudjelthia <assam.boudjelthia@qt.io>
* Unicode: fix the grapheme clustering algorithm | Giuseppe D'Angelo | 2021-08-24 | 1 | -0/+101
  An oversight in the code kept the algorithm in the GB11 state, even if the codepoint that is being processed wouldn't allow for that (for instance a sequence of ExtPic, Ext and Any).

  Refactor the code of GB11/GB12/GB13 to deal with code points that break the sequences (falling back to "normal" handling).

  Add some manual tests; interestingly enough, the failing cases are not covered by Unicode's tests, as we now pass the entire test suite.

  Amends a794c5e287381bd056008b20ae55f9b1e0acf138.

  Fixes: QTBUG-94951
  Pick-to: 6.1 5.15
  Change-Id: If987d5ccf7c6b13de36d049b1b3d88a3c4b6dd00
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
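
The GB11 problem described above can be pictured as a small state machine that has to fall back to its start state whenever a code point breaks the sequence ExtPict Extend* ZWJ. The sketch below is an assumption-laden illustration, not the refactored Qt code: `GraphemeClass`, `Gb11State`, `advance` and `gb11ForbidsBreakBeforeLast` are hypothetical names, and only GB11 is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced grapheme-break classes; Any stands for everything else.
enum class GraphemeClass { ExtPict, Extend, ZWJ, Any };

// How much of "ExtPict Extend* ZWJ" has been matched so far.
enum class Gb11State { Start, ExtPictExtend, ExtPictExtendZwj };

// Advances the GB11 state by one code point. Anything that does not
// continue the sequence resets the state instead of leaving it stuck.
constexpr Gb11State advance(Gb11State state, GraphemeClass c)
{
    if (c == GraphemeClass::ExtPict)
        return Gb11State::ExtPictExtend;
    if (state == Gb11State::ExtPictExtend) {
        if (c == GraphemeClass::Extend)
            return Gb11State::ExtPictExtend;
        if (c == GraphemeClass::ZWJ)
            return Gb11State::ExtPictExtendZwj;
    }
    return Gb11State::Start; // the sequence was broken
}

// Returns true when GB11 (ExtPict Extend* ZWJ × ExtPict) forbids a break
// immediately before the last element of `classes`.
inline bool gb11ForbidsBreakBeforeLast(const std::vector<GraphemeClass> &classes)
{
    if (classes.empty())
        return false;
    Gb11State state = Gb11State::Start;
    for (std::size_t i = 0; i + 1 < classes.size(); ++i)
        state = advance(state, classes[i]);
    return state == Gb11State::ExtPictExtendZwj
        && classes.back() == GraphemeClass::ExtPict;
}
```

With this shape, a sequence such as ExtPict, Extend, Any drops back to `Start`, which is the kind of case the commit message names as previously mishandled.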
* Unicode: fix the extended grapheme cluster algorithm | Giuseppe D'Angelo | 2021-04-16 | 2 | -634/+4
  UAX #29 in Unicode 11 changed the EGC algorithm to its current form. Although Qt has upgraded the Unicode tables all the way up to Unicode 13, the algorithm has never been adapted; in other words, it has been working by chance for years. Luckily, MOST of the cases were dealt with correctly, but emoji handling actually manages to break it.

  This commit:

  * Adds parsing of emoji-data.txt into the unicode table generator. That is necessary to extract the Extended_Pictographic property, which is used by the EGC algorithm.
  * Regenerates the tables.
  * Removes some obsolete grapheme cluster break properties, and adds the ones introduced in the meantime.
  * Rewrites the EGC algorithm according to Unicode 13. This is done by greatly simplifying the lookup table. Some rules (GB11, GB12, GB13) can't be done by the table alone, so some hand-rolled code is necessary in that case.
  * Thanks to these fixes, the complete upstream GraphemeBreakTest now passes. Remove the "edited" version that ignored some rows (because they were failing).

  Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
  Pick-to: 6.1 6.0 5.15
  Fixes: QTBUG-92822
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
  Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Remove the qmake project files | Joerg Bornemann | 2021-01-07 | 1 | -11/+0
  Remove the qmake project files for most of Qt.

  Leave the qmake project files for examples, because we still test those in the CI to ensure qmake does not regress.

  Also leave the qmake project files for utils and other minor parts that lack CMake project files.

  Task-number: QTBUG-88742
  Change-Id: I6cdf059e6204816f617f9624f3ea9822703f73cc
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
  Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
  Reviewed-by: Kai Koehne <kai.koehne@qt.io>
* Replace QtTest headers with QTest | David Skoland | 2020-12-22 | 1 | -1/+2
  Complete search and replace of QtTest and QtTest/QtTest with QTest, as QtTest includes the whole module. Replace all such instances with correct header includes. See Jira task for more discussion.

  Fixes: QTBUG-88831
  Change-Id: I981cfae18a1cabcabcabee376016b086d9d01f44
  Pick-to: 6.0
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* QChar: make construction from integral explicit | Giuseppe D'Angelo | 2020-11-15 | 1 | -25/+35
  QChar should not be convertible from any integral type except from char16_t, short and possibly char (since it's a direct superset).

  David provided the perfect example:

      if (str == 123) { ~~~ }

  compiles, with 123 implicitly converted to QChar (str == "123" was meant instead). But similarly one can construct other scenarios where QString(123) gets accidentally used (instead of QString::number(123)), like QString s; s += 123;.

  Add a macro to revert to the implicit constructors, for backwards compatibility.

  The breaks are mostly in tests that "abuse" integers (arithmetic, etc.). Maybe it's time for user-defined literals for QChar/QString, but that is left for another commit.

  [ChangeLog][Potentially Source-Incompatible Changes][QChar] QChar constructors from integral types are now by default explicit. It is recommended to use explicit conversions, QLatin1Char, QChar::fromUcs4 instead of implicit conversions. The old behavior can be restored by defining the QT_IMPLICIT_QCHAR_CONSTRUCTION macro.

  Change-Id: I6175f6ab9bcf1956f6f97ab0c9d9d5aaf777296d
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
  Reviewed-by: Tor Arne Vestbø <tor.arne.vestbo@qt.io>
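
The effect of making integral constructors explicit can be shown without Qt, using a stand-in class. `Utf16Char` below is hypothetical and is not QChar; it also uses constrained constructor templates to get the effect, which is a technique chosen for this sketch and not necessarily how QChar declares its constructors.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a UTF-16 code unit wrapper (not the real QChar).
class Utf16Char
{
public:
    // Implicit, but only for exactly char16_t.
    template <typename T,
              std::enable_if_t<std::is_same<T, char16_t>::value, int> = 0>
    constexpr Utf16Char(T unit) noexcept : m_unit(unit) {}

    // Explicit for every other integral type, so "x == 123" style
    // accidents no longer compile via an implicit conversion.
    template <typename T,
              std::enable_if_t<std::is_integral<T>::value
                                   && !std::is_same<T, char16_t>::value, int> = 0>
    constexpr explicit Utf16Char(T code) noexcept
        : m_unit(static_cast<char16_t>(code)) {}

    constexpr char16_t unicode() const noexcept { return m_unit; }

private:
    char16_t m_unit;
};

static_assert(std::is_convertible<char16_t, Utf16Char>::value,
              "char16_t converts implicitly");
static_assert(!std::is_convertible<int, Utf16Char>::value,
              "int must not convert implicitly");
static_assert(std::is_constructible<Utf16Char, int>::value,
              "int can still be used with an explicit construction");

// Usage example: the explicit spelling still works.
inline void utf16CharExample()
{
    const Utf16Char a(0x41);
    assert(a.unicode() == u'A');
}
```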
* CMake: Regenerate projects to use new qt_internal_ API | Alexandru Croitor | 2020-09-23 | 1 | -2/+2
  Modify special case locations to use the new API as well. Clean up some stale .prev files that are not needed anymore.

  Clean up some project files that are not used anymore.

  Task-number: QTBUG-86815
  Change-Id: I9947da921f98686023c6bb053dfcc101851276b5
  Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
  Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
* Clean up QTextBoundaryFinder and qunicodetools | Lars Knoll | 2020-09-07 | 1 | -2/+0
  Make QTBF ready for Qt6 by using qsizetype in the API and use QStringView where it makes sense.

  Change the exported API of qunicodetools to use QStringView as well and use char16_t internally.

  Change-Id: I853537bcabf40546a8e60fdf2ee7d751bc371761
  Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* Remove QTextCodec dependency from test | Lars Knoll | 2020-05-14 | 1 | -5/+1
  Change-Id: Ie546065c3179d475df46b284ca7df502c4465b93
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Port qtbase/tests/auto/corelib/text tests to CMake | Sona Kurazyan | 2020-04-27 | 1 | -0/+34
  Task-number: QTBUG-78220
  Change-Id: I497da6ed489854bdee5a1ead9a3f34118c78d001
  Reviewed-by: Alexandru Croitor <alexandru.croitor@qt.io>
* Update UCD to Revision 26 | Edward Welbourne | 2020-03-14 | 8 | -1666/+2587
  Include WordBreakTest.html, since a test uses sample strings from it, albeit without actually reading the file. Had to comment out more of the new tests, as at Revision 24, pending an update to harfbuzz and the text boundary detection code.

  Task-number: QTBUG-79631
  Task-number: QTBUG-79418
  Task-number: QTBUG-82747
  Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Core: Use Qt::SplitBehavior in preference to QString::SplitBehavior | Edward Welbourne | 2020-02-28 | 1 | -2/+2
  The Qt version was added in 5.14 "for use as eventual replacement for QString::SplitBehavior." Move another step closer to that goal.

  Change-Id: I446f9ddc8f8de4a0b79b09edb44f7c1496fbc33f
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Update UCD data to Unicode 12.1.0's Revision 24 | Edward Welbourne | 2019-10-30 | 8 | -1276/+10651
  Had to teach the update program to accept category Lm as for Joining_Transparent, for the sake of a new ArabicShaping.txt entry. Added three new Unicode versions, several new scripts and a new word-break class.

  Updated UCD's test data for tst_QTextBoundaryFinder. This left 57 tests failing; I have commented out the data rows for those tests, pending someone with more knowledge addressing this.

  Task-number: QTBUG-79631
  Task-number: QTBUG-79418
  Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201
  Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
* Move text-related code out of corelib/tools/ to corelib/text/ | Edward Welbourne | 2019-07-10 | 8 | -0/+11684
  This includes byte array, string, char, unicode, locale, collation and regular expressions.

  Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>