| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing data comes under Unicode-DFS-2016 but future updates
shall come under Unicode-3.0, so update the existing headers with the
former and the generator script with the latter. Leave a note in the
attribution file about this transitional state and how to resolve it.
Replaced UNICODE_LICENSE.txt from src/corelib/text/ with
LICENSES/Unicode-DFS-2016.txt, as fetched using reuse download.
This doesn't look like a rename but only actually adds some irrelevant
lines about where on the Unicode website the upstream files (to which
we do not apply this license) come from and changes some spacing.
Pick-to: 6.7 6.5
Fixes: QTBUG-121653
Change-Id: I50c9f4badc77a9aa402af946561aff58ae9e3e7a
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new rules were added in Unicode 15.1 (TR #14, revision 51).
The rules read:
LB15a: (sot | BK | CR | LF | NL | OP | QU | GL | SP | ZW)
[\p{Pi}&QU] SP* ×
LB15b: × [\p{Pf}&QU] (SP | GL | WJ | CL | QU | CP | EX
| IS | SY | BK | CR | LF | NL | ZW | eot)
Add two new line breaking classes LineBreak_QU_Pi and _QU_Pf to
represent quotation characters with context that matches left
side of LB15a and right side of LB15b respectively. This way
it is still possible to use the line breaking classes table.
Also add a coment about the original source of the line
break table.
Task-number: QTBUG-121529
Change-Id: Ib35f400e39e76819cd1c3299691f7b040ea37178
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add enumerator for the new Unicode version to QChar::UnicodeVersion.
Remap new line breaking classes to their Unicode 15.0 values:
* AK, AP and AS to AL,
* VI and VF to CM.
These are classes for new line breaking support for Indic scripts
that require more work.
Blacklist failing tests for now:
* tst_QUrlUts46::idnaTestV2
* tst_QTextBoundaryFinder::lineBoundariesDefault
* tst_QTextBoundaryFinder::graphemeBoundariesDefault
Regenerate the source files.
Task-number: QTBUG-121529
Change-Id: I869cc9fbaa53765d8ae6265c22cdbef9f19d05bf
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
|
| |
This amends c4e550703c2bdc1ee710507b8df9c0c9a118402e. The data version
update was just forgotten when updating to Unicode 15.0.
Pick-to: 6.5 6.6 6.7
Change-Id: Ibb3e9cb81e9bbcb5d4aaf4e4df6231485531c128
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This corresponds to Unicode version 15.0.0.
Added the following scripts:
* Kawi
* Nag Mundari
Full support of these scripts requires harfbuzz version 5.2.0,
this version adds support for Unicode 15.0:
https://github.com/harfbuzz/harfbuzz/releases/tag/5.2.0
Fixes: QTBUG-106810
Change-Id: Ib06c526e49b0f01ef9f21123bcf875c6b19f2601
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
| |
Task-number: QTBUG-100485
Pick-to: 6.3 6.2
Change-Id: I41480a34b14fd86a68a5c10b7e0f3d250e785d0f
Reviewed-by: Marc Mutz <marc.mutz@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
|
| |
This property is needed to properly implement the line breaking
algorithm from UAX #14.
Task-number: QTBUG-97537
Pick-to: 6.3
Change-Id: Ia83cc553c9ef19fae33560721630849d2a95af84
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
|
| |
Remove E_Base, Glue_After_Zwj, E_Base_GAZ, and E_Modifier obsoleted by
UTS #29, version 33 (Unicode 11.0.0).
Task-number: QTBUG-97537
Pick-to: 6.2 6.3
Change-Id: If5dc36ae17cd8746bbe81b73bbcc0863181e4a7a
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.
Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This corresponds to Unicode version 14.0.0.
Added the following scripts:
* CyproMinoan
* OldUyghur
* Tangsa
* Toto
* Vithkuqi
Full support of these scripts requires harfbuzz version 3.0.0,
this version adds support for Unicode 14.0:
https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0
With this release 10 test cases in tst_qurluts46 were fixed, one
additional test case is failing in tst_qtextboundaryfinder and
is commented out. In total 62 line break test cases and 44 word
break test cases are failing.
A comment in src/corelib/text/qt_attribution.json was updated to
include the URL of the page containing UCD version number.
Fixes: QTBUG-94359
Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
|
| |
Run unicode utility to regenerate the Unicode tables. This reduces
size of the IDNA mapping tables. Adjust the QUrl client code to use
the new API.
Task-number: QTBUG-85323
Change-Id: Iaa8d6932e611f7aa4009a3fae2972de87b875cf8
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
| |
Re-run unicode utility to update the Unicode tables. This adds
properties and mappings needed to implement UTS #46 (IDNA).
Task-number: QTBUG-85323
Change-Id: Id1de91caddd82095f8f8f2301bfd7bb2ee3fcafd
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
Although Qt has upgraded the Unicode tables all the way up to
Unicode 13, the algorithm has never been adapted; in other words,
it has been working by chance for years. Luckily, MOST
of the cases were dealt with correctly, but emoji handling
actually manages to break it.
This commit:
* Adds parsing of emoji-data.txt into the unicode table generator.
That is necessary to extract the Extended_Pictographic property,
which is used by the EGC algorithm.
* Regenerates the tables.
* Removes some obsoleted grapheme cluster break properties, and
adds the ones added in the meanwhile.
* Rewrites the EGC algorithm according to Unicode 13. This is
done by simplifying a lot the lookup table. Some rules (GB11,
GB12, GB13) can't be done by the table alone so some hand-rolled
code is necessary in that case.
* Thanks to these fixes, the complete upstream GraphemeBreakTest
now passes. Remove the "edited" version that ignored some rows
(because they were failing).
Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
Pick-to: 6.1 6.0 5.15
Fixes: QTBUG-92822
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Unicode table code can only be safely called on valid code-points.
So code that calls it must only pass it valid Unicode data. The string
iterator's Unchecked Unchecked methods only provide this guarantee
when the string being iterated is guaranteed to be valid UTF-16; while
client code should only use QString, QStringView and friends on valid
UTF-16 data, we have no way to be sure they have respected that.
So take the few extra cycles to actually check validity in the course
of iterating strings, when the resulting code-points are to be passed
to the Unicode table look-ups. Add tests that case mapping doesn't
access Unicode tables out of range (it'll trigger the new assertion).
Added some comments to qchar.h that helped me understand surrogates.
Change-Id: Iec2c3106bf1a875bdaa1d622f6cf94d7007e281e
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
|
|
|
|
|
|
|
|
| |
They were only used by one function each, in unicodetables.cpp, so
don't need to be macros.
Change-Id: I3e7f9f661568862d0a0d265bb8f657a8e0782b13
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
|
|
|
|
|
|
|
|
|
|
| |
Eliminate some needless parentheses, tidy up some spacing and
indentation and split some long lines. Change first += after
declaration to initializer.
Change-Id: I05ff2a6337b7ed14e0a2dc9c03fc784c92b63515
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are (at least) two implementations of the low-level case-folding
algorithm, one of which (for QChar::toLower()) seems to be wrong (it
doesn't deal with special cases which expand to more than one code
point).
The algoithm hidden in QString and entangled with the QString
detaching code makes reusing the code much harder.
At the same time, the dependency of the algorithm on the unicode
tables makes exposing a non-allocating result type in the public API
hard. std::u16string would be an alternative if we can assure that all
implementations use SSO with at least four characters.
So, for the time being, leave this as internal API for use in an
upcoming QStringView::toLower() as well as case-insensitive hashing.
Change-Id: Iabb2611846f6176776aa20e634f44d8464f3305c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This makes existing calls passing uint or ushort ambiguous, so
fix all the callers. There do not appear to be callers outside
QtBase. In fact, the ...BreakClass() functions appear to be
utterly unused.
Change-Id: I1c2251920beba48d4909650bc1d501375c6a3ecf
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the standard gives us proper types for UTF-16 and UTF-32
characters, use them. Will eventually make the code much easier to
read than today, where uint could be an index as well as a char32_t.
It also ensures that the result of e.g. QChar::highSurrogate() can
still be implicitly converted to a QChar now that the
QChar(non-characater-integral-types) ctors are being made explicit.
[ChangeLog][QtCore][QChar] All low-level functions
(e.g. highSurrogate()) now take and return char16_t instead of ushort
and char32_t instead of uint.
Change-Id: I9cd8ebf6fb998fe1075dae96c7c4484a057f0b91
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Include WordBreakTest.html, since a test uses sample strings from it,
albeit without actually reading the file.
Had to comment out more of the new tests, as at Revision 24, pending
an update to harfbuzz and the text boundary detection code.
Task-number: QTBUG-79631
Task-number: QTBUG-79418
Task-number: QTBUG-82747
Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Had to teach the update program to accept category Lm as for
Joining_Transparent, for the sake of a new ArabicShaping.txt entry.
Added three new Unicode versions, several new scripts and a new
word-break class.
Updated UCD's test data for tst_QTextBoundaryFinder. This left 57
tests failing; I have commented out the data rows for those tests,
pending someone with more knowledge addressing this.
Task-number: QTBUG-79631
Task-number: QTBUG-79418
Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201
Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of four pairs of :1 :15 bit fields, use an array of four :1,
:15 structs. This allows to replace the case folding traits classes
with a simple enum that indexes into said array.
I don't know what the WASM #ifdef'ed code is supposed to effect (a :0
bit-field is only useful to separate adjacent bit-field into separate
memory locations for multi-threading), but I thought it safer to leave
it in, and that means the array must be a 64-bit block of its own, so
I had to move two fields around.
Saves ~4.5KiB in text size on optimized GCC 10 LTO Linux AMD64 builds.
Change-Id: Ib52cd7706342d5227b50b57545d073829c45da9a
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCC doesn't like the sequence
: 5
: 5
: 8
: 6
: 8
and inserts a :6 padding between the :5 and the :8 and a :2 padding
between the :6 and the :8, growing the bitfield by 8 bits of embedded
padding and another byte to bring the struct back to sizeof % 2 == 0.
Fix by reshuffling the elements and adding a static_assert for the
next round.
Saves ~5KiB in QtCore executable size.
Change-Id: I4758a6f48ba389abc2aee92f60997d42ebb0e5b8
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
|
|
This includes byte array, string, char, unicode, locale, collation and
regular expressions.
Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
|