summaryrefslogtreecommitdiffstats
path: root/src/corelib/tools/qunicodetools.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Move Unicode script itemization code from text engine to UnicodeToolsKonstantin Ritt2013-03-141-0/+45
| | | | | | | | | | | This is still the same trivial implementation with the only difference in that that it properly handles surrogate pairs and combining marks. This temporarily makes QTextEngine::itemize() insignificatly slower due to using intermediate buffer, until refactoring is done. Change-Id: I7987d6306b0b5cdb21b837968e292dd70abfe223 Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@digia.com>
* Hide Harfbuzz from the outer worldKonstantin Ritt2013-03-131-2/+1
| | | | | | | | Don't export, don't generate private headers, don't mention HB in API. Change-Id: I048ebd178bf4afaf9fda710a00933b95274cf910 Reviewed-by: Josh Faust <jfaust@suitabletech.com> Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Fix potential issue in QTBF itemization codeKonstantin Ritt2013-03-041-1/+1
| | | | | | | Since 5.0, script is QChar::Script which isn't of 1:1 mapping to HB_Script Change-Id: I2d88f929d7d3c3c994076a4e92ea22370962c41c Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Merge remote-tracking branch 'origin/stable' into devFrederik Gladhorn2013-01-221-1/+1
|\ | | | | | | | | | | | | | | | | | | Conflicts: src/corelib/io/qsavefile_p.h src/corelib/tools/qregularexpression.cpp src/gui/util/qvalidator.cpp src/gui/util/qvalidator.h Change-Id: I58fdf0358bd86e2fad5d9ad0556f3d3f1f535825
| * Update copyright year in Digia's license headersSergio Ahumada2013-01-181-1/+1
| | | | | | | | | | Change-Id: Ic804938fc352291d011800d21e549c10acac66fb Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* | Merge branch 'stable' into devFrederik Gladhorn2013-01-041-1/+2
|\| | | | | | | | | | | | | | | | | | | | | Conflicts: examples/widgets/painting/shared/shared.pri src/corelib/tools/qharfbuzz_p.h src/corelib/tools/qunicodetools.cpp src/plugins/platforms/windows/accessible/qwindowsaccessibility.cpp src/plugins/platforms/windows/qwindowsfontdatabase.cpp Change-Id: Ibc9860abf570e5ce8b052fb88feb73ec35e64bd3
| * properly syncqt-ize harfbuzz headersOswald Buddenhagen2012-12-041-1/+1
| | | | | | | | | | | | | | | | | | we were already installing them into QtCore/private, so turn them into proper private headers to start with. this cleans up our project files. Change-Id: I0795f79e03b60b5854de9e4dc339e9b5a5e6fd87 Reviewed-by: Lars Knoll <lars.knoll@digia.com> Reviewed-by: Joerg Bornemann <joerg.bornemann@digia.com>
* | Update Qt internals to use QChar::ScriptKonstantin Ritt2012-12-211-3/+3
|/ | | | | | | | | | | | | | ...and remove the outdated QUnicodeTables::Script enum. QFontEngineData now has one extra slot that never used (engines[QChar::Script_Inherited]). engines[QChar::Script_Unknown], if accessed, would be set with a Box engine instance, and could be used as a minor optimization some time later. In order to preserve the existing behavior, we map all scripts up to Latin to Common. Change-Id: Ide4182a0f8447b4bf25713ecc3fe8097b8fed040 Reviewed-by: Pierre Rossi <pierre.rossi@gmail.com> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* QTBF: Fix issue with no splitting the words at "." (FULL STOP)Konstantin Ritt2012-11-231-0/+12
| | | | | | | | | | | | As of Unicode 5.1, some punctuation marks were mapped to MidLetter and MidNumLet for better URL and abbreviations handling which caused "hi.there" to be treated like if it were just a single word; until we have the Unicode Text Segmentation tailoring mechanism, retain the old behavior by remapping (some of) those characters back to their old values. Change-Id: I49dea6064f2ea40a82fc0b1bc3c4f0b4e803919f Reviewed-by: David Faure <david.faure@kdab.com> Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* QTextBoundaryFinder: Introduce BoundaryReason::MandatoryBreak flagKonstantin Ritt2012-10-101-3/+3
| | | | | | | | | | that will be returned by boundaryReasons() when the boundary finder is at the line end position (CR, LF, NewLine Function, End of Text, etc.). The MandatoryBreak flag, if set, means the text should be wrapped at a given position. Change-Id: I32d4f570935d2e015bfc5f18915396a15f009fde Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com> Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Update the Unicode Data and Algorithms up to Unicode 6.2Konstantin Ritt2012-10-091-58/+63
| | | | | | | | | | | | | Version 6.2 of the Unicode Standard is a special release dedicated to the early publication of the newly encoded Turkish lira sign. In addition, there are some significant changes to the Unicode algorithms for text segmentation and line breaking to improve breaking for emoji symbols. For more details, see http://www.unicode.org/versions/Unicode6.2.0/ Change-Id: I21cfd4f307e41b41a19d36cce87f7a44c2661bc2 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* QCharAttributes: add wordStart/wordEnd flagsKonstantin Ritt2012-09-261-2/+32
| | | | | | | | | | | | | | | A simple heuristic is used to detect the word beginning and ending by looking at the word break property value of surrounding characters. This behaves better than the white-spaces based implementation used before and makes it possible to tailor the default algorithm for complex scripts. BIG FAT WARNING: The QCharAttributes buffer now has to have a length of string length + 1 for the flags at end of text. Task-Id: QTBUG-6498 Change-Id: I5589b191ffde6a50d2af0c14a00430d3852c67b4 Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Change copyrights from Nokia to DigiaIikka Eklund2012-09-221-24/+24
| | | | | | | | Change copyrights and license headers from Nokia to Digia Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e Reviewed-by: Lars Knoll <lars.knoll@digia.com> Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
* A step out from Harfbuzz (reduce dependency)Konstantin Ritt2012-09-221-27/+49
| | | | | | | | | | | | | | | | | Introduce QCharAttributes and use it instead of HB_CharAttributes everywhere in Qt (in Harfbuzz, the HB_CharAttributes is only used in the text segmentation algorithm which has been moved from HB to Qt (well, most of it)). Rename some members to better reflect their meaning, remember to keep HB_CharAttributes in sync with QCharAttributes. Also replace HB_ScriptItem with a (temporary) QUnicodeTools::ScriptItem struct that will be replaced with a more efficient/friendly solution a bit later. The soft hyphen and the mandatory break detection has been factored out of the default text breaking algorithm to a higher level in order to refactor the QCharAttributes bitfields and to optimize the implementation for the common case. Change-Id: Ieb365623ae954430f1c8b2dfcd65c82973143eec Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Add QChar::SoftHyphen enum valueKonstantin Ritt2012-07-031-1/+1
| | | | | | | | | Just like for the QChar::ByteOrderMark, `ch == QChar::SoftHyphen` is much more readable than `ch == 0x00ad // (soft-hyphen)`, etc. Change-Id: I9c85f14cfd979037d35103c3259a435fd729b869 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
* QUnicodeTables: some internal API renamingsKonstantin Ritt2012-06-221-23/+23
| | | | | | | | | | | | enums GraphemeBreak, WordBreak, and SentenceBreak has been renamed to GraphemeBreakClass, WordBreakClass, and SentenceBreakClass respectively, their values has been renamed to contain a '_' as logical enum-value separator (just like many other nums in Qt, e.g. LineBreakClass); *BreakFormat has been replaced with *Break_Extend (some format characters are kind of subtype of the extender characters, not vice versa). Change-Id: I9ddbcf8848da87409736c2d6d1798a62fa28cab8 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Improve the code generation by using Q_LIKELY/Q_UNLIKELYKonstantin Ritt2012-06-161-36/+34
| | | | | | | + reorder conditions in getWordBreaks() to make further updates more clear Change-Id: I1ca9adde066c3a48830f310202f7181585fac194 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Line Breaking Algorithm: handle the Object Replacement CharacterKonstantin Ritt2012-06-101-30/+31
| | | | | | | | See http://www.unicode.org/reports/tr14/#CB and http://www.unicode.org/reports/tr14/#LB20 for details Change-Id: Ice0aa2b2ce81f6e39839a353240420436eddd754 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Line Breaking Algorithm: don't break inside numeric expressionsKonstantin Ritt2012-06-101-6/+107
| | | | | Change-Id: I8362663454e4c6604ecb6289ae8009d47c78aeb1 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Update the Unicode Text Breaking Algorithm implementationKonstantin Ritt2012-06-101-281/+326
| | | | | | | | | | | | | | | | | | to make it conformant to the Unicode 6.1 specifications #14 and #29. The most important changes are: * The implementation has been reworked from scratch to fix all known bugs; * Separate-out the grapheme and the line breaking implementation to eliminate an overhead due to calculating unnecessary breaks; * Stop using deprecated SG class in favor of resolving pairs of surrogates; * A proper support for SMP code points; * Support for extended grapheme clusters (a drop-in replacement for the legacy grapheme clusters as of Unicode 5.1); * The hardcoded tailoring of UBA has been eliminated which breaks the 7 years-old lineBreaking test. Some later, we'll investigate if such a tailoring is still needed. Change-Id: I9f5867b3cec753b4fc120bc5a7e20f9a73d89370 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Shift positions for lineBreakTypeKonstantin Ritt2012-06-071-1/+4
| | | | | | | | | | | | | | | | | | | to keep them consistent with positions for all other flags. This changes the internal behavior so that attributes[0].lineBreakType now means "break opportunity at start of the text (before the first character in the string)" and is always assigned with HB_NoBreak to conform rule LB2 (see http://www.unicode.org/reports/tr14/#LB2). The current implementation is based on the sample implementation from tr14 that aimed to be as simple as possible rather than to be optimal. From now, we can use pieces of the attributes array "as is" without having to adjust some positions. Or we can analize some long text by chunks (e.g. paragraph by paragraph) and consume less memory. This introduces a minor overhead that will be eliminated shortly. Change-Id: Ic873a05a9d5203b1c3d5aff2e4445a3f034c4bd2 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Set the whiteSpace flag outside the grapheme and the line breaking loopKonstantin Ritt2012-06-071-7/+21
| | | | | | | | | | | | | | | The white spaces determination doesn't belong to the text breaking algorithm. A proper breaking implementation shouldn't assume spaces are break opportunities (actually, space is allowed to be a grapheme base); However, the whiteSpace flag should never be checked alone while iterating over the text to find the space sequence; the grapheme boundaries should always be taken into account. This covers the SMP code points in UTF-16 text and graphemes that consist of a space followed with one or more grapheme extenders. This introduces a minor overhead that would be eliminated some later. Change-Id: Ic2cc7f485631fd0b436fc256ce112ded5f94fc07 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* QTBF: fix the mandatory line breaks not being set in some casesKonstantin Ritt2012-05-301-4/+5
| | | | | | | | | e.g. for the text "Aaa bbb ccc.\r\nDdd eee fff." the \r\n wasn't treated as a hard line break or as a line break opportunity, ever. Quite ancient bug... Change-Id: I8d8497c55a3a4d51c27de99ccfe1e31f3bf4de77 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Introduce QUnicodeToolsKonstantin Ritt2012-05-301-11/+21
| | | | | | | | | | | | | Add QUnicodeTools namespace and rename qGetCharAttributes to initCharAttributes; Make it possible to disable tailoring globally by overriding qt_initcharattributes_default_algorithm_only value (useful for i.e. running the specification conformance tests); This is mostly a preparation step for the upcoming patches. Change-Id: I783879fd17b63b52d7983e25dad5b820f0515e7f Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* move the default text breaking algorithm impl from HarfBuzz to QtKonstantin Ritt2012-05-101-0/+398
there are several reasons to do this: * text breaking is not a shaper's job; * since the text breaking rules are bound to a specific Unicode version, updating Qt's internal unicode data would require updating the data in HB as well; * makes porting to HurfBuzz-NG some easier Change-Id: I0bbf8e8a343bc074696f4ddf2ae4e7fa32a61629 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>