summaryrefslogtreecommitdiffstats
path: root/src/corelib/tools/qunicodetools.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Improve the code generation by using Q_LIKELY/Q_UNLIKELYKonstantin Ritt2012-06-161-36/+34
| | | | | | | + reorder conditions in getWordBreaks() to make further updates more clear Change-Id: I1ca9adde066c3a48830f310202f7181585fac194 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Line Breaking Algorithm: handle the Object Replacement CharacterKonstantin Ritt2012-06-101-30/+31
| | | | | | | | See http://www.unicode.org/reports/tr14/#CB and http://www.unicode.org/reports/tr14/#LB20 for details Change-Id: Ice0aa2b2ce81f6e39839a353240420436eddd754 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Line Breaking Algorithm: don't break inside numeric expressionsKonstantin Ritt2012-06-101-6/+107
| | | | | Change-Id: I8362663454e4c6604ecb6289ae8009d47c78aeb1 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Update the Unicode Text Breaking Algorithm implementationKonstantin Ritt2012-06-101-281/+326
| | | | | | | | | | | | | | | | | | to make it conformant to the Unicode 6.1 specifications #14 and #29. The most important changes are: * The implementation has been reworked from scratch to fix all known bugs; * Separate-out the grapheme and the line breaking implementation to eliminate an overhead due to calculating unnecessary breaks; * Stop using deprecated SG class in favor of resolving pairs of surrogates; * A proper support for SMP code points; * Support for extended grapheme clusters (a drop-in replacement for the legacy grapheme clusters as of Unicode 5.1); * The hardcoded tailoring of UBA has been eliminated which breaks the 7 years-old lineBreaking test. Some later, we'll investigate if such a tailoring is still needed. Change-Id: I9f5867b3cec753b4fc120bc5a7e20f9a73d89370 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Shift positions for lineBreakTypeKonstantin Ritt2012-06-071-1/+4
| | | | | | | | | | | | | | | | | | | to keep them consistent with positions for all other flags. This changes the internal behavior so that attributes[0].lineBreakType now means "break opportunity at start of the text (before the first character in the string)" and is always assigned with HB_NoBreak to conform rule LB2 (see http://www.unicode.org/reports/tr14/#LB2). The current implementation is based on the sample implementation from tr14 that aimed to be as simple as possible rather than to be optimal. From now, we can use pieces of the attributes array "as is" without having to adjust some positions. Or we can analize some long text by chunks (e.g. paragraph by paragraph) and consume less memory. This introduces a minor overhead that will be eliminated shortly. Change-Id: Ic873a05a9d5203b1c3d5aff2e4445a3f034c4bd2 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Set the whiteSpace flag outside the grapheme and the line breaking loopKonstantin Ritt2012-06-071-7/+21
| | | | | | | | | | | | | | | The white spaces determination doesn't belong to the text breaking algorithm. A proper breaking implementation shouldn't assume spaces are break opportunities (actually, space is allowed to be a grapheme base); However, the whiteSpace flag should never be checked alone while iterating over the text to find the space sequence; the grapheme boundaries should always be taken into account. This covers the SMP code points in UTF-16 text and graphemes that consist of a space followed with one or more grapheme extenders. This introduces a minor overhead that would be eliminated some later. Change-Id: Ic2cc7f485631fd0b436fc256ce112ded5f94fc07 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* QTBF: fix the mandatory line breaks not being set in some casesKonstantin Ritt2012-05-301-4/+5
| | | | | | | | | e.g. for the text "Aaa bbb ccc.\r\nDdd eee fff." the \r\n wasn't treated as a hard line break or as a line break opportunity, ever. Quite ancient bug... Change-Id: I8d8497c55a3a4d51c27de99ccfe1e31f3bf4de77 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Introduce QUnicodeToolsKonstantin Ritt2012-05-301-11/+21
| | | | | | | | | | | | | Add QUnicodeTools namespace and rename qGetCharAttributes to initCharAttributes; Make it possible to disable tailoring globally by overriding qt_initcharattributes_default_algorithm_only value (useful for i.e. running the specification conformance tests); This is mostly a preparation step for the upcoming patches. Change-Id: I783879fd17b63b52d7983e25dad5b820f0515e7f Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* move the default text breaking algorithm impl from HarfBuzz to QtKonstantin Ritt2012-05-101-0/+398
there are several reasons to do this: * text breaking is not a shaper's job; * since the text breaking rules are bound to a specific Unicode version, updating Qt's internal unicode data would require updating the data in HB as well; * makes porting to HurfBuzz-NG some easier Change-Id: I0bbf8e8a343bc074696f4ddf2ae4e7fa32a61629 Reviewed-by: Lars Knoll <lars.knoll@nokia.com>