path: root/util/unicode/main.cpp
Commit message | Author | Age | Files | Lines
* Update the Unicode Data and Algorithms up to Unicode 6.2 | Konstantin Ritt | 2012-10-09 | 1 | -8/+18
  Version 6.2 of the Unicode Standard is a special release dedicated to the early publication of the newly encoded Turkish lira sign. In addition, there are some significant changes to the Unicode algorithms for text segmentation and line breaking to improve breaking for emoji symbols.
  For more details, see http://www.unicode.org/versions/Unicode6.2.0/
  Change-Id: I21cfd4f307e41b41a19d36cce87f7a44c2661bc2
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
  Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Change copyrights from Nokia to Digia | Iikka Eklund | 2012-09-22 | 1 | -48/+48
  Change copyrights and license headers from Nokia to Digia
  Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e
  Reviewed-by: Lars Knoll <lars.knoll@digia.com>
  Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
* QUnicodeTables: some internal API renamings | Konstantin Ritt | 2012-06-22 | 1 | -179/+179
  enums GraphemeBreak, WordBreak, and SentenceBreak have been renamed to GraphemeBreakClass, WordBreakClass, and SentenceBreakClass respectively; their values have been renamed to contain a '_' as a logical enum-value separator (just like many other enums in Qt, e.g. LineBreakClass); *BreakFormat has been replaced with *Break_Extend (some format characters are a kind of subtype of the extender characters, not vice versa).
  Change-Id: I9ddbcf8848da87409736c2d6d1798a62fa28cab8
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Clean-up the Unicode tables generator code and the generated header | Konstantin Ritt | 2012-06-22 | 1 | -432/+435
  This fixes the blocks and memory consumption reports and the whitespace issues, and makes the code a bit cleaner. Since I'm the only one who changes this code, such a no-op commit could not hurt anyone or even git blame ;)
  Change-Id: Ib069f925a3791c82e16c368c8392bcffbfd68c53
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
  Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Make QUnicodeTables::script() support SMP code points | Konstantin Ritt | 2012-06-14 | 1 | -277/+145
  Instead of expanding the scripts table with script values for the code points >= 0x10000, it has been merged with the properties table in order to increase performance of the script itemization code (not affected yet).
  (Stats: the properties table grew by 97428-89800 = 7628 bytes; the old scripts table was 7680 bytes in size)
  The outdated ScriptsInitial.txt and ScriptsCorrections.txt files have been removed (they were just empty; the "corrigendum" script corrections should be applied to Scripts.txt directly, *no customization allowed*!).
  More script testcases have been added - at least one per supported script.
  Task-number: QTBUG-6530
  Change-Id: I40a9e76f681e2dd552fd4c61af0808d043962e79
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Line Breaking Algorithm: handle the Object Replacement Character | Konstantin Ritt | 2012-06-10 | 1 | -7/+6
  See http://www.unicode.org/reports/tr14/#CB and http://www.unicode.org/reports/tr14/#LB20 for details
  Change-Id: Ice0aa2b2ce81f6e39839a353240420436eddd754
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Update the qunicodetables generator to deal with UCD 6.1 files | Konstantin Ritt | 2012-06-10 | 1 | -34/+92
  Change-Id: If22018ff83cfc6b9c984f689648da038fce11d84
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Move ScriptSentinel enum from header to .cpp | Konstantin Ritt | 2012-05-25 | 1 | -4/+4
  Change-Id: Ic74e8e2471e92aa2014735f6ab0bb4f3b88de206
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* QChar: add isSurrogate() and isNonCharacter() to the public API | Konstantin Ritt | 2012-05-16 | 1 | -25/+6
  + QChar::LastValidCodePoint enum value that supersedes the UNICODE_LAST_CODEPOINT macro
  replace uses of hardcoded values with the new API; remove leftovers
  Change-Id: I1395c9840b85fcb6b08e241b131794a98773c952
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* significant unicodetables generator performance optimization | Konstantin Ritt | 2012-05-11 | 1 | -41/+47
  since the entire range of valid Unicode code points is in use, QHash is suboptimal and could be replaced with QList; taking the value by ref and not inserting it back into the map, plus not calculating the default value over and over, gains us up to a 60% performance boost!
  Change-Id: I48c54a8e88472cf76c79c0aac44e65eeefa44861
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* add some useful methods to QUnicodeTables:: | Konstantin Ritt | 2012-05-10 | 1 | -1/+42
  in order to reduce code duplication and prepare the ground for upcoming changes
  Change-Id: I980244149f65384c9484bbec4682de8b7b848b08
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* add support for non-BMP ligatures | Konstantin Ritt | 2012-05-04 | 1 | -14/+84
  > http://www.unicode.org/versions/Unicode5.2.0/
  > D. Character Additions:
  > There are three new characters in the newly-encoded Kaithi script that will require changes in implementations which make hard-coded assumptions about composition during normalization. Most new characters added to the standard with decompositions cannot be generated by the operations toNFC() or toNFKC(), but these three can. Implementers should check their code carefully to ensure that it handles these three characters correctly.
  >   U+1109A KAITHI LETTER DDDHA
  >   U+1109C KAITHI LETTER RHA
  >   U+110AB KAITHI LETTER VA
  UCD 6.1 adds two more of them:
    U+1112E CHAKMA VOWEL SIGN O
    U+1112F CHAKMA VOWEL SIGN AU
  Change-Id: I781a26848078d8b83a182b0fd4e681be2a6d9a27
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* qunicodetables generator: improve the output and the generated code | Konstantin Ritt | 2012-04-24 | 1 | -90/+109
  better memory usage report; additional asserts for the conditions the implementation depends on; a namespace for the internal static data; styling fixes
  Change-Id: Id4048ff6104c56b5f590f9ac6fbf7c0bce79ec47
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
  Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* workaround issue where casing diff overflows signed short | Konstantin Ritt | 2012-04-24 | 1 | -17/+41
  two such code points were added in Unicode 5.1:
    U+1D79 LATIN SMALL LETTER INSULAR G
    U+A77D LATIN CAPITAL LETTER INSULAR G
  two more of them were added in Unicode 6.0:
    U+0265 LATIN SMALL LETTER TURNED H
    U+A78D LATIN CAPITAL LETTER TURNED H
  and two more were added in Unicode 6.1:
    U+0266 LATIN SMALL LETTER H WITH HOOK
    U+A7AA LATIN CAPITAL LETTER H WITH HOOK
  we map them like special cases with length == 1
  (note: all are in the BMP, which is checked explicitly in the generator)
  Change-Id: I8a34164eb3ee2e575b7799cc12d4b96ad5bcd9c6
  Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
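The overflow named in the commit title above can be checked with plain arithmetic. The following is a minimal sketch, assuming the case-mapping offset is stored as a 16-bit signed value, as the title suggests. The helpers `caseDiff` and `caseDiffFitsInShort` are hypothetical names for illustration and are not taken from the generator.

```cpp
#include <cstdint>

// Hypothetical helper (not from the Qt sources): signed offset from a code
// point to its case-mapped counterpart.
constexpr std::int32_t caseDiff(std::int32_t from, std::int32_t to)
{
    return to - from;
}

// Hypothetical helper: true when that offset fits in a 16-bit signed field.
constexpr bool caseDiffFitsInShort(std::int32_t from, std::int32_t to)
{
    return caseDiff(from, to) >= -32768 && caseDiff(from, to) <= 32767;
}

// The three lowercase/uppercase pairs listed in the commit message: each
// offset exceeds 32767, so none of them fits in a signed short.
static_assert(caseDiff(0x1D79, 0xA77D) == 35332, "INSULAR G offset");
static_assert(caseDiff(0x0265, 0xA78D) == 42280, "TURNED H offset");
static_assert(caseDiff(0x0266, 0xA7AA) == 42308, "H WITH HOOK offset");
static_assert(!caseDiffFitsInShort(0x1D79, 0xA77D), "overflows signed short");

// An ordinary pair for comparison: U+0061 'a' to U+0041 'A' is -32.
static_assert(caseDiffFitsInShort(0x0061, 0x0041), "fits in signed short");
```

Because the offsets cannot be stored directly, the commit routes these code points through the special-case path with a one-element sequence instead.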
* replace hardcoded values with surrogate handling methods | Konstantin Ritt | 2012-04-11 | 1 | -10/+10
  Change-Id: Iba079953c46a29404232d2dacbe0c90170097d51
  Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
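For readers unfamiliar with what such surrogate handling methods replace, here is a minimal standalone sketch of the standard UTF-16 surrogate arithmetic. The function names are illustrative stand-ins chosen for this example; the bodies follow the UTF-16 definition and are an assumption, not a copy of Qt's implementation.

```cpp
#include <cstdint>

// Illustrative helpers (assumed names, standard UTF-16 arithmetic).

// Code points at or above U+10000 need a surrogate pair in UTF-16.
constexpr bool requiresSurrogates(std::uint32_t ucs4)
{
    return ucs4 >= 0x10000;
}

// High (leading) surrogate: 0xD800 plus the top 10 bits of (ucs4 - 0x10000).
constexpr std::uint16_t highSurrogate(std::uint32_t ucs4)
{
    return static_cast<std::uint16_t>(0xD800 + ((ucs4 - 0x10000) >> 10));
}

// Low (trailing) surrogate: 0xDC00 plus the bottom 10 bits.
constexpr std::uint16_t lowSurrogate(std::uint32_t ucs4)
{
    return static_cast<std::uint16_t>(0xDC00 + (ucs4 & 0x3FF));
}

constexpr bool isHighSurrogate(std::uint32_t unit)
{
    return (unit & 0xFFFFFC00) == 0xD800;
}

constexpr bool isLowSurrogate(std::uint32_t unit)
{
    return (unit & 0xFFFFFC00) == 0xDC00;
}

// Recombine a surrogate pair into the original code point.
constexpr std::uint32_t surrogateToUcs4(std::uint32_t high, std::uint32_t low)
{
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// U+1109A KAITHI LETTER DDDHA encodes as the pair D804 DC9A.
static_assert(highSurrogate(0x1109A) == 0xD804, "high surrogate");
static_assert(lowSurrogate(0x1109A) == 0xDC9A, "low surrogate");
static_assert(surrogateToUcs4(0xD804, 0xDC9A) == 0x1109A, "round trip");
```

Naming the operations, rather than repeating constants such as 0xD800 and 0x3FF at each call site, is the kind of change the commit title describes.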
* minor improvement for NormalizationCorrections | Konstantin Ritt | 2012-04-11 | 1 | -2/+5
  don't hardcode the latest affected version value; simply use the one parsed from NormalizationCorrections.txt
  Change-Id: I37021e8238d77deada4c5ba7a2d160c87186b9dd
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* optimize QString::toLower()/toUpper() for special cases, step 2 | Konstantin Ritt | 2012-02-21 | 1 | -3/+7
  from now on, QUnicodeTables::specialCaseMap[] starts with a placeholder; so, if somethingCaseSpecial is true, then somethingCaseDiff is always greater than 0
  Change-Id: Ibb1870512836eee71b1521564c0745096c05b2f9
  Merge-request: 70
  Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
  Reviewed-by: Olivier
  Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
* optimize QString::toLower()/toUpper() for special cases, step 1 | Konstantin Ritt | 2012-02-21 | 1 | -18/+31
  reorganize QUnicodeTables::specialCaseMap as follows: specialCaseMap contains sequence entries in the form { length, a, b, .. }
  Change-Id: Iea1f80bc2f4dc1f505428dad981cde26daaa52c7
  Merge-request: 70
  Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
  Reviewed-by: Olivier
  Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
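The layout described in the two toLower()/toUpper() commits above can be sketched as a flat array of length-prefixed sequences with a placeholder at index 0. This is a minimal illustration under stated assumptions: the table contents, the offsets, and the accessors `entryLength` and `entryUnit` are hypothetical and do not reproduce Qt's generated data. Only the `{ length, a, b, .. }` entry form and the leading placeholder come from the commit messages.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical table (not Qt's generated data). Each entry is
// { length, unit0, unit1, ... }; index 0 is a placeholder so that every
// real entry has an offset greater than 0.
constexpr std::uint16_t specialCaseMap[] = {
    0x0,                            // placeholder
    0x2, 0x0053, 0x0053,            // offset 1: "SS", uppercase of U+00DF
    0x3, 0x0046, 0x0046, 0x0049     // offset 4: "FFI", uppercase of U+FB03
};

// Hypothetical accessor: number of UTF-16 units in the entry at `offset`.
constexpr std::size_t entryLength(std::size_t offset)
{
    return specialCaseMap[offset];
}

// Hypothetical accessor: the i-th UTF-16 unit of the entry at `offset`.
constexpr std::uint16_t entryUnit(std::size_t offset, std::size_t i)
{
    return specialCaseMap[offset + 1 + i];
}

static_assert(entryLength(1) == 2, "U+00DF uppercases to two units");
static_assert(entryUnit(4, 2) == 0x0049, "third unit of FFI is I");
```

With the placeholder in place, an offset of 0 can never refer to a real entry, which matches the step 2 note that the stored diff is always greater than 0 whenever the special-case flag is set.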
* Remove "All rights reserved" line from license headers. | Jason McDonald | 2012-01-30 | 1 | -2/+2
  As in the past, to avoid rewriting various autotests that contain line-number information, an extra blank line has been inserted at the end of the license text to ensure that this commit does not change the total number of lines in the license header.
  Change-Id: I311e001373776812699d6efc045b5f742890c689
  Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
* Update contact information in license headers. | Jason McDonald | 2012-01-23 | 1 | -2/+2
  Replace Nokia contact email address with Qt Project website.
  Change-Id: I431bbbf76d7c27d8b502f87947675c116994c415
  Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
* Update copyright year in license headers. | Jason McDonald | 2012-01-05 | 1 | -2/+2
  Change-Id: I02f2c620296fcd91d4967d58767ea33fc4e1e7dc
  Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
* replace 'const QChar &' with 'QChar ' for QChar and QString | Ritt Konstantin | 2011-10-26 | 1 | -2/+2
  Merge-request: 69
  Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
  Change-Id: I61f5a54b783252029fcad95677958fa6a2130d01
  Reviewed-by: Olivier Goffart <ogoffart@kde.org>
* drop an obsolete QChar::NoCategory enum value | Ritt Konstantin | 2011-07-13 | 1 | -5/+2
  there is no such category in the Unicode specs. QChar::NoCategory has been a source of bugs since it was introduced. in 4.6 its meaning was limited to ucs4 > UNICODE_LAST_CODEPOINT only (which is useless anyway) in order to preserve the old (wrong) behavior. fix it now for qtbase
  Change-Id: I630534824e071090b39772881e747c1fdb758719
  Reviewed-on: http://codereview.qt.nokia.com/1584
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* Update licenseheader text in source files for qtbase Qt module | Jyri Tahtela | 2011-05-24 | 1 | -34/+34
  Updated version of LGPL and FDL licenseheaders. Apply release phase licenseheaders for all source files.
  Reviewed-by: Trust Me
* Initial import from the monolithic Qt. | Qt by Nokia | 2011-04-27 | 1 | -0/+2786
  This is the beginning of revision history for this module. If you want to look at revision history older than this, please refer to the Qt Git wiki for how to use Git history grafting. At the time of writing, this wiki is located here:
  http://qt.gitorious.org/qt/pages/GitIntroductionWithQt
  If you have already performed the grafting and you don't see any history beyond this commit, try running "git log" with the "--follow" argument.
  Branched from the monolithic repo, Qt master branch, at commit 896db169ea224deb96c59ce8af800d019de63f12