summaryrefslogtreecommitdiffstats
path: root/src/corelib/text/qlocale_data_p.h
Commit message (Collapse)AuthorAgeFilesLines
* Use char16_t in favor of ushort for locale data tablesEdward Welbourne2020-02-171-13/+13
| | | | | | Change-Id: I890dd2b52c1b786db1081744c8ca343baba93de4 Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Allow surrogate pairs for various "single character" locale dataEdward Welbourne2020-02-171-787/+818
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract the character in its proper unicode form and encode it in a new single_character_data table of locale data. Record each entry as the range within that table that encodes it. Also added an assertion in the generator script to check that the digits CLDR gives us are a contiguous sequence in increasing order, as has been assumed by the C++ code for some time. Lots of number-formatting code now has to take account of how wide the digits are. This leaves nowhere for updateSystemPrivate() to record values read from sys_locale->query(), so we must always consult that function when accessing these members of the systemData() object. Various internal users of these single-character fields need the system-or-CLDR value rather than the raw CLDR value, so move QLocalePrivate's methods to supply them down to QLocaleData and ensure they check for system values, where appropriate first. This allows us to finally support the Chakma language and script, for whose number system UTF-16 needs surrogate pairs. Costs 10.8 kB in added data, much of it due to adding two new locales that need surrogates to represent digits. [ChangeLog][QtCore][QLocale] Various QLocale methods that returned single QChar values now return QString values to accommodate those locales which need a surrogate pair to represent the (single character) return value. Fixes: QTBUG-69324 Fixes: QTBUG-81053 Change-Id: I481722d6f5ee266164f09031679a851dfa6e7839 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Separate offsets from sizes in QLocale's dataEdward Welbourne2020-01-301-2611/+2533
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This enables us to make the sizes quint8 and benefit from the resulting packing, making the locale data smaller. The sizes for long month-name lists (which concatenate twelve names with semicolon as separator) can overflow an 8-bit member, so use quint16 where needed. Re-ordered the data in QLocaleData and QCalendarLocale. Now all long-short(-narrow) families arise in that order; and any standalone is grouped with the one of the same length. (This cost 20 bytes in the date-format table, which optimises out more duplication if short is before long, but the saving in the (smaller) time-format table more than make up for it; and 20 bytes isn't worth the confusion that being inconsistent in ordering might cause.) At the same time, drop trailing semicolons from list entries (which join various names with semicolon) as they're not needed: we know where the end of the list is, because we know the size of the string that results from concatenation. The code that parses such lists can even correctly handle empty entries at the end. Saves 26 kB of data in the compiled binaries. Task-number: QTBUG-81053 Change-Id: If6ccc96a6910828817aa605d10fd814f567ae1e8 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Deduplicate locale data tablesEdward Welbourne2020-01-301-1093/+1078
| | | | | | | | | | | | | | | | | | | | | | Some entries in tables were sub-strings (e.g. prefixes) of others. Since we store start-index and length (with no need for terminators), any entry that appears as a sub-string of an earlier entry can be recorded without making a separate copy of its content, just by recording where it appeared as a sub-string of an earlier entry. (Sadly this doesn't apply to month- or day-names and their short-forms: for those, we store ';'-joined lists. Thus, although each short-form is a prefix of its long-form, the short-form is stored in a list with other short-forms; and this is not a prefix of the list of matching long-forms.) The savings are modest (780 bytes at present), but cost us nothing except when running the python script that generates the data files (it takes a little longer now), which usually only happens at a CLDR update. Change-Id: I05bdaa9283365707bac0190ae983b31f074dd6ed Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Minor tidy-up in qlocalexml2cpp.pyEdward Welbourne2020-01-301-2/+2
| | | | | | | | | | | | | | | | | | | Split a long line. Use pythonic chained comparison to save some repetition. Comment on a field not currently in actual use. Say "zeros" rather than "0s" in one comment to match another. Added a .h suffix to the main locale data tempfile to match the naming of the tempfiles used for calendar data. Simplify generation of the blank line between Language and Script; and include a matching blank between Script and Country. This adds one blank line to qlocale.h Removed a stray space that misaligned locale data lines. This produces a space-only change in the generated *_data_p.h files. Change-Id: I974a9e8923c3dfd2178855d2cf1d6a5074e130b3 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Preserve the case of the exponent separator CLDR suppliesEdward Welbourne2020-01-301-537/+537
| | | | | | | | | | | | | | | | | | | | | | We have long (since 4.5.1) coerced it to lower-case, for no readily apparent, much less documented, reason. CLDR says most locales use an upper-case E for this - let's actually use what CLDR says we should use. The code that matches the exponent separator was doing so case-insensitively in any case; that needed adaptation now that the separator's case isn't pre-determined; and, in any case, should have been done using case-folding rather than upper-casing. In the process, removed some spurious checks for "'e' or 'E'" in the result, since the exponent separator is always represented by 'e' (and an 'e' might also be present for the separate reason of its use as a beyond-decimal digit representing fourteen). [ChangeLog][QtCore][QLocale] QLocale::exponential() now preserves the case of the CLDR source, where previously it was lower-cased. Change-Id: Ic9ac02136cff79cb9f136d72141b5dbf54d9e0a6 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Update CLDR to v36Edward Welbourne2019-10-251-2836/+2876
| | | | | | | | | | | | | | | | | | | | | | Released on October 4th. Adds Windows names for two time zones, Qyzylorda and Volgograd. Added languages Chickasaw (cic), Muscogee (mus) and Silesian (szl). Norwegian number formatting has flipped back to using colon rather than dot as time separator; it's flipped back and forth over the last several CLDR releases. The dot form is present as a variant, the colon form was long given as the normal pattern, then went away; but now it's back as a contributed draft and that's what we pick up. The MS-Win time-zone ID script was iterating a dict, causing random reshuffling when new entries are added. Fixed that by doing the critical iteration in sorted order. Omitted locales ccp_BD and ccp_IN due to QTBUG-69324. Task-number: QTBUG-79418 Change-Id: I43869ee1810ecc1fe876523947ddcbcddf4e550a Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Add support for calendars beside GregorianSoroush Rabiei2019-08-201-2569/+589
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add QCalendarBackend as a base class for calendar implementations and QCalendar as a facade via which to access it. QDate's implicit implementation of the Gregorian calendar becomes QGregorianCalendar and QDate methods now support choice of calendar. Convert QLocale's CLDR data for month names to a locale-data component of each supported calendar and relevant QLocale methods now support choice of calendar. Adapt Python scripts for locale data generation to extract month name data from CLDR (keeping on version v35.1) into the new calendar-locale files. The locale data for the Gregorian calendar is held in a Roman calendar base, for sharing with other calendars. Add tests for basic uses of the new API. [ChangeLog][QtCore][QCalendar] Added QCalendar to support diverse calendars, supported by implementing QCalendarBackend. [ChangeLog][QtCore][QDate] Allow choice of calendar in various operations, with Gregorian remaining the default. Done-with: Lars Knoll <lars.knoll@qt.io> Done-with: Edward Welbourne <edward.welbourne@qt.io> Fixes: QTBUG-17110 Fixes: QTBUG-950 Change-Id: I9d6278f394269a183aee8156e990cec4d5198ab8 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Move text-related code out of corelib/tools/ to corelib/text/Edward Welbourne2019-07-101-0/+8814
This includes byte array, string, char, unicode, locale, collation and regular expressions. Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>