summaryrefslogtreecommitdiffstats
path: root/util/locale_database
Commit message (Collapse)AuthorAgeFilesLines
* Fix handling of Suzhou numbering systemEdward Welbourne2020-07-171-2/+4
| | | | | | | | | | | | | | | | This only arises when the system locale tells us to use its zero as our zero digit, since no CLDR locale uses it by default. Adapt an MS-specific QLocale::system() test to use Suzhou numbering, so as to test this. While updating the locale-restoration code to also restore the digits being set in that test, add restore code for the long time format, where previously only the short time format was restored. Add a comment to make it less likely one of those shall be missed in future. Fixes: QTBUG-85409 Change-Id: I343324bb563ee0e455dfe77d4825bf8c3082ca30 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Support digit-grouping correctlyEdward Welbourne2020-07-143-6/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Read three more values from CLDR and add a byte to the bit-fields at the end of QLocaleData, indicating the three group sizes. This adds three new parameters to various low-level formatting functions. At the same time, rename ThousandsGroup to GroupDigits, more faithfully expressing what this (internal) option means. This replaces commit 27d139128013c969a939779536485c1a80be977e with a fuller implementation that handles digit-grouping in any of the ways that CLDR supports. The formerly "Indian" formatting now also applies to at least some locales for Bangladesh, Bhutan and Sri Lanka. Fixed Costa Rica currency formatting test that wrongly put a separator after the first digit; the locale (in common with several Spanish locales) requires at least two digits before the first separator. [ChangeLog][QtCore][Important Behavior Changes] Some locales require more than one digit before the first grouping separator; others use group sizes other than three. The latter was partially supported (only for India) at 5.15 but is now systematically supported; the former is now also supported. Task-number: QTBUG-24301 Fixes: QTBUG-81050 Change-Id: I4ea4e331f3254d1f34801cddf51f3c65d3815573 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Remove unused importsDimitrios Apostolou2020-07-103-3/+0
| | | | | | | As found by LGTM.com. Change-Id: I1704f10f9bab1b11ab22824aca0cfcdcb47fef2f Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Merge remote-tracking branch 'origin/5.15' into devQt Forward Merge Bot2020-04-088-1942/+2511
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: examples/opengl/doc/src/cube.qdoc src/corelib/global/qlibraryinfo.cpp src/corelib/text/qbytearray_p.h src/corelib/text/qlocale_data_p.h src/corelib/time/qhijricalendar_data_p.h src/corelib/time/qjalalicalendar_data_p.h src/corelib/time/qromancalendar_data_p.h src/network/ssl/qsslcertificate.h src/widgets/doc/src/graphicsview.qdoc src/widgets/widgets/qcombobox.cpp src/widgets/widgets/qcombobox.h tests/auto/corelib/tools/qscopeguard/tst_qscopeguard.cpp tests/auto/widgets/widgets/qcombobox/tst_qcombobox.cpp tests/benchmarks/corelib/io/qdiriterator/qdiriterator.pro tests/manual/diaglib/debugproxystyle.cpp tests/manual/diaglib/qwidgetdump.cpp tests/manual/diaglib/qwindowdump.cpp tests/manual/diaglib/textdump.cpp util/locale_database/cldr2qlocalexml.py util/locale_database/qlocalexml.py util/locale_database/qlocalexml2cpp.py Resolution of util/locale_database/ are based on: https://codereview.qt-project.org/c/qt/qtbase/+/294250 and src/corelib/{text,time}/*_data_p.h were then regenerated by running those scripts. Updated CMakeLists.txt in each of tests/auto/corelib/serialization/qcborstreamreader/ tests/auto/corelib/serialization/qcborvalue/ tests/auto/gui/kernel/ and generated new ones in each of tests/auto/gui/kernel/qaddpostroutine/ tests/auto/gui/kernel/qhighdpiscaling/ tests/libfuzzer/corelib/text/qregularexpression/optimize/ tests/libfuzzer/gui/painting/qcolorspace/fromiccprofile/ tests/libfuzzer/gui/text/qtextdocument/sethtml/ tests/libfuzzer/gui/text/qtextdocument/setmarkdown/ tests/libfuzzer/gui/text/qtextlayout/beginlayout/ by running util/cmake/pro2cmake.py on their changed .pro files. Changed target name in tests/auto/gui/kernel/qaction/qaction.pro tests/auto/gui/kernel/qaction/qactiongroup.pro tests/auto/gui/kernel/qshortcut/qshortcut.pro to ensure unique target names for CMake Changed tst_QComboBox::currentIndex to not test the currentIndexChanged(QString), as that one does not exist in Qt 6 anymore. Change-Id: I9a85705484855ae1dc874a81f49d27a50b0dcff7
| * Fix parameter order in cldr2qlocalexml.py's usage()Edward Welbourne2020-04-061-1/+1
| | | | | | | | | | | | | | Callers and definition were out of sync. Change-Id: Icda26887cb64c61c7e373766f25559b0d450d112 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Ensure we use UTF-8 for the emitted QLocaleXML data fileEdward Welbourne2020-04-021-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Python helpfully uses a sensible locale when stdout is a tty but uses the system (not the filesystem) default encoding, which may be ascii and unable to encode some of the data we need to save. So brute force kludge it to ensure emit.encoding is UTF-8 when writing the output we'll read as UTF-8 anyway. (This matches dev's commit 0ef79d94f6dcf276ca55b084d27f980b1f260473 for the reworked version of the script.) Task-number: QTBUG-79902 Change-Id: I60ddc896a308c06e01fa87e8e18e112faa17d601 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Purge a stray space from calendar locale dataEdward Welbourne2020-04-021-1/+1
| | | | | | | | | | | | | | | | It was causing all lines after the first, in each calendar's locale_data[], to be over-indented. This only changes spacing. Change-Id: Ibfc4986548eecbfdba2902cc18f44a2af669bc6d Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
| * Convert the qlocale2cpp's last few %-formats to modern format() styleEdward Welbourne2020-04-021-3/+3
| | | | | | | | | | | | | | | | | | I've taken care of all the others in the course of other changes already ... Task-number: QTBUG-81344 Change-Id: I44e40a0d1c9f1e1a540a5f4cd252369fdc9b2698 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Check all matches for each XPath when searchingEdward Welbourne2020-04-021-63/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if we found one element with required attributes, we would search into it and ignore any later elements also with those required attributes. This meant that, if the first didn't contain the child elements we were looking for, we'd fail to find what we sought, if it was in a later matching element (e.g. with some ignored attributes). We would then go on to look for a match in a later file, where there might have been a match we should have found in the earlier file. Check all matches, rather than only the first match in each file. Do the search in each file "in parallel" to save reparsing the XPath. This clears the search code of rather hard-to-follow break/else handling in loops; and currently makes no change to the generated data. Change-Id: I86b010e65b9a1fc1b79e5fdd45a5aeff1ed5d5d5 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Change QLocale to use CLDR's accounting formats for currenciesEdward Welbourne2020-04-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In particular, this changed the US currency formats for negative amounts to be parenthesised versions of the positive amount forms, rather than having a minus sign after the $ sign. Test updated. [ChangeLog][QtCore][QLocale] Currency formats are now based on CLDR's accounting formats, where they were previously mostly based (more or less by accident) on standard formats. In particular, this now means negative currency formats are specified, where available, where they (mostly) were not previously. Task-number: QTBUG-79902 Change-Id: Ie0c07515ece8bd518a74a6956bf97ca85e9894eb Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Take CLDR's distinguished attributes into accountEdward Welbourne2020-04-022-21/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing XPATH searches, child nodes that have distinguished attributes that were not asked for should be skipped. This is part of the LDML spec and matters when resolving locale inheritance. Scan the LDML DTD (previously only scanned for the CLDR version) to find which attributes of which tags are ignorable - all others are distinguished - and take the result into account when performing XPATH searches. The XPath we were using for currency formats wasn't excluding currencyFormatLength elements with type="short" and patterns specific to thousands (and larger multiples); this is fixed by taking distinguished attributes into account. However, the XPATH also wasn't specifying the always distinguished attribute type="standard" that was, in practice, used for nearly all locales that weren't (wrongly) using short-forms for thousands; so type="standard" is now made explicit, so as to minimize the diff. This leaves only twenty-one locales with a negative currency formats. A later commit shall switch to using accounting by default (it falls back via an alias to standard, in any case), thereby restoring the two mentioned below that were using it by accident, but the present change gives the minimal diff here. Thousands-specific formats replaced with sensible ones: * zh_Hant_{HK,MO} (Traditional Mandarin, Hong Kong and Macau) * eo_001 (Esperanto) * fr_CA (Canadian French) * ha_* (Hausa, when not written in Arabic) * es_{GT,MX,US} (Spanish - Guatemala, Mexico, USA) * sw_KE (Swahili, Kenya) * yi_001 (Yiddish) * mfe_MU (Morisyen, Mauritius) * lag_TZ (Langi, Tanzania) * mgh_MZ (Makhuwa Meetto, Mozambique) * wae_CH (Walser, Switzerland) * kkj_CM (Kako, Cameroon) * lkt_US (Lakota, USA) * pa_Arab_PK (Punjabi, in Arabic script, as used in Pakistan; uses arabext number system, whose currency falls back to latn's, for which pa_Arab over-rides the thousands-format). Format changed from an over-ridden type="accounting" to standard (so these lost a negative-specific form) in: * en_SI (English, Slovenia) * es_DO (Spanish, Dominican Republic; same) For some locales we were picking up over-rides of narrow or short list formats, or formats for or-lists or unit-lists rather than and-lists, in place of the standard list format, that these locales don't over-ride, provided by a parent locale. This changed list formats for: * en_CA, en_IN (dropped "Oxford" comma before "and") * qu_* (Quechua; dropped "utaq", presumably meaning "and") * ur_IN (Urdu, India; was using unit-list formats) [ChangeLog][QtCore][QLocale] Data used for currency formats in several locales and list patterns in some locales have changed due to now parsing the CLDR data more faithfully. Fixes: QTBUG-81344 Change-Id: I6b95c6c37db92df167153767c1b103becfb0ac98 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Take number system into account in currency format look-upEdward Welbourne2020-04-021-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CLDR's currency formats do have number system variation, so take it into account. (The old xpathlite code clearly intended to do this, but failed at it due to looking for the wrong component of an XPATH to fix.) This changes the currency formats in use for * all Dutch locales (because nl.xml lists a currency format for arab before the one for latn, and they differ), * Punjabi, Urdu - specifically pa_Guru_IN, ur_Arab_PK (both like Dutch, arabext before latn; which is correct for pa_Arab_PK and ur_Arab_IN), * Sindi (whose over-ride of latn currency format we were using, where we should be using arab's format, supplied by root's default), * Tatar (which specifies a generic currency format, which we were using, before one specific to latn, which we now use), * Tongan (same as Dutch), * Konkani (like Dutch, deva before latn) and * several North African Arabic locales (whose default number system is latn, rather than arab, but previously used arab's formats). Task-number: QTBUG-79902 Change-Id: I18d8ec16bfd3a516d1bcd2f63bc7f7f15179a3f4 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Rework cldr2qlocalexml.py's reading of CLDR dataEdward Welbourne2020-04-025-898/+972
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the code out to a CldrReader class in cldr.py, expand CldrAccess with facilities that needs, expand ldml.py to include support for more features, finally making xpathlite.py redundant. This initial commit aims, though, to be bug-for-bug compatible with xpathlite in its reading of the CLDR data. It turns out we've been using draftier data than we were aware of (which might not be a bad thing). The xpathlite code appeared to check for draft attributes, but these only appear on leaf nodes and most data were fetched by finding a parent and then scanning its children without the draft check; only am/pm data was actually being excluded based on draft values. (We allowed contributed, for am/pm, in addition to approved, which is all the xpathlite code allows otherwise.) There are also some less equivocal bugs; I'll deal with these in later commits. Simplified number-system data look-ups; the old get_number_in_system() was taking care of old LDML versions' placement of the number system attribute; this is no longer needed. (It was also being used for a currency value to which it was not appropriate, which is now handled separately; this is one of the bugs mentioned above.) Ditched a fall-back to nativeZeroDigit, which no longer exists in CLDR. Change the command-line to take the root of the CLDR data tree, rather than its common/main/ sub-directory. Support naming the file to which to write output, as a second command-line argument, instead of always writing to stdout (which remains the default) and leaving whoever runs the script to redirect stdout. Support (internally for now, while adding TODOs to give main() more command-line options) separating the stderr output into its more and less interesting parts; for now, continue producing both, but suppress the least interesting entirely. Task-number: QTBUG-81344 Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Move cldr2qtimezone.py's CLDR-reading to a CldrAccess classEdward Welbourne2020-04-023-71/+339
| | | | | | | | | | | | | | | | | | | | | | | | This begins the process of replacing xpathlite.py, adding low-level DOM-access classes to ldml.py and the CldrAccess class to cldr.py Moved a format comment from cldr2qtimezone.py's doc-string to the method of CldrAccess that does the actual reading. Task-number: QTBUG-81344 Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Rework qlocalexml2cpp.py to use writers based on TranscriberEdward Welbourne2020-04-021-508/+438
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This saves repetition of temporary-file manipulation code. In the process, ensure that we tidy away temporary files on failure. Moved a comment in qlocale.h to *outside* the re-written portion, to save having to rewrite it every time. Added blank lines to separate script data from country data in the generated output. Changed 0s in one comment to zeros, to match another comment. Isolated use of sys to the __main__ block. Isolated use of enumdata to the new LocaleHeaderWriter class. Modernised all the string-formatting I touched. Task-number: QTBUG-81344 Change-Id: I5768e45d9a8ea23facc303b3dd8af8b3ccbf7ff2 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Rework cldr2qtimezone.py into more maintainable formEdward Welbourne2020-04-021-165/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Broke out the updating of a source file to a ZoneIdWriter helper class, which enables tidying away the temporary file if we fail. Collected up the rest of the script into a main() that's now called from a __name__ == '__main__' block. Rationalized the imports. Eliminated an inefficient lookup function by constructing a suitable dict() before entering the loop that needed it. Separated the "data you might need to update" tables from the code that does the work, to make it easier for those adding support for new zones to see what they're doing. Removed the spurious $Revision$ from the output and reworded the premable of the generated file. (It would seem CLDR no longer uses an RCS-based version-control system.) Generated output is otherwise unchanged. Task-number: QTBUG-81344 Change-Id: I7d9de8357ebcb599d154de9f862e25f7ade00390 Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Add tools to localetools to facilitate source file recreationEdward Welbourne2020-04-021-0/+99
| | | | | | | | | | | | | | | | | | | | For now unused; later commits shall put them to use. Transcriber -- base, takes care of tempfile and renaming. SourceFileEditor -- handles copying parts before and after a common delimiter. Task-number: QTBUG-81344 Change-Id: I28cf977d0a08825fbb873fb330da6823b88ad3ed Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Move some shared code to a localetools moduleEdward Welbourne2020-04-026-75/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The time-zone script was importing two functions from the locale data generation script. Move them to a separate module, to which I'll shortly add some more shared utilities. Cleaned up some imports in the process. Combined qlocalexml2cpp's and xpathlit's error classes into a new Error class in the new module and made it a bit more like a proper python error class. Task-number: QTBUG-81344 Change-Id: Idbe0139ba9aaa2f823b8f7216dee1d2539c18b75 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Move qlocalexml2cpp.py's XML-reading to QLocaleXmlReaderEdward Welbourne2020-04-022-277/+223
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This new class mirrors the existing QLocaleXmlWriter and places the two side-by-side in qlocalexml.py, rather than having the writing and reading in separate places. Made judicious use of transformed versions of mappings to save repeated iteration of a mapping's entries to do lookups on fist entries of pair-values; several (id, name, code) data-sets are sometimes indexed by id, sometimes by name. Reworked the default_map, that the complicated compareLocaleKeys() used in sorting locale keys, to map IDs instead of names; the function also needed the locale_map so that it could convert IDs to names, which we can skip by going directly with IDs. Task-number: QTBUG-81344 Change-Id: Iff6a97f7f0755b56dda70d8a6796ec074c558910 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
| * Rework cldr2qlocalexml.py in terms of a QLocaleXmlWriter classEdward Welbourne2020-04-022-197/+330
| | | | | | | | | | | | | | | | | | | | | | | | | | Delegate the output of XML to a helper class provided by qlocalexml.py and restructure the driver script so that it can be imported without running anything. It now has a minimal __name__ == '__main__' block that calls a main() function. This, for the moment, requires a global via which it shares the CLDR directory with various other functions; that shall go away in a later commit. Task-number: QTBUG-81344 Change-Id: Ica2d3ec09f2d38ba42fd930258cc765283f29a71 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* | Merge remote-tracking branch 'origin/5.15' into devSimon Hausmann2020-03-161-2/+0
|\| | | | | | | | | | | | | Conflicts: src/corelib/kernel/qmetatype.cpp Change-Id: I88eb0d3e9c9a38abf7241a51e370c655ae74e38a
| * Deduplicate day-name data in QLocaleXML filesEdward Welbourne2020-03-161-2/+0
| | | | | | | | | | | | | | | | | | | | This is a follow-up to commit ebb0212133bd91f1da4931b29eb1d33fb77b1444. The day name data appeared twice in the XML files. Skip the second copy, saving 8.8% of the intermediate file-size. This makes no change to generated QLocale data. Change-Id: Ic2cc543a2a85cbb1d2d47ebac7df4fa9ad6ee0a7 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Merge remote-tracking branch 'origin/5.15' into devLars Knoll2020-03-043-11/+10
|\| | | | | | | Change-Id: I99ee6f8b4bdc372437ee60d1feab931487fe55c4
| * Rename the localexml module to qlocalexmlEdward Welbourne2020-03-033-4/+4
| | | | | | | | | | | | | | | | | | It implements interaction with the QLocaleXML file format type, so rename it to match. Task-number: QTBUG-81344 Change-Id: I46302d4ac1038cdfc5929e73b554b6d793814c56 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
| * Rename the endonym members of the Locale typeEdward Welbourne2020-03-032-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | All other members had camelCase names, but the endonyms had prefix_endonym names, requiring munging where they were emitted to XML. So just do that munging upstream in the attribute name of the Locale objects. Makes no change to the data output by the scripts, not even to the intermediate QLocaleXML file. Task-number: QTBUG-81344 Change-Id: I01c15a822216281dc669b3e7ebda096d18b04f9b Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* | Use char16_t in favor of ushort for locale data tablesEdward Welbourne2020-02-171-1/+1
| | | | | | | | | | | | Change-Id: I890dd2b52c1b786db1081744c8ca343baba93de4 Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Allow surrogate pairs for various "single character" locale dataEdward Welbourne2020-02-173-83/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract the character in its proper unicode form and encode it in a new single_character_data table of locale data. Record each entry as the range within that table that encodes it. Also added an assertion in the generator script to check that the digits CLDR gives us are a contiguous sequence in increasing order, as has been assumed by the C++ code for some time. Lots of number-formatting code now has to take account of how wide the digits are. This leaves nowhere for updateSystemPrivate() to record values read from sys_locale->query(), so we must always consult that function when accessing these members of the systemData() object. Various internal users of these single-character fields need the system-or-CLDR value rather than the raw CLDR value, so move QLocalePrivate's methods to supply them down to QLocaleData and ensure they check for system values, where appropriate first. This allows us to finally support the Chakma language and script, for whose number system UTF-16 needs surrogate pairs. Costs 10.8 kB in added data, much of it due to adding two new locales that need surrogates to represent digits. [ChangeLog][QtCore][QLocale] Various QLocale methods that returned single QChar values now return QString values to accommodate those locales which need a surrogate pair to represent the (single character) return value. Fixes: QTBUG-69324 Fixes: QTBUG-81053 Change-Id: I481722d6f5ee266164f09031679a851dfa6e7839 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* | Merge remote-tracking branch 'origin/5.15' into devLiang Qi2020-02-131-0/+1
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: examples/widgets/graphicsview/boxes/scene.h src/corelib/Qt5CoreMacros.cmake src/corelib/Qt6CoreMacros.cmake src/network/ssl/qsslsocket.cpp src/network/ssl/qsslsocket.h src/platformsupport/fontdatabases/windows/qwindowsfontenginedirectwrite.cpp src/testlib/CMakeLists.txt src/testlib/.prev_CMakeLists.txt tests/auto/corelib/tools/qscopeguard/tst_qscopeguard.cpp Disabled building manual tests with CMake for now, because qmake doesn't do it, and it confuses people. Done-With: Alexandru Croitor <alexandru.croitor@qt.io> Done-With: Volker Hilsheimer <volker.hilsheimer@qt.io> Change-Id: I865ae347bd01f4e59f16d007b66d175a52f1f152
| * Enable system locale to skip digit-grouping if configured to do soEdward Welbourne2020-02-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On macOS it's possible to configure the system locale to not do digit grouping (separating "thousands", in most western locales); it then returns an empty string when asked for the grouping character, which QLocale's system-configuration then ignored, falling back on using the base UI locale's grouping separator. This could lead to the same separator being used for decimal and grouping, which should never happen, least of all when configured to not group at all. In order to notice when this happens, query() must take care to return an empty QString (as a QVariant, which is then non-null) when it *has* a value for the locale property, and that value is empty, as opposed to a null QVariant when it doesn't find a configured value. The caller can then distinguish the two cases. Furthermore, the group and decimal separators need to be distinct, so we need to take care to avoid cases where the system overrides one with what the CLDR has given for the other and doesn't over-ride that other. Only presently implemented for macOS and MS-Win, since the (other) Unix implementation of the system locale returns single QChar values for the numeric tokens - see QTBUG-69324, QTBUG-81053. Fixes: QTBUG-80459 Change-Id: Ic3fbb0fb86e974604a60781378b09abc13bab15d Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* | Separate offsets from sizes in QLocale's dataEdward Welbourne2020-01-303-108/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enables us to make the sizes quint8 and benefit from the resulting packing, making the locale data smaller. The sizes for long month-name lists (which concatenate twelve names with semicolon as separator) can overflow an 8-bit member, so use quint16 where needed. Re-ordered the data in QLocaleData and QCalendarLocale. Now all long-short(-narrow) families arise in that order; and any standalone is grouped with the one of the same length. (This cost 20 bytes in the date-format table, which optimises out more duplication if short is before long, but the saving in the (smaller) time-format table more than make up for it; and 20 bytes isn't worth the confusion that being inconsistent in ordering might cause.) At the same time, drop trailing semicolons from list entries (which join various names with semicolon) as they're not needed: we know where the end of the list is, because we know the size of the string that results from concatenation. The code that parses such lists can even correctly handle empty entries at the end. Saves 26 kB of data in the compiled binaries. Task-number: QTBUG-81053 Change-Id: If6ccc96a6910828817aa605d10fd814f567ae1e8 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Deduplicate locale data tablesEdward Welbourne2020-01-301-22/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some entries in tables were sub-strings (e.g. prefixes) of others. Since we store start-index and length (with no need for terminators), any entry that appears as a sub-string of an earlier entry can be recorded without making a separate copy of its content, just by recording where it appeared as a sub-string of an earlier entry. (Sadly this doesn't apply to month- or day-names and their short-forms: for those, we store ';'-joined lists. Thus, although each short-form is a prefix of its long-form, the short-form is stored in a list with other short-forms; and this is not a prefix of the list of matching long-forms.) The savings are modest (780 bytes at present), but cost us nothing except when running the python script that generates the data files (it takes a little longer now), which usually only happens at a CLDR update. Change-Id: I05bdaa9283365707bac0190ae983b31f074dd6ed Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Minor tidy-up in qlocalexml2cpp.pyEdward Welbourne2020-01-301-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split a long line. Use pythonic chained comparison to save some repetition. Comment on a field not currently in actual use. Say "zeros" rather than "0s" in one comment to match another. Added a .h suffix to the main locale data tempfile to match the naming of the tempfiles used for calendar data. Simplify generation of the blank line between Language and Script; and include a matching blank between Script and Country. This adds one blank line to qlocale.h Removed a stray space that misaligned locale data lines. This produces a space-only change in the generated *_data_p.h files. Change-Id: I974a9e8923c3dfd2178855d2cf1d6a5074e130b3 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Preserve the case of the exponent separator CLDR suppliesEdward Welbourne2020-01-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have long (since 4.5.1) coerced it to lower-case, for no readily apparent, much less documented, reason. CLDR says most locales use an upper-case E for this - let's actually use what CLDR says we should use. The code that matches the exponent separator was doing so case-insensitively in any case; that needed adaptation now that the separator's case isn't pre-determined; and, in any case, should have been done using case-folding rather than upper-casing. In the process, removed some spurious checks for "'e' or 'E'" in the result, since the exponent separator is always represented by 'e' (and an 'e' might also be present for the separate reason of its use as a beyond-decimal digit representing fourteen). [ChangeLog][QtCore][QLocale] QLocale::exponential() now preserves the case of the CLDR source, where previously it was lower-cased. Change-Id: Ic9ac02136cff79cb9f136d72141b5dbf54d9e0a6 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Ensure we use UTF-8 for the emitted QLocaleXML data fileEdward Welbourne2020-01-291-1/+5
|/ | | | | | | | | | | | Python helpfully uses a sensible locale when stdout is a tty but uses the system (not the filesystem) default encoding, which may be ascii and unable to encode some of the data we need to save. So brute force kludge it to ensure sys.stdout.encoding is UTF-8 when writing the output we'll read as UTF-8 anyway. Task-number: QTBUG-79902 Change-Id: I218dc0ec4c71a6b1b7181db55b018266d803bc58 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Update CLDR to v36Edward Welbourne2019-10-252-3/+8
| | | | | | | | | | | | | | | | | | | | | | Released on October 4th. Adds Windows names for two time zones, Qyzylorda and Volgograd. Added languages Chickasaw (cic), Muscogee (mus) and Silesian (szl). Norwegian number formatting has flipped back to using colon rather than dot as time separator; it's flipped back and forth over the last several CLDR releases. The dot form is present as a variant, the colon form was long given as the normal pattern, then went away; but now it's back as a contributed draft and that's what we pick up. The MS-Win time-zone ID script was iterating a dict, causing random reshuffling when new entries are added. Fixed that by doing the critical iteration in sorted order. Omitted locales ccp_BD and ccp_IN due to QTBUG-69324. Task-number: QTBUG-79418 Change-Id: I43869ee1810ecc1fe876523947ddcbcddf4e550a Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Correct some references to corelib/tools/ to say corelib/text/Edward Welbourne2019-10-251-1/+1
| | | | | | | | | | The Unicode data tables moved with QString and friends. So did the locale data generated from CLDR. This amends commit a9aa206b7b8ac4e69f8c46233b4080e00e845ff5. Change-Id: If12f0420b559dcb78993adc00e9f39751bca684a Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Add support for the Islamic Civil calendarSoroush Rabiei2019-08-223-3/+18
| | | | | | | | | | | | | | | | This has its own locale data, extracted from CLDR. This data may potentially be shared with other variants on the Islamic calendar, so is handled by a separate base-class, QHijriCalendar, on which such variants may base their implementations. [ChangeLog][QtCore][QCalendar] Added support for the Islamic Civil calendar, controlled by feature islamiccivilcalendar, with locale data that can be shared with other implementations, controlled by feature hijricalendar. Fixes: QTBUG-56675 Change-Id: Idf32d3da7034baa8ec5e66ef847e59a8a2f31cbd Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Add support for the Jalali (Solar Hijri or Persian) calendarSoroush Rabiei2019-08-213-2/+7
| | | | | | | | | | | | This has its own locale data, extracted from CLDR. [ChangeLog][QtCore][QCalendar] Added support for the Jalali (Persian or Solar Hijri) calendar, controlled by feature jalalicalendar. Fixes: QTBUG-58404 Change-Id: Id5c56a10db05a4fd612aafc01615273db81ec743 Reviewed-by: Paul Wicking <paul.wicking@qt.io> Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Add support for calendars beside GregorianSoroush Rabiei2019-08-203-54/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add QCalendarBackend as a base class for calendar implementations and QCalendar as a facade via which to access it. QDate's implicit implementation of the Gregorian calendar becomes QGregorianCalendar and QDate methods now support choice of calendar. Convert QLocale's CLDR data for month names to a locale-data component of each supported calendar and relevant QLocale methods now support choice of calendar. Adapt Python scripts for locale data generation to extract month name data from CLDR (keeping on version v35.1) into the new calendar-locale files. The locale data for the Gregorian calendar is held in a Roman calendar base, for sharing with other calendars. Add tests for basic uses of the new API. [ChangeLog][QtCore][QCalendar] Added QCalendar to support diverse calendars, supported by implementing QCalendarBackend. [ChangeLog][QtCore][QDate] Allow choice of calendar in various operations, with Gregorian remaining the default. Done-with: Lars Knoll <lars.knoll@qt.io> Done-with: Edward Welbourne <edward.welbourne@qt.io> Fixes: QTBUG-17110 Fixes: QTBUG-950 Change-Id: I9d6278f394269a183aee8156e990cec4d5198ab8 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Extract a large format string as a module constant valueSoroush Rabiei2019-08-081-14/+15
| | | | | | | | | The template for the "This is a generated file" notice made a clumsy intrusion in the code in which it appeared, so split it out as a constant of the module and access it by name where it's used. Change-Id: Ic4dfb8e873078c54410b191654d6c21d082c9016 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Move text-related code out of corelib/tools/ to corelib/text/Edward Welbourne2019-07-102-11/+11
| | | | | | | | This includes byte array, string, char, unicode, locale, collation and regular expressions. Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Add data for Windows Time-Zone IDs added in the last two yearsEdward Welbourne2019-07-011-0/+29
| | | | | | | | | | | | | | | | | | | We've not run util/locale_database/cldr2qtimezone.py for a while, so CLDR has had time to add several more zones. Catch up, inserting the new entries in order. Change-Id: I8625548b0f7775958230eccbd89b897d7afed9e9 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Tidy up in cldr2qtimezone.py and document the need to run itEdward Welbourne2019-07-012-12/+11
| | | | | | | | | | | | | | | | | | | It wasn't mentioned in cldr2qlocalexml.py's instructions, so I didn't know to run it. The data it used in an illustration was out of date. Two tests could be combined with no loss. Change-Id: I26e619e6210ea5b1258326fc4bc2b6aee9d6a999 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* cldr2qtimezone.py: report all missing zones, rather than just the firstEdward Welbourne2019-07-011-1/+6
| | | | | | | | | | | | | | | | | | | | | When scanning the CLDR data, the script raised an exception if it didn't recognize a zone ID. Instead, collect up such unrecognized IDs in a list and report them all at the end, so that whoever runs this can do them all in one go, rather than doing one, running the script, doing the next, running the script, ad nauseam. Change-Id: Ia659f1d1c7e1c1b4ccb87cc23828a0588a5bf958 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Use simpler data structures in cldr2qtimezone.pyEdward Welbourne2019-07-011-169/+165
| | | | | | | | | | | | | | | | | | | | | | Use tuples for the fixed data. The numbering of rows in the data tables isn't part of any public API, so we can change it freely; it is thus unnecessary, as we can just enumerate a tuple of the data values to generate sequential indices on the fly. (Updates to the data shall no longer need to renumber in order to insert entries.) Restore ordering of the data tables, and remove wanton spacing from inside parens, in the process. Change-Id: I59956cfb6191fe729300b57070671b7e66bd0379 Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com> Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Separate out the time, zone, date code from corelib/tools/Edward Welbourne2019-06-061-2/+2
| | | | | | | | | | We'll be adding calendar code here as well, and tools/ was getting rather crowded, so it looks like time to move out a reasonably coherent sub-bundle of it all. Change-Id: I7e8030f38c31aa307f519dd918a43fc44baa6aa1 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Add locale support for Cebuano and Erzya languages (new in CLDR v35.1)Edward Welbourne2019-05-201-0/+2
| | | | | | | Change-Id: I5d0ee7bc27eeca1c046d442b0410128ea5abbdb3 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Suggest name, when available, for unknown codesEdward Welbourne2019-05-202-5/+57
| | | | | | | | | | | | | | | | When parsing the CLDR data, we only handle language, script and territory (which we call country) codes if they are known to our enumdata.py tables. When reporting the rest as unknown, in the content of an actual locale definition (not the likely subtag data), check whether en.xml can resolve the code for us; if it can, report the full name it provides, as a hint to whoever's running the script that an update to enumdata.py may be in order. Change-Id: I9ca1d6922a91d45bc436f4b622e5557261897d7f Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Rename util/locale_database/ to include the e that was missingEdward Welbourne2019-05-2015-0/+4137
It was misnamed local_database, quite missing the point of its name. Change-Id: I73a4fdf24f53daac12304de1f443636d89afacb2 Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>