path: root/util/locale_database/cldr2qlocalexml.py
Commit message | Author | Age | Files | Lines
* Apply a common style to the main()s of locale database programs | Edward Welbourne | 2024-04-26 | 1 | -5/+14
  Include documentation in both, using common phrasing. Take sys.argv as a parameter, along with sys.stdout and sys.stderr, so that we can invoke them from python when importing into a python session to debug or test. Supply the script name to the argument parser as prog, so it can correctly report it and forward the rest of argv to parse_args().
  Remove comments anticipating one of the several calendars we don't yet support; the existing entries suffice to make clear what shall be needed when we get round to adding more.
  Change-Id: I2cebd385679e3c84d4ccf899e60091ac823ad10d
  Reviewed-by: Mate Barany <mate.barany@qt.io>
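The main() convention this commit describes can be sketched roughly as follows. This is a minimal illustration rather than the script's actual code: the argument name cldr_path, the description text and the messages written are assumptions made for the example.

```python
import argparse
import io


def main(argv, out, err):
    """Hypothetical main(); argument names and messages are illustrative."""
    # Pass the script name as prog so usage messages report it correctly,
    # then forward only the remaining arguments to parse_args().
    parser = argparse.ArgumentParser(
        prog=argv[0],
        description='Convert CLDR data to QLocaleXML (illustrative text).')
    parser.add_argument('cldr_path', help='root of the CLDR data tree')
    args = parser.parse_args(argv[1:])

    err.write(f'Scanning {args.cldr_path}\n')  # diagnostics go to err
    out.write('output would go here\n')        # stand-in for the real output
    return 0


# Calling it from a Python session, with in-memory streams for testing:
out, err = io.StringIO(), io.StringIO()
status = main(['cldr2qlocalexml.py', '/path/to/cldr'], out, err)
```

A script following this pattern would typically finish with sys.exit(main(sys.argv, sys.stdout, sys.stderr)) under a __name__ == '__main__' guard. Note that argparse still reports its own parse errors on sys.stderr, regardless of the err parameter in this sketch.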
* Prepare to support taking CLDR data from its github upstream | Edward Welbourne | 2024-01-19 | 1 | -10/+16
  We've previously used the zip-file form, but that's not been published for CLDR v44.1 - the advice on the list was to use github instead. That, however, has ↑↑↑ as a special value for fields, meaning to inherit from a parent locale. So special-case that value.
  I have verified that v44 from the zip file produces identical results to v44 from github, with this minor fix. As it happens v44.1 also produces identical results.
  Pick-to: 6.7 6.5
  Change-Id: I6eb0aedda7556753cdc83bb9d76652fbb68dc669
  Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
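One way to special-case the inheritance marker is to treat it exactly like a missing value and keep walking up the parent chain. The sketch below assumes a simplified model in which locale data is a dict of dicts; lookup(), data and parents are hypothetical names, not the script's real API.

```python
INHERIT = '\u2191' * 3  # the "↑↑↑" marker found in the github form of CLDR


def lookup(field, locale, data, parents):
    """Hypothetical look-up that treats the marker as "ask the parent"."""
    while locale is not None:
        value = data.get(locale, {}).get(field)
        if value is not None and value != INHERIT:
            return value
        locale = parents.get(locale)  # None once we run out of parents
    return None


# Toy data: 'en' explicitly defers to its parent, 'en_GB' has no entry.
data = {'root': {'decimal': '.'}, 'en': {'decimal': INHERIT}, 'en_GB': {}}
parents = {'en_GB': 'en', 'en': 'root'}
resolved = lookup('decimal', 'en_GB', data, parents)
```

With this approach an explicit marker and an absent field resolve to the same inherited value, which is consistent with the zip and github forms producing identical results.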
* Use CLDR's names in QLocale::*ToName() for language, script, territory | Edward Welbourne | 2023-08-09 | 1 | -1/+1
  Various comments need to continue using the enumdata.py names, as they associate data with particular enum members, but we can now correctly use the en.xml versions of their names when we report them, rather than the enum-friendly names we use in the code. Since this now means the data may stray outside plain ASCII - it'll be UTF-8-encoded - this implies replacing the QLatin1StringView()s of the code that formerly read this data with QString::fromUtf8().
  Fixes: QTBUG-94460
  Change-Id: Id3b08875a46af58c0555c3e303b0e15a19441509
  Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
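The reason the reading code has to change can be shown with a small Python analogy (the real change is in C++; the name below is only an example of a non-ASCII display name): bytes written as UTF-8 are misread if interpreted as Latin-1.

```python
name = 'Norwegian Bokmål'           # example of a non-ASCII display name
data = name.encode('utf-8')         # bytes as they would be stored

as_utf8 = data.decode('utf-8')      # analogous to QString::fromUtf8()
as_latin1 = data.decode('latin-1')  # analogous to reading via a Latin-1 view

# The UTF-8 reading round-trips; the Latin-1 reading turns the one
# two-byte character into two wrong characters.
round_trips = as_utf8 == name
garbled = as_latin1 != name
```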
* Use SPDX license identifiers | Lucie Gérard | 2022-05-16 | 1 | -28/+2
  Replace the current license disclaimer in files with an SPDX-License-Identifier. Files that have to be modified by hand are modified. License files are organized under LICENSES directory.
  Task-number: QTBUG-67283
  Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
  Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
  Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
* Use a HTTPS URL for the CLDR download link | Ievgenii Meshcheriakov | 2021-11-05 | 1 | -1/+1
  FTP is insecure and is not supported by modern browsers anymore.
  See also: https://mywiki.wooledge.org/FtpMustDie
  Change-Id: Iad65d29912e79a4f3fadb9317bb5d9c5fe9b68d7
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* locale_database: Use pathlib to manipulate paths in Python code | Ievgenii Meshcheriakov | 2021-07-19 | 1 | -4/+6
  pathlib's API is more modern and easier to use than os.path. It also makes it possible to distinguish between paths and other strings in type annotations.
  Task-number: QTBUG-83488
  Pick-to: 6.2
  Change-Id: Ie6d9b4e35596f7f6befa4c9635f4a65ea3b20025
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
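A brief sketch of the difference, using a hypothetical helper locale_file() and a made-up CLDR root. PurePosixPath is used here only so the result is the same on every platform; real code would more likely use pathlib.Path.

```python
import posixpath
from pathlib import PurePosixPath


def locale_file(root: PurePosixPath, locale: str) -> PurePosixPath:
    """Hypothetical helper: where a locale's XML lives under a CLDR root.

    The annotations make it clear which argument is a path and which is
    just a string, something os.path-style code cannot express.
    """
    return root / 'common' / 'main' / f'{locale}.xml'


new_style = locale_file(PurePosixPath('/data/cldr'), 'en_GB')
old_style = posixpath.join('/data/cldr', 'common', 'main', 'en_GB' + '.xml')
```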
* locale_database: Use argparse module to parse command line arguments | Ievgenii Meshcheriakov | 2021-07-16 | 1 | -29/+25
  argparse is the standard way to parse command line arguments in Python. It provides help and usage information for free and is easier to extend than a custom argument parser.
  Task-number: QTBUG-83488
  Pick-to: 6.2
  Change-Id: I1e4c9cd914449e083d01932bc871ef10d26f0bc2
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
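As an illustration of what argparse provides for free, the sketch below declares a required CLDR root and an optional output file. The argument names and help texts are assumptions for the example, not necessarily those of the real script.

```python
import argparse

parser = argparse.ArgumentParser(
    prog='cldr2qlocalexml.py',
    description='Illustrative parser; argument names are assumptions.')
parser.add_argument('cldr_path', help='root of the CLDR data tree')
# nargs='?' makes the positional optional; None here means "use stdout".
parser.add_argument('out_file', nargs='?', default=None,
                    help='file to write (default: standard output)')

args = parser.parse_args(['/path/to/cldr'])
usage = parser.format_usage()  # generated automatically from the above
```

The -h/--help option and error reporting for missing arguments come with the parser, which is the main saving over a hand-written usage() function.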
* locale_database: Use f-strings in Python code | Ievgenii Meshcheriakov | 2021-07-16 | 1 | -8/+7
  Replace most uses of str.format() and string arithmetic with f-strings. This results in more compact code and the code is easier to read when using an appropriate editor.
  Task-number: QTBUG-83488
  Pick-to: 6.2
  Change-Id: I3409f745b5d0324985cbd5690f5eda8d09b869ca
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
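For comparison, here are the three styles side by side on a made-up warning message (the message text and variable names are illustrative only):

```python
kind, code = 'language', 'xx'

# String arithmetic and str.format(), the styles being replaced:
by_concat = 'Unknown ' + kind + ' code "' + code + '"'
by_format = 'Unknown {} code "{}"'.format(kind, code)

# The f-string keeps each value next to the place it is used:
by_fstring = f'Unknown {kind} code "{code}"'
```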
* Convert CLDR scripts to Python 3 | Ievgenii Meshcheriakov | 2021-07-15 | 1 | -5/+1
  The conversion is mostly done using the 2to3 script, with manual cleanup afterwards.
  Task-number: QTBUG-83488
  Pick-to: 6.2
  Change-Id: I4d33b04e7269c55a83ff2deb876a23a78a89f39d
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Report unused enum members after CLDR data scan | Edward Welbourne | 2021-06-07 | 1 | -1/+1
  We should at least know when members of QLocale's enums aren't adding any value, and it may make sense to deprecate the unused ones.
  Change-Id: Icf202f81d2a35904c13ccdc202d41985bcb3f2e6
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* Nomenclature change: s/countr/territor/g in locale scripts | Edward Welbourne | 2021-05-26 | 1 | -1/+1
  Change the nomenclature used in the scripts and the QLocaleXML data format to use "territory" and "territories" in place of "country" and "countries". Does not change the generated source files.
  Change-Id: I4b208d8d01ad2bfc70d289fa6551f7e0355df5ef
  Reviewed-by: JiDe Zhang <zhangjide@uniontech.com>
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocaleXmlWriter.enumData(): move enumdata import to method from caller | Edward Welbourne | 2021-05-26 | 1 | -3/+2
  The only reason cldr.py imported enumdata was so as to pass what it imported to writer.enumData(); that method might as well do the import itself.
  Change-Id: Ie77dcd29058f926b8cca4deef35837f30505859f
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* Update to CLDR v38.1, adding Yukon Standard Time | Edward Welbourne | 2021-01-27 | 1 | -1/+1
  No change to QLocale's data, one addition to the Windows time-zone data. What was formerly "Us Mountain Standard time / Canada" is now Yukon Standard Time.
  Fixes: QTBUG-89784
  Pick-to: 6.0 5.15
  Change-Id: I4c9a23620e74ea379be8a4c5ba0896d35fe9b594
  Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* Remove unused imports | Dimitrios Apostolou | 2020-07-10 | 1 | -1/+0
  As found by LGTM.com.
  Change-Id: I1704f10f9bab1b11ab22824aca0cfcdcb47fef2f
  Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Fix parameter order in cldr2qlocalexml.py's usage() | Edward Welbourne | 2020-04-06 | 1 | -1/+1
  Callers and definition were out of sync.
  Change-Id: Icda26887cb64c61c7e373766f25559b0d450d112
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* Ensure we use UTF-8 for the emitted QLocaleXML data file | Edward Welbourne | 2020-04-02 | 1 | -1/+5
  Python helpfully uses a sensible locale when stdout is a tty but uses the system (not the filesystem) default encoding, which may be ascii and unable to encode some of the data we need to save. So brute force kludge it to ensure emit.encoding is UTF-8 when writing the output we'll read as UTF-8 anyway.
  (This matches dev's commit 0ef79d94f6dcf276ca55b084d27f980b1f260473 for the reworked version of the script.)
  Task-number: QTBUG-79902
  Change-Id: I60ddc896a308c06e01fa87e8e18e112faa17d601
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
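In Python 3 terms, the same goal can be reached by wrapping the underlying byte stream with an explicit encoding. This is a sketch of the idea, not the kludge this commit applied (which predates the Python 3 conversion); the sample element and the BytesIO stand-in for the output stream are assumptions for the example.

```python
import io

raw = io.BytesIO()  # stand-in for sys.stdout.buffer or an open binary file
# Force UTF-8 regardless of what the system default encoding happens to be.
emit = io.TextIOWrapper(raw, encoding='utf-8', newline='\n')

emit.write('<name>Norsk bokmål</name>\n')  # illustrative non-ASCII content
emit.flush()
written = raw.getvalue()
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') is another way to get the same effect for the real standard output.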
* Rework cldr2qlocalexml.py's reading of CLDR data | Edward Welbourne | 2020-04-02 | 1 | -591/+45
  Move the code out to a CldrReader class in cldr.py, expand CldrAccess with facilities that needs, expand ldml.py to include support for more features, finally making xpathlite.py redundant. This initial commit aims, though, to be bug-for-bug compatible with xpathlite in its reading of the CLDR data.
  It turns out we've been using draftier data than we were aware of (which might not be a bad thing). The xpathlite code appeared to check for draft attributes, but these only appear on leaf nodes and most data were fetched by finding a parent and then scanning its children without the draft check; only am/pm data was actually being excluded based on draft values. (We allowed contributed, for am/pm, in addition to approved, which is all the xpathlite code allows otherwise.) There are also some less equivocal bugs; I'll deal with these in later commits.
  Simplified number-system data look-ups; the old get_number_in_system() was taking care of old LDML versions' placement of the number system attribute; this is no longer needed. (It was also being used for a currency value to which it was not appropriate, which is now handled separately; this is one of the bugs mentioned above.) Ditched a fall-back to nativeZeroDigit, which no longer exists in CLDR.
  Change the command-line to take the root of the CLDR data tree, rather than its common/main/ sub-directory. Support naming the file to which to write output, as a second command-line argument, instead of always writing to stdout (which remains the default) and leaving whoever runs the script to redirect stdout. Support (internally for now, while adding TODOs to give main() more command-line options) separating the stderr output into its more and less interesting parts; for now, continue producing both, but suppress the least interesting entirely.
  Task-number: QTBUG-81344
  Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
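The draft-attribute pitfall described above can be illustrated with a toy fragment. The sketch assumes that a leaf element without a draft attribute counts as approved; the XML snippet and the helper name accepted_children() are made up for the example and do not come from the script.

```python
import xml.etree.ElementTree as ET

# Toy LDML-like fragment: the draft attribute sits on the leaf elements.
parent = ET.fromstring(
    '<dayPeriods>'
    '<dayPeriod type="am">AM</dayPeriod>'
    '<dayPeriod type="pm" draft="unconfirmed">PM</dayPeriod>'
    '</dayPeriods>')


def accepted_children(node, accept=('approved', 'contributed')):
    """Hypothetical filter: yield only children whose draft status is accepted."""
    for child in node:
        if child.get('draft', 'approved') in accept:
            yield child


# Scanning the parent's children directly ignores draft status ...
unchecked = [child.get('type') for child in parent]
# ... whereas checking each leaf drops the unconfirmed entry.
checked = [child.get('type') for child in accepted_children(parent)]
```

Checking only the parent element would never see the attribute, which is how data of a lower draft status than intended can slip through.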
* Move some shared code to a localetools module | Edward Welbourne | 2020-04-02 | 1 | -26/+28
  The time-zone script was importing two functions from the locale data generation script. Move them to a separate module, to which I'll shortly add some more shared utilities. Cleaned up some imports in the process.
  Combined qlocalexml2cpp's and xpathlite's error classes into a new Error class in the new module and made it a bit more like a proper python error class.
  Task-number: QTBUG-81344
  Change-Id: Idbe0139ba9aaa2f823b8f7216dee1d2539c18b75
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* Rework cldr2qlocalexml.py in terms of a QLocaleXmlWriter class | Edward Welbourne | 2020-04-02 | 1 | -174/+138
  Delegate the output of XML to a helper class provided by qlocalexml.py and restructure the driver script so that it can be imported without running anything. It now has a minimal __name__ == '__main__' block that calls a main() function. This, for the moment, requires a global via which it shares the CLDR directory with various other functions; that shall go away in a later commit.
  Task-number: QTBUG-81344
  Change-Id: Ica2d3ec09f2d38ba42fd930258cc765283f29a71
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* Rename the localexml module to qlocalexml | Edward Welbourne | 2020-03-03 | 1 | -1/+1
  It implements interaction with the QLocaleXML file format type, so rename it to match.
  Task-number: QTBUG-81344
  Change-Id: I46302d4ac1038cdfc5929e73b554b6d793814c56
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Rename the endonym members of the Locale type | Edward Welbourne | 2020-03-03 | 1 | -2/+2
  All other members had camelCase names, but the endonyms had prefix_endonym names, requiring munging where they were emitted to XML. So just do that munging upstream in the attribute name of the Locale objects. Makes no change to the data output by the scripts, not even to the intermediate QLocaleXML file.
  Task-number: QTBUG-81344
  Change-Id: I01c15a822216281dc669b3e7ebda096d18b04f9b
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
  Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
* Enable system locale to skip digit-grouping if configured to do so | Edward Welbourne | 2020-02-03 | 1 | -0/+1
  On macOS it's possible to configure the system locale to not do digit grouping (separating "thousands", in most western locales); it then returns an empty string when asked for the grouping character, which QLocale's system-configuration then ignored, falling back on using the base UI locale's grouping separator. This could lead to the same separator being used for decimal and grouping, which should never happen, least of all when configured to not group at all.
  In order to notice when this happens, query() must take care to return an empty QString (as a QVariant, which is then non-null) when it *has* a value for the locale property, and that value is empty, as opposed to a null QVariant when it doesn't find a configured value. The caller can then distinguish the two cases. Furthermore, the group and decimal separators need to be distinct, so we need to take care to avoid cases where the system overrides one with what the CLDR has given for the other and doesn't over-ride that other.
  Only presently implemented for macOS and MS-Win, since the (other) Unix implementation of the system locale returns single QChar values for the numeric tokens - see QTBUG-69324, QTBUG-81053.
  Fixes: QTBUG-80459
  Change-Id: Ic3fbb0fb86e974604a60781378b09abc13bab15d
  Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
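The distinction between "configured as empty" and "not configured" can be modelled in Python with an empty string versus None, standing in for an empty QString inside a non-null QVariant versus a null QVariant. The functions and the settings dict below are hypothetical and serve only to illustrate the logic.

```python
def query(settings, key):
    """Hypothetical query(): None means "no configured value"; '' is a value."""
    return settings.get(key)


def group_separator(settings, cldr_default):
    value = query(settings, 'group')
    # Test against None, not truthiness: an empty string is a deliberate
    # "do not group digits" setting and must not trigger the fall-back.
    return cldr_default if value is None else value


no_grouping = group_separator({'group': ''}, ',')  # system says: no grouping
unconfigured = group_separator({}, ',')            # system says nothing
```

Writing "value or cldr_default" instead would reproduce the bug described above, since the empty string is falsy.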
* Add support for the Islamic Civil calendar | Soroush Rabiei | 2019-08-22 | 1 | -1/+1
  This has its own locale data, extracted from CLDR. This data may potentially be shared with other variants on the Islamic calendar, so is handled by a separate base-class, QHijriCalendar, on which such variants may base their implementations.
  [ChangeLog][QtCore][QCalendar] Added support for the Islamic Civil calendar, controlled by feature islamiccivilcalendar, with locale data that can be shared with other implementations, controlled by feature hijricalendar.
  Fixes: QTBUG-56675
  Change-Id: Idf32d3da7034baa8ec5e66ef847e59a8a2f31cbd
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Add support for the Jalali (Solar Hijri or Persian) calendar | Soroush Rabiei | 2019-08-21 | 1 | -1/+1
  This has its own locale data, extracted from CLDR.
  [ChangeLog][QtCore][QCalendar] Added support for the Jalali (Persian or Solar Hijri) calendar, controlled by feature jalalicalendar.
  Fixes: QTBUG-58404
  Change-Id: Id5c56a10db05a4fd612aafc01615273db81ec743
  Reviewed-by: Paul Wicking <paul.wicking@qt.io>
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Add support for calendars beside Gregorian | Soroush Rabiei | 2019-08-20 | 1 | -6/+9
  Add QCalendarBackend as a base class for calendar implementations and QCalendar as a facade via which to access it. QDate's implicit implementation of the Gregorian calendar becomes QGregorianCalendar and QDate methods now support choice of calendar.
  Convert QLocale's CLDR data for month names to a locale-data component of each supported calendar and relevant QLocale methods now support choice of calendar.
  Adapt Python scripts for locale data generation to extract month name data from CLDR (keeping on version v35.1) into the new calendar-locale files. The locale data for the Gregorian calendar is held in a Roman calendar base, for sharing with other calendars.
  Add tests for basic uses of the new API.
  [ChangeLog][QtCore][QCalendar] Added QCalendar to support diverse calendars, supported by implementing QCalendarBackend.
  [ChangeLog][QtCore][QDate] Allow choice of calendar in various operations, with Gregorian remaining the default.
  Done-with: Lars Knoll <lars.knoll@qt.io>
  Done-with: Edward Welbourne <edward.welbourne@qt.io>
  Fixes: QTBUG-17110
  Fixes: QTBUG-950
  Change-Id: I9d6278f394269a183aee8156e990cec4d5198ab8
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Move text-related code out of corelib/tools/ to corelib/text/ | Edward Welbourne | 2019-07-10 | 1 | -2/+2
  This includes byte array, string, char, unicode, locale, collation and regular expressions.
  Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786
  Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Tidy up in cldr2qtimezone.py and document the need to run it | Edward Welbourne | 2019-07-01 | 1 | -0/+3
  It wasn't mentioned in cldr2qlocalexml.py's instructions, so I didn't know to run it. The data it used in an illustration was out of date. Two tests could be combined with no loss.
  Change-Id: I26e619e6210ea5b1258326fc4bc2b6aee9d6a999
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Suggest name, when available, for unknown codes | Edward Welbourne | 2019-05-20 | 1 | -3/+31
  When parsing the CLDR data, we only handle language, script and territory (which we call country) codes if they are known to our enumdata.py tables. When reporting the rest as unknown, in the content of an actual locale definition (not the likely subtag data), check whether en.xml can resolve the code for us; if it can, report the full name it provides, as a hint to whoever's running the script that an update to enumdata.py may be in order.
  Change-Id: I9ca1d6922a91d45bc436f4b622e5557261897d7f
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
  Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
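The reporting logic can be sketched as below. The function name, the message wording and the sample tables are all hypothetical; in the real scripts the known codes come from enumdata.py and the English names from CLDR's en.xml.

```python
def report_unknown(kind, code, known_codes, english_names):
    """Hypothetical helper: warning text for an unknown code, else None."""
    if code in known_codes:
        return None
    name = english_names.get(code)
    if name:
        # en.xml knows the code, so hint at what to add to the tables.
        return f'Unknown {kind} code "{code}" ({name})'
    return f'Unknown {kind} code "{code}"'


known = {'en', 'de'}                  # stand-in for the enumdata.py tables
names = {'en': 'English', 'xx': 'Example Language'}  # stand-in for en.xml

with_hint = report_unknown('language', 'xx', known, names)
without_hint = report_unknown('language', 'yy', known, names)
```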
* Rename util/locale_database/ to include the e that was missing | Edward Welbourne | 2019-05-20 | 1 | -0/+663
  It was misnamed local_database, quite missing the point of its name.
  Change-Id: I73a4fdf24f53daac12304de1f443636d89afacb2
  Reviewed-by: Lars Knoll <lars.knoll@qt.io>
  Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>