path: root/src/corelib/text/qstringconverter.cpp
Commit message | Author | Age | Files | Lines
* QString{En,De}coder: add constructors with QString parameters | Thiago Macieira | 13 days | 1 | -0/+20
    Because QStringConverter::availableCodecs() returns QString.

    Added as Q_WEAK_OVERLOAD so this doesn't create ambiguous overloads when passing QByteArrays.

    Fixes: QTBUG-123919
    Change-Id: If1bf59ecbe014b569ba1fffd17c29a253ac22abe
    Reviewed-by: Sune Vuorela <sune@vuorela.dk>
    Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverterICU: Pass correct pointer to callback | Fabian Kosmale | 2024-04-19 | 1 | -2/+2
    Pass the pointer to the current state, not a pointer to a pointer to it.

    [ChangeLog][QtCore][QStringConverter] Fixed a bug involving moved QStringEncoder/QStringDecoder objects accessing invalid state.

    Amends 122270d6bea164e6df4357f4d4d77aacfa430470.

    Done-with: Marc Mutz <marc.mutz@qt.io>
    Pick-to: 6.7 6.5
    Change-Id: I70d4dc00e3e0db6cad964579662bcf6d185a4c34
    Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* QStringConverter/Doc: add more details about additional codecs | Thiago Macieira | 2024-04-18 | 1 | -1/+9
    Fixes: QTBUG-124221
    Pick-to: 6.7
    Change-Id: If1bf59ecbe014b569ba1fffd17c4d113d02425eb
    Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* Bootstrap: remove the UTF-16 and UTF-32 codecs | Thiago Macieira | 2024-03-13 | 1 | -2/+10
    Unlike most of everything else in the Bootstrap lib, this is code that couldn't be eliminated by the linker because they were referenced in one static array. Maybe an exceptionally smart whole-program analysis could do it, but GCC and Clang LTO modes don't do that now.

    I removed the code that performed detection from HTML and from data too. I could have left the detection of UTF-8 and "other" but this code wasn't necessary. In particular, QTextStream couldn't benefit from it because it already defaults to UTF-8, so the detection code would never determine anything different from the input.

    Drive-by removed QStringConverter::availableCodecs() too because it was in the middle of functions #ifdef'ed out too.

    This reduced the size of release-mode moc

           text    data     bss      dec     hex  filename
        1079858    5440     640  1085938  1091f2  original/moc
        1074386    5200     640  1080226  107ba2  updated/moc
          -5472    -240       0    -5712          difference

    Change-Id: I01ec3c774d9943adb903fffd17b7f114c42874ac
    Reviewed-by: Lars Knoll <lars@knoll.priv.no>
* QLocal8Bit::convertToUnicode[win]: rewrite remainingChars handling as recursive | Mårten Nordheim | 2024-03-02 | 1 | -52/+62
    Then we will automatically handle invalid leading characters instead of throwing away the whole sequence when it cannot be converted.

    Added a test that was failing before.

    Drive-by change: add a comment explaining why we have the stack allocated buffer.

    Task-number: QTBUG-118834
    Change-Id: I647a58f2ba95e2e7ed4ea6a964d99ecc0c91fad3
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convertToUnicode[win]: Fix order of growth and saturate | Mårten Nordheim | 2024-02-23 | 1 | -1/+1
    The order was wrong so we could have ended up saturating a 0 before we grew to 1.

    Since this has never been in a release it is of no concern, and it was already an edge-case anyway.

    Amends 1090d5dd4ae5be898d4566314eda43b0283709d9

    Pick-to: 6.7 6.6 6.5
    Change-Id: I4b70f9018c3049697495a58313af148f8366c8bb
    Reviewed-by: Marc Mutz <marc.mutz@qt.io>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QLocal8Bit::convertFromUnicode[win]: limit fprintf to !NDEBUG | Mårten Nordheim | 2024-02-16 | 1 | -0/+2
    Because there is no other way to stop it from printing the output.

    Pick-to: 6.7 6.6 6.5
    Change-Id: Ie6dcb393351f50691366849ba85d60e2e186f9fb
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convert{To,From}Unicode[win]: use more of state | Mårten Nordheim | 2024-02-14 | 1 | -5/+26
    Like other backends we should increment the invalid character count when we output a replacement character.

    And we should also output the NULL character if requested! The downside here is that convertFromUnicode doesn't even have the ability to do so. So instead I added a comment explaining why it is not handled there.

    Task-number: QTBUG-118318
    Pick-to: 6.7 6.6 6.5
    Change-Id: I57ba631aa59454e77007ab353277b7e8c2b5526a
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QLocal8Bit::convertFromUnicode[win]: support more than 2Gi input | Mårten Nordheim | 2024-02-12 | 1 | -8/+47
    As we did for convertToUnicode.

    To support more than 2Gi input, we need to handle the input in chunks because of the `int` parameter in the Windows API.

    Testing also revealed some corner cases we also need to handle, which is mostly happening when there is an incomplete surrogate pair at the end of the current input window.

    The test takes between 3 (plain MinGW) and 8 (MSVC with ASAN) seconds to run on my machine.

    Pick-to: 6.7 6.6 6.5
    Fixes: QTBUG-105105
    Change-Id: I4fb0420b88ca41dfa8b561a35c6d96659bd81468
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
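Note (illustrative sketch, not taken from the commit): the chunking idea described above can be shown in portable C++. The helper names nextChunkLength() and countChunks() are hypothetical, and the sketch only decides where to cut the input; it does not call any Windows API. It assumes the only constraint is that a window should not end between a high and a low surrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// True for the first half of a UTF-16 surrogate pair.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// Hypothetical helper: how many code units of `in` to hand to an API whose
// length parameter can hold at most `maxChunk` units.
inline std::size_t nextChunkLength(std::u16string_view in, std::size_t maxChunk)
{
    assert(maxChunk > 0); // a zero-sized window could never make progress
    std::size_t n = in.size() < maxChunk ? in.size() : maxChunk;
    // If more input follows and the window ends on a high surrogate, shrink
    // the window by one so the pair stays together in the next chunk.
    // (With maxChunk == 1 a pair cannot be kept together; we still advance.)
    if (n > 1 && n < in.size() && isHighSurrogate(in[n - 1]))
        --n;
    return n;
}

// Hypothetical driver: walks the whole input and counts the chunks used.
inline std::size_t countChunks(std::u16string_view in, std::size_t maxChunk)
{
    std::size_t chunks = 0;
    while (!in.empty()) {
        in.remove_prefix(nextChunkLength(in, maxChunk));
        ++chunks;
    }
    return chunks;
}
```

In a real converter each chunk would be passed to the platform call and the output appended before moving on; the window size would be derived from the limit of `int` rather than a small test value.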
* QLocal8Bit::convertFromUnicode[win]: Pre 2Gi changes | Mårten Nordheim | 2024-02-12 | 1 | -20/+29
    As we did for convertToUnicode, we do some smaller changes, like increasing indentation, and switching to using pointers and calculating the input-size in this commit, so that the real changes in the next commit are (hopefully) easier to read.

    Pick-to: 6.7 6.6 6.5
    Task-number: QTBUG-105105
    Change-Id: I3bf1a487f63a3e24efd7a945152647dd8fc0aca8
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QLocal8Bit::convertToUnicode[win]: support more than 2Gi input | Mårten Nordheim | 2024-02-12 | 1 | -18/+50
    To properly support more than 2Gi input we have to support being asked to resize more than once. Previously we would only have to resize the one time because we went from our 4K stack buffer to the final size heap buffer.

    But now, since our input size can only be specified in int, we have to deal with looping over the input and resizing the buffer as needed.

    We also have to deal with trailing data at the end of our sliding window potentially causing issues for the encoding. So we try to shrink our window when it causes issues, or store the trailing data for the next call.

    The >2Gi test takes about 6-8 seconds on my machine.

    Pick-to: 6.7 6.6 6.5
    Task-number: QTBUG-105105
    Change-Id: I9a44b8f379bf2c2c58183f961544ed2f4c8c7215
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QLocal8Bit::convertToUnicode[win]: split out buffer growing | Mårten Nordheim | 2024-02-12 | 1 | -8/+27
    We will need to potentially grow the buffer before appending anything to it, because if we pass in 0 as a size then the MultiByteToWideChar just returns the size we would need. If we didn't intend to do so then we would increment our output buffers even though nothing is written.

    And when appending single characters (like the replacement character for an invalid sequence) we need to grow the buffer as well.

    We'll need this all in the next commit.

    Pick-to: 6.7 6.6 6.5
    Task-number: QTBUG-105105
    Change-Id: I94b9a0f7d18a725da01a47398163e6d0f704eefc
    Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Fix QStringConverter::encodingForName() for trailing `-`, `_` | Marc Mutz | 2023-12-07 | 1 | -13/+8
    The (internal) docs say that - and _ are ignored, and they're ignored everywhere, except as suffixes. If the old code only ignored them as infixes, fine, that would make some sense, but it ignored infixes and prefixes, so there's no reason for it to not ignore suffixes, too.

    Fix by continuing the loop until both input ranges are exhausted, or a mismatch was found.

    [ChangeLog][QtCore][QStringConverter] Fixed a bug where encodingForName() failed due to trailing characters (`_`, `-`) that ought to have been ignored.

    Pick-to: 6.6 6.5
    Change-Id: Iec21489d988eda7d33c744c170f88cd665b73f34
    Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
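Note (illustrative sketch, not Qt's actual nameMatch()): the loop shape the commit describes, running until both inputs are exhausted or a mismatch is found, might look like the following. The function name looseNameMatch() is hypothetical, and case-insensitive ASCII comparison is an assumption made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: compares two encoding names while ignoring '-' and
// '_' wherever they appear (prefix, infix, or suffix) and ignoring ASCII case.
inline bool looseNameMatch(std::string_view a, std::string_view b)
{
    auto skippable = [](char c) { return c == '-' || c == '_'; };
    auto lower = [](char c) { return std::tolower(static_cast<unsigned char>(c)); };

    std::size_t i = 0, j = 0;
    // Loop until BOTH inputs are exhausted, so trailing separators on either
    // side are consumed instead of causing a spurious mismatch.
    while (i < a.size() || j < b.size()) {
        if (i < a.size() && skippable(a[i])) { ++i; continue; }
        if (j < b.size() && skippable(b[j])) { ++j; continue; }
        if (i == a.size() || j == b.size())
            return false; // one name still has significant characters left
        if (lower(a[i]) != lower(b[j]))
            return false; // genuine mismatch
        ++i;
        ++j;
    }
    return true;
}
```

A loop that stops as soon as one input is exhausted would reject "utf8_" against "UTF-8", which is the class of bug the commit message describes.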
* QLocal8Bit::convertFromUnicode[win]: fix code unit pairing | Mårten Nordheim | 2023-11-15 | 1 | -3/+9
    When we restore a high surrogate from the state, we need to make sure that the next code unit is a low surrogate. And if it is not then we should at least not throw it away.

    Amends d8d5922f16f1710b66caf718c302b633d2f78b0b

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118185
    Task-number: QTBUG-105105
    Change-Id: I64afa0d323d73422128e24e16755e648a8811523
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: harden encodingForName() against nullptr | Marc Mutz | 2023-11-10 | 1 | -0/+2
    The nameMatch() function has an implicit precondition that neither argument is nullptr: it immediately dereferences both arguments.

    Prevent the crash by checking for name == nullptr early, before passing to nameMatch().

    Add tests for null and empty.

    As a drive-by, make variables in the test const (needed for the QByteArray to avoid detaching, peer pressure for the others).

    Amends a639bcda1e42f48fa32885ede77f9fd320ce731c.

    Pick-to: 6.6 6.5 6.2
    Change-Id: I4a30f6c130310eb701ba7c7251168294489c34db
    Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
* QLocal8Bit::convertToUnicode[win]: Pre-2Gi changes | Mårten Nordheim | 2023-11-06 | 1 | -26/+36
    Prepare the code for the upcoming changes to support strings longer than 2GiB. We will have to loop from start to end, and increment the pointer whenever we succeed, rather than assuming there is a single success before we return.

    This also means the error-handling code goes into an else-branch and gets indented.

    Pick-to: 6.6 6.5
    Task-number: QTBUG-105105
    Change-Id: Ibe49cc661f582fd54ce36ad466cf798a62b5c4c6
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convert*Unicode[win]: Converge logic | Mårten Nordheim | 2023-11-06 | 1 | -15/+19
    I ended up writing different logic for similar things. And using points_into_range doesn't work if we, by coincidence, point at end, though this shouldn't be possible yet, but it may happen once we support input larger than 2Gi.

    So, let's instead check if the destination buffer has been initialized.

    Pick-to: 6.6 6.5
    Task-number: QTBUG-105105
    Change-Id: I28c367eb965339ae84355c0cac27c5d0352d9271
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convertToUnicode[win]: Support stateless flag | Mårten Nordheim | 2023-11-06 | 1 | -1/+4
    By just setting state to nullptr.

    Pick-to: 6.6 6.5
    Task-number: QTBUG-105105
    Change-Id: I6b4f8fe39f1ba51dcfaf98ce7e42c2acd4c4cf98
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QLocal8Bit::convertFromUnicode[win]: handle trailing high surrogate | Mårten Nordheim | 2023-10-30 | 1 | -10/+42
    The win32 API doesn't give us much choice. _Some_ code pages have support for returning some error if we pass a specific flag, but not all of them. Anyway, since the code pages might not support all that UTF-16 provides, we can't reasonably make it error out on characters that cannot be converted.

    So, the most reasonable thing we can handle is an unpaired high surrogate at the end of a string, assume that the rest of the string was fine, and that the low surrogate will be provided in the next call.

    Pick-to: 6.6 6.5
    Fixes: QTBUG-118185
    Task-number: QTBUG-105105
    Change-Id: I1f193c9d8e04bec769d885d32440c759d9dff0c2
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
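Note (illustrative sketch, not the commit's code): holding back an unpaired high surrogate until the next call can be pictured as below. PendingState and takeConvertible() are hypothetical names; Qt keeps such data in the converter's own state object, whose layout is not shown here. The sketch does not validate that the unit following a restored high surrogate is actually a low surrogate.

```cpp
#include <string>
#include <string_view>

// True for the first half of a UTF-16 surrogate pair.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// Hypothetical state carried between calls: 0 means nothing is pending.
struct PendingState {
    char16_t pendingHigh = 0;
};

// Returns the code units that are safe to convert now. A high surrogate left
// dangling at the end of `in` is stored in `state` and prepended to the
// input of the next call, where its low surrogate is expected to arrive.
inline std::u16string takeConvertible(std::u16string_view in, PendingState &state)
{
    std::u16string out;
    if (state.pendingHigh != 0) {
        out.push_back(state.pendingHigh); // restore what the last call held back
        state.pendingHigh = 0;
    }
    out.append(in);
    if (!out.empty() && isHighSurrogate(out.back())) {
        state.pendingHigh = out.back(); // hold back the unpaired high surrogate
        out.pop_back();
    }
    return out;
}
```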
* QLocal8Bit::convertFromUnicode[win] use local array for buffer | Mårten Nordheim | 2023-10-30 | 1 | -5/+18
    To match convertToUnicode, we use a local array as a temporary buffer, then if any growth is needed we work directly with a QBA.

    As a drive-by: explicitly cast to int where we pass int

    Pick-to: 6.6 6.5
    Task-number: QTBUG-105105
    Task-number: QTBUG-118185
    Change-Id: I1efff318eea41d87d558599d737b64107af4ae17
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QLocal8Bit::convertFromUnicode[win]: Drop UsedDefaultChar argument | Mårten Nordheim | 2023-10-30 | 1 | -5/+5
    We don't use the value, and the docs[0] say that we get the best performance if we don't pass the argument.

    [0] https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte#remarks
    See table

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118185
    Task-number: QTBUG-105105
    Change-Id: I3eb5e023a936fe3def5169e3fb492a62708bbf44
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QLocal8Bit::convertFromUnicode[win] move cast earlier | Mårten Nordheim | 2023-10-30 | 1 | -10/+7
    So we don't have to do this multiple times when calling the function.

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118185
    Task-number: QTBUG-105105
    Change-Id: Ifa72eedd5f71365618ec6b67fa3047f90f4eb541
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convertToUnicode[win]: handle more than one octet state | Mårten Nordheim | 2023-10-30 | 1 | -76/+54
    Both to store and to restore. Without this a 3 or more octet sequence would cause errors or wrong output. This can be seen with GB 18030.

    Pick-to: 6.6 6.5
    Fixes: QTBUG-118318
    Task-number: QTBUG-105105
    Change-Id: Id1f7f5f2fba4633b9f888add2186f4d8d21b7293
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convertToUnicode[win]: Drop MB_PRECOMPOSED flag | Mårten Nordheim | 2023-10-30 | 1 | -4/+4
    A few code pages do not support this flag[0]. It's also deprecated[1] and is what Windows prefers to generate by default. So let's drop it.

    [0] https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar
    See note at the end for the dwFlags parameter.
    [1] It's mentioned in the header files, but not online...

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118185
    Task-number: QTBUG-105105
    Change-Id: I798c387170c73a953be874de139868543b2d775e
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convertToUnicode[win]: Simplify state-handling | Mårten Nordheim | 2023-10-30 | 1 | -29/+18
    Instead of having separate variables for the state, that we then store back at the end, let's just make state-handling explicit, making the logic around it easier to follow.

    We now output Replacement Characters, if we try to decode stateless and have an invalid sequence at the end. Otherwise we fall back to convertToUnicodeCharByChar as before.

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118318
    Task-number: QTBUG-105105
    Change-Id: Ifea64bc241113f468b69cad16fc3cc97a6ebe646
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convertToUnicode[win]: Drop QVLA in favor of array | Mårten Nordheim | 2023-10-27 | 1 | -18/+27
    We now, instead, resize and copy the data directly into a QString if we run out of space in the pre-allocated array.

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118318
    Task-number: QTBUG-105105
    Change-Id: I1eed5e75f0bd067b4e7d6bff97c4186f3f6ee0ad
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QLocal8Bit::convertToUnicode[win]: Fix indentation | Mårten Nordheim | 2023-10-27 | 1 | -3/+2
    To make future changes easier to read

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118318
    Task-number: QTBUG-105105
    Change-Id: I431c83b956c179b1d04c2bf51b744227f8b136be
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QStringConverter[win]: expose+test control of code-page | Mårten Nordheim | 2023-10-24 | 1 | -10/+22
    Then we can easily test how fromLocal8Bit() and toLocal8Bit() behave with different code-pages.

    Pick-to: 6.6 6.5
    Task-number: QTBUG-118318
    Task-number: QTBUG-118185
    Task-number: QTBUG-105105
    Change-Id: Ib1cd3bccd27d598f4c80915557e332befcd96354
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: clarify decode()/encode() API docs | Ahmad Samir | 2023-10-06 | 1 | -10/+15
    These methods return a struct which is implicitly convertible to QString/QByteArray respectively. Don't hide the return type from QDoc, this simplifies telling users what those methods return exactly.

    Fixes: QTBUG-117705
    Pick-to: 6.6 6.5
    Change-Id: Ibb22a1e54fffce8f5f20aaabe47983870ccfba1e
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Deprecate Q_ASSUME() | Thiago Macieira | 2023-08-14 | 1 | -1/+1
    We've known for a long time that this is producing worse code with GCC because of how we implemented in Q_ASSUME_IMPL(). So bite the bullet and actually deprecate the macro, replacing all extant Q_ASSUME() with Q_ASSERT().

    The replacement is in C++23. Backporting the support onto Q_ASSUME_IMPL was previously rejected by reviewers.

    [ChangeLog][Deprecation Notice] The Q_ASSUME() macro is deprecated. This macro has different side-effects depending on the compiler used (GCC compared to Clang and MSVC), and there are certain conditions under which GCC is known to produce worse code than if the macro was absent. To give a hint to the compiler for optimizations, use the C++23 [[assume]] attribute.

    Change-Id: I80612a7d275c41f1baf0fffd177a3a4ad819fb2d
    Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Provide QStringConverter equivalent of QTextCodec::availableCodecs | Fabian Kosmale | 2023-07-18 | 1 | -0/+58
    A text editor commonly wants to display a list of codecs that are supported. With the introduction of the ICU based QStringConverter, that list is no longer statically known. So provide the necessary functionality.

    Fixes: QTBUG-109104
    Change-Id: I9ecf59aa6bcc6fe65c8872cab84affafec4fa362
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
* QStringDecoder: add a char16_t overload of appendToBuffer(QChar*, ~~~) | Marc Mutz | 2023-06-04 | 1 | -0/+6
    More and more code in Qt uses char16_t instead of QChar, even QString::Data, so reduce the impedance mismatch with such code and supply a char16_t overload in parallel to the existing QChar* one.

    [ChangeLog][QtCore][QStringDecoder] Added appendToBuffer() overload for char16_t*, complementing the existing overload taking QChar*.

    Task-number: QTBUG-106198
    Change-Id: I0cb8ab22c897c14b1318a676f5212cc0cf1b72b7
    Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
* Misc: Fix qsizetype-related narrowing coversions | Ahmad Samir | 2023-03-11 | 1 | -6/+6
    Task-number: QTBUG-102461
    Change-Id: I96757abc50fc45756bc1271a970f819a48021663
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QtMiscUtils: add some more character helpers | Ahmad Samir | 2023-02-07 | 1 | -9/+4
    isHexDigit, isOctalDigit, isAsciiDigit, isAsciiLower, isAsciiUpper, isAsciiLetterOrNumber.

    This de-duplicates some code through out.

    Rename two local lambdas that were called "isAsciiLetterOrNumber" to not conflict with the method in QtMiscUtils.

    Change-Id: I5b631f95b9f109136d19515f7e20b8e2fbca3d43
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: add QLatin1::convert{To,From}Unicode() | Ahmad Samir | 2022-12-30 | 1 | -14/+2
    With the methods that use helpers from qstring.cpp defined in the latter.

    Change-Id: I11d6b0bfb95efe34e56d33d2ecbfe8f4423a9e6c
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Add QUtf8::convertToUnicode(char16_t *, ....) overloads | Ahmad Samir | 2022-12-23 | 1 | -14/+14
    Mark private API docs as \internal.

    Change-Id: Ia2fae84832d5f253ea730c1993ce1810f43dff78
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: don't shadow variables | Ahmad Samir | 2022-12-20 | 1 | -2/+1
    Change-Id: I3c209585de2a7599f1cd4e58c1d3b3501425b903
    Reviewed-by: Marc Mutz <marc.mutz@qt.io>
* Add In-place utf-8 case-insensitive comparisons | Øystein Heskestad | 2022-12-02 | 1 | -3/+62
    Also add optimizations for more string comparisons and add tests and benchmarks.

    [ChangeLog][QtCore][QString] Added utf-8 case-insensitive comparisons

    Fixes: QTBUG-100235
    Change-Id: I7c0809c6d80c00e9a5d0e8ac3ebb045cf7004a30
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Port from container::count() and length() to size() - V5 | Marc Mutz | 2022-11-03 | 1 | -7/+7
    This is a semantic patch using ClangTidyTransformator as in qtbase/df9d882d41b741fef7c5beeddb0abe9d904443d8, but extended to handle typedefs and accesses through pointers, too:

        const std::string o = "object";

        auto hasTypeIgnoringPointer = [](auto type) { return anyOf(hasType(type), hasType(pointsTo(type))); };

        auto derivedFromAnyOfClasses = [&](ArrayRef<StringRef> classes) {
            auto exprOfDeclaredType = [&](auto decl) {
                return expr(hasTypeIgnoringPointer(hasUnqualifiedDesugaredType(recordType(hasDeclaration(decl))))).bind(o);
            };
            return exprOfDeclaredType(cxxRecordDecl(isSameOrDerivedFrom(hasAnyName(classes))));
        };

        auto renameMethod = [&] (ArrayRef<StringRef> classes,
                                StringRef from, StringRef to) {
            return makeRule(cxxMemberCallExpr(on(derivedFromAnyOfClasses(classes)),
                                callee(cxxMethodDecl(hasName(from), parameterCountIs(0)))),
                            changeTo(cat(access(o, cat(to)), "()")),
                            cat("use '", to, "' instead of '", from, "'"));
        };

        renameMethod(<classes>, "count", "size");
        renameMethod(<classes>, "length", "size");

    except that the on() matcher has been replaced by one that doesn't ignoreParens().

    a.k.a qt-port-to-std-compatible-api V5 with config Scope: 'Container'.

    Added two NOLINTNEXTLINEs in tst_qbitarray and tst_qcontiguouscache, to avoid porting calls that explicitly test count().

    Change-Id: Icfb8808c2ff4a30187e9935a51cad26987451c22
    Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
    Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
* Long live QUtf8::convertFromLatin1()! | Marc Mutz | 2022-11-02 | 1 | -0/+15
    With the introduction of QAnyStringView, overloading based on UTF-8 and Latin-1 is becoming more common. Often, the two overloads can share the processing backend, because we're only interested in the US-ASCII subset of each.

    But if they can't, we need a faster way to convert L1 into UTF-8 than going via UTF-16. This is where the new private API comes in.

    Eventually, we should have the converse operation, too, to complete the set of direct conversions between the possible three QAnyStringView encodings L1/U8/U16, but this direction is easier to code (there are no error cases) and more immediately useful, so provide L1->U8 alone for now.

    Change-Id: I3f7e1a9c89979d0eb604cb9e42dedf3d514fca2c
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
    Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
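Note (illustrative sketch, not the private Qt API): a direct Latin-1 to UTF-8 conversion has no error cases because every Latin-1 byte maps to exactly one code point below U+0100, which needs at most two UTF-8 bytes. The function name latin1ToUtf8() is hypothetical, and the sketch uses std::string rather than Qt types.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: converts Latin-1 bytes straight to UTF-8 without a
// detour through UTF-16.
inline std::string latin1ToUtf8(std::string_view latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2); // worst case: every byte becomes two
    for (char ch : latin1) {
        const auto c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c)); // US-ASCII is identical in both
        } else {
            // U+0080..U+00FF: two-byte sequence 110000xx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

The reverse direction is harder because UTF-8 input can contain code points above U+00FF or malformed sequences, which is the asymmetry the commit message points out.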
* QStringConverter/AVX2: fix build with MSVC 2022 | Thiago Macieira | 2022-10-26 | 1 | -1/+1
    It doesn't like 0x80 passed to a char, causing a warning

        qstringconverter.cpp(196): warning C4309: 'argument': truncation of constant value

    Pick-to: 6.2 6.4
    Change-Id: I07ec23f3cb174fb197c3fffd17215b6f83476ebf
    Reviewed-by: Lars Knoll <lars@knoll.priv.no>
* Port from container.count()/length() to size() | Marc Mutz | 2022-10-04 | 1 | -5/+5
    This is semantic patch using ClangTidyTransformator:

        auto QtContainerClass = expr(hasType(namedDecl(hasAnyName(<classes>)))).bind(o)
        makeRule(cxxMemberCallExpr(on(QtContainerClass),
                                   callee(cxxMethodDecl(hasAnyName({"count", "length"), parameterCountIs(0))))),
                 changeTo(cat(access(o, cat("size"), "()"))),
                 cat("use 'size()' instead of 'count()/length()'"))

    a.k.a qt-port-to-std-compatible-api with config Scope: 'Container'.

    <classes> are:

        // sequential:
        "QByteArray", "QList", "QQueue", "QStack", "QString", "QVarLengthArray", "QVector",
        // associative:
        "QHash", "QMultiHash", "QMap", "QMultiMap", "QSet", // Qt has no QMultiSet

    Change-Id: Ibe8837be96e8d30d1846881ecd65180c1bc459af
    Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
    Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Finish porting cross-platform parts of QStringConverter to qsizetype/size_t | Marc Mutz | 2022-08-12 | 1 | -2/+2
    There are still problems with platforms-specific APIs that are 32-bit only (cf. QTBUG-105105), but this patch finishes the port of the cross-platform parts of QStringConverter.

    None of these changes have a user-visible effect. They just avoid the Code Smell that int has become since Qt 6.0.

    Pick-to: 6.4
    Task-number: QTBUG-103531
    Change-Id: I267e2e1268a18c130892fa2fd80d1b5dabb3d9b9
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: make a narrowing conversion explicit | Marc Mutz | 2022-08-12 | 1 | -1/+1
    Int variables are a code smell these days, so make the narrowing conversion (from ptrdiff_t to int) explicit and add a comment.

    Pick-to: 6.4 6.3 6.2
    Task-number: QTBUG-105105
    Change-Id: Ia4e14f1cc132ca36d15e9684bfcb4605d7b9251f
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: fix -Wc++20-compat | Marc Mutz | 2022-08-12 | 1 | -4/+4
    GCC 13 warns:

        qstringconverter_p.h:29:6: warning: identifier ‘char8_t’ is a keyword in C++20 [-Wc++20-compat]
           29 | enum char8_t : uchar {};

    Fix by calling the replacement qchar8_t (and making it a typedef to char8_t when the latter is available).

    Pick-to: 6.4 6.3 6.2
    Change-Id: If59a9d55667bf1f5245e3a34189687995b000daa
    Reviewed-by: Ville Voutilainen <ville.voutilainen@qt.io>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Unicode conversion: Skip superfluous check for QT_COMPILER_SUPPORTS_SSE2 | Fabian Kosmale | 2022-07-13 | 1 | -3/+2
    We already check for __SSE2__, which gets undefined when __SSE2__ is not set. Moreover, we want to use the intrinsics without a runtime check there, so checking for __SSE2__ is the correct thing to do.

    Change-Id: I7f8610e2927650b439c3697585234b843e345e4c
    Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: Do not use std::optional::value() | Ulf Hermann | 2022-07-05 | 1 | -1/+1
    value() can potentially throw an exception. We know that it doesn't in this case, but the compiler doesn't know. And our code checker doesn't know either and generates lots of false positives. Also, without the exception propagation code the resulting binary is probably smaller.

    Coverity-Id: 386110
    Coverity-Id: 384314
    Coverity-Id: 383835
    Coverity-Id: 383784
    Pick-to: 6.4
    Change-Id: Icdacf8e003fd3a6ac8fd260ed335239a59de3295
    Reviewed-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
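Note (illustrative sketch, unrelated to the actual line the commit changed): the difference between value() and operator* on std::optional can be shown with a small example. The helpers hexDigitValue() and hexByte() are hypothetical and exist only for this illustration.

```cpp
#include <optional>

// Hypothetical helper: numeric value of one hexadecimal digit, if valid.
inline std::optional<int> hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return std::nullopt;
}

// Hypothetical helper: combines two hex digits into one byte value.
inline std::optional<int> hexByte(char hi, char lo)
{
    const std::optional<int> h = hexDigitValue(hi);
    const std::optional<int> l = hexDigitValue(lo);
    if (!h || !l)
        return std::nullopt;
    // Both optionals were checked above, so operator* is safe here.
    // h.value() would return the same number, but it is specified to throw
    // std::bad_optional_access when empty, so the compiler and static
    // analysers must account for an exception path that can never be taken.
    return *h * 16 + *l;
}
```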
* Add QStringDecoder::decoderForHtml() | Lars Knoll | 2022-06-21 | 1 | -14/+50
    Now that QStringConverter can handle non UTF encodings through ICU, add a way to get a decoder for arbitrary HTML code.

    Opposed to QStringConverter::encodingForHtml(), this method will try to create a valid string decoder also for non unicode codecs.

    Pick-to: 6.4
    Change-Id: I343584da1b114396c744f482d9b433c9cedcc511
    Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
    Reviewed-by: Lars Knoll <lars.knoll@qt.io>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Long live the ICU-based QStringConverter interface! | Fabian Kosmale | 2022-06-19 | 1 | -3/+278
    This adds support for additional codecs to QStringConverter when ICU is available.

    We store the converter in the state (d[0]), and its canonical name in d[1]. We need the name there, as in the clear function we close the UConverter, and set the pointer to null. Consequently, the actual conversion functions might need to re-open the converter again. The advantage of this approach is that clear is used in the destructor of State, and with this approach we properly clean up the state.

    There is however a disadvantage: The clear function was so far also used for resetting the state when QStringConverter::resetState() was called. Discarding the whole UConverter for that is however rather costly. For that reason we modify resetState to call a new function, State::reset. For existing converters, it behaves the same as clear; for the ICU based converter, we call the more efficient ucnv_reset. Code compiled against Qt 6.4 can benefit from this more efficient version; code compiled against older Qt versions will continue to work, as the conversion functions can just recreate the converter from the name.

    We can distinguish between ICU and non-ICU converters by checking if the UsesIcu flag is set.

    QStringConverter::name is changed to return the name stored in d[1]. The interface of the ICU converter has a dummy name, so code using the old name function from Qt < 6.4 still returns something, namely a message asking the user to recompile. The function is moved out of line, as we need to check for the private ICU feature, and want to avoid having that check in the public header.

    As the QStringConverter ctor taking a name now can allocate memory, it can no longer be noexcept. Removing the noexceptness is safe, as it was only added after Qt 6.3.

    Note that we cannot extend the API consuming or returning Encoding, as we use Encoding values to index into an array of converter interfaces in inline API. Further API to support getting an ICU converter for HTML will be added in a future commit.

    Currently, the code depending on ICU is enabled at compile time if ICU is found. However, in the future it could be moved into a plugin to avoid a hard dependency on ICU in Core.

    [ChangeLog][Corelib][Text] QStringConverter and API using it now supports more text codecs if Qt is compiled with ICU support.

    Fixes: QTBUG-103375
    Change-Id: I7afb92fc68ef994179ebc7a3aa73beebb1386204
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QStringConverter: use the QUtf8 codec when Windows is using UTF-8 | Thiago Macieira | 2022-05-23 | 1 | -2/+7
    The QLocal8Bit implementation assumes that there's at most one continuation byte -- that is, that all codecs are either Single or Double Byte Character Sets (SBCS or DBCS). It appears to be the case for all Windows default codepages, except for CP_UTF8, which is an opt-in anyway.

    Instead of fixing our codec, let's just use the optimized UTF-8 implementation.

    [ChangeLog][Windows] Fixed support for using Qt applications with UTF-8 as the system codepage or by enabling that in the application's manifest.

    Discussed-on: https://lists.qt-project.org/pipermail/interest/2022-May/038241.html
    Pick-to: 6.2 6.3
    Change-Id: I77c8221eb2824c369feffffd16f0912550a98049
    Reviewed-by: Lars Knoll <lars.knoll@qt.io>