Long live QUtf8::convertFromLatin1()!

With the introduction of QAnyStringView, overloading based on UTF-8 and Latin-1 is becoming more common.

Often, the two overloads can share the processing backend, because we're only interested in the US-ASCII subset of each. But if they can't, we need a faster way to convert L1 into UTF-8 than going via UTF-16. This is where the new private API comes in.

Eventually, we should have the converse operation, too, to complete the set of direct conversions between the three possible QAnyStringView encodings L1/U8/U16, but this direction is easier to code (there are no error cases) and more immediately useful, so provide L1->U8 alone for now.

Change-Id: I3f7e1a9c89979d0eb604cb9e42dedf3d514fca2c
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
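A minimal sketch of the "shared backend" case, assuming a check that only cares about US-ASCII: every non-ASCII character is stored as bytes >= 0x80 in both Latin-1 and UTF-8, so one byte-wise loop serves both overloads and no conversion is needed. The tag types Latin1View/Utf8View and the function isAsciiWord are hypothetical stand-ins, not Qt API.

```cpp
#include <string_view>

// Hypothetical tag types standing in for QLatin1StringView / QUtf8StringView.
struct Latin1View { std::string_view bytes; };
struct Utf8View   { std::string_view bytes; };

namespace detail {
// Shared backend: accepts a non-empty run of US-ASCII letters, digits and '_'.
// Non-ASCII characters show up as bytes >= 0x80 in both Latin-1 and UTF-8,
// so they are rejected identically without decoding either encoding.
bool isAsciiWord(std::string_view bytes)
{
    if (bytes.empty())
        return false;
    for (unsigned char ch : bytes) {
        const bool ok = (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
                     || (ch >= '0' && ch <= '9') || ch == '_';
        if (!ok)
            return false;
    }
    return true;
}
} // namespace detail

// Both overloads forward to the same byte-wise backend.
bool isAsciiWord(Latin1View s) { return detail::isAsciiWord(s.bytes); }
bool isAsciiWord(Utf8View s)   { return detail::isAsciiWord(s.bytes); }
```

When the backend must see the non-ASCII characters themselves, this trick stops working and one encoding has to be converted into the other, which is the case the new L1->U8 function targets.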
author: Marc Mutz <marc.mutz@qt.io> 2022-10-31 17:27:15 +0100
committer: Marc Mutz <marc.mutz@qt.io> 2022-11-02 17:21:28 +0100
commit: 8acec4dbe6f9faac2c48a8be67f73e0d2ec1185b
tree: e38fbf542db6b56993fb8017cef34d123f3ac133 /src/corelib/text/qstringconverter.cpp
parent: 3834fee3d33fde303c0a1535c1ef0439aa05c6b0
1 files changed, 15 insertions, 0 deletions
diff --git a/src/corelib/text/qstringconverter.cpp b/src/corelib/text/qstringconverter.cpp
index 329e1990da..a0e0173cbb 100644
--- a/src/corelib/text/qstringconverter.cpp
+++ b/src/corelib/text/qstringconverter.cpp
@@ -571,6 +571,21 @@ char *QUtf8::convertFromUnicode(char *out, QStringView in, QStringConverter::Sta
     return reinterpret_cast<char *>(cursor);
 }
 
+char *QUtf8::convertFromLatin1(char *out, QLatin1StringView in)
+{
+    // ### SIMD-optimize:
+    for (uchar ch : in) {
+        if (ch < 128) {
+            *out++ = ch;
+        } else {
+            // as per https://en.wikipedia.org/wiki/UTF-8#Encoding, 2nd row
+            *out++ = 0b110'0'0000u | (ch >> 6);
+            *out++ = 0b10'00'0000u | (ch & 0b0011'1111);
+        }
+    }
+    return out;
+}
+
 QString QUtf8::convertToUnicode(QByteArrayView in)
 {
     // UTF-8 to UTF-16 always needs the exact same number of words or less:
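The new function writes into a caller-supplied buffer and returns one past the last byte written, so the caller has to size that buffer. Since every Latin-1 character is below U+0100, it needs at most two UTF-8 bytes (the "2nd row" of the UTF-8 table), giving a worst case of 2 * in.size(). The sketch below mirrors the patch's loop in standard C++ (std::string_view stands in for QLatin1StringView) and shows that caller pattern; latin1ToUtf8 and toUtf8 are hypothetical names, not the private Qt API, and no SIMD optimization is attempted.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Mirror of the patch's loop: writes UTF-8 to `out` and returns the end
// pointer. Latin-1 cannot be malformed, so there is no error path.
char *latin1ToUtf8(char *out, std::string_view in)
{
    for (unsigned char ch : in) {
        if (ch < 0x80) {
            *out++ = static_cast<char>(ch);                  // 0xxxxxxx
        } else {
            *out++ = static_cast<char>(0xC0 | (ch >> 6));    // 110000xx (0xC2 or 0xC3)
            *out++ = static_cast<char>(0x80 | (ch & 0x3F));  // 10xxxxxx
        }
    }
    return out;
}

// Caller side: allocate for the worst case (two bytes per input byte),
// convert, then shrink to the size implied by the returned end pointer.
std::string toUtf8(std::string_view latin1)
{
    std::string result(latin1.size() * 2, '\0');
    char *end = latin1ToUtf8(result.data(), latin1);
    result.resize(static_cast<std::size_t>(end - result.data()));
    return result;
}
```

For example, Latin-1 "caf\xE9" (é is 0xE9) becomes "caf\xC3\xA9": 0xE9 >> 6 is 3, giving 0xC3, and 0xE9 & 0x3F is 0x29, giving 0xA9.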