qHash: implement an AES hasher for QLatin1StringView

It's the same aeshash() as before, except we're passing a template parameter to indicate whether to read half and then zero-extend the data. That is, it will perform a conversion from Latin1 on the fly. When running in zero-extending mode, the length parameters are actually doubled (counting the number of UTF-16 code units) and we then divide again by 2 when advancing. The implementation should have the following performance characteristics: * QLatin1StringView now will be roughly half as fast as Qt 6.7 * QLatin1StringView now will be roughly as fast as QStringView For the aeshash128() in default builds of QtCore (will use SSE4.1), the long loop (32 characters or more) is: QStringView QLatin1StringView movdqu -0x20(%rax),%xmm4 | pmovzxbw -0x10(%rdx),%xmm2 movdqu -0x10(%rax),%xmm5 | pmovzxbw -0x8(%rdx),%xmm3 add $0x20,%rax | add $0x10,%rdx pxor %xmm4,%xmm0 | pxor %xmm2,%xmm0 pxor %xmm5,%xmm1 | pxor %xmm3,%xmm1 aesenc %xmm0,%xmm0 aesenc %xmm0,%xmm0 aesenc %xmm1,%xmm1 aesenc %xmm1,%xmm1 aesenc %xmm0,%xmm0 aesenc %xmm0,%xmm0 aesenc %xmm1,%xmm1 aesenc %xmm1,%xmm1 The number of instructions is identical, but there are actually 2 more uops per iteration. LLVM-MCA simulation shows this should execute in the same number of cycles on older CPUs that do not have support for VAES (see <https://analysis.godbolt.org/z/x95Mrfrf7>). For the VAES version in aeshash256() and the AVX10 version in aeshash256_256(): QStringView QLatin1StringView vpxor -0x40(%rax),%ymm1,%ym | vpmovzxbw -0x20(%rax),%ymm3 vpxor -0x20(%rax),%ymm0,%ym | vpmovzxbw -0x10(%rax),%ymm2 add $0x40,%rax | add $0x20,%rax | vpxor %ymm3,%ymm0,%ymm0 | vpxor %ymm2,%ymm1,%ymm1 vaesenc %ymm1,%ymm1,%ymm1 < vaesenc %ymm0,%ymm0,%ymm0 vaesenc %ymm0,%ymm0,%ymm0 vaesenc %ymm1,%ymm1,%ymm1 vaesenc %ymm1,%ymm1,%ymm1 vaesenc %ymm0,%ymm0,%ymm0 vaesenc %ymm0,%ymm0,%ymm0 > vaesenc %ymm1,%ymm1,%ymm1 In this case, the increase in number of instructions matches the increase in number of uops. The LLVM-MCA simulation says that the QLatin1StringView version is faster at 11 cycles/iteration vs 14 cyc/it (see <https://analysis.godbolt.org/z/1Gv1coz13>), but that can't be right. Measured performance of CPU cycles, on an Intel Core i9-7940X (Skylake, no VAES support), normalized on the QString performance (QByteArray is used as a stand-in for the performance in Qt 6.7): aeshash | siphash QByteArray QL1SV QString QByteArray QString dictionary 94.5% 79.7% 100.0% 150.5%* 159.8% paths-small 90.2% 93.2% 100.0% 202.8% 290.3% uuids 81.8% 100.7% 100.0% 215.2% 350.7% longstrings 42.5% 100.8% 100.0% 185.7% 353.2% numbers 95.5% 77.9% 100.0% 155.3%* 164.5% On an Intel Core i7-1165G7 (Tiger Lake, capable of VAES and AVX512VL): aeshash | siphash QByteArray QL1SV QString QByteArray QString dictionary 90.0% 91.1% 100.0% 103.3%* 157.1% paths-small 99.4% 104.8% 100.0% 237.5% 358.0% uuids 88.5% 117.6% 100.0% 274.5% 461.7% longstrings 57.4% 111.2% 100.0% 503.0% 974.3% numbers 90.6% 89.7% 100.0% 98.7%* 149.9% On an Intel 4th Generation Xeon Scalable Platinum (Sapphire Rapids, same Golden Cove core as Alder Lake): aeshash | siphash QByteArray QL1SV QString QByteArray QString dictionary 89.9% 102.1% 100.0% 158.1%* 172.7% paths-small 78.0% 89.4% 100.0% 159.4% 258.0% uuids 109.1% 107.9% 100.0% 279.0% 496.3% longstrings 52.1% 112.4% 100.0% 564.4% 1078.3% numbers 85.8% 98.9% 100.0% 152.6%* 190.4% * dictionary contains very short entries (6 characters) * paths-small contains strings of varying length, but very few over 32 * uuids-list contains fixed-length strings (38 characters) * longstrings is the same but 304 characters * numbers also a lot contains very short strings (1 to 6 chars) What this shows: * For short strings, the performance difference is negligible between all three * For longer strings, QLatin1StringView now costs between 7 and 17% more than QString on the tested machines instead of up to ~50% less, except on the older machine (where I think the main QString hashing is suffering from memory bandwidth limitations) * The AES hash implementation is anywhere from 1.6 to 11x faster than Siphash * Murmurhash (marked with asterisk) is much faster than Siphash, but it only managed to beat the AES hash in one test Change-Id: I664b9f014ffc48cbb49bfffd17b045c1811ac0ed Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
author: Thiago Macieira <thiago.macieira@intel.com> 2024-02-02 22:15:56 -0800
committer: Thiago Macieira <thiago.macieira@intel.com> 2024-03-12 18:23:09 -0700
commit: 55959aefab1a190435dfacfc2a136ce3314d423c (patch)
tree: 0577418cdae818acb80fdf5388232bdb2663080c /tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp
parent: 970aad541811d002e5004bd3826929247492ba09 (diff)
1 files changed, 3 insertions, 3 deletions
diff --git a/tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp b/tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp
index fdb2b37346..06f18dfe9c 100644
--- a/tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp
+++ b/tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp
@@ -289,10 +289,12 @@ void tst_QHashFunctions::stringConsistency_data()
     QTest::newRow("null") << QString();
     QTest::newRow("empty") << "";
     QTest::newRow("withnull") << QStringLiteral("A\0z");
-    QTest::newRow("short-ascii") << "Hello";
+    QTest::newRow("short-ascii") << "Hello";            // 10 bytes
+    QTest::newRow("medium-ascii") << "Hello, World";    // 24 bytes
     QTest::newRow("long-ascii") << QStringLiteral("abcdefghijklmnopqrstuvxyz").repeated(16);
 
     QTest::newRow("short-latin1") << "Bokmål";
+    QTest::newRow("medium-latin1") << "Det går bra!";   // 24 bytes
     QTest::newRow("long-latin1")
             << R"(Alle mennesker er født frie og med samme menneskeverd og menneskerettigheter.
  De er utstyrt med fornuft og samvittighet og bør handle mot hverandre i brorskapets ånd.)";
@@ -327,8 +329,6 @@ void tst_QHashFunctions::stringConsistency()
         QLatin1StringView l1sv(l1ba.data(), l1ba.size());
 #ifdef Q_PROCESSOR_ARM
         // zero-extending aeshash not implemented on ARM
-#elif defined(Q_PROCESSOR_X86)
-        // zero-extending aeshash not implemented on x86
 #else
         if (value == l1sv)
             QCOMPARE(qHash(l1sv, seed), qHash(value, seed));
author	Thiago Macieira <thiago.macieira@intel.com>	2024-02-02 22:15:56 -0800
committer	Thiago Macieira <thiago.macieira@intel.com>	2024-03-12 18:23:09 -0700
commit	55959aefab1a190435dfacfc2a136ce3314d423c (patch)
tree	0577418cdae818acb80fdf5388232bdb2663080c /tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp
parent	970aad541811d002e5004bd3826929247492ba09 (diff)