author Giuseppe D'Angelo <dangelog@gmail.com> 2012-03-27 18:40:06 +0100
committer Qt by Nokia <qt-info@nokia.com> 2012-04-09 10:16:09 +0200
commit 4893a5422e2978f4b9a0e7785af1696e3438ac22
tree cef483fbe83435e154d91f739b1667bb1cbf34ba /src
parent ddb70bee2fd323ddc4273aec5d40d975f50d2904
New qHash algorithm for uchar/ushort arrays (QString, QByteArray, etc.)
Port of Robin's work from I0a53aa4581e25b351b9cb5033415b5163d05fe71 on top of the new qHash patches (the original commit just introduced lots of conflicts, so I redid it from scratch).

This is based on the work done in the QHash benchmark over the past few months experimenting with the performance of the string hashing algorithm used by Java. The Java algorithm, in turn, appears to have been based off a variant of djb's work at http://cr.yp.to/cdb/cdb.txt.

This commit provides a performance boost of ~12-33% on the QHash benchmark.

Unfortunately, the rcc test depends on QHash ordering. Randomizing QHash or changing qHash will cause the test to fail (see QTBUG-25078), so for now the testdata is changed as well.

Done-with: Robin Burchell
Change-Id: Ie05d8e21588d1b2d4bd555ef254e1eb101864b75
Reviewed-by: João Abecasis <joao.abecasis@nokia.com>
Reviewed-by: Robin Burchell <robin+qt@viroteck.net>
Diffstat (limited to 'src')
-rw-r--r-- src/corelib/tools/qhash.cpp 42
1 files changed, 23 insertions, 19 deletions
diff --git a/src/corelib/tools/qhash.cpp b/src/corelib/tools/qhash.cpp
index ce7d4ad098..20202a4896 100644
--- a/src/corelib/tools/qhash.cpp
+++ b/src/corelib/tools/qhash.cpp
@@ -73,38 +73,42 @@
QT_BEGIN_NAMESPACE
+/*
+ The Java's hashing algorithm for strings is a variation of D. J. Bernstein
+ hashing algorithm appeared here http://cr.yp.to/cdb/cdb.txt
+ and informally known as DJB33XX - DJB's 33 Times Xor.
+ Java uses DJB31XA, that is, 31 Times Add.
-// ### Qt 5: see tests/benchmarks/corelib/tools/qhash/qhash_string.cpp
-// Hashing of the whole string is a waste of cycles.
+ The original algorithm was a loop around
+ (h << 5) + h ^ c
+ (which is indeed h*33 ^ c); it was then changed to
+ (h << 5) - h ^ c
+ (so h*31^c: DJB31XX), and the XOR changed to a sum:
+ (h << 5) - h + c
+ (DJB31XA), which can save some assembly instructions.
-/*
- These functions are based on Peter J. Weinberger's hash function
- (from the Dragon Book). The constant 24 in the original function
- was replaced with 23 to produce fewer collisions on input such as
- "a", "aa", "aaa", "aaaa", ...
+ Still, we can avoid writing the multiplication as "(h << 5) - h"
+ -- the compiler will turn it into a shift and an addition anyway
+ (for instance, gcc 4.4 does that even at -O0).
*/
-static uint hash(const uchar *p, int n, uint seed)
+static inline uint hash(const uchar *p, int len, uint seed)
{
uint h = seed;
- while (n--) {
- h = (h << 4) + *p++;
- h ^= (h & 0xf0000000) >> 23;
- h &= 0x0fffffff;
- }
+ for (int i = 0; i < len; ++i)
+ h = 31 * h + p[i];
+
return h;
}
-static uint hash(const QChar *p, int n, uint seed)
+static inline uint hash(const QChar *p, int len, uint seed)
{
uint h = seed;
- while (n--) {
- h = (h << 4) + (*p++).unicode();
- h ^= (h & 0xf0000000) >> 23;
- h &= 0x0fffffff;
- }
+ for (int i = 0; i < len; ++i)
+ h = 31 * h + p[i].unicode();
+
return h;
}
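
For readers who want to try the new recurrence outside of Qt, the following is a minimal standalone sketch, not the code from this commit. The names djb31xa and djb31xa_shift are hypothetical, and fixed-width std::uint32_t stands in for Qt's uint. It writes the loop both as "31 * h + c" and in the "(h << 5) - h + c" form described in the comment; with unsigned 32-bit arithmetic the two give the same result, including on overflow.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Qt): DJB31XA over a byte buffer,
// mirroring the loop added in this commit.
static inline std::uint32_t djb31xa(const unsigned char *p, std::size_t len,
                                    std::uint32_t seed)
{
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < len; ++i)
        h = 31 * h + p[i];          // unsigned arithmetic wraps modulo 2^32
    return h;
}

// Same recurrence, with the multiplication by 31 spelled as shift-and-subtract.
static inline std::uint32_t djb31xa_shift(const unsigned char *p, std::size_t len,
                                          std::uint32_t seed)
{
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < len; ++i)
        h = (h << 5) - h + p[i];    // (h * 32) - h == h * 31
    return h;
}
```

With a seed of 0, hashing the three bytes "abc" gives 96354 (97 -> 31*97 + 98 = 3105 -> 31*3105 + 99 = 96354). A non-zero seed simply changes the starting value of h.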