New QHash implementation

A brand new QHash implementation using a faster and more memory efficient data structure than the old QHash.

Instead of the node-based approach of the old QHash, this implementation uses a two-stage lookup table. The buckets in the table are divided into spans of 128 entries. Inside each span, an array of chars indexes into a storage area for the span. The storage area for each span is a simple array that gets (re-)allocated in size increments of 16 items. This gives an average memory overhead of 8*sizeof(struct{ Key; Value; }) + 128*sizeof(char) + 16 for each span.

To give good performance and avoid too many collisions, the table keeps its load factor between .25 and .5 (it grows and rehashes if the load factor goes above .5).

This design keeps the memory overhead of the hash very small while still giving very good performance. The calculated overhead for a QHash<int, int> comes to 1.7-3.3 bytes per entry and to 2.2-4.3 bytes for a QHash<ptr, ptr>.

The new implementation also completely splits the QHash and QMultiHash classes.

One behavioral change to note is that the new QHash implementation does not provide stable references to nodes in the hash when the table needs to grow.

Benchmarking using https://github.com/Tessil/hash-table-shootout shows very nice performance compared to many different hash table implementations. The numbers below are for a hash<int64, int64> with 1 million entries. They scale nicely (mostly linearly, with some variation due to varying load factors) to smaller and larger tables. All numbers are in seconds, measured with gcc on Linux:

Hash table                random     random      random   random   reads    full
                          insertion  insertion   full     full     after    iteration
                                     (reserved)  deletes  reads    deletes
--------------------------------------------------------------------------------------
std::unordered_map        0.3842     0.1969      0.4511   0.1300   0.1169   0.0708
google::dense_hash_map    0.1091     0.0846      0.0550   0.0452   0.0754   0.0160
google::sparse_hash_map   0.2888     0.1582      0.0948   0.1020   0.1348   0.0112
tsl::sparse_map           0.1487     0.1013      0.0735   0.0448   0.0505   0.0042
old QHash                 0.2886     0.1798      0.5065   0.0840   0.0717   0.1387
new QHash                 0.0940     0.0714      0.1494   0.0579   0.0449   0.0146

Numbers for hash<std::string, int64>, with the string having 15 characters:

Hash table                random     random      random   random   reads
                          insertion  insertion   full     full     after
                                     (reserved)  deletes  reads    deletes
--------------------------------------------------------------------------
std::unordered_map        0.4993     0.2563      0.5515   0.2950   0.2153
google::dense_hash_map    0.2691     0.1870      0.1547   0.1125   0.1622
google::sparse_hash_map   0.6979     0.3304      0.1884   0.1822   0.2122
tsl::sparse_map           0.4066     0.2586      0.1929   0.1146   0.1095
old QHash                 0.3236     0.2064      0.5986   0.2115   0.1666
new QHash                 0.2119     0.1652      0.2390   0.1378   0.0965

Memory usage numbers (in MB for a table with 1M entries) also look very nice:

Hash table                Key:    int64    std::string (15 chars)
                          Value:  int64    int64
-----------------------------------------------------------------
std::unordered_map                44.63    75.35
google::dense_hash_map            32.32    80.60
google::sparse_hash_map           18.08    44.21
tsl::sparse_map                   20.44    45.93
old QHash                         53.95    69.16
new QHash                         23.23    51.32

Fixes: QTBUG-80311
Change-Id: I5679734144bc9bca2102acbe725fcc2fa89f0dff
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
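The two-stage lookup described in the commit message can be illustrated with a minimal sketch. It is not the code from this change: the names Span, SpanTable, NEntries, GrowBy and Unused are hypothetical, std::vector stands in for the hand-managed storage array, linear probing is an assumed collision strategy (the message does not name one), and growing/rehashing is left out, so the caller has to pick a bucket count that keeps the load factor at or below .5.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// Illustrative sketch only; all names here are hypothetical, not Qt's.
template <typename Key, typename Value>
struct Span {
    static constexpr std::size_t NEntries = 128;   // buckets per span
    static constexpr std::size_t GrowBy = 16;      // storage growth step
    static constexpr unsigned char Unused = 0xff;  // marks an empty bucket

    unsigned char offsets[NEntries];               // bucket -> index into storage
    std::vector<std::pair<Key, Value>> storage;    // densely packed entries

    Span() { std::memset(offsets, Unused, sizeof offsets); }

    bool hasEntry(std::size_t i) const { return offsets[i] != Unused; }
    std::pair<Key, Value> &at(std::size_t i) { return storage[offsets[i]]; }
    const std::pair<Key, Value> &at(std::size_t i) const { return storage[offsets[i]]; }

    void insert(std::size_t i, const Key &k, const Value &v) {
        if (storage.size() == storage.capacity())
            storage.reserve(storage.capacity() + GrowBy);  // grow in steps of 16
        offsets[i] = static_cast<unsigned char>(storage.size());
        storage.emplace_back(k, v);
    }
};

template <typename Key, typename Value>
class SpanTable {
    using SpanT = Span<Key, Value>;
public:
    // numBuckets must be a non-zero multiple of 128; the table never grows.
    explicit SpanTable(std::size_t numBuckets)
        : spans_(numBuckets / SpanT::NEntries), numBuckets_(numBuckets) {
        assert(numBuckets != 0 && numBuckets % SpanT::NEntries == 0);
    }

    std::size_t size() const { return size_; }

    void insert(const Key &k, const Value &v) {
        const std::size_t b = findBucket(k);
        SpanT &span = spans_[b / SpanT::NEntries];       // stage 1: pick the span
        const std::size_t i = b % SpanT::NEntries;       // stage 2: slot in the span
        if (span.hasEntry(i)) {                          // key exists: overwrite
            span.at(i).second = v;
            return;
        }
        assert((size_ + 1) * 2 <= numBuckets_);          // load factor stays <= .5
        span.insert(i, k, v);
        ++size_;
    }

    const Value *find(const Key &k) const {
        const std::size_t b = findBucket(k);
        const SpanT &span = spans_[b / SpanT::NEntries];
        const std::size_t i = b % SpanT::NEntries;
        return span.hasEntry(i) ? &span.at(i).second : nullptr;
    }

private:
    // Returns the bucket holding k, or the first empty bucket on its probe path.
    std::size_t findBucket(const Key &k) const {
        std::size_t b = std::hash<Key>{}(k) % numBuckets_;
        for (;;) {
            const SpanT &span = spans_[b / SpanT::NEntries];
            const std::size_t i = b % SpanT::NEntries;
            if (!span.hasEntry(i) || span.at(i).first == k)
                return b;
            b = (b + 1) % numBuckets_;                   // linear probing (assumption)
        }
    }

    std::vector<SpanT> spans_;
    std::size_t numBuckets_;
    std::size_t size_ = 0;
};
```

With a SpanTable<int, int> of 256 buckets, two keys that land on the same bucket end up in neighbouring slots, and find() returns a null pointer for a key that was never inserted. Because the entries live in a contiguous array that is reallocated as it fills, pointers obtained from find() are not stable across inserts; the commit message notes a similar loss of reference stability for the new QHash when the table grows.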
author: Lars Knoll <lars.knoll@qt.io> 2020-01-17 14:33:53 +0100
committer: Lars Knoll <lars.knoll@qt.io> 2020-04-09 20:02:55 +0200
commit: 5b7c3e31b538376f2b4733bd868b5875b504cdb3 (patch)
tree: e3e45f65f1bdc2db5dad3b25ec79bfe04320d9e6 /src/corelib/tools/qhash.cpp
parent: 926a0886d1961a3f384d3e6c36919e6dd8055dce (diff)
1 files changed, 0 insertions, 322 deletions
diff --git a/src/corelib/tools/qhash.cpp b/src/corelib/tools/qhash.cpp
index ca5b7ffd55..f048aa01de 100644
--- a/src/corelib/tools/qhash.cpp
+++ b/src/corelib/tools/qhash.cpp
@@ -405,328 +405,6 @@ uint qt_hash(QStringView key, uint chained) noexcept
     return h;
 }
 
-/*
-    The prime_deltas array contains the difference between a power
-    of two and the next prime number:
-
-    prime_deltas[i] = nextprime(2^i) - 2^i
-
-    Basically, it's sequence A092131 from OEIS, assuming:
-    - nextprime(1) = 1
-    - nextprime(2) = 2
-    and
-    - left-extending it for the offset 0 (A092131 starts at i=1)
-    - stopping the sequence at i = 28 (the table is big enough...)
-*/
-
-static const uchar prime_deltas[] = {
-    0,  0,  1,  3,  1,  5,  3,  3,  1,  9,  7,  5,  3, 17, 27,  3,
-    1, 29,  3, 21,  7, 17, 15,  9, 43, 35, 15,  0,  0,  0,  0,  0
-};
-
-/*
-    The primeForNumBits() function returns the prime associated to a
-    power of two. For example, primeForNumBits(8) returns 257.
-*/
-
-static inline int primeForNumBits(int numBits)
-{
-    return (1 << numBits) + prime_deltas[numBits];
-}
-
-/*
-    Returns the smallest integer n such that
-    primeForNumBits(n) >= hint.
-*/
-static int countBits(int hint)
-{
-    int numBits = 0;
-    int bits = hint;
-
-    while (bits > 1) {
-        bits >>= 1;
-        numBits++;
-    }
-
-    if (numBits >= (int)sizeof(prime_deltas)) {
-        numBits = sizeof(prime_deltas) - 1;
-    } else if (primeForNumBits(numBits) < hint) {
-        ++numBits;
-    }
-    return numBits;
-}
-
-/*
-    A QHash has initially around pow(2, MinNumBits) buckets. For
-    example, if MinNumBits is 4, it has 17 buckets.
-*/
-const int MinNumBits = 4;
-
-const QHashData QHashData::shared_null = {
-    nullptr, nullptr, Q_REFCOUNT_INITIALIZE_STATIC, 0, 0, MinNumBits, 0, 0, 0, true, false, 0
-};
-
-void *QHashData::allocateNode(int nodeAlign)
-{
-    void *ptr = strictAlignment ? qMallocAligned(nodeSize, nodeAlign) : malloc(nodeSize);
-    Q_CHECK_PTR(ptr);
-    return ptr;
-}
-
-void QHashData::freeNode(void *node)
-{
-    if (strictAlignment)
-        qFreeAligned(node);
-    else
-        free(node);
-}
-
-QHashData *QHashData::detach_helper(void (*node_duplicate)(Node *, void *),
-                                    void (*node_delete)(Node *),
-                                    int nodeSize,
-                                    int nodeAlign)
-{
-    union {
-        QHashData *d;
-        Node *e;
-    };
-    if (this == &shared_null)
-        qt_initialize_qhash_seed(); // may throw
-    d = new QHashData;
-    d->fakeNext = nullptr;
-    d->buckets = nullptr;
-    d->ref.initializeOwned();
-    d->size = size;
-    d->nodeSize = nodeSize;
-    d->userNumBits = userNumBits;
-    d->numBits = numBits;
-    d->numBuckets = numBuckets;
-    d->seed = (this == &shared_null) ? uint(qt_qhash_seed.loadRelaxed()) : seed;
-    d->sharable = true;
-    d->strictAlignment = nodeAlign > 8;
-    d->reserved = 0;
-
-    if (numBuckets) {
-        QT_TRY {
-            d->buckets = new Node *[numBuckets];
-        } QT_CATCH(...) {
-            // restore a consistent state for d
-            d->numBuckets = 0;
-            // roll back
-            d->free_helper(node_delete);
-            QT_RETHROW;
-        }
-
-        Node *this_e = reinterpret_cast<Node *>(this);
-        for (int i = 0; i < numBuckets; ++i) {
-            Node **nextNode = &d->buckets[i];
-            Node *oldNode = buckets[i];
-            while (oldNode != this_e) {
-                QT_TRY {
-                    Node *dup = static_cast<Node *>(allocateNode(nodeAlign));
-
-                    QT_TRY {
-                        node_duplicate(oldNode, dup);
-                    } QT_CATCH(...) {
-                        freeNode( dup );
-                        QT_RETHROW;
-                    }
-
-                    *nextNode = dup;
-                    nextNode = &dup->next;
-                    oldNode = oldNode->next;
-                } QT_CATCH(...) {
-                    // restore a consistent state for d
-                    *nextNode = e;
-                    d->numBuckets = i+1;
-                    // roll back
-                    d->free_helper(node_delete);
-                    QT_RETHROW;
-                }
-            }
-            *nextNode = e;
-        }
-    }
-    return d;
-}
-
-void QHashData::free_helper(void (*node_delete)(Node *))
-{
-    if (node_delete) {
-        Node *this_e = reinterpret_cast<Node *>(this);
-        Node **bucket = reinterpret_cast<Node **>(this->buckets);
-
-        int n = numBuckets;
-        while (n--) {
-            Node *cur = *bucket++;
-            while (cur != this_e) {
-                Node *next = cur->next;
-                node_delete(cur);
-                freeNode(cur);
-                cur = next;
-            }
-        }
-    }
-    delete [] buckets;
-    delete this;
-}
-
-QHashData::Node *QHashData::nextNode(Node *node)
-{
-    union {
-        Node *next;
-        Node *e;
-        QHashData *d;
-    };
-    next = node->next;
-    Q_ASSERT_X(next, "QHash", "Iterating beyond end()");
-    if (next->next)
-        return next;
-
-    int start = (node->h % d->numBuckets) + 1;
-    Node **bucket = d->buckets + start;
-    int n = d->numBuckets - start;
-    while (n--) {
-        if (*bucket != e)
-            return *bucket;
-        ++bucket;
-    }
-    return e;
-}
-
-QHashData::Node *QHashData::previousNode(Node *node)
-{
-    union {
-        Node *e;
-        QHashData *d;
-    };
-
-    e = node;
-    while (e->next)
-        e = e->next;
-
-    int start;
-    if (node == e)
-        start = d->numBuckets - 1;
-    else
-        start = node->h % d->numBuckets;
-
-    Node *sentinel = node;
-    Node **bucket = d->buckets + start;
-    while (start >= 0) {
-        if (*bucket != sentinel) {
-            Node *prev = *bucket;
-            while (prev->next != sentinel)
-                prev = prev->next;
-            return prev;
-        }
-
-        sentinel = e;
-        --bucket;
-        --start;
-    }
-    Q_ASSERT_X(start >= 0, "QHash", "Iterating backward beyond begin()");
-    return e;
-}
-
-/*
-    If hint is negative, -hint gives the approximate number of
-    buckets that should be used for the hash table. If hint is
-    nonnegative, (1 << hint) gives the approximate number
-    of buckets that should be used.
-*/
-void QHashData::rehash(int hint)
-{
-    if (hint < 0) {
-        hint = countBits(-hint);
-        if (hint < MinNumBits)
-            hint = MinNumBits;
-        userNumBits = hint;
-        while (primeForNumBits(hint) < (size >> 1))
-            ++hint;
-    } else if (hint < MinNumBits) {
-        hint = MinNumBits;
-    }
-
-    if (numBits != hint) {
-        Node *e = reinterpret_cast<Node *>(this);
-        Node **oldBuckets = buckets;
-        int oldNumBuckets = numBuckets;
-
-        int nb = primeForNumBits(hint);
-        buckets = new Node *[nb];
-        numBits = hint;
-        numBuckets = nb;
-        for (int i = 0; i < numBuckets; ++i)
-            buckets[i] = e;
-
-        for (int i = 0; i < oldNumBuckets; ++i) {
-            Node *firstNode = oldBuckets[i];
-            while (firstNode != e) {
-                uint h = firstNode->h;
-                Node *lastNode = firstNode;
-                while (lastNode->next != e && lastNode->next->h == h)
-                    lastNode = lastNode->next;
-
-                Node *afterLastNode = lastNode->next;
-                Node **beforeFirstNode = &buckets[h % numBuckets];
-                while (*beforeFirstNode != e)
-                    beforeFirstNode = &(*beforeFirstNode)->next;
-                lastNode->next = *beforeFirstNode;
-                *beforeFirstNode = firstNode;
-                firstNode = afterLastNode;
-            }
-        }
-        delete [] oldBuckets;
-    }
-}
-
-#ifdef QT_QHASH_DEBUG
-
-void QHashData::dump()
-{
-    qDebug("Hash data (ref = %d, size = %d, nodeSize = %d, userNumBits = %d, numBits = %d, numBuckets = %d)",
-            int(ref), size, nodeSize, userNumBits, numBits,
-            numBuckets);
-    qDebug("    %p (fakeNode = %p)", this, fakeNext);
-    for (int i = 0; i < numBuckets; ++i) {
-        Node *n = buckets[i];
-        if (n != reinterpret_cast<Node *>(this)) {
-            QString line = QString::asprintf("%d:", i);
-            while (n != reinterpret_cast<Node *>(this)) {
-                line += QString::asprintf(" -> [%p]", n);
-                if (!n) {
-                    line += " (CORRUPT)";
-                    break;
-                }
-                n = n->next;
-            }
-            qDebug("%ls", qUtf16Printable(line));
-        }
-    }
-}
-
-void QHashData::checkSanity()
-{
-    if (Q_UNLIKELY(fakeNext))
-        qFatal("Fake next isn't 0");
-
-    for (int i = 0; i < numBuckets; ++i) {
-        Node *n = buckets[i];
-        Node *p = n;
-        if (Q_UNLIKELY(!n))
-            qFatal("%d: Bucket entry is 0", i);
-        if (n != reinterpret_cast<Node *>(this)) {
-            while (n != reinterpret_cast<Node *>(this)) {
-                if (Q_UNLIKELY(!n->next))
-                    qFatal("%d: Next of %p is 0, should be %p", i, n, this);
-                n = n->next;
-            }
-        }
-    }
-}
-#endif
-
 /*!
     \fn template <typename T1, typename T2> uint qHash(const QPair<T1, T2> &key, uint seed = 0)
     \since 5.0