QHash: tame HasQHashSingleArgOverload ODR violations

qhashfunctions.h defines a catch-all 2-arguments qHash(T, seed) in order to support datatypes that implement a 1-argument overload of qHash (i.e. qHash(Type)). The catch-all calls the 1-argument overload and XORs the result with the seed. The catch-all is constrained on the existence of such a 1-argument overload. This is done in order to make the catch-all SFINAE-friendly; otherwise merely instantiating the catch-all would trigger a hard error. Such an error would make it impossible to build a type trait that detects if one can call qHash(T, size_t) for a given type T. The constraint itself is called HasQHashSingleArgOverload and lives in a private namespace. It has been observed that HasQHashSingleArgOverload misbehaves for some datatypes. For instance, HasQHashSingleArgOverload<int> is actually false, despite qHash(123) being perfectly callable. (The second argument of qHash(int, size_t) is defaulted, so the call *is* possible.) -- Why is HasQHashSingleArgOverload<int> false? This has to do with how HasQHashSingleArgOverload<T> is implemented: as a detection trait that checks if qHash(declval<T>()) is callable. The detection itself is not a problem. Consider this code: template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; class MyClass {}; size_t qHash(MyClass); static_assert(HasQHashSingleArgOverload<MyClass>); // OK Here, the static_assert passes, even if qHash(MyClass) (and MyClass itself) were not defined at all when HasQHashSingleArgOverload was defined. This is nothing but 2-phase lookup at work ([temp.dep.res]): the detection inside HasQHashSingleArgOverload takes into account the qHash overloads available when HasQHashSingleArgOverload was declared, as well as any other overload declared before the "point of instantiation". This means that qHash(MyClass) will be visible and detected. Let's try something slightly different: template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; size_t qHash(int); static_assert(HasQHashSingleArgOverload<int>); // ERROR This one *does not work*. How is it possible? The answer is that 2-phase name lookup combines the names found at definition time with the names _found at instantiation time using argument-dependent lookup only_. `int` is a fundamental type and does not participate in ADL. In the example, HasQHashSingleArgOverload has actually no qHash overloads to even consider, and therefore its detection fails. You can restore detection by moving the declaration of the qHash(int) overload *before* the definition of HasQHashSingleArgOverload, so it's captured at definition time: size_t qHash(int); template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; static_assert(HasQHashSingleArgOverload<int>); // OK! This is why HasQHashSingleArgOverload<int> is currently returning `false`: because HasQHashSingleArgOverload is defined *before* all the qHash(fundamental_type) overloads in qhashfunctions.h. -- Now consider this variation of the above, where we keep the qHash(int) overload after the detector (so, it's not found), but also prepend an Evil class implicitly convertible from int: struct Evil { Evil(int); }; size_t qHash(Evil); template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; size_t qHash(int); static_assert(HasQHashSingleArgOverload<int>); // OK Now the static_assert passes. HasQHashSingleArgOverload is still not considering qHash(int) (it's declared after), but it's considering qHash(Evil). Can you call *that* one with an int? Yes, after a conversion to Evil. This is extremely fragile and likely an ODR violation (if not ODR, then likely falls into [temp.dep.candidate/1]). -- Does this "really matter" for a type like `int`? The answer is no. If HasQHashSingleArgOverload<int> is true, then a call like qHash(42, 123uz); will have two overloads in its overloads set: 1) qHash(int, size_t) 2) qHash(T, size_t), i.e. the catch-all template. To be pedantic, qHash<int>(const int &, size_t), that is, the instantiation of the catch-all after template type deduction for T (= int) ([over.match.funcs.general/8]). Although it may look like this is ambiguous as both calls have perfect matches for the arguments, 1) is actually a better match than 2) because it is not a template specialization ([over.match.best/2.4]). In other words: qHash(int, size_t) is *always* called when the argument is `int`, no matter the value of HasQHashSingleArgOverload<int>. The catch-all template may be added or not to the overload set, but it's a worse match anyways. -- Now, let's consider this code: enum MyEnum { E1, E2, E3 }; qHash(E1, 42uz); This code compiles, although we do not define any qHash overload specifically for enumeration types (nor one is defined by MyEnum's author). Which qHash overload gets called? Again there are two possible overloads available: 1) qHash(int, size_t). E1 can be converted to `int` ([conv.prom/3]), and this overload selected. 2) qHash(T, size_t), which after instantiation, is qHash<MyEnum>(const MyEnum &, size_t). In this case, 2) is a better match than 1), because it does not require any conversion for the arguments. Is 2) a viable overload? Unfortunately the answer here is "it depends", because it's subject to what we've learned before: since the catch-all is constrained by the HasQHashSingleArgOverload trait, names introduced before the trait may exclude or include the overload. This code: #include <qhashfunctions.h> enum MyEnum { E1, E2, E3 }; qHash(E1, 42uz); static_assert(HasQHashSingleArgOverload<MyEnum>); // ERROR will fail the static_assert. This means that only qHash(int, size_t) is in the overload set. However, this code: struct Evil { Evil(int); }; size_t qHash(Evil); #include <qhashfunctions.h> enum MyEnum { E1, E2, E3 }; qHash(E1, 42uz); static_assert(HasQHashSingleArgOverload<MyEnum>); // OK will pass the static_assert. qHash(Evil) can be called with an object of type MyEnum after an user-defined conversion sequence ([over.best.ics.general], [over.ics.user]: a standard conversion sequence, made of a lvalue-to-rvalue conversion + a integral promotion, followed by a conversion by constructor [class.conv.ctor]). Therefore, HasQHashSingleArgOverload<MyEnum> is true here; the catch-all template is added to the overload set; and it's a best match for the qHash(E1, 42uz) call. -- Is this a problem? **Yes**, and a huge one: the catch-all template does not yield the same value as the qHash(int, size_t) overload. This means that calculating hash values (e.g. QHash, QSet) will have different results depending on include ordering! A translation unit TU1 may have #include <QSet> #include <Evil> QSet<MyEnum> calculateSet { /* ... */ } And another translation unit TU2 may have #include <Evil> #include <QSet> // different order void use() { QSet<MyEnum> set = calculateSet(); } And now the two TUs cannot exchange QHash/QSet objects as they would hash the contents differently. -- `Evil` actually exists in Qt. The bug report specifies QKeySequence, which has an implicit constructor from int, but one can concoct infinite other examples. -- Congratulations if you've read so far. ========================= === PROPOSED SOLUTION === ========================= 1) Move the HasQHashSingleArgOverload detection after declaring the overloads for all the fundamental types (which we already do anyways). This means that HasQHashSingleArgOverload<fundamental_type> will now be true. It also means that the catch-all becomes available for all fundamental types, but as discussed before, for all of them we have better matches anyways. 2) For unscoped enumeration types, this means however an ABI break: the catch-all template becomes always the best match. Code compiled before this change would call qHash(int, size_t), and code compiled after this change would call the catch-all qHash<Enum>(Enum, size_t); as discussed before, the two don't yield the same results, so mixing old code and new code will break. In order to restore the old behavior, add a qHash overload for enumeration types that forwards the implementation to the integer overloads (using qToUnderlying¹). (Here I'm considering the "old", correct behavior the one that one gets by simply including QHash/QSet, declaring an enumeration and calling qHash on it. In other words, without having Evil around before including QHash.) This avoids an ABI break for most enumeration types, for which one does not explicitly define a qHash overload. It however *introduces* an ABI break for enumeration types for which there is a single-argument qHash(E) overload. This is because - before this change, the catch-all template was called, and that in turn called qHash(E) and XOR'ed the result with the seed; - after this change, the newly introduced qHash overload for enumerations gets called. It's very likely that it would not give the same result as before. I don't have a solution for this, so we'll have to accept the ABI break. Note that if one defines a two-arguments overload for an enum type, then nothing changes there (the overload is still the best match). 3) Make plans to kill the catch-all template, for Qt 7.0 at the latest. We've asked users to provide a two-args qHash overload for a very long time, it's time to stop working around that. 4) Make plans to switch from overloading qHash to specializing std::hash (or equivalent). Specializations don't overload, and we'd get rid of all these troubles with implicit conversions. -- ¹ To nitpick, qToUnderlying may select a *different* overload than the one selected by an implicit conversion. That's because an unscoped enumeration without a fixed underlying type is allowed to have an underlying type U, and implicitly convert to V, with U and V being two different types (!). U is "an integral type that can represent all the enumerator values" ([dcl.enum/7]). V is selected in a specific list in a specific order ([conv.prom]/3). This means that in theory a compiler can take enum E { E1, E2 }, give it `unsigned long long` as underlying type, and still allow for a conversion to `int`. As far as I know, no compiler we use does something as crazy as that, but if it's a concern, it needs to be fixed. [ChangeLog][Deprecation Notice] Support for overloads of qHash with only one argument is going to be removed in Qt 7. Users are encouraged to upgrade to the two-arguments overload. Please refer to the QHash documentation for more information. [ChangeLog][Potentially Binary-Incompatible Changes] If an enumeration type for which a single-argument qHash overload has been declared is being used as a key type in QHash, QMultiHash or QSet, then objects of these types are no longer binary compatible with code compiled against an earlier version of Qt. It is very unlikely that such qHash overloads exist, because enumeration types work out of the box as keys Qt unordered associative containers; users do not need to define qHash overloads for their custom enumerations. Note that there is no binary incompatibity if a *two* arguments qHash overload has been declared instead. Fixes: QTBUG-108032 Fixes: QTBUG-107033 Pick-to: 6.2 6.4 Change-Id: I2ebffb2820c553e5fdc3a341019433793a58e3ab Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io> Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
author: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> 2022-10-28 14:06:12 +0200
committer: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> 2022-11-01 01:52:13 +0200
commit: fb4bc5fa262336504e0f28603658bb2572796ce5 (patch)
tree: 3959de8c1c981dd8a8c5945f5c25ec704f047a93 /src/corelib/doc/snippets/code
parent: aa05d7307643efea482e155921e1211d73b3e9bb (diff)
1 files changed, 3 insertions, 3 deletions
diff --git a/src/corelib/doc/snippets/code/src_corelib_tools_qhash.cpp b/src/corelib/doc/snippets/code/src_corelib_tools_qhash.cpp
index f1be49c1d4..b7e56c5ec3 100644
--- a/src/corelib/doc/snippets/code/src_corelib_tools_qhash.cpp
+++ b/src/corelib/doc/snippets/code/src_corelib_tools_qhash.cpp
@@ -297,11 +297,11 @@ inline size_t qHash(const std::unordered_set<int> &key, size_t seed = 0)
 //! [31]
 
 //! [32]
-size_t qHash(K key);
-size_t qHash(const K &key);
-
 size_t qHash(K key, size_t seed);
 size_t qHash(const K &key, size_t seed);
+
+size_t qHash(K key);        // deprecated, do not use
+size_t qHash(const K &key); // deprecated, do not use
 //! [32]
 
 //! [33]
author	Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>	2022-10-28 14:06:12 +0200
committer	Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>	2022-11-01 01:52:13 +0200
commit	fb4bc5fa262336504e0f28603658bb2572796ce5 (patch)
tree	3959de8c1c981dd8a8c5945f5c25ec704f047a93 /src/corelib/doc/snippets/code
parent	aa05d7307643efea482e155921e1211d73b3e9bb (diff)