From ae82e039eb74b092604910a1f3f7fa4887397cbc Mon Sep 17 00:00:00 2001
From: Christian Kandeler <christian.kandeler@qt.io>
Date: Mon, 10 Jul 2023 15:19:29 +0200
Subject: Loader: Resolve products in parallel

The resolveProduct() function is now executed for several products
simultaneously, with the (relatively few) accesses to common resources
guarded by mutexes.

Using Qt Creator as a mid-to-large-sized test project, we see the
following changes in the time it takes to resolve the project on some
example machines:
  - Linux (36 cores): 10.5s -> 4.8s
  - Linux (8 cores): 17s -> 6.5s
  - macOS (6 cores): 41s -> 16s
  - Windows (8 cores): 20s -> 9s
Unsurprisingly, the speed-up does not scale with the number of
processors, as there are typically lots of inter-product dependencies
and some expensive resources such as Probes are shared globally.
However, we do see a factor of two to three across all the hardware and
OS configurations, which is a good practical result for users.
Note that running with -j1, i.e. forcing the use of only a single core,
takes the same amount of time everywhere as it did without the patch, so
there is no scheduling overhead in the single-core case.

The results of our benchmarker tool look interesting. Here they are for
qbs and Qt Creator, respectively:
========== Performance data for Resolving ========== (qbs)
    Old instruction count: 9121688266
    New instruction count: 15736125513
    Relative change: +72 %
    Old peak memory usage: 84155384 Bytes
    New peak memory usage: 187776736 Bytes
    Relative change: +123 %
========== Performance data for Resolving ========== (QtC)
    Old instruction count: 59901017190
    New instruction count: 65227937765
    Relative change: +8 %
    Old peak memory usage: 621560008 Bytes
    New peak memory usage: 761732040 Bytes
    Relative change: +22 %
The increased peak memory usage is to be expected, as there are now
several JS engines running in parallel. The instruction count increase
is likely due to a higher number of deferrals. Importantly, the relative
increase shrinks considerably with project size (+72 % for qbs versus
+8 % for Qt Creator), so it does not seem that the parallelism hides a
serious per-thread slowdown.

Change-Id: Ib4d9ca9aa0687c1056ff82f9805b565cc5a35894
Reviewed-by: Ivan Komissarov <ABBAPOH@gmail.com>
---
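Note on the m_canceling change in this file: once products are resolved
on several threads, a cancel flag may be written by one thread while
another polls it, and with a plain bool that is a data race in C++. The
following is a minimal sketch of the pattern, not the actual ScriptEngine
code. The Engine class, run() and runAndCancel() are hypothetical names
made up for illustration; only the std::atomic_bool member mirrors the
patch.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for an engine; only the flag type mirrors the patch.
class Engine {
public:
    // May be called from a thread other than the one executing run().
    void cancel() { m_canceling = true; }
    bool isCanceling() const { return m_canceling; }

    // Performs up to maxSteps units of simulated work, polling the
    // cancel flag before each step. Returns the number of steps done.
    int run(int maxSteps) {
        int done = 0;
        while (done < maxSteps && !m_canceling) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ++done;
        }
        return done;
    }

private:
    // Atomic, so concurrent cancel() and run() calls do not race.
    std::atomic_bool m_canceling{false};
};

// Starts a worker thread, cancels it from the calling thread and
// returns how many steps the worker completed before it stopped.
int runAndCancel(int maxSteps) {
    Engine engine;
    int steps = 0;
    std::thread worker([&] { steps = engine.run(maxSteps); });
    engine.cancel();
    worker.join(); // join() makes the write to steps visible here
    return steps;
}
```

The default sequentially consistent ordering is used here for simplicity;
whether a weaker memory order would suffice depends on what else the
flag is meant to synchronize, which this sketch does not model.
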
 src/lib/corelib/language/scriptengine.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'src/lib/corelib/language/scriptengine.h')
diff --git a/src/lib/corelib/language/scriptengine.h b/src/lib/corelib/language/scriptengine.h
index 942b7519c..4a55392e3 100644
--- a/src/lib/corelib/language/scriptengine.h
+++ b/src/lib/corelib/language/scriptengine.h
@@ -58,6 +58,7 @@
 #include <QtCore/qprocess.h>
 #include <QtCore/qstring.h>
 
+#include <atomic>
 #include <functional>
 #include <memory>
 #include <mutex>
@@ -360,7 +361,7 @@ private:
     std::unordered_map<QString, JSValue> m_jsFileCache;
     bool m_propertyCacheEnabled = true;
     bool m_active = false;
-    bool m_canceling = false;
+    std::atomic_bool m_canceling = false;
     QHash<PropertyCacheKey, QVariant> m_propertyCache;
     PropertySet m_propertiesRequestedInScript;
     QHash<QString, PropertySet> m_propertiesRequestedFromArtifact;
-- 
cgit v1.2.3