qt-creator/perfparser.git - Parser for perf data files, creates output suitable for the QtCreator plugin

	Commit message	Author	Age	Files	Lines
...
* \|	Fix qbs build	Christian Kandeler	2017-05-03	1	-6/+2
	Change-Id: I4d71a668ae6c2a24f568e7ca73170d9fc3fe677e Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Drop the bundled elfutils	Ulf Hermann	2017-05-02	665	-89322/+10
	We are going to provide elfutils separately. Change-Id: Ib78b78bf4d11d7921ae5f53a1d1dfa2a1aab3e53 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Use open, close, free, cxa_demangle from elfutils	Ulf Hermann	2017-04-28	4	-15/+43
	We cannot handle resources passed to and from elfutils with a different C library. Change-Id: I47e789b016d13c249d82a7bd1091cd5fb769ce9d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Allow specification of elfutils install dir	Ulf Hermann	2017-04-28	3	-18/+56
	Link against elfutils in that directory, and copy files from there in the deploy step. Also, drop the Windows-specific library naming dance, as we are never going to build the bundled elfutils on Windows. Change-Id: Ia1dd2583856918b2c2623016f6ed7a80c0c7ef07 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Use platform-specific path conventions	Ulf Hermann	2017-04-28	3	-17/+21
	On Windows we want backslashes as directory separators and semicolons as path separators. Change-Id: I4feaf4864ddd5c1ddaf7d60a5e8f2de3319af8ef Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
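The convention described in this commit can be sketched in standard C++. This is a minimal illustration, not the patch itself: `toNativeSeparators`, `joinSearchPaths` and the `windowsHost` flag are hypothetical names introduced here, and the real change lives in the project's build files and Qt code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Convert one path to the host convention: '/' becomes '\' on Windows,
// the path is returned unchanged elsewhere.
inline std::string toNativeSeparators(std::string path, bool windowsHost)
{
    for (char &c : path) {
        if (windowsHost && c == '/')
            c = '\\';
    }
    return path;
}

// Join several directories into one search-path string,
// using ';' as the list separator on Windows and ':' elsewhere.
inline std::string joinSearchPaths(const std::vector<std::string> &dirs, bool windowsHost)
{
    const char listSeparator = windowsHost ? ';' : ':';
    std::string joined;
    for (const std::string &dir : dirs) {
        if (!joined.empty())
            joined += listSeparator;
        joined += toNativeSeparators(dir, windowsHost);
    }
    return joined;
}
```

Joining with a separate list separator, rather than replacing characters in an already joined string, avoids mangling the colon in a drive prefix such as `C:`.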
* \|	On memoryRead, copy only the word width	Ulf Hermann	2017-04-26	3	-8/+42
	If Dwarf_Word is bigger than the target platform's word width, copying a full Dwarf_Word gives us bogus values. The values we read are implicitly used as pointers and register values, and the predefined memory_read implementations in elfutils do respect the word width. Change-Id: Idbbb76abc72a9b4bacc075b431fa0c854a54fc2e Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
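To illustrate why the copy width matters, here is a minimal sketch in standard C++. `Word` is a 64-bit stand-in for Dwarf_Word and `readTargetWord` is a hypothetical helper. The sketch assumes little-endian target data and is not the actual memoryRead callback.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit stand-in for elfutils' Dwarf_Word (assumption for this sketch).
using Word = std::uint64_t;

// Read one target word of `wordWidth` bytes (4 or 8) from little-endian
// target memory. Bytes beyond the word width are never touched, so the
// upper half of the result stays zero for a 32-bit target.
inline Word readTargetWord(const unsigned char *src, unsigned wordWidth)
{
    assert(wordWidth == 4 || wordWidth == 8);
    Word result = 0;
    for (unsigned i = 0; i < wordWidth; ++i)
        result |= static_cast<Word>(src[i]) << (8 * i);
    return result;
}
```

Reading 8 bytes for a 32-bit target would pull the neighbouring stack slot into the upper half of the value, which is the kind of bogus pointer the commit message describes.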
* \|	Merge remote-tracking branch 'origin/4.3'	Eike Ziller	2017-04-18	4	-17/+48
	Conflicts: app/perfsymboltable.cpp app/perfunwind.cpp Change-Id: If343bb33fabeb60a3eab566769cf2c4dda88fcc5
\| *	Output a sensible version on --version	Ulf Hermann	2017-04-13	1	-1/+1
	Yes, this has to be updated on every release, but I don't see a better way right now. Change-Id: Ie81849c75c4e3e55cc0265e66ab01ab60d6d2778 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
\| *	Retry unwinding when the symbol cache is dirty	Milian Wolff	2017-04-13	3	-16/+45
	mmap()'d binaries can overlap in the profiled application's address space. For example ld.so usually reserves a huge chunk of address space for itself, which subsequently gets filled with other libraries. However, libdw won't reliably accept binaries reported "on top of" existing modules. In seemingly random cases it will complain that the modules overlap and reject the new one. So, when we fail to report a module to Dwfl, we mark the symbol cache as dirty and retry unwinding at most once after clearing the cache. Clearing the cache resets libdw's state and allows us to report the new module first, and unwind symbols from it. Change-Id: Idb5d85afb39e05c0439206b8d4938b79b6173b2c Reviewed-by: Ulf Hermann <ulf.hermann@qt.io> Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
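The "retry at most once after clearing the cache" control flow can be sketched as follows. `SymbolCache` and `unwindWithRetry` are hypothetical names for this illustration. The callback stands in for one unwinding pass that may mark the cache dirty when a module is rejected. No libdw calls are made.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the per-process symbol cache.
struct SymbolCache {
    bool dirty = false;
    int clears = 0;
    void clear() { dirty = false; ++clears; }
};

// Run one unwinding pass. If that pass left the cache dirty, clear the cache
// and run exactly one more pass. Returns the number of passes performed.
inline int unwindWithRetry(SymbolCache &cache,
                           const std::function<void(SymbolCache &)> &unwindOnce)
{
    int attempts = 0;
    do {
        if (cache.dirty)
            cache.clear();
        unwindOnce(cache);
        ++attempts;
    } while (cache.dirty && attempts < 2);
    return attempts;
}
```

Capping the loop at two passes keeps a persistently rejected module from causing endless retries.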
* \|	Do not wait indefinitely when parsing empty/broken perf.data file	Milian Wolff	2017-04-13	1	-1/+21
	So far, perfparser hangs for cases like the following:
		echo -n > perf.data.empty
		./perfparser --input perf.data.empty
	Similarly, we can hang forever when we parse a file that only contains a perf header but nothing else. This patch adds checks on the file size for non-sequential inputs (i.e. files) which will trigger an early return when the input data is broken. Change-Id: I9c22010dd3628ef65e52a785e36c928445633570 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
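One plausible shape for such a size check is sketched below. It is an assumption for illustration only: `InputCheck`, `checkFileSize` and its parameters are hypothetical, and the actual patch may compare different quantities.

```cpp
#include <cassert>
#include <cstdint>

enum class InputCheck { Ok, MissingHeader, TruncatedData };

// All values are in bytes. `headerSize` is the size of the fixed file header;
// `dataOffset` and `dataSize` describe the data section that header announces.
// Only meaningful for regular files, where the total size is known up front.
inline InputCheck checkFileSize(std::uint64_t fileSize, std::uint64_t headerSize,
                                std::uint64_t dataOffset, std::uint64_t dataSize)
{
    if (fileSize < headerSize)
        return InputCheck::MissingHeader;   // e.g. an empty file
    if (dataOffset > fileSize || dataSize > fileSize - dataOffset)
        return InputCheck::TruncatedData;   // header promises more than the file holds
    return InputCheck::Ok;
}
```

Returning early on anything other than `Ok` avoids waiting for bytes that a regular file will never deliver.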
* \|	Skip symbol demangling on non-unix platforms	Ulf Hermann	2017-04-13	1	-0/+4
	The host's name mangling will likely differ from the target's anyway. Change-Id: Iea8672c6697b9526a48dd951973fdbc9c1dae04d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Drop the begin/end decls workaround	Ulf Hermann	2017-04-13	1	-6/+0
	Change-Id: I6b4bc432de56d6d068ac0b90ac356bd7783a30c7 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Fix some compile errors on windows	Ulf Hermann	2017-04-13	3	-9/+10
	Change-Id: Ie8073c6f32cd0184ab666ced9d10cf48e59f11c3 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Include io.h for setmode on windows	Ulf Hermann	2017-04-13	1	-0/+3
	Change-Id: Id993204f2a0be67edf5d29a9400fb71d63774887 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Support the old ~/.debug directory format	Milian Wolff	2017-04-13	1	-2/+15
	Only newer perf tools will write the debug file to the location `~/.debug/<buildid>/elf`. Older tools instead will write the debug file directly to a file called `~/.debug/<buildid>`. Change-Id: I4d7e24e5774c2d6888cf74a51ec40275647da8f9 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
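The two layouts named in this commit message can be probed in order, as in the sketch below. `resolveDebugFile` and the `isFile` predicate are hypothetical. The predicate stands in for a real file-system check, so the example stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Return the debug file for `buildId` below `debugDir`, trying the newer
// layout (<buildid>/elf) first and the older one (<buildid> as a plain file)
// second. Returns an empty string when neither exists.
inline std::string resolveDebugFile(const std::string &debugDir, const std::string &buildId,
                                    const std::function<bool(const std::string &)> &isFile)
{
    const std::string base = debugDir + "/" + buildId;
    if (isFile(base + "/elf"))
        return base + "/elf";   // newer perf tools
    if (isFile(base))
        return base;            // older perf tools
    return std::string();
}
```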
* \|	Fix compile warning	Milian Wolff	2017-04-13	1	-1/+1
	app/perfelfmap.cpp:111:21: warning: converting to ‘PerfElfMap::ElfInfo’ from initializer list would use explicit constructor ‘PerfElfMap::ElfInfo::ElfInfo(const QFileInfo&, quint64, quint64, quint64, const QByteArray&)’
		return {};
	Introduced by my recent change to make the ElfInfo constructor explicit. Sorry, I did not notice it before. Change-Id: Ib7caedd047f16c98bafc079b92b37543db925cc1 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Send progress messages when parsing perf.data files	Milian Wolff	2017-04-13	3	-2/+23
	In these cases, we know the expected file size and can thus compute our current progress value. We do this up to 100 times and send the progress value as a normalized float percent, i.e. a value between 0 and 1. This is a helpful feature, as large data files take a long time to parse. Showing the user that we make some progress is a good thing. Change-Id: Icb0c9564e06173a526b726e93d75d4f5b7e8949d Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
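A minimal sketch of throttled progress reporting along these lines: at most 100 messages per file, each carrying a value between 0 and 1. `ProgressReporter` and its interface are hypothetical, and how perfparser actually encodes and sends the message is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Emits a new progress value only when the read position crosses the next
// 1% boundary, so at most 100 messages are produced per file.
class ProgressReporter
{
public:
    explicit ProgressReporter(std::uint64_t totalBytes) : m_total(totalBytes)
    {
        assert(totalBytes > 0);
    }

    // Returns true and sets `progress` (0 < progress <= 1) when a message
    // should be sent. Note: bytesRead * 100 can overflow for inputs close to
    // 2^64 / 100 bytes; that is ignored in this sketch.
    bool update(std::uint64_t bytesRead, float &progress)
    {
        const std::uint64_t step = std::min<std::uint64_t>(100, bytesRead * 100 / m_total);
        if (step <= m_lastStep)
            return false;
        m_lastStep = step;
        progress = static_cast<float>(step) / 100.0f;
        return true;
    }

private:
    std::uint64_t m_total;
    std::uint64_t m_lastStep = 0;
};
```

Tracking the last reported step instead of sending on every read keeps the output stream small even for very large perf.data files.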
* \|	Make ElfInfo constructor explicit	Milian Wolff	2017-04-13	1	-2/+3
	Otherwise a statement like qDebug() << QFileInfo() would pick up this constructor and lead to confusing debug output. Change-Id: Idb9692bd36983b055409cb347e3175aaf5d75eda Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
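The effect of `explicit` here can be shown with standard type traits. In this sketch `FileInfo` is a plain stand-in for QFileInfo and the `ElfInfo` shown is a simplified, hypothetical version of the real class: a constructor whose arguments all have defaults acts as a converting constructor unless it is marked explicit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Stand-in for QFileInfo (assumption for this sketch).
struct FileInfo {
    std::string path;
};

// Simplified ElfInfo: without `explicit`, any FileInfo would silently
// convert to an ElfInfo wherever an ElfInfo overload is available.
struct ElfInfo {
    explicit ElfInfo(const FileInfo &file = FileInfo(), std::uint64_t addr = 0,
                     std::uint64_t length = 0)
        : file(file), addr(addr), length(length)
    {
    }

    FileInfo file;
    std::uint64_t addr;
    std::uint64_t length;
};

// Implicit conversion is rejected, direct construction still works.
constexpr bool implicitFromFileInfo = std::is_convertible<FileInfo, ElfInfo>::value;
constexpr bool explicitFromFileInfo = std::is_constructible<ElfInfo, FileInfo>::value;
```

The same rule explains the follow-up "Fix compile warning" commit: `return {};` is copy-list-initialization, which may not use an explicit constructor, so the object has to be spelled out, for example as `return ElfInfo();`.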
* \|	Return early when reading the file attributes or features failed	Milian Wolff	2017-04-13	1	-2/+10
	When this happens, it is an indicator for a broken file and we don't even need to go in and parse its contents. Change-Id: If96e0b1e9fed2cb1069b6d3f4bfc03193321c132 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Report an error when the kallsyms file could not be parsed	Milian Wolff	2017-04-12	5	-10/+68
	When a wrong kallsyms path is given, or the file could not be parsed correctly, e.g. when it was empty, then we want to notify the user about this. Otherwise, it may not be clear why symbol resolution for kernel addresses is broken. Change-Id: Icf51fa3038810e69a91d332a33495e7678b3977a Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Do not set app nor extra library paths by default	Milian Wolff	2017-04-12	2	-7/+10
	By default, both paths pointed to the current working directory. When I use perfparser in a folder that has many (sub)directories and files, it is excruciatingly slow:
		145338.874552 task-clock (msec) # 0.997 CPUs utilized
		13,353 context-switches # 0.092 K/sec
		147 cpu-migrations # 0.001 K/sec
		4,497 page-faults # 0.031 K/sec
		557,953,349,806 cycles # 3.839 GHz
		1,009,672,374,742 instructions # 1.81 insn per cycle
		238,669,106,565 branches # 1642.156 M/sec
		3,017,437,636 branch-misses # 1.26% of all branches
		145.823501862 seconds time elapsed
	This is on an SSD with an ext4 file system, but going through 104164 files for every mmap event is simply going to take its time. This patch improves this situation by dropping the implicit recursive lookup in the current working directory. The performance impact is tremendous:
		4425.928440 task-clock (msec) # 0.999 CPUs utilized
		158 context-switches # 0.036 K/sec
		1 cpu-migrations # 0.000 K/sec
		3,299 page-faults # 0.745 K/sec
		17,042,783,950 cycles # 3.851 GHz
		36,178,866,218 instructions # 2.12 insn per cycle
		8,448,978,802 branches # 1908.973 M/sec
		63,738,579 branch-misses # 0.75% of all branches
		4.432039578 seconds time elapsed
	I.e. this patch makes this case about 33 times faster (145.8 s vs. 4.4 s elapsed). Change-Id: I9a2c4e84ed739e1fc602be675bd01369b1c39f4c Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Report an error when an ELF file could not be found	Milian Wolff	2017-04-12	2	-6/+16
	When we fail to find a matching ELF file for an mmap event, the usefulness of the perfparser can be severely impacted:
	- unwinding will not work
	- symbol resolution will not work
	- potentially other things will not work
	As such, report an error to the user when this occurs. Change-Id: I8a47f8725a29684ac11b24dadb20e669a45d3016 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Merge remote-tracking branch 'origin/4.3'	Ulf Hermann	2017-04-05	1	-4/+14
	Conflicts: app/perfsymboltable.cpp Change-Id: I66e3a8aa490628246a507769daa32d69ec7b4bd3
\| *	Do report overlapping modules from lookupFrame()	Ulf Hermann	2017-04-05	1	-2/+14
	We always get many modules that overlap ld.so. We have to report them when we find them, or otherwise the unwinding will fail. dwfl apparently doesn't mind the overlap in this case, so we don't have to clear the cache beforehand. Change-Id: I68e9f6fe1653073b555755f546e743621e8c7919 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
\| *	Continue path search if the file we are looking for is not a directory	Ulf Hermann	2017-04-05	1	-2/+0
	We want to check the parent directory of the file, but that is already implicit in the entryList() call, which will just return an empty list if it doesn't exist. Change-Id: I087ed4fdd6db66e6c02d8604af219c68b5280af7 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Fix qbs build	Christian Kandeler	2017-04-03	1	-1/+1
	Adaptation to ade8449ea2. Change-Id: Ic277a584140278905066194feaa4c8188c581c09 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Fix build with namespaced Qt	Christian Kandeler	2017-04-03	2	-0/+5
	Change-Id: I4e9114cc9f9adb0eb7a46dc30a9cbbda4c6dacda Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Avoid passing QMAKE_ARCH to config.h	Ulf Hermann	2017-03-29	2	-12/+6
	We don't need qmake to determine sizeof(long). Change-Id: Iced95d685b4c82fb3925bb164691203501e395d9 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Build correctly with debug_and_release or non-standard tools	Ulf Hermann	2017-03-29	12	-16/+32
	On some platforms, make is not called "make", and we want to explicitly place the intermediate artifacts in OUT_PWD so that the next step can find them without digging through the "Debug" or "Release" folders. Change-Id: I4f9139b471030a57b7cab374cf0fe360be633b02 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \|	Include sample period and weight in parsed output	Milian Wolff	2017-03-29	3	-2/+6
	The sample period equals the performance counter value since the last sample, and is also emitted by, e.g., `perf script`. The sample weight is important for client applications to correctly attribute the sampling cost. I.e. it is not enough to just count the total number of samples; rather, one must use the weighted number of samples. Change-Id: I052ae25dcca972320ca8601b3d821398c08401ad Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Make maximum number of frames configurable via a CLI argument	Milian Wolff	2017-03-29	3	-6/+25
	The default stays at 64, but it is now possible to unwind more frames if desired. Change-Id: I8da2ea340bf97678b2bbbd495b4864da0cf0fddc Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Try to find elfs in debug path, when build id is available	Milian Wolff	2017-03-29	7	-33/+72
	This enables us to parse perf.data files when the corresponding elf objects do not exist anymore in the version that was used at perf record time. This also enables us to use `perf archive` to evaluate perf.data files with perfparser on different machines. Change-Id: Id7ac1af125dd3818dc86880f25a0f74d8d09bfc1 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Pass an explicit Elf* to deduce the target architecture	Milian Wolff	2017-03-29	4	-34/+23
	Instead of reporting a random elf map, open the first elf manually and pass that to dwfl_attach_state. This will then be used to deduce the target architecture and thus gives us more control over what is going on, without influencing the actual mappings. A perf record for a runtime-attached profile could potentially contain mmap events for files first. This happened when attaching to KDevelop, which opens files internally using mmap. In such cases, we would try to guess the architecture from a text file containing code in text form, instead of a binary ELF, which of course did not work. Instead, validate the ELF we pass to dwfl_attach_state to guess the architecture via elf_kind. If that returns ELF_K_NONE, the file is not a valid ELF object and can thus not be used to guess the architecture. We simply silently skip this and try with the next elf file we encounter, until it hopefully works. Change-Id: I00ec7fa1da669c4b5ed9156654818b64bdf050ef Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Completely remove time handling from the PerfElfMap	Milian Wolff	2017-03-29	7	-114/+78
	Now that we sort the mmap and sample events by time, we can assume that we will only look for mappings based on the current time, i.e. the time of the last added elf mapping. This simplifies the code a bit further and would allow us to optimize it later, if need be. Note that this theoretically breaks the handling of samples that violate the time ordering across FINISHED_ROUND events. This was broken earlier already when we removed the overwritten elf mappings. And note how this does not pose any real-world issues in my tests. Change-Id: I24f14afdf17cf5d4f7dcb5440dc04d02f591fcb8 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Remove cached QFileInfo::isFile() from PerfElfMap::ElfInfo	Milian Wolff	2017-03-29	3	-9/+13
	This cache is not necessary, as the information is cached within QFileInfo already. Removing it decreases the ElfInfo size by 8 bytes. Now that we store the mappings in a vector, this actually improves the performance more than caching the isFile() value again. Change-Id: I6bf2cc7a165f3a00d4e42dcd0922d126c40987fa Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Remove overwritten elf mappings from the PerfElfMap	Milian Wolff	2017-03-29	5	-52/+48
	Now that we handle the elf map events in a time-ordered fashion, we can greatly optimize the map by removing outdated information. This happens extremely often in the real world, when the heap map grows over time as the perf client application allocates a lot of memory. Note that this patch could potentially result in no mapping getting returned for buggy samples that violate the time order across FINISHED_ROUND events. This patch dramatically improves the performance of perfparser for real-world applications for me.
Before, I measured the following numbers for perfparser: Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.normal --output /dev/null': 73063.166128 task-clock:u (msec) # 0.998 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 29,923 page-faults:u # 0.410 K/sec 223,352,922,905 cycles:u # 3.057 GHz 138,541,646,788 instructions:u # 0.62 insn per cycle 39,581,056,564 branches:u # 541.737 M/sec 242,935,966 branch-misses:u # 0.61% of all branches 73.242094659 seconds time elapsed Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.stream --output /dev/null': 137772.664268 task-clock:u (msec) # 0.999 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 28,393 page-faults:u # 0.206 K/sec 398,086,328,511 cycles:u # 2.889 GHz 164,041,587,534 instructions:u # 0.41 insn per cycle 51,009,555,879 branches:u # 370.244 M/sec 350,270,354 branch-misses:u # 0.69% of all branches 137.855666379 seconds time elapsed Now, this goes down to: Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.normal --output /dev/null': 9253.921384 task-clock:u (msec) # 0.999 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 29,627 page-faults:u # 0.003 M/sec 28,099,608,231 cycles:u # 3.037 GHz 73,635,879,583 instructions:u # 2.62 insn per cycle 17,114,401,461 branches:u # 1849.422 M/sec 96,904,616 branch-misses:u # 0.57% of all branches 9.266196437 seconds time elapsed Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.stream --output /dev/null': 8331.098618 task-clock:u (msec) # 0.999 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 26,379 page-faults:u # 0.003 M/sec 25,206,589,319 cycles:u # 3.026 GHz 65,041,985,552 instructions:u # 2.58 insn per cycle 15,106,429,469 branches:u # 
1813.258 M/sec 82,649,273 branch-misses:u # 0.55% of all branches 8.343114295 seconds time elapsed Similarly, the benchmarks also dramatically improve. Before I measured: ******* Start testing of TestElfMap ***** PASS : TestElfMap::benchRegisterElfExpanding(10) RESULT : TestElfMap::benchRegisterElfExpanding():"10": 0.0018 msecs per iteration (total: 59, iterations: 32768) PASS : TestElfMap::benchRegisterElfExpanding(100) RESULT : TestElfMap::benchRegisterElfExpanding():"100": 0.024 msecs per iteration (total: 51, iterations: 2048) PASS : TestElfMap::benchRegisterElfExpanding(1000) RESULT : TestElfMap::benchRegisterElfExpanding():"1000": 1.3 msecs per iteration (total: 85, iterations: 64) PASS : TestElfMap::benchRegisterElfExpanding(2000) RESULT : TestElfMap::benchRegisterElfExpanding():"2000": 5.6 msecs per iteration (total: 91, iterations: 16) PASS : TestElfMap::benchFindElfDisjunct(10) RESULT : TestElfMap::benchFindElfDisjunct():"10": 0.0028 msecs per iteration (total: 92, iterations: 32768) PASS : TestElfMap::benchFindElfDisjunct(100) RESULT : TestElfMap::benchFindElfDisjunct():"100": 0.031 msecs per iteration (total: 64, iterations: 2048) PASS : TestElfMap::benchFindElfDisjunct(1000) RESULT : TestElfMap::benchFindElfDisjunct():"1000": 0.38 msecs per iteration (total: 98, iterations: 256) PASS : TestElfMap::benchFindElfDisjunct(2000) RESULT : TestElfMap::benchFindElfDisjunct():"2000": 0.789 msecs per iteration (total: 101, iterations: 128) PASS : TestElfMap::benchFindElfOverlapping(10) RESULT : TestElfMap::benchFindElfOverlapping():"10": 0.0029 msecs per iteration (total: 98, iterations: 32768) PASS : TestElfMap::benchFindElfOverlapping(100) RESULT : TestElfMap::benchFindElfOverlapping():"100": 0.035 msecs per iteration (total: 72, iterations: 2048) PASS : TestElfMap::benchFindElfOverlapping(1000) RESULT : TestElfMap::benchFindElfOverlapping():"1000": 0.40 msecs per iteration (total: 52, iterations: 128) PASS : TestElfMap::benchFindElfOverlapping(2000) RESULT : 
TestElfMap::benchFindElfOverlapping():"2000": 0.82 msecs per iteration (total: 53, iterations: 64) PASS : TestElfMap::benchFindElfExpanding(10) RESULT : TestElfMap::benchFindElfExpanding():"10": 0.0034 msecs per iteration (total: 57, iterations: 16384) PASS : TestElfMap::benchFindElfExpanding(100) RESULT : TestElfMap::benchFindElfExpanding():"100": 0.11 msecs per iteration (total: 59, iterations: 512) PASS : TestElfMap::benchFindElfExpanding(1000) RESULT : TestElfMap::benchFindElfExpanding():"1000": 10 msecs per iteration (total: 80, iterations: 8) PASS : TestElfMap::benchFindElfExpanding(2000) RESULT : TestElfMap::benchFindElfExpanding():"2000": 52.0 msecs per iteration (total: 104, iterations: 2) Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 7198ms ***** Finished testing of TestElfMap ***** Now, this goes down to: ***** Start testing of TestElfMap ***** Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 6.3.1 20170109) PASS : TestElfMap::benchRegisterElfDisjunct(10) RESULT : TestElfMap::benchRegisterElfDisjunct():"10": 0.0016 msecs per iteration (total: 54, iterations: 32768) PASS : TestElfMap::benchRegisterElfDisjunct(100) RESULT : TestElfMap::benchRegisterElfDisjunct():"100": 0.018 msecs per iteration (total: 74, iterations: 4096) PASS : TestElfMap::benchRegisterElfDisjunct(1000) RESULT : TestElfMap::benchRegisterElfDisjunct():"1000": 0.53 msecs per iteration (total: 68, iterations: 128) PASS : TestElfMap::benchRegisterElfDisjunct(2000) RESULT : TestElfMap::benchRegisterElfDisjunct():"2000": 1.9 msecs per iteration (total: 62, iterations: 32) PASS : TestElfMap::benchRegisterElfOverlapping(10) RESULT : TestElfMap::benchRegisterElfOverlapping():"10": 0.0023 msecs per iteration (total: 76, iterations: 32768) PASS : TestElfMap::benchRegisterElfOverlapping(100) RESULT : TestElfMap::benchRegisterElfOverlapping():"100": 0.025 msecs per iteration (total: 52, iterations: 2048) PASS : 
TestElfMap::benchRegisterElfOverlapping(1000) RESULT : TestElfMap::benchRegisterElfOverlapping():"1000": 0.59 msecs per iteration (total: 76, iterations: 128) PASS : TestElfMap::benchRegisterElfOverlapping(2000) RESULT : TestElfMap::benchRegisterElfOverlapping():"2000": 2.0 msecs per iteration (total: 66, iterations: 32) PASS : TestElfMap::benchRegisterElfExpanding(10) RESULT : TestElfMap::benchRegisterElfExpanding():"10": 0.0015 msecs per iteration (total: 52, iterations: 32768) PASS : TestElfMap::benchRegisterElfExpanding(100) RESULT : TestElfMap::benchRegisterElfExpanding():"100": 0.015 msecs per iteration (total: 65, iterations: 4096) PASS : TestElfMap::benchRegisterElfExpanding(1000) RESULT : TestElfMap::benchRegisterElfExpanding():"1000": 0.15 msecs per iteration (total: 81, iterations: 512) PASS : TestElfMap::benchRegisterElfExpanding(2000) RESULT : TestElfMap::benchRegisterElfExpanding():"2000": 0.31 msecs per iteration (total: 81, iterations: 256) PASS : TestElfMap::benchFindElfDisjunct(10) RESULT : TestElfMap::benchFindElfDisjunct():"10": 0.0028 msecs per iteration (total: 93, iterations: 32768) PASS : TestElfMap::benchFindElfDisjunct(100) RESULT : TestElfMap::benchFindElfDisjunct():"100": 0.031 msecs per iteration (total: 65, iterations: 2048) PASS : TestElfMap::benchFindElfDisjunct(1000) RESULT : TestElfMap::benchFindElfDisjunct():"1000": 0.38 msecs per iteration (total: 99, iterations: 256) PASS : TestElfMap::benchFindElfDisjunct(2000) RESULT : TestElfMap::benchFindElfDisjunct():"2000": 0.79 msecs per iteration (total: 51, iterations: 64) PASS : TestElfMap::benchFindElfOverlapping(10) RESULT : TestElfMap::benchFindElfOverlapping():"10": 0.0028 msecs per iteration (total: 93, iterations: 32768) PASS : TestElfMap::benchFindElfOverlapping(100) RESULT : TestElfMap::benchFindElfOverlapping():"100": 0.031 msecs per iteration (total: 64, iterations: 2048) PASS : TestElfMap::benchFindElfOverlapping(1000) RESULT : TestElfMap::benchFindElfOverlapping():"1000": 
0.39 msecs per iteration (total: 51, iterations: 128) PASS : TestElfMap::benchFindElfOverlapping(2000) RESULT : TestElfMap::benchFindElfOverlapping():"2000": 0.79 msecs per iteration (total: 51, iterations: 64) PASS : TestElfMap::benchFindElfExpanding(10) RESULT : TestElfMap::benchFindElfExpanding():"10": 0.0032 msecs per iteration (total: 53, iterations: 16384) PASS : TestElfMap::benchFindElfExpanding(100) RESULT : TestElfMap::benchFindElfExpanding():"100": 0.032 msecs per iteration (total: 67, iterations: 2048) PASS : TestElfMap::benchFindElfExpanding(1000) RESULT : TestElfMap::benchFindElfExpanding():"1000": 0.32 msecs per iteration (total: 84, iterations: 256) PASS : TestElfMap::benchFindElfExpanding(2000) RESULT : TestElfMap::benchFindElfExpanding():"2000": 0.65 msecs per iteration (total: 84, iterations: 128) Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 6747ms ***** Finished testing of TestElfMap ******* Change-Id: I6eaca5d6561dcdb0cee0d3aed4eec8f0f6c9c9a3 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
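One way to drop overwritten mappings on registration is sketched below in standard C++. `Mapping` and `registerMapping` are hypothetical names, `std::vector` stands in for QVector, and keeping the non-overlapped fragments of an older mapping is an assumption of this sketch rather than something the commit message states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Mapping {
    std::uint64_t addr;
    std::uint64_t length;
    std::string file;
};

// Insert `added` into `maps` (kept sorted by address). Any part of an older
// mapping that the new one covers is dropped; parts before or after the new
// range are kept as shortened fragments.
inline void registerMapping(std::vector<Mapping> &maps, const Mapping &added)
{
    assert(added.length > 0);
    const std::uint64_t addedEnd = added.addr + added.length;

    std::vector<Mapping> kept;
    kept.reserve(maps.size() + 2);
    for (const Mapping &m : maps) {
        const std::uint64_t end = m.addr + m.length;
        if (end <= added.addr || m.addr >= addedEnd) {
            kept.push_back(m);  // no overlap, keep unchanged
            continue;
        }
        if (m.addr < added.addr)
            kept.push_back({m.addr, added.addr - m.addr, m.file});  // leading fragment
        if (end > addedEnd)
            kept.push_back({addedEnd, end - addedEnd, m.file});     // trailing fragment
    }
    kept.push_back(added);
    std::sort(kept.begin(), kept.end(),
              [](const Mapping &a, const Mapping &b) { return a.addr < b.addr; });
    maps.swap(kept);
}
```

With a repeatedly growing heap mapping, each registration replaces the previous entry instead of stacking another one on top. That keeps the container small, which fits the large gains the message reports for the "Expanding" benchmarks.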
* \|	Use sorted QVector instead of a QMap in PerfElfMap	Milian Wolff	2017-03-29	3	-19/+41
	This improves the performance of the benchmarks considerably: ******* Start testing of TestElfMap ***** Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 6.3.1 20170109) PASS : TestElfMap::benchRegisterElfDisjunct(10) RESULT : TestElfMap::benchRegisterElfDisjunct():"10": 0.0017 msecs per iteration (total: 57, iterations: 32768) PASS : TestElfMap::benchRegisterElfDisjunct(100) RESULT : TestElfMap::benchRegisterElfDisjunct():"100": 0.018 msecs per iteration (total: 74, iterations: 4096) PASS : TestElfMap::benchRegisterElfDisjunct(1000) RESULT : TestElfMap::benchRegisterElfDisjunct():"1000": 0.51 msecs per iteration (total: 66, iterations: 128) PASS : TestElfMap::benchRegisterElfDisjunct(2000) RESULT : TestElfMap::benchRegisterElfDisjunct():"2000": 1.8 msecs per iteration (total: 58, iterations: 32) PASS : TestElfMap::benchRegisterElfOverlapping(10) RESULT : TestElfMap::benchRegisterElfOverlapping():"10": 0.0024 msecs per iteration (total: 81,
iterations: 32768)
PASS : TestElfMap::benchRegisterElfOverlapping(100)
RESULT : TestElfMap::benchRegisterElfOverlapping():"100": 0.033 msecs per iteration (total: 68, iterations: 2048)
PASS : TestElfMap::benchRegisterElfOverlapping(1000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"1000": 1.4 msecs per iteration (total: 95, iterations: 64)
PASS : TestElfMap::benchRegisterElfOverlapping(2000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"2000": 5.4 msecs per iteration (total: 87, iterations: 16)
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10": 0.0018 msecs per iteration (total: 59, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100": 0.024 msecs per iteration (total: 51, iterations: 2048)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000": 1.3 msecs per iteration (total: 85, iterations: 64)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000": 5.6 msecs per iteration (total: 91, iterations: 16)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10": 0.0028 msecs per iteration (total: 92, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100": 0.031 msecs per iteration (total: 64, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000": 0.38 msecs per iteration (total: 98, iterations: 256)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000": 0.789 msecs per iteration (total: 101, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10": 0.0029 msecs per iteration (total: 98, iterations: 32768)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100": 0.035 msecs per iteration (total: 72, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000": 0.40 msecs per iteration (total: 52, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000": 0.82 msecs per iteration (total: 53, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10": 0.0034 msecs per iteration (total: 57, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100": 0.11 msecs per iteration (total: 59, iterations: 512)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000": 10 msecs per iteration (total: 80, iterations: 8)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000": 52.0 msecs per iteration (total: 104, iterations: 2)
Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 7198ms
***** Finished testing of TestElfMap *****

Without this patch, the numbers on my machine here were:

***** Start testing of TestElfMap *****
Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 6.3.1 20170109)
PASS : TestElfMap::benchRegisterElfDisjunct(10)
RESULT : TestElfMap::benchRegisterElfDisjunct():"10": 0.0018 msecs per iteration (total: 59, iterations: 32768)
PASS : TestElfMap::benchRegisterElfDisjunct(100)
RESULT : TestElfMap::benchRegisterElfDisjunct():"100": 0.038 msecs per iteration (total: 79, iterations: 2048)
PASS : TestElfMap::benchRegisterElfDisjunct(1000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"1000": 3.7 msecs per iteration (total: 60, iterations: 16)
PASS : TestElfMap::benchRegisterElfDisjunct(2000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"2000": 20 msecs per iteration (total: 80, iterations: 4)
PASS : TestElfMap::benchRegisterElfOverlapping(10)
RESULT : TestElfMap::benchRegisterElfOverlapping():"10": 0.0037 msecs per iteration (total: 61, iterations: 16384)
PASS : TestElfMap::benchRegisterElfOverlapping(100)
RESULT : TestElfMap::benchRegisterElfOverlapping():"100": 0.085 msecs per iteration (total: 88, iterations: 1024)
PASS : TestElfMap::benchRegisterElfOverlapping(1000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"1000": 9.3 msecs per iteration (total: 75, iterations: 8)
PASS : TestElfMap::benchRegisterElfOverlapping(2000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"2000": 34.5 msecs per iteration (total: 138, iterations: 4)
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10": 0.0018 msecs per iteration (total: 60, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100": 0.042 msecs per iteration (total: 87, iterations: 2048)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000": 4.1 msecs per iteration (total: 67, iterations: 16)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000": 21 msecs per iteration (total: 86, iterations: 4)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10": 0.0027 msecs per iteration (total: 91, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100": 0.031 msecs per iteration (total: 64, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000": 0.39 msecs per iteration (total: 51, iterations: 128)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000": 0.82 msecs per iteration (total: 53, iterations: 64)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10": 0.0031 msecs per iteration (total: 51, iterations: 16384)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100": 0.039 msecs per iteration (total: 81, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000": 0.46 msecs per iteration (total: 60, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000": 1.0 msecs per iteration (total: 64, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10": 0.0059 msecs per iteration (total: 98, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100": 0.67 msecs per iteration (total: 87, iterations: 128)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000": 131 msecs per iteration (total: 131, iterations: 1)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000": 685 msecs per iteration (total: 685, iterations: 1)
Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 9217ms
***** Finished testing of TestElfMap *******
Change-Id: I7b39275960cbb709b60b2b441751077117ccc304
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Simplify PerfElfMap::registerElf now that the input is sorted by time	Milian Wolff	2017-03-29	2	-59/+24
	Because the mmap events are now added in time order, we can simplify the implementation of registerElf, as we no longer need to account for older events getting added. Change-Id: I131ca75fcb52e6e1f4238470f276f34a13bea537 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
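To illustrate why time-ordered input makes registration simpler, here is a minimal sketch. It is not the PerfElfMap code: `Mapping`, `Mappings`, and this free `registerElf` function are hypothetical, `std::map` stands in for the real container, and the sketch simply drops every older mapping that overlaps the new range. Whether the real implementation keeps the non-overlapping remainder of an old mapping is not covered here.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, reduced mapping record (not the perfparser type).
struct Mapping {
    std::uint64_t length;
    std::string file;
};

using Mappings = std::map<std::uint64_t, Mapping>; // keyed by start address

// Assumes calls happen in time order, so the newest mapping always wins
// and no older event can show up later and need special treatment.
void registerElf(Mappings &mappings, std::uint64_t addr, std::uint64_t length,
                 const std::string &file)
{
    const std::uint64_t end = addr + length;

    // Start with the last mapping that begins at or before addr, if any.
    auto it = mappings.upper_bound(addr);
    if (it != mappings.begin())
        --it;

    // Walk over all mappings that start before the new range ends.
    while (it != mappings.end() && it->first < end) {
        const std::uint64_t oldEnd = it->first + it->second.length;
        if (oldEnd > addr)
            it = mappings.erase(it); // overlaps the new range: drop it
        else
            ++it;
    }

    mappings[addr] = Mapping{length, file};
}
```

With unsorted input, the function would also have to decide whether an incoming mapping is older than the ones already stored, which is the kind of bookkeeping the commit message says is no longer needed.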
* \|	Handle mmap and sample events in time order	Milian Wolff	2017-03-29	3	-38/+179
	According to the perf file format documentation, events can arrive out of time order. But no time-reordering should occur across a FINISHED_ROUND event. This pseudo-event was added to perf in Linux v2.6.35, released in 2010. Sadly, it has only been used for non-tracing perf data files since Linux 3.17, from around 2014. Additionally, time order violations across the FINISHED_ROUND event can be observed even for data files obtained today. As such, we cannot always use FINISHED_ROUND to order our events.
Instead, we use the following heuristic:
- we buffer both samples and mmap events
- when the combined size of both buffers exceeds a certain threshold, defined via the new --buffer-size CLI argument which defaults to 10MB, we flush the buffers:
-- we sort samples and events by time
-- then we iterate over the samples and handle all mmap events before the sample time
-- we stop flushing the buffers when we have handled half of the buffers by size, i.e. when we only have 5MB left
- then we continue to buffer until we reach the threshold the next time
- when we finish, we flush the full buffer

This heuristic is only applied until we know that we can actually rely on FINISHED_ROUND events. This is the case when:
- the user passed the CLI argument --buffer-size 0
- we encountered a FINISHED_ROUND event
- we encountered a PerfFeatures event that tells us the perf that was used to record the data file is newer than 3.17

When we rely on FINISHED_ROUND events, we still only ever analyze half of our buffers, to work around the upstream issues in perf that lead to time order violations across FINISHED_ROUND events.

While somewhat complicated, this patch allows us to simplify the elf map significantly. This is done in the follow-up commits.

The statistics mode is further extended, which shows us how this new behavior plays out in terms of memory consumption. For a normal perf file where we rely on the finished round events we observe the following values:
~~~~~~~~~~
samples: 20260
mmaps: 24331
rounds: 957
buffer flushes: 958
samples time violations: 0
mmaps time violations: 0
max samples per round: 75
max mmaps per round: 474
max samples per flush: 51
max mmaps per flush: 429
max buffer size: 847328
max total event size per round: 647040
max time: 1143246068673300
max time between rounds: 739096862
max reorder time: 738811755
~~~~~~~~~~

For a perf file that was recorded with `-m 8192`, we instead see the following statistics.
Note how the buffer is automatically sized to fit the actual round size, and no time violations occur:
~~~~~~~~~~
samples: 20068
mmaps: 24341
rounds: 24
buffer flushes: 25
samples time violations: 0
mmaps time violations: 0
max samples per round: 3577
max mmaps per round: 6103
max samples per flush: 1893
max mmaps per flush: 2910
max buffer size: 31946624
max total event size per round: 30106704
max time: 1141236144717550
max time between rounds: 965150979
max reorder time: 964957358
~~~~~~~~~~

When we parse a file in the streaming format, we cannot know at the beginning that we should rely on the finished round events. Then the statistics for a normal record look like this:
~~~~~~~~~~
samples: 20303
mmaps: 24339
rounds: 1029
buffer flushes: 1029
samples time violations: 0
mmaps time violations: 0
max samples per round: 98
max mmaps per round: 489
max samples per flush: 61
max mmaps per flush: 610
max buffer size: 1144496
max total event size per round: 863472
max time: 1143595838784853
max time between rounds: 554771976
max reorder time: 13834481
~~~~~~~~~~

If the record was done with `-m 8192`, we instead observe some time order violations at the beginning, which could be worked around by passing a larger buffer size.
Once we encounter the first finished round event, we follow those and do not suffer from time violations anymore:
~~~~~~~~~~
samples: 19854
mmaps: 24338
rounds: 21
buffer flushes: 24
samples time violations: 465
mmaps time violations: 395
max samples per round: 5115
max mmaps per round: 6316
max samples per flush: 2700
max mmaps per flush: 5364
max buffer size: 46458392
max total event size per round: 43698384
max time: 1143585885779204
max time between rounds: 912757434
max reorder time: 908114269
~~~~~~~~~~

When we parse a perf.data file without FINISHED_ROUND events, we get for a normal file:
~~~~~~~~~~
samples: 20303
mmaps: 24339
rounds: 1
buffer flushes: 33
samples time violations: 0
mmaps time violations: 0
max samples per round: 20303
max mmaps per round: 24339
max samples per flush: 654
max mmaps per flush: 3004
max buffer size: 10494104
max total event size per round: 173458680
max time: 1143595838784853
max time between rounds: 0
max reorder time: 13834481
~~~~~~~~~~

If the file has huge buffers, i.e. again `-m 8192` was passed to perf record, we instead see:
~~~~~~~~~~
samples: 19854
mmaps: 24338
rounds: 1
buffer flushes: 32
samples time violations: 4859
mmaps time violations: 3586
max samples per round: 19854
max mmaps per round: 24338
max samples per flush: 817
max mmaps per flush: 3089
max buffer size: 10493280
max total event size per round: 169569288
max time: 1143585885779204
max time between rounds: 0
max reorder time: 908114269
~~~~~~~~~~
Change-Id: I756c4cccf75b4ce0179e965996f8b821bf60e3dd
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
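The buffering heuristic described in this commit message can be sketched roughly as follows. This is an illustration only, not the perfparser implementation: `Event`, `ReorderBuffer`, and `reorder` are hypothetical names, and the sketch merges samples and mmap events into a single buffer (with mmap events ordered before samples of the same time) instead of keeping two buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event record; sizes are in bytes.
struct Event {
    std::uint64_t time;
    std::size_t size;
    bool isMmap;
};

class ReorderBuffer {
public:
    explicit ReorderBuffer(std::size_t maxSize) : m_maxSize(maxSize) {}

    // Buffer an event; once the threshold is exceeded, flush down to half.
    void add(const Event &event)
    {
        m_events.push_back(event);
        m_size += event.size;
        if (m_size > m_maxSize)
            flush(m_maxSize / 2);
    }

    // Call at end of input: everything still buffered is handled.
    void finish() { flush(0); }

    const std::vector<Event> &handled() const { return m_handled; }

private:
    // Sort by time, then handle the oldest events until at most
    // targetSize bytes remain buffered.
    void flush(std::size_t targetSize)
    {
        std::stable_sort(m_events.begin(), m_events.end(),
                         [](const Event &a, const Event &b) {
                             if (a.time != b.time)
                                 return a.time < b.time;
                             return a.isMmap && !b.isMmap; // mmaps first
                         });
        std::size_t i = 0;
        while (i < m_events.size() && m_size > targetSize) {
            assert(m_size >= m_events[i].size);
            m_handled.push_back(m_events[i]);
            m_size -= m_events[i].size;
            ++i;
        }
        m_events.erase(m_events.begin(),
                       m_events.begin() + static_cast<std::ptrdiff_t>(i));
    }

    std::size_t m_maxSize;
    std::size_t m_size = 0;
    std::vector<Event> m_events;
    std::vector<Event> m_handled;
};

// Convenience helper for experiments: returns the handled timestamps.
std::vector<std::uint64_t> reorder(const std::vector<Event> &input,
                                   std::size_t maxSize)
{
    ReorderBuffer buffer(maxSize);
    for (const Event &event : input)
        buffer.add(event);
    buffer.finish();

    std::vector<std::uint64_t> times;
    for (const Event &event : buffer.handled())
        times.push_back(event.time);
    return times;
}
```

Keeping half of the buffer back after each flush is what lets a late event still be sorted into place. An event that is older than something already handled cannot be fixed up anymore, which corresponds to the time violations counted in the statistics above.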
* \|	Handle FINISHED_ROUND events	Milian Wolff	2017-03-29	4	-4/+61
	This silences the warnings on the command line:
unhandled event type 68
Additionally, these events are used to compute some more statistic values that are useful as a baseline for values we can use in future heuristics:
~~~~~~~~~~
samples: 20993
mmaps: 24331
rounds: 347
max samples per round: 311
max mmaps per round: 781
max buffer size: 1057128
max total event size per round: 2632552
max time: 629013777317625
max time between rounds: 453996043
max reorder time: 375052737
~~~~~~~~~~
Change-Id: I7e087410ee5551ce66d2bcee223ec57530bcf58d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
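A minimal sketch of how per-round counters like the ones above could be maintained. The `EventType` enum, `RoundStats`, and `analyze` are hypothetical names chosen for illustration and do not mirror the perfparser classes.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, reduced set of event kinds.
enum class EventType { Sample, Mmap, FinishedRound };

struct RoundStats {
    std::uint64_t rounds = 0;
    std::uint64_t maxSamplesPerRound = 0;
    std::uint64_t maxMmapsPerRound = 0;

    // Counters for the round that is currently being read.
    std::uint64_t samplesInRound = 0;
    std::uint64_t mmapsInRound = 0;

    void handle(EventType type)
    {
        switch (type) {
        case EventType::Sample:
            ++samplesInRound;
            break;
        case EventType::Mmap:
            ++mmapsInRound;
            break;
        case EventType::FinishedRound:
            finishRound();
            break;
        }
    }

    // A FINISHED_ROUND event closes the current round and resets the counters.
    void finishRound()
    {
        ++rounds;
        maxSamplesPerRound = std::max(maxSamplesPerRound, samplesInRound);
        maxMmapsPerRound = std::max(maxMmapsPerRound, mmapsInRound);
        samplesInRound = 0;
        mmapsInRound = 0;
    }
};

RoundStats analyze(const std::vector<EventType> &events)
{
    RoundStats stats;
    for (EventType type : events)
        stats.handle(type);
    return stats;
}
```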
* \|	Add mode to print statistics for a given perf data file	Milian Wolff	2017-03-29	3	-8/+74
	To find fitting values for some magic heuristic values we will need in the follow-up commits to sort events by time, this mode analyzes a data file and computes some statistics. Unwinding and similarly expensive operations are disabled in the `--print-stats` mode. The output of this mode is e.g.:
~~~~~~~~~~~~~~
samples: 20210
mmaps: 24330
max buffer size: 1057128
max time: 628782799569406
max reorder time: 376374129
~~~~~~~~~~~~~~
Change-Id: I5d1344618925502b08ba303239a75d9945d965e7
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
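One way such timing statistics could be gathered is sketched below. The `Stats` struct and `collectStats` are hypothetical and not taken from perfparser. The sketch also assumes that "max reorder time" means the largest amount by which an event's timestamp lags behind the latest timestamp seen so far; the commit message does not define the term.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical statistics holder; names are illustrative only.
struct Stats {
    std::uint64_t numEvents = 0;
    std::uint64_t maxTime = 0;
    std::uint64_t maxReorderTime = 0;

    // Feed event timestamps in the order they appear in the file.
    void addEvent(std::uint64_t time)
    {
        ++numEvents;
        if (time < maxTime) {
            // The event arrived late: remember how far back it reaches.
            maxReorderTime = std::max(maxReorderTime, maxTime - time);
        } else {
            maxTime = time;
        }
    }
};

Stats collectStats(const std::vector<std::uint64_t> &times)
{
    Stats stats;
    for (std::uint64_t time : times)
        stats.addEvent(time);
    assert(stats.numEvents == times.size());
    return stats;
}
```

A value like this gives a lower bound on how much history a reordering buffer has to keep to restore time order for the analyzed file.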
* \|	Merge remote-tracking branch 'origin/4.3'	Eike Ziller	2017-03-28	2	-8/+17
\|\\| \| \| \| \| \| \|	Change-Id: I210e819f30185a0f8d4ad3bc7d35e8d4d7593cbd
\| *	Also map global attribute ids to internal ids v4.3.0-beta1	Milian Wolff	2017-03-27	2	-8/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	So far, only the streamed PerfRecordAttr events updated the mapping of the attribute ids to the internal ids. Now, we also do this for the global attributes contained within the PerfFeatures. This fixes the resolution of the attribute ids for samples in a non-streamed perf.data file. Change-Id: I1fabb99727d70e3a1c237691ecd4b7421d76a44e Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
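The idea can be sketched as a single registration path shared by both attribute sources. All names here (`Attribute`, `AttributeIds`, `buildIds`) are hypothetical and only stand in for the corresponding perfparser code, which is not shown on this page.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an attribute and the sample ids that refer to it.
struct Attribute {
    std::vector<std::uint64_t> ids;
};

class AttributeIds {
public:
    // Registers one attribute and maps each of its ids to a new internal id.
    int addAttribute(const Attribute &attr)
    {
        const int internalId = m_nextInternalId++;
        for (std::uint64_t id : attr.ids)
            m_internalIds.emplace(id, internalId); // first registration wins
        return internalId;
    }

    // Returns the internal id for a sample's attribute id, or -1 if unknown.
    int internalId(std::uint64_t id) const
    {
        const auto it = m_internalIds.find(id);
        return it == m_internalIds.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::uint64_t, int> m_internalIds;
    int m_nextInternalId = 0;
};

// Global attributes (from the file header) and streamed attribute events
// both go through the same registration function.
AttributeIds buildIds(const std::vector<Attribute> &globalAttrs,
                      const std::vector<Attribute> &streamedAttrs)
{
    AttributeIds ids;
    for (const Attribute &attr : globalAttrs)
        ids.addAttribute(attr);
    for (const Attribute &attr : streamedAttrs)
        ids.addAttribute(attr);
    return ids;
}
```

If only the streamed attributes were registered, a lookup for an id that belongs to a global attribute would come back as unknown, which matches the failure this commit describes for non-streamed perf.data files.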
* \|	Refactor PerfElfMap to make it easier to change its internals	Milian Wolff	2017-03-21	5	-136/+192
	Instead of leaking the implementation details such as the QMap iterators in the API, always return an ElfInfo struct. This requires us to add the address to the ElfInfo, essentially duplicating the data that is already used for the QMap key. But this only marginally increases memory consumption, and does not decrease performance significantly. Also, future commits to improve the performance will probably require this anyways.

The code in the symbol table that so far relied on accessing the mapping internals directly via the iterators is moved into the elfmap directly. This also allows us to test this part of the code, and enables us to hide the internals.

One exception is the iteration over all elf infos, which is required to guess the target architecture. This is still possible, but now uses a simpler foreach loop over all elf infos, which also makes it possible to change the underlying data type seamlessly in the future.

To simplify the testing process, ElfInfo also gets a proper QDebug streaming operator as well as a QTest::toString overload. The test is updated accordingly, to leverage this new API. Furthermore, the testing code is simplified by removing the boolean found parameter in the ElfInfo ctor. Instead, it is now initialized by calling `file.isFile()` internally.
To show that the performance impact of this change is negligible, please compare the following benchmark results to those of the previous two commits:
PASS : TestElfMap::benchRegisterElfDisjunct(10)
RESULT : TestElfMap::benchRegisterElfDisjunct():"10": 0.0017 msecs per iteration (total: 58, iterations: 32768)
PASS : TestElfMap::benchRegisterElfDisjunct(100)
RESULT : TestElfMap::benchRegisterElfDisjunct():"100": 0.037 msecs per iteration (total: 77, iterations: 2048)
PASS : TestElfMap::benchRegisterElfDisjunct(1000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"1000": 3.8 msecs per iteration (total: 61, iterations: 16)
PASS : TestElfMap::benchRegisterElfDisjunct(2000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"2000": 21 msecs per iteration (total: 85, iterations: 4)
PASS : TestElfMap::benchRegisterElfOverlapping(10)
RESULT : TestElfMap::benchRegisterElfOverlapping():"10": 0.0040 msecs per iteration (total: 66, iterations: 16384)
PASS : TestElfMap::benchRegisterElfOverlapping(100)
RESULT : TestElfMap::benchRegisterElfOverlapping():"100": 0.086 msecs per iteration (total: 89, iterations: 1024)
PASS : TestElfMap::benchRegisterElfOverlapping(1000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"1000": 9.7 msecs per iteration (total: 78, iterations: 8)
PASS : TestElfMap::benchRegisterElfOverlapping(2000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"2000": 35.2 msecs per iteration (total: 141, iterations: 4)
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10": 0.0019 msecs per iteration (total: 63, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100": 0.043 msecs per iteration (total: 90, iterations: 2048)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000": 4.6 msecs per iteration (total: 74, iterations: 16)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000": 22 msecs per iteration (total: 91, iterations: 4)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10": 0.0029 msecs per iteration (total: 98, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100": 0.031 msecs per iteration (total: 65, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000": 0.40 msecs per iteration (total: 52, iterations: 128)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000": 0.85 msecs per iteration (total: 55, iterations: 64)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10": 0.0032 msecs per iteration (total: 53, iterations: 16384)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100": 0.041 msecs per iteration (total: 85, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000": 0.48 msecs per iteration (total: 62, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000": 1.0 msecs per iteration (total: 65, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10": 0.0060 msecs per iteration (total: 99, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100": 0.67 msecs per iteration (total: 87, iterations: 128)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000": 120 msecs per iteration (total: 120, iterations: 1)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000": 696 msecs per iteration (total: 696, iterations: 1)
Change-Id: Id48eb38cc8615b6fa08e84bc4bb6d342b58290b4
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
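The API shape described in this commit, returning a value struct instead of a container iterator, can be sketched like this. The sketch is an assumption-laden stand-in: `std::map` replaces QMap, the `ElfInfo` fields are reduced to three hypothetical members, and overlapping mappings are not handled at all.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, reduced ElfInfo: the start address is stored in the struct
// even though it duplicates the map key.
struct ElfInfo {
    ElfInfo() = default;
    ElfInfo(std::string file, std::uint64_t addr, std::uint64_t length)
        : file(std::move(file)), addr(addr), length(length) {}

    bool isValid() const { return length > 0; }

    std::string file;
    std::uint64_t addr = 0;
    std::uint64_t length = 0;
};

class ElfMap {
public:
    void registerElf(std::uint64_t addr, std::uint64_t length,
                     const std::string &file)
    {
        m_elfs[addr] = ElfInfo(file, addr, length);
    }

    // Returns a copy of the matching ElfInfo, or an invalid one if nothing
    // is mapped at ip. Callers never see the underlying container.
    ElfInfo findElf(std::uint64_t ip) const
    {
        auto it = m_elfs.upper_bound(ip); // first mapping starting after ip
        if (it == m_elfs.begin())
            return ElfInfo();
        --it;
        const ElfInfo &elf = it->second;
        return (ip - elf.addr < elf.length) ? elf : ElfInfo();
    }

private:
    std::map<std::uint64_t, ElfInfo> m_elfs; // keyed by start address
};
```

Because callers only receive `ElfInfo` values, the container behind `findElf` could later be swapped for a different data structure without touching them, which is the flexibility the commit message is after.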
* \|	Use ~/.debug as default debug path	Milian Wolff	2017-03-21	1	-2/+3
	The previous default of .debug/ tried to look for a folder called .debug in the current working directory. That folder usually does not exist; the .debug cache folder usually resides in your home folder. This path is now searched by default. Change-Id: I86f86743d2b3bde00dd210ef7802c8079d27b5ce Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Fix build with -DQT_STRICT_ITERATORS	David Faure	2017-03-16	2	-2/+2
\| \| \| \| \| \| \| \| \| \|	Change-Id: I30b1cf22e69989ac96b69c0497d0ba211bfc4a13 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Merge remote-tracking branch 'origin/4.3'	Eike Ziller	2017-03-16	2	-3/+14
\|\\| \| \| \| \| \| \|	Change-Id: I63447a15bc912c00dfd1f003ffb955a96fae77d2
\| *	Fix detection of interworking veneers	Ulf Hermann	2017-03-14	2	-3/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	They are called "$a" or "$t", not "$at". Also, dwfl might detect more or less valid frames underneath the veneer. The LR method is generally better, though. So, try both and use the one that results in a longer stack. Change-Id: I3c60640649d200bd9db7744f3a5d6610784a4d28 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
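The two points of this fix can be sketched as follows. Both functions are hypothetical helpers written for illustration: the exact-match check on the symbol name and the tie-breaking rule in favor of the LR-based stack are assumptions derived from the wording above, not from the perfparser source.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Per the commit message, the relevant symbols are named "$a" or "$t";
// the combined name "$at" does not match.
bool isInterworkingVeneer(const std::string &symbol)
{
    return symbol == "$a" || symbol == "$t";
}

// Given the stacks produced by two unwinding methods, keep the longer one.
// On a tie the LR-based stack is kept, since the commit message calls that
// method generally better.
std::vector<std::uint64_t> pickLongerStack(
    const std::vector<std::uint64_t> &lrStack,
    const std::vector<std::uint64_t> &dwflStack)
{
    return dwflStack.size() > lrStack.size() ? dwflStack : lrStack;
}
```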
* \|	Merge remote-tracking branch 'origin/4.3'	Eike Ziller	2017-03-13	1	-0/+21
\|\\| \| \| \| \| \| \|	Change-Id: I34eda923b93e7c0e755b4bc210a1d7d2402a221c
\| *	qbs build: Fix race condition when building elfutils	Christian Kandeler	2017-03-13	1	-0/+21
\| \| \| \| \| \| \| \| \| \|	Change-Id: Ic838d60269159f792f38e87322e84ab3c1be886d Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>