| Commit message (Collapse) | Author | Age | Files | Lines |
Change-Id: I4d71a668ae6c2a24f568e7ca73170d9fc3fe677e
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
We are going to provide elfutils separately.
Change-Id: Ib78b78bf4d11d7921ae5f53a1d1dfa2a1aab3e53
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
We cannot handle resources passed to and from elfutils with a different
C library.
Change-Id: I47e789b016d13c249d82a7bd1091cd5fb769ce9d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
Link against elfutils in that directory, and copy files from there in
the deploy step. Also, drop the Windows-specific library naming dance as
we are never going to build the bundled elfutils on Windows.
Change-Id: Ia1dd2583856918b2c2623016f6ed7a80c0c7ef07
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
On Windows we want backslashes as directory separators and semicolons as
path separators.
Change-Id: I4feaf4864ddd5c1ddaf7d60a5e8f2de3319af8ef
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
If Dwarf_Word is bigger than the target platform's word width, we
otherwise get bogus values this way. The values we read are implicitly
used as pointers and register values and the predefined memory_read
implementations in elfutils do respect the word width.
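To illustrate the truncation described above, here is a minimal sketch. The helper name truncateToWordWidth and its wordBytes parameter are hypothetical and not taken from the patch; it only assumes that the value was read into a 64-bit container, which is how elfutils defines Dwarf_Word.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the low `wordBytes` bytes of a value that
// was read into a 64-bit container such as Dwarf_Word.
std::uint64_t truncateToWordWidth(std::uint64_t value, unsigned wordBytes)
{
    assert(wordBytes >= 1 && wordBytes <= 8);
    if (wordBytes == 8)
        return value; // shifting by 64 bits would be undefined behaviour
    const std::uint64_t mask = (std::uint64_t{1} << (wordBytes * 8)) - 1;
    return value & mask;
}
```

For a 32-bit target, truncateToWordWidth(raw, 4) would discard the upper half of the read value before it is used as a pointer or register value.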
Change-Id: Idbbb76abc72a9b4bacc075b431fa0c854a54fc2e
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
|\|
Conflicts:
app/perfsymboltable.cpp
app/perfunwind.cpp
Change-Id: If343bb33fabeb60a3eab566769cf2c4dda88fcc5
|
Yes, this has to be updated on every release, but I don't see a better
way right now.
Change-Id: Ie81849c75c4e3e55cc0265e66ab01ab60d6d2778
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
mmap()'d binaries can overlap in the profiled application's address
space. For example ld.so usually reserves a huge chunk of address space
for itself, which subsequently gets filled with other libraries.
However, libdw won't reliably accept binaries reported "on top of"
existing modules. In seemingly random cases it will complain that the
modules overlap and reject the new one.
So, when we fail to report a module to Dwfl, we mark the symbol cache
as dirty and retry unwinding at most once after clearing the cache.
Clearing the cache resets libdw's state and allows us to report the new
module first, and unwind symbols from it.
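The retry policy above can be sketched roughly as follows. UnwindResult, unwindOnce and clearCache are hypothetical stand-ins for the real unwinding code and the libdw-backed symbol cache; only the "retry at most once after clearing the cache" control flow is meant to be representative.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of a single unwinding attempt.
struct UnwindResult {
    bool cacheDirty; // a module could not be reported to the unwinder
    int frames;      // number of frames that were unwound
};

UnwindResult unwindWithRetry(const std::function<UnwindResult()> &unwindOnce,
                             const std::function<void()> &clearCache)
{
    assert(unwindOnce && clearCache);
    UnwindResult result = unwindOnce();
    if (result.cacheDirty) {
        clearCache();          // reset the state so the new module is reported first
        result = unwindOnce(); // retry at most once, even if the cache is dirty again
    }
    return result;
}
```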
Change-Id: Idb5d85afb39e05c0439206b8d4938b79b6173b2c
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
So far, perfparser hangs for cases like the following:
echo -n > perf.data.empty
./perfparser --input perf.data.empty
Similarly, we can hang forever when we parse a file that only
contains a perf header but nothing else. This patch adds checks
on the file size for non-sequential inputs (i.e. files) which will
trigger an early return when the input data is broken.
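A size check of this kind might look like the sketch below. The Section struct and the function name are hypothetical, and the real header layout is not reproduced here; the point is only that offsets and sizes announced by the header are compared against the actual file size before any blocking read is attempted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one section announced by a file header.
struct Section {
    std::uint64_t offset;
    std::uint64_t size;
};

// Returns true if a file of `fileSize` bytes can hold a header of
// `headerSize` bytes plus the section the header announces.
bool inputLooksComplete(std::uint64_t fileSize, std::uint64_t headerSize,
                        const Section &data)
{
    if (fileSize < headerSize)
        return false; // empty file or truncated header
    if (data.offset > fileSize)
        return false; // section starts beyond the end of the file
    return data.size <= fileSize - data.offset; // written to avoid overflow
}
```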
Change-Id: I9c22010dd3628ef65e52a785e36c928445633570
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
The host's name mangling will likely differ from the target's anyway.
Change-Id: Iea8672c6697b9526a48dd951973fdbc9c1dae04d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
Change-Id: I6b4bc432de56d6d068ac0b90ac356bd7783a30c7
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
Change-Id: Ie8073c6f32cd0184ab666ced9d10cf48e59f11c3
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
Change-Id: Id993204f2a0be67edf5d29a9400fb71d63774887
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
Only newer perf tools will write the debug file to the location
`~/.debug/<buildid>/elf`. Older tools instead will write the debug
file directly to a file called `~/.debug/<buildid>`.
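As a rough sketch of the lookup order this implies, the hypothetical helper below lists both candidate locations, newest layout first. The function name and the plain string paths are assumptions made for illustration; the actual code works on the paths configured for perfparser.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: candidate locations for a debug file in the
// build-id cache, newest layout first.
std::vector<std::string> buildIdCandidates(const std::string &debugDir,
                                           const std::string &buildId)
{
    assert(!buildId.empty());
    return {
        debugDir + "/" + buildId + "/elf", // written by newer perf tools
        debugDir + "/" + buildId,          // older tools: the file itself
    };
}
```

A caller would try each candidate in turn and use the first one that exists as a regular file.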
Change-Id: I4d7e24e5774c2d6888cf74a51ec40275647da8f9
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
app/perfelfmap.cpp:111:21: warning: converting to ‘PerfElfMap::ElfInfo’
from initializer list would use explicit constructor
‘PerfElfMap::ElfInfo::ElfInfo(const QFileInfo&, quint64, quint64, quint64, const QByteArray&)’
return {};
Introduced by my recent change to make the ElfInfo constructor
explicit. Sorry, I did not notice it before.
Change-Id: Ib7caedd047f16c98bafc079b92b37543db925cc1
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
In these cases, we know the expected file size and can thus
compute our current progress value. We do this up to 100 times and
send the progress value as a normalized float percent, i.e. a value
between 0 and 1.
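One way to limit the number of progress notifications to roughly 100 is sketched here. The class ProgressReporter is hypothetical and only shows the throttling idea: a value is reported each time another hundredth of the expected input has been consumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical throttled progress reporter for inputs of known size.
class ProgressReporter {
public:
    explicit ProgressReporter(std::uint64_t totalBytes) : m_total(totalBytes) {}

    // Returns true and stores the normalized progress (0..1) in `out` when
    // another hundredth of the input has been consumed; false otherwise.
    bool update(std::uint64_t bytesRead, float *out)
    {
        assert(out != nullptr);
        if (m_total == 0)
            return false; // unknown size, e.g. sequential input
        if (bytesRead > m_total)
            bytesRead = m_total;
        // Note: bytesRead * 100 may overflow for inputs close to 2^64 bytes.
        const int step = static_cast<int>(bytesRead * 100 / m_total);
        if (step <= m_lastStep)
            return false; // still within the step that was already reported
        m_lastStep = step;
        *out = static_cast<float>(step) / 100.0f;
        return true;
    }

private:
    std::uint64_t m_total;
    int m_lastStep = 0;
};
```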
This is a helpful feature, as large data files take a long time
to parse. Showing the user that we make some progress is a good thing.
Change-Id: Icb0c9564e06173a526b726e93d75d4f5b7e8949d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
Otherwise a statement like qDebug() << QFileInfo() would pick up
this constructor and lead to confusing debug output.
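The effect of marking such a constructor explicit can be shown with a small stand-alone sketch. FileInfo and ElfInfo below are simplified stand-ins written for this example, not the real QFileInfo or PerfElfMap::ElfInfo types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// Simplified stand-in for a file description type.
struct FileInfo {
    std::string path;
};

// Simplified stand-in for a mapping description with defaulted arguments.
struct ElfInfo {
    explicit ElfInfo(const FileInfo &info = FileInfo(), std::uint64_t address = 0)
        : file(info), addr(address) {}

    bool isValid() const { return !file.path.empty(); }

    FileInfo file;
    std::uint64_t addr;
};

// With `explicit`, a FileInfo is no longer silently converted to an ElfInfo,
// so overloads taking an ElfInfo are not picked for FileInfo arguments.
static_assert(!std::is_convertible<FileInfo, ElfInfo>::value,
              "implicit conversion must be rejected");
static_assert(std::is_constructible<ElfInfo, FileInfo>::value,
              "explicit construction must still work");
```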
Change-Id: Idb9692bd36983b055409cb347e3175aaf5d75eda
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
When this happens, it is an indicator for a broken file and we
don't even need to go in and parse its contents.
Change-Id: If96e0b1e9fed2cb1069b6d3f4bfc03193321c132
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
When a wrong kallsyms path is given, or the file could not be
parsed correctly, e.g. when it was empty, then we want to
notify the user about this. Otherwise, it may not be clear why
symbol resolution for kernel addresses is broken.
Change-Id: Icf51fa3038810e69a91d332a33495e7678b3977a
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
By default, both paths pointed to the current working directory.
When I use perfparser in a folder that has many (sub)directories and
files, it is excruciatingly slow:
145338.874552 task-clock (msec) # 0.997 CPUs utilized
13,353 context-switches # 0.092 K/sec
147 cpu-migrations # 0.001 K/sec
4,497 page-faults # 0.031 K/sec
557,953,349,806 cycles # 3.839 GHz
1,009,672,374,742 instructions # 1.81 insn per cycle
238,669,106,565 branches # 1642.156 M/sec
3,017,437,636 branch-misses # 1.26% of all branches
145.823501862 seconds time elapsed
This is on an SSD with an ext4 file system, but going through 104164
files for every mmap event is simply going to take its time.
This patch improves this situation by dropping the implicit recursive
lookup in the current working directory.
The performance impact is tremendous:
4425.928440 task-clock (msec) # 0.999 CPUs utilized
158 context-switches # 0.036 K/sec
1 cpu-migrations # 0.000 K/sec
3,299 page-faults # 0.745 K/sec
17,042,783,950 cycles # 3.851 GHz
36,178,866,218 instructions # 2.12 insn per cycle
8,448,978,802 branches # 1908.973 M/sec
63,738,579 branch-misses # 0.75% of all branches
4.432039578 seconds time elapsed
I.e. this patch makes this case more than 33 times faster.
Change-Id: I9a2c4e84ed739e1fc602be675bd01369b1c39f4c
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
When we fail to find a matching ELF file for an mmap event, the
usefulness of the perfparser can be severely impacted:
- unwinding will not work
- symbol resolution will not work
- potentially other things will not work
As such, report an error to the user when this occurs.
Change-Id: I8a47f8725a29684ac11b24dadb20e669a45d3016
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|\|
Conflicts:
app/perfsymboltable.cpp
Change-Id: I66e3a8aa490628246a507769daa32d69ec7b4bd3
|
We always get many modules that overlap ld.so. We have to report them
when we find them, or otherwise the unwinding will fail. dwfl apparently
doesn't mind the overlap in this case, so we don't have to clear the
cache before.
Change-Id: I68e9f6fe1653073b555755f546e743621e8c7919
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
We want to check the parent directory of the file, but that is already
implicit in the entryList() call, which will just return an empty list
if it doesn't exist.
Change-Id: I087ed4fdd6db66e6c02d8604af219c68b5280af7
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
Adaptation to ade8449ea2.
Change-Id: Ic277a584140278905066194feaa4c8188c581c09
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
Change-Id: I4e9114cc9f9adb0eb7a46dc30a9cbbda4c6dacda
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
We don't need qmake to determine sizeof(long).
Change-Id: Iced95d685b4c82fb3925bb164691203501e395d9
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
On some platforms, make is not called "make" and we want to explicitly
place the intermediate artifacts in OUT_PWD so that the next step can
find them without digging through the "Debug" or "Release" folders.
Change-Id: I4f9139b471030a57b7cab374cf0fe360be633b02
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
The sample period equals the performance counter value since the
last sample, and is e.g. also emitted by `perf script`.
The sample weight is important for client applications to correctly
attribute the sampling cost. I.e. it is not enough to just count the
total number of samples, but rather one must use the weighted number
of samples.
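A client might aggregate the weighted cost along these lines. The Sample struct and weightedCost function are hypothetical and only illustrate why summing periods differs from counting samples.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, already symbolized sample as a client might see it.
struct Sample {
    std::string symbol;
    std::uint64_t period; // counter value accumulated since the last sample
};

// Attribute cost by summing the periods instead of counting samples.
std::map<std::string, std::uint64_t> weightedCost(const std::vector<Sample> &samples)
{
    std::map<std::string, std::uint64_t> cost;
    for (const Sample &sample : samples)
        cost[sample.symbol] += sample.period;
    return cost;
}
```

With two samples of periods 100 and 50 in one symbol and a single sample of period 1000 in another, the second symbol is the more expensive one even though it was sampled less often.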
Change-Id: I052ae25dcca972320ca8601b3d821398c08401ad
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
The default stays at 64, but it is now possible to unwind more
frames if desired.
Change-Id: I8da2ea340bf97678b2bbbd495b4864da0cf0fddc
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
This enables us to parse perf.data files when the corresponding
elf objects do not exist anymore in the version that was used
at perf record time.
This also enables us to use `perf archive` to evaluate perf.data
files with perfparser on different machines.
Change-Id: Id7ac1af125dd3818dc86880f25a0f74d8d09bfc1
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
Instead of reporting a random elf map, open the first elf manually
and pass that to dwfl_attach_state. This will then be used to
deduce the target architecture and thus gives us more control over
what is going on, without influencing the actual mappings.
A perf record for a runtime-attached profile could potentially contain
mmap events for files first. This happened when attaching to KDevelop,
which opens files internally using mmap. In such cases, we would try
to guess the architecture from a text file containing code in text
form, instead of a binary ELF, which of course did not work.
Instead, validate the ELF we pass to dwfl_attach_state to guess the
architecture via elf_kind. If that returns ELF_K_NONE, the file
is not a valid ELF object and can thus not be used to guess the
architecture. We simply silently skip this and try with the next
elf file we encounter, until it hopefully works.
Change-Id: I00ec7fa1da669c4b5ed9156654818b64bdf050ef
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
Now that we sort the mmap and sample events by time, we can assume
that we will only look for mappings based on the current time, i.e.
the time of the last added elf mapping. This simplifies the code
a bit further and would allow us to optimize it later, if need be.
Note that this theoretically breaks the handling of samples that
violate the time ordering across FINISHED_ROUND events. This was
broken earlier already when we removed the overwritten elf mappings.
Note that this did not cause any real-world issues in my tests.
Change-Id: I24f14afdf17cf5d4f7dcb5440dc04d02f591fcb8
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
This cache is not necessary, as the information is cached within
QFileInfo already. Removing it decreases the ElfInfo size by 8 bytes.
Now that we store the mappings in a vector, this actually improves
the performance more than caching the isFile() value again.
Change-Id: I6bf2cc7a165f3a00d4e42dcd0922d126c40987fa
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
Now that we handle the elf map events in a time-ordered fashion,
we can greatly optimize the map by removing outdated information.
This happens extremely often in the real world, when the heap map
grows over time when the perf client application allocates a lot
of memory.
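The idea can be sketched with a sorted vector from which outdated entries are erased on registration. This is a simplified illustration and not the actual PerfElfMap code: the Mapping struct and the ElfMap class are hypothetical, and overlapped older mappings are dropped whole here instead of being trimmed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mapping entry covering [addr, addr + length).
struct Mapping {
    std::uint64_t addr;
    std::uint64_t length;
    std::string file;
};

class ElfMap {
public:
    // Assumes mappings are registered in time order, so anything the new
    // mapping overlaps is outdated and can be removed.
    void registerElf(std::uint64_t addr, std::uint64_t length, const std::string &file)
    {
        assert(length > 0);
        const std::uint64_t end = addr + length;
        m_maps.erase(std::remove_if(m_maps.begin(), m_maps.end(),
                                    [&](const Mapping &m) {
                                        return m.addr < end && addr < m.addr + m.length;
                                    }),
                     m_maps.end());
        // Keep the vector sorted by start address for binary search.
        auto pos = std::upper_bound(m_maps.begin(), m_maps.end(), addr, startsAfter);
        m_maps.insert(pos, Mapping{addr, length, file});
    }

    // Returns the mapping containing `ip`, or nullptr if there is none.
    const Mapping *findElf(std::uint64_t ip) const
    {
        auto it = std::upper_bound(m_maps.begin(), m_maps.end(), ip, startsAfter);
        if (it == m_maps.begin())
            return nullptr;
        --it;
        return ip < it->addr + it->length ? &*it : nullptr;
    }

    std::size_t size() const { return m_maps.size(); }

private:
    static bool startsAfter(std::uint64_t value, const Mapping &m) { return value < m.addr; }

    std::vector<Mapping> m_maps;
};
```

In the growing heap case, every new registration replaces the previous heap entry, so the map stays small and lookups stay fast.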
Note that this patch could potentially result in no mapping getting
returned for buggy samples that violate the time order across
FINISHED_ROUND events.
This patch dramatically improves the performance of perfparser for
real-world applications for me. Before, I measured the following
numbers for perfparser:
Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.normal --output /dev/null':
73063.166128 task-clock:u (msec) # 0.998 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
29,923 page-faults:u # 0.410 K/sec
223,352,922,905 cycles:u # 3.057 GHz
138,541,646,788 instructions:u # 0.62 insn per cycle
39,581,056,564 branches:u # 541.737 M/sec
242,935,966 branch-misses:u # 0.61% of all branches
73.242094659 seconds time elapsed
Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.stream --output /dev/null':
137772.664268 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
28,393 page-faults:u # 0.206 K/sec
398,086,328,511 cycles:u # 2.889 GHz
164,041,587,534 instructions:u # 0.41 insn per cycle
51,009,555,879 branches:u # 370.244 M/sec
350,270,354 branch-misses:u # 0.69% of all branches
137.855666379 seconds time elapsed
Now, this goes down to:
Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.normal --output /dev/null':
9253.921384 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
29,627 page-faults:u # 0.003 M/sec
28,099,608,231 cycles:u # 3.037 GHz
73,635,879,583 instructions:u # 2.62 insn per cycle
17,114,401,461 branches:u # 1849.422 M/sec
96,904,616 branch-misses:u # 0.57% of all branches
9.266196437 seconds time elapsed
Performance counter stats for './lib/hotspot/libexec/hotspot-perfparser --input perf.data.heaptrack.stream --output /dev/null':
8331.098618 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
26,379 page-faults:u # 0.003 M/sec
25,206,589,319 cycles:u # 3.026 GHz
65,041,985,552 instructions:u # 2.58 insn per cycle
15,106,429,469 branches:u # 1813.258 M/sec
82,649,273 branch-misses:u # 0.55% of all branches
8.343114295 seconds time elapsed
Similarly, the benchmarks also dramatically improve. Before I measured:
********* Start testing of TestElfMap *********
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10":
0.0018 msecs per iteration (total: 59, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100":
0.024 msecs per iteration (total: 51, iterations: 2048)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000":
1.3 msecs per iteration (total: 85, iterations: 64)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000":
5.6 msecs per iteration (total: 91, iterations: 16)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10":
0.0028 msecs per iteration (total: 92, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100":
0.031 msecs per iteration (total: 64, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000":
0.38 msecs per iteration (total: 98, iterations: 256)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000":
0.789 msecs per iteration (total: 101, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10":
0.0029 msecs per iteration (total: 98, iterations: 32768)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100":
0.035 msecs per iteration (total: 72, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000":
0.40 msecs per iteration (total: 52, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000":
0.82 msecs per iteration (total: 53, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10":
0.0034 msecs per iteration (total: 57, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100":
0.11 msecs per iteration (total: 59, iterations: 512)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000":
10 msecs per iteration (total: 80, iterations: 8)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000":
52.0 msecs per iteration (total: 104, iterations: 2)
Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 7198ms
********* Finished testing of TestElfMap *********
Now, this goes down to:
********* Start testing of TestElfMap *********
Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 6.3.1 20170109)
PASS : TestElfMap::benchRegisterElfDisjunct(10)
RESULT : TestElfMap::benchRegisterElfDisjunct():"10":
0.0016 msecs per iteration (total: 54, iterations: 32768)
PASS : TestElfMap::benchRegisterElfDisjunct(100)
RESULT : TestElfMap::benchRegisterElfDisjunct():"100":
0.018 msecs per iteration (total: 74, iterations: 4096)
PASS : TestElfMap::benchRegisterElfDisjunct(1000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"1000":
0.53 msecs per iteration (total: 68, iterations: 128)
PASS : TestElfMap::benchRegisterElfDisjunct(2000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"2000":
1.9 msecs per iteration (total: 62, iterations: 32)
PASS : TestElfMap::benchRegisterElfOverlapping(10)
RESULT : TestElfMap::benchRegisterElfOverlapping():"10":
0.0023 msecs per iteration (total: 76, iterations: 32768)
PASS : TestElfMap::benchRegisterElfOverlapping(100)
RESULT : TestElfMap::benchRegisterElfOverlapping():"100":
0.025 msecs per iteration (total: 52, iterations: 2048)
PASS : TestElfMap::benchRegisterElfOverlapping(1000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"1000":
0.59 msecs per iteration (total: 76, iterations: 128)
PASS : TestElfMap::benchRegisterElfOverlapping(2000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"2000":
2.0 msecs per iteration (total: 66, iterations: 32)
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10":
0.0015 msecs per iteration (total: 52, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100":
0.015 msecs per iteration (total: 65, iterations: 4096)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000":
0.15 msecs per iteration (total: 81, iterations: 512)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000":
0.31 msecs per iteration (total: 81, iterations: 256)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10":
0.0028 msecs per iteration (total: 93, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100":
0.031 msecs per iteration (total: 65, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000":
0.38 msecs per iteration (total: 99, iterations: 256)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000":
0.79 msecs per iteration (total: 51, iterations: 64)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10":
0.0028 msecs per iteration (total: 93, iterations: 32768)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100":
0.031 msecs per iteration (total: 64, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000":
0.39 msecs per iteration (total: 51, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000":
0.79 msecs per iteration (total: 51, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10":
0.0032 msecs per iteration (total: 53, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100":
0.032 msecs per iteration (total: 67, iterations: 2048)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000":
0.32 msecs per iteration (total: 84, iterations: 256)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000":
0.65 msecs per iteration (total: 84, iterations: 128)
Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 6747ms
********* Finished testing of TestElfMap *********
Change-Id: I6eaca5d6561dcdb0cee0d3aed4eec8f0f6c9c9a3
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
This improves the performance of the benchmarks considerably:
********* Start testing of TestElfMap *********
Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 6.3.1 20170109)
PASS : TestElfMap::benchRegisterElfDisjunct(10)
RESULT : TestElfMap::benchRegisterElfDisjunct():"10":
0.0017 msecs per iteration (total: 57, iterations: 32768)
PASS : TestElfMap::benchRegisterElfDisjunct(100)
RESULT : TestElfMap::benchRegisterElfDisjunct():"100":
0.018 msecs per iteration (total: 74, iterations: 4096)
PASS : TestElfMap::benchRegisterElfDisjunct(1000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"1000":
0.51 msecs per iteration (total: 66, iterations: 128)
PASS : TestElfMap::benchRegisterElfDisjunct(2000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"2000":
1.8 msecs per iteration (total: 58, iterations: 32)
PASS : TestElfMap::benchRegisterElfOverlapping(10)
RESULT : TestElfMap::benchRegisterElfOverlapping():"10":
0.0024 msecs per iteration (total: 81, iterations: 32768)
PASS : TestElfMap::benchRegisterElfOverlapping(100)
RESULT : TestElfMap::benchRegisterElfOverlapping():"100":
0.033 msecs per iteration (total: 68, iterations: 2048)
PASS : TestElfMap::benchRegisterElfOverlapping(1000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"1000":
1.4 msecs per iteration (total: 95, iterations: 64)
PASS : TestElfMap::benchRegisterElfOverlapping(2000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"2000":
5.4 msecs per iteration (total: 87, iterations: 16)
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10":
0.0018 msecs per iteration (total: 59, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100":
0.024 msecs per iteration (total: 51, iterations: 2048)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000":
1.3 msecs per iteration (total: 85, iterations: 64)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000":
5.6 msecs per iteration (total: 91, iterations: 16)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10":
0.0028 msecs per iteration (total: 92, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100":
0.031 msecs per iteration (total: 64, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000":
0.38 msecs per iteration (total: 98, iterations: 256)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000":
0.789 msecs per iteration (total: 101, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10":
0.0029 msecs per iteration (total: 98, iterations: 32768)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100":
0.035 msecs per iteration (total: 72, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000":
0.40 msecs per iteration (total: 52, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000":
0.82 msecs per iteration (total: 53, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10":
0.0034 msecs per iteration (total: 57, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100":
0.11 msecs per iteration (total: 59, iterations: 512)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000":
10 msecs per iteration (total: 80, iterations: 8)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000":
52.0 msecs per iteration (total: 104, iterations: 2)
Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 7198ms
********* Finished testing of TestElfMap *********
Without this patch, the numbers on my machine here were:
********* Start testing of TestElfMap *********
Config: Using QtTest library 5.8.0, Qt 5.8.0 (x86_64-little_endian-lp64 shared (dynamic) release build; by GCC 6.3.1 20170109)
PASS : TestElfMap::benchRegisterElfDisjunct(10)
RESULT : TestElfMap::benchRegisterElfDisjunct():"10":
0.0018 msecs per iteration (total: 59, iterations: 32768)
PASS : TestElfMap::benchRegisterElfDisjunct(100)
RESULT : TestElfMap::benchRegisterElfDisjunct():"100":
0.038 msecs per iteration (total: 79, iterations: 2048)
PASS : TestElfMap::benchRegisterElfDisjunct(1000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"1000":
3.7 msecs per iteration (total: 60, iterations: 16)
PASS : TestElfMap::benchRegisterElfDisjunct(2000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"2000":
20 msecs per iteration (total: 80, iterations: 4)
PASS : TestElfMap::benchRegisterElfOverlapping(10)
RESULT : TestElfMap::benchRegisterElfOverlapping():"10":
0.0037 msecs per iteration (total: 61, iterations: 16384)
PASS : TestElfMap::benchRegisterElfOverlapping(100)
RESULT : TestElfMap::benchRegisterElfOverlapping():"100":
0.085 msecs per iteration (total: 88, iterations: 1024)
PASS : TestElfMap::benchRegisterElfOverlapping(1000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"1000":
9.3 msecs per iteration (total: 75, iterations: 8)
PASS : TestElfMap::benchRegisterElfOverlapping(2000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"2000":
34.5 msecs per iteration (total: 138, iterations: 4)
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10":
0.0018 msecs per iteration (total: 60, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100":
0.042 msecs per iteration (total: 87, iterations: 2048)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000":
4.1 msecs per iteration (total: 67, iterations: 16)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000":
21 msecs per iteration (total: 86, iterations: 4)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10":
0.0027 msecs per iteration (total: 91, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100":
0.031 msecs per iteration (total: 64, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000":
0.39 msecs per iteration (total: 51, iterations: 128)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000":
0.82 msecs per iteration (total: 53, iterations: 64)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10":
0.0031 msecs per iteration (total: 51, iterations: 16384)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100":
0.039 msecs per iteration (total: 81, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000":
0.46 msecs per iteration (total: 60, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000":
1.0 msecs per iteration (total: 64, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10":
0.0059 msecs per iteration (total: 98, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100":
0.67 msecs per iteration (total: 87, iterations: 128)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000":
131 msecs per iteration (total: 131, iterations: 1)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000":
685 msecs per iteration (total: 685, iterations: 1)
Totals: 33 passed, 0 failed, 0 skipped, 0 blacklisted, 9217ms
********* Finished testing of TestElfMap *********
Change-Id: I7b39275960cbb709b60b2b441751077117ccc304
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Because the mmap events are now added in time order, we can simplify
the implementation of registerElf, as we no longer need to account
for older events getting added.
Change-Id: I131ca75fcb52e6e1f4238470f276f34a13bea537
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
According to the perf file format documentation, events can
arrive out of time order. But no time-reordering should occur across
a FINISHED_ROUND event. This pseudo-event was added to perf in Linux
v2.6.35, released in 2010.
But, sadly, it is only being used for non-tracing perf data files since
Linux 3.17 from around 2014. Additionally, time order violations across
the FINISHED_ROUND event can be observed even for data files obtained
today.
As such, we cannot always use FINISHED_ROUND to order our events. Instead,
we use the following heuristic:
- we buffer both samples and mmap events
- when the combined size of both buffers exceeds a certain threshold,
defined via the new --buffer-size CLI argument which defaults to 10MB,
we flush the buffers:
-- we sort samples and events by time
-- then we iterate over the samples and handle all mmap events before
the sample time
-- we stop flushing the buffers when we handled half of the buffers
by size, i.e. when we only have 5MB left
- then we continue to buffer until we reach the threshold the next time
- when we finish, we flush the full buffer
This heuristic is only applied until we know that we can actually rely on
FINISHED_ROUND events. This is the case when:
- the user passed the CLI argument --buffer-size 0
- we encountered a FINISHED_ROUND event
- we encountered a PerfFeatures event that tells us the perf that
was used to record the data file is newer than 3.17
When we rely on FINISHED_ROUND events, we still only ever analyze half of
our buffers, to work around the upstream issues in perf that lead to
time order violations across FINISHED_ROUND events.
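The size-threshold part of the heuristic above can be sketched roughly as follows. This is a minimal illustration and not the code of this patch: `BufferedEvent`, `EventBuffer` and the `handled` log are hypothetical names, and the switch to FINISHED_ROUND-driven flushing is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical event record; only the fields the heuristic needs.
struct BufferedEvent {
    std::uint64_t time; // timestamp of the event
    std::size_t size;   // bytes this event occupies in the buffer
};

class EventBuffer {
public:
    explicit EventBuffer(std::size_t maxSize) : m_maxSize(maxSize), m_size(0) {}

    void addSample(const BufferedEvent &e) { m_samples.push_back(e); grow(e.size); }
    void addMmap(const BufferedEvent &e) { m_mmaps.push_back(e); grow(e.size); }

    // At the end of the input, flush everything that is still buffered.
    void finish() { flush(0, true); }

    // Handled events in order: 'm' for mmap, 's' for sample, plus the time.
    std::vector<std::pair<char, std::uint64_t>> handled;

private:
    void grow(std::size_t bytes)
    {
        m_size += bytes;
        if (m_size > m_maxSize)
            flush(m_maxSize / 2, false); // keep roughly half of the data buffered
    }

    void handle(char kind, const BufferedEvent &e)
    {
        assert(m_size >= e.size);
        handled.push_back(std::make_pair(kind, e.time));
        m_size -= e.size;
    }

    void flush(std::size_t targetSize, bool everything)
    {
        auto byTime = [](const BufferedEvent &a, const BufferedEvent &b) {
            return a.time < b.time;
        };
        std::stable_sort(m_samples.begin(), m_samples.end(), byTime);
        std::stable_sort(m_mmaps.begin(), m_mmaps.end(), byTime);

        std::size_t s = 0, m = 0;
        while (s < m_samples.size() && (everything || m_size > targetSize)) {
            // Handle all mmap events up to the sample time first. Treating
            // equal timestamps as "before" is an assumption of this sketch.
            while (m < m_mmaps.size() && m_mmaps[m].time <= m_samples[s].time)
                handle('m', m_mmaps[m++]);
            handle('s', m_samples[s++]);
        }
        // On the final flush, mmap events after the last sample remain.
        while (everything && m < m_mmaps.size())
            handle('m', m_mmaps[m++]);

        m_samples.erase(m_samples.begin(), m_samples.begin() + s);
        m_mmaps.erase(m_mmaps.begin(), m_mmaps.begin() + m);
    }

    std::size_t m_maxSize;
    std::size_t m_size;
    std::vector<BufferedEvent> m_samples;
    std::vector<BufferedEvent> m_mmaps;
};
```

Keeping half of the buffer around after a flush means that a late event can still be sorted in front of newer ones, as long as it is not older than what was already handled.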
While somewhat complicated, this patch allows us to simplify the elf map
significantly. This is done in the follow-up commits.
The statistics mode is further extended, which shows us how this new
behavior plays out in terms of memory consumption.
For a normal perf file where we rely on the finished round events we
observe the following values:
~~~~~~~~~~
samples: 20260
mmaps: 24331
rounds: 957
buffer flushes: 958
samples time violations: 0
mmaps time violations: 0
max samples per round: 75
max mmaps per round: 474
max samples per flush: 51
max mmaps per flush: 429
max buffer size: 847328
max total event size per round: 647040
max time: 1143246068673300
max time between rounds: 739096862
max reorder time: 738811755
~~~~~~~~~~
For a perf file that was recorded with `-m 8192`, we instead see the
following statistics. Note how the buffer is automatically sized
to fit the actual round size, and no time violations occur:
~~~~~~~~~~
samples: 20068
mmaps: 24341
rounds: 24
buffer flushes: 25
samples time violations: 0
mmaps time violations: 0
max samples per round: 3577
max mmaps per round: 6103
max samples per flush: 1893
max mmaps per flush: 2910
max buffer size: 31946624
max total event size per round: 30106704
max time: 1141236144717550
max time between rounds: 965150979
max reorder time: 964957358
~~~~~~~~~~
When we parse a file in the streaming format, we cannot know at the
beginning that we should rely on the finished round events. Then
the statistics for a normal record look like this:
~~~~~~~~~~
samples: 20303
mmaps: 24339
rounds: 1029
buffer flushes: 1029
samples time violations: 0
mmaps time violations: 0
max samples per round: 98
max mmaps per round: 489
max samples per flush: 61
max mmaps per flush: 610
max buffer size: 1144496
max total event size per round: 863472
max time: 1143595838784853
max time between rounds: 554771976
max reorder time: 13834481
~~~~~~~~~~
If the record was done with `-m 8192` we instead observe some time
order violations at the beginning, which could be worked around by
passing a larger buffer size. Once we encounter the first finished round
event, we follow those and do not suffer from time violations anymore:
~~~~~~~~~~
samples: 19854
mmaps: 24338
rounds: 21
buffer flushes: 24
samples time violations: 465
mmaps time violations: 395
max samples per round: 5115
max mmaps per round: 6316
max samples per flush: 2700
max mmaps per flush: 5364
max buffer size: 46458392
max total event size per round: 43698384
max time: 1143585885779204
max time between rounds: 912757434
max reorder time: 908114269
~~~~~~~~~~
When we parse a perf.data file without FINISHED_ROUND events, we get
for a normal file:
~~~~~~~~~~
samples: 20303
mmaps: 24339
rounds: 1
buffer flushes: 33
samples time violations: 0
mmaps time violations: 0
max samples per round: 20303
max mmaps per round: 24339
max samples per flush: 654
max mmaps per flush: 3004
max buffer size: 10494104
max total event size per round: 173458680
max time: 1143595838784853
max time between rounds: 0
max reorder time: 13834481
~~~~~~~~~~
If the file has huge buffers, i.e. again `-m 8192` was passed to perf record,
we instead see:
~~~~~~~~~~
samples: 19854
mmaps: 24338
rounds: 1
buffer flushes: 32
samples time violations: 4859
mmaps time violations: 3586
max samples per round: 19854
max mmaps per round: 24338
max samples per flush: 817
max mmaps per flush: 3089
max buffer size: 10493280
max total event size per round: 169569288
max time: 1143585885779204
max time between rounds: 0
max reorder time: 908114269
~~~~~~~~~~
Change-Id: I756c4cccf75b4ce0179e965996f8b821bf60e3dd
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This silences the warnings on the command line:
unhandled event type 68
Additionally, these events are used to compute some more statistics
that are useful as a baseline for values we can use in
future heuristics:
~~~~~~~~~~
samples: 20993
mmaps: 24331
rounds: 347
max samples per round: 311
max mmaps per round: 781
max buffer size: 1057128
max total event size per round: 2632552
max time: 629013777317625
max time between rounds: 453996043
max reorder time: 375052737
~~~~~~~~~~
Change-Id: I7e087410ee5551ce66d2bcee223ec57530bcf58d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The follow-up commits will sort events by time, which requires fitting
values for some heuristic parameters. To find these, the new
`--print-stats` mode analyzes a data file and computes some statistics.
Unwinding and similarly expensive operations are disabled in this mode.
Example output:
~~~~~~~~~~~~~~
samples: 20210
mmaps: 24330
max buffer size: 1057128
max time: 628782799569406
max reorder time: 376374129
~~~~~~~~~~~~~~
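To illustrate how two of these values could be obtained, here is a small sketch. It assumes that "max reorder time" is the largest amount by which an event's timestamp lies behind the largest timestamp seen before it; that interpretation, as well as the names `TimeStats` and `computeTimeStats`, are assumptions and not taken from the patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct TimeStats {
    std::uint64_t maxTime;        // largest timestamp seen
    std::uint64_t maxReorderTime; // largest lag of an event behind maxTime
};

// Walk the timestamps in the order the events appear in the file.
TimeStats computeTimeStats(const std::vector<std::uint64_t> &times)
{
    TimeStats stats = {0, 0};
    for (std::uint64_t time : times) {
        if (time < stats.maxTime) {
            // The event arrived late; remember the worst case.
            stats.maxReorderTime = std::max(stats.maxReorderTime,
                                            stats.maxTime - time);
        } else {
            stats.maxTime = time;
        }
    }
    return stats;
}
```

A value like this gives a lower bound on how much time a reordering buffer has to cover to put all events of the analyzed file into time order.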
Change-Id: I5d1344618925502b08ba303239a75d9945d965e7
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|\|
| |
| |
| | |
Change-Id: I210e819f30185a0f8d4ad3bc7d35e8d4d7593cbd
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
So far, only the streamed PerfRecordAttr events updated the mapping
of the attribute ids to the internal ids. Now, we also do this
for the global attributes contained within the PerfFeatures.
This fixes the resolution of the attribute ids for samples in a
non-streamed perf.data file.
Change-Id: I1fabb99727d70e3a1c237691ecd4b7421d76a44e
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Instead of leaking the implementation details such as the QMap
iterators in the API, always return an ElfInfo struct. This
requires us to add the address to the ElfInfo, essentially duplicating
the data that is already used for the QMap key. But this only
marginally increases memory consumption, and does not decrease
performance significantly. Also, future commits to improve the
performance will probably require this anyway.
The code in the symbol table that so far relied on accessing the
mapping internals directly via the iterators is moved into the elfmap
directly. This also allows us to test this part of the code, and
enables us to hide the internals. One exception is the iteration
over all elf infos, which is required to guess the target
architecture. This is still possible, but now uses a simpler foreach
loop over all elf infos, which also makes it possible to change the
underlying data type seamlessly in the future.
To simplify the testing process, ElfInfo also gets a proper
QDebug streaming operator as well as a QTest::toString overload.
The test is updated accordingly, to leverage this new API.
Furthermore, the testing code is simplified by removing the
boolean found parameter in the ElfInfo ctor. Instead, it is now
initialized by calling `file.isFile()` internally.
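The shape of such an API can be sketched as follows. This is a simplified illustration using the standard library instead of QMap: the members shown, `isValid()` and the lookup logic are assumptions, overlapping mappings are not handled, and the `found` flag is omitted.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Value type returned by the lookup; it repeats the start address that
// also serves as the key of the internal map.
struct ElfInfo {
    ElfInfo() : addr(0), length(0) {}
    ElfInfo(std::uint64_t addr, std::uint64_t length, const std::string &path)
        : addr(addr), length(length), path(path) {}

    bool isValid() const { return length > 0; }

    std::uint64_t addr;
    std::uint64_t length;
    std::string path;
};

class ElfMap {
public:
    void registerElf(const ElfInfo &info) { m_elfs[info.addr] = info; }

    // Returns the mapping that contains ip, or an invalid ElfInfo.
    // No iterator of the underlying container escapes to the caller.
    ElfInfo findElf(std::uint64_t ip) const
    {
        std::map<std::uint64_t, ElfInfo>::const_iterator it = m_elfs.upper_bound(ip);
        if (it == m_elfs.begin())
            return ElfInfo();
        --it; // last mapping that starts at or before ip
        const ElfInfo &candidate = it->second;
        if (ip - candidate.addr < candidate.length)
            return candidate;
        return ElfInfo();
    }

private:
    std::map<std::uint64_t, ElfInfo> m_elfs;
};
```

Because callers only ever see `ElfInfo` values, the container behind `ElfMap` can be swapped without touching them.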
To show that the performance impact of this change is negligible,
please compare the following benchmark results to those of the
previous two commits:
PASS : TestElfMap::benchRegisterElfDisjunct(10)
RESULT : TestElfMap::benchRegisterElfDisjunct():"10":
0.0017 msecs per iteration (total: 58, iterations: 32768)
PASS : TestElfMap::benchRegisterElfDisjunct(100)
RESULT : TestElfMap::benchRegisterElfDisjunct():"100":
0.037 msecs per iteration (total: 77, iterations: 2048)
PASS : TestElfMap::benchRegisterElfDisjunct(1000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"1000":
3.8 msecs per iteration (total: 61, iterations: 16)
PASS : TestElfMap::benchRegisterElfDisjunct(2000)
RESULT : TestElfMap::benchRegisterElfDisjunct():"2000":
21 msecs per iteration (total: 85, iterations: 4)
PASS : TestElfMap::benchRegisterElfOverlapping(10)
RESULT : TestElfMap::benchRegisterElfOverlapping():"10":
0.0040 msecs per iteration (total: 66, iterations: 16384)
PASS : TestElfMap::benchRegisterElfOverlapping(100)
RESULT : TestElfMap::benchRegisterElfOverlapping():"100":
0.086 msecs per iteration (total: 89, iterations: 1024)
PASS : TestElfMap::benchRegisterElfOverlapping(1000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"1000":
9.7 msecs per iteration (total: 78, iterations: 8)
PASS : TestElfMap::benchRegisterElfOverlapping(2000)
RESULT : TestElfMap::benchRegisterElfOverlapping():"2000":
35.2 msecs per iteration (total: 141, iterations: 4)
PASS : TestElfMap::benchRegisterElfExpanding(10)
RESULT : TestElfMap::benchRegisterElfExpanding():"10":
0.0019 msecs per iteration (total: 63, iterations: 32768)
PASS : TestElfMap::benchRegisterElfExpanding(100)
RESULT : TestElfMap::benchRegisterElfExpanding():"100":
0.043 msecs per iteration (total: 90, iterations: 2048)
PASS : TestElfMap::benchRegisterElfExpanding(1000)
RESULT : TestElfMap::benchRegisterElfExpanding():"1000":
4.6 msecs per iteration (total: 74, iterations: 16)
PASS : TestElfMap::benchRegisterElfExpanding(2000)
RESULT : TestElfMap::benchRegisterElfExpanding():"2000":
22 msecs per iteration (total: 91, iterations: 4)
PASS : TestElfMap::benchFindElfDisjunct(10)
RESULT : TestElfMap::benchFindElfDisjunct():"10":
0.0029 msecs per iteration (total: 98, iterations: 32768)
PASS : TestElfMap::benchFindElfDisjunct(100)
RESULT : TestElfMap::benchFindElfDisjunct():"100":
0.031 msecs per iteration (total: 65, iterations: 2048)
PASS : TestElfMap::benchFindElfDisjunct(1000)
RESULT : TestElfMap::benchFindElfDisjunct():"1000":
0.40 msecs per iteration (total: 52, iterations: 128)
PASS : TestElfMap::benchFindElfDisjunct(2000)
RESULT : TestElfMap::benchFindElfDisjunct():"2000":
0.85 msecs per iteration (total: 55, iterations: 64)
PASS : TestElfMap::benchFindElfOverlapping(10)
RESULT : TestElfMap::benchFindElfOverlapping():"10":
0.0032 msecs per iteration (total: 53, iterations: 16384)
PASS : TestElfMap::benchFindElfOverlapping(100)
RESULT : TestElfMap::benchFindElfOverlapping():"100":
0.041 msecs per iteration (total: 85, iterations: 2048)
PASS : TestElfMap::benchFindElfOverlapping(1000)
RESULT : TestElfMap::benchFindElfOverlapping():"1000":
0.48 msecs per iteration (total: 62, iterations: 128)
PASS : TestElfMap::benchFindElfOverlapping(2000)
RESULT : TestElfMap::benchFindElfOverlapping():"2000":
1.0 msecs per iteration (total: 65, iterations: 64)
PASS : TestElfMap::benchFindElfExpanding(10)
RESULT : TestElfMap::benchFindElfExpanding():"10":
0.0060 msecs per iteration (total: 99, iterations: 16384)
PASS : TestElfMap::benchFindElfExpanding(100)
RESULT : TestElfMap::benchFindElfExpanding():"100":
0.67 msecs per iteration (total: 87, iterations: 128)
PASS : TestElfMap::benchFindElfExpanding(1000)
RESULT : TestElfMap::benchFindElfExpanding():"1000":
120 msecs per iteration (total: 120, iterations: 1)
PASS : TestElfMap::benchFindElfExpanding(2000)
RESULT : TestElfMap::benchFindElfExpanding():"2000":
696 msecs per iteration (total: 696, iterations: 1)
Change-Id: Id48eb38cc8615b6fa08e84bc4bb6d342b58290b4
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The previous default of .debug/ looked for a folder called .debug in
the current working directory. That folder usually does not exist
there, as the .debug cache folder usually resides in your home folder.
This path is now searched by default.
Change-Id: I86f86743d2b3bde00dd210ef7802c8079d27b5ce
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| | |
Change-Id: I30b1cf22e69989ac96b69c0497d0ba211bfc4a13
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|\|
| |
| |
| | |
Change-Id: I63447a15bc912c00dfd1f003ffb955a96fae77d2
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
They are called "$a" or "$t", not "$at". Also, dwfl might detect
more or less valid frames underneath the veneer. The LR method is
generally better, though. So, try both and use the one that results
in a longer stack.
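The selection step can be sketched like this. `Stack` and `pickLongerStack` are hypothetical names, and preferring the LR result when both stacks have the same length is an assumption derived from the remark that the LR method is generally better.

```cpp
#include <cstdint>
#include <vector>

typedef std::vector<std::uint64_t> Stack; // instruction pointers of the frames

// Use the dwfl result only if it yields strictly more frames than the
// LR-based unwind; on a tie the LR result is kept.
Stack pickLongerStack(const Stack &lrStack, const Stack &dwflStack)
{
    return dwflStack.size() > lrStack.size() ? dwflStack : lrStack;
}
```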
Change-Id: I3c60640649d200bd9db7744f3a5d6610784a4d28
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
|\|
| |
| |
| | |
Change-Id: I34eda923b93e7c0e755b4bc210a1d7d2402a221c
|
| |
| |
| |
| |
| | |
Change-Id: Ic838d60269159f792f38e87322e84ab3c1be886d
Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
|