| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Change-Id: I796ad4cb92de828d999f35b55ca8d94879230d2f
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I6e9453a15cdd71e07c52677c707e6a02737e70a5
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
|
|
|
|
|
|
|
| |
Fix that compilation of the app did not find Qt headers pulled
in via library headers
Change-Id: I21c5104ae9ae58b7c5f55fe16e6723a9df9ebce8
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I9ec73226ba0309f244038708cb85d2ae9f3aab30
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
The symbol table isn't necessarily sorted, and thus repeated lookups
in there can be expensive when a DSO has many entries in its symtab.
For example, the librustc_driver from rustc 1.40.0 has about 202594
symbols. A single call to dwfl_module_addrinfo can take milliseconds
on my laptop. Every time we get a sample at a so far unknown address,
we have to find the corresponding symbol. So we called this function
a lot, which can add up to a significant amount of time. Now, we
cache the symbol name and its offset and size information in a sorted
list and try to look up the symbol there quickly. The impact of this
patch on the overall time required to analyze a ~1GB perf.data file
for a `cargo build` process (and its child processes) is huge:
before:
```
447.681,66 msec task-clock:u # 0,989 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
45.214 page-faults:u # 0,101 K/sec
1.272.289.956.854 cycles:u # 2,842 GHz
3.497.255.264.964 instructions:u # 2,75 insn per cycle
863.671.557.196 branches:u # 1929,209 M/sec
2.666.320.642 branch-misses:u # 0,31% of all branches
452,806895428 seconds time elapsed
441,996666000 seconds user
2,557237000 seconds sys
```
after:
```
63.770,08 msec task-clock:u # 0,995 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
35.102 page-faults:u # 0,550 K/sec
191.267.750.628 cycles:u # 2,999 GHz
501.316.536.714 instructions:u # 2,62 insn per cycle
122.234.405.333 branches:u # 1916,799 M/sec
443.671.470 branch-misses:u # 0,36% of all branches
64,063443896 seconds time elapsed
62,188041000 seconds user
1,136533000 seconds sys
```
That means we are now roughly 7x faster than before.
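The caching idea can be sketched as follows. This is only an illustration in Python, not the actual perfparser code: the names `Symbol`, `SymbolCache`, `insert` and `find` are hypothetical, and the sketch assumes that cached symbols do not overlap.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    offset: int  # start address of the symbol, relative to the DSO
    size: int    # size in bytes, may be 0 for some symbols
    name: str


class SymbolCache:
    """Keeps symbols sorted by offset so lookups are a binary search."""

    def __init__(self):
        self._offsets = []  # sorted start offsets, parallel to _symbols
        self._symbols = []

    def insert(self, symbol):
        # Insert at the position that keeps both lists sorted by offset.
        index = bisect.bisect_left(self._offsets, symbol.offset)
        self._offsets.insert(index, symbol.offset)
        self._symbols.insert(index, symbol)

    def find(self, addr):
        # Last symbol starting at or before addr, if any.
        index = bisect.bisect_right(self._offsets, addr) - 1
        if index < 0:
            return None
        symbol = self._symbols[index]
        # Assumption: a zero-sized symbol only matches its start address.
        if addr < symbol.offset + max(symbol.size, 1):
            return symbol
        return None
```

On a cache miss, the caller would fall back to the slow symtab lookup (dwfl_module_addrinfo in the text above) and insert the result, so that later samples in the same symbol are served from the sorted list.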
Fixes: https://github.com/KDAB/hotspot/issues/225
Change-Id: Ib7dbc800c9372044a847de68a8459dd7f7b0d3da
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
When we profile a multi-process ensemble, it will often happen that
we encounter samples at the relative address of a DSO. In such cases,
we can leverage a central cache to store the information, instead of
recomputing the same data for every process.
As an example, I wrote a shell script that runs the same process four
times in parallel. When I parse the resulting perf.data file, the perf
stat results are as follows:
before:
```
Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
4.240,50 msec task-clock:u # 0,956 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
17.389 page-faults:u # 0,004 M/sec
11.195.771.907 cycles:u # 2,640 GHz
26.585.168.652 instructions:u # 2,37 insn per cycle
6.234.491.027 branches:u # 1470,227 M/sec
35.149.387 branch-misses:u # 0,56% of all branches
4,435152034 seconds time elapsed
3,732758000 seconds user
0,490148000 seconds sys
```
after:
```
Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
4.160,90 msec task-clock:u # 0,979 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
15.476 page-faults:u # 0,004 M/sec
10.635.798.451 cycles:u # 2,556 GHz
16.616.035.720 instructions:u # 1,56 insn per cycle
3.838.148.777 branches:u # 922,433 M/sec
24.902.558 branch-misses:u # 0,65% of all branches
4,249408917 seconds time elapsed
3,612442000 seconds user
0,533933000 seconds sys
```
Note that the overall elapsed time doesn't change that much here,
but the number of instructions required is massively reduced. I bet
there are other situations where this patch will bring a more tangible
improvement to the overall time requirement.
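A minimal sketch of such a central cache, written in Python for illustration only. The class `SharedSymbolCache`, the `resolver` callback and the helper `resolve_sample` are hypothetical names; the sketch assumes a DSO can be identified by its path and that the relative address is the sample address minus the start of the mapping.

```python
class SharedSymbolCache:
    """Cache shared by all processes, keyed by DSO and relative address."""

    def __init__(self, resolver):
        self._resolver = resolver  # hypothetical expensive lookup function
        self._cache = {}
        self.misses = 0  # counts how often the resolver had to run

    def lookup(self, dso_path, relative_addr):
        key = (dso_path, relative_addr)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._resolver(dso_path, relative_addr)
        return self._cache[key]


def resolve_sample(cache, mapping_start, dso_path, absolute_addr):
    # Different processes map the same DSO at different addresses, but the
    # address relative to the mapping start is the same for all of them.
    return cache.lookup(dso_path, absolute_addr - mapping_start)
```

With this layout, a second process that hits the same relative address in the same DSO reuses the entry computed for the first one.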
Change-Id: I4531ec648af40dd44b9e4290fab7bbd2a89609da
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
| |
When the ElfInfo is invalid, addr - pgoff is 0 and then the fallback
will fail anyway, as no module is ever mapped at that address.
Change-Id: I04bc372a2e29888b9aa9acf16c74cd27cfce9046
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
This feels hacky, but it seems to work! We finally get proper symbols
for Qt signal/slot functors and other inlined lambdas etc.
The new test converts a perf.data for a statically built test
application into textual form and then verifies that it matches
what we want. Sadly, QTestLib doesn't provide an easy way to diff,
but this is good enough I guess. Esp. note how we write the actual
text to disk too, so if something fails it's easy to run diff on
the command line if needed.
In checking that this patch actually helps, I noticed that only the
test binary compiled with gcc produces symbol names that are
unexpected, whereas the clang-compiled binary produces good results
even without this patch!
This test also uncovers two bugs in the DWARF emitted by clang and
gcc, see the following upstream bug reports for more information:
GCC is missing inline frames for sin/cos calls:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91929
Clang produces invalid mangled DW_AT_linkage_name values for sin/cos:
https://bugs.llvm.org/show_bug.cgi?id=43491
Change-Id: I0acbf57d191f09383c60bdab9e6664f9a74db42f
Fixes: https://github.com/KDAB/hotspot/issues/210
Reviewed-by: Paul Wicking <paul.wicking@qt.io>
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|\
| |
| |
| | |
Change-Id: I67b5de6323f1cc7d1cc46513b1502110175b2899
|
| |
| |
| |
| |
| | |
Change-Id: I222cf66c4555c99bdb56e48fd1383c9ebfb7def7
Reviewed-by: Eike Ziller <eike.ziller@qt.io>
|
| |
| |
| |
| |
| | |
Change-Id: I11984e87469b0b13caae3a0e0c9258d93d21193a
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Not all CU DIEs have a contiguous range, esp. when compiled with
-ffunction-sections or when the linker deduplicates equal functions
across compilation units.
Change-Id: Ie22939e550b4b502a16c6e266740a885407f50f1
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We do not need to recurse the full DIE tree to build the mapping.
Instead, it's enough to find the CU DIE and its bias only. To do that,
we can simply look at the {low,high} PC value of the CU DIEs. If
either of these two values is missing, we look at the dwarf_ranges for
the CU DIE and use the min/max PC values of those to set the CU DIE
range.
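The range selection described above can be sketched like this. It is a Python illustration, not the libdw-based implementation: `cu_range` is a hypothetical helper, `None` stands for a missing low/high PC attribute, and `ranges` stands for the (start, end) pairs that dwarf_ranges would yield.

```python
def cu_range(low_pc, high_pc, ranges):
    """Return the (low, high) address range to use for one CU DIE.

    low_pc / high_pc: values of the CU DIE, or None when missing.
    ranges: list of (start, end) pairs for the CU DIE.
    """
    if low_pc is not None and high_pc is not None:
        return (low_pc, high_pc)
    if not ranges:
        return None  # nothing known about this CU
    # Fall back to the smallest start and largest end of all ranges.
    return (min(start for start, _ in ranges),
            max(end for _, end in ranges))
```

Note that the min/max fallback treats the CU as one contiguous block, even if its individual ranges have gaps between them.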
In my tests this produces the same results as before. Note how the
inline frames etc. are later on found via dwarf_getscopes{,_die}.
Also, when we do have a .debug_aranges section, then the call to
dwfl_module_addrdie will also return a CU DIE, not an inner DIE.
The simplified code is easier to understand, will consume less memory
and should also be faster to run.
This patch is based on feedback by David Blakie.
Change-Id: I97767e99b2957aa430b65589b32ec66a9479ff7d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I03147ae4a7b17584add02006a5a2281006dbae25
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
| |
Amends commit 3508f6297a829027694a8d141945797e4923c748.
Change-Id: I3ae9df18f6340171189cb150f4fd82ec891037a3
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
|
| |
The previous approach to handle the broken DWARF emitter from clang
was to parse all CUs in the hope of finding one that contains the
requested address. This was done repeatedly for every address and
often had to go through most of the CUs in a binary, which is a
costly process. To reproduce, one can do the following:
- build perfparser with clang
- profile this build of perfparser while parsing some given workload
- inspect the profile and notice the large overhead from
find_fundie_by_pc
Instead, the first time dwfl_module_addrdie fails, we now query for
all CUs of an ELF and build a custom range map and use that for lookup.
The initial build of this range map is roughly as costly as querying
for two addresses using the old slow code path. As such, once we
query three or more addresses in a given ELF this new approach is
already yielding better performance. Some numbers from my test:
Before:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
46576.925866 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
60,868 page-faults:u # 0.001 M/sec
144,506,727,842 cycles:u # 3.103 GHz
472,211,018,102 instructions:u # 3.27 insn per cycle
116,423,898,539 branches:u # 2499.605 M/sec
488,663,345 branch-misses:u # 0.42% of all branches
46.611237448 seconds time elapsed
After:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
17447.629837 task-clock:u (msec) # 0.995 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
60,142 page-faults:u # 0.003 M/sec
53,150,847,387 cycles:u # 3.046 GHz
158,503,454,039 instructions:u # 2.98 insn per cycle
38,711,860,671 branches:u # 2218.746 M/sec
190,209,418 branch-misses:u # 0.49% of all branches
17.543916999 seconds time elapsed
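The lazy fallback described above can be sketched as follows, in Python for illustration only. `CuRangeMap`, `find_cu`, `list_cu_ranges` and `fast_lookup` are hypothetical names: `fast_lookup` stands in for dwfl_module_addrdie and `list_cu_ranges` for iterating over all CUs of an ELF. The sketch assumes the CU ranges do not overlap.

```python
import bisect


class CuRangeMap:
    """Range map over all CUs of one ELF, built on first use."""

    def __init__(self, list_cu_ranges):
        # Hypothetical callback returning [(low, high, cu), ...].
        self._list_cu_ranges = list_cu_ranges
        self._ranges = None
        self._starts = []

    def _ensure_built(self):
        if self._ranges is None:
            # One expensive pass over all CUs, then sorted for bisecting.
            self._ranges = sorted(self._list_cu_ranges())
            self._starts = [low for low, _, _ in self._ranges]

    def find(self, addr):
        self._ensure_built()
        index = bisect.bisect_right(self._starts, addr) - 1
        if index >= 0:
            _, high, cu = self._ranges[index]
            if addr < high:
                return cu
        return None


def find_cu(addr, fast_lookup, range_map):
    # Try the regular lookup first, use the range map only when it fails.
    cu = fast_lookup(addr)
    return cu if cu is not None else range_map.find(addr)
```

Because the map is built only once per ELF, every lookup after the first failure costs a binary search instead of another walk over the CUs.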
Change-Id: I9ca45ad7c8f77f91d0376f6dcae2f73c6e868404
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
Clang's DWARF emitter creates data that is not supported
by elfutils. Binutils has a fallback, which backward-cpp
implements too. This code here is based on the MIT licensed
backward-cpp code.
See also:
https://sourceware.org/bugzilla/show_bug.cgi?id=21247
https://sourceware.org/ml/elfutils-devel/2017-q2/msg00190.html
Before:
7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so
560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12
560070d371d0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
7fd4aab16ee2
7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560070d370fd
560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
After:
7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so
560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12
560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:342:0 std::log(long double) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3328:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3318:0 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:179:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:177:0 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1862:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1857:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1853:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1852:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 ../tests/test-clients/cpp-inlining/main.cpp:39:-1
560070d371d0 ../tests/test-clients/cpp-inlining/main.cpp:33:0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
7fd4aab16ee2
7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560070d370fd
560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
Change-Id: Ie678685595311fae72f713a75d5b29ff959cd05d
Fixes: https://github.com/KDAB/hotspot/issues/51
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
Use the same approach as eu-addr2line to find inline frames.
I.e. use dwarf_getscopes_die instead of manually parsing the
dwarf tree. This allows us to get full inline backtraces for
both gcc 9 and clang 8. Previously, we only got partial inline
backtraces for gcc, and no inline traces for clang at all.
Before:
560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29
560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19
560b377139a0 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
7f7a30bb7ee2
7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560b37713cdd
560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
After:
560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29
560b37713000 /usr/include/c++/9.1.0/bits/random.tcc:3318:5 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713000 /usr/include/c++/9.1.0/bits/random.h:181:38
560b37713000 /usr/include/c++/9.1.0/bits/random.h:177:2 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713000 /usr/include/c++/9.1.0/bits/random.h:1862:19
560b37713000 /usr/include/c++/9.1.0/bits/random.h:1857:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713000 /usr/include/c++/9.1.0/bits/random.h:1853:51
560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19
560b377139a0 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
7f7a30bb7ee2
7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560b37713cdd
560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
Change-Id: I5d8d5d3f29f659e092c2270c105ef48eae3d99c4
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the content size is 4, we actually parse more than that.
Take this into account instead of triggering a misleading warning:
QWARN : TestPerfData::testTracingData(stream stats)
warning: PerfData::processEvents[../../../../app/perfdata.cpp:197]: Event not fully parsed 66 4 2836
QWARN : TestPerfData::testTracingData(stream stats)
warning: unknown[unknown:0]: QIODevice::skip (QFile, ":/probe.data.stream"): Called with maxSize < 0
Change-Id: I43aab019a5f34a20e890c87f5d53120e977e293f
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I8dca8eb4cc3aa728f033ff27679ef65b5a2fbee2
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
| |
Similar to PerfRecordMmap, the PerfSampleId pid and tid are always
zero, while the struct-specific pid and tid carry the actual pid
and tid. By shadowing the pid and tid, the correct values get
forwarded in PerfUnwind::comm.
Change-Id: I84ef564076e26f779cb902eed603d0a500dd4825
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
| |
We cannot read the position on those.
Change-Id: Ife92c3bc8537353b3c880c53b203d8007656e96f
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
|
|
|
|
|
|
|
|
| |
Now that we don't maintain backwards compatibility in either QtCreator
or hotspot anymore, we need to drop the old event IDs.
Change-Id: If8855aab72576c375146b53ff45d06c7343259cf
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
|
|
|
|
|
|
| |
Detected by GCC9.
Change-Id: I30b1cc0bd91eecc973d2a64d3acd648a486c203e
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|\
| |
| |
| | |
Change-Id: I08cbfa6e80c8e75f756e22ea57e39c61f8678c49
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Otherwise we receive junk. Apparently there are even ways to close stdin
from the outside.
Fixes: QTCREATORBUG-21971
Change-Id: I87978ca3d0001026f63fd948db8cd95da674a953
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
|
| |
| |
| |
| |
| | |
Change-Id: I72f461596d559347da48e983d2bba7ea2568f50d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| | |
Change-Id: I9c33cc643152675c97522ad5985c03e06ece6a7f
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Prevent infinite looping when we access a stale base map. This
could happen when we encounter bogus mmap lists as happens in
https://github.com/KDAB/hotspot/issues/164
Verify that the base map actually corresponds to the expected
elf map and only use that then. Otherwise don't use the base
map and continue with the original mapping, hoping for the best.
While this fixes the stack overflow of the initial bug report,
it doesn't solve the fundamental issue of dealing with broken
data... We'll have to figure that one out separately.
Fixes: https://github.com/KDAB/hotspot/issues/164
Change-Id: Iaebddbfbc891784a7fcc05df47aba761b75cc587
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Supports perf reports from MIPS targets.
MIPS/perf isn't really supported on the mainstream Linux kernel, but
there are a number of patches around which can be used to build
working perf support:
https://lkml.org/lkml/2016/4/1/162
This change just adds the definitions required by hotspot to
support this.
Change-Id: Ifa569c6e33e743c4d239b1ae0448b28aa026d051
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| | |
by 0
Change-Id: Ic0e57b93991c03f9842b74d911697899ca35f7c0
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes parsing of data files generated with `perf record -s`.
In such a case, we do have non-empty attributes but the samples don't
reference any attribute by id. Perf uses the first available attribute
in such cases, cf. `perf_evlist__id2evsel`.
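The fallback can be sketched as follows, in Python for illustration only. `attribute_for_sample`, `attrs_by_id` and `attrs_in_order` are hypothetical names, and the sketch assumes a sample without an attribute reference shows up as an id that is not in the lookup table.

```python
def attribute_for_sample(sample_id, attrs_by_id, attrs_in_order):
    """Pick the attribute for a sample.

    attrs_by_id: dict mapping sample ids to attributes.
    attrs_in_order: attributes in the order they appear in the file.
    """
    attr = attrs_by_id.get(sample_id)
    if attr is None and attrs_in_order:
        # No attribute is referenced by this sample: use the first one,
        # mirroring what the text above says perf does.
        return attrs_in_order[0]
    return attr
```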
Change-Id: I1e5a9b59cd82dbe0d4eb1d140fb9b1e7768284fa
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When a branch stack is available (i.e. `perf record --call-graph lbr`),
then resolve the callchain stored therein. In such cases, the
fp callchain can potentially contain kernel frames, so still
resolve those but skip any user frames therein.
The branch stack then contains the user frames: The "to"
register is the callee, the "from" register is the caller. That
means the callchain can be built by combining the first entry's
"to" register (the tail), with all "from" registers. See also
`callchain__lbr_callstack_printf` in perf's `session.c` for more
information.
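The combination rule can be sketched as follows, in Python for illustration only. `lbr_callchain` is a hypothetical helper, and the sketch assumes the branch stack is given as (from, to) address pairs in the order they are stored in the sample, with the most recent branch first.

```python
def lbr_callchain(branch_stack):
    """Build the user-space callchain from a branch stack.

    branch_stack: list of (from_addr, to_addr) pairs, assumed to be
    ordered with the most recent branch first.
    """
    if not branch_stack:
        return []
    # The "to" of the first entry is the innermost frame (the tail) ...
    chain = [branch_stack[0][1]]
    # ... followed by the "from" of every entry, i.e. the callers.
    chain.extend(from_addr for from_addr, _ in branch_stack)
    return chain
```

The kernel frames taken from the fp callchain would then be placed in front of this list; that step is not shown here.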
Change-Id: I0e060e158859eac6c130c073255af87c365679bf
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When we record perf data files with `--call-graph [fp,lbr]`, we
only see mmap events for the executable code segments. These are
usually offset. Without the non-offset mmap event, we would not
report anything to dwfl so far. Now, we guess the base address and
use that before reporting the elf to dwfl. This seems to work, but
could fail if there is a gap at the start.
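One way to make such a guess is to subtract the page offset of the mmap event from its start address. This Python sketch only illustrates that arithmetic and is an assumption about the approach; `guess_base_address` is a hypothetical name.

```python
def guess_base_address(mmap_start, pgoff):
    """Guess where offset 0 of the file would be mapped.

    mmap_start: address at which the executable segment is mapped.
    pgoff: file offset at which that segment starts.
    """
    # If the file were mapped contiguously from offset 0, its first
    # byte would sit pgoff bytes below the start of this segment.
    return mmap_start - pgoff
```

The guess is only right if the file is laid out contiguously in memory, which is why a gap at the start would break it.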
Change-Id: I9d837e8bed650d6574e84401da67beeddaf7ff57
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Fixes parsing of perf data files with LBR stacks, i.e. files
generated with `perf record --call-graph lbr`.
Change-Id: I7e75d655e8a09f484fe207298ce44a871c1f4903
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Additionally, print a warning. This uncovers an issue in the sample
parsing for perf data files with LBR callchains.
Change-Id: Ib87e54dc33013974691da155aae2037332bc82b8
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The old code only checked for the folder layout in the ~/.debug
cache, where the files get cached by their build-id. If that does
not happen, then we failed to find the debug link file when it
lies next to the actual file. This patch fixes that.
Change-Id: I7411a6d8803c1787a3c76db77aeec30400208565
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On older Ubuntu, I sometimes see the case where we are looking
for a debug link file that is not cached. In such cases, we should
fall back to the build-id file, which contains the debug information.
Change-Id: Icd6d7bc13a9d3d74c3bb237366e04e7ffa6e195e
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This will remove the bogus error message
"Could not find ELF file for [kernel.kallsyms]_text". An ELF file for
this is not needed, as unwinding in the kernel is treated specially.
Fixes KDAB/hotspot#123
Change-Id: I79a31564dd971932aeafaad58715e702b253d600
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
kallsyms files with all addresses being 0 can happen. In this case,
ignore the mapping, as otherwise an address would resolve to a random
kernel symbol.
In addition, now perfparser shows a correct error message about being
unable to open the mapping.
Fixes KDAB/hotspot#117
Change-Id: Ib7ccf2f405f3a0a2149fa747175fb9feb11dfb07
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| | |
This gives a developer the chance to attach a debugger.
Change-Id: I80ca1246e06727f511a326ab87a7fc32c8462fd2
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Apparently some kallsyms report the obfuscated address as (null)
instead of all-zero. Handle this scenario instead of giving up
parsing the file.
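A tolerant line parser could look like the following Python sketch. It is an illustration only: `parse_kallsyms_line` is a hypothetical name, and the sketch assumes the usual "address type name [module]" layout of kallsyms lines.

```python
def parse_kallsyms_line(line):
    """Parse one kallsyms line into (addr, type, name, module).

    Returns None for lines that cannot be parsed at all.
    """
    fields = line.split()
    if len(fields) < 3:
        return None
    raw_addr, sym_type, name = fields[0], fields[1], fields[2]
    module = fields[3].strip("[]") if len(fields) > 3 else None
    if raw_addr == "(null)":
        # Obfuscated address printed as (null): treat it like all-zero.
        addr = 0
    else:
        try:
            addr = int(raw_addr, 16)
        except ValueError:
            return None
    return (addr, sym_type, name, module)
```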
Change-Id: I108c20d1845933a429ee5cd26217b707f6aac4cc
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is required to find e.g. the debug file for ld-2.26.so
on Ubuntu 17.10. That one can only be found in the /usr/lib/debug
folder. It itself resides in /lib/x86_64-linux-gnu/ld-2.26.so.
The debuglink is ld-2.26.so. The debug info file is found in
/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.26.so.
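The path construction for this case can be sketched as follows, in Python for illustration only. `debug_file_candidates` is a hypothetical helper; the search order shown (next to the file first, then below the debug root) is an assumption, not a statement about the order perfparser uses.

```python
import posixpath


def debug_file_candidates(elf_path, debuglink, debug_root="/usr/lib/debug"):
    """List places where the file named by the debuglink may live."""
    directory = posixpath.dirname(elf_path)
    return [
        # Next to the actual file.
        posixpath.join(directory, debuglink),
        # Same directory layout, but below the debug root.
        posixpath.join(debug_root, directory.lstrip("/"), debuglink),
    ]
```

For the ld-2.26.so case above, the first candidate is the stripped file itself, since the debuglink has the same name, so only the second candidate points at the debug info.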
Change-Id: I7c4c67873761a70a7b4f72f5adafea3023b08c12
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The path is useful for filtering. It enables us to exclude frames
that point to system libraries for example.
Change-Id: Ic181e8498ffb237727d6176094bd3724a5d6ed0d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
First read the CPU, then the reserved rest/waste. This ensures
we get non-zero CPU IDs for switch events.
Change-Id: I54c60e8902a1fd3e9ab5dbd63b46734c551deacd
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This data is only meaningful (i.e. non-zero) when perf record
was run with --sample-cpu.
Change-Id: I36a7c7d9cba6a7e334ff89aacc62b05392e51b26
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
So far, only the context switches were buffered and then handled
in time-ordered fashion. Now we do this also for Command, ThreadStart,
ThreadEnd and Lost events. Additionally, we properly handle these
events when no more samples are available, which can happen for
applications that basically only sleep and don't trigger any
significant CPU load.
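The buffering scheme can be sketched with a priority queue, in Python for illustration only. `EventBuffer`, `flush_until` and `flush_all` are hypothetical names; the real code buffers C++ event structs and may order them differently for equal timestamps.

```python
import heapq


class EventBuffer:
    """Buffers events and hands them out in timestamp order."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: keeps insertion order per timestamp

    def add(self, timestamp, event):
        heapq.heappush(self._heap, (timestamp, self._counter, event))
        self._counter += 1

    def flush_until(self, timestamp):
        """Return all buffered events up to and including timestamp."""
        flushed = []
        while self._heap and self._heap[0][0] <= timestamp:
            flushed.append(heapq.heappop(self._heap)[2])
        return flushed

    def flush_all(self):
        # Used when no more samples arrive, e.g. for a mostly sleeping
        # application, so the remaining events are not lost.
        return self.flush_until(float("inf"))
```

A caller would invoke `flush_until` with the time of each sample it is about to emit, and `flush_all` once the input is exhausted.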
Change-Id: I4b1c8a1cfc91737a75a48f38dba04d6742f7c3a3
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The attributes available in the features struct contain more
information. We used to overwrite that by the less-interesting
data available in the attributes list.
Instead of doing this, check whether a given attribute was
encountered already. If so, don't report it a second time.
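The deduplication can be sketched as follows, in Python for illustration only. `merge_attributes` is a hypothetical helper, and identifying an attribute by an id is an assumption; the text above does not say how "encountered already" is decided.

```python
def merge_attributes(feature_attrs, list_attrs):
    """Report each attribute once, preferring the features struct.

    Both arguments are lists of (attr_id, attr) pairs.
    """
    seen = set()
    reported = []
    # The richer attributes from the features struct come first, so the
    # less detailed duplicates from the attributes list are skipped.
    for attr_id, attr in list(feature_attrs) + list(list_attrs):
        if attr_id in seen:
            continue
        seen.add(attr_id)
        reported.append((attr_id, attr))
    return reported
```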
Change-Id: I308c3cf7ceeb8e6f0ca33de52ecfc1a63f62477d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This is required for files recorded with e.g.
perf record -e '{cycles,instructions}:uPS'
Otherwise, we only get garbage values for the cycles.
Change-Id: I65de37e81392b714c0dd65ddbefad64f4d2c353d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| |
| | |
Sometimes we encounter mmap events from perf that span a larger
region than dwfl will parse. In these situations, we would invalidate
the cache and then rebuild it exactly as before - i.e. we would
just waste time for no gain.
This patch checks whether this is happening and prevents the
cache invalidation to speed up the parse process. In one of my
cases this helps significantly:
Before:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
19553.281950 task-clock:u (msec) # 0.998 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
59,997 page-faults:u # 0.003 M/sec
58,567,488,383 cycles:u # 2.995 GHz
158,503,674,356 instructions:u # 2.71 insn per cycle
38,712,268,219 branches:u # 1979.835 M/sec
193,851,125 branch-misses:u # 0.50% of all branches
19.593726259 seconds time elapsed
After:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
11775.076614 task-clock:u (msec) # 0.997 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
22,747 page-faults:u # 0.002 M/sec
35,463,053,591 cycles:u # 3.012 GHz
94,541,494,541 instructions:u # 2.67 insn per cycle
23,038,811,370 branches:u # 1956.574 M/sec
120,291,747 branch-misses:u # 0.52% of all branches
11.807795955 seconds time elapsed
Change-Id: I3efdf5941c5f66cb2d38fecc8ef824c6aef245da
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|