Change-Id: I11984e87469b0b13caae3a0e0c9258d93d21193a
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
Not all CU DIEs have a contiguous range, esp. when compiled with
-ffunction-sections or when the linker deduplicates equal functions
across compilation units.
Change-Id: Ie22939e550b4b502a16c6e266740a885407f50f1
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
We do not need to recurse the full DIE tree to build the mapping.
Instead, it's enough to find the CU DIE and its bias only. To do that,
we can simply look at the {low,high} PC value of the CU DIEs. If
either of these two values is missing, we look at the dwarf_ranges for
the CU DIE and use the min/max PC values of those to set the CU DIE
range.
In my tests this produces the same results as before. Note that the
inline frames etc. are found later on via dwarf_getscopes{,_die}.
Also, when a .debug_aranges section is present, the call to
dwfl_module_addrdie likewise returns a CU DIE, not an inner DIE.
The simplified code is easier to understand, consumes less memory
and should also run faster.
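A minimal sketch of the range selection described above, assuming the {low,high} PC values and the dwarf_ranges results have already been read into a plain struct. CuPcInfo and cuPcSpan are hypothetical names; the real code works on elfutils Dwarf_Die objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for what dwarf_lowpc(), dwarf_highpc() and
// dwarf_ranges() would report for a single CU DIE. No elfutils types
// are used, so the sketch compiles on its own.
struct CuPcInfo {
    std::optional<uint64_t> lowPc;
    std::optional<uint64_t> highPc;
    std::vector<std::pair<uint64_t, uint64_t>> ranges; // half-open [start, end)
};

// Returns the [low, high) span covering the CU: the {low,high} PC pair
// when both are present, otherwise the min/max over all ranges.
std::optional<std::pair<uint64_t, uint64_t>> cuPcSpan(const CuPcInfo &cu)
{
    if (cu.lowPc && cu.highPc)
        return std::make_pair(*cu.lowPc, *cu.highPc);
    if (cu.ranges.empty())
        return std::nullopt; // nothing known about this CU
    uint64_t low = UINT64_MAX;
    uint64_t high = 0;
    for (const auto &range : cu.ranges) {
        assert(range.first <= range.second);
        low = std::min(low, range.first);
        high = std::max(high, range.second);
    }
    return std::make_pair(low, high);
}
```

As the newer commit about non-contiguous CUs points out, a single min/max span can cover addresses that actually belong to other CUs.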
This patch is based on feedback by David Blaikie.
Change-Id: I97767e99b2957aa430b65589b32ec66a9479ff7d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
Change-Id: I03147ae4a7b17584add02006a5a2281006dbae25
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
Amends commit 3508f6297a829027694a8d141945797e4923c748.
Change-Id: I3ae9df18f6340171189cb150f4fd82ec891037a3
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
| |
The previous approach to handle the broken DWARF emitter from clang
was to parse all CUs in the hope of finding one that contains the
requested address. This was done repeatedly for every address and
often had to go through most of the CUs in a binary, which is a
costly process. To reproduce, one can do the following:
- build perfparser with clang
- profile this build of perfparser while parsing some given workload
- inspect the profile and notice the large overhead from
find_fundie_by_pc
Instead, the first time dwfl_module_addrdie fails, we now query
all CUs of an ELF, build a custom range map and use that for lookups.
The initial build of this range map is roughly as costly as querying
two addresses via the old slow code path. As such, once we
query three or more addresses in a given ELF, this new approach
already yields better performance. Some numbers from my test:
Before:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
46576.925866 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
60,868 page-faults:u # 0.001 M/sec
144,506,727,842 cycles:u # 3.103 GHz
472,211,018,102 instructions:u # 3.27 insn per cycle
116,423,898,539 branches:u # 2499.605 M/sec
488,663,345 branch-misses:u # 0.42% of all branches
46.611237448 seconds time elapsed
After:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
17447.629837 task-clock:u (msec) # 0.995 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
60,142 page-faults:u # 0.003 M/sec
53,150,847,387 cycles:u # 3.046 GHz
158,503,454,039 instructions:u # 2.98 insn per cycle
38,711,860,671 branches:u # 2218.746 M/sec
190,209,418 branch-misses:u # 0.49% of all branches
17.543916999 seconds time elapsed
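The range map can be sketched as an ordered map keyed by range start, assuming non-overlapping, half-open ranges. CuRangeMap is a hypothetical name and the int cuId stands in for a Dwarf_Die; a CU with several disjoint ranges simply contributes several entries.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical range map: maps the start of each address range to its
// end and an opaque CU identifier, so a lookup is O(log n) instead of a
// linear scan over all CUs.
class CuRangeMap {
public:
    // Registers the half-open range [start, end) as belonging to cuId.
    void add(uint64_t start, uint64_t end, int cuId)
    {
        assert(start < end);
        m_ranges[start] = Entry{end, cuId};
    }

    // Returns the CU whose range contains addr, if any.
    std::optional<int> find(uint64_t addr) const
    {
        auto it = m_ranges.upper_bound(addr); // first range starting after addr
        if (it == m_ranges.begin())
            return std::nullopt;
        --it; // last range starting at or before addr
        if (addr < it->second.end)
            return it->second.cuId;
        return std::nullopt;
    }

private:
    struct Entry {
        uint64_t end;
        int cuId;
    };
    std::map<uint64_t, Entry> m_ranges;
};
```

In the approach described above, such a map would be built once per ELF on the first dwfl_module_addrdie failure and then queried for every further address.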
Change-Id: I9ca45ad7c8f77f91d0376f6dcae2f73c6e868404
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
Clang's DWARF emitter creates data that is not supported
by elfutils. Binutils has a fallback, which backward-cpp
implements too. This code here is based on the MIT licensed
backward-cpp code.
See also:
https://sourceware.org/bugzilla/show_bug.cgi?id=21247
https://sourceware.org/ml/elfutils-devel/2017-q2/msg00190.html
Before:
7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so
560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12
560070d371d0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
7fd4aab16ee2
7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560070d370fd
560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
After:
7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so
560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12
560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:342:0 std::log(long double) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3328:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3318:0 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:179:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:177:0 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1862:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1857:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1853:-1
560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1852:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
560070d36000 ../tests/test-clients/cpp-inlining/main.cpp:39:-1
560070d371d0 ../tests/test-clients/cpp-inlining/main.cpp:33:0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
7fd4aab16ee2
7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560070d370fd
560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining
Change-Id: Ie678685595311fae72f713a75d5b29ff959cd05d
Fixes: https://github.com/KDAB/hotspot/issues/51
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
Use the same approach as eu-addr2line to find inline frames,
i.e. use dwarf_getscopes_die instead of manually parsing the
DWARF tree. This allows us to get full inline backtraces for
both gcc 9 and clang 8. Previously, we only got partial inline
backtraces for gcc, and no inline traces for clang at all.
Before:
560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29
560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19
560b377139a0 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
7f7a30bb7ee2
7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560b37713cdd
560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
After:
560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50
560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29
560b37713000 /usr/include/c++/9.1.0/bits/random.tcc:3318:5 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713000 /usr/include/c++/9.1.0/bits/random.h:181:38
560b37713000 /usr/include/c++/9.1.0/bits/random.h:177:2 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713000 /usr/include/c++/9.1.0/bits/random.h:1862:19
560b37713000 /usr/include/c++/9.1.0/bits/random.h:1857:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713000 /usr/include/c++/9.1.0/bits/random.h:1853:51
560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19
560b377139a0 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
7f7a30bb7ee2
7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so
560b37713cdd
560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining
Change-Id: I5d8d5d3f29f659e092c2270c105ef48eae3d99c4
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
If the content size is 4, we actually parse more than that.
Take this into account instead of triggering a misleading warning:
QWARN : TestPerfData::testTracingData(stream stats)
warning: PerfData::processEvents[../../../../app/perfdata.cpp:197]: Event not fully parsed 66 4 2836
QWARN : TestPerfData::testTracingData(stream stats)
warning: unknown[unknown:0]: QIODevice::skip (QFile, ":/probe.data.stream"): Called with maxSize < 0
Change-Id: I43aab019a5f34a20e890c87f5d53120e977e293f
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
Change-Id: I8dca8eb4cc3aa728f033ff27679ef65b5a2fbee2
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
Similar to PerfRecordMmap, the PerfSampleId pid and tid are always
zero, while the struct-specific pid and tid carry the actual pid
and tid. By shadowing the pid and tid, the correct values get
forwarded in PerfUnwind::comm.
Change-Id: I84ef564076e26f779cb902eed603d0a500dd4825
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| |
We cannot read the position on those.
Change-Id: Ife92c3bc8537353b3c880c53b203d8007656e96f
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
| |
Now that we don't maintain backwards compatibility in either QtCreator
or hotspot anymore, we need to drop the old event IDs.
Change-Id: If8855aab72576c375146b53ff45d06c7343259cf
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| |
Detected by GCC9.
Change-Id: I30b1cc0bd91eecc973d2a64d3acd648a486c203e
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
| | |
Change-Id: I08cbfa6e80c8e75f756e22ea57e39c61f8678c49
| | |
Otherwise we receive junk. Apparently there are even ways to close stdin
from the outside.
Fixes: QTCREATORBUG-21971
Change-Id: I87978ca3d0001026f63fd948db8cd95da674a953
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
| | |
Change-Id: I72f461596d559347da48e983d2bba7ea2568f50d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Change-Id: I9c33cc643152675c97522ad5985c03e06ece6a7f
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Prevent infinite looping when we access a stale base map. This
could happen when we encounter bogus mmap lists as happens in
https://github.com/KDAB/hotspot/issues/164
Verify that the base map actually corresponds to the expected
elf map and only use that then. Otherwise don't use the base
map and continue with the original mapping, hoping for the best.
While this fixes the stack overflow of the initial bug report,
it doesn't solve the fundamental issue of dealing with broken
data... We'll have to figure that one out separately.
Fixes: https://github.com/KDAB/hotspot/issues/164
Change-Id: Iaebddbfbc891784a7fcc05df47aba761b75cc587
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Supports perf reports from MIPS targets.
MIPS/perf isn't really supported on the mainstream Linux kernel, but
there are a number of patches around which can be used to build
working perf support:
https://lkml.org/lkml/2016/4/1/162
This change just adds the definitions required by hotspot to
support this.
Change-Id: Ifa569c6e33e743c4d239b1ae0448b28aa026d051
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
by 0
Change-Id: Ic0e57b93991c03f9842b74d911697899ca35f7c0
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This fixes parsing of data files generated with `perf record -s`.
In such a case, we do have non-empty attributes but the samples don't
reference any attribute by id. Perf uses the first available attribute
in such cases, cf. `perf_evlist__id2evsel`.
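A sketch of the fallback, assuming sample ids and attribute indices are already parsed. attributeIndexForSample is a hypothetical helper, and its handling of a zero id only loosely mirrors perf_evlist__id2evsel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical lookup of the attribute a sample belongs to. idToIndex maps
// sample ids to attribute indices; attributeCount is the number of
// attributes in the file. A sample without a usable id (as with
// `perf record -s`) falls back to the first attribute.
std::optional<std::size_t> attributeIndexForSample(
        const std::unordered_map<uint64_t, std::size_t> &idToIndex,
        std::size_t attributeCount, std::optional<uint64_t> sampleId)
{
    if (attributeCount == 0)
        return std::nullopt;
    if (!sampleId || *sampleId == 0 || attributeCount == 1)
        return 0; // no usable id: use the first attribute
    const auto it = idToIndex.find(*sampleId);
    if (it == idToIndex.end())
        return std::nullopt;
    assert(it->second < attributeCount);
    return it->second;
}
```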
Change-Id: I1e5a9b59cd82dbe0d4eb1d140fb9b1e7768284fa
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
When a branch stack is available (i.e. `perf record -g lbr`),
then resolve the callchain stored therein. In such cases, the
fp callchain can potentially contain kernel frames, so still
resolve those but skip any user frames therein.
The branch stack then contains the user frames: The "to"
register is the callee, the "from" register is the caller. That
means the callchain can be built by combining the first entry's
"to" register (the tail) with all "from" registers. See also
`callchain__lbr_callstack_printf` in perf's `session.c` for more
information.
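The combination rule above can be sketched like this, assuming the branch stack has already been parsed into from/to pairs, most recent entry first. BranchEntry and lbrCallchain are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical representation of one branch stack entry.
struct BranchEntry {
    uint64_t from; // call site in the caller
    uint64_t to;   // entry point of the callee
};

// Builds the user-space callchain, innermost frame first: the first
// entry's "to" address (the tail), followed by every "from" address.
std::vector<uint64_t> lbrCallchain(const std::vector<BranchEntry> &branches)
{
    std::vector<uint64_t> chain;
    if (branches.empty())
        return chain;
    chain.reserve(branches.size() + 1);
    chain.push_back(branches.front().to);
    for (const BranchEntry &entry : branches)
        chain.push_back(entry.from);
    assert(chain.size() == branches.size() + 1);
    return chain;
}
```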
Change-Id: I0e060e158859eac6c130c073255af87c365679bf
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
When we record perf data files with `--call-graph [fp,lbr]`, we
only see mmap events for the executable code segments. These are
usually offset. Without the non-offset mmap event, we so far did not
report anything to dwfl. Now, we guess the base address and
use that before reporting the ELF to dwfl. This seems to work, but
could fail if there is a gap at the start.
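One plausible way to guess the base address, assuming the ELF file is mapped linearly from file offset 0 (exactly the assumption that breaks when there is a gap at the start). guessElfBase is a hypothetical helper, not the actual perfparser code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: guesses the load base of an ELF file from the
// mmap event of one of its executable segments, assuming the file is
// mapped linearly from offset 0 (no gap at the start).
std::optional<uint64_t> guessElfBase(uint64_t mapAddress, uint64_t fileOffset)
{
    if (fileOffset > mapAddress)
        return std::nullopt; // cannot be a linear mapping from offset 0
    return mapAddress - fileOffset;
}
```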
Change-Id: I9d837e8bed650d6574e84401da67beeddaf7ff57
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Fixes parsing of perf data files with LBR stacks, i.e. files
generated with `perf record --call-graph lbr`.
Change-Id: I7e75d655e8a09f484fe207298ce44a871c1f4903
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Additionally, print a warning. This uncovers an issue in the sample
parsing for perf data files with LBR callchains.
Change-Id: Ib87e54dc33013974691da155aae2037332bc82b8
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
The old code only checked for the folder layout in the ~/.debug
cache, where the files get cached by their build-id. If a file was
not cached that way, we failed to find the debug link file even when
it lies next to the actual file. This patch fixes that.
Change-Id: I7411a6d8803c1787a3c76db77aeec30400208565
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
On older Ubuntu, I sometimes see the case where we are looking
for a debug link file that is not cached. In such cases, we should
fall back to the build-id file, which contains the debug information.
Change-Id: Icd6d7bc13a9d3d74c3bb237366e04e7ffa6e195e
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This will remove the bogus error message
"Could not find ELF file for [kernel.kallsyms]_text". An ELF file for
this is not needed, as unwinding in the kernel is treated specially.
Fixes KDAB/hotspot#123
Change-Id: I79a31564dd971932aeafaad58715e702b253d600
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
kallsyms files with all addresses being 0 can happen. In this case,
ignore the mapping, as otherwise an address would resolve to a random
kernel symbol.
In addition, now perfparser shows a correct error message about being
unable to open the mapping.
Fixes KDAB/hotspot#117
Change-Id: Ib7ccf2f405f3a0a2149fa747175fb9feb11dfb07
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This gives a developer the chance to attach a debugger.
Change-Id: I80ca1246e06727f511a326ab87a7fc32c8462fd2
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Apparently some kallsyms report the obfuscated address as (null)
instead of all-zero. Handle this scenario instead of giving up
parsing the file.
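A sketch of kallsyms line parsing that accepts "(null)" as an address and treats it like zero, plus a check for the all-zero case described above. The names are hypothetical and the input is assumed to follow the "<hex address> <type> <name>" layout of /proc/kallsyms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct KallsymsEntry {
    uint64_t address = 0;
    char type = '?';
    std::string name;
};

// Parses one kallsyms line such as "ffffffff81000000 T _text".
// An address printed as "(null)" is treated like an all-zero address.
std::optional<KallsymsEntry> parseKallsymsLine(const std::string &line)
{
    std::istringstream stream(line);
    std::string addressField;
    KallsymsEntry entry;
    if (!(stream >> addressField >> entry.type >> entry.name))
        return std::nullopt;
    if (addressField == "(null)")
        return entry; // address stays 0
    try {
        std::size_t consumed = 0;
        entry.address = std::stoull(addressField, &consumed, 16);
        if (consumed != addressField.size())
            return std::nullopt; // trailing garbage in the address field
    } catch (const std::logic_error &) {
        return std::nullopt; // std::invalid_argument or std::out_of_range
    }
    return entry;
}

// True if no entry carries a real address (empty input included); such a
// mapping would resolve every address to a random symbol, so ignore it.
bool addressesHidden(const std::vector<KallsymsEntry> &entries)
{
    return std::all_of(entries.begin(), entries.end(),
                       [](const KallsymsEntry &e) { return e.address == 0; });
}
```

Module lines carry an extra "[module]" column, which this sketch simply ignores.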
Change-Id: I108c20d1845933a429ee5cd26217b707f6aac4cc
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This is required to find e.g. the debug file for ld-2.26.so
on Ubuntu 17.10. That one can only be found in the /usr/lib/debug
folder. The library itself resides in /lib/x86_64-linux-gnu/ld-2.26.so.
The debuglink is ld-2.26.so. The debug info file is found in
/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.26.so.
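A sketch of the candidate paths such a lookup might try, assuming an absolute binary path and the conventional search order (next to the binary, a .debug subfolder, then /usr/lib/debug plus the binary's folder). debugLinkCandidates is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper listing where a debuglink file might live: next to
// the binary, in a .debug subfolder, and below a global debug directory
// that mirrors the binary's own folder.
std::vector<std::string> debugLinkCandidates(const std::string &binaryPath,
                                             const std::string &debugLink,
                                             const std::string &globalDebugDir = "/usr/lib/debug")
{
    assert(!binaryPath.empty() && binaryPath.front() == '/'); // absolute path expected
    const std::string dir = binaryPath.substr(0, binaryPath.rfind('/'));
    return {
        dir + "/" + debugLink,
        dir + "/.debug/" + debugLink,
        globalDebugDir + dir + "/" + debugLink,
    };
}
```

For the ld-2.26.so example, the third candidate is the file that is actually found. The first candidate coincides with the binary itself here, so a real lookup has to skip it or check the CRC32 stored in the .gnu_debuglink section.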
Change-Id: I7c4c67873761a70a7b4f72f5adafea3023b08c12
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
The path is useful for filtering. It enables us to exclude frames
that point to system libraries for example.
Change-Id: Ic181e8498ffb237727d6176094bd3724a5d6ed0d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
First read the CPU, then the reserved rest/waste. This ensures
we get non-zero CPU IDs for switch events.
Change-Id: I54c60e8902a1fd3e9ab5dbd63b46734c551deacd
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This data is only meaningful (i.e. non-zero) when perf record
was run with --sample-cpu.
Change-Id: I36a7c7d9cba6a7e334ff89aacc62b05392e51b26
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
So far, only the context switches were buffered and then handled
in time-ordered fashion. Now we do this also for Command, ThreadStart,
ThreadEnd and Lost events. Additionally, we properly handle these
events when no more samples are available, which can happen for
applications that basically only sleep and don't trigger any
significant CPU load.
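The time-ordered handling can be sketched with a min-heap keyed by timestamp, assuming each event carries a time. BufferedEvent and TimeOrderedBuffer are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical buffered event; kind would distinguish Command,
// ThreadStart, ThreadEnd, Lost, context switch, ...
struct BufferedEvent {
    uint64_t time;
    int kind;
};

// Orders the priority queue so that the earliest event is on top.
struct LaterFirst {
    bool operator()(const BufferedEvent &a, const BufferedEvent &b) const
    {
        return a.time > b.time;
    }
};

class TimeOrderedBuffer {
public:
    void push(const BufferedEvent &event) { m_queue.push(event); }

    // Removes and returns, in time order, all events up to and including
    // limit, e.g. the timestamp of the next sample.
    std::vector<BufferedEvent> takeUntil(uint64_t limit)
    {
        std::vector<BufferedEvent> result;
        while (!m_queue.empty() && m_queue.top().time <= limit) {
            result.push_back(m_queue.top());
            m_queue.pop();
        }
        return result;
    }

    // Flushes everything, e.g. once no more samples will arrive.
    std::vector<BufferedEvent> takeAll() { return takeUntil(UINT64_MAX); }

private:
    std::priority_queue<BufferedEvent, std::vector<BufferedEvent>, LaterFirst> m_queue;
};
```

takeAll covers the case mentioned above where no more samples arrive. Events with equal timestamps come out in unspecified order.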
Change-Id: I4b1c8a1cfc91737a75a48f38dba04d6742f7c3a3
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
The attributes available in the features struct contain more
information. We used to overwrite them with the less interesting
data available in the attributes list.
Instead of doing this, check whether a given attribute was
encountered already. If so, don't report it a second time.
Change-Id: I308c3cf7ceeb8e6f0ca33de52ecfc1a63f62477d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This is required for files recorded with e.g.
perf record -e '{cycles,instructions}:uPS'
Otherwise, we only get garbage values for the cycles.
Change-Id: I65de37e81392b714c0dd65ddbefad64f4d2c353d
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Sometimes we encounter mmap events from perf that span a larger
region than dwfl will parse. In these situations, we would invalidate
the cache and then rebuild it exactly as before - i.e. we would
just waste time for no gain.
This patch checks whether this is happening and prevents the
cache invalidation to speed up the parse process. In one of my
cases this helps significantly:
Before:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
19553.281950 task-clock:u (msec) # 0.998 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
59,997 page-faults:u # 0.003 M/sec
58,567,488,383 cycles:u # 2.995 GHz
158,503,674,356 instructions:u # 2.71 insn per cycle
38,712,268,219 branches:u # 1979.835 M/sec
193,851,125 branch-misses:u # 0.50% of all branches
19.593726259 seconds time elapsed
After:
Performance counter stats for './perfparser --input perf.data --output /dev/null':
11775.076614 task-clock:u (msec) # 0.997 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
22,747 page-faults:u # 0.002 M/sec
35,463,053,591 cycles:u # 3.012 GHz
94,541,494,541 instructions:u # 2.67 insn per cycle
23,038,811,370 branches:u # 1956.574 M/sec
120,291,747 branch-misses:u # 0.52% of all branches
11.807795955 seconds time elapsed
Change-Id: I3efdf5941c5f66cb2d38fecc8ef824c6aef245da
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
We previously cleared the cache whenever an mmap event overlapped
a previously encountered mmap event. But often, these mmap events
were not (yet) used for dwfl. As such, we do not have anything cached
from them which would be invalidated now. So keep the rest of the cache
alive to speed up the process slightly for some scenarios. One of my
files now gets parsed in ~5s instead of ~6s before. I see that the
cache is now only cleared 500 times instead of 2500 times.
Change-Id: If9c29946f6fb12a6b6d64888fe34d8add8eaebb4
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Instead of using one large shared address cache that needs to be
cleared every now and then, cache the address location information
per elf file using relative addresses. This means we don't need to
clear this cache at all, as the lookup for a relative address into
a given elf file will always return the same data.
This greatly improves the performance under some situations where
the cache is cleared often. For one of my files, it drops the time
from 55s down to 18s.
Additionally, this patch paves the way to share (parts of) this cache
in PerfUnwind. Most of its contents are not PID specific anymore.
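A minimal sketch of such a per-ELF cache, assuming relative addresses are computed as absolute address minus the ELF's load base. PerElfAddressCache and CachedLocation are hypothetical names; a real entry would hold more than a symbol name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cached location; real data would include symbol, file,
// line, inline frames, ...
struct CachedLocation {
    std::string symbol;
};

// Location cache keyed per ELF file by module-relative address. Because
// the mapping from relative address to location is fixed for a given
// file, entries never have to be invalidated when mmaps change.
class PerElfAddressCache {
public:
    const CachedLocation *find(const std::string &elfPath, uint64_t relativeAddress) const
    {
        const auto elfIt = m_cache.find(elfPath);
        if (elfIt == m_cache.end())
            return nullptr;
        const auto it = elfIt->second.find(relativeAddress);
        return it == elfIt->second.end() ? nullptr : &it->second;
    }

    void insert(const std::string &elfPath, uint64_t relativeAddress, CachedLocation location)
    {
        m_cache[elfPath][relativeAddress] = std::move(location);
    }

private:
    std::unordered_map<std::string, std::unordered_map<uint64_t, CachedLocation>> m_cache;
};
```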
Change-Id: I79616fbb5c45a2543845df2d05d9936e49401627
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This enables accurate off-CPU profiling based on context switch data.
When perf data is recorded via
perf record --switch-events ...
then the switch in/out events of every thread are included. The time
differences allow us to measure how long a thread was off-CPU. Paired
with call stacks recorded for the switch-out event, we can attribute
the wait time to individual source code locations, i.e.:
perf record --switch-events -e sched:sched_switch
This then gives us event data such as:
lab_mandelbrot_ 6974 [003] 1888.320564: sched:sched_switch: lab_mandelbrot_:6974 [120] S ==> swapper/3:0 [120]
857516 __sched_text_start (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
857b0d schedule (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
85bdc7 schedule_hrtimeout_range_clock (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
85bdf3 schedule_hrtimeout_range (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
434d85 poll_schedule_timeout (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
436150 do_sys_poll (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
436aa0 sys_poll (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
85d0f7 entry_SYSCALL_64_fastpath (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
e3e80 __poll_nocancel (/usr/lib/libc-2.25.so)
...
lab_mandelbrot_ 6974 1888.320568: PERF_RECORD_SWITCH OUT
lab_mandelbrot_ 6974 1888.320593: PERF_RECORD_SWITCH IN
Note how you don't get the sched:sched_switch trace event for the
switch-in event, as that happens in the context of another process.
In addition to enabling this kind of off-CPU profiling, we can use
the switch events to create a CPU usage timeline, similar to what
other tools offer. I.e. a thread that is scheduled consumes one CPU
for the given timespan. Paired with other threads and based on the
zoom level of the timeline, this then leads to fractional CPU loads.
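Pairing the switch events can be sketched as follows, assuming nanosecond timestamps and at most one pending switch-out per thread. OffCpuTracker is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical tracker turning PERF_RECORD_SWITCH OUT/IN pairs into
// off-CPU durations per thread. Timestamps are in nanoseconds.
class OffCpuTracker {
public:
    void switchOut(int tid, uint64_t time) { m_switchedOutAt[tid] = time; }

    // Returns how long the thread was off-CPU, or 0 if no matching
    // switch-out was seen (e.g. the recording started while it slept).
    uint64_t switchIn(int tid, uint64_t time)
    {
        const auto it = m_switchedOutAt.find(tid);
        if (it == m_switchedOutAt.end() || time < it->second)
            return 0;
        const uint64_t offCpuTime = time - it->second;
        m_switchedOutAt.erase(it);
        return offCpuTime;
    }

private:
    std::unordered_map<int, uint64_t> m_switchedOutAt;
};
```

With the timestamps from the example output above, thread 6974 was off-CPU for 1888.320593 s - 1888.320568 s = 25 µs.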
Change-Id: Ia2f4d07692e68a4c20244be6327791c6ceaed85c
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This is a workaround until we get this functionality into elfutils
upstream. The symbols are looked up manually, which requires quite
some jumping around the ELF file:
- first map the address into the module space
- check whether the address lies in the .plt section (by name)
- if so, find the index of the .plt entry we just found
- then find the dynamic section and dynamic string table
- in the dynamic section, find the address of the GOT via DT_PLTGOT
- find the section containing the GOT address, this is the GOT
- find the GOT entry corresponding to the .plt index, offset by 2
- find the REL/RELA section containing the entry matching the address
of the GOT entry
- find the (dynamic) symbol for the REL/RELA entry
- find the string for that symbol, demangle it, and append @plt
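The index arithmetic of the middle steps can be sketched as follows, assuming the section addresses and the relocations have already been extracted from the ELF file. The defaults describe the common x86-64 layout (16-byte .plt entries, 8-byte GOT slots, three reserved GOT slots); PltLayout, Relocation and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, already-extracted layout data (x86-64 defaults assumed).
struct PltLayout {
    uint64_t pltStart = 0;
    uint64_t pltSize = 0;
    uint64_t pltEntrySize = 16;
    uint64_t gotStart = 0; // address found via DT_PLTGOT
    uint64_t gotEntrySize = 8;
};

// Hypothetical decoded REL/RELA entry: the GOT slot it patches and the
// name of the dynamic symbol it refers to.
struct Relocation {
    uint64_t offset;
    std::string symbol;
};

// Maps an address inside .plt to the GOT slot of its entry. The .plt
// index counts the PLT0 stub as entry 0, so entry n uses GOT slot n + 2.
std::optional<uint64_t> gotSlotForPltAddress(const PltLayout &layout, uint64_t address)
{
    if (address < layout.pltStart || address >= layout.pltStart + layout.pltSize)
        return std::nullopt; // not in .plt
    const uint64_t index = (address - layout.pltStart) / layout.pltEntrySize;
    return layout.gotStart + (index + 2) * layout.gotEntrySize;
}

// Finds the relocation patching the given GOT slot and returns
// "<symbol>@plt" (demangling is left out of this sketch).
std::optional<std::string> pltSymbol(const std::vector<Relocation> &relocations, uint64_t gotSlot)
{
    for (const Relocation &relocation : relocations) {
        if (relocation.offset == gotSlot)
            return relocation.symbol + "@plt";
    }
    return std::nullopt;
}
```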
Change-Id: I67d05f1c728b943317853bb98fa96dba48b58a3c
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
Fixes issue where we mapped all .plt entries to the same entry.
Change-Id: Iaa5f70455ffd4749793c295b7b79101d479620c8
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| | |
This allows us to reconstruct the sample period for data files
generated with the following command:
perf record -c 100000 ...
Without the explicit `-P` flag, the above data file has no periods
for the samples. By forwarding the event attribute configuration,
clients can reconstruct the data as needed.
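A sketch of the reconstruction, modeled on perf_event_attr, where sample_period and sample_freq share a union and the freq bit selects between them. EventAttribute and samplePeriod are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical subset of an event attribute: samplePeriodOrFreq holds
// the period when freq is false and the frequency when freq is true.
struct EventAttribute {
    bool freq = false;
    uint64_t samplePeriodOrFreq = 0;
};

// Returns the period for one sample: the per-sample value if it was
// recorded (PERF_SAMPLE_PERIOD), otherwise the fixed period from the
// attribute, or nullopt in frequency mode where it cannot be recovered.
std::optional<uint64_t> samplePeriod(const EventAttribute &attribute,
                                     std::optional<uint64_t> recordedPeriod)
{
    if (recordedPeriod)
        return recordedPeriod;
    if (!attribute.freq)
        return attribute.samplePeriodOrFreq;
    return std::nullopt;
}
```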
Change-Id: I098e1f14cc66e97daaceffb288080868693c2d95
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| |
This is not as good as the fallback that binutils has. But any proper
solution should be put into elfutils itself. See also:
http://www.mail-archive.com/elfutils-devel@sourceware.org/msg00019.html
Change-Id: Ief4d6450f97f0e25874a6163f442a8c1c257748e
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
| |
Change-Id: Ic85106d7069e55ccb2ab887d5b308b23807920a8
Reviewed-by: Eike Ziller <eike.ziller@qt.io>
| | |
Change-Id: Ie74691e3153027bc951d3d469adb8ff58ae5c51c
| | |
Change-Id: Ia29e55f9c6e73874de4b48d3dfa07741a1be00ae
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
Reviewed-by: Eike Ziller <eike.ziller@qt.io>