path: root/app
Commit message (Collapse)AuthorAgeFilesLines
* Mark CuDieRange as movable typeMilian Wolff2019-09-301-1/+3
| | | | | Change-Id: I11984e87469b0b13caae3a0e0c9258d93d21193a Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Properly map discontiguous CU DIE rangesMilian Wolff2019-09-301-28/+11
| | | | | | | | | Not all CU DIEs have a contiguous range, esp. when compiled with -ffunction-sections or when the linker deduplicates equal functions across compilation units. Change-Id: Ie22939e550b4b502a16c6e266740a885407f50f1 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
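A minimal sketch of how a lookup over discontiguous ranges can work, assuming the ranges of different CUs never overlap. `AddrRange`, `CuRangeMap` and the integer `cuIndex` are hypothetical names for illustration only; they are not the types used in perfparser.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration types, not the actual perfparser classes.
struct AddrRange {
    std::uint64_t low;   // inclusive start address
    std::uint64_t high;  // exclusive end address
    int cuIndex;         // which compilation unit owns this range
};

class CuRangeMap {
public:
    // A single CU may register any number of disjoint ranges.
    void add(std::uint64_t low, std::uint64_t high, int cuIndex)
    {
        m_ranges.push_back({low, high, cuIndex});
        m_sorted = false;
    }

    // Returns the index of the CU covering addr, or -1 if no range matches.
    int find(std::uint64_t addr)
    {
        if (!m_sorted) {
            std::sort(m_ranges.begin(), m_ranges.end(),
                      [](const AddrRange &a, const AddrRange &b) { return a.low < b.low; });
            m_sorted = true;
        }
        // Find the first range starting after addr; the candidate precedes it.
        auto it = std::upper_bound(m_ranges.begin(), m_ranges.end(), addr,
                                   [](std::uint64_t a, const AddrRange &r) { return a < r.low; });
        if (it == m_ranges.begin())
            return -1;
        --it;
        return addr < it->high ? it->cuIndex : -1;
    }

private:
    std::vector<AddrRange> m_ranges;
    bool m_sorted = true;
};
```

Storing one entry per range instead of one min/max span per CU means an address that falls into a gap between two ranges of the same CU is not attributed to that CU by mistake.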
* Simplify dwfl_module_addrdie fall-back when .debug_aranges is missingMilian Wolff2019-09-262-127/+46
| | | | | | | | | | | | | | | | | | | | | | We do not need to recurse the full DIE tree to build the mapping. Instead, it's enough to find the CU DIE and its bias only. To do that, we can simply look at the {low,high} PC value of the CU DIEs. If either of these two values is missing, we look at the dwarf_ranges for the CU DIE and use the min/max PC values of those to set the CU DIE range. In my tests this produces the same results as before. Note how the inline frames etc. are later on found via dwarf_getscopes{,_die}. Also, when we do have a .debug_aranges section, then the call to dwfl_module_addrdie will return us also a CU DIE, not an inner DIE. The simplified code is easier to understand, will consume less memory and should also be faster to run. This patch is based on feedback by David Blakie. Change-Id: I97767e99b2957aa430b65589b32ec66a9479ff7d Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
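The fallback described above can be sketched as follows. This is a simplified model under stated assumptions: `CuInfo` is a hypothetical stand-in for the values that would be read from a CU DIE (low PC, high PC, and its list of ranges), and `cuBounds` is an illustrative helper, not a function from perfparser or elfutils.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the values read from one CU DIE.
struct CuInfo {
    bool hasLowPc = false;
    bool hasHighPc = false;
    std::uint64_t lowPc = 0;
    std::uint64_t highPc = 0;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> ranges; // [start, end)
};

struct PcBounds {
    bool valid;
    std::uint64_t low;
    std::uint64_t high;
};

// Prefer the explicit {low,high} PC pair; if either is missing,
// fall back to the min/max over the CU's ranges.
PcBounds cuBounds(const CuInfo &cu)
{
    if (cu.hasLowPc && cu.hasHighPc)
        return PcBounds{true, cu.lowPc, cu.highPc};
    if (cu.ranges.empty())
        return PcBounds{false, 0, 0};

    std::uint64_t low = cu.ranges.front().first;
    std::uint64_t high = cu.ranges.front().second;
    for (const auto &range : cu.ranges) {
        low = std::min(low, range.first);
        high = std::max(high, range.second);
    }
    return PcBounds{true, low, high};
}
```

Note that a min/max span over-approximates a CU whose ranges are not contiguous.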
* Fix build with namespaced QtChristian Stenger2019-09-021-0/+6
| | | | | Change-Id: I03147ae4a7b17584add02006a5a2281006dbae25 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Fix typo in perfsymboltable.cppUlf Hermann2019-08-301-1/+1
| | | | | | | Amends commit 3508f6297a829027694a8d141945797e4923c748. Change-Id: I3ae9df18f6340171189cb150f4fd82ec891037a3 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
* Speed up perfparser when DWARF ranges are broken/missing in ELFsMilian Wolff2019-08-162-72/+161
| The previous approach to handle the broken DWARF emitter from clang was to
| parse all CUs in the hope to find one that contains the requested address.
| This was done repeatedly for every address and often had to go through
| most of the CUs in a binary, which is a costly process.
|
| To reproduce, one can do the following:
|
| - build perfparser with clang
| - profile this build of perfparser while parsing some given workload
| - inspect the profile and notice the large overhead from find_fundie_by_pc
|
| Instead, the first time dwfl_module_addrdie fails, we now query for all
| CUs of an ELF and build a custom range map and use that for lookup. The
| initial build of this range map is roughly as costly as querying for two
| addresses using the old slow code path. As such, once we query three or
| more addresses in a given ELF this new approach is already yielding
| better performance.
|
| Some numbers from my test:
|
| Before:
|
|  Performance counter stats for './perfparser --input perf.data --output /dev/null':
|
|       46576.925866      task-clock:u (msec)       #    0.999 CPUs utilized
|                  0      context-switches:u        #    0.000 K/sec
|                  0      cpu-migrations:u          #    0.000 K/sec
|             60,868      page-faults:u             #    0.001 M/sec
|    144,506,727,842      cycles:u                  #    3.103 GHz
|    472,211,018,102      instructions:u            #    3.27  insn per cycle
|    116,423,898,539      branches:u                # 2499.605 M/sec
|        488,663,345      branch-misses:u           #    0.42% of all branches
|
|       46.611237448 seconds time elapsed
|
| After:
|
|  Performance counter stats for './perfparser --input perf.data --output /dev/null':
|
|       17447.629837      task-clock:u (msec)       #    0.995 CPUs utilized
|                  0      context-switches:u        #    0.000 K/sec
|                  0      cpu-migrations:u          #    0.000 K/sec
|             60,142      page-faults:u             #    0.003 M/sec
|     53,150,847,387      cycles:u                  #    3.046 GHz
|    158,503,454,039      instructions:u            #    2.98  insn per cycle
|     38,711,860,671      branches:u                # 2218.746 M/sec
|        190,209,418      branch-misses:u           #    0.49% of all branches
|
|       17.543916999 seconds time elapsed
|
| Change-Id: I9ca45ad7c8f77f91d0376f6dcae2f73c6e868404
| Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Add fallback to find DIE for code generated by clangMilian Wolff2019-08-161-33/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang's DWARF emitter creates data that is not supported by elfutils. Binutils has a fallback, which backward-cpp implements too. This code here is based on the MIT licensed backward-cpp code. See also: https://sourceware.org/bugzilla/show_bug.cgi?id=21247 https://sourceware.org/ml/elfutils-devel/2017-q2/msg00190.html Before: 7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so 560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12 560070d371d0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 7fd4aab16ee2 7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560070d370fd 560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining After: 7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so 560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12 560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:342:0 std::log(long double) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3328:-1 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3318:0 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:179:-1 560070d36000 
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:177:0 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1862:-1 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1857:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1853:-1 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1852:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 ../tests/test-clients/cpp-inlining/main.cpp:39:-1 560070d371d0 ../tests/test-clients/cpp-inlining/main.cpp:33:0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 7fd4aab16ee2 7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560070d370fd 560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining Change-Id: Ie678685595311fae72f713a75d5b29ff959cd05d Fixes: 
https://github.com/KDAB/hotspot/issues/51 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Better support for inlined frames in backtracesMilian Wolff2019-08-162-49/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the same approach as eu-addr2line to find inline frames. I.e. use dwarf_getscopes_die instead of manually parsing the dwarf tree. This allows us to get full inline backtraces for both, gcc 9 and clang 8. Previously, we only got partial inline backtraces for gcc, and no inline traces for clang at all. similar to how eu-addr2line Before: 560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29 560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19 560b377139a0 
/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 7f7a30bb7ee2 7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560b37713cdd 560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining After: 560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29 560b37713000 /usr/include/c++/9.1.0/bits/random.tcc:3318:5 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713000 /usr/include/c++/9.1.0/bits/random.h:181:38 560b37713000 /usr/include/c++/9.1.0/bits/random.h:177:2 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 
2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713000 /usr/include/c++/9.1.0/bits/random.h:1862:19 560b37713000 /usr/include/c++/9.1.0/bits/random.h:1857:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713000 /usr/include/c++/9.1.0/bits/random.h:1853:51 560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19 560b377139a0 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 7f7a30bb7ee2 7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560b37713cdd 560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining Change-Id: I5d8d5d3f29f659e092c2270c105ef48eae3d99c4 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Don't warn when parsing tracing data with odd content sizeMilian Wolff2019-08-131-2/+4
| If the content size is 4, we actually parse more than that. Take this
| into account instead of triggering a misleading warning:
|
| QWARN  : TestPerfData::testTracingData(stream stats) warning: PerfData::processEvents[../../../../app/perfdata.cpp:197]: Event not fully parsed 66 4 2836
| QWARN  : TestPerfData::testTracingData(stream stats) warning: unknown[unknown:0]: QIODevice::skip (QFile, ":/probe.data.stream"): Called with maxSize < 0
|
| Change-Id: I43aab019a5f34a20e890c87f5d53120e977e293f
| Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Ensure we always set a valid pid on locations we sendMilian Wolff2019-08-132-0/+2
| | | | | Change-Id: I8dca8eb4cc3aa728f033ff27679ef65b5a2fbee2 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Forward the non-zero pid and tid members of PerfRecordCommMilian Wolff2019-06-251-0/+5
| | | | | | | | | | Similar to PerfRecordMmap, the PerfSampleId pid and tid are always zero, while the struct-specific pid and tid carry the actual pid and tid. By shadowing the pid and tid, the correct values get forwarded in PerfUnwind::comm. Change-Id: I84ef564076e26f779cb902eed603d0a500dd4825 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Don't try to "fix" partially read events on sequential devices [4.10]Ulf Hermann2019-05-161-5/+8
| | | | | | | We cannot read the position on those. Change-Id: Ife92c3bc8537353b3c880c53b203d8007656e96f Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
* Drop backwards compatibility for event typesUlf Hermann2019-05-071-4/+0
| | | | | | | | Now that we don't maintain backwards compatibility in either QtCreator or hotspot anymore, we need to drop the old event IDs. Change-Id: If8855aab72576c375146b53ff45d06c7343259cf Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* PerfParser: Fix signedness mismatch on comparisonOrgad Shaneh2019-05-072-5/+5
| | | | | | | Detected by GCC9. Change-Id: I30b1cc0bd91eecc973d2a64d3acd648a486c203e Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Merge remote-tracking branch 'origin/4.9'Ulf Hermann2019-05-061-2/+6
|\ | | | | | | Change-Id: I08cbfa6e80c8e75f756e22ea57e39c61f8678c49
| * On Windows, set stdin to binary [4.9]Ulf Hermann2019-02-121-2/+6
| | | | | | | | | | | | | | | | | | Otherwise we receive junk. Apparently there are even ways to close stdin from the outside. Fixes: QTCREATORBUG-21971 Change-Id: I87978ca3d0001026f63fd948db8cd95da674a953 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
* | Bump the version number to 4.10Ulf Hermann2019-05-031-1/+1
| | | | | | | | | | Change-Id: I72f461596d559347da48e983d2bba7ea2568f50d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Output all values in hex for better comparability with tools like eu-readelfMilian Wolff2019-05-031-5/+5
| | | | | | | | | | Change-Id: I9c33cc643152675c97522ad5985c03e06ece6a7f Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Validate base mapping before using itMilian Wolff2019-05-031-2/+6
| | Prevent infinite looping when we access a stale base map. This could
| | happen when we encounter bogus mmap lists, as happens in
| | https://github.com/KDAB/hotspot/issues/164
| |
| | Verify that the base map actually corresponds to the expected elf map
| | and only use it in that case. Otherwise, don't use the base map and
| | continue with the original mapping, hoping for the best.
| |
| | While this fixes the stack overflow of the initial bug report, it does
| | not solve the fundamental issue of dealing with broken data. We'll have
| | to figure that one out separately.
| |
| | Fixes: https://github.com/KDAB/hotspot/issues/164
| | Change-Id: Iaebddbfbc891784a7fcc05df47aba761b75cc587
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Add initial MIPS supportLuke Diamand2019-05-032-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Supports perf reports from MIPS targets. MIPS/perf isn't really supported on the mainstream Linux kernel, but there are a number of patches around which can be used to build working perf support: https://lkml.org/lkml/2016/4/1/162 This change just adds the definitions required by hotspot to support this. Change-Id: Ifa569c6e33e743c4d239b1ae0448b28aa026d051 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Check for 0 sized section header plt in fakeSymbolFromSection to prevent div by 0Andrew Somerville2019-05-031-1/+1
| | Change-Id: Ic0e57b93991c03f9842b74d911697899ca35f7c0
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Always use the first attributes as fallbackMilian Wolff2019-05-031-6/+4
| | | | | | | | | | | | | | | | | | | | This fixes parsing of data files generated with `perf record -s`. In such a case, we do have non-empty attributes but the samples don't reference any attribute by id. Perf uses the first available attribute in such cases, cf. `perf_evlist__id2evsel`. Change-Id: I1e5a9b59cd82dbe0d4eb1d140fb9b1e7768284fa Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
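A small sketch of the fallback rule, assuming sample ids are resolved through an id-to-attribute table. The function name and the use of plain indices are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Returns the index of the attribute a sample belongs to.
// If the sample id is unknown, fall back to the first attribute (index 0),
// mirroring what the commit message says perf itself does.
// Returns -1 only when there are no attributes at all.
int attributeForSample(const std::unordered_map<std::uint64_t, int> &idToAttribute,
                       int attributeCount, std::uint64_t sampleId)
{
    const auto it = idToAttribute.find(sampleId);
    if (it != idToAttribute.end())
        return it->second;
    return attributeCount > 0 ? 0 : -1;
}
```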
* | Also resolve callchain stored in branch stack, if availableMilian Wolff2019-05-032-16/+43
| | When a branch stack is available (i.e. `perf record --call-graph lbr`),
| | then resolve the callchain stored therein. In such cases, the fp
| | callchain can potentially contain kernel frames, so still resolve those
| | but skip any user frames therein.
| |
| | The branch stack then contains the user frames: The "to" register is
| | the callee, the "from" register is the caller. That means the callchain
| | can be built by combining the first entry's "to" register (the tail)
| | with all "from" registers.
| |
| | See also `callchain__lbr_callstack_printf` in perf's `session.c` for
| | more information.
| |
| | Change-Id: I0e060e158859eac6c130c073255af87c365679bf
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
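The combination rule for the branch stack can be sketched like this. `BranchEntry` here is a hypothetical two-field struct, and the innermost-frame-first output order is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one branch stack record.
struct BranchEntry {
    std::uint64_t from; // caller side of the branch
    std::uint64_t to;   // callee side of the branch
};

// Build a callchain (innermost frame first, by assumption) from a branch
// stack: the first entry's "to" address, followed by every "from" address.
std::vector<std::uint64_t> callchainFromBranchStack(const std::vector<BranchEntry> &stack)
{
    std::vector<std::uint64_t> chain;
    if (stack.empty())
        return chain;

    chain.reserve(stack.size() + 1);
    chain.push_back(stack.front().to);
    for (const BranchEntry &entry : stack)
        chain.push_back(entry.from);
    return chain;
}
```

A stack of N branch entries therefore yields N + 1 frames.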
* | Try to guess offset base address before reporting elf to dwflMilian Wolff2019-05-031-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | When we record perf data files with `--call-graph [fp,lbr]`, we only see mmap events for the executable code segments. These are usually offset. Without the non-offset mmap event, we would not report anything to dwfl so far. Now, we guess the base address and use that before reporting the elf to dwfl. This seems to work, but could fail once a gap would exist at the start. Change-Id: I9d837e8bed650d6574e84401da67beeddaf7ff57 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Correctly parse BranchFlags in BranchEntryMilian Wolff2019-05-032-0/+12
| | | | | | | | | | | | | | | | Fixes parsing of perf data files with LBR stacks, i.e. files generated with `perf record --call-graph lbr`. Change-Id: I7e75d655e8a09f484fe207298ce44a871c1f4903 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Skip unused data when we fail to fully parse an eventMilian Wolff2019-05-031-0/+8
| | | | | | | | | | | | | | | | Additionally, print a warning. This uncovers an issue in the sample parsing for perf data files with LBR callchains. Change-Id: Ib87e54dc33013974691da155aae2037332bc82b8 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Also check for debug link files next to the executable directlyMilian Wolff2019-05-031-1/+20
| | | | | | | | | | | | | | | | | | | | The old code only checked for the folder layout in the ~/.debug cache, where the files get cached by their build-id. If that does not happen, then we failed to find the debug link file when it lies next to the actual file. This patch fixes that. Change-Id: I7411a6d8803c1787a3c76db77aeec30400208565 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Use build-id file for debug information as fall-backMilian Wolff2019-05-031-0/+6
| | On older Ubuntu, I sometimes see the case where we are looking for a
| | debug link file that is not cached. In such cases, we should fall back
| | to the build-id file, which contains the debug information.
| |
| | Change-Id: Icd6d7bc13a9d3d74c3bb237366e04e7ffa6e195e
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Treat [kernel.kallsyms]_text as a special sectionThomas McGuire2019-05-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | This will remove the bogus error message "Could not find ELF file for [kernel.kallsyms]_text". An ELF file for this is not needed, as unwinding in the kernel is treated specially. Fixes KDAB/hotspot#123 Change-Id: I79a31564dd971932aeafaad58715e702b253d600 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Ignore kallsyms mappings with address 0Thomas McGuire2019-05-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | kallsyms files with all addresses being 0 can happen. In this case, ignore the mapping, as otherwise an address would resolve to a random kernel symbol. In addition, now perfparser shows a correct error message about being unable to open the mapping. Fixes KDAB/hotspot#117 Change-Id: Ib7ccf2f405f3a0a2149fa747175fb9feb11dfb07 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Halt perfparser when PERFPARSER_DEBUG_WAIT is setThomas McGuire2019-05-031-1/+14
| | | | | | | | | | | | | | This gives a developer the chance to attach a debugger. Change-Id: I80ca1246e06727f511a326ab87a7fc32c8462fd2 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Support (null) as address in kallsymsMilian Wolff2019-05-031-1/+1
| | | | | | | | | | | | | | | | | | Apparently some kallsyms report the obfuscated address as (null) instead of all-zero. Handle this scenario instead of giving up parsing the file. Change-Id: I108c20d1845933a429ee5cd26217b707f6aac4cc Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
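A hedged sketch of tolerant kallsyms line parsing, assuming the usual `address type name` layout of a kallsyms line. `KallsymsEntry` and `parseKallsymsLine` are hypothetical names; an optional trailing module column is ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical result type for one parsed line.
struct KallsymsEntry {
    std::uint64_t address = 0;
    char type = 0;
    std::string name;
};

// Parses "<address> <type> <name>". An address of "(null)" is treated
// like an all-zero (obfuscated) address instead of aborting the parse.
bool parseKallsymsLine(const std::string &line, KallsymsEntry &out)
{
    std::istringstream in(line);
    std::string address;
    if (!(in >> address >> out.type >> out.name))
        return false;

    if (address == "(null)") {
        out.address = 0;
        return true;
    }

    char *end = nullptr;
    out.address = std::strtoull(address.c_str(), &end, 16);
    // Reject addresses that are not fully hexadecimal.
    return end != address.c_str() && *end == '\0';
}
```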
* | Add fallback search in /usr/lib/debug for debug info fileMilian Wolff2019-05-031-7/+22
| | | | | | | | | | | | | | | | | | | | | | This is required to find e.g. the debug file for ld-2.26.so on Ubuntu 17.10. That one can only be found in the /usr/lib/debug folder. It itself resides in /lib/x86_64-linux-gnu/ld-2.26.so. The debuglink is ld-2.26.so. The debug info file is found in /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.26.so. Change-Id: I7c4c67873761a70a7b4f72f5adafea3023b08c12 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
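The path construction from the example above can be sketched as follows. `systemDebugPath` is a hypothetical helper that only builds the candidate path; checking whether the file exists is left out.

```cpp
#include <cassert>
#include <string>

// Builds the candidate location of a debug info file below /usr/lib/debug:
// /usr/lib/debug + <directory of the ELF file> + / + <debug link name>.
std::string systemDebugPath(const std::string &elfPath, const std::string &debugLink)
{
    const std::string::size_type slash = elfPath.rfind('/');
    const std::string directory =
        slash == std::string::npos ? std::string() : elfPath.substr(0, slash);
    return "/usr/lib/debug" + directory + "/" + debugLink;
}
```

With the values from the commit message, `systemDebugPath("/lib/x86_64-linux-gnu/ld-2.26.so", "ld-2.26.so")` yields `/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.26.so`.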
* | Forward the path to binary for a given symbolMilian Wolff2019-05-034-18/+23
| | | | | | | | | | | | | | | | The path is useful for filtering. It enables us to exclude frames that point to system libraries for example. Change-Id: Ic181e8498ffb237727d6176094bd3724a5d6ed0d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Correctly parse SAMPLE_CPU in PerfSampleIdMilian Wolff2019-05-031-1/+1
| | | | | | | | | | | | | | | | First read the CPU, then the reserved rest/waste. This ensures we get non-zero CPU IDs for switch events. Change-Id: I54c60e8902a1fd3e9ab5dbd63b46734c551deacd Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
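A minimal sketch of the read order, assuming the Linux perf sample layout in which the CPU field is a 32-bit CPU id followed by 32 reserved bits. `readSampleCpu` is a hypothetical helper; it reads in host byte order and does no byte swapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads the CPU id first, then skips the reserved 32 bits that follow it.
// Advances offset by the full 8 bytes of the field.
std::uint32_t readSampleCpu(const unsigned char *data, std::size_t &offset)
{
    std::uint32_t cpu = 0;
    std::memcpy(&cpu, data + offset, sizeof(cpu));
    offset += sizeof(cpu);
    offset += sizeof(std::uint32_t); // reserved, ignored
    return cpu;
}
```

Reading the two halves in the wrong order would return the reserved word, which is zero, as the CPU id.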
* | Forward the CPU on which an event occurredMilian Wolff2019-05-033-7/+15
| | | | | | | | | | | | | | | | This data is only meaningful (i.e. non-zero) when perf record was run with --sample-cpu. Change-Id: I36a7c7d9cba6a7e334ff89aacc62b05392e51b26 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Send task events in time-ordered fashionMilian Wolff2019-05-022-64/+76
| | So far, only the context switches were buffered and then handled in
| | time-ordered fashion. Now we do this also for Command, ThreadStart,
| | ThreadEnd and Lost events.
| |
| | Additionally, we properly handle these events when no more samples are
| | available, which can happen for applications that basically only sleep
| | and don't trigger any significant CPU load.
| |
| | Change-Id: I4b1c8a1cfc91737a75a48f38dba04d6742f7c3a3
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
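One way to picture the buffering is sketched below, under the assumption that events are released whenever a sample with a later timestamp arrives and that everything left over is flushed at the end of the input. `TaskEvent` and `TaskEventBuffer` are hypothetical names, not perfparser classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical buffered event: a timestamp plus a label for its kind.
struct TaskEvent {
    std::uint64_t time;
    std::string kind; // e.g. "Command", "ThreadStart", "ContextSwitch"
};

class TaskEventBuffer {
public:
    void add(const TaskEvent &event) { m_events.push_back(event); }

    // Removes and returns all buffered events with time <= until,
    // sorted by time. Later events stay in the buffer.
    std::vector<TaskEvent> flushUntil(std::uint64_t until)
    {
        std::stable_sort(m_events.begin(), m_events.end(),
                         [](const TaskEvent &a, const TaskEvent &b) { return a.time < b.time; });
        const auto end = std::upper_bound(
            m_events.begin(), m_events.end(), until,
            [](std::uint64_t t, const TaskEvent &e) { return t < e.time; });
        std::vector<TaskEvent> ready(m_events.begin(), end);
        m_events.erase(m_events.begin(), end);
        return ready;
    }

    // Called when no more samples are available: emit everything left.
    std::vector<TaskEvent> flushAll()
    {
        return flushUntil(std::numeric_limits<std::uint64_t>::max());
    }

private:
    std::vector<TaskEvent> m_events;
};
```

The final `flushAll` step corresponds to the case of a mostly sleeping application, where task events exist but no further samples arrive to trigger a flush.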
* | Do not override previous attributesMilian Wolff2019-05-021-9/+22
| | The attributes available in the features struct contain more
| | information. We used to overwrite that with the less-interesting data
| | available in the attributes list. Instead of doing this, check whether
| | a given attribute was encountered already. If so, don't report it a
| | second time.
| |
| | Change-Id: I308c3cf7ceeb8e6f0ca33de52ecfc1a63f62477d
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Forward all costs for grouped sampleMilian Wolff2019-05-023-5/+18
| | This is required for files recorded with e.g.
| |
| |     perf record -e '{cycles,instructions}:uPS'
| |
| | Otherwise, we only get garbage values for the cycles.
| |
| | Change-Id: I65de37e81392b714c0dd65ddbefad64f4d2c353d
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Prevent useless cache invalidationMilian Wolff2019-05-021-0/+8
| | Sometimes we encounter mmap events from perf that span a larger region
| | than dwfl will parse. In these situations, we would invalidate the
| | cache and then rebuild it exactly as before - i.e. we would just waste
| | time for no gain. This patch checks whether this is happening and
| | prevents the cache invalidation to speed up the parse process.
| |
| | In one of my cases this helps significantly:
| |
| | Before:
| |
| |  Performance counter stats for './perfparser --input perf.data --output /dev/null':
| |
| |       19553.281950      task-clock:u (msec)       #    0.998 CPUs utilized
| |                  0      context-switches:u        #    0.000 K/sec
| |                  0      cpu-migrations:u          #    0.000 K/sec
| |             59,997      page-faults:u             #    0.003 M/sec
| |     58,567,488,383      cycles:u                  #    2.995 GHz
| |    158,503,674,356      instructions:u            #    2.71  insn per cycle
| |     38,712,268,219      branches:u                # 1979.835 M/sec
| |        193,851,125      branch-misses:u           #    0.50% of all branches
| |
| |       19.593726259 seconds time elapsed
| |
| | After:
| |
| |  Performance counter stats for './perfparser --input perf.data --output /dev/null':
| |
| |       11775.076614      task-clock:u (msec)       #    0.997 CPUs utilized
| |                  0      context-switches:u        #    0.000 K/sec
| |                  0      cpu-migrations:u          #    0.000 K/sec
| |             22,747      page-faults:u             #    0.002 M/sec
| |     35,463,053,591      cycles:u                  #    3.012 GHz
| |     94,541,494,541      instructions:u            #    2.67  insn per cycle
| |     23,038,811,370      branches:u                # 1956.574 M/sec
| |        120,291,747      branch-misses:u           #    0.52% of all branches
| |
| |       11.807795955 seconds time elapsed
| |
| | Change-Id: I3efdf5941c5f66cb2d38fecc8ef824c6aef245da
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Only invalidate symbol cache when a previously used elf gets invalidatedMilian Wolff2019-05-023-18/+30
| | We previously cleared the cache whenever an mmap event overlapped a
| | previously encountered mmap event. But often, these mmap events were
| | not (yet) used for dwfl. As such, we do not have anything cached from
| | them which would be invalidated now. So keep the rest of the cache
| | alive to speed up the process slightly for some scenarios.
| |
| | One of my files now gets parsed in ~5s instead of ~6s before. I see
| | that the cache is now only cleared 500 times instead of 2500 times.
| |
| | Change-Id: If9c29946f6fb12a6b6d64888fe34d8add8eaebb4
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Cache address location information per elf fileMilian Wolff2019-05-026-14/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using one large shared address cache that needs to be cleared every now and then, cache the address location information per elf file using relative addresses. This means we don't need to clear this cache at all, as the lookup for a relative address into a given elf file will always return the same data. This greatly improves the performance under some situations where the cache is cleared often. For one of my files, it drops the time from 55s down to 18s. Additionally, this patch paves the way to share (parts of) this cache in PerfUnwind. Most of its contents are not PID specific anymore. Change-Id: I79616fbb5c45a2543845df2d05d9936e49401627 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
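The idea of keying the cache by ELF file and relative address can be sketched as below. `ElfMapping`, `PerElfCache` and the cached string payload are hypothetical simplifications; real location data would be richer than a symbol name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical description of where an ELF file is mapped in one process.
struct ElfMapping {
    std::string path;
    std::uint64_t loadAddress;
};

class PerElfCache {
public:
    // Looks up by (file, address relative to the load address).
    // Returns nullptr on a cache miss.
    const std::string *find(const ElfMapping &elf, std::uint64_t absoluteAddress) const
    {
        const auto fileIt = m_cache.find(elf.path);
        if (fileIt == m_cache.end())
            return nullptr;
        const auto it = fileIt->second.find(absoluteAddress - elf.loadAddress);
        return it == fileIt->second.end() ? nullptr : &it->second;
    }

    void insert(const ElfMapping &elf, std::uint64_t absoluteAddress, const std::string &symbol)
    {
        m_cache[elf.path][absoluteAddress - elf.loadAddress] = symbol;
    }

private:
    // path -> (relative address -> cached location data)
    std::unordered_map<std::string, std::unordered_map<std::uint64_t, std::string>> m_cache;
};
```

Because the key no longer depends on where the file happens to be mapped, a remapping of the same file at another address still hits the cache, and nothing has to be cleared when mappings change.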
* | Add support for PERF_RECORD_SWITCH events and forward them to clientsMilian Wolff2019-05-024-3/+92
| | This enables accurate off-CPU profiling based on context switch data.
| | When perf data is recorded via
| |
| |     perf record --switch-events ...
| |
| | then the switch in/out events of every thread are included. The time
| | differences allow us to measure how long a thread was off-CPU. Paired
| | with call stacks recorded for the switch-out event, we can attribute
| | the wait time to individual source code locations, i.e.:
| |
| |     perf record --switch-events -e sched:sched_switch
| |
| | This then gives us event data such as:
| |
| |     lab_mandelbrot_  6974 [003]  1888.320564: sched:sched_switch: lab_mandelbrot_:6974 [120] S ==> swapper/3:0 [120]
| |               857516 __sched_text_start (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |               857b0d schedule (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |               85bdc7 schedule_hrtimeout_range_clock (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |               85bdf3 schedule_hrtimeout_range (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |               434d85 poll_schedule_timeout (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |               436150 do_sys_poll (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |               436aa0 sys_poll (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |               85d0f7 entry_SYSCALL_64_fastpath (/lib/modules/4.12.3-1-ARCH/build/vmlinux)
| |                e3e80 __poll_nocancel (/usr/lib/libc-2.25.so)
| |               ...
| |     lab_mandelbrot_  6974  1888.320568: PERF_RECORD_SWITCH OUT
| |     lab_mandelbrot_  6974  1888.320593: PERF_RECORD_SWITCH IN
| |
| | Note how you don't get the sched:sched_switch trace event for the
| | switch-in event, as that happens in the context of another process.
| |
| | In addition to enabling this kind of off-CPU profiling, we can use the
| | switch events for creating a CPU usage timeline, similar to what other
| | tools offer. I.e. a thread that is scheduled consumes one CPU for the
| | given timespan. Paired with other threads and based on the zoom level
| | of the timeline, this then leads to fractional CPU loads.
| |
| | Change-Id: Ia2f4d07692e68a4c20244be6327791c6ceaed85c
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
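How the time differences translate into off-CPU time can be sketched as follows, assuming a time-ordered list of switch events for a single thread. `SwitchEvent` and `offCpuTime` are hypothetical names, and timestamps are plain integers in an arbitrary unit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical switch event of one thread.
struct SwitchEvent {
    std::uint64_t time;
    bool switchOut; // true: thread leaves the CPU, false: thread is scheduled in
};

// Sums the spans between each switch-out and the following switch-in.
// A switch-in without a preceding switch-out is ignored.
std::uint64_t offCpuTime(const std::vector<SwitchEvent> &events)
{
    std::uint64_t total = 0;
    std::uint64_t switchedOutAt = 0;
    bool offCpu = false;
    for (const SwitchEvent &event : events) {
        if (event.switchOut) {
            switchedOutAt = event.time;
            offCpu = true;
        } else if (offCpu) {
            total += event.time - switchedOutAt;
            offCpu = false;
        }
    }
    return total;
}
```

For the two events shown in the trace (OUT at 1888.320568, IN at 1888.320593), the thread was off-CPU for 25 microseconds.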
* | Add manual symbol resolution for addresses in the plt sectionMilian Wolff2019-05-021-7/+146
| | This is a workaround until we get this functionality into elfutils
| | upstream. The symbols are looked up manually, which requires quite some
| | jumping around the ELF file:
| |
| | - first map the address into the module space
| | - check whether the address lies in the .plt section (by name)
| | - if so, find the index of the .plt entry we just found
| | - then find the dynamic section and dynamic string table
| | - in the dynamic section, find the address of the GOT via DT_PLTGOT
| | - find the section containing the GOT address, this is the GOT
| | - find the GOT entry corresponding to the .plt index, offset by 2
| | - find the REL/RELA section containing the entry matching the address
| |   of the GOT entry
| | - find the (dynamic) symbol for the REL/RELA entry
| | - find the string for that symbol, demangle it, and append @plt
| |
| | Change-Id: I67d05f1c728b943317853bb98fa96dba48b58a3c
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
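The index arithmetic in the middle of that list can be sketched like this. It covers only two steps (the .plt entry index and the GOT slot "offset by 2") and assumes fixed-size .plt entries; `PltSection` and `gotSlotForPltAddress` are hypothetical names, and the remaining steps (dynamic section, relocation and symbol lookup) are not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of a .plt section header, in module address space.
struct PltSection {
    std::uint64_t address;   // start address of the section
    std::uint64_t size;      // total size in bytes
    std::uint64_t entrySize; // size of one .plt entry in bytes
};

// Returns the GOT slot for an address inside .plt, or -1 when the address
// is outside the section or the entry size is zero.
long long gotSlotForPltAddress(const PltSection &plt, std::uint64_t address)
{
    if (plt.entrySize == 0)
        return -1; // avoids a division by zero on odd section headers
    if (address < plt.address || address >= plt.address + plt.size)
        return -1;

    const std::uint64_t pltIndex = (address - plt.address) / plt.entrySize;
    return static_cast<long long>(pltIndex + 2); // GOT entry is offset by 2
}
```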
* | Keep address for frames without known symbolMilian Wolff2019-05-021-1/+0
| | | | | | | | | | | | | | Fixes issue where we mapped all .plt entries to the same entry. Change-Id: Iaa5f70455ffd4749793c295b7b79101d479620c8 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Forward information on sampling frequency/period for eventsMilian Wolff2019-05-023-2/+6
| | This allows us to reconstruct the sample period for data files
| | generated with the following command:
| |
| |     perf record -c 100000 ...
| |
| | Without the explicit `-P` flag, the above data file has no periods for
| | the samples. By forwarding the event attribute configuration, clients
| | can reconstruct the data as needed.
| |
| | Change-Id: I098e1f14cc66e97daaceffb288080868693c2d95
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Create fake symbols for addresses in .plt sectionMilian Wolff2019-05-021-0/+35
|/ | | | | | | | | | This is not as good as the fallback that binutils has. But any proper solution should be put into elfutils itself. See also: http://www.mail-archive.com/elfutils-devel@sourceware.org/msg00019.html Change-Id: Ief4d6450f97f0e25874a6163f442a8c1c257748e Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* Bump version number to 4.9Ulf Hermann2018-12-201-1/+1
| | | | | Change-Id: Ic85106d7069e55ccb2ab887d5b308b23807920a8 Reviewed-by: Eike Ziller <eike.ziller@qt.io>
* Merge remote-tracking branch 'origin/4.8'Ulf Hermann2018-12-201-1/+1
|\ | | | | | | Change-Id: Ie74691e3153027bc951d3d469adb8ff58ae5c51c
| * Bump the version number [v4.8.2] [v4.8.1] [4.8]Ulf Hermann2018-12-201-1/+1
| | | | | | | | | | | | Change-Id: Ia29e55f9c6e73874de4b48d3dfa07741a1be00ae Reviewed-by: Christian Kandeler <christian.kandeler@qt.io> Reviewed-by: Eike Ziller <eike.ziller@qt.io>