path: root/app
Commit message (Collapse)AuthorAgeFilesLines
* Fix build with namespaced QtChristian Stenger2020-02-171-1/+3
| | | | | Change-Id: I796ad4cb92de828d999f35b55ca8d94879230d2f Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Bump application version4.12Ulf Hermann2020-02-131-1/+1
| | | | | Change-Id: I6e9453a15cdd71e07c52677c707e6a02737e70a5 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* Fix CMake buildEike Ziller2020-02-131-2/+5
| | | | | | | | Fix that compilation of the app did not find Qt headers pulled in via library headers Change-Id: I21c5104ae9ae58b7c5f55fe16e6723a9df9ebce8 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* CMake Build: add build support with CMakeCristian Adam2020-01-271-0/+27
| | | | | Change-Id: I9ec73226ba0309f244038708cb85d2ae9f3aab30 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Introduce per-DSO cache for symbol lookup via dwfl_module_addrinfoMilian Wolff2020-01-093-6/+75
| The symbol table isn't necessarily sorted, and thus repeated lookups in
| there can be expensive when a DSO has many entries in its symtab. For
| example, the librustc_driver from rustc 1.40.0 has about 202594 symbols.
| A single call to dwfl_module_addrinfo can take milliseconds on my laptop.
| Every time we get a sample at a so far unknown address, we have to find
| the corresponding symbol. So we called this function a lot, which can add
| up to a significant amount of time. Now, we cache the symbol name and its
| offset and size information in a sorted list and try to look up the symbol
| there quickly.
|
| The impact of this patch on the overall time required to analyze a ~1GB
| perf.data file for a `cargo build` process (and its child processes) is
| huge:
|
| before:
| ```
|        447.681,66 msec task-clock:u      # 0,989 CPUs utilized
|                 0      context-switches:u # 0,000 K/sec
|                 0      cpu-migrations:u   # 0,000 K/sec
|            45.214      page-faults:u      # 0,101 K/sec
| 1.272.289.956.854      cycles:u           # 2,842 GHz
| 3.497.255.264.964      instructions:u     # 2,75 insn per cycle
|   863.671.557.196      branches:u         # 1929,209 M/sec
|     2.666.320.642      branch-misses:u    # 0,31% of all branches
|
|     452,806895428 seconds time elapsed
|     441,996666000 seconds user
|       2,557237000 seconds sys
| ```
|
| after:
| ```
|         63.770,08 msec task-clock:u      # 0,995 CPUs utilized
|                 0      context-switches:u # 0,000 K/sec
|                 0      cpu-migrations:u   # 0,000 K/sec
|            35.102      page-faults:u      # 0,550 K/sec
|   191.267.750.628      cycles:u           # 2,999 GHz
|   501.316.536.714      instructions:u     # 2,62 insn per cycle
|   122.234.405.333      branches:u         # 1916,799 M/sec
|       443.671.470      branch-misses:u    # 0,36% of all branches
|
|      64,063443896 seconds time elapsed
|      62,188041000 seconds user
|       1,136533000 seconds sys
| ```
|
| That means we are now roughly 7x faster than before.
|
| Fixes: https://github.com/KDAB/hotspot/issues/225
| Change-Id: Ib7dbc800c9372044a847de68a8459dd7f7b0d3da
| Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
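The sorted-list cache described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `SymbolEntry` and `SymbolCache` are hypothetical names, and the sketch assumes that cached symbols do not overlap and that a miss falls back to `dwfl_module_addrinfo`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache entry: start offset within the DSO, size, and name.
struct SymbolEntry {
    std::uint64_t offset;
    std::uint64_t size;
    std::string name;
};

// Hypothetical per-DSO cache that keeps its entries sorted by offset.
class SymbolCache {
public:
    // Insert an entry at its sorted position.
    void insert(SymbolEntry entry)
    {
        auto it = firstAfter(entry.offset);
        m_entries.insert(it, std::move(entry));
    }

    // Return the entry covering `offset`, or nullptr on a cache miss.
    // On a miss the caller would do the expensive lookup and insert().
    const SymbolEntry *find(std::uint64_t offset) const
    {
        auto it = firstAfter(offset);
        if (it == m_entries.begin())
            return nullptr; // every cached symbol starts after `offset`
        --it; // last symbol that starts at or before `offset`
        // Subtracting first avoids overflow of `it->offset + it->size`.
        return (offset - it->offset < it->size) ? &*it : nullptr;
    }

private:
    // Binary search for the first entry that starts after `offset`.
    std::vector<SymbolEntry>::const_iterator firstAfter(std::uint64_t offset) const
    {
        return std::upper_bound(m_entries.begin(), m_entries.end(), offset,
                                [](std::uint64_t value, const SymbolEntry &e) {
                                    return value < e.offset;
                                });
    }

    std::vector<SymbolEntry> m_entries;
};
```

A lookup costs O(log n) in the number of cached symbols, whereas the commit message notes that the underlying symbol table is not necessarily sorted.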
* Share per-DSO address cache across processesMilian Wolff2020-01-095-21/+25
| When we profile a multi-process ensemble, it will often happen that we
| encounter samples at the relative address of a DSO. In such cases, we can
| leverage a central cache to store the information, instead of recomputing
| the same data for every process.
|
| As an example, I wrote a shell script that runs the same process four
| times in parallel. When I parse the resulting perf.data file, the perf
| stat results are as follows:
|
| before:
| ```
| Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
|
|       4.240,50 msec task-clock:u      # 0,956 CPUs utilized
|              0      context-switches:u # 0,000 K/sec
|              0      cpu-migrations:u   # 0,000 K/sec
|         17.389      page-faults:u      # 0,004 M/sec
| 11.195.771.907      cycles:u           # 2,640 GHz
| 26.585.168.652      instructions:u     # 2,37 insn per cycle
|  6.234.491.027      branches:u         # 1470,227 M/sec
|     35.149.387      branch-misses:u    # 0,56% of all branches
|
|    4,435152034 seconds time elapsed
|    3,732758000 seconds user
|    0,490148000 seconds sys
| ```
|
| after:
| ```
| Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
|
|       4.160,90 msec task-clock:u      # 0,979 CPUs utilized
|              0      context-switches:u # 0,000 K/sec
|              0      cpu-migrations:u   # 0,000 K/sec
|         15.476      page-faults:u      # 0,004 M/sec
| 10.635.798.451      cycles:u           # 2,556 GHz
| 16.616.035.720      instructions:u     # 1,56 insn per cycle
|  3.838.148.777      branches:u         # 922,433 M/sec
|     24.902.558      branch-misses:u    # 0,65% of all branches
|
|    4,249408917 seconds time elapsed
|    3,612442000 seconds user
|    0,533933000 seconds sys
| ```
|
| Note that the overall elapsed time doesn't change that much here, but the
| number of instructions required is massively reduced. I bet there are
| other situations where this patch will bring a more tangible improvement
| to the overall time requirement.
|
| Change-Id: I4531ec648af40dd44b9e4290fab7bbd2a89609da
| Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
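One way to picture a cache shared across processes is to key it by the DSO path and the address relative to where the DSO is mapped. The sketch below is an assumption about the general idea, not the patch itself: `SharedAddressCache`, `AddressInfo` and `toRelative` are hypothetical names, and the real cache may store different data.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cached result for one relative address inside a DSO.
struct AddressInfo {
    std::string symbol;
};

// Hypothetical central cache: DSO path -> (relative address -> info).
class SharedAddressCache {
public:
    // Return the cached info, or nullptr when nothing is cached yet.
    const AddressInfo *find(const std::string &dsoPath, std::uint64_t relAddr) const
    {
        const auto dso = m_cache.find(dsoPath);
        if (dso == m_cache.end())
            return nullptr;
        const auto entry = dso->second.find(relAddr);
        return entry == dso->second.end() ? nullptr : &entry->second;
    }

    void insert(const std::string &dsoPath, std::uint64_t relAddr, AddressInfo info)
    {
        m_cache[dsoPath][relAddr] = std::move(info);
    }

private:
    std::unordered_map<std::string, std::unordered_map<std::uint64_t, AddressInfo>> m_cache;
};

// Translate an absolute sample address into a DSO-relative one, given the
// start address at which the DSO is mapped in that particular process.
inline std::uint64_t toRelative(std::uint64_t absAddr, std::uint64_t mapStart)
{
    return absAddr - mapStart;
}
```

Because each process may map the same DSO at a different base address, only the relative address is stable enough to serve as a shared key.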
* Only try the dwfl_addrmodule pgoff fallback for valid ElfInfoMilian Wolff2020-01-091-1/+1
| | | | | | | | When the ElfInfo is invalid, addr - pgoff is 0 and then the fallback will fail anyways, as no module is mapped at that address ever. Change-Id: I04bc372a2e29888b9aa9acf16c74cd27cfce9046 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Build fully qualified identifiers for inlined C++ subroutinesMilian Wolff2019-10-021-14/+79
| This feels hacky, but it seems to work! We finally get proper symbols for
| Qt signal/slot functors and other inlined lambdas etc.
|
| The new test converts a perf.data for a statically built test application
| into textual form and then verifies that it matches what we want. Sadly,
| QTestLib doesn't provide an easy way to diff, but this is good enough I
| guess. Esp. note how we write the actual text to disk too, so if something
| fails it's easy to run diff on the command line if needed.
|
| In checking that this patch actually helps, I noticed that only the test
| binary compiled with gcc produces symbol names that are unexpected,
| whereas the clang-compiled binary produces good results even without this
| patch!
|
| This test also uncovers two bugs in the DWARF emitted by clang and gcc,
| see the following upstream bug reports for more information:
|
| GCC is missing inline frames for sin/cos calls:
| https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91929
|
| Clang produces invalid mangled DW_AT_linkage_name values for sin/cos:
| https://bugs.llvm.org/show_bug.cgi?id=43491
|
| Change-Id: I0acbf57d191f09383c60bdab9e6664f9a74db42f
| Fixes: https://github.com/KDAB/hotspot/issues/210
| Reviewed-by: Paul Wicking <paul.wicking@qt.io>
| Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Merge remote-tracking branch 'origin/4.11'Ulf Hermann2019-09-301-1/+1
|\ | | | | | | Change-Id: I67b5de6323f1cc7d1cc46513b1502110175b2899
| * Bump application version to 4.11v4.11.0-beta1Ulf Hermann2019-09-301-1/+1
| | | | | | | | | | Change-Id: I222cf66c4555c99bdb56e48fd1383c9ebfb7def7 Reviewed-by: Eike Ziller <eike.ziller@qt.io>
* | Mark CuDieRange as movable typeMilian Wolff2019-09-301-1/+3
| | | | | | | | | | Change-Id: I11984e87469b0b13caae3a0e0c9258d93d21193a Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* | Properly map discontiguous CU DIE rangesMilian Wolff2019-09-301-28/+11
| | | | | | | | | | | | | | | | | | Not all CU DIEs have a contiguous range, esp. when compiled with -ffunction-sections or when the linker deduplicates equal functions across compilation units. Change-Id: Ie22939e550b4b502a16c6e266740a885407f50f1 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* | Simplify dwfl_module_addrdie fall-back when .debug_aranges is missingMilian Wolff2019-09-262-127/+46
|/ | | | | | | | | | | | | | | | | | | | | | We do not need to recurse the full DIE tree to build the mapping. Instead, it's enough to find the CU DIE and its bias only. To do that, we can simply look at the {low,high} PC value of the CU DIEs. If either of these two values is missing, we look at the dwarf_ranges for the CU DIE and use the min/max PC values of those to set the CU DIE range. In my tests this produces the same results as before. Note how the inline frames etc. are later on found via dwarf_getscopes{,_die}. Also, when we do have a .debug_aranges section, the call to dwfl_module_addrdie will also return a CU DIE, not an inner DIE. The simplified code is easier to understand, will consume less memory and should also be faster to run. This patch is based on feedback by David Blakie. Change-Id: I97767e99b2957aa430b65589b32ec66a9479ff7d Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
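The range selection described above (use the low/high PC when both exist, otherwise the min/max over the CU's ranges) could look roughly like this. It is a sketch over plain data rather than libdw calls: `CuInfo` and `cuRange` are hypothetical names, and in the real code the values would come from the DWARF attributes and `dwarf_ranges`.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct PcRange {
    std::uint64_t low;
    std::uint64_t high;
};

// Hypothetical plain-data view of what was read from one CU DIE.
struct CuInfo {
    std::optional<std::uint64_t> lowPc;  // absent if the attribute is missing
    std::optional<std::uint64_t> highPc; // absent if the attribute is missing
    std::vector<PcRange> ranges;         // entries reported for the CU DIE
};

// Pick the address range used to map addresses to this CU DIE.
std::optional<PcRange> cuRange(const CuInfo &cu)
{
    // Preferred: both low and high PC are available on the CU DIE.
    if (cu.lowPc && cu.highPc)
        return PcRange{*cu.lowPc, *cu.highPc};

    // Fallback: span from the smallest low to the largest high PC.
    if (cu.ranges.empty())
        return std::nullopt;
    PcRange result = cu.ranges.front();
    for (const PcRange &r : cu.ranges) {
        result.low = std::min(result.low, r.low);
        result.high = std::max(result.high, r.high);
    }
    return result;
}
```

Note that a single min/max span can cover addresses that belong to other CUs when the ranges are discontiguous, which is the situation the later commit "Properly map discontiguous CU DIE ranges" addresses.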
* Fix build with namespaced QtChristian Stenger2019-09-021-0/+6
| | | | | Change-Id: I03147ae4a7b17584add02006a5a2281006dbae25 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Fix typo in perfsymboltable.cppUlf Hermann2019-08-301-1/+1
| | | | | | | Amends commit 3508f6297a829027694a8d141945797e4923c748. Change-Id: I3ae9df18f6340171189cb150f4fd82ec891037a3 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
* Speed up perfparser when DWARF ranges are broken/missing in ELFsMilian Wolff2019-08-162-72/+161
| The previous approach to handle the broken DWARF emitter from clang was
| to parse all CUs in the hope of finding one that contains the requested
| address. This was done repeatedly for every address and often had to go
| through most of the CUs in a binary, which is a costly process.
|
| To reproduce, one can do the following:
|
| - build perfparser with clang
| - profile this build of perfparser while parsing some given workload
| - inspect the profile and notice the large overhead from
|   find_fundie_by_pc
|
| Instead, the first time dwfl_module_addrdie fails, we now query for all
| CUs of an ELF and build a custom range map and use that for lookup. The
| initial build of this range map is roughly as costly as querying for two
| addresses using the old slow code path. As such, once we query three or
| more addresses in a given ELF this new approach is already yielding
| better performance.
|
| Some numbers from my test:
|
| Before:
|
|   Performance counter stats for './perfparser --input perf.data --output /dev/null':
|
|      46576.925866      task-clock:u (msec) # 0.999 CPUs utilized
|                 0      context-switches:u  # 0.000 K/sec
|                 0      cpu-migrations:u    # 0.000 K/sec
|            60,868      page-faults:u       # 0.001 M/sec
|   144,506,727,842      cycles:u            # 3.103 GHz
|   472,211,018,102      instructions:u      # 3.27 insn per cycle
|   116,423,898,539      branches:u          # 2499.605 M/sec
|       488,663,345      branch-misses:u     # 0.42% of all branches
|
|      46.611237448 seconds time elapsed
|
| After:
|
|   Performance counter stats for './perfparser --input perf.data --output /dev/null':
|
|      17447.629837      task-clock:u (msec) # 0.995 CPUs utilized
|                 0      context-switches:u  # 0.000 K/sec
|                 0      cpu-migrations:u    # 0.000 K/sec
|            60,142      page-faults:u       # 0.003 M/sec
|    53,150,847,387      cycles:u            # 3.046 GHz
|   158,503,454,039      instructions:u      # 2.98 insn per cycle
|    38,711,860,671      branches:u          # 2218.746 M/sec
|       190,209,418      branch-misses:u     # 0.49% of all branches
|
|      17.543916999 seconds time elapsed
|
| Change-Id: I9ca45ad7c8f77f91d0376f6dcae2f73c6e868404
| Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Add fallback to find DIE for code generated by clangMilian Wolff2019-08-161-33/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang's DWARF emitter creates data that is not supported by elfutils. Binutils has a fallback, which backward-cpp implements too. This code here is based on the MIT licensed backward-cpp code. See also: https://sourceware.org/bugzilla/show_bug.cgi?id=21247 https://sourceware.org/ml/elfutils-devel/2017-q2/msg00190.html Before: 7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so 560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12 560070d371d0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 7fd4aab16ee2 7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560070d370fd 560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining After: 7fd4aace6590 libm-2.29.so /usr/lib/libm-2.29.so 560070d37512 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:343:12 560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/cmath:342:0 std::log(long double) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d3750b /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3328:-1 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.tcc:3318:0 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:179:-1 560070d36000 
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:177:0 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1862:-1 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1857:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1853:-1 560070d36000 /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/9.1.0/../../../../include/c++/9.1.0/bits/random.h:1852:0 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 560070d36000 ../tests/test-clients/cpp-inlining/main.cpp:39:-1 560070d371d0 ../tests/test-clients/cpp-inlining/main.cpp:33:0 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining 7fd4aab16ee2 7fd4aab16df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560070d370fd 560070d370d0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-clang/tests/test-clients/cpp-inlining/cpp-inlining Change-Id: Ie678685595311fae72f713a75d5b29ff959cd05d Fixes: 
https://github.com/KDAB/hotspot/issues/51 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Better support for inlined frames in backtracesMilian Wolff2019-08-162-49/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the same approach as eu-addr2line to find inline frames. I.e. use dwarf_getscopes_die instead of manually parsing the dwarf tree. This allows us to get full inline backtraces for both, gcc 9 and clang 8. Previously, we only got partial inline backtraces for gcc, and no inline traces for clang at all. similar to how eu-addr2line Before: 560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29 560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19 560b377139a0 
/home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 7f7a30bb7ee2 7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560b37713cdd 560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining After: 560b37713a54 /usr/include/c++/9.1.0/bits/random.h:139:6 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:135:2 std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:147:48 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:146:7 unsigned long std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul>(unsigned long) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:349:50 560b37713a3b /usr/include/c++/9.1.0/bits/random.h:347:7 std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713a3b /usr/include/c++/9.1.0/bits/random.tcc:3336:29 560b37713000 /usr/include/c++/9.1.0/bits/random.tcc:3318:5 double std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713000 /usr/include/c++/9.1.0/bits/random.h:181:38 560b37713000 /usr/include/c++/9.1.0/bits/random.h:177:2 std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 
2147483647ul>, double>::operator()() cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713000 /usr/include/c++/9.1.0/bits/random.h:1862:19 560b37713000 /usr/include/c++/9.1.0/bits/random.h:1857:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&, std::uniform_real_distribution<double>::param_type const&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713000 /usr/include/c++/9.1.0/bits/random.h:1853:51 560b37713b80 /usr/include/c++/9.1.0/bits/random.h:1852:2 double std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >(std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>&) cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 560b37713b80 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39:19 560b377139a0 /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:33:5 main cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining 7f7a30bb7ee2 7f7a30bb7df0 __libc_start_main libc-2.29.so /usr/lib/libc-2.29.so 560b37713cdd 560b37713cb0 _start cpp-inlining /home/milian/projects/kdab/rnd/hotspot/build-debug/tests/test-clients/cpp-inlining/cpp-inlining Change-Id: I5d8d5d3f29f659e092c2270c105ef48eae3d99c4 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Don't warn when parsing tracing data with odd content sizeMilian Wolff2019-08-131-2/+4
| If the content size is 4, we actually parse more than that. Take this
| into account instead of triggering a misleading warning:
|
|   QWARN : TestPerfData::testTracingData(stream stats) warning: PerfData::processEvents[../../../../app/perfdata.cpp:197]: Event not fully parsed 66 4 2836
|   QWARN : TestPerfData::testTracingData(stream stats) warning: unknown[unknown:0]: QIODevice::skip (QFile, ":/probe.data.stream"): Called with maxSize < 0
|
| Change-Id: I43aab019a5f34a20e890c87f5d53120e977e293f
| Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Ensure we always set a valid pid on locations we sendMilian Wolff2019-08-132-0/+2
| | | | | Change-Id: I8dca8eb4cc3aa728f033ff27679ef65b5a2fbee2 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Forward the non-zero pid and tid members of PerfRecordCommMilian Wolff2019-06-251-0/+5
| | | | | | | | | | Similar to PerfRecordMmap, the PerfSampleId pid and tid are always zero, while the struct-specific pid and tid carry the actual pid and tid. By shadowing the pid and tid, the correct values get forwarded in PerfUnwind::comm. Change-Id: I84ef564076e26f779cb902eed603d0a500dd4825 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Don't try to "fix" partially read events on sequential devices4.10Ulf Hermann2019-05-161-5/+8
| | | | | | | We cannot read the position on those. Change-Id: Ife92c3bc8537353b3c880c53b203d8007656e96f Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
* Drop backwards compatibility for event typesUlf Hermann2019-05-071-4/+0
| | | | | | | | Now that we don't maintain backwards compatibility in either QtCreator or hotspot anymore, we need to drop the old event IDs. Change-Id: If8855aab72576c375146b53ff45d06c7343259cf Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* PerfParser: Fix signedness mismatch on comparisonOrgad Shaneh2019-05-072-5/+5
| | | | | | | Detected by GCC9. Change-Id: I30b1cc0bd91eecc973d2a64d3acd648a486c203e Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* Merge remote-tracking branch 'origin/4.9'Ulf Hermann2019-05-061-2/+6
|\ | | | | | | Change-Id: I08cbfa6e80c8e75f756e22ea57e39c61f8678c49
| * On Windows, set stdin to binary4.9Ulf Hermann2019-02-121-2/+6
| | | | | | | | | | | | | | | | | | Otherwise we receive junk. Apparently there are even ways to close stdin from the outside. Fixes: QTCREATORBUG-21971 Change-Id: I87978ca3d0001026f63fd948db8cd95da674a953 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
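Switching stdin to binary mode on Windows is usually done with the C runtime's `_setmode`. The following is a minimal sketch of that technique and not necessarily what the commit does: the function name `setStdinBinary` is hypothetical.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: returns true if stdin can be read as raw bytes.
bool setStdinBinary()
{
#ifdef _WIN32
    // In text mode the CRT translates line endings and treats Ctrl+Z as
    // end of file, which corrupts a binary perf data stream.
    // _setmode returns -1 on failure, e.g. when stdin has been closed.
    return _setmode(_fileno(stdin), _O_BINARY) != -1;
#else
    // POSIX streams do not distinguish between text and binary mode.
    return true;
#endif
}
```

Checking the return value matters here because, as the commit message notes, stdin can apparently be closed from the outside.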
* | Bump the version number to 4.10Ulf Hermann2019-05-031-1/+1
| | | | | | | | | | Change-Id: I72f461596d559347da48e983d2bba7ea2568f50d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Output all values in hex for better comparability with tools like eu-readelfMilian Wolff2019-05-031-5/+5
| | | | | | | | | | Change-Id: I9c33cc643152675c97522ad5985c03e06ece6a7f Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Validate base mapping before using itMilian Wolff2019-05-031-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent infinite looping when we access a stale base map. This could happen when we encounter bogus mmap lists as happens in https://github.com/KDAB/hotspot/issues/164 Verify that the base map actually corresponds to the expected elf map and only use that then. Otherwise don't use the base map and continue with the original mapping, hoping for the best. While this fixes the stack overflow of the initial bug report, it doesn't solve the fundamental issue of dealing with broken data... We'll have to figure that one out separately. Fixes: https://github.com/KDAB/hotspot/issues/164 Change-Id: Iaebddbfbc891784a7fcc05df47aba761b75cc587 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Add initial MIPS supportLuke Diamand2019-05-032-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Supports perf reports from MIPS targets. MIPS/perf isn't really supported on the mainstream Linux kernel, but there are a number of patches around which can be used to build working perf support: https://lkml.org/lkml/2016/4/1/162 This change just adds the definitions required by hotspot to support this. Change-Id: Ifa569c6e33e743c4d239b1ae0448b28aa026d051 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Check for 0 sized section header plt in fakeSymbolFromSection to prevent div by 0Andrew Somerville2019-05-031-1/+1
| | | | | | | | | | | | | | Change-Id: Ic0e57b93991c03f9842b74d911697899ca35f7c0 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Always use the first attributes as fallbackMilian Wolff2019-05-031-6/+4
| | | | | | | | | | | | | | | | | | | | This fixes parsing of data files generated with `perf record -s`. In such a case, we do have non-empty attributes but the samples don't reference any attribute by id. Perf uses the first available attribute in such cases, cf. `perf_evlist__id2evsel`. Change-Id: I1e5a9b59cd82dbe0d4eb1d140fb9b1e7768284fa Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
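The fallback rule described above can be illustrated with a small lookup helper. This is only a sketch of the idea: `Attribute` and `attributeForId` are hypothetical names, and the real attribute structure carries far more fields.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily reduced event attribute.
struct Attribute {
    std::uint64_t id;
    std::string name;
};

// Find the attribute a sample refers to. If no attribute matches the id
// (as with files from `perf record -s`, where samples carry no usable id),
// fall back to the first attribute. Returns nullptr only for an empty list.
const Attribute *attributeForId(const std::vector<Attribute> &attributes, std::uint64_t id)
{
    for (const Attribute &attribute : attributes) {
        if (attribute.id == id)
            return &attribute;
    }
    return attributes.empty() ? nullptr : &attributes.front();
}
```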
* | Also resolve callchain stored in branch stack, if availableMilian Wolff2019-05-032-16/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a branch stack is available (i.e. `perf record --call-graph lbr`), then resolve the callchain stored therein. In such cases, the fp callchain can potentially contain kernel frames, so still resolve those but skip any user frames therein. The branch stack then contains the user frames: The "to" register is the callee, the "from" register is the caller. That means the callchain can be built by combining the first entry's "to" register (the tail), with all "from" registers. See also `callchain__lbr_callstack_printf` in perf's `session.c` for more information. Change-Id: I0e060e158859eac6c130c073255af87c365679bf Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
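The rule for combining the branch stack into a callchain can be sketched as follows. `BranchEntry` and `callchainFromBranchStack` are hypothetical names, and the sketch assumes the stack is ordered with the most recent branch first.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical reduced branch entry: a call from `from` (caller) to `to` (callee).
struct BranchEntry {
    std::uint64_t from;
    std::uint64_t to;
};

// Build the user-space callchain, innermost frame first: the first entry's
// "to" address is the tail, followed by the "from" address of every entry.
std::vector<std::uint64_t> callchainFromBranchStack(const std::vector<BranchEntry> &stack)
{
    std::vector<std::uint64_t> chain;
    if (stack.empty())
        return chain;

    chain.reserve(stack.size() + 1);
    chain.push_back(stack.front().to);
    for (const BranchEntry &entry : stack)
        chain.push_back(entry.from);
    return chain;
}
```

For a stack with entries `{from: 0x20, to: 0x30}` and `{from: 0x10, to: 0x25}`, this yields the frames `0x30`, `0x20`, `0x10`.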
* | Try to guess offset base address before reporting elf to dwflMilian Wolff2019-05-031-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | When we record perf data files with `--call-graph [fp,lbr]`, we only see mmap events for the executable code segments. These are usually offset. Without the non-offset mmap event, we would not report anything to dwfl so far. Now, we guess the base address and use that before reporting the elf to dwfl. This seems to work, but could fail once a gap would exist at the start. Change-Id: I9d837e8bed650d6574e84401da67beeddaf7ff57 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Correctly parse BranchFlags in BranchEntryMilian Wolff2019-05-032-0/+12
| | | | | | | | | | | | | | | | Fixes parsing of perf data files with LBR stacks, i.e. files generated with `perf record --call-graph lbr`. Change-Id: I7e75d655e8a09f484fe207298ce44a871c1f4903 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Skip unused data when we fail to fully parse an eventMilian Wolff2019-05-031-0/+8
| | | | | | | | | | | | | | | | Additionally, print a warning. This uncovers an issue in the sample parsing for perf data files with LBR callchains. Change-Id: Ib87e54dc33013974691da155aae2037332bc82b8 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Also check for debug link files next to the executable directlyMilian Wolff2019-05-031-1/+20
| | | | | | | | | | | | | | | | | | | | The old code only checked for the folder layout in the ~/.debug cache, where the files get cached by their build-id. If that does not happen, then we failed to find the debug link file when it lies next to the actual file. This patch fixes that. Change-Id: I7411a6d8803c1787a3c76db77aeec30400208565 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Use build-id file for debug information as fall-backMilian Wolff2019-05-031-0/+6
| | | | | | | | | | | | | | | | | | On older Ubuntu, I sometimes see the case where we are looking for a debug link file that is not cached. In such cases, we should fall back to the build-id file, which contains the debug information. Change-Id: Icd6d7bc13a9d3d74c3bb237366e04e7ffa6e195e Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Treat [kernel.kallsyms]_text as a special sectionThomas McGuire2019-05-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | This will remove the bogus error message "Could not find ELF file for [kernel.kallsyms]_text". An ELF file for this is not needed, as unwinding in the kernel is treated specially. Fixes KDAB/hotspot#123 Change-Id: I79a31564dd971932aeafaad58715e702b253d600 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Ignore kallsyms mappings with address 0Thomas McGuire2019-05-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | kallsyms files with all addresses being 0 can happen. In this case, ignore the mapping, as otherwise an address would resolve to a random kernel symbol. In addition, now perfparser shows a correct error message about being unable to open the mapping. Fixes KDAB/hotspot#117 Change-Id: Ib7ccf2f405f3a0a2149fa747175fb9feb11dfb07 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Halt perfparser when PERFPARSER_DEBUG_WAIT is setThomas McGuire2019-05-031-1/+14
| | | | | | | | | | | | | | This gives a developer the chance to attach a debugger. Change-Id: I80ca1246e06727f511a326ab87a7fc32c8462fd2 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Support (null) as address in kallsymsMilian Wolff2019-05-031-1/+1
| | | | | | | | | | | | | | | | | | Apparently some kallsyms report the obfuscated address as (null) instead of all-zero. Handle this scenario instead of giving up parsing the file. Change-Id: I108c20d1845933a429ee5cd26217b707f6aac4cc Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
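Handling `(null)` as an address could look like the sketch below, which parses one kallsyms line of the form `<address> <type> <name>`. The names `KallsymsEntry` and `parseKallsymsLine` are hypothetical, and treating `(null)` like an all-zero address is an assumption based on the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical parsed kallsyms line.
struct KallsymsEntry {
    std::uint64_t address;
    char type;
    std::string name;
};

// Parse "<address> <type> <name>"; returns std::nullopt for malformed lines.
std::optional<KallsymsEntry> parseKallsymsLine(const std::string &line)
{
    std::istringstream in(line);
    std::string address;
    KallsymsEntry entry{};
    if (!(in >> address >> entry.type >> entry.name))
        return std::nullopt;

    // Some kernels print an obfuscated address as "(null)" instead of zeros.
    if (address == "(null)") {
        entry.address = 0;
        return entry;
    }

    try {
        std::size_t used = 0;
        entry.address = std::stoull(address, &used, 16);
        if (used != address.size())
            return std::nullopt; // trailing non-hex characters
    } catch (const std::exception &) {
        return std::nullopt; // not a hex number, or out of range
    }
    return entry;
}
```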
* | Add fallback search in /usr/lib/debug for debug info fileMilian Wolff2019-05-031-7/+22
| | | | | | | | | | | | | | | | | | | | | | This is required to find e.g. the debug file for ld-2.26.so on Ubuntu 17.10. That one can only be found in the /usr/lib/debug folder. It itself resides in /lib/x86_64-linux-gnu/ld-2.26.so. The debuglink is ld-2.26.so. The debug info file is found in /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.26.so. Change-Id: I7c4c67873761a70a7b4f72f5adafea3023b08c12 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
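The path from the example above can be derived by prefixing the directory of the binary with `/usr/lib/debug` and appending the debug link name. The helper below is a sketch under that assumption: `systemDebugPath` is a hypothetical name and it expects an absolute binary path.

```cpp
#include <string>

// Build the candidate location of a debug info file below /usr/lib/debug.
// `binaryPath` is the absolute path of the ELF file, `debugLink` the file
// name taken from its debug link section.
std::string systemDebugPath(const std::string &binaryPath, const std::string &debugLink)
{
    const std::string::size_type slash = binaryPath.rfind('/');
    const std::string directory =
        slash == std::string::npos ? std::string() : binaryPath.substr(0, slash);
    return "/usr/lib/debug" + directory + "/" + debugLink;
}
```

With `/lib/x86_64-linux-gnu/ld-2.26.so` and the debug link `ld-2.26.so`, this produces `/usr/lib/debug/lib/x86_64-linux-gnu/ld-2.26.so`, matching the Ubuntu 17.10 case from the commit message.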
* | Forward the path to binary for a given symbolMilian Wolff2019-05-034-18/+23
| | | | | | | | | | | | | | | | The path is useful for filtering. It enables us to exclude frames that point to system libraries for example. Change-Id: Ic181e8498ffb237727d6176094bd3724a5d6ed0d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Correctly parse SAMPLE_CPU in PerfSampleIdMilian Wolff2019-05-031-1/+1
| | | | | | | | | | | | | | | | First read the CPU, then the reserved rest/waste. This ensures we get non-zero CPU IDs for switch events. Change-Id: I54c60e8902a1fd3e9ab5dbd63b46734c551deacd Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
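The read order matters because the CPU field is stored as two consecutive 32-bit values: the CPU number first, then a reserved word. A minimal sketch, assuming the data is in host byte order; `SampleCpu` and `readSampleCpu` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// The two 32-bit words stored for the CPU field of a sample id.
struct SampleCpu {
    std::uint32_t cpu;
    std::uint32_t reserved;
};

// Read the CPU field from `data`, which must point to at least 8 bytes.
// The CPU comes first, the reserved word second; reading them the other
// way round would report the (zero) reserved word as the CPU.
SampleCpu readSampleCpu(const unsigned char *data)
{
    SampleCpu result;
    std::memcpy(&result.cpu, data, sizeof(result.cpu));
    std::memcpy(&result.reserved, data + sizeof(result.cpu), sizeof(result.reserved));
    return result;
}
```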
* | Forward the CPU on which an event occurredMilian Wolff2019-05-033-7/+15
| | | | | | | | | | | | | | | | This data is only meaningful (i.e. non-zero) when perf record was run with --sample-cpu. Change-Id: I36a7c7d9cba6a7e334ff89aacc62b05392e51b26 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Send task events in time-ordered fashionMilian Wolff2019-05-022-64/+76
| | | | | | | | | | | | | | | | | | | | | | | | So far, only the context switches were buffered and then handled in time-ordered fashion. Now we do this also for Command, ThreadStart, ThreadEnd and Lost events. Additionally, we properly handle these events when no more samples are available, which can happen for applications that basically only sleep and don't trigger any significant CPU load. Change-Id: I4b1c8a1cfc91737a75a48f38dba04d6742f7c3a3 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
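Buffering events and releasing them in time order might be sketched like this. It is an illustration of the general technique only: `TaskEvent` and `EventBuffer` are hypothetical names, and the actual buffering strategy in perfparser may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffered event, e.g. Command, ThreadStart, ThreadEnd or Lost.
struct TaskEvent {
    std::uint64_t time;
    std::string kind;
};

class EventBuffer {
public:
    void add(TaskEvent event) { m_events.push_back(std::move(event)); }

    // Remove and return all events with time <= upTo, ordered by time.
    // Called with the timestamp of the sample that is about to be sent.
    std::vector<TaskEvent> flush(std::uint64_t upTo)
    {
        // stable_sort keeps the arrival order of events with equal time.
        std::stable_sort(m_events.begin(), m_events.end(),
                         [](const TaskEvent &a, const TaskEvent &b) { return a.time < b.time; });
        const auto end = std::upper_bound(m_events.begin(), m_events.end(), upTo,
                                          [](std::uint64_t t, const TaskEvent &e) { return t < e.time; });
        std::vector<TaskEvent> ready(std::make_move_iterator(m_events.begin()),
                                     std::make_move_iterator(end));
        m_events.erase(m_events.begin(), end);
        return ready;
    }

    // Release everything, for the case where no more samples arrive.
    std::vector<TaskEvent> flushAll()
    {
        return flush(std::numeric_limits<std::uint64_t>::max());
    }

private:
    std::vector<TaskEvent> m_events;
};
```

The `flushAll` path corresponds to the situation mentioned above, where an application mostly sleeps and no further samples trigger a flush.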
* | Do not override previous attributesMilian Wolff2019-05-021-9/+22
| | | | | | | | | | | | | | | | | | | | | | | | The attributes available in the features struct contain more information. We used to overwrite that by the less-interesting data available in the attributes list. Instead of doing this, check whether a given attribute was encountered already. If so, don't report it a second time. Change-Id: I308c3cf7ceeb8e6f0ca33de52ecfc1a63f62477d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Forward all costs for grouped sampleMilian Wolff2019-05-023-5/+18
| | | | | | | | | | | | | | | | | | | | | | This is required for files recorded with e.g. `perf record -e '{cycles,instructions}:uPS'`. Otherwise, we only get garbage values for the cycles. Change-Id: I65de37e81392b714c0dd65ddbefad64f4d2c353d Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* | Prevent useless cache invalidationMilian Wolff2019-05-021-0/+8
| | Sometimes we encounter mmap events from perf that span a larger region
| | than dwfl will parse. In these situations, we would invalidate the cache
| | and then rebuild it exactly as before - i.e. we would just waste time
| | for no gain. This patch checks whether this is happening and prevents
| | the cache invalidation to speed up the parse process.
| |
| | In one of my cases this helps significantly:
| |
| | Before:
| |
| |   Performance counter stats for './perfparser --input perf.data --output /dev/null':
| |
| |      19553.281950      task-clock:u (msec) # 0.998 CPUs utilized
| |                 0      context-switches:u  # 0.000 K/sec
| |                 0      cpu-migrations:u    # 0.000 K/sec
| |            59,997      page-faults:u       # 0.003 M/sec
| |    58,567,488,383      cycles:u            # 2.995 GHz
| |   158,503,674,356      instructions:u      # 2.71 insn per cycle
| |    38,712,268,219      branches:u          # 1979.835 M/sec
| |       193,851,125      branch-misses:u     # 0.50% of all branches
| |
| |      19.593726259 seconds time elapsed
| |
| | After:
| |
| |   Performance counter stats for './perfparser --input perf.data --output /dev/null':
| |
| |      11775.076614      task-clock:u (msec) # 0.997 CPUs utilized
| |                 0      context-switches:u  # 0.000 K/sec
| |                 0      cpu-migrations:u    # 0.000 K/sec
| |            22,747      page-faults:u       # 0.002 M/sec
| |    35,463,053,591      cycles:u            # 3.012 GHz
| |    94,541,494,541      instructions:u      # 2.67 insn per cycle
| |    23,038,811,370      branches:u          # 1956.574 M/sec
| |       120,291,747      branch-misses:u     # 0.52% of all branches
| |
| |      11.807795955 seconds time elapsed
| |
| | Change-Id: I3efdf5941c5f66cb2d38fecc8ef824c6aef245da
| | Reviewed-by: Milian Wolff <milian.wolff@kdab.com>