qt-creator/perfparser.git - Parser for perf data files, creates output suitable for the QtCreator plugin

	Commit message	Author	Age	Files	Lines
*	Replace a few qAsConst with std::as_const [11.0]	hjk	2023-05-19	1	-2/+2
\| \| \| \| \|	Change-Id: I73249bf8b96f10baf488acb4bf654415e18535c0 Reviewed-by: Jarek Kobus <jaroslaw.kobus@qt.io>
*	Skip broken test	Christian Stenger	2023-01-27	1	-0/+1
\| \| \| \| \|	Change-Id: I08c9dd152523758aeee994da4e2316f9484bb30e Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix qbs build	Christian Stenger	2023-01-26	3	-4/+34
\| \| \| \| \|	Change-Id: Ic5ca99131d64b3a582d66eeab61072ebec486727 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
*	Build: Use version-less Qt targets	Eike Ziller	2023-01-25	6	-6/+6
\|	Allows building against Qt 5 or Qt 6 without special target-mapping hacks. Change-Id: I562ba71712257570a865c48002e96598b621f08a Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix build with QT_NO_CAST_FROM_ASCII	Eike Ziller	2023-01-25	1	-1/+1
\| \| \| \| \| \| \|	Amends 813e5fa8cad97eb1af227bf8bdcd60d7cd8bffa1 Change-Id: I4b936d5c1a41c20ef30595f80210e85ccab27e2f Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix compile with -DQT_NO_CAST_FROM_ASCII -DQT_NO_CAST_TO_ASCII	Milian Wolff	2022-12-08	1	-33/+42
\| \| \| \| \|	Change-Id: If53db019f7855128fa705b1f9bc344b4d78dcdc8 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Merge "Merge remote-tracking branch 'origin/9.0'"	The Qt Project	2022-12-05	2	-6/+5
\|\
\| *	Merge remote-tracking branch 'origin/9.0'	Ulf Hermann	2022-12-05	2	-6/+5
\| \|\ \| \| \| \| \| \| \| \| \|	Change-Id: I12d510a4c4166a3938c51c7e2cbcd698903c09a6
\| \| *	Make findDebugInfoFile() accessible from the test [9.0]	Ulf Hermann	2022-12-05	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If it's static in perfsymboltable.cpp's file scope, we cannot use it from anywhere else. Change-Id: I60ac203120b7c88feff2acb26b224a8761469bf8 Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
\| \| *	Fix compile with QStringBuilder	Milian Wolff	2022-12-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change-Id: I66c034497e23d9a92d779c9ade85e51d49b71fa9 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io> Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
* \| \|	Fix -Wc++17-attribute-extensions	Milian Wolff	2022-12-05	1	-1/+1
\|/ / \| \| \| \| \| \| \| \| \| \| \| \|	[[maybe_unused]] is a C++17 attribute, use Q_DECL_UNUSED instead. Change-Id: I41216648f322c0ff30dda687fa1fab81a8d39ab9 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Fix -Wclazy-old-style-connect	Milian Wolff	2022-12-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Change-Id: I017424fb0c948a3566edc473ce65d52ac19dd8ac Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Silence false-positive clazy-lambda-in-connect	Milian Wolff	2022-12-05	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	Change-Id: Iba8591013ba7f193a318676370529de55e19fa4c Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Fix clazy-range-loop-detach	Milian Wolff	2022-12-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	Change-Id: I72ac1befe4601b90c38cade89f748a270d997e1f Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
* \|	Update expected testdata	Milian Wolff	2022-12-03	5	-0/+0
\|/	I forgot to do that in the previous patch that fixed the path resolution, leading to errors in the test such as:
	```
	- 2011a4 2011a4 ../sysdeps/x86_64/start.S:115:0 0 0
	+ 2011a4 2011a4 /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115:0 0 0
	```
	Note that the new behavior is correct and desired.
	Change-Id: I0785c81f6e111a4e617e79c6e54c94996ab7fc7d
	Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Add missing '::' when hitting cached scope names	Milian Wolff	2022-06-10	2	-0/+2
\|	Fixes C++ symbol names in some situations as shown by the changes to the expected test data. Because the code to build the scope names is complex, I added a longer comment in the hope that this better explains the behavior and logic of this code.
	The test e.g. now has this changed behavior:
	```
	- 201650 201700 /usr/include/c++/12.1.0/bits/shared_ptr_base.h:611:7 201650 dc std_Sp_counted_ptr_inplace<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<main(int, char)::<lambda()> > >, double>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>_M_dispose parallel_static_gcc /home/milian/projects/kdab/rnd/hotspot/3rdparty/perfparser/tests/manual/clients/parallel_static_gcc
	+ 201650 201700 /usr/include/c++/12.1.0/bits/shared_ptr_base.h:611:7 201650 dc std::_Sp_counted_ptr_inplace<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<main(int, char)::<lambda()> > >, double>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose parallel_static_gcc /home/milian/projects/kdab/rnd/hotspot/3rdparty/perfparser/tests/manual/clients/parallel_static_gcc
	```
	Change-Id: Iaa82add2c878796890decb4365b3ca783b46f355
	Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Add parallel_static_gcc and basic test coverage	Milian Wolff	2022-06-10	6	-7/+55
\|	This is a somewhat more elaborate example which uses multiple threads, lambdas and std::async, which has the tendency to stress our DWARF debug symbol resolution code. More on that in a follow-up patch. For now, this test only uncovered an issue in PerfParserTestClient::convertToText, where we iterated over a QHash to generate text output, which is not going to produce stable results. Instead, we now convert to a stable QMap first and output that. Furthermore, the test harness is updated to also allow us to test versions newer than the 0.5 that we got in the past, i.e. 0.6 is now expected for the new data files I'm adding here. Change-Id: I2de65503b2c853528b301166a5b58a406d34a059 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
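The ordering problem described above can be illustrated with standard containers. This is only a minimal sketch, not the code from the patch: it assumes `std::unordered_map` as a stand-in for QHash and `std::map` as a stand-in for QMap, and the helper name `toStableText` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: render id -> name pairs as text in a reproducible order.
// A hash container (std::unordered_map here, QHash in Qt) has no guaranteed
// iteration order, so the entries are first copied into an ordered map
// (std::map here, QMap in Qt), which iterates by ascending key.
std::string toStableText(const std::unordered_map<int, std::string> &entries)
{
    const std::map<int, std::string> sorted(entries.begin(), entries.end());
    std::ostringstream out;
    for (const auto &[id, name] : sorted)
        out << id << ' ' << name << '\n';
    return out.str();
}
```

The extra copy costs some time and memory, which is usually acceptable in a test helper where reproducible output matters more than speed.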
*	Use qCompress/qUncompress to reduce the size of the test input files	Milian Wolff	2022-06-10	27	-5078/+61
\|	All files are now run through qCompress (manually) and then those are tracked in git. When we run the test, the files are uncompressed on the fly to let the rest of the code remain working as before. The resulting folder is now only ~1.3MiB instead of ~5.2MiB before. But obviously developers won't notice as the old data is still included in our git history. But at least for new tests we can prevent such pollution and keep the gerrit bots happy. Sadly we cannot resort to external tooling for the compression step, so adding new files is a bit tedious but doable. And relying on qCompress means we don't need any other new dependencies. Change-Id: I902a6906f140eed2565df9637cb80cf464143b80 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Unbreak tst_perfdata when run on more modern systems	Milian Wolff	2022-06-08	2	-68/+82
\| \| \| \| \| \| \| \| \| \| \|	Some of the old demangling errors apparently got fixed. Replace the raw mangled names with the expected demangled names and replace them in the actual text to let the tests pass on both, modern and old systems. Change-Id: Ibe4ce4237da859d915319c0e5ca1cdf0b0fc7b93 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io> Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	add unit test for finding debug symbols	Lieven Hey	2022-06-08	7	-2/+138
\| \| \| \| \|	Change-Id: I285127ea9abd41f0aa50333e49ad4174d973f437 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	add support for d symbol demangling	Lieven Hey	2022-06-08	2	-2/+6
\| \| \| \| \| \| \| \|	I added a system that simplifies adding new demanglers and provides a fast path when demangling symbols Change-Id: Ie5ca43632b53e41c0a4214772193af09ca4593cc Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix compiler warnings about signedness [6.0]	Christian Kandeler	2021-08-18	1	-2/+2
\| \| \| \| \|	Change-Id: I7578e582ac7fd3dd2012f783f273080ec2c2b18b Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Add LBR unwinding method proper processing for Disassembly	Darya Knysh	2021-03-30	3	-0/+748
\|	Some required frames were not looked up for the LBR unwinding method. Added a traversal of the deepest frames by callchain for LBR, so that Disassembly event costs within a function/method are computed properly. Added a test to perfdata/vector_static_gcc for perf.lbr.data recorded with LBR. Change-Id: Ie6413415c573e659505a4715978c65fce135d979 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Add actual file path computation for symbols	Darya Knysh	2020-12-15	2	-1/+2
\|	The actual file path can differ from the one perf recorded. Compute it and pass it to hotspot through the Symbol struct field actualPath. Change-Id: I556035234cbcffa42497bf02e225d63565e4a0bf Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Add location's relative address computation	Darya Knysh	2020-12-15	6	-3240/+3373
\|	It is equal to the symbol start plus the offset inside the function. A disassembler can then use it to compute and show the locations of event costs within a function, per instruction. Added stream output. Change-Id: Iba32e1764633d7ffc3f0f36088525ed7a3d1c9d0 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
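The arithmetic described in this message can be written down in a few lines. This is a sketch under assumptions: the struct `SymbolRange` and the function `locationRelAddr` are hypothetical names chosen for illustration and are not taken from perfparser.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the symbol information mentioned above.
struct SymbolRange {
    std::uint64_t relAddr; // start of the symbol, relative to its ELF file
    std::uint64_t size;    // size of the symbol in bytes
};

// Relative address of a location: the symbol start plus the offset of the
// location inside the function.
std::uint64_t locationRelAddr(const SymbolRange &symbol, std::uint64_t offsetInFunction)
{
    return symbol.relAddr + offsetInFunction;
}
```

For example, a symbol starting at 0x1000 with a sample at offset 0x12 yields the relative address 0x1012, which a disassembler could map to a single instruction.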
*	Add Disassembler functionality	Darya Knysh	2020-12-15	7	-3240/+3244
\| \| \| \| \| \| \| \| \|	Added start address and size into Symbol and their stream output. A disassembler can then use the relative address and size of a symbol to find the instructions that belong to a function. Change-Id: I96709ca380d0e58cd5cf5a8cc87116147b2754d6 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Allow zero sized symbols in the address cache	dknysh	2020-10-07	3	-3/+6
\| \| \| \| \| \| \| \| \| \| \|	Apparently there are situations where the compiler generates entries in the symbol table with zero size which are still referenced by some callstacks. After commit d8d56b7e we ended up losing the symbol names in such cases. This patch fixes it again and restores the symbol names for such zero-size symbols. Change-Id: If98f68626ab4251ccfed89d791ebd333f6a6a60a Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Transmit the number of lost events in the LostDefinition [4.14]	Milian Wolff	2020-09-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This allows a more detailed report on the GUI side then. Because the count has a quint64 type, we now use a QVariant to store the task event payload. To reduce the padding overhead, the struct is slightly reordered. Change-Id: I01d16da2ba4d3df9f32d6ae53bcff120355eb2c9 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix compiling tests with QMake	Milian Wolff	2020-09-29	2	-4/+10
\| \| \| \| \|	Change-Id: Iddaf07e55eb777d53b9ed992b496939ef93af07a Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	tst_addresscache: Fix qbs build	Christian Kandeler	2020-09-25	1	-2/+6
\| \| \| \| \| \| \| \|	While we're at it, also consider the ELFUTILS_INSTALL_DIR environment variable. Change-Id: Ifeb5cc7df6e29426633d27a840185ba67ed838b6 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix autotest build for qbs	Christian Kandeler	2020-09-23	1	-0/+2
\| \| \| \| \|	Change-Id: I9dc6ab44596244fb342495c5d4a5e719e9b6c26b Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Build a sorted symbol map once and use it for lookups	Milian Wolff	2020-09-11	5	-272/+256
\|	Apparently the symtab isn't necessarily sorted. That means every call to dwfl_module_addrinfo we do may potentially end up iterating over many symbol entries to find a match. For libxul.so, which contains roughly one million symbols, this can be exceedingly slow.
	Now, we instead just iterate over all symbols once and store them in a sorted array for quick lookups later on. Then when we look up a symbol, we just need to demangle it on-demand.
	I believe the following numbers speak for themselves. Both are for a 1.1GB perf.data file profiling firefox with debug symbols in libxul. A hefty 10x speedup!
	Before:
	```
	592.765,37 msec task-clock:u # 0,999 CPUs utilized
	0 context-switches:u # 0,000 K/sec
	0 cpu-migrations:u # 0,000 K/sec
	587.913 page-faults:u # 0,992 K/sec
	2.610.836.174.604 cycles:u # 4,405 GHz (83,33%)
	9.249.001.490 stalled-cycles-frontend:u # 0,35% frontend cycles idle (83,33%)
	188.323.380.515 stalled-cycles-backend:u # 7,21% backend cycles idle (83,33%)
	6.294.821.871.279 instructions:u # 2,41 insn per cycle
	                                 # 0,03 stalled cycles per insn (83,33%)
	1.593.493.508.805 branches:u # 2688,236 M/sec (83,33%)
	1.613.875.121 branch-misses:u # 0,10% of all branches (83,34%)
	593,078170383 seconds time elapsed
	589,590379000 seconds user
	1,591781000 seconds sys
	```
	After:
	```
	57.292,74 msec task-clock:u # 0,999 CPUs utilized
	0 context-switches:u # 0,000 K/sec
	0 cpu-migrations:u # 0,000 K/sec
	598.808 page-faults:u # 0,010 M/sec
	246.209.111.444 cycles:u # 4,297 GHz (83,34%)
	8.990.996.482 stalled-cycles-frontend:u # 3,65% frontend cycles idle (83,33%)
	52.443.604.272 stalled-cycles-backend:u # 21,30% backend cycles idle (83,34%)
	583.136.689.772 instructions:u # 2,37 insn per cycle
	                               # 0,09 stalled cycles per insn (83,33%)
	150.053.278.261 branches:u # 2619,063 M/sec (83,33%)
	833.143.959 branch-misses:u # 0,56% of all branches (83,34%)
	57,370841100 seconds time elapsed
	55,799767000 seconds user
	1,291568000 seconds sys
	```
	Note that this patch also uncovers some broken offset computations. Checking the offsets manually with addr2line indicates that the new offsets we report now are better than the old ones. At least for the cases I compared, e.g.:
	```
	$ addr2line -C -i -f -e .../fork -a 255800 22c800
	0x0000000000255800
	__vfwprintf_internal
	fork.c:?
	0x000000000022c800
	__cos_fma
	??:?
	$ addr2line -C -i -f -e .../fork -a 252585 229585
	0x0000000000252585
	printf_positional
	fork.c:?
	0x0000000000229585
	main
	??:?
	$ addr2line -C -i -f -e .../vector_static_gcc_v9.1.0 -a 45d3e0
	0x000000000045d3e0
	__munmap
	crtstuff.c:?
	```
	Then, we now resolve symbols like binutils, i.e. we pick the first symbol we find and don't skip weak symbols like eu-addr2line seems to be doing. I.e. for this:
	```
	0000000000417a40 w F .text 0000000000000074 hypot
	0000000000417a40 w F .text 0000000000000074 hypotf64
	0000000000417a40 w F .text 0000000000000074 hypotf32x
	0000000000417a40 g F .text 0000000000000074 __hypot
	```
	We used to get `__hypot`, but now we get `hypot`. I think this is just as good, and as I said - it's also what you'd get from binutils with `addr2line`:
	```
	$ addr2line -C -i -f -e vector_static_gcc_v9.1.0 -a 417a40
	0x0000000000418480
	hypot
	??:?
	$ eu-addr2line -C -i -f -e vector_static_gcc_v9.1.0 -a 417a40
	0x0000000000418480
	__hypot
	??:0
	```
	Initially, I thought about just skipping all weak symbols, but that's not a feasible approach. There are some symbols that are weak and not overridden by a non-weak symbol, like this one:
	```
	$ objdump -C -t .../vector_static_gcc_v9.1.0 | grep 401c70
	0000000000401c70 w F .text 0000000000000162 void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)
	```
	And that one even contains a bunch of inlined frames, so we definitely want to keep that in. We could potentially pass that information along and then implement a custom logic to prefer non-weak symbols. Quite frankly, I don't think that effort is worth it.
	Change-Id: Ic91764aaab36e77be1c4df4a32d4ac2b4c28e7e0
	Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
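The idea of sorting the symbols once and then answering lookups by binary search can be sketched as follows. This is an illustration using only the standard library, not the code from the patch; `SymbolEntry` and `SortedSymbols` are hypothetical names. The stable sort keeps symbols that share a start address in their original table order, so the first one found wins, which mirrors the "pick the first symbol" behavior described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical symbol record; the name stays mangled until it is needed.
struct SymbolEntry {
    std::uint64_t start;
    std::uint64_t size;
    std::string mangledName;
};

class SortedSymbols {
public:
    explicit SortedSymbols(std::vector<SymbolEntry> symbols)
        : m_symbols(std::move(symbols))
    {
        // Sort once by start address; stable_sort preserves the table order
        // of aliases that share the same start address.
        std::stable_sort(m_symbols.begin(), m_symbols.end(),
                         [](const SymbolEntry &a, const SymbolEntry &b) {
                             return a.start < b.start;
                         });
    }

    // Returns the first symbol whose range contains addr, or nullptr.
    // Simplification: only the group of symbols with the greatest start
    // address <= addr is considered, and only its first alias is checked.
    const SymbolEntry *find(std::uint64_t addr) const
    {
        // First entry that starts after addr.
        auto it = std::upper_bound(m_symbols.begin(), m_symbols.end(), addr,
                                   [](std::uint64_t a, const SymbolEntry &s) {
                                       return a < s.start;
                                   });
        if (it == m_symbols.begin())
            return nullptr;
        --it; // last entry with start <= addr
        // Step back to the first alias sharing this start address.
        while (it != m_symbols.begin() && std::prev(it)->start == it->start)
            --it;
        return (addr - it->start < it->size) ? &*it : nullptr;
    }

private:
    std::vector<SymbolEntry> m_symbols;
};
```

Each lookup costs O(log n) comparisons instead of a scan over the whole table, which is where a speedup on very large symbol tables would come from. Zero-sized symbols are never matched by this sketch.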
*	Show the diff between files when the perfdata test fails	Milian Wolff	2020-09-11	1	-0/+7
\| \| \| \| \| \| \|	Only works when the diff program is available on the system. Change-Id: Id4cd5fe96a1a10b03153900600b3fcb43f755100 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Improve support for applications that rely on fork for parallelism	Milian Wolff	2020-09-11	7	-1/+1386
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A forked process inherits the elfmap of the parent, thus we need to copy that data when we encounter a fork event. This is complicated by the fact that the task events and mmap events need to be sorted first. Thus, we need to find the fork event when we deplete the buffers after sorting and then initialize the child process elf map with the one from the parent process. Furthermore, pass the ppid through the ThreadStart event, to allow client applications to inherit the comm for the newly created process. Fixes: https://github.com/KDAB/hotspot/issues/241 Change-Id: I5de13644e12def6704c5f622428a815fd87d2af4 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
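The inheritance of mappings on fork can be sketched like this. It is a simplified model, not perfparser's data structures: `ElfMap`, `Processes`, `onMmap` and `onFork` are hypothetical names, and the events are assumed to arrive already sorted by time, which is the precondition the message above points out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using Pid = std::int32_t;
// Start address of a mapping -> path of the mapped file (simplified).
using ElfMap = std::map<std::uint64_t, std::string>;

// Hypothetical per-process bookkeeping, fed with time-sorted events.
struct Processes {
    std::map<Pid, ElfMap> elfMaps;

    void onMmap(Pid pid, std::uint64_t addr, const std::string &file)
    {
        elfMaps[pid][addr] = file;
    }

    // A forked child starts out with a copy of the parent's mappings.
    void onFork(Pid parent, Pid child)
    {
        ElfMap inherited;
        if (const auto it = elfMaps.find(parent); it != elfMaps.end())
            inherited = it->second;
        elfMaps[child] = std::move(inherited);
    }
};
```

Because the child receives a copy, later mmap events of the child do not affect the parent, and vice versa.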
*	Add support for zstd decompression	Milian Wolff	2020-09-11	3	-2/+667
\| \| \| \| \| \| \| \| \| \| \|	When compiled with HAVE_ZSTD=1, transparently decompress perf records contained in PERF_RECORD_COMPRESSED and parse those then. This way, we can finally open data files recorded with `perf record -z`, which are often two orders of magnitude smaller for the common `--call-graph dwarf` case. Change-Id: Ic26f049b955b20038b947d03c7ff1c6c5eb22ba3 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Generalize TestPerfData::testFiles	Milian Wolff	2020-09-09	3	-11/+14
\| \| \| \| \| \| \| \| \|	This patch allows us to more easily add new data files that reuse the same binaries we have already added to our test data, instead of requiring a single copy per directory. Change-Id: Ia635f6d5444a4b92e2a4a684d9c44bce61ad017c Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Update expected results for clang symbols	Milian Wolff	2020-09-09	2	-15/+18
\|	Apparently newer elfutils or demangling contains a fix for the weird complex symbols encountered in clang-compiled binaries. Now, the symbol is more readable: Before: `doublecomplex ` After: `double _Complex` Update the testdata accordingly, instead of failing. To keep backwards compatibility, replace the old form with the new form in the actual file output. Change-Id: I49bc956f5f2032ae7d71c59e7d6c82bc65d81e81 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io> Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Skip vector_static_clang variant of perfdata test	Ulf Hermann	2020-07-15	1	-0/+2
\| \| \| \| \| \| \|	The results seem to differ depending on platform. Change-Id: I18ca22ffb7e2ead988680963fe39ac7ea9068430 Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
*	Fix Qt 5.15 deprecation warnings	Christian Kandeler	2020-06-26	1	-0/+2
\| \| \| \| \|	Change-Id: I9df420d6ad46249ce6f0091b159dc56a563e93b8 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix deprecation warnings	Christian Kandeler	2020-06-16	1	-2/+2
\| \| \| \| \|	Change-Id: I22539a0d5c435649bc1056bc0406583742a5cb23 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix TestPerfData::testTracingData when run with older Qt	Milian Wolff	2020-06-15	1	-12/+12
\|	While building the hotspot AppImage, the test is failing as the expected messages were not found. Indeed, the format is slightly different: PerfUnwind::ErrorCode(MissingElfFile): Could not find ELF file ... Fix this by using a regex substring match instead. Change-Id: Ida4696014e2b631760fb7b5f4a41d8cae1040762 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
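A substring match that tolerates a varying prefix could look like the sketch below. It uses `std::regex` as a stand-in for whatever the test actually uses, and the function name `mentionsMissingElfFile` is hypothetical; only the message fragment "Could not find ELF file" is taken from the text above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical check: true if the message contains the expected text
// anywhere, regardless of how the error code prefix is formatted.
bool mentionsMissingElfFile(const std::string &message)
{
    static const std::regex pattern("Could not find ELF file");
    // regex_search matches a substring; regex_match would require the
    // whole message to match the pattern.
    return std::regex_search(message, pattern);
}
```

With this, a message with the `PerfUnwind::ErrorCode(MissingElfFile):` prefix and one without any prefix are both accepted, as long as the fragment itself is present.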
*	Allow address cache lookup for ElfInfo with pgoff / baseAddr	Milian Wolff	2020-06-15	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	I have encountered cases where the ElfInfo we pass to the address cache is the pgoff-mmapped one. In that case, we need to take the baseAddr to compute the relative address, not the absolute address where the offsetted elf was mapped. In my case, the address was actually outside of the pgoff-mmapped one and within a different section apparently, which got handled properly by elfutils but our caching didn't handle this yet - we just asserted and failed. Change-Id: I2cd9d2cebbd60f00353ecbf413e020783374769e Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Unbreak test in debug build	Milian Wolff	2020-06-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Asking for an address that's not within the range specified by the ElfInfo leads to an assert. Don't do that, by giving both Elf maps the same addresses. The cache operates on relative addresses internally anyways, so the absolute addresses are irrelevant. Change-Id: I33afdc762ca74d1ec4243420e4bc886aa4820581 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Fix symbol resolution for pgoff = 0 in executable mapping	Milian Wolff	2020-06-15	1	-11/+17
\|	When we encounter multiple mmap events for a DSO we only used the base address when the following mmaps had pgoff != 0. Apparently that is not always a valid assumption, for example I have run into:
	```
	cpp-inlining 34357 1865.127721: PERF_RECORD_MMAP2 34357/34357: [0x5605a604e000(0x5000) @ 0 08:04 19007514 1373382085]: r--p ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
	cpp-inlining 34357 1865.127728: PERF_RECORD_MMAP2 34357/34357: [0x5605a604f000(0x1000) @ 0 08:04 19007514 1373382085]: r-xp ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
	cpp-inlining 34357 1865.127732: PERF_RECORD_MMAP2 34357/34357: [0x5605a6050000(0x2000) @ 0 08:04 19007514 1373382085]: rw-p ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
	cpp-inlining 34357 1865.127734: PERF_RECORD_MMAP2 34357/34357: [0x5605a6052000(0x1000) @ 0x1000 08:04 19007514 1373382085]: rw-p ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
	```
	Here, the executable part corresponds to the second mmap event. Note that it has pgoff = 0, so we tried to use that directly, which means that we used its address 0x5605a604f000 as the base for reporting to elfutils. Then, when we tried to resolve the symbol at e.g. 0x5605a604fd72, we didn't find anything. This can be confirmed by {eu-,}addr2line by using the computed difference 0xd72 - nothing can be found there. Instead, the base address 0x5605a604e000 from the very first mmap event has to be used, yielding a difference of 0x1d72 which does show symbols again.
	Change-Id: Icaa3db310237c6f616dc23659a65e13dd5ff017b
	Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
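The address arithmetic from this message can be checked with a short sketch. The struct `Mmap` and the function `relativeToFirstMmap` are hypothetical names; the numbers are the ones quoted in the mmap events above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one mmap event of a DSO.
struct Mmap {
    std::uint64_t addr;  // start address of the mapping
    std::uint64_t len;   // length of the mapping
    std::uint64_t pgoff; // page offset into the file
};

// Address relative to the DSO, using the very first mmap event as the base,
// as described in the message above. The events must be in recorded order.
std::uint64_t relativeToFirstMmap(const std::vector<Mmap> &mmapsOfDso, std::uint64_t addr)
{
    assert(!mmapsOfDso.empty());
    return addr - mmapsOfDso.front().addr;
}
```

With the four mappings listed above, the address 0x5605a604fd72 becomes 0x1d72 relative to the first mapping, whereas subtracting the start of the executable mapping would give 0xd72, the value that finds no symbol.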
*	Fix warnings about type mismatch	Christian Kandeler	2020-04-23	1	-2/+2
\| \| \| \| \|	Change-Id: I5c98b0f427d846d9e5cc987b27fb963176425c5d Reviewed-by: Christian Stenger <christian.stenger@qt.io>
*	CMake Build: add build support with CMake	Cristian Adam	2020-01-27	9	-0/+40
\| \| \| \| \|	Change-Id: I9ec73226ba0309f244038708cb85d2ae9f3aab30 Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Introduce per-DSO cache for symbol lookup via dwfl_module_addrinfo	Milian Wolff	2020-01-09	1	-0/+27
\|	The symbol table isn't necessarily sorted, and thus repeated lookups in there can be expensive when a DSO has many entries in its symtab. For example, the librustc_driver from rustc 1.40.0 has about 202594 symbols. A single call to dwfl_module_addrinfo can take milliseconds on my laptop. Every time we get a sample at a so far unknown address, we have to find the corresponding symbol. So we called this function a lot, which can add up to a significant amount of time.
	Now, we cache the symbol name and its offset and size information in a sorted list and try to look up the symbol there quickly. The impact of this patch on the overall time required to analyze a ~1GB perf.data file for a `cargo build` process (and its child processes) is huge:
	before:
	```
	447.681,66 msec task-clock:u # 0,989 CPUs utilized
	0 context-switches:u # 0,000 K/sec
	0 cpu-migrations:u # 0,000 K/sec
	45.214 page-faults:u # 0,101 K/sec
	1.272.289.956.854 cycles:u # 2,842 GHz
	3.497.255.264.964 instructions:u # 2,75 insn per cycle
	863.671.557.196 branches:u # 1929,209 M/sec
	2.666.320.642 branch-misses:u # 0,31% of all branches
	452,806895428 seconds time elapsed
	441,996666000 seconds user
	2,557237000 seconds sys
	```
	after:
	```
	63.770,08 msec task-clock:u # 0,995 CPUs utilized
	0 context-switches:u # 0,000 K/sec
	0 cpu-migrations:u # 0,000 K/sec
	35.102 page-faults:u # 0,550 K/sec
	191.267.750.628 cycles:u # 2,999 GHz
	501.316.536.714 instructions:u # 2,62 insn per cycle
	122.234.405.333 branches:u # 1916,799 M/sec
	443.671.470 branch-misses:u # 0,36% of all branches
	64,063443896 seconds time elapsed
	62,188041000 seconds user
	1,136533000 seconds sys
	```
	That means we are now roughly 7x faster than before.
	Fixes: https://github.com/KDAB/hotspot/issues/225
	Change-Id: Ib7dbc800c9372044a847de68a8459dd7f7b0d3da
	Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Share per-DSO address cache across processes	Milian Wolff	2020-01-09	1	-6/+11
\|	When we profile a multi-process ensemble, it will often happen that we encounter samples at the relative address of a DSO. In such cases, we can leverage a central cache to store the information, instead of recomputing the same data for every process.
	As an example, I wrote a shell script that runs the same process four times in parallel. When I parse the resulting perf.data file, the perf stat results are as follows:
	before:
	```
	Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
	4.240,50 msec task-clock:u # 0,956 CPUs utilized
	0 context-switches:u # 0,000 K/sec
	0 cpu-migrations:u # 0,000 K/sec
	17.389 page-faults:u # 0,004 M/sec
	11.195.771.907 cycles:u # 2,640 GHz
	26.585.168.652 instructions:u # 2,37 insn per cycle
	6.234.491.027 branches:u # 1470,227 M/sec
	35.149.387 branch-misses:u # 0,56% of all branches
	4,435152034 seconds time elapsed
	3,732758000 seconds user
	0,490148000 seconds sys
	```
	after:
	```
	Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
	4.160,90 msec task-clock:u # 0,979 CPUs utilized
	0 context-switches:u # 0,000 K/sec
	0 cpu-migrations:u # 0,000 K/sec
	15.476 page-faults:u # 0,004 M/sec
	10.635.798.451 cycles:u # 2,556 GHz
	16.616.035.720 instructions:u # 1,56 insn per cycle
	3.838.148.777 branches:u # 922,433 M/sec
	24.902.558 branch-misses:u # 0,65% of all branches
	4,249408917 seconds time elapsed
	3,612442000 seconds user
	0,533933000 seconds sys
	```
	Note that the overall elapsed time doesn't change that much here, but the number of instructions required is massively reduced. I bet there are other situations where this patch will bring a more tangible improvement to the overall time requirement.
	Change-Id: I4531ec648af40dd44b9e4290fab7bbd2a89609da
	Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
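A cache that is shared across processes can be keyed by the DSO and the address relative to that DSO, since the absolute load address differs per process. The sketch below illustrates that idea only; `SharedAddressCache`, `resolveSymbol` and the `slowLookup` callback are hypothetical names and do not reflect perfparser's actual interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical central cache: (DSO path, relative address) -> symbol name.
class SharedAddressCache {
public:
    std::optional<std::string> find(const std::string &dso, std::uint64_t relAddr) const
    {
        const auto it = m_entries.find(Key(dso, relAddr));
        if (it == m_entries.end())
            return std::nullopt;
        return it->second;
    }

    void insert(const std::string &dso, std::uint64_t relAddr, std::string symbol)
    {
        m_entries[Key(dso, relAddr)] = std::move(symbol);
    }

private:
    using Key = std::pair<std::string, std::uint64_t>;
    std::map<Key, std::string> m_entries;
};

// Resolve an absolute address of one process. The process-specific base
// address is subtracted first, so every process that maps the same DSO hits
// the same cache entry. slowLookup stands for the expensive symbol lookup.
template<typename Lookup>
std::string resolveSymbol(SharedAddressCache &cache, const std::string &dso,
                          std::uint64_t baseAddr, std::uint64_t absAddr,
                          Lookup &&slowLookup)
{
    const std::uint64_t relAddr = absAddr - baseAddr;
    if (const auto cached = cache.find(dso, relAddr))
        return *cached;
    std::string symbol = slowLookup(relAddr);
    cache.insert(dso, relAddr, symbol);
    return symbol;
}
```

In this model, two processes that load the same DSO at different base addresses trigger the expensive lookup only once per relative address, which is consistent with the reduced instruction count reported above.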
*	Fix build with namespaced Qt	Christian Stenger	2019-10-22	1	-0/+2
\| \| \| \| \| \| \|	Broke with 5753a53b. Change-Id: Ib6d6132a4cf611faa06143a1e26924bbf21a6a0d Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
*	Build fully qualified identifiers for inlined C++ subroutines	Milian Wolff	2019-10-02	8	-2/+2274
\|	This feels hacky, but it seems to work! We finally get proper symbols for Qt signal/slot functors and other inlined lambdas etc.
	The new test converts a perf.data for a statically built test application into textual form and then verifies that it matches what we want. Sadly, QTestLib doesn't provide an easy way to diff, but this is good enough I guess. Esp. note how we write the actual text to disk too, so if something fails it's easy to run diff on the command line if needed.
	In checking that this patch actually helps, I noticed that only the test binary compiled with gcc produces symbol names that are unexpected, whereas the clang-compiled binary produces good results even without this patch!
	This test also uncovers two bugs in the DWARF emitted by clang and gcc, see the following upstream bug reports for more information:
	GCC is missing inline frames for sin/cos calls: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91929
	Clang produces invalid mangled DW_AT_linkage_name values for sin/cos: https://bugs.llvm.org/show_bug.cgi?id=43491
	Change-Id: I0acbf57d191f09383c60bdab9e6664f9a74db42f
	Fixes: https://github.com/KDAB/hotspot/issues/210
	Reviewed-by: Paul Wicking <paul.wicking@qt.io>
	Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>