| Commit message | Author | Age | Files | Lines |
|
|
|
|
| |
Change-Id: I73249bf8b96f10baf488acb4bf654415e18535c0
Reviewed-by: Jarek Kobus <jaroslaw.kobus@qt.io>
|
|
|
|
|
| |
Change-Id: I08c9dd152523758aeee994da4e2316f9484bb30e
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: Ic5ca99131d64b3a582d66eeab61072ebec486727
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
|
|
|
|
|
|
|
| |
Allows building against Qt 5 or Qt 6 without special target-mapping hacks.
Change-Id: I562ba71712257570a865c48002e96598b621f08a
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
| |
Amends 813e5fa8cad97eb1af227bf8bdcd60d7cd8bffa1
Change-Id: I4b936d5c1a41c20ef30595f80210e85ccab27e2f
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: If53db019f7855128fa705b1f9bc344b4d78dcdc8
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|\ |
|
| |\
| | |
| | |
| | | |
Change-Id: I12d510a4c4166a3938c51c7e2cbcd698903c09a6
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If it's static in perfsymboltable.cpp's file scope, we cannot use it
from anywhere else.
Change-Id: I60ac203120b7c88feff2acb26b224a8761469bf8
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Change-Id: I66c034497e23d9a92d779c9ade85e51d49b71fa9
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
Reviewed-by: Milian Wolff <milian.wolff@kdab.com>
|
|/ /
| |
| |
| |
| |
| |
| | |
[[maybe_unused]] is a C++17 attribute; use Q_DECL_UNUSED instead.
Change-Id: I41216648f322c0ff30dda687fa1fab81a8d39ab9
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| | |
Change-Id: I017424fb0c948a3566edc473ce65d52ac19dd8ac
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| | |
Change-Id: Iba8591013ba7f193a318676370529de55e19fa4c
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
| |
| |
| |
| | |
Change-Id: I72ac1befe4601b90c38cade89f748a270d997e1f
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I forgot to do that in the previous patch that fixed the path
resolution, leading to errors in the test such as:
```
- 2011a4 2011a4 ../sysdeps/x86_64/start.S:115:0 0 0
+ 2011a4 2011a4 /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115:0 0 0
```
Note that the new behavior is correct and desired.
Change-Id: I0785c81f6e111a4e617e79c6e54c94996ab7fc7d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes C++ symbol names in some situations as shown by the changes
to the expected test data. Because the code to build the scope
names is complex, I added a longer comment in the hope that this
better explains the behavior and logic of this code.
For example, the test output now changes as follows:
```
- 201650 201700 /usr/include/c++/12.1.0/bits/shared_ptr_base.h:611:7 201650 dc std_Sp_counted_ptr_inplace<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > >, double>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>_M_dispose parallel_static_gcc /home/milian/projects/kdab/rnd/hotspot/3rdparty/perfparser/tests/manual/clients/parallel_static_gcc
+ 201650 201700 /usr/include/c++/12.1.0/bits/shared_ptr_base.h:611:7 201650 dc std::_Sp_counted_ptr_inplace<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<main(int, char**)::<lambda()> > >, double>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose parallel_static_gcc /home/milian/projects/kdab/rnd/hotspot/3rdparty/perfparser/tests/manual/clients/parallel_static_gcc
```
Change-Id: Iaa82add2c878796890decb4365b3ca783b46f355
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a somewhat more elaborate example which uses multiple
threads, lambdas and std::async which has the tendency to stress
our dwarf debug symbol resolution code. More on that in a follow-up
patch. For now, this test only uncovered an issue in
PerfParserTestClient::convertToText, where we iterated over a QHash
to generate text output, which is not going to produce stable results.
Instead, we now convert to a stable QMap first and output that.
Furthermore, the test harness is updated to also allow us to test
newer versions than the 0.5 that we got in the past, i.e. 0.6 is now
expected for the new data files I'm adding here.
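The QHash-to-QMap change described above can be illustrated with the standard library equivalents. This is only a sketch: `toStableText` and the id/name pairs are hypothetical, and std::unordered_map / std::map stand in for QHash / QMap here.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: renders id -> name pairs as text in a reproducible order.
std::string toStableText(const std::unordered_map<int, std::string> &entries)
{
    // Copying into std::map sorts the entries by key, so the output no longer
    // depends on the iteration order of the hash table.
    const std::map<int, std::string> sorted(entries.begin(), entries.end());

    std::ostringstream out;
    for (const auto &[id, name] : sorted)
        out << id << ' ' << name << '\n';
    return out.str();
}
```

The extra copy costs some time and memory, which should be acceptable for test output where reproducibility matters more than speed.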
Change-Id: I2de65503b2c853528b301166a5b58a406d34a059
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All files are now run through qCompress (manually) and then those
are tracked in git. When we run the test, the files are uncompressed
on the fly to let the rest of the code remain working as before.
The resulting folder is now only ~1.3MiB instead of ~5.2MiB before.
Developers obviously won't notice, as the old data is still
included in our git history. But at least for new tests we can prevent
such pollution and keep the gerrit bots happy.
Sadly we cannot resort to external tooling for the compression step,
so adding new files is a bit tedious but doable. And relying on
qCompress means we don't need any other new dependencies.
Change-Id: I902a6906f140eed2565df9637cb80cf464143b80
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the old demangling errors apparently got fixed. Replace
the raw mangled names with the expected demangled names and
replace them in the actual text to let the tests pass on both
modern and old systems.
Change-Id: Ibe4ce4237da859d915319c0e5ca1cdf0b0fc7b93
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I285127ea9abd41f0aa50333e49ad4174d973f437
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
| |
I added a system that simplifies adding new demanglers and provides a
fast path when demangling symbols.
Change-Id: Ie5ca43632b53e41c0a4214772193af09ca4593cc
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I7578e582ac7fd3dd2012f783f273080ec2c2b18b
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
| |
Some required frames were not looked up for the LBR unwinding method.
Add traversal of the deepest frames of the callchain for LBR, so that
the costs of Disassembly events within a function/method are computed properly.
Add a test in perfdata/vector_static_gcc for perf.lbr.data recorded with LBR.
Change-Id: Ie6413415c573e659505a4715978c65fce135d979
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
| |
The actual file path can differ from the one perf recorded.
Compute it and pass it to hotspot through the actualPath field of the Symbol struct.
Change-Id: I556035234cbcffa42497bf02e225d63565e4a0bf
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
| |
It is equal to the symbol start plus the offset inside the function.
A disassembler can then use it to compute and show the locations of event costs within a function, per instruction.
Also add stream output.
Change-Id: Iba32e1764633d7ffc3f0f36088525ed7a3d1c9d0
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
| |
Add the start address and size to Symbol, along with their stream output.
A disassembler can then use the relative address and size of a symbol
to find the instructions that belong to a function.
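A consumer could use these two fields roughly as follows. This is a minimal sketch: `SymbolRange` and `containsAddress` are hypothetical names and do not reflect the actual Symbol struct layout.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two new pieces of symbol data.
struct SymbolRange
{
    std::uint64_t relAddr = 0; // start address relative to the ELF file
    std::uint64_t size = 0;    // size of the function in bytes
};

// True if addr lies in the half-open range [relAddr, relAddr + size).
bool containsAddress(const SymbolRange &symbol, std::uint64_t addr)
{
    // Comparing the offset against the size avoids overflow in relAddr + size.
    return addr >= symbol.relAddr && addr - symbol.relAddr < symbol.size;
}
```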
Change-Id: I96709ca380d0e58cd5cf5a8cc87116147b2754d6
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently there are situations where the compiler generates entries
in the symbol table with zero size which are still referenced by some
callstacks. After commit d8d56b7e we ended up losing the symbol names
in such cases. This patch fixes it again and restores the symbol names
for such zero-size symbols.
Change-Id: If98f68626ab4251ccfed89d791ebd333f6a6a60a
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
| |
This allows a more detailed report on the GUI side. Because the
count has a quint64 type, we now use a QVariant to store the task
event payload. To reduce the padding overhead, the struct is slightly
reordered.
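To illustrate why member order affects padding, here is a sketch with two hypothetical layouts. These are not the actual perfparser structs; the member names and types are made up for the example.

```cpp
#include <cstdint>

// Hypothetical layout with small members on both sides of a large one.
// The compiler typically has to pad after each small member.
struct Padded
{
    std::uint8_t type;
    std::uint64_t payload;
    std::uint8_t flags;
};

// Same members, largest first: the small members share one padded tail.
struct Reordered
{
    std::uint64_t payload;
    std::uint8_t type;
    std::uint8_t flags;
};

static_assert(sizeof(Reordered) <= sizeof(Padded), "reordering must not grow the struct");
```

On a typical 64-bit ABI the first layout takes 24 bytes and the second 16, but the exact numbers depend on the platform.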
Change-Id: I01d16da2ba4d3df9f32d6ae53bcff120355eb2c9
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: Iddaf07e55eb777d53b9ed992b496939ef93af07a
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
| |
While we're at it, also consider the ELFUTILS_INSTALL_DIR environment
variable.
Change-Id: Ifeb5cc7df6e29426633d27a840185ba67ed838b6
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I9dc6ab44596244fb342495c5d4a5e719e9b6c26b
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
Apparently the symtab isn't necessarily sorted. That means
every call to dwfl_module_addrinfo we do may potentially end
up iterating over many symbol entries to find a match. For
libxul.so, which contains roughly one million symbols, this
can be exceedingly slow.
Now, we instead just iterate over all symbols once and store
them in a sorted array for quick lookups later on. Then, when
we look up a symbol, we just need to demangle it on demand.
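The idea of sorting once and then searching can be sketched like this. Note that `CachedSymbol`, `sortSymbols` and `findSymbol` are hypothetical names for illustration and not the code from this patch; demangling is left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cache entry; the name stays mangled until it is needed.
struct CachedSymbol
{
    std::uint64_t addr = 0;
    std::uint64_t size = 0;
    std::string mangledName;
};

// Sort once, after all symbols of a module have been collected.
void sortSymbols(std::vector<CachedSymbol> &symbols)
{
    std::sort(symbols.begin(), symbols.end(),
              [](const CachedSymbol &a, const CachedSymbol &b) { return a.addr < b.addr; });
}

// Binary search for the symbol that covers addr; nullptr if there is none.
const CachedSymbol *findSymbol(const std::vector<CachedSymbol> &symbols, std::uint64_t addr)
{
    // Find the first symbol that starts after addr, then step back by one.
    auto it = std::upper_bound(symbols.begin(), symbols.end(), addr,
                               [](std::uint64_t value, const CachedSymbol &s) { return value < s.addr; });
    if (it == symbols.begin())
        return nullptr;
    --it;
    if (addr - it->addr >= it->size)
        return nullptr;
    return &*it;
}
```

Each lookup is then logarithmic in the number of symbols instead of a potential walk over the whole table.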
I believe the following numbers speak for themselves. Both
are for a 1.1GB perf.data file profiling firefox with debug
symbols in libxul. A hefty 10x speedup!
Before:
```
592.765,37 msec task-clock:u # 0,999 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
587.913 page-faults:u # 0,992 K/sec
2.610.836.174.604 cycles:u # 4,405 GHz (83,33%)
9.249.001.490 stalled-cycles-frontend:u # 0,35% frontend cycles idle (83,33%)
188.323.380.515 stalled-cycles-backend:u # 7,21% backend cycles idle (83,33%)
6.294.821.871.279 instructions:u # 2,41 insn per cycle
# 0,03 stalled cycles per insn (83,33%)
1.593.493.508.805 branches:u # 2688,236 M/sec (83,33%)
1.613.875.121 branch-misses:u # 0,10% of all branches (83,34%)
593,078170383 seconds time elapsed
589,590379000 seconds user
1,591781000 seconds sys
```
After:
```
57.292,74 msec task-clock:u # 0,999 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
598.808 page-faults:u # 0,010 M/sec
246.209.111.444 cycles:u # 4,297 GHz (83,34%)
8.990.996.482 stalled-cycles-frontend:u # 3,65% frontend cycles idle (83,33%)
52.443.604.272 stalled-cycles-backend:u # 21,30% backend cycles idle (83,34%)
583.136.689.772 instructions:u # 2,37 insn per cycle
# 0,09 stalled cycles per insn (83,33%)
150.053.278.261 branches:u # 2619,063 M/sec (83,33%)
833.143.959 branch-misses:u # 0,56% of all branches (83,34%)
57,370841100 seconds time elapsed
55,799767000 seconds user
1,291568000 seconds sys
```
Note that this patch also uncovers some broken offset computations.
Checking the offsets manually with addr2line indicates that the new
offsets we report now are better than the old ones. At least for
the cases I compared, e.g.:
```
$ addr2line -C -i -f -e .../fork -a 255800 22c800
0x0000000000255800
__vfwprintf_internal
fork.c:?
0x000000000022c800
__cos_fma
??:?
$ addr2line -C -i -f -e .../fork -a 252585 229585
0x0000000000252585
printf_positional
fork.c:?
0x0000000000229585
main
??:?
$ addr2line -C -i -f -e .../vector_static_gcc_v9.1.0 -a 45d3e0
0x000000000045d3e0
__munmap
crtstuff.c:?
```
Also, we now resolve symbols like binutils does, i.e. we pick the first
symbol we find and don't skip weak symbols like eu-addr2line seems
to be doing. For example, for this:
```
0000000000417a40 w F .text 0000000000000074 hypot
0000000000417a40 w F .text 0000000000000074 hypotf64
0000000000417a40 w F .text 0000000000000074 hypotf32x
0000000000417a40 g F .text 0000000000000074 __hypot
```
We used to get `__hypot`, but now we get `hypot`. I think this is
just as good, and as I said - it's also what you'd get from binutils
with `addr2line`:
```
$ addr2line -C -i -f -e vector_static_gcc_v9.1.0 -a 417a40
0x0000000000418480
hypot
??:?
$ eu-addr2line -C -i -f -e vector_static_gcc_v9.1.0 -a 417a40
0x0000000000418480
__hypot
??:0
```
Initially, I thought about just skipping all weak symbols, but that's
not a feasible approach. There are some symbols that are weak and not
overridden by a non-weak symbol, like this one:
```
$ objdump -C -t .../vector_static_gcc_v9.1.0 | grep 401c70
0000000000401c70 w F .text 0000000000000162 void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)
```
And that one even contains a bunch of inlined frames, so we definitely
want to keep that in. We could potentially pass that information along
and then implement a custom logic to prefer non-weak symbols. Quite
frankly, I don't think that effort is worth it.
Change-Id: Ic91764aaab36e77be1c4df4a32d4ac2b4c28e7e0
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
| |
Only works when the diff program is available on the system.
Change-Id: Id4cd5fe96a1a10b03153900600b3fcb43f755100
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A forked process inherits the elfmap of the parent, thus we need to
copy that data when we encounter a fork event. This is complicated by
the fact that the task events and mmap events need to be sorted first.
Thus, we need to find the fork event when we deplete the buffers after
sorting and then initialize the child process elf map with the one
from the parent process.
Furthermore, pass the ppid through the ThreadStart event, to allow
client applications to inherit the comm for the newly created process.
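The inheritance step can be sketched as follows, assuming the events have already been sorted by time. `ElfMap` and `ProcessTable` are hypothetical simplifications and not the actual data structures.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical simplification: start address -> mapped file, per process.
using ElfMap = std::map<std::uint64_t, std::string>;

struct ProcessTable
{
    std::map<std::int32_t, ElfMap> elfMaps;

    void addMmap(std::int32_t pid, std::uint64_t addr, const std::string &file)
    {
        elfMaps[pid][addr] = file;
    }

    // To be called for a fork event, after the buffers have been sorted.
    void fork(std::int32_t ppid, std::int32_t pid)
    {
        const auto parent = elfMaps.find(ppid);
        // The child starts out with a copy of the parent's mappings.
        if (parent != elfMaps.end())
            elfMaps[pid] = parent->second;
    }
};
```

Later mmap events of the child only change its own copy, so the parent's map stays untouched.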
Fixes: https://github.com/KDAB/hotspot/issues/241
Change-Id: I5de13644e12def6704c5f622428a815fd87d2af4
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
| |
When compiled with HAVE_ZSTD=1, transparently decompress perf
records contained in PERF_RECORD_COMPRESSED and parse those then.
This way, we can finally open data files recorded with
`perf record -z`, which are often two orders of magnitude smaller
for the common `--call-graph dwarf` case.
Change-Id: Ic26f049b955b20038b947d03c7ff1c6c5eb22ba3
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
| |
This patch allows us to more easily add new data files that reuse
the same binaries we have already added to our test data, instead
of requiring a single copy per directory.
Change-Id: Ia635f6d5444a4b92e2a4a684d9c44bce61ad017c
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently newer elfutils or demangling contains a fix for the weird
complex symbols encountered in clang-compiled binaries. Now the
symbol is more readable:
Before: `doublecomplex `
After: `double _Complex`
Update the testdata accordingly, instead of failing. To keep backwards
compatibility, replace the old form with the new form in the actual file
output.
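Such a normalization step could look roughly like this. It is a sketch with hypothetical helper names (`replaceAll`, `normalizeComplex`), using std::string rather than the Qt types the test uses.

```cpp
#include <string>

// Replaces every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string &from, const std::string &to)
{
    if (from.empty())
        return text;
    // Continue searching behind the inserted text to avoid rescanning it.
    for (std::string::size_type pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size())) {
        text.replace(pos, from.size(), to);
    }
    return text;
}

// Hypothetical step applied to the actual output before it is compared
// against the expected data: maps the old spelling to the new one.
std::string normalizeComplex(const std::string &actual)
{
    return replaceAll(actual, "doublecomplex ", "double _Complex");
}
```

Output that already uses the new form passes through unchanged, so the same expected data works on old and new systems.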
Change-Id: I49bc956f5f2032ae7d71c59e7d6c82bc65d81e81
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
| |
The results seem to differ depending on platform.
Change-Id: I18ca22ffb7e2ead988680963fe39ac7ea9068430
Reviewed-by: Christian Kandeler <christian.kandeler@qt.io>
|
|
|
|
|
| |
Change-Id: I9df420d6ad46249ce6f0091b159dc56a563e93b8
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I22539a0d5c435649bc1056bc0406583742a5cb23
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While building the hotspot AppImage, the test is failing as the
expected messages were not found. Indeed, the format is slightly
different:
PerfUnwind::ErrorCode(MissingElfFile): Could not find ELF file ...
Fix this by using a regex substring match instead.
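The difference to an exact comparison can be sketched with std::regex. `messageMatches` is a hypothetical helper and not what the test itself calls.

```cpp
#include <regex>
#include <string>

// True if `pattern` matches anywhere inside `message`, so a changed
// prefix in front of the interesting part does not break the check.
bool messageMatches(const std::string &message, const std::string &pattern)
{
    return std::regex_search(message, std::regex(pattern));
}
```

With this, a pattern like `Could not find ELF file` matches both the plain message and the variant prefixed with the error code.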
Change-Id: Ida4696014e2b631760fb7b5f4a41d8cae1040762
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I have encountered cases where the ElfInfo we pass to the address
cache is the pgoff-mmapped one. In that case, we need to take the
baseAddr to compute the relative address, not the absolute address
where the offsetted elf was mapped. In my case, the address was
actually outside of the pgoff-mmapped one and within a different
section apparently, which got handled properly by elfutils but our
caching didn't handle this yet - we just asserted and failed.
Change-Id: I2cd9d2cebbd60f00353ecbf413e020783374769e
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
|
|
|
| |
Asking for an address that's not within the range specified
by the ElfInfo leads to an assert. Don't do that, by giving both
Elf maps the same addresses. The cache operates on relative addresses
internally anyway, so the absolute addresses are irrelevant.
Change-Id: I33afdc762ca74d1ec4243420e4bc886aa4820581
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
When we encounter multiple mmap events for a DSO we only used
the base address when the following mmaps had pgoff != 0. Apparently
that is not always a valid assumption, for example I have run into:
```
cpp-inlining 34357 1865.127721: PERF_RECORD_MMAP2 34357/34357:
[0x5605a604e000(0x5000) @ 0 08:04 19007514 1373382085]:
r--p ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
cpp-inlining 34357 1865.127728: PERF_RECORD_MMAP2 34357/34357:
[0x5605a604f000(0x1000) @ 0 08:04 19007514 1373382085]:
r-xp ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
cpp-inlining 34357 1865.127732: PERF_RECORD_MMAP2 34357/34357:
[0x5605a6050000(0x2000) @ 0 08:04 19007514 1373382085]:
rw-p ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
cpp-inlining 34357 1865.127734: PERF_RECORD_MMAP2 34357/34357:
[0x5605a6052000(0x1000) @ 0x1000 08:04 19007514 1373382085]:
rw-p ...hotspot/tests/test-clients/cpp-inlining/cpp-inlining
```
Here, the executable part corresponds to the second mmap event.
Note that it has pgoff = 0, so we tried to use that directly,
which means that we used its address 0x5605a604f000 as the base
for reporting to elfutils.
Then, when we tried to resolve the symbol at e.g. 0x5605a604fd72,
we didn't find anything. This can be confirmed by {eu-,}addr2line
by using the computed difference 0xd72 - nothing can be found there.
Instead, the base address 0x5605a604e000 from the very first
mmap event has to be used, yielding a difference of 0x1d72 which
does show symbols again.
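The arithmetic from the example above, written out as a compile-time check. `relativeAddress` is a hypothetical helper; the addresses are the ones from the mmap events shown.

```cpp
#include <cstdint>

// Address relative to the start of the ELF file for a given base mapping.
constexpr std::uint64_t relativeAddress(std::uint64_t absolute, std::uint64_t base)
{
    return absolute - base;
}

constexpr std::uint64_t sampleAddr = 0x5605a604fd72;

// Base taken from the second (r-xp) mmap event: nothing is found at 0xd72.
static_assert(relativeAddress(sampleAddr, 0x5605a604f000) == 0xd72, "wrong base");

// Base taken from the very first mmap event: 0x1d72 does show symbols.
static_assert(relativeAddress(sampleAddr, 0x5605a604e000) == 0x1d72, "correct base");
```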
Change-Id: Icaa3db310237c6f616dc23659a65e13dd5ff017b
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
| |
Change-Id: I5c98b0f427d846d9e5cc987b27fb963176425c5d
Reviewed-by: Christian Stenger <christian.stenger@qt.io>
|
|
|
|
|
| |
Change-Id: I9ec73226ba0309f244038708cb85d2ae9f3aab30
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
The symbol table isn't necessarily sorted, and thus repeated lookups
in there can be expensive when a DSO has many entries in its symtab.
For example, the librustc_driver from rustc 1.40.0 has about 202594
symbols. A single call to dwfl_module_addrinfo can take milliseconds
on my laptop. Every time we get a sample at a so far unknown address,
we have to find the corresponding symbol. So we called this function
a lot, which can add up to a significant amount of time. Now, we
cache the symbol name and its offset and size information in a sorted
list and try to look up the symbol there quickly. The impact of this
patch on the overall time required to analyze a ~1GB perf.data file
for a `cargo build` process (and its child processes) is huge:
before:
```
447.681,66 msec task-clock:u # 0,989 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
45.214 page-faults:u # 0,101 K/sec
1.272.289.956.854 cycles:u # 2,842 GHz
3.497.255.264.964 instructions:u # 2,75 insn per cycle
863.671.557.196 branches:u # 1929,209 M/sec
2.666.320.642 branch-misses:u # 0,31% of all branches
452,806895428 seconds time elapsed
441,996666000 seconds user
2,557237000 seconds sys
```
after:
```
63.770,08 msec task-clock:u # 0,995 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
35.102 page-faults:u # 0,550 K/sec
191.267.750.628 cycles:u # 2,999 GHz
501.316.536.714 instructions:u # 2,62 insn per cycle
122.234.405.333 branches:u # 1916,799 M/sec
443.671.470 branch-misses:u # 0,36% of all branches
64,063443896 seconds time elapsed
62,188041000 seconds user
1,136533000 seconds sys
```
That means we are now roughly 7x faster than before.
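The caching described above can be sketched as a sorted list that is filled on demand. All names here are hypothetical, and the `SlowLookup` callback only stands in for the expensive symbol table query; it is not the elfutils API.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct SymbolEntry
{
    std::uint64_t offset = 0;
    std::uint64_t size = 0;
    std::string name;
};

class SymbolCache
{
public:
    // Stand-in for the expensive lookup in the unsorted symbol table.
    using SlowLookup = std::function<SymbolEntry(std::uint64_t)>;

    explicit SymbolCache(SlowLookup slowLookup) : m_slowLookup(std::move(slowLookup)) {}

    SymbolEntry find(std::uint64_t addr)
    {
        // Fast path: binary search in the entries cached so far.
        const auto it = firstAfter(addr);
        if (it != m_entries.begin()) {
            const SymbolEntry &candidate = *(it - 1);
            if (addr - candidate.offset < candidate.size)
                return candidate;
        }
        // Slow path: ask the symbol table once, then keep the list sorted.
        SymbolEntry entry = m_slowLookup(addr);
        m_entries.insert(firstAfter(entry.offset), entry);
        return entry;
    }

private:
    // First cached entry that starts after addr.
    std::vector<SymbolEntry>::iterator firstAfter(std::uint64_t addr)
    {
        return std::upper_bound(m_entries.begin(), m_entries.end(), addr,
                                [](std::uint64_t value, const SymbolEntry &e) { return value < e.offset; });
    }

    SlowLookup m_slowLookup;
    std::vector<SymbolEntry> m_entries;
};
```

Repeated samples inside the same function then hit the fast path and never reach the slow lookup again.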
Fixes: https://github.com/KDAB/hotspot/issues/225
Change-Id: Ib7dbc800c9372044a847de68a8459dd7f7b0d3da
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
When we profile a multi-process ensemble, it will often happen that
we encounter samples at the relative address of a DSO. In such cases,
we can leverage a central cache to store the information, instead of
recomputing the same data for every process.
As an example, I wrote a shell script that runs the same process four
times in parallel. When I parse the resulting perf.data file, the perf
stat results are as follows:
before:
```
Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
4.240,50 msec task-clock:u # 0,956 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
17.389 page-faults:u # 0,004 M/sec
11.195.771.907 cycles:u # 2,640 GHz
26.585.168.652 instructions:u # 2,37 insn per cycle
6.234.491.027 branches:u # 1470,227 M/sec
35.149.387 branch-misses:u # 0,56% of all branches
4,435152034 seconds time elapsed
3,732758000 seconds user
0,490148000 seconds sys
```
after:
```
Performance counter stats for '/home/milian/projects/compiled/other/lib/libexec/hotspot-perfparser --input ./perf.data --output /dev/null':
4.160,90 msec task-clock:u # 0,979 CPUs utilized
0 context-switches:u # 0,000 K/sec
0 cpu-migrations:u # 0,000 K/sec
15.476 page-faults:u # 0,004 M/sec
10.635.798.451 cycles:u # 2,556 GHz
16.616.035.720 instructions:u # 1,56 insn per cycle
3.838.148.777 branches:u # 922,433 M/sec
24.902.558 branch-misses:u # 0,65% of all branches
4,249408917 seconds time elapsed
3,612442000 seconds user
0,533933000 seconds sys
```
Note that the overall elapsed time doesn't change that much here,
but the amount of instructions required is massively reduced. I bet
there are other situations where this patch will bring a more tangible
improvement to the overall time requirement.
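The idea of a central cache can be sketched like this: the key is the DSO plus the address relative to it, so the load address of an individual process does not matter. `SharedAddressCache` and `toRelative` are hypothetical names, and the cached value is reduced to a plain string here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache shared by all processes in the profile.
class SharedAddressCache
{
public:
    // Key: DSO path plus address relative to that DSO.
    using Key = std::pair<std::string, std::uint64_t>;

    // Returns the cached symbol, or nullptr if nothing is cached yet.
    const std::string *find(const std::string &dso, std::uint64_t relAddr) const
    {
        const auto it = m_symbols.find(Key(dso, relAddr));
        return it == m_symbols.end() ? nullptr : &it->second;
    }

    void insert(const std::string &dso, std::uint64_t relAddr, std::string symbol)
    {
        m_symbols[Key(dso, relAddr)] = std::move(symbol);
    }

private:
    std::map<Key, std::string> m_symbols;
};

// Each process converts its absolute sample address before asking the cache.
std::uint64_t toRelative(std::uint64_t absoluteAddr, std::uint64_t loadAddr)
{
    return absoluteAddr - loadAddr;
}
```

Two processes that map the same DSO at different load addresses end up with the same key and share the cached result.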
Change-Id: I4531ec648af40dd44b9e4290fab7bbd2a89609da
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
|
|
|
|
|
|
| |
Broke with 5753a53b.
Change-Id: Ib6d6132a4cf611faa06143a1e26924bbf21a6a0d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|
| |
This feels hacky, but it seems to work! We finally get proper symbols
for Qt signal/slot functors and other inlined lambdas etc.
The new test converts a perf.data for a statically built test
application into textual form and then verifies that it matches
what we want. Sadly, QTestLib doesn't provide an easy way to diff,
but this is good enough I guess. In particular, note how we write the actual
text to disk too, so if something fails it's easy to run diff on
the command line if needed.
In checking that this patch actually helps, I noticed that only the
test binary compiled with gcc produces symbol names that are
unexpected, whereas the clang-compiled binary produces good results
even without this patch!
This test also uncovers two bugs in the DWARF emitted by clang and
gcc, see the following upstream bug reports for more information:
GCC is missing inline frames for sin/cos calls:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91929
Clang produces invalid mangled DW_AT_linkage_name values for sin/cos:
https://bugs.llvm.org/show_bug.cgi?id=43491
Change-Id: I0acbf57d191f09383c60bdab9e6664f9a74db42f
Fixes: https://github.com/KDAB/hotspot/issues/210
Reviewed-by: Paul Wicking <paul.wicking@qt.io>
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
|