/****************************************************************************
**
** Copyright (C) 2019 The Qt Company Ltd.
** Contact: https://www.qt.io/licensing/
**
** This file is part of the Qt Creator documentation.
**
** Commercial License Usage
** Licensees holding valid commercial Qt licenses may use this file in
** accordance with the commercial license agreement provided with the
** Software or, alternatively, in accordance with the terms contained in
** a written agreement between you and The Qt Company. For licensing terms
** and conditions see https://www.qt.io/terms-conditions. For further
** information use the contact form at https://www.qt.io/contact-us.
**
** GNU Free Documentation License Usage
** Alternatively, this file may be used under the terms of the GNU Free
** Documentation License version 1.3 as published by the Free Software
** Foundation and appearing in the file included in the packaging of
** this file. Please review the following information to ensure
** the GNU Free Documentation License version 1.3 requirements
** will be met: https://www.gnu.org/licenses/fdl-1.3.html.
**
****************************************************************************/

// **********************************************************************
// NOTE: the sections are not ordered by their logical order to avoid
// reshuffling the file each time the index order changes (i.e., often).
// Run the fixnavi.pl script to adjust the links to the index order.
// **********************************************************************

/*!
    \previouspage creator-heob.html
    \page creator-cpu-usage-analyzer.html
    \nextpage creator-cppcheck.html

    \title Analyzing CPU Usage

    \QC is integrated with the Linux Perf tool that can be
    used to analyze the CPU and memory usage of an application on embedded
    devices and, to a limited extent, on Linux desktop platforms. The
    Performance Analyzer uses the Perf tool bundled with the Linux kernel to
    take periodic snapshots of the call chain of an application and visualizes
    them in a timeline view or as a flame graph.

    \section1 Using the Performance Analyzer

    The Performance Analyzer usually needs to be able to locate debug symbols for
    the binaries involved.

    Profile builds produce optimized binaries with separate debug symbols and
    should generally be used for profiling.

    To manually set up a build configuration to provide separate debug symbols,
    edit the project build settings:

    \list 1
        \li To also generate debug symbols for applications compiled in
            release mode, select \uicontrol {Projects}, and then select
            \uicontrol Enable in the \uicontrol {Separate debug info} field.

        \li Select \uicontrol Yes to recompile the project.

    \endlist

    You can start the Performance Analyzer in the following ways:

    \list
        \li Select \uicontrol Analyze > \uicontrol {Performance Analyzer} to
            profile the current application.

        \li Select the
            \inlineimage qtcreator-analyze-start-button.png
            (\uicontrol Start) button to start the application from the
            Performance Analyzer.

    \endlist

    \note If data collection does not start automatically, select the
    \inlineimage recordfill.png
    (\uicontrol {Collect profile data}) button.

    When you start analyzing an application, the application is launched, and
    the Performance Analyzer immediately begins to collect data. This is indicated
    by the time running in the \uicontrol Recorded field. However, as the data
    is passed through the Perf tool and an extra helper program bundled with
    \QC, and both buffer and process it on the fly, data may arrive in \QC
    several seconds after it was generated. An estimate for this delay is given
    in the \uicontrol {Processing delay} field.

    Data is collected until you select the
    \uicontrol {Stop collecting profile data} button or terminate the
    application.

    Select the \uicontrol {Stop collecting profile data} button to disable the
    automatic start of the data collection when an application is launched.
    Profile data will still be generated, but \QC will discard it until you
    select the button again.

    \section1 Profiling Memory Usage on Devices

    To create trace points for profiling memory usage on a target device, select
    \uicontrol Analyze > \uicontrol {Performance Analyzer Options} >
    \uicontrol {Create Memory Trace Points}.

    To add events for the trace points, see \l{Choosing Event Types}.

    You can record a memory trace to view usage graphs in the samples rows of
    the timeline and to view memory allocations, peaks, and releases in the
    flame graph.

    \section1 Specifying Performance Analyzer Settings

    To specify global settings for the Performance Analyzer, select
    \uicontrol Tools > \uicontrol Options > \uicontrol Analyzer >
    \uicontrol {CPU Usage}. For each run configuration, you can also
    use specialized settings. Select \uicontrol Projects > \uicontrol Run, and
    then select \uicontrol Details next to
    \uicontrol {Performance Analyzer Settings}.

    \image qtcreator-performance-analyzer-settings.png

    To edit the settings for the current run configuration, you can also select
    the dropdown menu next to the \uicontrol {Collect profile data} button.

    \section2 Choosing Event Types

    In the \uicontrol Events table, you can specify which events should trigger
    the Performance Analyzer to take a sample. The most common way of analyzing
    CPU usage involves periodic sampling, driven by hardware performance
    counters that react to the number of instructions or CPU cycles executed.
    Alternatively, a software counter that uses the CPU clock can be chosen.

    Select \uicontrol {Add Event} to add events to the table.
    In the \uicontrol {Event Type} column, you can choose the general type of
    event to be sampled, most commonly \uicontrol {hardware} or
    \uicontrol {software}. In the \uicontrol {Counter} column, you can choose
    which specific counter should be used for the sampling. For example,
    \uicontrol {instructions} in the \uicontrol {hardware} group or
    \uicontrol {cpu-clock} in the \uicontrol {software} group.

    More specialized sampling, for example by cache misses or cache hits, is
    possible. However, support for it depends on specific features of the CPU
    involved. For those specialized events, you can give more detailed sampling
    instructions in the \uicontrol {Operation} and \uicontrol {Result} columns.
    For example, you can choose a \uicontrol {cache} event for
    \uicontrol {L1-dcache} on the \uicontrol {load} operation with a result
    of \uicontrol {misses}. That would sample L1-dcache misses on reading.
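
    As a rough illustration, the selections in the \uicontrol Events table
    correspond to event names that Perf accepts with its \c {-e} option. The
    sketch below assumes that you run \c {perf record} manually on a Linux
    system. The application name \c {./myapp} is a placeholder, and the exact
    command line that \QC generates may differ.

    \badcode
    # hardware event, "instructions" counter
    perf record -e instructions -- ./myapp

    # software event, "cpu-clock" counter
    perf record -e cpu-clock -- ./myapp

    # cache event: L1-dcache, load operation, misses result
    perf record -e L1-dcache-load-misses -- ./myapp
    \endcode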

    Select \uicontrol {Remove Event} to remove the selected event from the
    table.

    Select \uicontrol {Use Trace Points} to replace the current selection of
    events with trace points defined on the target device and set the
    \uicontrol {Sample mode} to \uicontrol {event count} and the
    \uicontrol {Sample period} to \c {1}. If the trace points on the target
    were defined using the \uicontrol {Create Memory Trace Points} option, the
    Performance Analyzer will automatically use them to profile memory usage.

    Select \uicontrol {Reset} to revert the selection of events, as well as the
    \uicontrol {Sample mode} and \uicontrol {Sample period} to the default
    values.

    \section2 Choosing a Sampling Mode and Period

    In the \uicontrol {Sample mode} and \uicontrol {Sample period} fields, you
    can specify how samples are triggered:

    \list

        \li Sampling by \uicontrol {event count} instructs the kernel to take
            a sample every \c n times one of the chosen events has occurred,
            where \c n is specified in the \uicontrol {Sample period} field.

        \li Sampling by \uicontrol {frequency (Hz)} instructs the kernel to try to
            take a sample \c n times per second, by automatically adjusting the
            sampling period. Specify \c n in the \uicontrol {Sample period}
            field.

    \endlist

    High frequencies or low event counts result in more accurate data, at the
    expense of a higher overhead and a larger volume of data being
    generated. The actual sampling period is determined by the Linux kernel on
    the target device, which takes the period set for Perf merely as advice.
    There may be a significant difference between the sampling period you
    request and the actual result.
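
    When you run Perf manually, the two sampling modes roughly correspond to
    the \c {-c} (event count) and \c {-F} (frequency) options of
    \c {perf record}. The following sketch illustrates the difference. The
    values \c {100000} and \c {1000} are arbitrary examples and \c {./myapp}
    is a placeholder application name.

    \badcode
    # event count: take a sample every 100000 occurrences of the event
    perf record -e instructions -c 100000 -- ./myapp

    # frequency: ask the kernel to aim for about 1000 samples per second
    perf record -e cpu-clock -F 1000 -- ./myapp
    \endcode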

    In general, if you configure the Performance Analyzer to collect more data
    than it can transmit over the connection between the target and the host
    device, the application may get blocked while Perf is trying to send the
    data, and the processing delay may grow excessively. You should then change
    the \uicontrol {Sample period} or the \uicontrol {Stack snapshot size}.

    \section2 Selecting Call Graph Mode

    In the \uicontrol {Call graph mode} field, you can specify how the
    Performance Analyzer recovers call chains from your application:

    \list

    \li The \uicontrol {Frame Pointer}, or \c fp, mode relies on frame pointers
    being available in the profiled application and will instruct the kernel on
    the target device to walk the chain of frame pointers in order to retrieve
    a call chain for each sample.

    \li The \uicontrol {Dwarf} mode also works without frame pointers, but
    generates significantly more data. It takes a snapshot of the current
    application stack each time a sample is triggered and transmits that
    snapshot to the host computer for analysis.

    \li The \uicontrol {Last Branch Record} mode does not use a memory buffer.
    It automatically decodes the last 16 taken branches every time execution
    stops. It is supported only on recent Intel CPUs.

    \endlist

    Qt and most system libraries are compiled without frame pointers by
    default, so the frame pointer mode is only useful with customized systems.
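
    The three modes correspond to the values that \c {perf record} accepts
    for its \c {--call-graph} option. The following lines are a sketch for
    manual use, with \c {./myapp} as a placeholder application name. They are
    not necessarily the command line that \QC uses.

    \badcode
    # walk the chain of frame pointers in the kernel
    perf record --call-graph fp -- ./myapp

    # copy a stack snapshot for each sample and unwind it later
    perf record --call-graph dwarf -- ./myapp

    # use the Last Branch Record hardware feature, if the CPU supports it
    perf record --call-graph lbr -- ./myapp
    \endcode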

    \section2 Setting Stack Snapshot Size

    The Performance Analyzer will analyze and \e unwind the stack snapshots
    generated by Perf in dwarf mode. Set the size of the stack snapshots in the
    \uicontrol {Stack snapshot size} field. Large stack snapshots result in a
    larger volume of data to be transferred and processed. Small stack
    snapshots may fail to capture call chains of highly recursive applications
    or other intense stack usage.
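
    On the Perf command line, the snapshot size is given in bytes after the
    \c dwarf keyword of the \c {--call-graph} option. The sizes below are
    example values only, and \c {./myapp} is a placeholder application name.

    \badcode
    # small snapshots: less data, but deep call chains may be cut off
    perf record --call-graph dwarf,4096 -- ./myapp

    # large snapshots: deeper call chains, but more data to transfer
    perf record --call-graph dwarf,32768 -- ./myapp
    \endcode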

    \section2 Adding Command Line Options For Perf

    You can specify additional command line options to be passed to Perf when
    recording data in the \uicontrol {Additional arguments} field. You may want
    to specify \c{--no-delay} or \c{--no-buffering} to reduce the processing
    delay. However, those options are not supported by all versions of Perf and
    Perf may not start if an unsupported option is given.

    \section2 Resolving Names for JIT-compiled JavaScript Functions

    Since version 5.6.0, Qt can generate perf.map files with information about
    JavaScript functions. The Performance Analyzer will read them and show the
    function names in the \uicontrol Timeline, \uicontrol Statistics, and
    \uicontrol {Flame Graph} views. This only works if the process being
    profiled is running on the host computer, not on the target device. To
    switch on the generation of perf.map files, add the environment variable
    \c QV4_PROFILE_WRITE_PERF_MAP to the \uicontrol {Run Environment} and set
    its value to \c 1.
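
    If you start the application from a shell instead of \QC, you can set the
    same variable there. This is a minimal sketch that assumes a POSIX shell
    and uses \c {./myapp} as a placeholder application name.

    \badcode
    # switch on perf.map generation for processes started from this shell
    export QV4_PROFILE_WRITE_PERF_MAP=1
    perf record --call-graph dwarf -- ./myapp
    \endcode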

    \section1 Analyzing Collected Data

    The \uicontrol Timeline view displays a graphical representation of CPU
    usage per thread and a condensed view of all recorded events.

    \image qtcreator-performance-analyzer-timeline.png "Performance Analyzer"

    Each category in the timeline describes a thread in the application. Move
    the cursor over an event (5) on a row to see how long it takes and which
    function in the source it represents. To display the information only when
    an event is selected, disable the
    \uicontrol {View Event Information on Mouseover} button (4).

    The outline (9) summarizes the period for which data was collected. Drag
    the zoom range (7) or click the outline to move along it. You can
    also move between events by selecting the
    \uicontrol {Jump to Previous Event} and \uicontrol {Jump to Next Event}
    buttons (1).

    Select the \uicontrol {Show Zoom Slider} button (2) to open a slider that
    you can use to set the zoom level. You can also drag the zoom handles (8).
    To reset the default zoom level, right-click the timeline to open the
    context menu, and select \uicontrol {Reset Zoom}.

    \section2 Selecting Event Ranges

    You can select an event range (6) to view the time it represents or to zoom
    into a specific region of the trace. Select the \uicontrol {Select Range}
    button (3) to activate the selection tool. Then click in the timeline to
    specify the beginning of the event range. Drag the selection handle to
    define the end of the range.

    You can also use event ranges to measure delays between two subsequent
    events. Place a range between the end of the first event and the beginning
    of the second event. The \uicontrol Duration field displays the delay
    between the events in milliseconds.

    To zoom into an event range, double-click it.

    To remove an event range, close the \uicontrol Selection dialog.

    \section2 Understanding the Data

    Generally, events in the timeline view indicate how long a function call
    took. Move the mouse over them to see details. The details always include
    the address of the function, the approximate duration of the call, the ELF
    file the function resides in, the number of samples collected with this
    function call active, the total number of times this function was
    encountered in the thread, and the number of samples this function was
    encountered in at least once.

    For functions with debug information available, the details include the
    location in source code and the name of the function. You can click on such
    events to move the cursor in the code editor to the part of the code the
    event is associated with.

    As the Perf tool only provides periodic samples, the Performance Analyzer
    cannot determine the exact time when a function was called or when it
    returned. You can, however, see exactly when a sample was taken in the
    second row of each thread. The Performance Analyzer assumes that if the same
    function is present at the same place in the call chain in multiple
    consecutive samples, then this represents a single call to the respective
    function. This is, of course, a simplification. Also, there may be other
    functions being called between the samples taken, which do not show up in
    the profile data. However, statistically, the data is likely to show the
    functions that spend the most CPU time most prominently.

    If a function without debug information is encountered, further unwinding
    of the stack may fail. Unwinding will also fail for some symbols
    implemented in assembly language. If unwinding fails, only a part of the
    call chain is displayed, and the surrounding functions may seem to be
    interrupted. This does not necessarily mean they were actually interrupted
    during the execution of the application, but only that they could not be
    found in the stacks where the unwinding failed.

    JavaScript functions from the QML engine running in the JIT mode can be
    unwound. However, their names will only be displayed when
    \c QV4_PROFILE_WRITE_PERF_MAP is set. Compiled JavaScript generated by the
    \l{http://doc.qt.io/QtQuickCompiler/}{Qt Quick Compiler} can also be
    unwound. In this case the C++ names generated by the compiler are shown for
    JavaScript functions, rather than their JavaScript names. When running in
    interpreted mode, stack frames involving QML can also be unwound, showing
    the interpreter itself, rather than the interpreted JavaScript.

    Kernel functions included in call chains are shown on the third row of each
    thread.

    The coloring of the events represents the actual sample rate for the
    specific thread they belong to, across their duration. The Linux kernel
    will only take a sample of a thread if the thread is active. At the same
    time, the kernel tries to honor the requested event period.
    Thus, differences in the sampling frequency between different threads
    indicate that the thread with more samples taken is more likely to be the
    overall bottleneck, and the thread with fewer samples taken has likely spent
    time waiting for external events such as I/O or a mutex.

    \section1 Viewing Statistics

    \image qtcreator-performance-analyzer-statistics.png

    The \uicontrol Statistics view displays the number of samples each function
    in the timeline was contained in, in total and when on the top of the
    stack (called \c self). This allows you to examine which functions you need
    to optimize. A high number of occurrences might indicate that a function is
    triggered unnecessarily or takes very long to execute.

    Click on a row to move to the respective function in the source code in the
    code editor.

    The \uicontrol Callers and \uicontrol Callees panes show dependencies
    between functions. They allow you to examine the internal functions of the
    application. The \uicontrol Callers pane summarizes the functions that
    called the function selected in the main view.  The \uicontrol Callees pane
    summarizes the functions called from the function selected in the main
    view.

    Click on a row to move to the respective function in the source code in the
    code editor and select it in the main view.

    To copy the contents of one view or row to the clipboard, select
    \uicontrol {Copy Table} or \uicontrol {Copy Row} in the context menu.

    \section2 Visualizing Statistics as Flame Graphs

    \image qtcreator-performance-analyzer-flamegraph.png

    The \uicontrol {Flame Graph} view shows a more concise statistical overview
    of the execution. The horizontal bars show an aspect of the samples
    taken for a certain function, relative to the same aspect of all samples
    together. The nesting shows which functions were called by which other ones.

    The \uicontrol {Visualize} button lets you choose what aspect to show in the
    \uicontrol {Flame Graph}.

    \list

    \li \uicontrol {Samples} is the default visualization. The size of the
    horizontal bars represents the number of samples recorded for the given
    function.

    \li In \uicontrol {Peak Usage} mode, the size of the horizontal bars
    represents the amount of memory allocated by the respective functions, at
    the point in time when the application's memory usage was at its peak.

    \li In \uicontrol {Allocations} mode, the size of the horizontal bars
    represents the number of memory allocations triggered by the respective
    functions.

    \li In \uicontrol {Releases} mode, the size of the horizontal bars
    represents the number of memory releases triggered by the respective
    functions.

    \endlist

    The \uicontrol {Peak Usage}, \uicontrol {Allocations}, and
    \uicontrol {Releases} modes will only show any data if samples from memory
    trace points have been recorded.

    \section2 Interaction between the views

    When you select a stack frame in any of the \uicontrol {Timeline},
    \uicontrol {Flame Graph}, or \uicontrol {Statistics} views, information
    about it is displayed in the other two views. To view a time range in the
    \uicontrol {Statistics} and \uicontrol {Flame Graph} views, select
    \uicontrol Analyze > \uicontrol {Performance Analyzer Options} >
    \uicontrol {Limit to the Range Selected in Timeline}. To show the full
    stack frame, select \uicontrol {Show Full Range}.

    \section1 Loading Perf Data Files

    You can load any \c perf.data files generated by recent versions of the
    Linux Perf tool and view them in \QC. Select \uicontrol Analyze >
    \uicontrol {Performance Analyzer Options} > \uicontrol {Load perf.data} to
    load a file.

    \image qtcreator-cpu-usage-analyzer-load-perf-trace.png

    The Performance Analyzer needs to know the context in which the
    data was recorded to find the debug symbols. Therefore, you have to specify
    the kit that the application was built with and the folder where the
    application executable is located.

    The Perf data files are generated by calling \c {perf record}. Make sure to
    generate call graphs when recording data by starting Perf with the
    \c {--call-graph} option. Also check that the necessary debug symbols are
    available to the Performance Analyzer, either at a standard location
    (\c /usr/lib/debug or next to the binaries), or as part of the Qt package
    you are using.
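
    For example, a recording that can later be loaded into \QC might be
    created as follows. The application name \c {./myapp} and the output file
    name are placeholders.

    \badcode
    # record call graphs in dwarf mode and write them to perf.data
    perf record --call-graph dwarf -o perf.data -- ./myapp
    \endcode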

    The Performance Analyzer can read Perf data files generated in either frame
    pointer or dwarf mode. However, to generate the files correctly, numerous
    preconditions have to be met. All system images for the
    \l{http://doc.qt.io/QtForDeviceCreation/qtee-supported-platforms.html}
    {Qt for Device Creation reference devices}, except for Freescale iMX53 Quick
    Start Board and SILICA Architect Tibidabo, are correctly set up for
    profiling in the dwarf mode. For other devices, check whether Perf can read
    back its own data in a sensible way by checking the output of
    \c {perf report} or \c {perf script} for the recorded Perf data files.
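
    A quick check on the device could look like this, assuming that the
    recording was saved as \c {perf.data} in the current directory:

    \badcode
    # summarize the recorded samples per symbol
    perf report -i perf.data

    # dump the individual samples with their call chains
    perf script -i perf.data
    \endcode

    If the call chains in the output are mostly empty or consist of
    unresolved addresses, the Performance Analyzer is unlikely to produce
    better results from the same file.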

    \section1 Loading and Saving Trace Files

    You can save and load trace data in a format specific to the
    Performance Analyzer with the respective entries in \uicontrol Analyze >
    \uicontrol {Performance Analyzer Options}. This format is self-contained, and
    therefore loading it does not require you to specify the recording
    environment. You can transfer such trace files to a different computer
    without any tool chain or debug symbols and analyze them there.

    \section1 Troubleshooting

    The Performance Analyzer might fail to record data for the following reasons:

    \list 1
        \li Perf events may be globally disabled on your system. The
            preconfigured Boot2Qt images come with Perf events enabled. For
            a custom configuration you need to make sure that the file
            \c {/proc/sys/kernel/perf_event_paranoid} contains a value smaller
            than \c {2}. For maximum flexibility in recording traces you can
            set the value to \c {-1}. This allows any user to record any kind
            of trace, even using raw kernel trace points.
        \li The connection between the target device and the host may not be
            fast enough to transfer the data produced by Perf. Try modifying
            the values of the \uicontrol {Stack snapshot size} or
            \uicontrol {Sample period} settings.
        \li Perf may be buffering the data forever, never sending it. Add
            \c {--no-delay} or \c {--no-buffering} to the
            \uicontrol {Additional arguments} field.
        \li Some versions of Perf will not start recording unless given a
            certain minimum sampling frequency. Try with a
            \uicontrol {Sample period} value of 1000.
        \li On some devices, in particular various i.MX6 boards, the hardware
            performance counters are dysfunctional and the Linux kernel may
            randomly fail to record data after some time. Perf can use different
            types of events to trigger samples. You can get a list of available
            event types by running \c {perf list} on the device and then choose
            the respective event types in the settings. The choice of event type
            affects the performance and stability of the sampling. The
            \c {cpu-clock} \c {software} event is a safe but relatively slow
            option as it does not use the hardware performance counters, but
            drives the sampling from software. After the sampling has failed,
            reboot the device. The kernel may have disabled important parts of
            the performance counters system.
    \endlist
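
    The checks mentioned in the list can be run in a shell on the device. The
    following commands are a sketch that assumes a root shell. A value
    written to the file in this way lasts only until the next reboot.

    \badcode
    # show the current setting; a value smaller than 2 is needed
    cat /proc/sys/kernel/perf_event_paranoid

    # allow any user to record any kind of trace
    echo -1 > /proc/sys/kernel/perf_event_paranoid

    # list the event types that Perf can use on this device
    perf list
    \endcode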

    Output from the helper program that processes the data is displayed in the
    \uicontrol {General Messages} output pane.
*/