web/getstats/help_tsbm.html



<html>

<head>

<style type="text/css">
table { border-collapse:collapse; }
table, th, td { border: 1px solid #aaaaaa; }
th {vertical-align:top; padding:10px}
td {vertical-align:top; padding:10px}
li {padding:5px}
</style>

</head>

<body>

<div style="text-align:center">
<h1>BM2 Documentation</h1>
<h2><em>Benchmark Time Series Page</em></h2>
</div>


<!-- ///////////////////////////////////////////////////////////////// -->
<h2>Overview</h2>
&nbsp;<img src="images/help_overview_anno.png" /><br /><br />

<h3>Main context</h3>

<b>Note:</b> Some of the terms and concepts in the table below are described
in greater detail elsewhere. The additional documentation can often be
accessed through a link.
<br /><br />

<table>
  <tr>
    <td style="text-align:right"><b>Database</b></td>
    <td>
      The database from which results for the time series were extracted.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Report date</b></td>
    <td>
      The date on which the web page was generated.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Host</b></td>
    <td>
      The physical computer on which the benchmark producing this time series
      was executed. <b>Note:</b> It is assumed that the HW/SW specifications of
      the host do not change significantly during the time span of the time
      series. (The principle is that significant changes in the time series
      should be caused only by changes in the product, i.e. Qt.)
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Platform</b></td>
    <td>
      The general environment used for building and executing the product
      being measured (typically an OS/compiler combination).
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Branch</b></td>
    <td>
      Essentially the version of the product being measured. The branch is
      normally made up of two components: The <i>git repository</i> and the
      <i>git branch</i>.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Target snapshots</b></td>
    <td>
      The requested subsequence of snapshots for which results for this
      host/platform/branch/benchmark/metric combination potentially exist in
      the database.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Difference tolerance</b></td>
    <td>
      A real value &ge; 1 that decides whether a <a href="#changes">change</a>
      between two median observations in the time series is considered
      significant or not.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Minimum durability tolerance</b></td>
    <td>
      The minimum length a contiguous sequence of significantly equal median
      observations must have for it to achieve a durability score greater
      than zero. Once the sequence is at least this long, the durability
      score grows linearly to 1 at a rate that depends on the maximum
      durability tolerance.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>Maximum durability tolerance</b></td>
    <td>
      The length of a contiguous sequence of significantly equal median
      observations that is sufficient to achieve the maximum durability score
      of 1. The durability score for shorter sequences falls linearly to 0 at
      a rate that depends on the minimum durability tolerance.
    </td>
  </tr>
</table>
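
<br />
The two durability tolerances described above can be read as the end points of
a linear ramp. The following is a minimal sketch of that reading, not the BM2
implementation: the function name <code>durability_score</code> and the exact
interpolation formula are assumptions, chosen so that a sequence of exactly the
minimum length already scores above zero.
<br /><br />

```python
def durability_score(length: int, min_tol: int, max_tol: int) -> float:
    """Map a sequence length to a durability score in [0, 1].

    Assumed interpolation (not taken from the BM2 source):
      - lengths below min_tol score 0,
      - lengths at or above max_tol score 1,
      - lengths in between grow linearly, with min_tol itself scoring > 0.
    """
    if min_tol < 1 or max_tol < min_tol:
        raise ValueError("expected 1 <= min_tol <= max_tol")
    if length < min_tol:
        return 0.0
    if length >= max_tol:
        return 1.0
    # Linear ramp over the steps min_tol, min_tol + 1, ..., max_tol.
    return (length - min_tol + 1) / (max_tol - min_tol + 1)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, durability_score(n, min_tol=3, max_tol=7))
```

With this reading, widening the gap between the two tolerances makes the score
rise more slowly for sequences of intermediate length.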


<h3>Time series statistics</h3>

<b>Note:</b> Some of the terms and concepts in the table below are described
in greater detail elsewhere. The additional documentation can often be
accessed through a link.
<br /><br />

<table>
  <tr>
    <td style="text-align:right"><b>MS</b></td>
    <td>
      Missing snapshots, i.e. the number of target snapshots for which no
      results exist.
      <br /><br />A high value might indicate unstable execution.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LSD</b></td>
    <td>
      Last Snapshot Distance, i.e. the distance between the last target
      snapshot and the last snapshot in the time series.
      <br /><br />If the last target snapshot is the last one available in the
      database, a high value might indicate that the benchmark currently fails
      to produce results.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>NI</b></td>
    <td>
      Total number of observations explicitly flagged as invalid.
      <br /><br />An invalid observation is typically caused by a failed
      QVERIFY() or a similar check.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>NZ</b></td>
    <td>
      Total number of non-positive observations.
      <br /><br />Normally an observation must be positive to be valid.
      <br /><b>Note:</b> A non-positive observation is not necessarily flagged
      as <i>invalid</i> (see <b>NI</b>).
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>NC</b></td>
    <td>
      Number of <a href="#changes">significant changes</a>.
      <br /><br />A high value might indicate unstable or fluctuating results.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>MDRSE</b></td>
    <td>
      Median of the
      valid <a href="http://en.wikipedia.org/wiki/Standard_error_%28statistics%29#Relative_standard_error">relative
      standard errors</a> of all snapshots.
      <br /><br />
      &nbsp;&nbsp;&nbsp;&nbsp;
      <img src="images/rse.png" />
      <br /><br />A high value might indicate unstable or fluctuating results.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>RSEMD</b></td>
    <td>
      Relative standard error (see above) of the valid median observations of
      all snapshots.
      <br /><br />A high value might indicate either 1) unstable or
      fluctuating results or 2) stable changes of a high magnitude.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LC</b></td>
    <td>
      Last <a href="#changes">significant change</a>.
      <br /><br />The higher the value is above 1, the more strongly it
      represents an improvement.
      <br />The lower the value is below 1, the more strongly it represents a
      regression.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCDA</b></td>
    <td>
      Number of days (relative to the report date) since the first observation
      for the last <a href="#changes">significant change</a> snapshot was
      uploaded to the database.
      <br />The distance (in terms of number of target snapshots) between the
      last significant change snapshot and the last target snapshot is shown
      in parentheses.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCMS</b></td>
    <td>
      Magnitude score of the last <a href="#changes">significant
      change</a>. This score indicates the strength of the last significant
      change as a value ranging from 0 (weak) to 1 (strong):
      <br /><br />&nbsp;&nbsp;&nbsp;&nbsp;
      <img src="images/lcms.png" />
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCGSS</b></td>
    <td>
      Global separation score for the last <a href="#changes">significant
      change</a>. This score indicates how well the median observation at the
      last significant change snapshot is separated from the median
      observations at <u>all preceding</u> snapshots in the time series. The
      median observation at the last <a href="#changes">base snapshot</a> is
      used as the maximum separation reference.

      <br /><br />The score ranges from 0 (weak separation) to 1 (strong
      separation).

      <br /><br />This score roughly measures how close the median observation
      at the last significant change comes to being an all-time high (or low)
      up to this point in the history.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCLSS</b></td>
    <td>
      Local separation score for the last <a href="#changes">significant
      change</a>. This score indicates how well the median observations on
      each side of the last significant change snapshot are separated from
      each other. Snapshots before the last <a href="#changes">base
      snapshot</a> are not considered. The median observation at the base
      snapshot is used as the maximum separation reference.

      <br /><br />The score ranges from 0 (weak separation) to 1 (strong
      separation).
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCDS1</b></td>
    <td>
      Durability score 1 for the last <a href="#changes">significant
      change</a>. This score indicates the distance (in terms of number of
      snapshots) from the last significant change to
      its <a href="#changes">base snapshot</a>.

      <br /><br />The score ranges from 0 (weak durability) to 1 (strong
      durability) and is scaled against the min/max durability tolerances.

      <br /><br />This score measures for how long the median observation
      stayed near the base value until the last significant change occurred.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCDS2</b></td>
    <td>
      Durability score 2 for the last <a href="#changes">significant
      change</a>. This score indicates the distance (in terms of number of
      snapshots) from the last significant change to the end of the time
      series.

      <br /><br />The score ranges from 0 (weak durability) to 1 (strong
      durability) and is scaled against the min/max durability tolerances.

      <br /><br />This score measures for how long the median observation at
      the last significant change has stayed essentially the same.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCSS</b></td>
    <td>
      Stability score for the last <a href="#changes">significant change</a>:
      <br /><br />
      &nbsp;&nbsp;&nbsp;&nbsp;
      <b>LCMS</b> * <b>LCGSS</b> * <b>LCLSS</b> * <b>LCDS1</b> * <b>LCDS2</b>
      <br /><br />
      The higher this score, the higher the likelihood that the last
      significant change is or will become permanent.
    </td>
  </tr>
  <tr>
    <td style="text-align:right"><b>LCSS1</b></td>
    <td>
      Stability score for the last <a href="#changes">significant change</a>
      that does not consider the history after that change:
      <br /><br />
      &nbsp;&nbsp;&nbsp;&nbsp;
      <b>LCMS</b> * <b>LCGSS</b> * <b>LCLSS</b> * <b>LCDS1</b>
      <br /><br />
      The higher this score, the higher the likelihood that the last
      significant change is or will become permanent. However, since
      <b>LCDS2</b> is omitted from the product, a high <b>LCSS1</b> is more
      likely to be caused by an outlier than a high <b>LCSS</b>.
    </td>
  </tr>
</table>
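
<br />
The two stability scores in the table above are plain products of the component
scores. The sketch below restates those formulas in code; the function name
<code>stability_scores</code> and the range check are illustrative assumptions
and are not taken from BM2.
<br /><br />

```python
def stability_scores(lcms: float, lcgss: float, lclss: float,
                     lcds1: float, lcds2: float) -> tuple[float, float]:
    """Return (LCSS, LCSS1) from the five component scores.

    LCSS1 = LCMS * LCGSS * LCLSS * LCDS1
    LCSS  = LCSS1 * LCDS2
    """
    for name, score in (("LCMS", lcms), ("LCGSS", lcgss), ("LCLSS", lclss),
                        ("LCDS1", lcds1), ("LCDS2", lcds2)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {score}")
    lcss1 = lcms * lcgss * lclss * lcds1
    lcss = lcss1 * lcds2
    return lcss, lcss1


if __name__ == "__main__":
    # A strong change that happened at the very end of the series:
    # LCDS2 is 0, so LCSS is 0 while LCSS1 stays high.
    print(stability_scores(0.8, 1.0, 0.5, 1.0, 0.0))
```

Because every factor lies between 0 and 1, a single weak component pulls the
whole product down, which is why a recent change can have a high
<b>LCSS1</b> but a low <b>LCSS</b>.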


<h3>Benchmark and metric</h3>
The benchmark name consists of three subnames and is formatted like this:
<br /><br />
&lt;subname 1&gt;<b>:</b>&lt;subname 2&gt;<b>(</b>&lt;subname 3&gt;<b>)</b>
<br /><br />

The subnames can essentially be anything not containing the characters
'<b>:</b>', '<b>(</b>', and '<b>)</b>'. Only subname 3 may contain whitespace.

For benchmark results generated by QTestLib, the subnames always correspond to
<i>test case</i>, <i>test function</i>, and <i>data tag</i> respectively.

<br /><br />
The metric name is one of a set of predefined metric names, each of which is
classified as either "lower is better" (like walltime) or "higher is better"
(like fps).
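
<br /><br />
The benchmark name format described above can be checked with a regular
expression. This is a minimal sketch under stated assumptions: the function
name <code>parse_benchmark_name</code> is hypothetical, the example name is
made up, and an empty subname 3 is accepted here although the text above does
not say whether that is allowed.
<br /><br />

```python
import re

# <subname 1>:<subname 2>(<subname 3>)
# Subnames 1 and 2: no ':', '(', ')' and no whitespace.
# Subname 3: no ':', '(', ')', but whitespace is allowed (may be empty here,
# which is an assumption).
_BENCHMARK_RE = re.compile(r"([^:()\s]+):([^:()\s]+)\(([^:()]*)\)")


def parse_benchmark_name(name: str) -> tuple[str, str, str]:
    """Split a benchmark name into its three subnames.

    For QTestLib results these correspond to test case, test function
    and data tag. Raises ValueError if the name is malformed.
    """
    match = _BENCHMARK_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"malformed benchmark name: {name!r}")
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Hypothetical benchmark name, for illustration only.
    print(parse_benchmark_name("tst_Example:append(short string)"))
```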

<!-- ///////////////////////////////////////////////////////////////// -->
<h2>Time Series Plot</h2>

<h3>Snapshots and main graph</h3>
&nbsp;<img src="images/help_plot_overview_anno.png" /><br /><br />


<a name="changes"></a>
<h3>Significant changes</h3>
&nbsp;<img src="images/help_plot_changes_anno.png" /><br /><br />
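
One way to read the difference tolerance and the <b>LC</b> statistic together
is as a ratio test. The sketch below is an assumption, not the BM2 algorithm:
it takes a change to be the ratio of two median observations, oriented so that
values above 1 are improvements, and calls it significant when the ratio lies
outside the interval [1/tolerance, tolerance]. The function names are
hypothetical.
<br /><br />

```python
def change_ratio(base: float, new: float, lower_is_better: bool) -> float:
    """Ratio between two median observations, oriented so that a value
    above 1 is an improvement and a value below 1 is a regression.

    Assumed orientation: base / new for "lower is better" metrics
    (like walltime), new / base for "higher is better" metrics (like fps).
    """
    if base <= 0 or new <= 0:
        raise ValueError("median observations must be positive")
    return base / new if lower_is_better else new / base


def is_significant(ratio: float, tolerance: float) -> bool:
    """Assumed test: significant if the ratio leaves [1/tolerance, tolerance]."""
    if tolerance < 1:
        raise ValueError("difference tolerance must be >= 1")
    return ratio > tolerance or ratio < 1 / tolerance


if __name__ == "__main__":
    r = change_ratio(base=100.0, new=80.0, lower_is_better=True)
    print(r, is_significant(r, tolerance=1.1))
```

Under this reading, a tolerance of exactly 1 treats every difference between
two medians as significant, and larger tolerances ignore progressively larger
differences.
<br /><br />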


<h3>Sample size</h3>
&nbsp;<img src="images/help_plot_samplesize_anno.png" /><br /><br />


<h3>Non-positive observations</h3>
&nbsp;<img src="images/help_plot_nonposobs_anno.png" /><br /><br />


<h3>Invalid observations</h3>
&nbsp;<img src="images/help_plot_invalidobs_anno.png" /><br /><br />


<h3>Statistical dispersion</h3>
Statistical dispersion in a time series is measured in terms of
<a href="http://en.wikipedia.org/wiki/Standard_error_%28statistics%29#Relative_standard_error">relative standard error</a> (RSE).
&nbsp;<img src="images/help_plot_rse_anno.png" /><br /><br />
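
As a rough illustration of the dispersion measure, the sketch below computes
the RSE of one snapshot's observations and the median of the RSEs over several
snapshots (compare <b>MDRSE</b> in the statistics table). It assumes the usual
definition, i.e. the standard error of the mean divided by the mean, with the
sample standard deviation (n - 1 in the denominator) and the result expressed
as a fraction rather than a percentage. BM2's exact formula may differ, and the
function names are hypothetical.
<br /><br />

```python
import math
import statistics


def rse(observations: list[float]) -> float:
    """Relative standard error of a sample, as a fraction of the mean."""
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(observations)
    if mean == 0:
        raise ValueError("RSE is undefined for a zero mean")
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(observations) / math.sqrt(len(observations))
    return std_err / abs(mean)


def mdrse(snapshots: list[list[float]]) -> float:
    """Median of the per-snapshot RSEs, skipping snapshots with fewer
    than two observations (an assumption about what counts as valid)."""
    return statistics.median(rse(s) for s in snapshots if len(s) >= 2)


if __name__ == "__main__":
    data = [[10.0, 10.0, 10.0], [9.0, 11.0], [8.0, 12.0]]
    print([rse(s) for s in data], mdrse(data))
```
<br /><br />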


<h3>Missing data</h3>
&nbsp;<img src="images/help_plot_missing_anno.png" /><br /><br />


<h2>Snapshot details</h2>
When a snapshot in the plot is clicked, the two tables below the plot are
filled with details about the selected snapshot.
<br />
<br />
&nbsp;<img src="images/help_plot_selected_anno.png" /><br /><br />

</body>

</html>