'\"! tbl | nroff \-man
'\" t macro stdmacro

.de SAMPLE
.br
.RS 0
.nf
.nh
..
.de ESAMPLE
.hy
.fi
.RE
..

.TH DEBUGINFOD 8
.SH NAME
debuginfod \- debuginfo-related http file-server daemon

.SH SYNOPSIS
.B debuginfod
[\fIOPTION\fP]... [\fIPATH\fP]...

.SH DESCRIPTION
\fBdebuginfod\fP serves debuginfo-related artifacts over HTTP.  It
periodically scans a set of directories for ELF/DWARF files and their
associated source code, as well as archive files containing the above, to
build an index by their buildid.  This index is used when remote
clients use the HTTP webapi, to fetch these files by the same buildid.

If a debuginfod cannot service a given buildid artifact request
itself, and it is configured with information about upstream
debuginfod servers, it queries them for the same information, just as
\fBdebuginfod-find\fP would.  If successful, it caches the file locally,
then relays its content to the original requester.

Indexing the given PATHs proceeds using multiple threads.  One thread
periodically traverses all the given PATHs logically or physically
(see the \fB\-L\fP option).  Duplicate PATHs are ignored.  You may use
a file name for a PATH, but source code indexing may be incomplete;
prefer using a directory that contains the binaries.  The traversal
thread enumerates all matching files (see the \fB\-I\fP and \fB\-X\fP
options) into a work queue.  A collection of scanner threads (see the
\fB\-c\fP option) waits on the work queue to analyze files in parallel.

If the \fB\-F\fP option is given, each file is scanned as an ELF/DWARF
file.  Source files are matched with DWARF files based on the
AT_comp_dir (compilation directory) attributes inside them.  Caution:
source files listed in the DWARF may be a path \fIanywhere\fP in the
file system, and debuginfod will readily serve their content on
demand.  (Imagine a doctored DWARF file that lists \fI/etc/passwd\fP
as a source file.)  If this is a concern, audit your binaries with
tools such as:

.SAMPLE
% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^File.name.table/p'
or
% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^Line.number/p'
or even use debuginfod itself:
% debuginfod -vvv -d :memory: -F BINARY 2>&1 | grep 'recorded.*source'
^C
.ESAMPLE

If the \fB\-R\fP and/or \fB\-U\fP option is given, each file is scanned
as an archive file that may contain ELF/DWARF/source files.  If \-R is
given, debuginfod will scan RPMs; and/or if \-U is given, it will scan
DEB / DDEB files.  (The terms RPM and DEB and DDEB are used synonymously
as "archives" in diagnostic messages.)  Because of complications such
as DWZ-compressed debuginfo, indexing may require \fItwo\fP traversal
passes to identify all source code.  Source files for RPMs are only
served from other RPMs, so the caution for \-F does not apply.  Note
that due to Debian/Ubuntu packaging policies & mechanisms, debuginfod
cannot resolve source files for DEB/DDEB at all.

If no PATH is listed, or neither \fB\-F\fP nor \fB\-R\fP nor \fB\-U\fP
option is given, then \fBdebuginfod\fP will simply serve content that
it accumulated into its index in all previous runs.


.SH OPTIONS

.TP
.B "\-F"
Activate ELF/DWARF file scanning.  The default is off.

.TP
.B "\-R"
Activate RPM patterns in archive scanning.  The default is off.

.TP
.B "\-U"
Activate DEB/DDEB patterns in archive scanning.  The default is off.

.TP
.B "\-d FILE" "\-\-database=FILE"
Set the path of the sqlite database used to store the index.  This
file is disposable in the sense that a later rescan will repopulate
data.  It will contain absolute file path names, so it may not be
portable across machines.  It may be frequently read/written, so it
should be on a fast filesystem.  It should not be shared across
machines or users, to maximize sqlite locking performance.  The
default database file is \%$HOME/.debuginfod.sqlite.

.TP
.B "\-D SQL" "\-\-ddl=SQL"
Execute given sqlite statement after the database is opened and
initialized as extra DDL (SQL data definition language).  This may be
useful to tune performance-related pragmas or indexes.  May be
repeated.  The default is nothing extra.

.TP
.B "\-p NUM" "\-\-port=NUM"
Set the TCP port number on which debuginfod should listen, to service
HTTP requests.  Both IPv4 and IPv6 sockets are opened, if possible.
The webapi is documented below.  The default port number is 8002.

.TP
.B "\-I REGEX"  "\-\-include=REGEX"  "\-X REGEX"  "\-\-exclude=REGEX"
Govern the inclusion and exclusion of file names under the search
paths.  The regular expressions are interpreted as unanchored POSIX
extended REs, and thus may include alternation.  They are evaluated
against the full path of each file, based on its \fBrealpath(3)\fP
canonicalization.  By default, all files are included and none are
excluded.  A file that matches both include and exclude REGEX is
excluded.  (The \fIcontents\fP of archive files are not subject to
inclusion or exclusion filtering: they are all processed.)
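
As an illustration only, the selection rule above can be modelled in a
few lines of Python.  This is a hypothetical sketch, not debuginfod's
implementation: the name \fIis_selected\fP is made up, and Python's
\fIre\fP module stands in for POSIX extended regular expressions, which
agree with it on simple patterns like these but are not the same dialect.

.SAMPLE
```python
import os
import re
from typing import Optional


def is_selected(path: str, include: Optional[str] = None,
                exclude: Optional[str] = None) -> bool:
    """Sketch of the -I/-X selection rule (not debuginfod's own code).

    include=None means "include everything"; exclude=None means
    "exclude nothing", matching the documented defaults.
    """
    canonical = os.path.realpath(path)  # match the canonical path
    # Unanchored matching: re.search, not re.fullmatch.
    included = include is None or re.search(include, canonical) is not None
    excluded = exclude is not None and re.search(exclude, canonical) is not None
    # A path that matches both REGEXes is excluded.
    return included and not excluded
```
.ESAMPLE

Writing the dot as "[.]" in a pattern such as "[.]rpm$" matches a
literal dot without needing a backslash escape.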

.TP
.B "\-t SECONDS"  "\-\-rescan\-time=SECONDS"
Set the rescan time for the file and archive directories.  This is the
amount of time the traversal thread will wait after finishing a scan,
before doing it again.  A rescan for unchanged files is fast (because
the index also stores the file mtimes).  A time of zero is acceptable,
and means that only one initial scan should be performed.  The default
rescan time is 300 seconds.  Receiving a SIGUSR1 signal triggers a new
scan, independent of the rescan time (including if it was zero).

.TP
.B "\-g SECONDS" "\-\-groom\-time=SECONDS"
Set the groom time for the index database.  This is the amount of time
the grooming thread will wait after finishing a grooming pass before
doing it again.  A groom operation quickly rescans all previously
scanned files, only to see if they are still present and current, so
it can deindex obsolete files.  See also the \fIDATA MANAGEMENT\fP
section.  The default groom time is 86400 seconds (1 day).  A time of
zero is acceptable, and means that only one initial groom should be
performed.  Receiving a SIGUSR2 signal triggers a new grooming pass,
independent of the groom time (including if it was zero).
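
The scheduling described for \-t and \-g can be modelled with a Python
event that is waited on with a timeout or, for a time of zero, waited on
indefinitely; a signal handler sets the event to force an extra pass.
This is a hypothetical sketch of the documented behavior, not
debuginfod's own code, and \fIrun_periodically\fP and
\fIwake_on_signal\fP are made-up names.

.SAMPLE
```python
import signal
import threading
from typing import Callable


def run_periodically(work: Callable[[], None], interval: float,
                     wakeup: threading.Event,
                     stop: threading.Event) -> None:
    """Run work() now, then again after each interval or wakeup.

    Hypothetical model of the -t/-g scheduling, not debuginfod's code.
    An interval of 0 means one initial pass, then passes only on wakeup.
    """
    while not stop.is_set():
        work()  # one traversal or grooming pass
        wakeup.wait(None if interval == 0 else interval)
        wakeup.clear()


def wake_on_signal(signum: int, wakeup: threading.Event) -> None:
    # Install a handler that forces an extra pass, for example
    # signal.SIGUSR1 for rescans or signal.SIGUSR2 for grooming.
    # Python only allows this call from the main thread.
    signal.signal(signum, lambda received, frame: wakeup.set())
```
.ESAMPLE

To stop such a loop, set \fIstop\fP and then \fIwakeup\fP, so that a
loop sleeping indefinitely also notices the request.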

.TP
.B "\-G"
Run an extraordinary maximal-grooming pass at debuginfod startup.
This pass can take considerable time, because it tries to remove any
debuginfo-unrelated content from the archive-related parts of the index.
It should not be run if any recent archive-related indexing operations
were aborted early.  It can take considerable space, because it
finishes up with an sqlite "vacuum" operation, which repacks the
database file by triplicating it temporarily.  The default is not to
do maximal-grooming.  See also the \fIDATA MANAGEMENT\fP section.

.TP
.B "\-c NUM"  "\-\-concurrency=NUM"
Set the concurrency limit for the scanning queue threads, which work
together to process archives & files located by the traversal thread.
This is important for controlling CPU-intensive operations like parsing
an ELF file and especially decompressing archives.  The default is the
number of processors on the system; the minimum is 1.

.TP
.B "\-L"
Traverse symbolic links encountered during traversal of the PATHs,
including across devices - as in \fIfind\ -L\fP.  The default is to
traverse the physical directory structure only, stay on the same
device, and ignore symlinks - as in \fIfind\ -P\ -xdev\fP.  Caution:
loops in the symbolic directory tree might lead to \fIinfinite
traversal\fP.

.TP
.B "\-\-fdcache-fds=NUM"  "\-\-fdcache-mbs=MB"
Configure limits on a cache that keeps recently extracted files from
archives.  Up to NUM files and up to a total of MB megabytes will be
kept extracted, in order to avoid having to decompress their archives
again.  The default NUM and MB values depend on the concurrency of the
system, and on the available disk space on the $TMPDIR or \fB/tmp\fP
filesystem, because that is where the most recently used extracted
files are kept.  Grooming cleans this cache.

.TP
.B "\-v"
Increase verbosity of logging to the standard error file descriptor.
May be repeated to increase details.  The default verbosity is 0.

.SH WEBAPI

.\" Much of the following text is duplicated with debuginfod-find.1

debuginfod's webapi resembles ordinary file service, where a GET
request with a path containing a known buildid results in a file.
Unknown buildid / request combinations result in HTTP error codes.
This file service resemblance is intentional, so that an installation
can take advantage of standard HTTP management infrastructure.

There are three buildid-related requests.  In each case, the buildid is encoded as a
lowercase hexadecimal string.  For example, for a program \fI/bin/ls\fP,
look at the ELF note GNU_BUILD_ID:

.SAMPLE
% eu-readelf -n /bin/ls | grep -A4 build.id
Note section [ 4] '.note.gnu.build-id' of 36 bytes at offset 0x340:
Owner          Data size  Type
GNU                   20  GNU_BUILD_ID
Build ID: 8713b9c3fb8a720137a4a08b325905c7aaf8429d
.ESAMPLE

Then the hexadecimal BUILDID is simply:

.SAMPLE
8713b9c3fb8a720137a4a08b325905c7aaf8429d
.ESAMPLE
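
To show where those hex digits come from, the following hypothetical
Python sketch decodes the raw contents of a \fI.note.gnu.build-id\fP
section: a 12-byte header holding the name size, descriptor size, and
note type (3 for GNU_BUILD_ID), followed by the name "GNU" and the
descriptor bytes, each padded to 4 bytes.  It assumes a little-endian
file and 4-byte note alignment; \fIbuildid_from_note\fP is a made-up name.

.SAMPLE
```python
import struct

NT_GNU_BUILD_ID = 3  # note type of the GNU build-id note


def buildid_from_note(section: bytes) -> str:
    """Return the lowercase-hex BUILDID from raw note section bytes.

    Sketch only: assumes a little-endian ELF file and 4-byte alignment.
    """
    offset = 0
    while offset + 12 <= len(section):
        namesz, descsz, ntype = struct.unpack_from("<III", section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # the name is padded to 4 bytes
        desc = section[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # and so is the descriptor
        if ntype == NT_GNU_BUILD_ID and name.rstrip(bytes(1)) == b"GNU":
            return desc.hex()  # bytes.hex() already yields lowercase
    raise ValueError("no GNU build-id note found")
```
.ESAMPLE

The 36-byte section in the sample above fits this layout: 12 bytes of
header, 4 bytes for the name, and a 20-byte descriptor.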

.SS /buildid/\fIBUILDID\fP/debuginfo

If the given buildid is known to the server, this request will result
in a binary object that contains the customary \fB.*debug_*\fP
sections.  This may be a split debuginfo file as created by
\fBstrip\fP, or it may be an original unstripped executable.

.SS /buildid/\fIBUILDID\fP/executable

If the given buildid is known to the server, this request will result
in a binary object that contains the normal executable segments.  This
may be an executable stripped by \fBstrip\fP, or it may be an original
unstripped executable.  \fBET_DYN\fP shared libraries are considered
to be a type of executable.

.SS /buildid/\fIBUILDID\fP/source\fI/SOURCE/FILE\fP

If the given buildid is known to the server, this request will result
in a binary object that contains the source file mentioned.  The path
should be absolute.  Relative path names commonly appear in the DWARF
file's source directory table, but these paths are relative to the
AT_comp_dir of each individual compilation unit (CU), and an executable
is made up of multiple CUs.  Therefore, to disambiguate, debuginfod
expects source queries to prefix relative path names with the CU
compilation-directory, followed by a mandatory "/".

Note: contrary to RFC 3986, the client should not elide \fB../\fP or
\fB/./\fP or extraneous \fB///\fP sorts of path components in the
directory names, because if this is how those names appear in the
DWARF files, that is what debuginfod needs to see too.

For example:
.TS
l l.
#include <stdio.h>	/buildid/BUILDID/source/usr/include/stdio.h
/path/to/foo.c	/buildid/BUILDID/source/path/to/foo.c
\../bar/foo.c AT_comp_dir=/zoo/	/buildid/BUILDID/source/zoo//../bar/foo.c
.TE
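
The mapping in this table can be expressed as a short hypothetical
Python function, \fIsource_request_path\fP (a made-up name), assuming
the DWARF file name and AT_comp_dir are passed in exactly as they
appear in the debuginfo.  It deliberately performs no path
normalization, per the note above, and does not percent-encode unusual
characters.

.SAMPLE
```python
import posixpath


def source_request_path(buildid: str, name: str, comp_dir: str) -> str:
    """Build the /source request path for one DWARF source file name.

    Sketch only: name and comp_dir are used verbatim from the DWARF data.
    """
    if posixpath.isabs(name):
        full = name  # absolute names are used as-is
    else:
        # Relative names get the CU's AT_comp_dir plus a mandatory "/";
        # "../", "/./" and "//" are deliberately left unnormalized.
        full = comp_dir + "/" + name
    return "/buildid/" + buildid + "/source" + full
```
.ESAMPLE

For the third row, source_request_path("BUILDID", "../bar/foo.c",
"/zoo/") yields /buildid/BUILDID/source/zoo//../bar/foo.c.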

.SS /metrics

This endpoint returns a Prometheus formatted text/plain dump of a
variety of statistics about the operation of the debuginfod server.
The exact set of metrics and their meanings may change in future
versions.  Caution: configuration information (path names, versions)
may be disclosed.
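
A client could read this endpoint with a small Python helper like the
following.  It is a sketch that understands only the basic Prometheus
text format (one "name{labels} value [timestamp]" sample per line, plus
"#" comment lines); it assumes no particular debuginfod metric names, and
\fIparse_metrics\fP and \fIfetch_metrics\fP are made-up names.

.SAMPLE
```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Minimal reader for Prometheus text exposition lines.

    Sketch only: keys keep their {label} part verbatim, comment lines
    (# HELP, # TYPE) and blank lines are skipped, and label values
    containing spaces are not supported.
    """
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        samples[fields[0]] = float(fields[1])  # drop any timestamp
    return samples


def fetch_metrics(base_url: str, timeout: float = 5.0) -> Dict[str, float]:
    # base_url is a debuginfod server, for example http://localhost:8002
    with urllib.request.urlopen(base_url.rstrip("/") + "/metrics",
                                timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```
.ESAMPLE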

.SH DATA MANAGEMENT

debuginfod stores its index in an sqlite database in a densely packed
set of interlinked tables.  While the representation is as efficient
as we have been able to make it, it still takes a considerable amount
of data to record all debuginfo-related data of potentially a great
many files.  This section offers some advice about the implications.

As a general explanation for size, consider that when debuginfod
indexes ELF/DWARF files, it stores their names, their buildids, and the
names of referenced source files.  When indexing archives, it stores
every file name \fIof or in\fP an archive, every buildid, plus every
source file name referenced from a DWARF file.  (Indexing archives
takes more space because the source files often reside in separate
subpackages that may not be indexed at the same pass, so extra
metadata has to be kept.)

Getting down to numbers, in the case of Fedora RPMs (essentially,
gzip-compressed cpio files), the sqlite index database tends to be
from 0.5% to 3% of their size.  It's larger for binaries that are
assembled out of a great many source files, or packages that carry
much debuginfo-unrelated content.  It may be even larger during the
indexing phase due to temporary sqlite write-ahead-logging files;
these are checkpointed (cleaned out and removed) at shutdown.  It may
be helpful to apply tight \-I or \-X regular-expression constraints to
exclude files from scanning that you know have no debuginfo-relevant
content.

As debuginfod runs, it periodically rescans its target directories,
and any new content found is added to the database.  Old content, such
as data for files that have disappeared or that have been replaced
with newer versions is removed at a periodic \fIgrooming\fP pass.
This means that the sqlite files grow fast during initial indexing,
slowly during index rescans, and periodically shrink during grooming.
An optional one-shot \fImaximal grooming\fP pass is also available.
It removes debuginfo-unrelated data from the archive content index,
such as file names found in archives ("archive sdef" records) that are
not referred to as source files from any binaries found in archives
("archive sref" records).  This can save
considerable disk space.  However, it is slow and temporarily requires
up to twice the database size as free space.  Worse: it may result in
missing source-code info if the archive traversals were interrupted,
so that not all source file references were known.  Use it rarely to
polish a complete index.

You should ensure that ample disk space remains available.  (The flood
of error messages on \-ENOSPC is ugly and nagging.  But, as with most
other errors, debuginfod will resume when resources permit.)  If
necessary, debuginfod can be stopped, the database file moved or
removed, and debuginfod restarted.

sqlite offers several performance-related options in the form of
pragmas.  Some may be useful to fine-tune the defaults plus the
debuginfod extras.  The \-D option may be useful to tell debuginfod to
execute the given bits of SQL after the basic schema creation
commands.  For example, the "synchronous", "cache_size",
"auto_vacuum", "threads", "journal_mode" pragmas may be fun to tweak
via \-D, if you're searching for peak performance.  The "optimize",
"wal_checkpoint" pragmas may be useful to run periodically, outside
debuginfod.  The default settings are performance- rather than
reliability-oriented, so a hardware crash might corrupt the database.
In these cases, it may be necessary to manually delete the sqlite
database and start over.

As debuginfod changes in the future, we may have no choice but to
change the database schema in an incompatible manner.  If this
happens, new versions of debuginfod will issue SQL statements to
\fIdrop\fP all prior schema & data, and start over.  So, disk space
will not be wasted for retaining a no-longer-usable dataset.

In summary, if your system can bear a 0.5%-3% index-to-archive-dataset
size ratio, and slow growth afterwards, you should not need to
worry about disk space.  If a system crash corrupts the database,
or you want to force debuginfod to reset and start over, simply
erase the sqlite file before restarting debuginfod.
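
The arithmetic behind these guidelines is simple enough to sketch in
Python.  The 0.5%-3% range is the empirical Fedora RPM figure quoted
above, not a guarantee, the second helper applies the "up to twice the
database size" free-space requirement of maximal grooming, and both
helper names are hypothetical.

.SAMPLE
```python
from typing import Tuple


def index_size_estimate(archive_bytes: int) -> Tuple[int, int]:
    """Rough sqlite index size range for a given archive dataset size."""
    low = archive_bytes * 5 // 1000    # 0.5%
    high = archive_bytes * 30 // 1000  # 3%
    return low, high


def maximal_groom_free_space(db_bytes: int) -> int:
    # The final sqlite vacuum of a -G pass may temporarily need up to
    # twice the database size as additional free space.
    return 2 * db_bytes
```
.ESAMPLE

For 200 GB of archives this suggests an index of roughly 1 GB to 6 GB,
and a maximal-grooming pass over a 6 GB database may need up to 12 GB
of additional free space.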


.SH SECURITY

debuginfod \fBdoes not\fP include any particular security features.
While it is robust with respect to inputs, some abuse is possible.  It
starts a new thread for each incoming HTTP request, which could lead to
a denial-of-service in terms of RAM, CPU, disk I/O, or network I/O.
If this is a problem, users are advised to install debuginfod with an
HTTPS reverse-proxy front-end that enforces site policies for
firewalling, authentication, integrity, authorization, and load
control.  The \fI/metrics\fP webapi endpoint is probably not
appropriate for disclosure to the public.

When relaying queries to upstream debuginfods, debuginfod \fBdoes not\fP
include any particular security features.  It trusts that the binaries
returned by the debuginfods are accurate.  Therefore, the list of
servers should include only trustworthy ones.  If accessed across HTTP
rather than HTTPS, the network should be trustworthy.  Authentication
information through the internal \fIlibcurl\fP library is not currently
enabled.


.SH "ENVIRONMENT VARIABLES"

.TP
.B TMPDIR
This environment variable points to a file system to be used for
temporary files.  The default is /tmp.

.TP
.B DEBUGINFOD_URLS
This environment variable contains a list of URL prefixes for trusted
debuginfod instances.  Alternate URL prefixes are separated by space.
Avoid referential loops that cause a server to contact itself, directly
or indirectly - the results would be hilarious.

.TP
.B DEBUGINFOD_TIMEOUT
This environment variable governs the timeout for each debuginfod HTTP
connection.  A server that fails to respond within this many seconds
is skipped.  The default is 5.
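
The following hypothetical Python sketch shows how the two variables
above combine for one webapi request: each space-separated prefix in
DEBUGINFOD_URLS is joined with the request path, and DEBUGINFOD_TIMEOUT
supplies the per-connection timeout (5 seconds if unset).  Stripping a
trailing "/" from each prefix is an assumption of this sketch, as is the
name \fIupstream_requests\fP.

.SAMPLE
```python
from typing import List, Mapping, Tuple


def upstream_requests(env: Mapping[str, str],
                      request_path: str) -> Tuple[List[str], float]:
    """Expand DEBUGINFOD_URLS into full URLs for one request path.

    Sketch only: a trailing "/" on a prefix is stripped (an assumption),
    and DEBUGINFOD_TIMEOUT defaults to 5 seconds as documented.
    """
    prefixes = env.get("DEBUGINFOD_URLS", "").split()  # space-separated
    urls = [prefix.rstrip("/") + request_path for prefix in prefixes]
    timeout = float(env.get("DEBUGINFOD_TIMEOUT", "5"))
    return urls, timeout
```
.ESAMPLE

Passing os.environ as \fIenv\fP would use the variables of the current
process.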

.TP
.B DEBUGINFOD_CACHE_PATH
This environment variable governs the location of the cache where
downloaded files are kept.  It is cleaned periodically as this
program is reexecuted.  The default is \%$HOME/.debuginfod_client_cache.
.\" XXX describe cache eviction policy

.SH FILES
.LP
.PD .1v
.TP 20
.B $HOME/.debuginfod.sqlite
Default database file.
.PD

.TP 20
.B $HOME/.debuginfod_client_cache
Default cache directory for content from upstream debuginfods.
.PD


.SH "SEE ALSO"
.I "debuginfod-find(1)"
.I "sqlite3(1)"
.I \%https://prometheus.io/docs/instrumenting/exporters/