diff options
author | Milian Wolff <milian.wolff@kdab.com> | 2017-03-29 16:33:53 +0200 |
---|---|---|
committer | Milian Wolff <milian.wolff@kdab.com> | 2017-03-29 14:52:53 +0000 |
commit | 62c48e4890eed1e4b4f0697b42a028d7d61b536a (patch) | |
tree | d074e827d97df9f0f70334dbad9f1d00caef7be6 | |
parent | a8ef632713b15a1d11a71b2eca3637e8d604c370 (diff) |
Handle mmap and sample events in time order
According to the perf file format documentation, events can
arrive in out-of-time-order. But no time-reordering should occur across
a FINISHED_ROUND event. This pseudo-event was added to perf in Linux
v2.6.35, released in 2010.
But, sadly, it is only being used for non-tracing perf data files since
Linux 3.17 from around 2014. Additionally, time order violations across
the FINISHED_ROUND event can be observed even for data files obtained
today.
As such, we cannot always use FINISHED_ROUND to order our events. Instead,
we use the following heuristic:
- we buffer both samples and mmap events
- when the combined size of both buffers exceeds a certain threshold,
defined via the new --buffer-size CLI argument which defaults to 10MB,
we flush the buffers:
-- we sort samples and events by time
-- then we iterate over the samples and handle all mmap events before
the sample time
-- we stop flushing the buffers when we handled half of the buffers
by size, i.e. when we only have 5MB left
- then we continue to buffer until we reach the threshold the next time
- when we finish, we flush the full buffer
This heuristic is only applied until we know that we can actually rely on
FINISHED_ROUND events. This is the case when:
- the user passed the CLI argument --buffer-size 0
- when we encountered a FINISHED_ROUND event
- when we encountered a PerfFeatures event that tells us the perf that
was used to record the data file is newer than 3.17
When we rely on FINISHED_ROUND events, we still only ever analyze half of
our buffers, to work-around the upstream issues in perf that lead to
time order violations across FINISHED_ROUND events.
While somewhat complicated, this patch allows us to simplify the elf map
significantly. This is done in the follow-up commits.
The statistics mode is further extended, which shows us how this new
behavior plays out in terms of memory consumption.
For a normal perf file where we rely on the finished round events we
observe the following values:
~~~~~~~~~~
samples: 20260
mmaps: 24331
rounds: 957
buffer flushes: 958
samples time violations: 0
mmaps time violations: 0
max samples per round: 75
max mmaps per round: 474
max samples per flush: 51
max mmaps per flush: 429
max buffer size: 847328
max total event size per round: 647040
max time: 1143246068673300
max time between rounds: 739096862
max reorder time: 738811755
~~~~~~~~~~
For a perf file that was recorded with `-m 8192`, we instead see the
following statistics. Note how the buffer size automatically is sized
to fit the actual round size, and no time violations occur:
~~~~~~~~~~
samples: 20068
mmaps: 24341
rounds: 24
buffer flushes: 25
samples time violations: 0
mmaps time violations: 0
max samples per round: 3577
max mmaps per round: 6103
max samples per flush: 1893
max mmaps per flush: 2910
max buffer size: 31946624
max total event size per round: 30106704
max time: 1141236144717550
max time between rounds: 965150979
max reorder time: 964957358
~~~~~~~~~~
When we parse a file in the streaming format, we cannot know at the
beginning that we should rely on the finished round events. Then
the statistics for a normal record look like this:
~~~~~~~~~~
samples: 20303
mmaps: 24339
rounds: 1029
buffer flushes: 1029
samples time violations: 0
mmaps time violations: 0
max samples per round: 98
max mmaps per round: 489
max samples per flush: 61
max mmaps per flush: 610
max buffer size: 1144496
max total event size per round: 863472
max time: 1143595838784853
max time between rounds: 554771976
max reorder time: 13834481
~~~~~~~~~~
If the record was done with `-m 8192` we instead observe some time
order violations at the beginning which could be work-arounded by
passing a larger buffer size. Once we encounter the first finished round
event, we follow those and do not suffer from time violations anymore:
~~~~~~~~~~
samples: 19854
mmaps: 24338
rounds: 21
buffer flushes: 24
samples time violations: 465
mmaps time violations: 395
max samples per round: 5115
max mmaps per round: 6316
max samples per flush: 2700
max mmaps per flush: 5364
max buffer size: 46458392
max total event size per round: 43698384
max time: 1143585885779204
max time between rounds: 912757434
max reorder time: 908114269
~~~~~~~~~~
When we parse a perf.data file without FINISHED_ROUND events, we get
for a normal file:
~~~~~~~~~~
samples: 20303
mmaps: 24339
rounds: 1
buffer flushes: 33
samples time violations: 0
mmaps time violations: 0
max samples per round: 20303
max mmaps per round: 24339
max samples per flush: 654
max mmaps per flush: 3004
max buffer size: 10494104
max total event size per round: 173458680
max time: 1143595838784853
max time between rounds: 0
max reorder time: 13834481
~~~~~~~~~~
If the file has huge buffers, i.e. again `-m 8192` was passed to perf record,
we instead see:
~~~~~~~~~~
samples: 19854
mmaps: 24338
rounds: 1
buffer flushes: 32
samples time violations: 4859
mmaps time violations: 3586
max samples per round: 19854
max mmaps per round: 24338
max samples per flush: 817
max mmaps per flush: 3089
max buffer size: 10493280
max total event size per round: 169569288
max time: 1143585885779204
max time between rounds: 0
max reorder time: 908114269
~~~~~~~~~~
Change-Id: I756c4cccf75b4ce0179e965996f8b821bf60e3dd
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
-rw-r--r-- | app/main.cpp | 21 | ||||
-rw-r--r-- | app/perfunwind.cpp | 169 | ||||
-rw-r--r-- | app/perfunwind.h | 27 |
3 files changed, 179 insertions, 38 deletions
diff --git a/app/main.cpp b/app/main.cpp index 8dc878c..41be9ab 100644 --- a/app/main.cpp +++ b/app/main.cpp @@ -157,6 +157,17 @@ int main(int argc, char *argv[]) "main", "Print statistics instead of converting the data.")); parser.addOption(printStats); + QCommandLineOption bufferSize(QLatin1String("buffer-size"), + QCoreApplication::translate( + "main", "Size of event buffer in kilobytes. This influences how" + " many events get buffered before they get sorted by time and " + " then analyzed. Increase this value when your perf.data file" + " was recorded with a large buffer value (perf record -m)." + " Pass 0 to buffer events until a FINISHED_ROUND event is " + " encountered. Default value is 10MB"), + QLatin1String("buffer-size"), QLatin1String("10240")); + parser.addOption(bufferSize); + parser.process(app); QScopedPointer<QFile> outfile; @@ -184,12 +195,20 @@ int main(int argc, char *argv[]) infile.reset(new PerfStdin); } + bool ok = false; + uint maxEventBufferSize = parser.value(bufferSize).toUInt(&ok) * 1024; + if (!ok) { + qWarning() << "Failed to parse buffer-size argument. Expected unsigned integer, got:" + << parser.value(bufferSize); + return InvalidOption; + } + PerfUnwind unwind(outfile.data(), parser.value(sysroot), parser.isSet(debug) ? parser.value(debug) : parser.value(sysroot) + parser.value(debug), parser.value(extra), parser.value(appPath), parser.isSet(kallsymsPath) ? parser.value(kallsymsPath) : parser.value(sysroot) + parser.value(kallsymsPath), - parser.isSet(printStats)); + parser.isSet(printStats), maxEventBufferSize); PerfHeader header(infile.data()); PerfAttributes attributes; diff --git a/app/perfunwind.cpp b/app/perfunwind.cpp index af4fc9d..6330799 100644 --- a/app/perfunwind.cpp +++ b/app/perfunwind.cpp @@ -24,6 +24,7 @@ #include <QDebug> #include <QtEndian> +#include <QVersionNumber> #include <cstring> @@ -54,6 +55,9 @@ void PerfUnwind::Stats::addEventTime(quint64 time) void PerfUnwind::Stats::finishedRound() { + numSamples += numSamplesInRound; + numMmaps += numMmapsInRound; + maxSamplesPerRound = std::max(maxSamplesPerRound, numSamplesInRound); maxMmapsPerRound = std::max(maxMmapsPerRound, numMmapsInRound); numSamplesInRound = 0; @@ -72,10 +76,10 @@ void PerfUnwind::Stats::finishedRound() PerfUnwind::PerfUnwind(QIODevice *output, const QString &systemRoot, const QString &debugPath, const QString &extraLibsPath, const QString &appPath, - const QString &kallsymsPath, bool printStats) : + const QString &kallsymsPath, bool printStats, uint maxEventBufferSize) : m_output(output), m_architecture(PerfRegisterInfo::ARCH_INVALID), m_systemRoot(systemRoot), m_extraLibsPath(extraLibsPath), m_appPath(appPath), m_kallsyms(kallsymsPath), - m_sampleBufferSize(0) + m_maxEventBufferSize(maxEventBufferSize), m_eventBufferSize(0), m_lastFlushMaxTime(0) { m_stats.enabled = printStats; m_currentUnwind.unwind = this; @@ -102,9 +106,7 @@ PerfUnwind::PerfUnwind(QIODevice *output, const QString &systemRoot, const QStri PerfUnwind::~PerfUnwind() { finishedRound(); - - foreach (const PerfRecordSample &sample, m_sampleBuffer) - analyze(sample); + flushEventBuffer(0); delete[] m_debugInfoPath; qDeleteAll(m_symbolTables); @@ -114,8 +116,13 @@ PerfUnwind::~PerfUnwind() out << "samples: " << m_stats.numSamples << "\n"; out << "mmaps: " << m_stats.numMmaps << "\n"; out << "rounds: " << m_stats.numRounds << "\n"; + out << "buffer flushes: " << m_stats.numBufferFlushes << "\n"; + out << "samples time violations: " << m_stats.numTimeViolatingSamples << "\n"; + out << "mmaps time violations: " << m_stats.numTimeViolatingMmaps << "\n"; out << "max samples per round: " << m_stats.maxSamplesPerRound << "\n"; out << "max mmaps per round: " << m_stats.maxMmapsPerRound << "\n"; + out << "max samples per flush: " << m_stats.maxSamplesPerFlush << "\n"; + out << "max mmaps per flush: " << m_stats.maxMmapsPerFlush << "\n"; out << "max buffer size: " << m_stats.maxBufferSize << "\n"; out << "max total event size per round: " << m_stats.maxTotalEventSizePerRound << "\n"; out << "max time: " << m_stats.maxTime << "\n"; @@ -139,15 +146,7 @@ Dwfl *PerfUnwind::dwfl(quint32 pid, quint64 timestamp) void PerfUnwind::registerElf(const PerfRecordMmap &mmap) { - if (m_stats.enabled) { - m_stats.totalEventSizePerRound += mmap.size(); - m_stats.addEventTime(mmap.time()); - ++m_stats.numMmapsInRound; - ++m_stats.numMmaps; - return; - } - - symbolTable(mmap.pid())->registerElf(mmap, m_appPath, m_systemRoot, m_extraLibsPath); + bufferEvent(mmap, &m_mmapBuffer, &m_stats.numMmapsInRound); } void PerfUnwind::sendBuffer(const QByteArray &buffer) @@ -226,6 +225,10 @@ void PerfUnwind::features(const PerfFeatures &features) for (const auto &desc : eventDescs) addAttributes(desc.attrs, desc.name, desc.ids); + const auto perfVersion = QVersionNumber::fromString(QString::fromLatin1(features.version())); + if (perfVersion >= QVersionNumber(3, 17)) + m_maxEventBufferSize = 0; + QByteArray buffer; QDataStream(&buffer, QIODevice::WriteOnly) << static_cast<quint8>(FeaturesDefinition) << features.hostName() @@ -365,24 +368,7 @@ void PerfUnwind::resolveCallchain() void PerfUnwind::sample(const PerfRecordSample &sample) { - m_sampleBuffer.append(sample); - m_sampleBufferSize += sample.size(); - - if (m_stats.enabled) { - m_stats.maxBufferSize = std::max(m_sampleBufferSize, m_stats.maxBufferSize); - m_stats.totalEventSizePerRound += sample.size(); - m_stats.addEventTime(sample.time()); - ++m_stats.numSamplesInRound; - ++m_stats.numSamples; - // don't return early, stats should include our buffer behavior - } - - while (m_sampleBufferSize > s_maxSampleBufferSize) { - const PerfRecordSample &sample = m_sampleBuffer.front(); - m_sampleBufferSize -= sample.size(); - analyze(sample); - m_sampleBuffer.removeFirst(); - } + bufferEvent(sample, &m_sampleBuffer, &m_stats.numSamplesInRound); } void PerfUnwind::analyze(const PerfRecordSample &sample) @@ -467,6 +453,15 @@ void PerfUnwind::sendSymbol(qint32 id, const PerfUnwind::Symbol &symbol) sendBuffer(buffer); } +void PerfUnwind::sendError(ErrorCode error, const QString &message) +{ + qWarning() << error << message; + QByteArray buffer; + QDataStream(&buffer, QIODevice::WriteOnly) << static_cast<quint8>(Error) + << error << message; + sendBuffer(buffer); +} + qint32 PerfUnwind::resolveString(const QByteArray& string) { if (string.isEmpty()) @@ -514,4 +509,114 @@ void PerfUnwind::finishedRound() { if (m_stats.enabled) m_stats.finishedRound(); + + // when we parse a perf data stream we may not know whether it contains + // FINISHED_ROUND events. now we know, and thus we set the m_maxEventBufferSize + // to 0 to disable the heuristic there. Instead, we will now rely on the finished + // round events to tell us when to flush the event buffer + if (!m_maxEventBufferSize) { + // we only flush half of the events we got in this round + // this work-arounds bugs in upstream perf which leads to time order violations + // across FINISHED_ROUND events which should in theory never happen + flushEventBuffer(m_eventBufferSize / 2); + } else { + m_maxEventBufferSize = 0; + } +} + +template<typename Event> +void PerfUnwind::bufferEvent(const Event &event, QList<Event> *buffer, int *eventCounter) +{ + buffer->append(event); + m_eventBufferSize += event.size(); + + if (m_stats.enabled) { + *eventCounter += 1; + m_stats.maxBufferSize = std::max(m_eventBufferSize, m_stats.maxBufferSize); + m_stats.totalEventSizePerRound += event.size(); + m_stats.addEventTime(event.time()); + // don't return early, stats should include our buffer behavior + } + + if (m_maxEventBufferSize && m_eventBufferSize > m_maxEventBufferSize) + flushEventBuffer(m_maxEventBufferSize / 2); +} + +void PerfUnwind::flushEventBuffer(uint desiredBufferSize) +{ + auto sortByTime = [](const PerfRecord &lhs, const PerfRecord &rhs) { + return lhs.time() < rhs.time(); + }; + std::sort(m_mmapBuffer.begin(), m_mmapBuffer.end(), sortByTime); + std::sort(m_sampleBuffer.begin(), m_sampleBuffer.end(), sortByTime); + + if (m_stats.enabled) { + for (const auto &sample : m_sampleBuffer) { + if (sample.time() < m_lastFlushMaxTime) + ++m_stats.numTimeViolatingSamples; + else + break; + } + for (const auto &mmap : m_mmapBuffer) { + if (mmap.time() < m_lastFlushMaxTime) + ++m_stats.numTimeViolatingMmaps; + else + break; + } + } + + if (!m_mmapBuffer.isEmpty() && m_mmapBuffer.first().time() < m_lastFlushMaxTime) { + // when an mmap event is not following our desired time order, it can + // severly break our analysis. as such we report a real error in these cases + sendError(TimeOrderViolation, + tr("Time order violation of MMAP event across buffer flush detected. " + "Event time is %1, max time during last buffer flush was %2. " + "This potentially breaks the data analysis.") + .arg(m_mmapBuffer.first().time()).arg(m_lastFlushMaxTime)); + } + + auto mmapIt = m_mmapBuffer.begin(); + auto mmapEnd = m_mmapBuffer.end(); + + auto sampleIt = m_sampleBuffer.begin(); + auto sampleEnd = m_sampleBuffer.end(); + + for (; m_eventBufferSize > desiredBufferSize && sampleIt != sampleEnd; ++sampleIt) { + const auto &sample = *sampleIt; + + if (sample.time() < m_lastFlushMaxTime) { + qWarning() << "Time order violation across buffer flush detected:" + << "Event time =" << sample.time() << "," + << "max time during last buffer flush = " << m_lastFlushMaxTime; + // we don't send an error for samples with broken times, since these + // are usually harmless and actually occur relatively often + // if desired, one can detect these issues on the client side anyways, + // based on the sample times + } else { + m_lastFlushMaxTime = sample.time(); + } + + for (; mmapIt != mmapEnd && mmapIt->time() <= sample.time(); ++mmapIt) { + if (!m_stats.enabled) { + symbolTable(mmapIt->pid())->registerElf(*mmapIt, m_appPath, + m_systemRoot, + m_extraLibsPath); + } + m_eventBufferSize -= mmapIt->size(); + } + + analyze(sample); + m_eventBufferSize -= sample.size(); + } + + if (m_stats.enabled) { + ++m_stats.numBufferFlushes; + const int samples = std::distance(m_sampleBuffer.begin(), sampleIt); + m_stats.maxSamplesPerFlush = std::max(samples, m_stats.maxSamplesPerFlush); + const int mmaps = std::distance(m_mmapBuffer.begin(), mmapIt); + m_stats.maxMmapsPerFlush = std::max(mmaps, m_stats.maxMmapsPerFlush); + } + + m_sampleBuffer.erase(m_sampleBuffer.begin(), sampleIt); + m_mmapBuffer.erase(m_mmapBuffer.begin(), mmapIt); } diff --git a/app/perfunwind.h b/app/perfunwind.h index b077ec8..b4c4859 100644 --- a/app/perfunwind.h +++ b/app/perfunwind.h @@ -58,6 +58,7 @@ public: StringDefinition, LostDefinition, FeaturesDefinition, + Error, InvalidType }; @@ -104,7 +105,7 @@ public: PerfUnwind(QIODevice *output, const QString &systemRoot, const QString &debugInfo, const QString &extraLibs, const QString &appPath, - const QString &kallsymsPath, bool printStats); + const QString &kallsymsPath, bool printStats, uint maxEventBufferSize); ~PerfUnwind(); PerfRegisterInfo::Architecture architecture() const { return m_architecture; } @@ -179,6 +180,7 @@ private: QString m_appPath; QList<PerfRecordSample> m_sampleBuffer; + QList<PerfRecordMmap> m_mmapBuffer; QHash<quint32, PerfSymbolTable *> m_symbolTables; PerfKallsyms m_kallsyms; @@ -188,16 +190,18 @@ private: QHash<quint64, qint32> m_attributeIds; QHash<PerfEventAttributes, qint32> m_attributes; - uint m_sampleBufferSize; - - static const uint s_maxSampleBufferSize = 1024 * 1024; + uint m_maxEventBufferSize; + uint m_eventBufferSize; + quint64 m_lastFlushMaxTime; struct Stats { Stats() - : numSamples(0), numMmaps(0), numRounds(0), + : numSamples(0), numMmaps(0), numRounds(0), numBufferFlushes(0), + numTimeViolatingSamples(0), numTimeViolatingMmaps(0), numSamplesInRound(0), numMmapsInRound(0), maxSamplesPerRound(0), maxMmapsPerRound(0), + maxSamplesPerFlush(0), maxMmapsPerFlush(0), maxBufferSize(0), maxTotalEventSizePerRound(0), maxTime(0), maxTimeBetweenRounds(0), maxReorderTime(0), lastRoundTime(0), totalEventSizePerRound(0), @@ -210,10 +214,15 @@ private: quint64 numSamples; quint64 numMmaps; quint64 numRounds; + quint64 numBufferFlushes; + quint64 numTimeViolatingSamples; + quint64 numTimeViolatingMmaps; int numSamplesInRound; int numMmapsInRound; int maxSamplesPerRound; int maxMmapsPerRound; + int maxSamplesPerFlush; + int maxMmapsPerFlush; uint maxBufferSize; uint maxTotalEventSizePerRound; quint64 maxTime; @@ -233,6 +242,14 @@ private: void sendLocation(qint32 id, const Location &location); void sendSymbol(qint32 id, const Symbol &symbol); void sendAttributes(qint32 id, const PerfEventAttributes &attributes, const QByteArray &name); + enum ErrorCode { + TimeOrderViolation = 1 + }; + void sendError(ErrorCode error, const QString &message); + + template<typename Event> + void bufferEvent(const Event &event, QList<Event> *buffer, int *eventCounter); + void flushEventBuffer(uint desiredBufferSize); }; uint qHash(const PerfUnwind::Location &location, uint seed = 0); |