Handle mmap and sample events in time order

According to the perf file format documentation, events can arrive in out-of-time-order. But no time-reordering should occur across a FINISHED_ROUND event. This pseudo-event was added to perf in Linux v2.6.35, released in 2010. But, sadly, it is only being used for non-tracing perf data files since Linux 3.17 from around 2014. Additionally, time order violations across the FINISHED_ROUND event can be observed even for data files obtained today. As such, we cannot always use FINISHED_ROUND to order our events. Instead, we use the following heuristic: - we buffer both samples and mmap events - when the combined size of both buffers exceeds a certain threshold, defined via the new --buffer-size CLI argument which defaults to 10MB, we flush the buffers: -- we sort samples and events by time -- then we iterate over the samples and handle all mmap events before the sample time -- we stop flushing the buffers when we handled half of the buffers by size, i.e. when we only have 5MB left - then we continue to buffer until we reach the threshold the next time - when we finish, we flush the full buffer This heuristic is only applied until we know that we can actually rely on FINISHED_ROUND events. This is the case when: - the user passed the CLI argument --buffer-size 0 - when we encountered a FINISHED_ROUND event - when we encountered a PerfFeatures event that tells us the perf that was used to record the data file is newer than 3.17 When we rely on FINISHED_ROUND events, we still only ever analyze half of our buffers, to work-around the upstream issues in perf that lead to time order violations across FINISHED_ROUND events. While somewhat complicated, this patch allows us to simplify the elf map significantly. This is done in the follow-up commits. The statistics mode is further extended, which shows us how this new behavior plays out in terms of memory consumption. For a normal perf file where we rely on the finished round events we observe the following values: ~~~~~~~~~~ samples: 20260 mmaps: 24331 rounds: 957 buffer flushes: 958 samples time violations: 0 mmaps time violations: 0 max samples per round: 75 max mmaps per round: 474 max samples per flush: 51 max mmaps per flush: 429 max buffer size: 847328 max total event size per round: 647040 max time: 1143246068673300 max time between rounds: 739096862 max reorder time: 738811755 ~~~~~~~~~~ For a perf file that was recorded with `-m 8192`, we instead see the following statistics. Note how the buffer size automatically is sized to fit the actual round size, and no time violations occur: ~~~~~~~~~~ samples: 20068 mmaps: 24341 rounds: 24 buffer flushes: 25 samples time violations: 0 mmaps time violations: 0 max samples per round: 3577 max mmaps per round: 6103 max samples per flush: 1893 max mmaps per flush: 2910 max buffer size: 31946624 max total event size per round: 30106704 max time: 1141236144717550 max time between rounds: 965150979 max reorder time: 964957358 ~~~~~~~~~~ When we parse a file in the streaming format, we cannot know at the beginning that we should rely on the finished round events. Then the statistics for a normal record look like this: ~~~~~~~~~~ samples: 20303 mmaps: 24339 rounds: 1029 buffer flushes: 1029 samples time violations: 0 mmaps time violations: 0 max samples per round: 98 max mmaps per round: 489 max samples per flush: 61 max mmaps per flush: 610 max buffer size: 1144496 max total event size per round: 863472 max time: 1143595838784853 max time between rounds: 554771976 max reorder time: 13834481 ~~~~~~~~~~ If the record was done with `-m 8192` we instead observe some time order violations at the beginning which could be work-arounded by passing a larger buffer size. Once we encounter the first finished round event, we follow those and do not suffer from time violations anymore: ~~~~~~~~~~ samples: 19854 mmaps: 24338 rounds: 21 buffer flushes: 24 samples time violations: 465 mmaps time violations: 395 max samples per round: 5115 max mmaps per round: 6316 max samples per flush: 2700 max mmaps per flush: 5364 max buffer size: 46458392 max total event size per round: 43698384 max time: 1143585885779204 max time between rounds: 912757434 max reorder time: 908114269 ~~~~~~~~~~ When we parse a perf.data file without FINISHED_ROUND events, we get for a normal file: ~~~~~~~~~~ samples: 20303 mmaps: 24339 rounds: 1 buffer flushes: 33 samples time violations: 0 mmaps time violations: 0 max samples per round: 20303 max mmaps per round: 24339 max samples per flush: 654 max mmaps per flush: 3004 max buffer size: 10494104 max total event size per round: 173458680 max time: 1143595838784853 max time between rounds: 0 max reorder time: 13834481 ~~~~~~~~~~ If the file has huge buffers, i.e. again `-m 8192` was passed to perf record, we instead see: ~~~~~~~~~~ samples: 19854 mmaps: 24338 rounds: 1 buffer flushes: 32 samples time violations: 4859 mmaps time violations: 3586 max samples per round: 19854 max mmaps per round: 24338 max samples per flush: 817 max mmaps per flush: 3089 max buffer size: 10493280 max total event size per round: 169569288 max time: 1143585885779204 max time between rounds: 0 max reorder time: 908114269 ~~~~~~~~~~ Change-Id: I756c4cccf75b4ce0179e965996f8b821bf60e3dd Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
author: Milian Wolff <milian.wolff@kdab.com> 2017-03-29 16:33:53 +0200
committer: Milian Wolff <milian.wolff@kdab.com> 2017-03-29 14:52:53 +0000
commit: 62c48e4890eed1e4b4f0697b42a028d7d61b536a (patch)
tree: d074e827d97df9f0f70334dbad9f1d00caef7be6
parent: a8ef632713b15a1d11a71b2eca3637e8d604c370 (diff)
3 files changed, 179 insertions, 38 deletions
diff --git a/app/main.cpp b/app/main.cpp
index 8dc878c..41be9ab 100644
--- a/app/main.cpp
+++ b/app/main.cpp
@@ -157,6 +157,17 @@ int main(int argc, char *argv[])
                                    "main", "Print statistics instead of converting the data."));
     parser.addOption(printStats);
 
+    QCommandLineOption bufferSize(QLatin1String("buffer-size"),
+                                  QCoreApplication::translate(
+                                   "main", "Size of event buffer in kilobytes. This influences how"
+                                   " many events get buffered before they get sorted by time and "
+                                   " then analyzed. Increase this value when your perf.data file"
+                                   " was recorded with a large buffer value (perf record -m)."
+                                   " Pass 0 to buffer events until a FINISHED_ROUND event is "
+                                   " encountered. Default value is 10MB"),
+                                   QLatin1String("buffer-size"), QLatin1String("10240"));
+    parser.addOption(bufferSize);
+
     parser.process(app);
 
     QScopedPointer<QFile> outfile;
@@ -184,12 +195,20 @@ int main(int argc, char *argv[])
             infile.reset(new PerfStdin);
     }
 
+    bool ok = false;
+    uint maxEventBufferSize = parser.value(bufferSize).toUInt(&ok) * 1024;
+    if (!ok) {
+        qWarning() << "Failed to parse buffer-size argument. Expected unsigned integer, got:"
+                   << parser.value(bufferSize);
+        return InvalidOption;
+    }
+
     PerfUnwind unwind(outfile.data(), parser.value(sysroot), parser.isSet(debug) ?
                           parser.value(debug) : parser.value(sysroot) + parser.value(debug),
                       parser.value(extra), parser.value(appPath), parser.isSet(kallsymsPath)
                         ? parser.value(kallsymsPath)
                         : parser.value(sysroot) + parser.value(kallsymsPath),
-                      parser.isSet(printStats));
+                      parser.isSet(printStats), maxEventBufferSize);
 
     PerfHeader header(infile.data());
     PerfAttributes attributes;
diff --git a/app/perfunwind.cpp b/app/perfunwind.cpp
index af4fc9d..6330799 100644
--- a/app/perfunwind.cpp
+++ b/app/perfunwind.cpp
@@ -24,6 +24,7 @@
 
 #include <QDebug>
 #include <QtEndian>
+#include <QVersionNumber>
 
 #include <cstring>
 
@@ -54,6 +55,9 @@ void PerfUnwind::Stats::addEventTime(quint64 time)
 
 void PerfUnwind::Stats::finishedRound()
 {
+    numSamples += numSamplesInRound;
+    numMmaps += numMmapsInRound;
+
     maxSamplesPerRound = std::max(maxSamplesPerRound, numSamplesInRound);
     maxMmapsPerRound = std::max(maxMmapsPerRound, numMmapsInRound);
     numSamplesInRound = 0;
@@ -72,10 +76,10 @@ void PerfUnwind::Stats::finishedRound()
 
 PerfUnwind::PerfUnwind(QIODevice *output, const QString &systemRoot, const QString &debugPath,
                        const QString &extraLibsPath, const QString &appPath,
-                       const QString &kallsymsPath, bool printStats) :
+                       const QString &kallsymsPath, bool printStats, uint maxEventBufferSize) :
     m_output(output), m_architecture(PerfRegisterInfo::ARCH_INVALID), m_systemRoot(systemRoot),
     m_extraLibsPath(extraLibsPath), m_appPath(appPath), m_kallsyms(kallsymsPath),
-    m_sampleBufferSize(0)
+    m_maxEventBufferSize(maxEventBufferSize), m_eventBufferSize(0), m_lastFlushMaxTime(0)
 {
     m_stats.enabled = printStats;
     m_currentUnwind.unwind = this;
@@ -102,9 +106,7 @@ PerfUnwind::PerfUnwind(QIODevice *output, const QString &systemRoot, const QStri
 PerfUnwind::~PerfUnwind()
 {
     finishedRound();
-
-    foreach (const PerfRecordSample &sample, m_sampleBuffer)
-        analyze(sample);
+    flushEventBuffer(0);
 
     delete[] m_debugInfoPath;
     qDeleteAll(m_symbolTables);
@@ -114,8 +116,13 @@ PerfUnwind::~PerfUnwind()
         out << "samples: " << m_stats.numSamples << "\n";
         out << "mmaps: " << m_stats.numMmaps << "\n";
         out << "rounds: " << m_stats.numRounds << "\n";
+        out << "buffer flushes: " << m_stats.numBufferFlushes << "\n";
+        out << "samples time violations: " << m_stats.numTimeViolatingSamples << "\n";
+        out << "mmaps time violations: " << m_stats.numTimeViolatingMmaps << "\n";
         out << "max samples per round: " << m_stats.maxSamplesPerRound << "\n";
         out << "max mmaps per round: " << m_stats.maxMmapsPerRound << "\n";
+        out << "max samples per flush: " << m_stats.maxSamplesPerFlush << "\n";
+        out << "max mmaps per flush: " << m_stats.maxMmapsPerFlush << "\n";
         out << "max buffer size: " << m_stats.maxBufferSize << "\n";
         out << "max total event size per round: " << m_stats.maxTotalEventSizePerRound << "\n";
         out << "max time: " << m_stats.maxTime << "\n";
@@ -139,15 +146,7 @@ Dwfl *PerfUnwind::dwfl(quint32 pid, quint64 timestamp)
 
 void PerfUnwind::registerElf(const PerfRecordMmap &mmap)
 {
-    if (m_stats.enabled) {
-        m_stats.totalEventSizePerRound += mmap.size();
-        m_stats.addEventTime(mmap.time());
-        ++m_stats.numMmapsInRound;
-        ++m_stats.numMmaps;
-        return;
-    }
-
-    symbolTable(mmap.pid())->registerElf(mmap, m_appPath, m_systemRoot, m_extraLibsPath);
+    bufferEvent(mmap, &m_mmapBuffer, &m_stats.numMmapsInRound);
 }
 
 void PerfUnwind::sendBuffer(const QByteArray &buffer)
@@ -226,6 +225,10 @@ void PerfUnwind::features(const PerfFeatures &features)
     for (const auto &desc : eventDescs)
         addAttributes(desc.attrs, desc.name, desc.ids);
 
+    const auto perfVersion = QVersionNumber::fromString(QString::fromLatin1(features.version()));
+    if (perfVersion >= QVersionNumber(3, 17))
+        m_maxEventBufferSize = 0;
+
     QByteArray buffer;
     QDataStream(&buffer, QIODevice::WriteOnly) << static_cast<quint8>(FeaturesDefinition)
                                                << features.hostName()
@@ -365,24 +368,7 @@ void PerfUnwind::resolveCallchain()
 
 void PerfUnwind::sample(const PerfRecordSample &sample)
 {
-    m_sampleBuffer.append(sample);
-    m_sampleBufferSize += sample.size();
-
-    if (m_stats.enabled) {
-        m_stats.maxBufferSize = std::max(m_sampleBufferSize, m_stats.maxBufferSize);
-        m_stats.totalEventSizePerRound += sample.size();
-        m_stats.addEventTime(sample.time());
-        ++m_stats.numSamplesInRound;
-        ++m_stats.numSamples;
-        // don't return early, stats should include our buffer behavior
-    }
-
-    while (m_sampleBufferSize > s_maxSampleBufferSize) {
-        const PerfRecordSample &sample = m_sampleBuffer.front();
-        m_sampleBufferSize -= sample.size();
-        analyze(sample);
-        m_sampleBuffer.removeFirst();
-    }
+    bufferEvent(sample, &m_sampleBuffer, &m_stats.numSamplesInRound);
 }
 
 void PerfUnwind::analyze(const PerfRecordSample &sample)
@@ -467,6 +453,15 @@ void PerfUnwind::sendSymbol(qint32 id, const PerfUnwind::Symbol &symbol)
     sendBuffer(buffer);
 }
 
+void PerfUnwind::sendError(ErrorCode error, const QString &message)
+{
+    qWarning() << error << message;
+    QByteArray buffer;
+    QDataStream(&buffer, QIODevice::WriteOnly) << static_cast<quint8>(Error)
+                                               << error << message;
+    sendBuffer(buffer);
+}
+
 qint32 PerfUnwind::resolveString(const QByteArray& string)
 {
     if (string.isEmpty())
@@ -514,4 +509,114 @@ void PerfUnwind::finishedRound()
 {
     if (m_stats.enabled)
         m_stats.finishedRound();
+
+    // when we parse a perf data stream we may not know whether it contains
+    // FINISHED_ROUND events. now we know, and thus we set the m_maxEventBufferSize
+    // to 0 to disable the heuristic there. Instead, we will now rely on the finished
+    // round events to tell us when to flush the event buffer
+    if (!m_maxEventBufferSize) {
+        // we only flush half of the events we got in this round
+        // this work-arounds bugs in upstream perf which leads to time order violations
+        // across FINISHED_ROUND events which should in theory never happen
+        flushEventBuffer(m_eventBufferSize / 2);
+    } else {
+        m_maxEventBufferSize = 0;
+    }
+}
+
+template<typename Event>
+void PerfUnwind::bufferEvent(const Event &event, QList<Event> *buffer, int *eventCounter)
+{
+    buffer->append(event);
+    m_eventBufferSize += event.size();
+
+    if (m_stats.enabled) {
+        *eventCounter += 1;
+        m_stats.maxBufferSize = std::max(m_eventBufferSize, m_stats.maxBufferSize);
+        m_stats.totalEventSizePerRound += event.size();
+        m_stats.addEventTime(event.time());
+        // don't return early, stats should include our buffer behavior
+    }
+
+    if (m_maxEventBufferSize && m_eventBufferSize > m_maxEventBufferSize)
+        flushEventBuffer(m_maxEventBufferSize / 2);
+}
+
+void PerfUnwind::flushEventBuffer(uint desiredBufferSize)
+{
+    auto sortByTime = [](const PerfRecord &lhs, const PerfRecord &rhs) {
+        return lhs.time() < rhs.time();
+    };
+    std::sort(m_mmapBuffer.begin(), m_mmapBuffer.end(), sortByTime);
+    std::sort(m_sampleBuffer.begin(), m_sampleBuffer.end(), sortByTime);
+
+    if (m_stats.enabled) {
+        for (const auto &sample : m_sampleBuffer) {
+            if (sample.time() < m_lastFlushMaxTime)
+                ++m_stats.numTimeViolatingSamples;
+            else
+                break;
+        }
+        for (const auto &mmap : m_mmapBuffer) {
+            if (mmap.time() < m_lastFlushMaxTime)
+                ++m_stats.numTimeViolatingMmaps;
+            else
+                break;
+        }
+    }
+
+    if (!m_mmapBuffer.isEmpty() && m_mmapBuffer.first().time() < m_lastFlushMaxTime) {
+        // when an mmap event is not following our desired time order, it can
+        // severly break our analysis. as such we report a real error in these cases
+        sendError(TimeOrderViolation,
+                  tr("Time order violation of MMAP event across buffer flush detected. "
+                     "Event time is %1, max time during last buffer flush was %2. "
+                     "This potentially breaks the data analysis.")
+                    .arg(m_mmapBuffer.first().time()).arg(m_lastFlushMaxTime));
+    }
+
+    auto mmapIt = m_mmapBuffer.begin();
+    auto mmapEnd = m_mmapBuffer.end();
+
+    auto sampleIt = m_sampleBuffer.begin();
+    auto sampleEnd = m_sampleBuffer.end();
+
+    for (; m_eventBufferSize > desiredBufferSize && sampleIt != sampleEnd; ++sampleIt) {
+        const auto &sample = *sampleIt;
+
+        if (sample.time() < m_lastFlushMaxTime) {
+            qWarning() << "Time order violation across buffer flush detected:"
+                       << "Event time =" << sample.time() << ","
+                       << "max time during last buffer flush = " << m_lastFlushMaxTime;
+            // we don't send an error for samples with broken times, since these
+            // are usually harmless and actually occur relatively often
+            // if desired, one can detect these issues on the client side anyways,
+            // based on the sample times
+        } else {
+            m_lastFlushMaxTime = sample.time();
+        }
+
+        for (; mmapIt != mmapEnd && mmapIt->time() <= sample.time(); ++mmapIt) {
+            if (!m_stats.enabled) {
+                symbolTable(mmapIt->pid())->registerElf(*mmapIt, m_appPath,
+                                                        m_systemRoot,
+                                                        m_extraLibsPath);
+            }
+            m_eventBufferSize -= mmapIt->size();
+        }
+
+        analyze(sample);
+        m_eventBufferSize -= sample.size();
+    }
+
+    if (m_stats.enabled) {
+        ++m_stats.numBufferFlushes;
+        const int samples = std::distance(m_sampleBuffer.begin(), sampleIt);
+        m_stats.maxSamplesPerFlush = std::max(samples, m_stats.maxSamplesPerFlush);
+        const int mmaps = std::distance(m_mmapBuffer.begin(), mmapIt);
+        m_stats.maxMmapsPerFlush = std::max(mmaps, m_stats.maxMmapsPerFlush);
+    }
+
+    m_sampleBuffer.erase(m_sampleBuffer.begin(), sampleIt);
+    m_mmapBuffer.erase(m_mmapBuffer.begin(), mmapIt);
 }
diff --git a/app/perfunwind.h b/app/perfunwind.h
index b077ec8..b4c4859 100644
--- a/app/perfunwind.h
+++ b/app/perfunwind.h
@@ -58,6 +58,7 @@ public:
         StringDefinition,
         LostDefinition,
         FeaturesDefinition,
+        Error,
         InvalidType
     };
 
@@ -104,7 +105,7 @@ public:
 
     PerfUnwind(QIODevice *output, const QString &systemRoot, const QString &debugInfo,
                const QString &extraLibs, const QString &appPath,
-               const QString &kallsymsPath, bool printStats);
+               const QString &kallsymsPath, bool printStats, uint maxEventBufferSize);
     ~PerfUnwind();
 
     PerfRegisterInfo::Architecture architecture() const { return m_architecture; }
@@ -179,6 +180,7 @@ private:
     QString m_appPath;
 
     QList<PerfRecordSample> m_sampleBuffer;
+    QList<PerfRecordMmap> m_mmapBuffer;
     QHash<quint32, PerfSymbolTable *> m_symbolTables;
     PerfKallsyms m_kallsyms;
 
@@ -188,16 +190,18 @@ private:
     QHash<quint64, qint32> m_attributeIds;
     QHash<PerfEventAttributes, qint32> m_attributes;
 
-    uint m_sampleBufferSize;
-
-    static const uint s_maxSampleBufferSize = 1024 * 1024;
+    uint m_maxEventBufferSize;
+    uint m_eventBufferSize;
+    quint64 m_lastFlushMaxTime;
 
     struct Stats
     {
         Stats()
-            : numSamples(0), numMmaps(0), numRounds(0),
+            : numSamples(0), numMmaps(0), numRounds(0), numBufferFlushes(0),
+            numTimeViolatingSamples(0), numTimeViolatingMmaps(0),
             numSamplesInRound(0), numMmapsInRound(0),
             maxSamplesPerRound(0), maxMmapsPerRound(0),
+            maxSamplesPerFlush(0), maxMmapsPerFlush(0),
             maxBufferSize(0), maxTotalEventSizePerRound(0),
             maxTime(0), maxTimeBetweenRounds(0), maxReorderTime(0),
             lastRoundTime(0), totalEventSizePerRound(0),
@@ -210,10 +214,15 @@ private:
         quint64 numSamples;
         quint64 numMmaps;
         quint64 numRounds;
+        quint64 numBufferFlushes;
+        quint64 numTimeViolatingSamples;
+        quint64 numTimeViolatingMmaps;
         int numSamplesInRound;
         int numMmapsInRound;
         int maxSamplesPerRound;
         int maxMmapsPerRound;
+        int maxSamplesPerFlush;
+        int maxMmapsPerFlush;
         uint maxBufferSize;
         uint maxTotalEventSizePerRound;
         quint64 maxTime;
@@ -233,6 +242,14 @@ private:
     void sendLocation(qint32 id, const Location &location);
     void sendSymbol(qint32 id, const Symbol &symbol);
     void sendAttributes(qint32 id, const PerfEventAttributes &attributes, const QByteArray &name);
+    enum ErrorCode {
+        TimeOrderViolation = 1
+    };
+    void sendError(ErrorCode error, const QString &message);
+
+    template<typename Event>
+    void bufferEvent(const Event &event, QList<Event> *buffer, int *eventCounter);
+    void flushEventBuffer(uint desiredBufferSize);
 };
 
 uint qHash(const PerfUnwind::Location &location, uint seed = 0);
author	Milian Wolff <milian.wolff@kdab.com>	2017-03-29 16:33:53 +0200
committer	Milian Wolff <milian.wolff@kdab.com>	2017-03-29 14:52:53 +0000
commit	62c48e4890eed1e4b4f0697b42a028d7d61b536a (patch)
tree	d074e827d97df9f0f70334dbad9f1d00caef7be6
parent	a8ef632713b15a1d11a71b2eca3637e8d604c370 (diff)