Clang - Performance

This page tracks the compile-time performance of Clang on two interesting benchmarks: Sketch and 176.gcc.

Experiments

Measurements are done by serially processing each file in the respective benchmark, using Clang, gcc, and llvm-gcc as compilers. To track the performance of the various subsystems, the timings have been broken down into separate stages where possible:

This set of stages is chosen to be approximately additive; that is, each subsequent stage simply adds some additional processing. The timings measure the delta of the given stage from the previous one. For example, the timings for -fsyntax-only below show the difference between running with -fsyntax-only and running with -parse-noop (for Clang) or -MM (for gcc and llvm-gcc). This amounts to a fairly accurate measure of only the time to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).
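The delta scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the harness used to produce the numbers on this page: the function names `time_stage` and `stage_deltas`, the injectable `run` callback, and the example figures in the comments are all assumptions introduced here.

```python
import subprocess
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def time_stage(compiler: str,
               flags: Sequence[str],
               files: Sequence[str],
               run: Optional[Callable[[List[str]], object]] = None) -> float:
    """Return the wall time needed to process every file serially.

    `run` is a hypothetical hook that executes one command line; by default
    it invokes the real compiler and discards its standard output.
    """
    if run is None:
        def run(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)

    start = time.perf_counter()
    for path in files:
        # One compiler invocation per file, never in parallel.
        run([compiler, *flags, path])
    return time.perf_counter() - start


def stage_deltas(totals: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Turn ordered cumulative stage totals into per-stage deltas.

    Each stage is charged only for the time it adds on top of the stage
    listed immediately before it.
    """
    deltas: Dict[str, float] = {}
    previous = 0.0
    for name, total in totals:
        deltas[name] = total - previous
        previous = total
    return deltas


# Hypothetical totals in seconds, ordered from the cheapest stage upward:
# stage_deltas([("parse", 1.0), ("-fsyntax-only", 1.5)])
# charges 1.0 s to parsing and 0.5 s to semantic analysis.
```

Because the stages are only approximately additive, a delta computed this way is an estimate of one subsystem's cost rather than an exact measurement.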

These timings are chosen to break down the compilation process for Clang as much as possible. The graphs below show these numbers combined, so that it is easy to see how the time for a particular task is divided among the various components. For example, -S -O0 includes the time of -fsyntax-only and -emit-llvm -O0.
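Combining the per-stage numbers for a graph is the inverse of taking deltas: each bar is the running sum of the components below it. The sketch below assumes the deltas are supplied in stage order; the helper name `stacked_totals` and the sample values are hypothetical and do not come from the measurements on this page.

```python
from itertools import accumulate
from typing import Dict, Sequence, Tuple


def stacked_totals(components: Sequence[Tuple[str, float]]) -> Dict[str, float]:
    """Map each stage to the sum of its own delta and all earlier deltas."""
    names = [name for name, _ in components]
    running = accumulate(delta for _, delta in components)
    return dict(zip(names, running))


# Hypothetical per-stage deltas in seconds.
example = [
    ("-fsyntax-only", 2.0),
    ("-emit-llvm -O0", 0.5),
    ("-S -O0", 0.25),
]
totals = stacked_totals(example)
# totals["-S -O0"] is 2.75: it includes the -fsyntax-only and
# -emit-llvm -O0 components as well as its own.
```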

Note that the LLVM optimizers are already known to be substantially (30-40%) faster than the GCC optimizers at a given -O level, so we focus only on -O0 compile time here.

Timing Results

2008-10-31

Sketch

Figure: Sketch Timings

This shows Clang's substantial performance improvements in preprocessing and semantic analysis: it is over 90% faster on -fsyntax-only. As expected, time spent in code generation for this benchmark is relatively small. One caveat: Clang's debug information generation for Objective-C is very incomplete, so the -S -O0 -g numbers are unfair, since Clang is generating substantially less output.

This chart also shows the effect of using precompiled headers (PCH) on compile time. gcc and llvm-gcc see a large performance improvement with PCH, about 4x in wall time. Unfortunately, Clang does not yet have an implementation of PCH-style optimizations, but we are actively working to address this.

176.gcc

Figure: 176.gcc Timings

Unlike the Sketch timings, compilation of 176.gcc involves a large amount of code generation. The time spent in Clang's LLVM IR generation and code generation is on par with gcc's code generation time, but the improved parsing and semantic analysis performance means Clang still comes in at ~29% faster than gcc on -S -O0 -g and ~20% faster than llvm-gcc.
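The page does not spell out how figures such as "~29% faster" or "4x in wall time" are derived. One common reading, assumed here, is a relative reduction in wall time for the percentages and a plain ratio of wall times for the multiplier. The helper names and the sample inputs below are hypothetical and are not the measured values.

```python
def percent_time_reduction(baseline_seconds: float, candidate_seconds: float) -> float:
    """Percentage by which the candidate's wall time is below the baseline's."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


def speedup_factor(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times shorter the candidate's wall time is than the baseline's."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


# Hypothetical wall times in seconds:
# percent_time_reduction(10.0, 7.5) gives 25.0 (a "25% faster" result)
# speedup_factor(8.0, 2.0) gives 4.0 (a "4x" improvement)
```

Under a different definition, for example one based on throughput rather than elapsed time, the same pair of measurements would yield a different percentage, so the convention matters when comparing results.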

These numbers indicate that Clang still has room for improvement in several areas. Notably, our LLVM IR generation is significantly slower than that of llvm-gcc, and both Clang and llvm-gcc incur a significantly higher cost for adding debugging information compared to gcc.