On performance in .NET


Formerly, when someone asked me “How do I make my C# program faster?”, my go-to advice was:

  1. Improve your computational complexity.
  2. Switch to a native language.

Of these suggestions, step #1 should take care of performance in most situations, if done properly. However, sometimes it is not enough. Suppose you are writing a user-facing, real-time, performance-critical application, such as the C# compiler, and you need to look not only at average performance but at the 99th percentile. This is where step #2 used to take over… but not anymore. Currently, my go-to advice looks like this:

  1. Improve your computational complexity.
  2. Read “Writing High-Performance .NET Code”, the Roslyn team blog, and the rest of the links provided below.
  3. Switch to a native language.

Measurement tools

“Writing High-Performance .NET Code” should be the desk reference for any performance-critical .NET application. The most important knowledge it provides:

  • Which profiling tools exist in the .NET world (PerfView, ETW, CLR Profiler, dotTrace, dotMemory, WinDbg…), how to use each of them, how to answer specific questions with a combination of these tools, and which metrics you should look at and why.
  • How exactly .NET GC works and why you should care. How to help .NET GC by managing your allocations and object lifecycle carefully.
  • JIT mechanics, NGEN, when to use each of them, and startup costs.

In other words, when you have a specific problem with your .NET application, this book will show you how to use the tools to locate exactly what is responsible for the bottleneck, how often it occurs, and so on. Often you will be surprised, because (assuming you’ve already gone through step #1) your greatest bottleneck is going to be GC, allocations, JIT, boxing, or incorrect usage of the .NET API. All of these subtle engineering details are quite unobvious unless you train yourself to keep them in mind.

Performance-minded coding

When you know the symptom of the problem (e.g. “my app spends 20% of its time in GC”) and its cause (e.g. “there’s a pinned handle pointing to a long retention path that is walked on every gen-2 collection”), the next step is to eliminate it. The series of links below (along with the accompanying chapters in “Writing High-Performance .NET Code”) explains the techniques for doing so. If you know that a particular path in your code is going to be performance-critical up to the 99th percentile (never optimize prematurely!), you should treat these techniques as guiding principles to keep in mind at every moment of the coding process.

Here are some must-read articles for any .NET programmer who works with performance-critical code:

I’m going to briefly list some of the mentioned techniques below. Again, the book and the sources above have already covered them much better than I can.

Avoid allocations

For every allocation you make, the .NET GC will spend a little more time on every future collection. If an instance is relatively long-lived, it will eventually move to gen-1 and gen-2 and greatly affect future GC pause times. But even if it is short-lived, the cost of gen-0 collections will still show up in your higher percentiles.

Some of the common sources of hidden allocations are:

  • LINQ queries: they allocate enumerators and lambda-representing objects that capture references in a closure. Eventually you also need to convert a query to a list or array, which will expand itself several times in the process. Instead, allocate a list or array of a known capacity beforehand and do the required work directly, without allocating hidden LINQ enumerators.
  • Lambdas: if they capture any references from the outer scope, they are rewritten into compiler-generated objects. If your lambda operates in a generic context, this object is not even cached in place and is re-created every time. Consider refactoring your logic so that the lambda becomes non-capturing (such lambdas compile down to static methods), or get rid of the lambda altogether.

    UPD: This behavior will change in the upcoming C# 6. In Roslyn, lambda functions now always compile down to instance calls, which yields up to a 35% performance gain in common scenarios.

  • Strings: because they are immutable, every string modification allocates a new string object. This is most subtle in calls like string.Trim(' '), which actually returns a new string. On hot paths, use index-based arithmetic instead. For case-insensitive comparison, use the corresponding string.Compare overload instead of converting your strings to lowercase with string.ToLower. Also, for simple cases string concatenation is much faster than string formatting.

  • Invoking a method with a params argument always allocates an array, possibly of zero length. Consider creating specialized overloads for the most common numbers of parameters.
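The LINQ point above can be sketched in code. The names here are illustrative, not from the post; the second method produces the same result with one up-front allocation instead of enumerators, a closure, and repeated list growth:

```csharp
using System.Collections.Generic;
using System.Linq;

static class AllocationExamples {
    // Allocates a closure object, several LINQ enumerators, and lets the
    // resulting List<int> grow (re-copying its backing array) repeatedly.
    public static List<int> SquaresLinq(int[] source, int min) {
        return source.Where(x => x >= min).Select(x => x * x).ToList();
    }

    // Allocation-conscious version: one list with a known capacity bound,
    // no enumerator or closure objects on the hot path.
    public static List<int> SquaresManual(int[] source, int min) {
        var result = new List<int>(source.Length);
        for (int i = 0; i < source.Length; i++) {
            int x = source[i];
            if (x >= min)
                result.Add(x * x);
        }
        return result;
    }
}
```
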

Use object pooling

If you use certain objects often but temporarily, consider allocating a pool of objects of that type once and reusing them where necessary. For example, the interface that Roslyn uses for StringBuilders looks somewhat like this:

internal static class StringBuilderPool {
    public static StringBuilder Allocate() {
        return SharedPools.Default<StringBuilder>().AllocateAndClear();
    }
 
    public static void Free(StringBuilder builder) {
        SharedPools.Default<StringBuilder>().ClearAndFree(builder);
    }
 
    public static string ReturnAndFree(StringBuilder builder) {
        SharedPools.Default<StringBuilder>().ForgetTrackedObject(builder);
        return builder.ToString();
    }
}

It is then used in the following manner:

public override string GetTextBetween(SyntaxToken token1, SyntaxToken token2) {
    var builder = StringBuilderPool.Allocate();
    CommonFormattingHelpers.AppendTextBetween(token1, token2, builder);
    return StringBuilderPool.ReturnAndFree(builder);
}

Compile your reflection calls into delegates

Calling methods through reflection is horrendously slow. If you know the signature of the method beforehand (and for some reason cannot agree on a common interface with the third party who implements it), you can use LINQ expression trees to compile a MethodInfo into a strongly-typed delegate. This needs to be done only once; after that you can keep a reference to the delegate and call it whenever necessary with standard C# syntax.

The technique is described on Jon Skeet’s blog. Here’s how I used it in one of my recent projects:

// ZipWith and Select2 are custom extension methods (pairwise zip + projection).
public static TDelegate ToDelegate<TDelegate>(this MethodInfo method) {
    var parameterTypes = method.GetParameters().Select(p => p.ParameterType).ToArray();
    MethodInfo invokeMethod = typeof(TDelegate).GetMethod("Invoke");
    var @params = invokeMethod.GetParameters()
                              .Select((p, i) => Expression.Parameter(p.ParameterType, "arg" + i))
                              .ToArray();
    MethodCallExpression methodCall = Expression.Call(method,
        @params.ZipWith(parameterTypes).Select2(Expression.Convert));
    return Expression.Lambda<TDelegate>(Expression.Convert(methodCall, invokeMethod.ReturnType), @params)
                     .Compile();
}
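For a method whose signature is fully known at compile time, the same idea can be shown without any custom helpers. A minimal sketch (int.Parse is chosen purely for illustration):

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

static class CompiledReflection {
    // Wrap a known static MethodInfo in an expression tree and compile it
    // once; the resulting delegate is invoked without reflection overhead.
    public static Func<string, int> CompileIntParse() {
        MethodInfo parseInfo = typeof(int).GetMethod("Parse", new[] { typeof(string) });
        ParameterExpression arg = Expression.Parameter(typeof(string), "s");
        return Expression.Lambda<Func<string, int>>(Expression.Call(parseInfo, arg), arg)
                         .Compile();
    }
}
```

Cache the returned delegate in a static field: compiling the expression is itself expensive, so it pays off only when the delegate is reused.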

Avoid boxing

Apart from obvious cases where people unnecessarily convert value types to objects (such as using the now-deprecated non-generic System.Collections namespace), there are some subtler ones:

  • string.Format boxes its arguments.
  • Structs are boxed when used through an interface they implement. In particular, this happens when you use LINQ: List<T>.Enumerator is a struct, but LINQ methods treat it as IEnumerator<T>.
  • Object.Equals(object other) is the source of all evil.
  • Do not ever concatenate a string with anything other than a string.
  • Calling GetHashCode on an enum causes boxing. Convert the enum instance to an int first.
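Two of the bullets above in code form (the enum boxing behavior depends on the runtime and JIT version; all names here are illustrative):

```csharp
enum Color { Red, Green, Blue }

static class BoxingExamples {
    public static int ColorHash(Color c) {
        // c.GetHashCode() can box the enum, because GetHashCode is defined
        // on System.Enum; casting to the underlying int first avoids the box.
        return ((int)c).GetHashCode();
    }

    public static string Describe(int n) {
        // "value: " + n would box n into an object before concatenation;
        // calling ToString() first keeps the int unboxed.
        return "value: " + n.ToString();
    }
}
```
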

Use specialized collections for smaller sizes

An array outperforms a dictionary up to a certain number of elements, even though its lookup is $\mathcal{O}(n)$. The reason is all the bookkeeping overhead that accompanies a dictionary (hashing, bucket exploration, etc.). The exact tipping point depends on your particular code and has to be measured experimentally.
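A minimal sketch of the idea, assuming the keys implement IEquatable&lt;TKey&gt; (the type and its name are illustrative, not a real library API):

```csharp
using System;

// A small-size "map": a linear scan over parallel arrays. Below some
// experimentally measured element count this beats Dictionary<TKey, TValue>,
// despite the O(n) lookup, because there is no hashing or bucket walk.
sealed class SmallMap<TKey, TValue> where TKey : IEquatable<TKey> {
    private readonly TKey[] _keys;
    private readonly TValue[] _values;

    public SmallMap(TKey[] keys, TValue[] values) {
        _keys = keys;
        _values = values;
    }

    public bool TryGetValue(TKey key, out TValue value) {
        for (int i = 0; i < _keys.Length; i++) {
            if (_keys[i].Equals(key)) {   // no hashing, no buckets
                value = _values[i];
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}
```
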

Know when to use structs and when not to

Structs give better performance than classes when they are small, consist of primitive types, and are not copied around often. The per-object overhead of a class (object header and method-table pointer) is noticeable, and every class instance adds work for the garbage collector. Classes, however, are copied faster, because copying a class value means copying just a pointer. On the other hand, an array of class instances will cause you a nightmare of cache misses (see locality of reference), while an array of structs lies in memory as a single contiguous block; but a large struct is a nightmare to copy as a function result. Bottom line: know your use case and measure everything.

Note: you should always override Equals and GetHashCode and implement IEquatable&lt;T&gt; for your structs, because the default implementations use reflection to enumerate the fields.
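A minimal example of what that looks like for a small struct:

```csharp
using System;

struct Point2D : IEquatable<Point2D> {
    public readonly int X;
    public readonly int Y;

    public Point2D(int x, int y) { X = x; Y = y; }

    // Strongly-typed overload: no boxing, no reflection.
    public bool Equals(Point2D other) {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj) {
        return obj is Point2D && Equals((Point2D)obj);
    }

    public override int GetHashCode() {
        return unchecked(X * 397 ^ Y);
    }
}
```
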

Compile your regular expressions

If you are going to reuse the same regular expression often, build it with the RegexOptions.Compiled flag. Do not ever use the static methods of the Regex class; they are horrendously slow.
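For example (the pattern is chosen purely for illustration):

```csharp
using System.Text.RegularExpressions;

static class IdentifierMatcher {
    // Built once with RegexOptions.Compiled: the pattern is translated to IL
    // up front instead of being interpreted on every match.
    private static readonly Regex Identifier =
        new Regex(@"^[A-Za-z_][A-Za-z0-9_]*$", RegexOptions.Compiled);

    public static bool IsIdentifier(string s) {
        return Identifier.IsMatch(s);
    }
}
```
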

If your regular expressions do not require non-regular features such as lookaround or backreferences, consider using RE2 instead of the default regex engine. It might yield up to a 10x speedup. There is a .NET wrapper available, called Re2.Net.


I will stop here because I cannot possibly outline all of the techniques. Again, for the complete picture read the book, watch the talk, read the articles referenced above, know your tools, and measure everything carefully. Also, feel free to comment here if you want to contribute more advice!