This post is not a propaganda about perks and benefits of immutable persistent data structures. People familiar with the concept already know when it can tremendously help your code and when it cannot. And for those who aren’t familiar I cannot give a better introduction than those three links:
- “Purely Functional Data Structures” by Chris Okasaki
- What’s new in purely functional data structures since Okasaki?
- “Elements of Functional Languages” by Evgeny Kirpichev (in Russian)
In this post I want to talk about my favorite persistent data structure. Or, if you wish, the one I consider most beautiful. For a long time the answer to this question was Zipper1. However, two years ago this question popped up on StackOverflow:
The team who built Roslyn, the new compiler infrastructure for C# and VB.NET languages, had to come up with an AST data structure that would satisfy a set of mutually incompatible requirements. The tree had to (a) have a cheap access to the parent field; (b) maintain an up-to-date character span in the document for each node; (c) persistently reuse most of the nodes on every update. As Eric Lippert describes, they solved this problem by maintaining two trees, one of which is immutable and persistent, and another has parent references and is re-built on-demand for each query, but only partially. The latter tree forms a facade around the former one, and together they solve all of the requirements at once.
The post is a great read, I encourage everyone to look at Eric’s description. Because Roslyn is now open-source, the entire implementation is now available on GitHub (hint: search for
GreenNode). For a long time I was also surprised that the team never published anything that formally describes their idea. However, today I found out that they at least filed a public patent. Its text contains many more interesting explanations and details on how this combination of two parse trees works as one unified persistent parse tree.
What is your favorite persistent data structure?