Benchmarking Haskell graph libraries
The main results are here https://github.com/haskell-perf/graphs/blob/master/results/ with some explanations on the benchmarks themselves.
The following of this article is some comments on these results.
One of the goals of my Google Summer of Code 2018 project was to make a benchmark table of the four main graphs libraries in Haskell:
The results are in https://github.com/haskell-perf/graphs/ where is stored the code of the benchmarking suite.
If you find a result awkward or faulty, feel free to open an issue at https://github.com/haskell-perf/graphs/issues.
Also please don’t hesitate to tweak the benchmarking suite and try yourself to produce some numbers, it is made for it!
These libraries are really different, in their inner concept, and they have to be understood before starting an analysis:
- Alga is based on a new way of viewing graphs, developed by Andrey Mokhov ( thanks to him for its feedback and help concerning the benchmarks ). It is recent: the first version was uploaded on Hackage on March, 24 2017. Alga represents graph in a functional, constructively way.
- Containers is based on the very fast immutable array, it implies that the bound of vertices is immutable.
But everything is immutable in Haskell !
You are right, but even if a list is immutable, you can expand a list (when adding an element). This is not true for immutable arrays. Once it is created you cannot add keys to it: in fact it already contains every vertices between the given bounds; thus you cannot add vertices as you want. It is also the only implementation of graph to support only the
Int type for its vertices.
- Fgl is an older try to make graph more “functional-friendly” (by Martin Erwig) with first Hackage version back in 2006 !
- Hash-Graph is a try by Patrick Dougherty ( thanks to him for its feedback concerning the benchmarks ) to modernize the Fgl interface.
What to expect
In time complexities, we will note:
nfor the number of vertices in the graph.
sfor the algebraic size of the graph. Actually for alga’s graphs in the main results, we have
s = 2 * ewith
ethe number of edges in the graph.
- Alga does not assume any instance for its vertices so it cannot construct traditional efficient data-structures (like int-maps or hash-maps). This leads to a general complexity of
O(s)for most of the functions (this is because of the way graphs are constructed from a list of edges. Using a more “alga” representation (like for
clique) can change this complexity to
- Fgl and Hash-Graph functions are almost always requiring either an
Hashableinstance for vertices, so they can build an efficient representation. This results to an average complexity of
- Containers is working only with
Intvertices so it can build very fast representation.
That being said, lets analyze a bit the results.
Start by the end of the list
Lets starting by the end and take a look at the
creation benchmark. Here there is no compromise:
- Alga and Containers are the fastest.
- Fgl and Hash-Graph are a bit behind.
Alga is building a “blind” representation (it does not check if you are adding an edge already here for example), so it is fast. Containers is fast thanks to its array-based implementation. The last two are building a “clever” representation, so it takes more time. The next part will give them their reward.
Graph inspection comprises functions like
isEmpty is efficient for everyone. Problems are coming with
Because Hash-Graph is representing graphs as an adjacency (hash-)map, its
hasVertex works almost in constant time (real complexity of
O(log n)). Fgl works with int-maps, so it has also a complexity of
O(log n). Alga works in
O(s). Note that
hasVertex for Containers does not have any sense, since it does not make any difference between having a single vertex or not having it (its representation, an array, is a map with consecutive keys comprised between a pair of bounds).
equality is one of the big advantages of Containers which handle the question very quickly. It is one of the problems of Alga which is, behind the scene, translating its representation to adjacency maps and then comparing them.
Graph inspection comprise functions like
As it was the case for creation, Alga is faster than Fgl and Hash-Graph for adding something (for
addVertex). But because removing something implies some inspection, Hash-Graph and Fgl takes the lead for
One of the only comparable algorithm is dff, which produces a forest obtained from a DFS (for Deep First Search) of each vertex. Containers is built for it: it is definitely faster.
A little note about Alga which implement a
transpose faster than Containers.
Others traditional algorithms are often also implemented (like BFS), but because they not produce exactly the same data-structure at the end, they cannot be compared.
You may want to get a list of vertices or edges in the graph, and libraries have fitting functions for this. It is graph-inspection related, so you could guess the result:
Containers is the fastest, then come Hash-Graph, close to Fgl and then Alga.
Because Hash-Graph and Fgl implement almost the same API, they have some common functions related to the theory behind. This is the case for
mergeContext. No surprises, Fgl was built on it, so it is faster than Hash-Graph.
About the benchmarks
The creation time
As said in the README of results, the benchmarks are obtained by constructing and fully evaluating a graph from a list of edges, passing it to the evaluated function and then fully evaluate the results. This clearly favor libraries with a “clever” representation and does not obviously represent every use cases. Thus we provide https://github.com/haskell-perf/graphs/blob/master/results/TIME-creation.md which take into account the creation time. Alga is now becoming a serious competitor!
The representation itself
Alga is the only library for which the function that is creating a graph from a list of edges is not at all optimized; it results that the graph obtained is not very convenient to work with. Things are not the same when graphs are constructed “the alga way”. An example is taken with
clique which is a very “algebraic” structure. The main comment is that it reduces the gap between Alga and the others, but not always fills it. Results can be seen here: https://github.com/haskell-perf/graphs/blob/master/results/TIME-extra.md
A summary: It was predictable
Due to their inner differences, Alga is faster for the creation of a graph. Fgl and Hash-Graph are better for graph inspection (the two libraries are optimized for different things, pick what you need!).
Data.Graph from Containers is almost everywhere faster, but it implies to work with very specific graphs (with an immutable set of
Evaluate your need
There is no an “always better choice” when dealing with graphs libraries in the Haskell ecosystem. Just evaluate your needs! There is some questions that you can ask yourself:
- What is the type of my vertices ? Does they have an
- I am going to modify the graph a lot ?
- I am going to inspect the graph a lot ?
- What standard algorithms will I need ? Are they already efficiently implemented in a library ?