Benchmarking Haskell graph libraries
The results
The main results are here https://github.com/haskell-perf/graphs/blob/master/results/ with some explanations on the benchmarks themselves.
The following of this article is some comments on these results.
One of the goals of my Google Summer of Code 2018 project was to make a benchmark table of the four main graphs libraries in Haskell:
The results are in https://github.com/haskell-perf/graphs/ where is stored the code of the benchmarking suite.
If you find a result awkward or faulty, feel free to open an issue at https://github.com/haskell-perf/graphs/issues.
Also please don’t hesitate to tweak the benchmarking suite and try yourself to produce some numbers, it is made for it!
Preliminary comment
These libraries are really different, in their inner concept, and they have to be understood before starting an analysis:
- Alga is based on a new way of viewing graphs, developed by Andrey Mokhov ( thanks to him for its feedback and help concerning the benchmarks ). It is recent: the first version was uploaded on Hackage on March, 24 2017. Alga represents graph in a functional, constructively way.
- Containers is based on the very fast immutable array, it implies that the bound of vertices is immutable.
But everything is immutable in Haskell !
You are right, but even if a list is immutable, you can expand a list (when adding an element). This is not true for immutable arrays. Once it is created you cannot add keys to it: in fact it already contains every vertices between the given bounds; thus you cannot add vertices as you want. It is also the only implementation of graph to support only the Int
type for its vertices.
- Fgl is an older try to make graph more “functional-friendly” (by Martin Erwig) with first Hackage version back in 2006 !
- Hash-Graph is a try by Patrick Dougherty ( thanks to him for its feedback concerning the benchmarks ) to modernize the Fgl interface.
What to expect
In time complexities, we will note:
n
for the number of vertices in the graph.s
for the algebraic size of the graph. Actually for alga’s graphs in the main results, we haves = 2 * e
withe
the number of edges in the graph.
So:
- Alga does not assume any instance for its vertices so it cannot construct traditional efficient data-structures (like int-maps or hash-maps). This leads to a general complexity of
O(s)
for most of the functions (this is because of the way graphs are constructed from a list of edges. Using a more “alga” representation (like forclique
) can change this complexity toO(n)
). - Fgl and Hash-Graph functions are almost always requiring either an
Ord
or aHashable
instance for vertices, so they can build an efficient representation. This results to an average complexity ofO(log(n))
. - Containers is working only with
Int
vertices so it can build very fast representation.
That being said, lets analyze a bit the results.
Quick Analyze
Start by the end of the list
Lets starting by the end and take a look at the creation
benchmark. Here there is no compromise:
- Alga and Containers are the fastest.
- Fgl and Hash-Graph are a bit behind.
Alga is building a “blind” representation (it does not check if you are adding an edge already here for example), so it is fast. Containers is fast thanks to its array-based implementation. The last two are building a “clever” representation, so it takes more time. The next part will give them their reward.
Inspection !
Graph inspection comprises functions like isEmpty
, hasVertex
, hasEdge
or equality
. isEmpty
is efficient for everyone. Problems are coming with hasVertex
:
Because Hash-Graph is representing graphs as an adjacency (hash-)map, its hasVertex
works almost in constant time (real complexity of O(log n)
). Fgl works with int-maps, so it has also a complexity of O(log n)
. Alga works in O(s)
. Note that hasVertex
for Containers does not have any sense, since it does not make any difference between having a single vertex or not having it (its representation, an array, is a map with consecutive keys comprised between a pair of bounds).
equality
is one of the big advantages of Containers which handle the question very quickly. It is one of the problems of Alga which is, behind the scene, translating its representation to adjacency maps and then comparing them.
Graph modification
Graph inspection comprise functions like addVertex
, removeVertex
, addEdge
or removeEdge
.
As it was the case for creation, Alga is faster than Fgl and Hash-Graph for adding something (for addEdge
and addVertex
). But because removing something implies some inspection, Hash-Graph and Fgl takes the lead for removeVertex
and removeEdge
.
Algorithms
One of the only comparable algorithm is dff, which produces a forest obtained from a DFS (for Deep First Search) of each vertex. Containers is built for it: it is definitely faster.
A little note about Alga which implement a transpose
faster than Containers.
Others traditional algorithms are often also implemented (like BFS), but because they not produce exactly the same data-structure at the end, they cannot be compared.
Un-creation
You may want to get a list of vertices or edges in the graph, and libraries have fitting functions for this. It is graph-inspection related, so you could guess the result:
Containers is the fastest, then come Hash-Graph, close to Fgl and then Alga.
Fgl related
Because Hash-Graph and Fgl implement almost the same API, they have some common functions related to the theory behind. This is the case for mergeContext
. No surprises, Fgl was built on it, so it is faster than Hash-Graph.
About the benchmarks
The creation time
As said in the README of results, the benchmarks are obtained by constructing and fully evaluating a graph from a list of edges, passing it to the evaluated function and then fully evaluate the results. This clearly favor libraries with a “clever” representation and does not obviously represent every use cases. Thus we provide https://github.com/haskell-perf/graphs/blob/master/results/TIME-creation.md which take into account the creation time. Alga is now becoming a serious competitor!
The representation itself
Alga is the only library for which the function that is creating a graph from a list of edges is not at all optimized; it results that the graph obtained is not very convenient to work with. Things are not the same when graphs are constructed “the alga way”. An example is taken with clique
which is a very “algebraic” structure. The main comment is that it reduces the gap between Alga and the others, but not always fills it. Results can be seen here: https://github.com/haskell-perf/graphs/blob/master/results/TIME-extra.md
Conclusion
A summary: It was predictable
Due to their inner differences, Alga is faster for the creation of a graph. Fgl and Hash-Graph are better for graph inspection (the two libraries are optimized for different things, pick what you need!). Data.Graph
from Containers is almost everywhere faster, but it implies to work with very specific graphs (with an immutable set of Int
vertices).
Evaluate your need
There is no an “always better choice” when dealing with graphs libraries in the Haskell ecosystem. Just evaluate your needs! There is some questions that you can ask yourself:
- What is the type of my vertices ? Does they have an
Ord
or aHashable
instance ? - I am going to modify the graph a lot ?
- I am going to inspect the graph a lot ?
- What standard algorithms will I need ? Are they already efficiently implemented in a library ?