# Benchmarking Haskell graph libraries

## The results

The main results are here https://github.com/haskell-perf/graphs/blob/master/results/ with some explanations on the benchmarks themselves.

The following of this article is some comments on these results.

One of the goals of my Google Summer of Code 2018 project was to make a benchmark table of the four main graphs libraries in Haskell:

The results are in https://github.com/haskell-perf/graphs/ where is stored the code of the benchmarking suite.

If you find a result awkward or faulty, feel free to open an issue at https://github.com/haskell-perf/graphs/issues.

Also please don’t hesitate to tweak the benchmarking suite and try yourself to produce some numbers, it is made for it!

## Preliminary comment

These libraries are **really** different, in their inner concept, and they have to be understood before starting an analysis:

- Alga is based on a new way of viewing graphs, developed by Andrey Mokhov ( thanks to him for its feedback and help concerning the benchmarks ). It is recent: the first version was uploaded on Hackage on March, 24 2017. Alga represents graph in a functional, constructively way.
- Containers is based on the very fast immutable array, it implies that the bound of vertices is
**immutable**.

But everything is immutable in Haskell !

You are right, but even if a list is immutable, you can expand a list (when adding an element). This is *not* true for immutable arrays. Once it is created you cannot add keys to it: in fact it already contains every vertices between the given bounds; thus you cannot add vertices as you want. It is also the only implementation of graph to support only the `Int`

type for its vertices.

- Fgl is an older try to make graph more “functional-friendly” (by Martin Erwig) with first Hackage version back in 2006 !
- Hash-Graph is a try by Patrick Dougherty ( thanks to him for its feedback concerning the benchmarks ) to modernize the Fgl interface.

### What to expect

In time complexities, we will note:

`n`

for the number of vertices in the graph.`s`

for the algebraic size of the graph. Actually for alga’s graphs in the main results, we have`s = 2 * e`

with`e`

the number of edges in the graph.

So:

- Alga does not assume any instance for its vertices so it cannot construct traditional efficient data-structures (like int-maps or hash-maps). This leads to a general complexity of
`O(s)`

for most of the functions (this is because of the way graphs are constructed from a list of edges. Using a more “alga” representation (like for`clique`

) can change this complexity to`O(n)`

). - Fgl and Hash-Graph functions are almost always requiring either an
`Ord`

or a`Hashable`

instance for vertices, so they can build an efficient representation. This results to an average complexity of`O(log(n))`

. - Containers is working only with
`Int`

vertices so it can build very fast representation.

That being said, lets analyze a bit the results.

## Quick Analyze

### Start by the end of the list

Lets starting by the end and take a look at the `creation`

benchmark. Here there is no compromise:

- Alga and Containers are the fastest.
- Fgl and Hash-Graph are a bit behind.

Alga is building a “blind” representation (it does not check if you are adding an edge already here for example), so it is fast. Containers is fast thanks to its *array*-based implementation. The last two are building a “clever” representation, so it takes more time. The next part will give them their reward.

### Inspection !

Graph inspection comprises functions like `isEmpty`

, `hasVertex`

, `hasEdge`

or `equality`

. `isEmpty`

is efficient for everyone. Problems are coming with `hasVertex`

:

Because Hash-Graph is representing graphs as an adjacency (hash-)map, its `hasVertex`

works almost in constant time (real complexity of `O(log n)`

). Fgl works with int-maps, so it has also a complexity of `O(log n)`

. Alga works in `O(s)`

. Note that `hasVertex`

for Containers does not have any sense, since it does not make any difference between having a single vertex or not having it (its representation, an array, is a map with consecutive keys comprised between a pair of bounds).

`equality`

is one of the big advantages of Containers which handle the question very quickly. It is one of the problems of Alga which is, behind the scene, translating its representation to adjacency maps and then comparing them.

### Graph modification

Graph inspection comprise functions like `addVertex`

, `removeVertex`

, `addEdge`

or `removeEdge`

.

As it was the case for creation, Alga is faster than Fgl and Hash-Graph for adding something (for `addEdge`

and `addVertex`

). But because removing something implies some inspection, Hash-Graph and Fgl takes the lead for `removeVertex`

and `removeEdge`

.

### Algorithms

One of the only comparable algorithm is dff, which produces a forest obtained from a DFS (for Deep First Search) of each vertex. Containers is built for it: it is definitely faster.

A little note about Alga which implement a `transpose`

faster than Containers.

Others traditional algorithms are often also implemented (like BFS), but because they not produce exactly the same data-structure at the end, they cannot be compared.

### Un-creation

You may want to get a list of vertices or edges in the graph, and libraries have fitting functions for this. It is graph-inspection related, so you could guess the result:

Containers is the fastest, then come Hash-Graph, close to Fgl and then Alga.

### Fgl related

Because Hash-Graph and Fgl implement almost the same API, they have some common functions related to the theory behind. This is the case for `mergeContext`

. No surprises, Fgl was built on it, so it is faster than Hash-Graph.

## About the benchmarks

### The creation time

As said in the README of results, the benchmarks are obtained by constructing and fully evaluating a graph from a list of edges, passing it to the evaluated function and then fully evaluate the results. This clearly favor libraries with a “clever” representation and does not obviously represent every use cases. Thus we provide https://github.com/haskell-perf/graphs/blob/master/results/TIME-creation.md which take into account the creation time. Alga is now becoming a serious competitor!

### The representation itself

Alga is the only library for which the function that is creating a graph from a list of edges is not at all optimized; it results that the graph obtained is not very convenient to work with. Things are not the same when graphs are constructed “the alga way”. An example is taken with `clique`

which is a very “algebraic” structure. The main comment is that it reduces the gap between Alga and the others, but not always fills it. Results can be seen here: https://github.com/haskell-perf/graphs/blob/master/results/TIME-extra.md

## Conclusion

### A summary: It was predictable

Due to their inner differences, Alga is faster for the creation of a graph. Fgl and Hash-Graph are better for graph inspection (the two libraries are optimized for different things, pick what you need!). `Data.Graph`

from Containers is almost everywhere faster, but it implies to work with very specific graphs (with an immutable set of `Int`

vertices).

### Evaluate your need

There is no an “always better choice” when dealing with graphs libraries in the Haskell ecosystem. Just evaluate your needs! There is some questions that you can ask yourself:

- What is the type of my vertices ? Does they have an
`Ord`

or a`Hashable`

instance ? - I am going to modify the graph a lot ?
- I am going to inspect the graph a lot ?
- What standard algorithms will I need ? Are they already efficiently implemented in a library ?