GHC.Base module defines the
Foldable class and several rewrite rules to optimize it (if you are not familiar with rewrite rules, I suggest the reading of my previous article “GHC, one compiler to RULE them all”, or directly the related GHC documentation). Especially, these rules allow
to be rewrited by the compiler into something similar to
These rules improve the expression by allowing a single reading of the list, whereas the first implementation was reading it twice.
Alga, a functional implementation of graphs, defines a foldable structure with a fold (named
foldg) specialized for the graph data.
Can we use the same tricks than
GHC.Base to optimize compositions of
fmap? Spoiler: Yes, and we can do it without any pain!
GHC.Base optimization on lists
The build function
build is a function you can find in
It is mainly used for optimization when working on list.
For the next part of this article, I suggest you to read this great article by David Luposchainsky (alias quchen): https://github.com/quchen/articles/blob/master/build.md, that mainly explain how the
build function works, but not fully how it is used.
I will assume now that you are familiar with
build and with lambda-lists.
FB functions can be seen when searching for
GHC.Base. For example, with
map there is a
c(ons) function and something to convert a generic type
a to an element of the list of type
f), you can append an element of a generic type
a to a lambda-list of type
But what is the purpose of these functions since a clever reader will have seen that they are not exported? Optimization! Take a look at the type of
mapFB if we apply to it the two first arguments: now you have a convenient function to use in a
And why bother to give it a name? Because it will allow us to remember what we were rewriting, and rewrite back function when we was not able to make any optimization.
Let us jump quick in the interesting part. Here is (a part of) the rewrite rules used in
GHC.Base to optimize the code of functions working on lists:
So what is happening ?
- Only in phase
2(the first one), we rewrite every occurrence of
mapby its strange
buildequivalent (which use
mapFB), hoping for fusion to happen.
- This fusion can happen with two rules, always activated:
"fold/build"one, described in the previously mentioned David’s article, that merge a
foldrfollowed by a
build(thus allowing the simplification of a
foldr f n . map gexpression).
- If there was two
mapFB(and thus something like
map f . map gat the beginning of the process), we merge them into a simpler and faster
map (f . g)
- In the meantime,
buildis inlined, allowing the
"mapList"rule to fire.
- If there is any “dumb”
mapFBleft, we rewrite them back to a standard form using the
"mapList"rule (only active from the phase
1to the end).
For more details, see the related comment in the
Is it doing it right ?
Let us take an example.
Imagine I want to make the sum the elements of a list of integers and add the whole length of this list. One approach (rather bad) is to add
1 to each element of the list and then make the sum its elements.
Here is roughly the different phases of the rewriting work made by the compiler if you try to compile such a function:
Et voila. At the beginning we were reading two times the list and at the end, only one.
Back to graphs
The main data we will work with is
We can define a fold specialized for
This useful function allows us to define two key-instances:
Deja Vu ?
Now if we define for convenience:
We cannot use a rule targetting directly
fmap because it does not have any inline pragma attached to its definition and thus can be inlined at any moment.
INLINE pragma guarantee that we can target
mapG in a rewrite rule during the phase
We are almost in the same position than when working with lists! We have a
foldr and a
Can we achieve the same than for lists? Can we optimize
using the same method than in
GHC.Base for lists?
The work is simple, we only need to adapt the type from
GHC.Base to graphs:
A “lambda-graph” of type
a is a function of type of
GraphF a b. So like lambda-lists, applying the four constructors allow to convert it back to a standard
Add FB function
mapGFB will be slightly different:
It is only a composition of function, but it will allow us to remember that it was made for a rewriting of
Now we have all the bricks to build our graph rules:
So, as in
buildGequivalent using the
- Try to optimize what can be optimized with the
- Convert back to standard form what was not optimized using the
Let’s verify if all of this is working. Using the great Criterion library:
we will benchmark different combination of
With rewrite rules
$ ghc -O Test.hs -ddump-rule-firings | grep Algebra.Graph
Rule fired: mapG (Algebra.Graph) Rule fired: foldg/buildG (Algebra.Graph) Rule fired: mapG (Algebra.Graph) Rule fired: mapG (Algebra.Graph) Rule fired: foldg/buildG (Algebra.Graph) Rule fired: mapFB/mapFB (Algebra.Graph) Rule fired: mapG (Algebra.Graph) Rule fired: mapG (Algebra.Graph) Rule fired: foldg/buildG (Algebra.Graph) Rule fired: mapFB/mapFB (Algebra.Graph) Rule fired: foldg/buildG (Algebra.Graph) Rule fired: foldg/mapGFB (Algebra.Graph)
So our rules are doing something… Are they doing it right ?:
benchmarking foldg time 96.62 μs (96.26 μs .. 97.09 μs) [...] benchmarking foldg . mapG time 94.05 μs (93.90 μs .. 94.22 μs) [...] benchmarking foldg . mapG . mapG time 94.57 μs (94.39 μs .. 94.79 μs) [...] benchmarking mapG time 207.6 μs (205.9 μs .. 209.8 μs) [...] benchmarking mapG . mapG time 201.2 μs (199.9 μs .. 203.2 μs) [...]
The rules are working !
foldg . mapG are taking almost the same time to run. The
mapG composition was also optimized away !
Without the rules
But are we just lucky ? Let’s try without the rewrite rules:
$ ghc -O Test.hs -fno-enable-rewrite-rules
benchmarking foldg time 95.36 μs (95.04 μs .. 95.62 μs) [...] benchmarking foldg . mapG time 221.5 μs (221.3 μs .. 221.8 μs) [...] benchmarking foldg . mapG . mapG time 398.3 μs (397.4 μs .. 399.0 μs) [...] benchmarking mapG time 201.5 μs (201.4 μs .. 201.6 μs) [...] benchmarking mapG . mapG time 348.6 μs (348.5 μs .. 348.7 μs) [...]
There is no luck here :) That was effectively the rewrite rules that optimizes the expressions.
build technique can be used to optimize user code with many structure with a custom
fold, for example with alga’s graphs, but certainly for many others data-type. Also note that almost any function based on
foldg and of type
Graph a -> Graph b can be optimized, not only
transpose composition can be optimized to same way).