Rewrite rules and a specific fold: use optimization techniques from GHC.Base
Introduction
The GHC.Base
module defines, in particular, base functions on lists and several rewrite rules to optimize them (if you are not familiar with rewrite rules, I suggest the reading of my previous article “GHC, one compiler to RULE them all”, or directly the related GHC documentation). Especially, these rules allow
to be rewritten by the compiler into something similar to
These rules improve the expression by allowing a single reading of the list, whereas the first implementation was reading it twice.
Alga, a functional implementation of graphs, defines a foldable structure with a fold (named foldg
) specialized for the graph data.
Can we use the same tricks than GHC.Base
to optimize compositions of foldg
with fmap
? Spoiler: Yes, and we can do it without any pain!
GHC.Base optimization on lists
The build function
build
is a function you can find in GHC.Base
module:
It is mainly used for optimization when working on list.
For the next part of this article, I suggest you to read this great article by David Luposchainsky (alias quchen): https://github.com/quchen/articles/blob/master/build.md, that mainly explains how the build
function works, but not fully how it is used.
I will assume now that you are familiar with build
and with lambda-lists.
FB functions
FB
functions can be seen when searching for build
in GHC.Base
. For example, with map
there is a mapFB
:
Given a c(ons)
function and something to convert a generic type a
to an element of the list of type elt
(namely f
), you can append an element of a generic type a
to a lambda-list of type elt
.
For example
will produce
But what is the purpose of these functions since a clever reader will have seen that they are not exported? Optimization! Take a look at the type of mapFB
if we apply to it the two first arguments: now you have a convenient function to use in a foldr
.
And why bother to give it a name? Because it will allow us to remember what we were rewriting, and rewrite back function when we were not able to make any optimization.
The rules
Let us jump quick in the interesting part. Here is (a part of) the rewrite rules used in GHC.Base
to optimize the code of functions working on lists:
So what is happening ?
- Only in phase
2
(the first one), we rewrite every occurrence ofmap
by its strangebuild
equivalent (which usemapFB
), hoping for fusion to happen. - This fusion can happen with two rules, always activated:
- The
"fold/build"
one, described in the previously mentioned David’s article, merging afoldr
followed by abuild
(thus allowing the simplification of afoldr f n . map g
expression). - If there was two
mapFB
(and thus something likemap f . map g
at the beginning of the process), we merge them into a simpler and fastermap (f . g)
- In the meantime,
build
is inlined, allowing the"mapList"
rule to fire.
- The
- If there is any “dumb”
foldr
andmapFB
left, we rewrite them back to a standard form using the"mapList"
rule (only active from the phase1
to the end).
For more details, see the related comment in the GHC.Base
source.
Is it doing it right?
Let us take an example.
Imagine I want to make the sum the elements of a list of integers and add the whole length of this list. One approach (rather bad) is to add 1
to each element of the list and then make the sum its elements.
Here is roughly the different phases of the rewriting work made by the compiler if you try to compile such a function:
Et voila. At the beginning we were reading two times the list and at the end, only one.
Back to graphs
The data
The main data we will work with is Graph
from Algebra.Graph
module:
We can define a fold specialized for Graph
:
This useful function allows us to define two key-instances:
Déjà Vu?
Now if we define for convenience:
We cannot use a rule targetting directly fmap
because it does not have any inline pragma attached to its definition and thus can be inlined at any moment. mapG
’s INLINE
pragma guarantee that we can target mapG
in a rewrite rule during the phase 2
and 1
.
We are almost in the same position as when working with lists! We have a foldr
and a map
.
Can we achieve the same as for lists? Can we optimize
into
using the same method than in GHC.Base
for lists?
Introducing buildG
The work is simple, we only need to adapt the type from GHC.Base
to graphs:
A “lambda-graph” of type a
is a function of type GraphF a b
. So like lambda-lists, applying the four constructors allow us to convert it back to a standard Graph
.
Add FB function
Our mapGFB
will be slightly different:
It is only a composition of function, but it will allow us to remember that it was made for a rewriting of mapG
.
The rules
Now we have all the bricks to build our graph rules:
So, as in GHC.Base
, we
- Rewrite
mapG
into itsbuildG
equivalent using the"mapG"
rule - Try to optimize what can be optimized with the
"foldg/buildG"
and the"mapFB/mapFB"
rules - Convert back to standard form what was not optimized using the
"mapGGraph"
rule
Benchmarks
Let’s verify if all of this is working. Using the great Criterion library:
we will benchmark different combinations of foldg
and mapG
.
With rewrite rules
Compiling with
$ ghc -O Test.hs -ddump-rule-firings | grep Algebra.Graph
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: foldg/mapGFB (Algebra.Graph)
So our rules are doing something… Are they doing it right ?:
benchmarking foldg
time 96.62 μs (96.26 μs .. 97.09 μs)
[...]
benchmarking foldg . mapG
time 94.05 μs (93.90 μs .. 94.22 μs)
[...]
benchmarking foldg . mapG . mapG
time 94.57 μs (94.39 μs .. 94.79 μs)
[...]
benchmarking mapG
time 207.6 μs (205.9 μs .. 209.8 μs)
[...]
benchmarking mapG . mapG
time 201.2 μs (199.9 μs .. 203.2 μs)
[...]
The rules are working! foldg
and foldg . mapG
are taking almost the same time to run. The mapG
composition was also optimized away!
Without the rules
But are we just lucky ? Let’s try without the rewrite rules:
$ ghc -O Test.hs -fno-enable-rewrite-rules
benchmarking foldg
time 95.36 μs (95.04 μs .. 95.62 μs)
[...]
benchmarking foldg . mapG
time 221.5 μs (221.3 μs .. 221.8 μs)
[...]
benchmarking foldg . mapG . mapG
time 398.3 μs (397.4 μs .. 399.0 μs)
[...]
benchmarking mapG
time 201.5 μs (201.4 μs .. 201.6 μs)
[...]
benchmarking mapG . mapG
time 348.6 μs (348.5 μs .. 348.7 μs)
[...]
There is no luck here :) That was effectively the rewrite rules that optimizes the expressions.
Conclusion
The build
technique can be used to optimize user code with many structures with a custom fold
, for example with alga’s graphs, but certainly for many others data-type. Also note that almost any function based on foldg
and of type Graph a -> Graph b
can be optimized, not only mapG
(induce
and transpose
composition can be optimized to same way).