# Rewrite rules and a specific fold: use optimization techniques from GHC.Base

## Introduction

The `GHC.Base`

module defines the `Foldable`

class and several rewrite rules to optimize it (if you are not familiar with rewrite rules, I suggest the reading of my previous article “GHC, one compiler to RULE them all”, or directly the related GHC documentation). Especially, these rules allow

to be rewrited by the compiler into something similar to

These rules improve the expression by allowing a single reading of the list, whereas the first implementation was reading it twice.

Alga, a functional implementation of graphs, defines a foldable structure with a fold (named `foldg`

) specialized for the graph data.

Can we use the same tricks than `GHC.Base`

to optimize compositions of `foldg`

with `fmap`

? Spoiler: *Yes*, and we can do it without any pain!

## GHC.Base optimization on lists

### The build function

`build`

is a function you can find in `GHC.Base`

module:

It is mainly used for optimization when working on list.

For the next part of this article, I suggest you to read this great article by David Luposchainsky (alias quchen): https://github.com/quchen/articles/blob/master/build.md, that mainly explain how the `build`

function works, but not fully how it is used.

I will assume now that you are familiar with `build`

and with lambda-lists.

### FB functions

`FB`

functions can be seen when searching for `build`

in `GHC.Base`

. For example, with `map`

there is a `mapFB`

:

Given a `c(ons)`

function and something to convert a generic type `a`

to an element of the list of type `elt`

(namely `f`

), you can append an element of a generic type `a`

to a lambda-list of type `elt`

.

For example

will produce

But what is the purpose of these functions since a clever reader will have seen that they are not exported? Optimization! Take a look at the type of `mapFB`

if we apply to it the two first arguments: now you have a convenient function to use in a `foldr`

.

And why bother to give it a name? Because it will allow us to remember what we were rewriting, and rewrite back function when we was not able to make any optimization.

### The rules

Let us jump quick in the interesting part. Here is (a part of) the rewrite rules used in `GHC.Base`

to optimize the code of functions working on lists:

So what is happening ?

- Only in phase
`2`

(the first one), we rewrite every occurrence of`map`

by its strange`build`

equivalent (which use`mapFB`

), hoping for fusion to happen. - This fusion can happen with two rules, always activated:
- The
`"fold/build"`

one, described in the previously mentioned David’s article, that merge a`foldr`

followed by a`build`

(thus allowing the simplification of a`foldr f n . map g`

expression). - If there was two
`mapFB`

(and thus something like`map f . map g`

at the beginning of the process), we merge them into a simpler and faster`map (f . g)`

- In the meantime,
`build`

is inlined, allowing the`"mapList"`

rule to fire.

- The
- If there is any “dumb”
`foldr`

and`mapFB`

left, we rewrite them back to a standard form using the`"mapList"`

rule (only active from the phase`1`

to the end).

For more details, see the related comment in the `GHC.Base`

source.

#### Is it doing it right ?

Let us take an example.

Imagine I want to make the sum the elements of a list of integers *and* add the whole length of this list. One approach (rather bad) is to add `1`

to each element of the list and then make the sum its elements.
Here is roughly the different phases of the rewriting work made by the compiler if you try to compile such a function:

Et voila. At the beginning we were reading two times the list and at the end, only one.

## Back to graphs

### The data

The main data we will work with is `Graph`

from `Algebra.Graph`

module:

We can define a fold specialized for `Graph`

:

This useful function allows us to define two key-instances:

### Deja Vu ?

Now if we define for convenience:

We cannot use a rule targetting directly `fmap`

because it does not have any inline pragma attached to its definition and thus can be inlined at any moment. `mapG`

’s `INLINE`

pragma guarantee that we can target `mapG`

in a rewrite rule during the phase `2`

and `1`

.

We are almost in the same position than when working with lists! We have a `foldr`

and a `map`

.
Can we achieve the same than for lists? Can we optimize

into

using the same method than in `GHC.Base`

for lists?

### Introducing buildG

The work is simple, we only need to adapt the type from `GHC.Base`

to graphs:

A “lambda-graph” of type `a`

is a function of type of `GraphF a b`

. So like lambda-lists, applying the four constructors allow to convert it back to a standard `Graph`

.

### Add FB function

Our `mapGFB`

will be slightly different:

It is only a composition of function, but it will allow us to remember that it was made for a rewriting of `mapG`

.

### The rules

Now we have all the bricks to build our graph rules:

So, as in `GHC.Base`

, we

- Rewrite
`mapG`

into its`buildG`

equivalent using the`"mapG"`

rule - Try to optimize what can be optimized with the
`"foldg/buildG"`

and the`"mapFB/mapFB"`

rules - Convert back to standard form what was not optimized using the
`"mapGGraph"`

rule

### Benchmarks

Let’s verify if all of this is working. Using the great Criterion library:

we will benchmark different combination of `foldg`

and `mapG`

.

#### With rewrite rules

Compiling with
`$ ghc -O Test.hs -ddump-rule-firings | grep Algebra.Graph`

```
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: foldg/mapGFB (Algebra.Graph)
```

So our rules are doing something… Are they doing it right ?:

```
benchmarking foldg
time 96.62 μs (96.26 μs .. 97.09 μs)
[...]
benchmarking foldg . mapG
time 94.05 μs (93.90 μs .. 94.22 μs)
[...]
benchmarking foldg . mapG . mapG
time 94.57 μs (94.39 μs .. 94.79 μs)
[...]
benchmarking mapG
time 207.6 μs (205.9 μs .. 209.8 μs)
[...]
benchmarking mapG . mapG
time 201.2 μs (199.9 μs .. 203.2 μs)
[...]
```

The rules are working ! `foldg`

and `foldg . mapG`

are taking almost the same time to run. The `mapG`

composition was also optimized away !

#### Without the rules

But are we just lucky ? Let’s try without the rewrite rules:

`$ ghc -O Test.hs -fno-enable-rewrite-rules`

```
benchmarking foldg
time 95.36 μs (95.04 μs .. 95.62 μs)
[...]
benchmarking foldg . mapG
time 221.5 μs (221.3 μs .. 221.8 μs)
[...]
benchmarking foldg . mapG . mapG
time 398.3 μs (397.4 μs .. 399.0 μs)
[...]
benchmarking mapG
time 201.5 μs (201.4 μs .. 201.6 μs)
[...]
benchmarking mapG . mapG
time 348.6 μs (348.5 μs .. 348.7 μs)
[...]
```

There is no luck here :) That was effectively the rewrite rules that optimizes the expressions.

## Conclusion

The `build`

technique can be used to optimize user code with many structure with a custom `fold`

, for example with alga’s graphs, but certainly for many others data-type. Also note that almost any function based on `foldg`

and of type `Graph a -> Graph b`

can be optimized, not only `mapG`

(`induce`

and `transpose`

composition can be optimized to same way).