Rewrite rules and a specific fold: use optimization techniques from GHC.Base

Introduction

The GHC.Base module defines, in particular, base functions on lists and several rewrite rules to optimize them (if you are not familiar with rewrite rules, I suggest the reading of my previous article “GHC, one compiler to RULE them all”, or directly the related GHC documentation). Especially, these rules allow

foldr f n (map g xs)

to be rewritten by the compiler into something similar to

foldr (f . g) n xs

These rules improve the expression by allowing a single reading of the list, whereas the first implementation was reading it twice.

Alga, a functional implementation of graphs, defines a foldable structure with a fold (named foldg) specialized for the graph data.

Can we use the same tricks than GHC.Base to optimize compositions of foldg with fmap? Spoiler: Yes, and we can do it without any pain!

GHC.Base optimization on lists

The build function

build is a function you can find in GHC.Base module:

build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
{-# INLINE [1] build #-}

It is mainly used for optimization when working on list.

For the next part of this article, I suggest you to read this great article by David Luposchainsky (alias quchen): https://github.com/quchen/articles/blob/master/build.md, that mainly explains how the build function works, but not fully how it is used.

I will assume now that you are familiar with build and with lambda-lists.

FB functions

FB functions can be seen when searching for build in GHC.Base. For example, with map there is a mapFB:

mapFB :: (elt -> lst -> lst) -> (a -> elt) -> a -> lst -> lst
mapFB c f = \x ys -> c (f x) ys
{-# INLINE [0] mapFB #-}

Given a c(ons) function and something to convert a generic type a to an element of the list of type elt (namely f), you can append an element of a generic type a to a lambda-list of type elt.

For example

let myLambdaList = \cons nil -> 1 `cons` (2 `cons` (3 `cons` nil))
 in (\cons nil -> mapFB cons read "0" (myLambdaList cons nil) ) (:) []

will produce

[0,1,2,3]

But what is the purpose of these functions since a clever reader will have seen that they are not exported? Optimization! Take a look at the type of mapFB if we apply to it the two first arguments: now you have a convenient function to use in a foldr.

And why bother to give it a name? Because it will allow us to remember what we were rewriting, and rewrite back function when we were not able to make any optimization.

The rules

Let us jump quick in the interesting part. Here is (a part of) the rewrite rules used in GHC.Base to optimize the code of functions working on lists:

{-# RULES
"map"     [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs)

"fold/build"   forall k z (g::forall b. (a->b->b) -> b -> b) .
               foldr k z (build g) = g k z

"mapFB"        forall c f g. mapFB (mapFB c f) g = mapFB c (f.g)

"mapList" [1]  forall f. foldr (mapFB (:) f) [] = map f
  #-}

So what is happening ?

Only in phase 2 (the first one), we rewrite every occurrence of map by its strange build equivalent (which use mapFB), hoping for fusion to happen.
This fusion can happen with two rules, always activated:
- The "fold/build" one, described in the previously mentioned David’s article, merging a foldr followed by a build (thus allowing the simplification of a foldr f n . map g expression).
- If there was two mapFB (and thus something like map f . map g at the beginning of the process), we merge them into a simpler and faster map (f . g)
- In the meantime, build is inlined, allowing the "mapList" rule to fire.
If there is any “dumb” foldr and mapFB left, we rewrite them back to a standard form using the "mapList" rule (only active from the phase 1 to the end).

For more details, see the related comment in the GHC.Base source.

Is it doing it right?

Let us take an example.

Imagine I want to make the sum the elements of a list of integers and add the whole length of this list. One approach (rather bad) is to add 1 to each element of the list and then make the sum its elements. Here is roughly the different phases of the rewriting work made by the compiler if you try to compile such a function:

foldr (+) 0 (map (+1) xs)
=(map)=          foldr (+) 0 (build (\c n -> foldr (mapFB c (+1)) n xs))
=(fold/build)=   (\c n -> foldr (mapFB c (+1)) n xs)) (+) 0
==               foldr (mapFB (+) (+1)) 0 xs
=(inline mapFB)= foldr (\x -> (+) (x+1)) 0 xs

Et voila. At the beginning we were reading two times the list and at the end, only one.

Back to graphs

The data

The main data we will work with is Graph from Algebra.Graph module:

data Graph a = Empty
             | Vertex a
             | Overlay (Graph a) (Graph a)
             | Connect (Graph a) (Graph a)

We can define a fold specialized for Graph:

-- Generalised 'Graph' folding: recursively collapse a 'Graph' by applying
-- the provided functions to the leaves and internal nodes of the expression.
-- The order of arguments is: empty, vertex, overlay and connect.
foldg :: b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph a -> b
foldg e v o c = go
  where
    go Empty         = e
    go (Vertex  x  ) = v x
    go (Overlay x y) = o (go x) (go y)
    go (Connect x y) = c (go x) (go y)
{-# INLINE [0] foldg #-}

This useful function allows us to define two key-instances:

instance Functor (Graph a) where
	fmap f = foldg Empty (Vertex . f) Overlay Connect

instance Foldable (Graph a) where
	foldMap f = foldg mempty f (<>) (<>)

Déjà Vu?

Now if we define for convenience:

mapG :: (a -> b) -> Graph a -> Graph b
mapG = fmap
{-# INLINE [0] mapG #-}

We cannot use a rule targetting directly fmap because it does not have any inline pragma attached to its definition and thus can be inlined at any moment. mapG’s INLINE pragma guarantee that we can target mapG in a rewrite rule during the phase 2 and 1.

We are almost in the same position as when working with lists! We have a foldr and a map. Can we achieve the same as for lists? Can we optimize

foldg e v o c . mapG f

into

foldg e (v . f) o c

using the same method than in GHC.Base for lists?

Introducing buildG

The work is simple, we only need to adapt the type from GHC.Base to graphs:

type GraphF a b = b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> b

buildG :: forall a. (forall b. GraphF a b) -> Graph a
buildG g = g Empty Vertex Overlay Connect
{-# INLINE [1] buildG #-}

A “lambda-graph” of type a is a function of type GraphF a b. So like lambda-lists, applying the four constructors allow us to convert it back to a standard Graph.

Add FB function

Our mapGFB will be slightly different:

mapGFB :: (b -> c) -> (a -> b) -> a -> c
mapGFB = (.)
{-# INLINE [0] mapGFB #-}

It is only a composition of function, but it will allow us to remember that it was made for a rewriting of mapG.

The rules

Now we have all the bricks to build our graph rules:

{-# RULES
-- Transform a mapG into its build equivalent
"mapG"   [~1]   forall f g. mapG f g = buildG (\e v o c -> foldg e (mapGFB v f) o c g)

-- Merge a foldg followed by a buildG
"foldg/buildG"  forall e v o c (g::forall b. GraphF a b).
                       foldg e v o c (buildG g) = g e v o c

-- Merge two mapFB
"mapFB/mapFB"   forall c f g. mapGFB (mapGFB c f) g = mapGFB c (f.g)

-- Rules to rewrite un-merged function back
"mapGGraph" [1] forall f. foldg Empty (mapGFB Vertex f) Overlay Connect = mapG f
 #-}

So, as in GHC.Base, we

Rewrite mapG into its buildG equivalent using the "mapG" rule
Try to optimize what can be optimized with the "foldg/buildG" and the "mapFB/mapFB" rules
Convert back to standard form what was not optimized using the "mapGGraph" rule

Benchmarks

Let’s verify if all of this is working. Using the great Criterion library:

import Algebra.Graph
import Control.DeepSeq
import Criterion.Main

main :: IO ()
main = defaultMain
  [ bench "foldg" $ nf foldg' $!! grInt
  , bench "foldg . mapG" $ nf (foldg' . mapG') $!! grInt
  , bench "foldg . mapG . mapG" $ nf (foldg' . mapG' . mapG') $!! grInt
  , bench "mapG" $ nf mapG' $!! grInt
  , bench "mapG . mapG" $ nf (mapG' . mapG') $!! grInt
  ]
  where
    foldg' :: Graph Int -> Int
    foldg' = foldg (0 :: Int) (const 1) (+) (+)
    mapG' :: Graph Int -> Graph Int
    mapG'  = mapG (+1)

grInt :: Graph Int
grInt = vertices [0..9999]

we will benchmark different combinations of foldg and mapG.

With rewrite rules

Compiling with $ ghc -O Test.hs -ddump-rule-firings | grep Algebra.Graph

Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: foldg/mapGFB (Algebra.Graph)

So our rules are doing something… Are they doing it right ?:

benchmarking foldg
time                 96.62 μs   (96.26 μs .. 97.09 μs)
[...]

benchmarking foldg . mapG
time                 94.05 μs   (93.90 μs .. 94.22 μs)
[...]

benchmarking foldg . mapG . mapG
time                 94.57 μs   (94.39 μs .. 94.79 μs)
[...]

benchmarking mapG
time                 207.6 μs   (205.9 μs .. 209.8 μs)
[...]

benchmarking mapG . mapG
time                 201.2 μs   (199.9 μs .. 203.2 μs)
[...]

The rules are working! foldg and foldg . mapG are taking almost the same time to run. The mapG composition was also optimized away!

Without the rules

But are we just lucky ? Let’s try without the rewrite rules:

$ ghc -O Test.hs -fno-enable-rewrite-rules

benchmarking foldg
time                 95.36 μs   (95.04 μs .. 95.62 μs)
[...]

benchmarking foldg . mapG
time                 221.5 μs   (221.3 μs .. 221.8 μs)
[...]

benchmarking foldg . mapG . mapG
time                 398.3 μs   (397.4 μs .. 399.0 μs)
[...]

benchmarking mapG
time                 201.5 μs   (201.4 μs .. 201.6 μs)
[...]

benchmarking mapG . mapG
time                 348.6 μs   (348.5 μs .. 348.7 μs)
[...]

There is no luck here :) That was effectively the rewrite rules that optimizes the expressions.

Conclusion

The build technique can be used to optimize user code with many structures with a custom fold, for example with alga’s graphs, but certainly for many others data-type. Also note that almost any function based on foldg and of type Graph a -> Graph b can be optimized, not only mapG (induce and transpose composition can be optimized to same way).