Rewrite rules and a specific fold: use optimization techniques from GHC.Base
Introduction
The GHC.Base module defines, in particular, base functions on lists and several rewrite rules to optimize them (if you are not familiar with rewrite rules, I suggest the reading of my previous article “GHC, one compiler to RULE them all”, or directly the related GHC documentation). Especially, these rules allow
foldr f n (map g xs)to be rewritten by the compiler into something similar to
foldr (f . g) n xsThese rules improve the expression by allowing a single reading of the list, whereas the first implementation was reading it twice.
Alga, a functional implementation of graphs, defines a foldable structure with a fold (named foldg) specialized for the graph data.
Can we use the same tricks than GHC.Base to optimize compositions of foldg with fmap? Spoiler: Yes, and we can do it without any pain!
GHC.Base optimization on lists
The build function
build is a function you can find in GHC.Base module:
build :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
{-# INLINE [1] build #-}It is mainly used for optimization when working on list.
For the next part of this article, I suggest you to read this great article by David Luposchainsky (alias quchen): https://github.com/quchen/articles/blob/master/build.md, that mainly explains how the build function works, but not fully how it is used.
I will assume now that you are familiar with build and with lambda-lists.
FB functions
FB functions can be seen when searching for build in GHC.Base. For example, with map there is a mapFB:
mapFB :: (elt -> lst -> lst) -> (a -> elt) -> a -> lst -> lst
mapFB c f = \x ys -> c (f x) ys
{-# INLINE [0] mapFB #-}Given a c(ons) function and something to convert a generic type a to an element of the list of type elt (namely f), you can append an element of a generic type a to a lambda-list of type elt.
For example
let myLambdaList = \cons nil -> 1 `cons` (2 `cons` (3 `cons` nil))
in (\cons nil -> mapFB cons read "0" (myLambdaList cons nil) ) (:) []will produce
[0,1,2,3]But what is the purpose of these functions since a clever reader will have seen that they are not exported? Optimization! Take a look at the type of mapFB if we apply to it the two first arguments: now you have a convenient function to use in a foldr.
And why bother to give it a name? Because it will allow us to remember what we were rewriting, and rewrite back function when we were not able to make any optimization.
The rules
Let us jump quick in the interesting part. Here is (a part of) the rewrite rules used in GHC.Base to optimize the code of functions working on lists:
{-# RULES
"map" [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs)
"fold/build" forall k z (g::forall b. (a->b->b) -> b -> b) .
foldr k z (build g) = g k z
"mapFB" forall c f g. mapFB (mapFB c f) g = mapFB c (f.g)
"mapList" [1] forall f. foldr (mapFB (:) f) [] = map f
#-}So what is happening ?
- Only in phase
2(the first one), we rewrite every occurrence ofmapby its strangebuildequivalent (which usemapFB), hoping for fusion to happen. - This fusion can happen with two rules, always activated:
- The
"fold/build"one, described in the previously mentioned David’s article, merging afoldrfollowed by abuild(thus allowing the simplification of afoldr f n . map gexpression). - If there was two
mapFB(and thus something likemap f . map gat the beginning of the process), we merge them into a simpler and fastermap (f . g) - In the meantime,
buildis inlined, allowing the"mapList"rule to fire.
- The
- If there is any “dumb”
foldrandmapFBleft, we rewrite them back to a standard form using the"mapList"rule (only active from the phase1to the end).
For more details, see the related comment in the GHC.Base source.
Is it doing it right?
Let us take an example.
Imagine I want to make the sum the elements of a list of integers and add the whole length of this list. One approach (rather bad) is to add 1 to each element of the list and then make the sum its elements.
Here is roughly the different phases of the rewriting work made by the compiler if you try to compile such a function:
foldr (+) 0 (map (+1) xs)
=(map)= foldr (+) 0 (build (\c n -> foldr (mapFB c (+1)) n xs))
=(fold/build)= (\c n -> foldr (mapFB c (+1)) n xs)) (+) 0
== foldr (mapFB (+) (+1)) 0 xs
=(inline mapFB)= foldr (\x -> (+) (x+1)) 0 xsEt voila. At the beginning we were reading two times the list and at the end, only one.
Back to graphs
The data
The main data we will work with is Graph from Algebra.Graph module:
data Graph a = Empty
| Vertex a
| Overlay (Graph a) (Graph a)
| Connect (Graph a) (Graph a)We can define a fold specialized for Graph:
-- Generalised 'Graph' folding: recursively collapse a 'Graph' by applying
-- the provided functions to the leaves and internal nodes of the expression.
-- The order of arguments is: empty, vertex, overlay and connect.
foldg :: b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph a -> b
foldg e v o c = go
where
go Empty = e
go (Vertex x ) = v x
go (Overlay x y) = o (go x) (go y)
go (Connect x y) = c (go x) (go y)
{-# INLINE [0] foldg #-}This useful function allows us to define two key-instances:
instance Functor (Graph a) where
fmap f = foldg Empty (Vertex . f) Overlay Connect
instance Foldable (Graph a) where
foldMap f = foldg mempty f (<>) (<>)Déjà Vu?
Now if we define for convenience:
mapG :: (a -> b) -> Graph a -> Graph b
mapG = fmap
{-# INLINE [0] mapG #-}We cannot use a rule targetting directly fmap because it does not have any inline pragma attached to its definition and thus can be inlined at any moment. mapG’s INLINE pragma guarantee that we can target mapG in a rewrite rule during the phase 2 and 1.
We are almost in the same position as when working with lists! We have a foldr and a map.
Can we achieve the same as for lists? Can we optimize
foldg e v o c . mapG finto
foldg e (v . f) o cusing the same method than in GHC.Base for lists?
Introducing buildG
The work is simple, we only need to adapt the type from GHC.Base to graphs:
type GraphF a b = b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> b
buildG :: forall a. (forall b. GraphF a b) -> Graph a
buildG g = g Empty Vertex Overlay Connect
{-# INLINE [1] buildG #-}A “lambda-graph” of type a is a function of type GraphF a b. So like lambda-lists, applying the four constructors allow us to convert it back to a standard Graph.
Add FB function
Our mapGFB will be slightly different:
mapGFB :: (b -> c) -> (a -> b) -> a -> c
mapGFB = (.)
{-# INLINE [0] mapGFB #-}It is only a composition of function, but it will allow us to remember that it was made for a rewriting of mapG.
The rules
Now we have all the bricks to build our graph rules:
{-# RULES
-- Transform a mapG into its build equivalent
"mapG" [~1] forall f g. mapG f g = buildG (\e v o c -> foldg e (mapGFB v f) o c g)
-- Merge a foldg followed by a buildG
"foldg/buildG" forall e v o c (g::forall b. GraphF a b).
foldg e v o c (buildG g) = g e v o c
-- Merge two mapFB
"mapFB/mapFB" forall c f g. mapGFB (mapGFB c f) g = mapGFB c (f.g)
-- Rules to rewrite un-merged function back
"mapGGraph" [1] forall f. foldg Empty (mapGFB Vertex f) Overlay Connect = mapG f
#-}So, as in GHC.Base, we
- Rewrite
mapGinto itsbuildGequivalent using the"mapG"rule - Try to optimize what can be optimized with the
"foldg/buildG"and the"mapFB/mapFB"rules - Convert back to standard form what was not optimized using the
"mapGGraph"rule
Benchmarks
Let’s verify if all of this is working. Using the great Criterion library:
import Algebra.Graph
import Control.DeepSeq
import Criterion.Main
main :: IO ()
main = defaultMain
[ bench "foldg" $ nf foldg' $!! grInt
, bench "foldg . mapG" $ nf (foldg' . mapG') $!! grInt
, bench "foldg . mapG . mapG" $ nf (foldg' . mapG' . mapG') $!! grInt
, bench "mapG" $ nf mapG' $!! grInt
, bench "mapG . mapG" $ nf (mapG' . mapG') $!! grInt
]
where
foldg' :: Graph Int -> Int
foldg' = foldg (0 :: Int) (const 1) (+) (+)
mapG' :: Graph Int -> Graph Int
mapG' = mapG (+1)
grInt :: Graph Int
grInt = vertices [0..9999]we will benchmark different combinations of foldg and mapG.
With rewrite rules
Compiling with
$ ghc -O Test.hs -ddump-rule-firings | grep Algebra.Graph
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: mapG (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: mapFB/mapFB (Algebra.Graph)
Rule fired: foldg/buildG (Algebra.Graph)
Rule fired: foldg/mapGFB (Algebra.Graph)
So our rules are doing something… Are they doing it right ?:
benchmarking foldg
time 96.62 μs (96.26 μs .. 97.09 μs)
[...]
benchmarking foldg . mapG
time 94.05 μs (93.90 μs .. 94.22 μs)
[...]
benchmarking foldg . mapG . mapG
time 94.57 μs (94.39 μs .. 94.79 μs)
[...]
benchmarking mapG
time 207.6 μs (205.9 μs .. 209.8 μs)
[...]
benchmarking mapG . mapG
time 201.2 μs (199.9 μs .. 203.2 μs)
[...]
The rules are working! foldg and foldg . mapG are taking almost the same time to run. The mapG composition was also optimized away!
Without the rules
But are we just lucky ? Let’s try without the rewrite rules:
$ ghc -O Test.hs -fno-enable-rewrite-rules
benchmarking foldg
time 95.36 μs (95.04 μs .. 95.62 μs)
[...]
benchmarking foldg . mapG
time 221.5 μs (221.3 μs .. 221.8 μs)
[...]
benchmarking foldg . mapG . mapG
time 398.3 μs (397.4 μs .. 399.0 μs)
[...]
benchmarking mapG
time 201.5 μs (201.4 μs .. 201.6 μs)
[...]
benchmarking mapG . mapG
time 348.6 μs (348.5 μs .. 348.7 μs)
[...]
There is no luck here :) That was effectively the rewrite rules that optimizes the expressions.
Conclusion
The build technique can be used to optimize user code with many structures with a custom fold, for example with alga’s graphs, but certainly for many others data-type. Also note that almost any function based on foldg and of type Graph a -> Graph b can be optimized, not only mapG (induce and transpose composition can be optimized to same way).