Paramorphisms fusion for algebraic graphs

Introduction

Algebraic graphs are a wonderful structure to work with, and especially to learn optimization techniques!

In this article, the goal is to optimize at compile-time the composition of some paramorphisms and homomorphisms of graphs (don’t worry with these barbaric terms, all of this is discussed below).

If you never approached algebraic graphs, I suggest that you read the original paper by A. Mokhov, my tutorial or just the documentation on Hackage of Alga, the Haskell library that implements the idea.

The problem

Shortcut deforestation

This process is introduced in “A Short Cut to deforestation” (by A. Gill, J. Launchbury and S. Peyton Jones). Basically, it is the idea that one can remove intermediate structures when working with “good functions” (functions that are preserving the structure of the object on which they are working on).

For example, the expression

foldr (+) 0 . map (+1)

builds an intermediate (and unneeded) list: map will build a new list that will be destructed right away by foldr. It can be optimized (and GHC does it) to

foldr (\x y -> x + 1 + y) 0

Note that this is possible only because map preserves the list structure.

On algebraic graphs, we have an equivalent to foldr named foldg:

data Graph a =
    Empty
  | Vertex a
  | Overlay (Graph a) (Graph a)
  | Connect (Graph a) (Graph a)

foldg :: b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph a -> b
foldg e v o c = go
  where
    go Empty         = e
    go (Vertex  x  ) = v x
    go (Overlay x y) = o (go x) (go y)
    go (Connect x y) = c (go x) (go y)

Consider the following expression

foldg e v o c (f x)

where f :: Graph a -> Graph b is an operation that preserves the structure of the graph (mathematicians like to call this an homomorphism) and x is a graph.

We are pretty much in the same situation as for lists: indeed, the above expression currently goes through the structure two times (one for f, the other for foldg). The goal is to rewrite it to an equivalent expression that goes through the graph only once, using the fact that f is an homomorphism.

Implementation for algebraic graphs

I already wrote about how I implemented this technique for foldg using GHC’s rewrite rules. I suggest that you read it to understand better the following, even if I will try to quickly summarize it just below.

Quick reminder

The basic idea is to view a graph homomorphism as a function transforming arguments passed to foldg (which are generally graph constructors).

First of all, we need to remember a cool result (proved in this blog): graphs homomorphisms are exactly functions that are made with bind, which is written for graphs as:

(>>=) :: Graph a -> (a -> Graph b) -> Graph b
x >>= f = foldg Empty f Overlay Connect x

So we only have to optimize something like:

foldg e v o c (x >>= f)

The question is: how to represent “functionally” the bind operation? Remember, if we can represent bind as a function operating on foldg arguments, we can easily fuse it with an actual foldg-function (imitating what GHC does for lists).

First, there is a type that encapsulate this notion:

type Foldg a = forall b. b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> b

Then we can use these rules:

buildR :: Foldg a -> Graph a
buildR g = g Empty Vertex Overlay Connect
{-# INLINE [1] buildR #-}

-- These rules transform functions into their buildR equivalents.
{-# RULES
"buildR/bindR" forall f g.
bindR g f = buildR (\e v o c -> foldg e (foldg e v o c . f) o c g)

"foldg/buildR" forall e v o c (g :: Foldg a).
foldg e v o c (buildR g) = g e v o c
 #-}

It effectively transforms any occurrence of bind into a “function-like” foldg, that transforms its arguments.

Then if we encounter an expression like foldg e v o c . buildR, we can optimize the composition using "foldg/buildR"!

If there was nothing to optimize, things are rewritten back to their normal form with the inlining of buildR.

hasEdge

We can now optimize the composition of several functions of the Alga library.

Let me introduce a killjoy: the hasEdge function, that tests if an edge is present in the graph. It is actually defined as:

data Hit = Miss | Tail | Edge deriving (Eq,Ord)

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge s t g = hit g == Edge
  where
    hit Empty         = Miss
    hit (Vertex x   ) = if x == s then Tail else Miss
    hit (Overlay x y) = case hit x of
        Miss -> hit y
        Tail -> max Tail (hit y)
        Edge -> Edge
    hit (Connect x y) = case hit x of
        Miss -> hit y
        Tail -> if hasVertex t y then Edge else Tail
        Edge -> Edge

The algorithm is pretty simple:

There is of course no edge in an Empty graph, so you Miss it.
If you hit a Vertex corresponding to the Tail of the edge, you say it, else you Miss.
For an Overlay node:
- If you missed it in the left part, the edge can only be in the right one.
- If you hit a Tail in the left part, maybe there is the whole edge in the right part, but at least you hit the Tail.
- If you hit an Edge, you say it!
For a Connect node,
- If you missed it in the left part, the edge can only be in the right one.
- If you hit a Tail in the left part:
  - If the right part contains the other part of the edge, you found the whole Edge.
  - Otherwise, you only have its Tail.
- If you hit an Edge, you say it!

Observations

hasEdge seems a bit like a foldg-function but not totally. The problem comes with the call to hasVertex on an unconsumed part of the graph in the middle of the process.

Our actual rules will not optimize hasEdge (x >>= f), since our hasEdge is not expressed with foldg.

hasEdge seems anyway not so extraordinary and it is easy to see that it can be indeed optimized when used with fmap to traverse the graph only once. For example hasEdge s t (fmap f g) is equivalent to:

hasEdgeF :: Eq b => (a -> b) -> Graph a -> b -> b -> Bool
hasEdgeF f g s t = Edge == go g
  where
    go Empty = Miss
    go (Vertex x) = if (f x) == s then Tail else Miss
    go (Overlay x y) = case go x of
        Miss -> y
        Tail -> max Tail y
        Edge -> Edge
    go (Connect x y) = case go x of
        Miss -> y
        Tail -> if foldg False (\x -> f x == t) (||) (||) y then Edge else Tail
        Edge -> Edge

Note that in this expression, the fmap operation was incorporated in go (Vertex x) and in the hasVertex call (which was also inlined).

Can the theory help us to optimize it automatically? And even better, for any function like hasEdge, that are looking almost like foldg but not?

More theory

Of bananas and barbed wire

The great article “Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire” of E. Meijer, M. Fokkinga and R. Paterson, describes some recursion operators; here, we will focus on two major: catamorphisms (written between bananas bracket in the article) and paramorphisms (written between barbed wire bracket in the article). If you haven’t read it, it is really worth it!

Catamorphisms

Catamorphism (which comes from Greek “κατα” meaning “downwards”, “down from”) is a term to describe functions that:

In functional programming, generalize the fold function, that is, on a list, a function of type cata :: (a -> b -> b) -> b -> [a] -> b. cata take a combination function, a base case, and fold the structure accordingly.
From a mathematical point of view, it (quoting the great Wikipedia) “denotes the unique homomorphism from an initial algebra into some other algebra”. Here, the important word is, I think, “homomorphism”: being a homomorphism, a catamoprhism will preserve the underlying algebraic structure.

You already know cata for lists; as a clever reader has already noticed from its type, cata for lists is nothing more than foldr in disguise:

cata :: (a -> b -> b) -> b -> [a] -> b
cata combine e []     = e
cata combine e (y:ys) = combine y (cata k z ys)

Did you guess what is the function expressing a catamorphism on graph? It is foldg!

Paramorphisms

Quoting again Wikipedia: “In formal methods of computer science, a paramorphism (from Greek παρά, meaning “close together”) is an extension of the concept of catamorphism […] to deal with a form which ‘eats its argument and keeps it too’”.

The idea is to extend the combination function: you can allow it to reason on the structure itself. When you are doing recursion, you sometimes want to inspect (or directly use) what is left of the structure (note that you cannot simply do that in a catamorphism (since this argument is not available to the combination function)).

For example, again on lists, you will have something like: para :: (a -> ([a], b) -> b) -> b -> [a] -> b. The combination function can use the not already consumed list.

para :: (a -> ([a], b) -> b) -> b -> [a] -> b
para combine e []     = e
para combine e (y:ys) = combine e (y,ys) (para k z ys)

But why can I need such a function ? foldr does already anything I want!

Indeed foldr is equivalent (in term of expressivity) to para. This is nicely explained in this stackoverflow answer. Basically, you can define foldr with para and you can define para from foldr. But if you do it with para' (again from the mentioned stackoverflow answer):

para' c n = snd . foldr (\ x (xs, t) -> (x : xs, c x xs t)) ([], n)

para' will be slower than the one we defined: it builds an exact copy of the list as it traverses it.

So the answer to the above question is efficiency: para allows you to make more efficient functions, and when you work with graphs (and when you simply work), you just want the more efficient implementation!

Paramorphisms for graphs

One can make paramorphisms for graphs by changing a bit foldg (and making more complex its signature):

paragraph :: b -> (a -> b) -> (b -> b -> Graph a -> Graph a -> b) -> (b -> b -> Graph a -> Graph a -> b) -> Graph a -> b
paragraph e v o c = go
  where
    go Empty         = e
    go (Vertex  x  ) = v x
    go (Overlay x y) = o (go x) (go y) x y
    go (Connect x y) = c (go x) (go y) x y

(Thanks Andrey for the great name :) ).

Back to `hasEdge`

You guessed it, hasEdge is a paramorphism!

Indeed we can write hasEdge using paragraph:

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge s t = (==) Edge . paragraph Miss v o c
  where
    v x = if x == s then Tail else Miss
    o x y _ _ = case x of
      Miss -> y
      Tail -> max Tail y
      Edge -> Edge
    c x y _ yy = case x of
      Miss -> y
      Tail -> if hasVertex t yy then Edge else Tail
      Edge -> Edge

The goal

Our current goal is to optimize

paragraph e v o c (bind f x)

to something that traverses the original graph x only once. If we achieve this, it will solve our problem with hasEdge and for every function expressed with paragraph.

Is this even possible?

A legitimate question is: can we indeed do such a thing? Is not paragraph too general to allow fusion?

This is not so simple and I did not found an optimization that works for every paramorphism (and I don’t think it is possible), but only for a subclass of them.

I proved in Coq the following theorem (that is valid for every paramorphism):

Definition apply2L {A B C D}
  (f : A -> A -> D -> D -> C) (g:B -> D) :=
  fun a b x y => f a b (g x) (g y).

Theorem para_compo
  {A B C}
  {e : C} {v: B -> C} {o:C -> C -> Graph B -> Graph B -> C} {c:C -> C -> Graph B -> Graph B -> C}
  {f : A -> Graph B}:
paragraph e v o c ∘ bind f =
paragraph e (paragraph e v o c ∘ f) (apply2L o (bind f)) (apply2L c (bind f)).

(∘ stands for functional composition in Coq).

With the right part of the equality, you will traverse the graph once using paragraph until you find a vertex, and then you will apply your “binded” function and traverse the result with paragraph again.

The only “little” problem is with apply2L. Indeed, you need to change the last arguments of your paramorphism to reflect the bind in the unconsumed part of the graph.

You can view this by a duplication of the bind operation: it gets incorporated in the paragraph expression but also for the two lasts arguments of the combination function. This leads to a simple problem: you don’t want to rewrite functions that use both the unconsumed graph and the recursive call. If you allow this, the rewriting will lead to a double computation of the “binded” function (one for the recursive call, and one for the delayed bind on the unconsumed arguments).

We need to ensure that we rewrite only functions that make the promise to use either the recursive call or the unconsumed graph. Note that hasEdge fulfill this promise, so all is well!

Implementation

Promises need to be kept

I don’t like “functions that make promises”, so we need to encode the promise “I either use the recursive call or the unconsumed graph” in types.

The not working but elegant version

Ideally, this kind of function can be encoded with:

type Combination a b =
  Either
    (b -> Either (b -> b) (Graph a -> b))
    (Graph a -> Either (b -> b) (Graph a -> b))

First of all, this indeed encode the promise, almost literally: your function either use the left recursive call or the left unconsumed part and then produce a function that either use the right recursive call or the unconsumed right graph.

But with this definition comes a problem: the second argument of the combination function is “hidden” in another function. For example, with such a definition, the only way to define apply2L is something like:

apply2L :: (Graph c -> Graph a) -> Combination a b -> Combination c b
apply2L c (Left f) = Left $ \a ->
  case f a of
   Left f'  -> Left $ \b -> f' b
   Right f' -> Right $ \y -> f' (c y)
apply2L c (Right f) = Right $ \x ->
  case f (c x) of
    Left f'  -> Left $ \b -> f' b
    Right f' -> Right $ \y -> f' (c y)

Your only choice (and this is logical) is to apply first the function (either on the recursive call of the unconsumed graph) and discuss its result. Sadly, this way will lead to a problem: one cannot fuse any operation (like bind) on the unconsumed right part of the graph. GHC is “blinded” by the function encoding and cannot inline what the returned function (f') really is (since it depends at run-time on the result of f).

A solution to this problem is to write explicitly the combination function, giving the user only the control on some parameter.

A working solution

Here is a solution, not very classy, but working (if you find something better, don’t hesitate to tell me) :

data Combination a b =
    B (b -> b -> b)
  | L (b -> Bool, b -> b -> b, Graph a -> b -> b)
  | R (b -> Bool, b -> b -> b, b -> Graph a -> b)

You have three possibilities:

B: it is a classic combination function (like the one we can find in foldg).
L: you inspect the left recursive call using a predicate b -> Bool, decide what to do, either call the recursion on the right part or inspect the right unconsumed part.
R: the same, but you inspect first the right recursive call.

Here is switchCombi that effectively transform this data into a combination function for paragraph

switchCombi :: Combination a b -> b -> b -> Graph a -> Graph a -> b
switchCombi (B f) = \a b _ _ -> f a b
switchCombi (L (pred,c1,c2)) = \a b x _ ->
  if pred b
  then c1 a b
  else c2 x b
switchCombi (R (pred,c1,c2)) = \a b _ x ->
  if pred a
  then c1 a b
  else c2 a x

Note that providing a constructor for the case Graph a -> Graph a -> b is not very interesting, since it does not use any recursive calls.

Thus we can define paragraphR:

paragraphR :: b -> (a -> b) -> Combination a b -> Combination a b -> Graph a -> b
paragraphR e v o c = paragraph e v (switchCombi o) (switchCombi c)
{-# INLINE [0] paragraphR #-}

Then it is easy to define hasEdge using paragraphR :

hasEdge :: Eq a => a -> a -> Graph a -> Bool
hasEdge s t = (==) Edge . paragraphR Miss v (B o) (R (cp,c1,c2))
  where
    v x = if x == s then Tail else Miss
    o x y = case x of
      Miss -> y
      Tail -> max Tail y
      Edge -> Edge
    cp x = x /= Tail
    c1 x y =
      case x of
        Miss -> y
        _ -> x
    c2 _ y = if hasVertex t y then Edge else Tail

Foldg adaptation

To adapt this for paragraphR we need to change the previous encoding of Foldg with

Paragraph a =
  forall b. b -> b -> (a -> b) -> Combination a b -> Combination a b -> b

That is a structure that represents exactly the type of paragraphR (like Foldg represented foldg).

Given a function into its Paragraph form, it is easy to get the graph it represents:

buildR :: Paragraph a -> Graph a
buildR g = g Empty Vertex (B Overlay) (B Connect)
{-# INLINE [1] buildR #-}

Bind arguments

Some problems come with the choices we made, lets see directly the rules and some auxilliary functions (don’t run away!):

apply2LR :: (Graph c -> Graph a) -> Combination a b -> Combination c b
apply2LR _ (B a) = B a
apply2LR f (L (a,b,c)) = L (a,b,\x y -> c (f x) y)
apply2LR f (R (a,b,c)) = R (a,b,\x y -> c x (f y))

-- Bind already in its "buildR" form (to avoid a rewriting cycle with "buildR/bindR")
buildBindR :: (a -> Graph b) -> Graph a -> Graph b
buildBindR f g =
  buildR
    (\e v o c ->
	  paragraphR
	    e
	    ((paragraphR e v o c) . f)
	    (apply2LR bind2R o)
	    (apply2LR bind2R c)
	    g
    )
  where
    -- bind without rewrite rules (to avoid a rewriting cycle with "buildR/bindR")
    bind2R = foldg Empty f Overlay Connect
{-# INLINE buildBindR #-}

{-# RULES
"buildR/bindR" forall (f::a -> Graph b) g.
  bindR g f =
    buildR
      (\e v o c ->
	    paragraphR
	      e
	      ((paragraphR e v o c) . f)
	      (apply2LR buildBindR o)
	      (apply2LR buildBindR c)
	      g
      )

"foldg/buildR" forall e v o c (g :: Paragraph a).
  foldg e v o c (buildR g) = g e v (B o) (B c)

"paragraphR/buildR" forall e v o c (g :: Paragraph a).
  paragraphR e v o c (buildR g) = g e v o c
 #-}

Far more scary than the previous ones, isn’t it?

Let me explain what are doing the rules:

"buildR/bindR" is converting a bind into its buildR equivalent (it is, up to some names, exactly the previous theorem).
"foldg/buildR" assure the compatibility with the previous work: you can always fuse a foldg on the left of a bind!
"paragraphR/buildR" is the main rule and fuse a paragraphR on the left of a bind.

The complication comes from the fact that we fuse with bind. Because we allow the combination function of paragraph to reason on the graph itself, we will have to remember the bind operation that needs to be made on the “not already consumed argument” of our paramorphism (more explicitly, we are delaying the bind operation). This is done by apply2LR which apply bind to the two lasts arguments.

Even with this, the above rules seem a lot more complicated than necessary.

Warning: cycle

One (who does not read commentaries) might ask

Why do you use buildBindR and not directly bind?

Because there is a little problem. The "buildR/bindR" rules is precisely rewriting bind occurrences, so using it directly in the right part of the equality will lead to a rewriting cycle.

BUT

We need this bind and we need it in a “re-writable form”. If you look at hasEdge, you see that it uses hasVertex on the unconsumed part of the graph, and this function can be fused with bind. So, ideally, standard shortcut deforestation will also happen here, directly on the argument of combination function for Connect.

The only possibility is to use buildBindR, which is only bind is its buildR form. We use the direct code of bind for the arguments, because we do not want to come into a rewriting cycle.

Conclusion

Is all of this working?

Yes! Using inspection-testing (a great tool that compare the code produced by GHC at compile-time), the following code compiles:

hasEdgeF :: Eq b => (a -> b) -> Graph a -> b -> b -> Bool
hasEdgeF f g s t = hasEdge s t (fmap f g)

hasEdgeFR :: Eq b => (a -> b) -> Graph a -> b -> b -> Bool
hasEdgeFR f g s t = Edge == go g
  where
    go Empty         = Miss
    go (Vertex  x  ) = if (f x) == s then Tail else Miss
    go (Overlay x y) =
      case go x of
        Miss -> go y
        Tail -> max Tail (go y)
        Edge -> Edge
    go (Connect x y) =
      let gx = go x in
        if gx /= Tail
        then
          case gx of
            Miss -> go y
            _ -> gx
        else if foldg False (\x -> f x == t) (||) (||) y then Edge else Tail

-- If this compiles, it means that hasEdge and hasEdgeFR have (almost) the same
inspect $ 'hasEdgeF === 'hasEdgeFR

What happened?

The fmap was entirely fused with hasEdge. By entirely, I mean not only the test on vertices, but also the composition of hasVertex and fmap happened!

More precisely, the fmap operation is made only on the needed part of the graph. That is, if your graph is like Overlay a b where the desired edge is on the a part, the b part of the graph will not be evaluated (and particularly not the fmap operation on this part).

What remains to be done

Add list optimizations: a classic way to create an algebraic graph is to use a list (an adjacency list for example). For example edges :: [(a,a)] -> Graph a that make an algebraic graph from a list of edges. We already have some rewrite rules to optimize expressions like foldg e v o c (edges xs) (because we know how the graph is made, we can avoid building an intermediate graph). Is this possible with paragraph?
Alga: all I described here is already implemented on my fork of Alga, but there is certainly some cleanup and verifications to make it available in the official Alga repository!
Proofs: I only proved that we can optimize things. It will be nice to actually prove that the rewriting process is correct (that is: we rewrite things only to equivalent ones).

Thanks

Thanks to Andrey Mokhov for his review of this article.