Paramorphisms fusion for algebraic graphs
Introduction
Algebraic graphs are a wonderful structure to work with, and especially to learn optimization techniques!
In this article, the goal is to optimize at compile-time the composition of some paramorphisms and homomorphisms of graphs (don’t worry with these barbaric terms, all of this is discussed below).
If you never approached algebraic graphs, I suggest that you read the original paper by A. Mokhov, my tutorial or just the documentation on Hackage of Alga, the Haskell library that implements the idea.
The problem
Shortcut deforestation
This process is introduced in “A Short Cut to deforestation” (by A. Gill, J. Launchbury and S. Peyton Jones). Basically, it is the idea that one can remove intermediate structures when working with “good functions” (functions that are preserving the structure of the object on which they are working on).
For example, the expression
builds an intermediate (and unneeded) list: map
will build a new list that will be destructed right away by foldr
. It can be optimized (and GHC does it) to
Note that this is possible only because map
preserves the list structure.
On algebraic graphs, we have an equivalent to foldr
named foldg
:
Consider the following expression
where f :: Graph a -> Graph b
is an operation that preserves the structure of the graph (mathematicians like to call this an homomorphism) and x
is a graph.
We are pretty much in the same situation as for lists: indeed, the above expression currently goes through the structure two times (one for f
, the other for foldg
). The goal is to rewrite it to an equivalent expression that goes through the graph only once, using the fact that f
is an homomorphism.
Implementation for algebraic graphs
I already wrote about how I implemented this technique for foldg
using GHC’s rewrite rules. I suggest that you read it to understand better the following, even if I will try to quickly summarize it just below.
Quick reminder
The basic idea is to view a graph homomorphism as a function transforming arguments passed to foldg
(which are generally graph constructors).
First of all, we need to remember a cool result (proved in this blog): graphs homomorphisms are exactly functions that are made with bind
, which is written for graphs as:
So we only have to optimize something like:
The question is: how to represent “functionally” the bind
operation? Remember, if we can represent bind
as a function operating on foldg
arguments, we can easily fuse it with an actual foldg
-function (imitating what GHC does for lists).
First, there is a type that encapsulate this notion:
Then we can use these rules:
It effectively transforms any occurrence of bind
into a “function-like” foldg, that transforms its arguments.
Then if we encounter an expression like foldg e v o c . buildR
, we can optimize the composition using "foldg/buildR"
!
If there was nothing to optimize, things are rewritten back to their normal form with the inlining of buildR
.
hasEdge
We can now optimize the composition of several functions of the Alga library.
Let me introduce a killjoy: the hasEdge
function, that tests if an edge is present in the graph. It is actually defined as:
The algorithm is pretty simple:
- There is of course no edge in an
Empty
graph, so youMiss
it. - If you
hit
aVertex
corresponding to theTail
of the edge, you say it, else youMiss
. - For an
Overlay
node:- If you missed it in the left part, the edge can only be in the right one.
- If you hit a
Tail
in the left part, maybe there is the whole edge in the right part, but at least you hit theTail
. - If you hit an
Edge
, you say it!
- For a
Connect
node,- If you missed it in the left part, the edge can only be in the right one.
- If you hit a
Tail
in the left part:- If the right part contains the other part of the edge, you found the whole
Edge
. - Otherwise, you only have its
Tail
.
- If the right part contains the other part of the edge, you found the whole
- If you hit an
Edge
, you say it!
Observations
hasEdge
seems a bit like a foldg-function but not totally. The problem comes with the call to hasVertex
on an unconsumed part of the graph in the middle of the process.
Our actual rules will not optimize hasEdge (x >>= f)
, since our hasEdge
is not expressed with foldg
.
hasEdge
seems anyway not so extraordinary and it is easy to see that it can be indeed optimized when used with fmap
to traverse the graph only once. For example hasEdge s t (fmap f g)
is equivalent to:
Note that in this expression, the fmap
operation was incorporated in go (Vertex x)
and in the hasVertex
call (which was also inlined).
Can the theory help us to optimize it automatically? And even better, for any function like hasEdge
, that are looking almost like foldg
but not?
More theory
Of bananas and barbed wire
The great article “Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire” of E. Meijer, M. Fokkinga and R. Paterson, describes some recursion operators; here, we will focus on two major: catamorphisms (written between bananas bracket in the article) and paramorphisms (written between barbed wire bracket in the article). If you haven’t read it, it is really worth it!
Catamorphisms
Catamorphism (which comes from Greek “κατα” meaning “downwards”, “down from”) is a term to describe functions that:
- In functional programming, generalize the
fold
function, that is, on a list, a function of typecata :: (a -> b -> b) -> b -> [a] -> b
.cata
take a combination function, a base case, and fold the structure accordingly. - From a mathematical point of view, it (quoting the great Wikipedia) “denotes the unique homomorphism from an initial algebra into some other algebra”. Here, the important word is, I think, “homomorphism”: being a homomorphism, a catamoprhism will preserve the underlying algebraic structure.
You already know cata
for lists; as a clever reader has already noticed from its type, cata
for lists is nothing more than foldr
in disguise:
Did you guess what is the function expressing a catamorphism on graph? It is foldg
!
Paramorphisms
Quoting again Wikipedia: “In formal methods of computer science, a paramorphism (from Greek παρά, meaning “close together”) is an extension of the concept of catamorphism […] to deal with a form which ‘eats its argument and keeps it too’”.
The idea is to extend the combination function: you can allow it to reason on the structure itself. When you are doing recursion, you sometimes want to inspect (or directly use) what is left of the structure (note that you cannot simply do that in a catamorphism (since this argument is not available to the combination function)).
For example, again on lists, you will have something like: para :: (a -> ([a], b) -> b) -> b -> [a] -> b
. The combination function can use the not already consumed list.
But why can I need such a function ?
foldr
does already anything I want!
Indeed foldr
is equivalent (in term of expressivity) to para
. This is nicely explained in this stackoverflow answer. Basically, you can define foldr
with para
and you can define para
from foldr
. But if you do it with para'
(again from the mentioned stackoverflow answer):
para'
will be slower than the one we defined: it builds an exact copy of the list as it traverses it.
So the answer to the above question is efficiency: para
allows you to make more efficient functions, and when you work with graphs (and when you simply work), you just want the more efficient implementation!
Paramorphisms for graphs
One can make paramorphisms for graphs by changing a bit foldg
(and making more complex its signature):
(Thanks Andrey for the great name :) ).
Back to hasEdge
You guessed it, hasEdge
is a paramorphism!
Indeed we can write hasEdge
using paragraph
:
The goal
Our current goal is to optimize
to something that traverses the original graph x
only once. If we achieve this, it will solve our problem with hasEdge
and for every function expressed with paragraph
.
Is this even possible?
A legitimate question is: can we indeed do such a thing? Is not paragraph
too general to allow fusion?
This is not so simple and I did not found an optimization that works for every paramorphism (and I don’t think it is possible), but only for a subclass of them.
I proved in Coq the following theorem (that is valid for every paramorphism):
(∘
stands for functional composition in Coq
).
With the right part of the equality, you will traverse the graph once using paragraph
until you find a vertex, and then you will apply your “binded” function and traverse the result with paragraph
again.
The only “little” problem is with apply2L
. Indeed, you need to change the last arguments of your paramorphism to reflect the bind in the unconsumed part of the graph.
You can view this by a duplication of the bind
operation: it gets incorporated in the paragraph
expression but also for the two lasts arguments of the combination function. This leads to a simple problem: you don’t want to rewrite functions that use both the unconsumed graph and the recursive call. If you allow this, the rewriting will lead to a double computation of the “binded” function (one for the recursive call, and one for the delayed bind
on the unconsumed arguments).
We need to ensure that we rewrite only functions that make the promise to use either the recursive call or the unconsumed graph. Note that hasEdge
fulfill this promise, so all is well!
Implementation
Promises need to be kept
I don’t like “functions that make promises”, so we need to encode the promise “I either use the recursive call or the unconsumed graph” in types.
The not working but elegant version
Ideally, this kind of function can be encoded with:
First of all, this indeed encode the promise, almost literally: your function either use the left recursive call or the left unconsumed part and then produce a function that either use the right recursive call or the unconsumed right graph.
But with this definition comes a problem: the second argument of the combination function is “hidden” in another function. For example, with such a definition, the only way to define apply2L
is something like:
Your only choice (and this is logical) is to apply first the function (either on the recursive call of the unconsumed graph) and discuss its result. Sadly, this way will lead to a problem: one cannot fuse any operation (like bind
) on the unconsumed right part of the graph. GHC is “blinded” by the function encoding and cannot inline what the returned function (f'
) really is (since it depends at run-time on the result of f
).
A solution to this problem is to write explicitly the combination function, giving the user only the control on some parameter.
A working solution
Here is a solution, not very classy, but working (if you find something better, don’t hesitate to tell me) :
You have three possibilities:
B
: it is a classic combination function (like the one we can find infoldg
).L
: you inspect the left recursive call using a predicateb -> Bool
, decide what to do, either call the recursion on the right part or inspect the right unconsumed part.R
: the same, but you inspect first the right recursive call.
Here is switchCombi
that effectively transform this data into a combination function for paragraph
Note that providing a constructor for the case Graph a -> Graph a -> b
is not very interesting, since it does not use any recursive calls.
Thus we can define paragraphR
:
Then it is easy to define hasEdge
using paragraphR
:
Foldg adaptation
To adapt this for paragraphR
we need to change the previous encoding of Foldg
with
That is a structure that represents exactly the type of paragraphR
(like Foldg
represented foldg
).
Given a function into its Paragraph
form, it is easy to get the graph it represents:
Bind arguments
Some problems come with the choices we made, lets see directly the rules and some auxilliary functions (don’t run away!):
Far more scary than the previous ones, isn’t it?
Let me explain what are doing the rules:
"buildR/bindR"
is converting abind
into itsbuildR
equivalent (it is, up to some names, exactly the previous theorem)."foldg/buildR"
assure the compatibility with the previous work: you can always fuse afoldg
on the left of abind
!"paragraphR/buildR"
is the main rule and fuse aparagraphR
on the left of abind
.
The complication comes from the fact that we fuse with bind
. Because we allow the combination function of paragraph
to reason on the graph itself, we will have to remember the bind
operation that needs to be made on the “not already consumed argument” of our paramorphism (more explicitly, we are delaying the bind operation).
This is done by apply2LR
which apply bind
to the two lasts arguments.
Even with this, the above rules seem a lot more complicated than necessary.
Warning: cycle
One (who does not read commentaries) might ask
Why do you use buildBindR and not directly bind?
Because there is a little problem. The "buildR/bindR"
rules is precisely rewriting bind
occurrences, so using it directly in the right part of the equality will lead to a rewriting cycle.
BUT
We need this bind
and we need it in a “re-writable form”. If you look at hasEdge
, you see that it uses hasVertex
on the unconsumed part of the graph, and this function can be fused with bind
. So, ideally, standard shortcut deforestation will also happen here, directly on the argument of combination function for Connect
.
The only possibility is to use buildBindR
, which is only bind
is its buildR
form. We use the direct code of bind
for the arguments, because we do not want to come into a rewriting cycle.
Conclusion
Is all of this working?
Yes! Using inspection-testing
(a great tool that compare the code produced by GHC at compile-time), the following code compiles:
What happened?
The fmap
was entirely fused with hasEdge
. By entirely, I mean not only the test on vertices, but also the composition of hasVertex
and fmap
happened!
More precisely, the fmap
operation is made only on the needed part of the graph. That is, if your graph is like Overlay a b
where the desired edge is on the a
part, the b
part of the graph will not be evaluated (and particularly not the fmap
operation on this part).
What remains to be done
-
Add list optimizations: a classic way to create an algebraic graph is to use a list (an adjacency list for example). For example
edges :: [(a,a)] -> Graph a
that make an algebraic graph from a list of edges. We already have some rewrite rules to optimize expressions likefoldg e v o c (edges xs)
(because we know how the graph is made, we can avoid building an intermediate graph). Is this possible withparagraph
? -
Alga: all I described here is already implemented on my fork of Alga, but there is certainly some cleanup and verifications to make it available in the official Alga repository!
-
Proofs: I only proved that we can optimize things. It will be nice to actually prove that the rewriting process is correct (that is: we rewrite things only to equivalent ones).
Thanks
Thanks to Andrey Mokhov for his review of this article.