Prelude

In my previous post “Rewrite rules and a specific fold: use optimization techniques from GHC.Base”, I described a technique to optimize composition of a custom fold and some functions on algebraic graphs, implemented in Haskell in the alga library.

The goal of this post is to try to specify exactly what are the classes of functions we can optimize. I will try to do this in a “mathematical” way (don’t run, this is actually fun), with all my results proved in Coq, a very cool proof assistant (it is a first time for me, please be indulgent with my code :) ).

All the work can be found at https://github.com/nobrakal/coq-alga/.

The version of this repositiory when I wrote this article can be found at https://github.com/nobrakal/coq-alga/tree/4586bda0847b059b263cc4a1c8685004c9461d6c.

Alga theory

I had to implement some of the already known algebraic graphs theory in Coq (already know because proved in agda on a dedicated repository).

All of this work is in the src/Graph.v file:

  1. The data structure, an inductive one called Graph.
  2. Some standards functions (foldg, isEmpty…).
  3. Useful lemmas on them.
  4. Axioms on the graph equality (in a Coq class called EqG).
  5. Indications for Coq to allow rewriting (using the rewrite tactic) when working with equal graphs.
  6. Some proofs about the graph equality (especially, that Empty is the identity for Overlay).

A note on equality

Here the = symbol is ambiguous.

So we will distinguish:

  • a = b for the structural equality (a.k.a. Leibniz equality) between a and b.
  • R a b for graph equality relation between a and b (note that the axioms of this relation are satisfied by others relations than graph equality, so things are not so simple).

Homomorphism

The work described here can be found in the src/Homomorphism.v file.

Graph homomorphism

An homomorphism in algebra theory is (quoting Wikipedia) “a structure-preserving map between two algebraic structures of the same type”.

For us, a correct homomorphism f :: Graph a -> Graph b should satisfy:

  • f Empty = Empty
  • f (Overlay a b) = Overlay (f a) (f b)
  • f (Connect a b) = Connect (f a) (f b)

This is encoded in a Coq class called Homomorphism:

Class Homomorphism {A B:Type} (f : Graph A -> Graph B) : Prop := {
  Hom_Empty :> f Empty = Empty;
  Hom_Overlay :> forall a b, f (Overlay a b) = Overlay (f a) (f b) ;
  Hom_Connect :> forall a b, f (Connect a b) = Connect (f a) (f b)
 }.

Equivalence with bind-functions

Let us recall briefly the concerned Haskell functions:

data Graph a = Empty
             | Vertex a
             | Overlay (Graph a) (Graph a)
             | Connect (Graph a) (Graph a)

foldg :: b -> (a -> b) -> (b -> b -> b) -> (b -> b -> b) -> Graph a -> b
foldg e v o c = go
  where
    go Empty         = e
    go (Vertex  x  ) = v x
    go (Overlay x y) = o (go x) (go y)
    go (Connect x y) = c (go x) (go y)

(>>=) :: Graph a -> (a -> Graph b) -> Graph b
g >>= f = foldg Empty f Overlay Connect g

A cool result is that such homomorphisms are exactly bind-functions, that is to say any graph homomorphism can be expressed with the (>>=) operator.

Proof

So any function expressed with (>>=) is clearly a graph homomorphism (this is proved in the bind_is_hom lemma).

A fun fact is that the reciprocal is also true, like shown in the hom_is_bind lemma:

Lemma hom_is_bind A B (hom: Graph A -> Graph B):
  Homomorphism hom -> hom = bind (hom ∘ Vertex).

That is, an homomorphism is equal to bind with its definition for a vertex. A graph homomorphism is indeed entirely defined by what it is doing on a leaf of the graph expression.

Examples

  • fmap f is a graph homomorphism, using Vertex . f as bind argument.
  • removeVertex v is a graph homomorphism, using \x -> if v == x then Vertex x else Empty as bind argument.

Theorem 1: Composition of a graph homomorphism and a foldg function

Any graph homomorphism can be fused with a foldg-function, like show in the foldg_bind theorem:

Theorem foldg_bind A B C (e : C) (v: B -> C)
  (o:C -> C -> C) (c:C -> C -> C) (f_v : A -> Graph B):
    foldg e v o c ∘ bind f_v = foldg e (foldg e v o c ∘ f_v) o c.

Basically, because a graph homomorphism preserve the structure, it can easily be fused with something that destruct this structure: this is shortcut deforestation.

Theorem 2: Composition of graph homomorphisms

A cool corollary of the previous theorem is that the composition of two graph homomorphisms is again a graph homomorphism.

Proof

This is proved in the bind_compo theorem:

Theorem bind_compo A B C (hom1 : Graph A -> Graph B) (hom2 : Graph B -> Graph C):
 (Homomorphism hom1) /\ (Homomorphism hom2) ->
 hom2 ∘ hom1 = bind (hom2 ∘ hom1 ∘ Vertex).

Problems

So graph homomorphisms are cool, but not always what we want: in the examples of graph homomorphisms, removeVertex is leaving empty leaves in the structure. If it was removing them (like the actual removeVertex in the alga library), it will change the graph structure, and thus will not be a graph homomorphism as we defined it. Such functions are very useful, so we must lower our restrictions and only require preservation of the graph equality to include them.

Reduced graph homomorphism

Theorems of this section are in the src/ReducedHomo.v file.

Definition

A function f :: Graph a -> Graph b is called a reduced graph homomorphism if:

R (f Empty) Empty
R (f (Overlay a b)) (Overlay (f a) (f b))
R (f (Connect a b)) (Connect (f a) (f b))

So we require only that the function has the same properties than a graph homomorphism, but with graph equality instead of the structural one.

This idea is encapsulated in the Reduced_hom Coq class:

Class Reduced_hom {A B : Type} (R: relation (Graph B)) (f : Graph A -> Graph B) : Prop :=
{
  RHom_EqG :> EqG B R;
  RHom_Empty :>  R (f Empty) Empty ;
  RHom_Overlay :> forall a b, R (f (Overlay a b)) (Overlay (f a) (f b)) ;
  RHom_Connect :> forall a b, R (f (Connect a b)) (Connect (f a) (f b))
}.

Theorem 3: Reduced graph homomorphism are indeed reduced graph homomorphism

Graph homomorphism are reduced graph homomorphism.

The reverse is false.

Proof

  • “graph homomorphism are reduced graph homomorphism.”

This is proved in the hom_is_reduced_hom theorem.

The idea is only to use the fact that, for all a b :: Graph A, if a = b then R a b.

  • “The reverse is false”. Using
    wforget :: Graph a -> Graph b
    wforget _ = Empty
    

wforget is clearly a reduced homomorphism (this is proved in the const_empty_is_reduced_hom theorem).

But it is also clearly not a graph homorphism, this is proved by the absurd in the const_empty_is_not_hom theorem.

The case of empty leaves: a class of reduced graph homomorphism

Theorems of this section are in the src/SmartHomo.v file.

Prelude

-- Remove empty leaves around a graph operator (usually Overlay or Connect)
kSimpl :: (Graph a -> Graph a -> Graph a) -> Graph a -> Graph a -> Graph a
kSimpl c x y =
  if isEmpty x
  then if isEmpty y
    then Empty
    else y
  else if isEmpty y
    then x
    else c x y

-- Remove empty leaves of a graph
dropEmpty :: Graph a -> Graph a
dropEmpty = foldg Empty Vertex (kSimpl Overlay) (kSimpl Connect)

Definition

A function f :: Graph a -> Graph b is called a smart graph homomorphism if it can be described using f_w :: a -> Graph b such that

f g = dropEmpty (g >>= f_w)

Obviously, such a function can go through the graph only once, using

f = foldg Empty f_w (kSimpl Overlay) (kSimpl Connect)

This is proved in the smart_hom_single theorem.

Example

induce is a smart graph homomorphism (maybe the definition was made for it…):

induce :: (p -> Bool) -> Graph a -> Graph a
induce p =
  foldg
    Empty
    (\x -> if p x then Vertex x else Empty)
    (kSimpl Overlay)
    (kSimpl Connect)

This is proved in the induce_smart_hom theorem.

Theorem 4: smart graph homomorphisms are reduced homomorphisms.

This seems clear because empty leaves does not play any role in graph equality, but it takes me anyway 44 lines to prove it.

It is proved in the smart_hom_is_reduced_hom theorem:

Theorem smart_hom_is_reduced_hom A B (R: relation (Graph B))
  (f : Graph A -> Graph B) :
    EqG B R -> Smart_hom f -> Reduced_hom R f.

Theorem 5: Composition of smart graph homomorphisms

The composition of two smart graph homomorphisms is again a smart graph homomorphism.

Proof

This was hard to prove because not clear at first glance (well mostly, it was long, using 280 lines ^^). The idea is to see that the composition of two smart graph homomorphisms is something not so strange:

Theorem smart_hom_compo A B C (s1 : Graph A -> Graph B)
  (s2 : Graph B -> Graph C): (Smart_hom s1) /\ (Smart_hom s2) ->
  s2 ∘ s1 = foldg Empty (s2 ∘ s1 ∘ Vertex) (kSimpl Overlay) (kSimpl Connect).

(The proof is here).

Then see that this form allow us to prove that the composition is again a smart graph morphism:

Theorem smart_hom_compo_is_smart A B C (s1 : Graph A -> Graph B)
  (s2 : Graph B -> Graph C):
    (Smart_hom s1) /\ (Smart_hom s2) -> Smart_hom (s2 ∘ s1).

(The proof is here).

Conclusion

So all this work to prove that we can optimize anything like:

foldg e v o c . morph1 . morph2 . [...] . morphn

to a function that will only go through the graph once.

if morph1, morph2, …, morphn are graph homomorphisms (theorem 1 and theorem 2).

We can also optimize any combination of smart graph homomorphisms (theorem 5).

What is left

There is certainly some things to do playing with these results:

  • Find a cool (and efficient) implementation for the alga library.
  • The composition of smart graph homomorphisms is preserving graph structure, maybe there is an efficient one that is only preserving graph equality ?
  • Determine precisely the class of reduced homomorphisms. That is to say, is there an equivalence with a well known class of functions like for graph homomorphisms ?
  • What is the class of reduced homomorphisms that we can compose ?
  • Can this be generalised to arbitrary algebraic data types that have some notion of an Empty constructor?
  • And certainly many other things :)

Thanks

Thanks to Andrey Mokhov for his review of this article.