GHC, One compiler to RULE them all
I am using some GHC pragmas to optimize alga and I must admit that they are very powerful. On the wise counsel of one of my GSoC mentors, A. Mokhov, here is a little summary of what I discovered, as this can help others.
GHC is the employee you need
First of all, it is important to understand the power of GHC.
As a Haskell compiler, GHC is already great, but it is greater than that because it can optimize your code.
For example, let us consider times4List
that multiply by 4 every member of a list of Int
.
Obviously, this definition is equivalent to
The second definition should be faster, because it traverses the list once, while the first do it 2 times.
Let us try with a very simple code to benchmark this, using Criterion:
If we compile the code without optimization, it produces:
benchmarking times2Times2List
time 726.9 μs (707.4 μs .. 749.2 μs)
[...]
benchmarking times4List
time 397.2 μs (396.0 μs .. 398.5 μs)
[...]
So as expected, the first is slower.
Now if we recompile the same file but asking GHC to optimize our code (with -O
):
benchmarking times2Times2List
time 215.0 μs (214.9 μs .. 215.2 μs)
[...]
benchmarking times4List
time 212.2 μs (211.9 μs .. 212.6 μs)
[...]
So GHC, among other optimizations, merged the two maps.
GHC’s RULEs pragmas
But how GHC figured out to combine the two maps? With a cool pragma: the rewrites rules. It is very simple and yet very powerful.
Let us see what optimized the two first examples (this is not true, you can see the real RULES used here, but we will assume it for the sake of simplicity).
Basically, this rule is telling GHC to change map f (map g xs)
to map (f.g) xs
when types match.
Everything is important
The actual little words “when types match” lead to something very cool: you can specialize a function to a specific type.
Imagine you have a function that internally deals with Data.Set
to count the unique elements of a list:
It is well known that Data.IntSet
is faster than Data.Set
. So why not use IntSet
when possible? And, even better, is it possible to do the change “behind” the user’s back, so the user does not have to use the specialized version “by hand”?
Now, you are telling GHC to change countUniques
by the faster countUniquesInt
when possible, even is the user’s code (do not pay attention to the INLINE
pragma, it is discussed after).
In two other files, here is some benchmarking code:
- Alone.hs
- Main.hs
Compiling with ghc Main.hs -O
and running it gives:
benchmarking countUniques
time 530.9 μs (528.6 μs .. 533.9 μs)
[...]
benchmarking countUniquesP
time 836.3 μs (834.0 μs .. 838.2 μs)
[...]
Using the rewrite rule, we get more than a 50%
speedup!
Try to remove the INLINE
line in my previous example and re-build using ghc Main.hs -O
. Ghc will complain about an INLINE
pragma, and in performances:
benchmarking countUniques
time 880.6 μs (855.6 μs .. 907.1 μs)
[...]
benchmarking countUniquesP
time 885.0 μs (858.8 μs .. 910.2 μs)
[...]
The countUniques/Int
rule has not fired!
So what did the INLINE
pragma?
INLINE, please
Now let’s say that you are going to use a strange library which provide:
Then one can write:
So does the optimization occur?
That is the pretty part: yes it actually takes place. To help to figure out such optimizations, GHC inlines your code. Because Haskell is pure, one can replace a function by its definition.
If GHC estimate that it worth it, the compiler will replace your function by its definition. And you guessed it, GHC tries to apply RULES at each pass. So conceptually, GHC translates:
in
and apply the previous rule:
So GHC does this pretty well itself, but you can sometimes want to override its default behavior (in fact, it almost every time inlines small functions, but is more reluctant to inline big functions). That is what INLINE
pragma is made for (and the corresponding NOINLINE
).
Adding
will make GHC very keen to inline myComplicatedFunc
.
Inlining phase
You can even add the phase on which you want the inlining to be done. Currently, GHC runs 3 phases (numbered from 2 to 0) in which it tries to inline and applies rewrite rules. The syntax is {-# INLINE [k] myComplicatedFunc #-}
, which, quoting GHC manual, means:
do not inline
myComplicatedFunc
until phase k, but from phase k onwards be very keen to inline it.
Why do we care? Because of RULES
! If we INLINE a function too early in the process, related RULES
might not fire. So it is always a good idea to specify an INLINE
pragma with phase control when using rules, to ensure that nothing is done too early preventing your super rule to fire!
So what happened to countUniques
?
Because its definition is very simple, GHC
inlined it and prevented the RULE to fire!
Adding {-# INLINE [1] countUniques #-}
ensure that GHC will not inline countUniques
too early and thus that the RULE will fire.
Why not everything is inlined?
Because it is sometimes not possible (for a recursive function for example) or sometimes useless (transforming map f xs
in map (\x -> fbody x) xs
is not useful) and this takes time to the compiler. Moreover inlining everything when it is not needed can lead to an enormous amount of code.
SPECIALIZE yourself
One other pragma was once a RULE
but is now a pragma in full: SPECIALIZE
.
When you have a class constraint in a function, for example with:
(It is basically map (+2)
but we will not use this definition either GHC will inline it too quickly).
The function produced by GHC is containing another argument called a dictionary carrying information about class methods (here the Num
class). Adding an argument and having to lookup through it takes time. So you can specialize the function using the SPECIALIZE
pragma.
This will create another function, with the same code than the original plusTwoRec
but with the specialized type (and thus, without the dictionary) and a rule to change the two functions when types match.
To benchmark all of this, we will use three files:
- Alone.hs
- Todo.hs
- Main.hs
(Merging Alone.hs
and Todo.hs
prevent the rule to fire in GHC 8.4.3, this will be corrected in a next release (see https://ghc.haskell.org/trac/ghc/ticket/15445)).
Compiling with ghc Main.hs -O
and running it:
benchmarking plusTwoRec
time 187.3 μs (184.6 μs .. 190.8 μs)
[...]
benchmarking plusTwoRecP
time 396.0 μs (393.9 μs .. 398.3 μs)
[...]
Compiling with ghc Main.hs -O -fno-enable-rewrite-rules
and running it:
benchmarking plusTwoRec
time 396.3 μs (394.4 μs .. 398.4 μs)
[...]
benchmarking plusTwoRecP
time 395.3 μs (393.8 μs .. 397.7 μs)
[...]
Specialization is a great way to optimize a polymorphic library when you know that it will be used mostly with one type.
Some examples of specialization can be found in the actual GHC.Base
.
Why not everything is specialized?
When you have to deal with the outside you cannot tell in advance what types will your user need; moreover creating a specialized function takes space and doing this too much can create a very big library.
Inspect the core
But how to figure out what really does GHC? You can inspect the code produced after the inline/rules
passes by passing -ddump-simpl
to it. It will throw a lot of code, not always very understandable, but useful for comparison.
Conclusion
GHC is like your grandpa’s old car. When you are using it, it goes well, but your grandpa was able to make miracles with this old thing! How? Only because he knew how to use it properly :) .
Note: This article presents a very (very) superficial point of view about these pragmas. Please don’t hesitate to check the GHC manual related chapter or comment if you find something strange!