Library ghc RULES don't activate

fgl is a Haskell library for graph manipulation. This library comes with an implementation of its base classes - Data.Graph.Inductive.PatriciaTree - that is supposedly highly tuned for performance. Part of that performance tuning involves ghc RULES pragmas to replace certain generic functions with specialized versions that are much faster.

However, my evidence is that these RULES don't seem to work at all, and I don't understand why not. For people trying to replicate exactly what I see, I've put my test project up at https://github.com/fizbin/GraphOptiTest and am using ghc version 7.10.2 .


Here's my test program:

{-# LANGUAGE TupleSections #-}

module Main where

import Control.Exception
import Control.Monad
import Data.Graph.Inductive
import qualified Data.Graph.Inductive.PatriciaTree as Pt
import qualified MyPatriciaTree as MPt

makeGraph :: (DynGraph gr) => Int -> gr () Int
makeGraph n = mkGraph (map (,()) [1 .. n])
  (concatMap (x -> map (y -> (x, y, x*y)) [x .. n]) [1 .. n])

main1 :: IO ()
main1 =
  replicateM_ 200 $ let x = makeGraph 200 :: Pt.Gr () Int
                    in evaluate (length $ show x)

main2 :: IO ()
main2 =
  replicateM_ 200 $ let x = makeGraph 200 :: MPt.Gr () Int
                    in evaluate (length $ show x)

main :: IO ()
main = main1 >> main2

Now, Data.Graph.Inductive.PatriciaTree has this definition for the class function mkGraph :

    mkGraph vs es   = insEdges es
                      . Gr
                      . IM.fromList
                      . map (second (l -> (IM.empty,l,IM.empty)))
                      $ vs

Where insEdges is a function defined in the module Data.Graph.Inductive.Graph as:

insEdges :: (DynGraph gr) => [LEdge b] -> gr a b -> gr a b
insEdges es g = foldl' (flip insEdge) g es

And Data.Graph.Inductive.PatriciaTree has this to say about insEdge :

{-# RULES
      "insEdge/Data.Graph.Inductive.PatriciaTree"  insEdge = fastInsEdge
  #-}
fastInsEdge :: LEdge b -> Gr a b -> Gr a b
fastInsEdge (v, w, l) (Gr g) = g2 `seq` Gr g2
  where
    g1 = IM.adjust addSucc' v g
    g2 = IM.adjust addPred' w g1

    addSucc' (ps, l', ss) = (ps, l', IM.insertWith addLists w [l] ss)
    addPred' (ps, l', ss) = (IM.insertWith addLists v [l] ps, l', ss)

So, in theory, when I run main1 in my test program I should get that compiled down into something that eventually calls fastInsEdge .

To test this, I compare against a modified version of Data.Graph.Inductive.PatriciaTree that uses this as its definition of the mkGraph method: (this is the class MyPatriciaTree used above in main2 )

    mkGraph vs es   = doInsEdges
                      . Gr
                      . IM.fromList
                      . map (second (l -> (IM.empty,l,IM.empty)))
                      $ vs
      where
        doInsEdges g = foldl' (flip fastInsEdge) g es

When I run my test program (after cabal configure --enable-library-profiling --enable-executable-profiling and cabal build GraphOptiTest ), though, the main2 method smokes the main1 method. It isn't even close - the profile shows 99.2% of the program's time is spent inside main1 . (and changing the program to just run main2 shows that yes, main2 is really fast on its own)

Yes, I do have -O in the ghc-options section of my cabal file.

Trying ghc options like -ddump-rule-firings doesn't really help - all I can see is that these replacement rules aren't firing, but I have no idea why. I don't know how to get the compiler to tell me why it didn't activate the replacement rules.


Bringing up something discovered by messing around with fgl 's source in response @dfeuer's answer below:

If I add a specialized version of insEdges to Data.Graph.Inductive.PatriciaTree as:

{-# RULES
      "insEdges/Data.Graph.Inductive.PatriciaTree"  insEdges = fastInsEdges
  #-}
fastInsEdges :: [LEdge b] -> Gr a b -> Gr a b
fastInsEdges es g = foldl' (flip fastInsEdge) g es

Then both main1 and main2 are now fast. This replacement rule fires; why doesn't the other one? (And no, telling ghc to NOINLINE the function insEdge does no good)


EPILOGUE:

So there's now a bug filed with the fgl package for not tagging their functions that use insEdge and insNode appropriately so that the fast versions will be used. But in my code now I work around this and the workaround may be useful in more situations, so I thought I'd share it. At the top of my code now, I have:

import qualified Data.Graph.Inductive as G
import qualified Data.Graph.Inductive.PatriciaTree as Pt

-- Work around design and implementation performance issues
-- in the Data.Graph.Inductive package.
-- Specifically, the tuned versions of insNode, insEdge, gmap, nmap, and emap
-- for PatriciaTree graphs are exposed only through RULES pragmas, meaning
-- that you only get them when the compiler can specialize the function
-- to that specific instance of G.DynGraph. Therefore, I create my own
-- type class with the functions that have specialized versions and use that
-- type class here; the compiler then can do the specialized RULES
-- replacement on the Pt.Gr instance of my class.
class (G.DynGraph gr) => MyDynGraph gr where
  mkGraph :: [G.LNode a] -> [G.LEdge b] -> gr a b
  insNodes :: [G.LNode a] -> gr a b -> gr a b
  insEdges :: [G.LEdge b] -> gr a b -> gr a b
  insNode :: G.LNode a -> gr a b -> gr a b
  insEdge :: G.LEdge b -> gr a b -> gr a b
  gmap :: (G.Context a b -> G.Context c d) -> gr a b -> gr c d
  nmap :: (a -> c) -> gr a b -> gr c b
  emap :: (b -> c) -> gr a b -> gr a c

instance MyDynGraph Pt.Gr where
  mkGraph nodes edges = insEdges edges $ G.mkGraph nodes []
  insNodes vs g = foldl' (flip G.insNode) g vs
  insEdges es g = foldl' (flip G.insEdge) g es
  insNode = G.insNode
  insEdge = G.insEdge
  gmap = G.gmap
  nmap = G.nmap
  emap = G.emap

(Had I used the nemap function in my code I would have included that in the class too) Then, any code of mine which was formerly written in terms of (G.DynGraph gr) => ... is now written in terms of (MyDynGraph gr) => ... . The compiler RULES activate for the Pt.Gr instance, and I then get the optimized version for each function.

Essentially, this trades away the ability of the compiler to inline any of these functions into the calling code and possibly do other optimizations for always getting the optimized versions. (and the cost of an extra pointer indirection at runtime, but that's trivial in comparison) Since profiling showed that those other optimizations never yielded anything significant anyway, this was a clear net win in my case.

Many people's code could use SPECIALIZE rules aggressively to get the optimized versions everywhere; however, sometimes that isn't possible and it wasn't in the real production code that caused my question without refactoring huge chunks of the application. I had a data structure with a member that has the type (forall gr. G.DynGraph gr => tokType -> gr () (MyEdge c)) - that now uses MyDynGraph for the class constraint, but completely unwinding it to not have forall gr. in the signature would have been a huge effort, and such a signature prevents specialization from working across that boundary.


I haven't done any experiments, but here's my guess. The insEdge function is not marked with a (phased) INLINE or NOINLINE , so the inliner is free to inline it whenever it's fully applied. In the definition of insEdges , we see

foldl' (flip insEdge) g es

Inlining foldl' gives

foldr f' id es g
  where f' x k z = k $! flip insEdge z x

flip is now fully applied, so we can inline it:

foldr f' id es g
  where f' x k z = k $! insEdge x z

Now insEdge is fully applied, so GHC may choose to inline it right then and there, before the rule ever has a chance to fire.

Try adding {-# NOINLINE [0] insEdge #-} right by the definition of insEdge and see what happens. If it works, submit a pull request to fgl .

PS In my opinion, this sort of thing should really be done by using class methods with defaults, rather than rewrite rules. Rules are always a bit fussy.


As the comments revealed, the big problem wasn't premature inlining, but rather a failure to specialize insEdge . In particular, Data.Graph.Inductive.Graph does not export an unfolding for insEdges , so it's impossible to specialize it, and the insEdge it calls, to the appropriate type. The ultimate fix was to mark insEdges INLINABLE , but I would still advise marking insEdge NOINLINE [0] out of an abundance of caution.

链接地址: http://www.djcxy.com/p/33192.html

上一篇: 在Haskell程序中收集暂停时间

下一篇: 图书馆ghc规则不激活