Generating a unique value in Haskell do

To generate x86 assembly code, I have defined a custom type called X86 :

data X86 a = X86 { code :: String, counter :: Integer, value :: (X86 a -> a) }

This type is used in do-notation like the following. This makes it easy to write templates for generating if-statements, for-loops, etc...

generateCode :: X86 ()
generateCode = do
  label1 <- allocateUniqueLabel
  label2 <- allocateUniqueLabel
  jmp label1
  label label1
  jmp label2
  label label2

Instructions are defined like this:

jmp :: String -> X86 ()
jmp l = X86 { code = "jmp " ++ l ++ ";n", counter = 0, value = const () }

label :: String -> X86 ()
label l = X86 { code = l ++ ":n", counter = 0, value = const () }

And the completed assembly file is printed like so:

printAsm :: X86 a -> String
printAsm X86{code=code} = code

main = do
  putStrLn (printAsm generateCode)

I implemented the X86 monad in the following manner. Essentially, the sequence operator concatenates blocks of assembly code in order and ensures the counters are incremented.

instance Monad X86 where
  x >> y = X86 { code = code x ++ code y, counter = counter x + counter y, value = value y }
  x >>= f = x >> y
    where y = f (value x x)

The problem is the labels are not incremented properly, so they are not unique! The following is the output:

jmp Label1;
Label1:
jmp Label1;
Label1:

I desire the output to have a unique value for each label:

jmp Label1;
Label1:
jmp Label2;
Label2:

To complete the example, here is the implementation of the allocatedUniqueLabel function:

allocateUniqueId :: X86 Integer
allocateUniqueId = X86 { code = "", counter = 1, value = counter }

allocateUniqueLabel :: X86 String
allocateUniqueLabel = do
  id <- allocateUniqueId
  return ("Label" ++ show id)

How can I fix my X86 monad so the labels are unique?

Here is what I've tried:

  • Incrementing a global counter. => Haskell does not safely allow global state outside the IO monad.
  • Using the State monad. => I have looked into a number of examples, but do not understand how to integrate them into my existing X86 monad.
  • Keep track of the counter outside of the monad. => I rather the counter is updated "behind the scenes"; otherwise, a lot of code templates which do not use labels will need to propagate the counter manually.

  • We can use mtl classes to describe X86 code as effectful programs. We want:

  • to generate code, this is a Writer effect;
  • to maintain a counter, this is a State effect.
  • We worry about instantiating these effects last, and in the description of the programs we use MonadWriter and MonadState constraints.

    import Control.Monad.State  -- mtl
    import Control.Monad.Writer
    

    Allocating a new identifier increments the counter, without generating any code. This only uses the State effect.

    type Id = Integer
    
    allocateUniqueLabel :: MonadState Id m => m Id
    allocateUniqueLabel = do
      i <- get
      put (i+1)  -- increment
      return ("Label" ++ show (i+1))
    

    And of course, we have actions to generate code, that don't need to care about the current state. So they use the Writer effect.

    jmp :: MonadWriter String m => String -> m ()
    jmp l = tell ("jmp " ++ l ++ ";n")
    
    label :: MonadWriter String m => String -> m ()
    label l = tell (l ++ ":n")
    

    The actual program looks the same as the original, but with more general types.

    generateCode :: (MonadState Id m, MonadWriter String m) => m ()
    generateCode = do
      label1 <- allocateUniqueLabel
      label2 <- allocateUniqueLabel
      jmp label1
      label label1
      jmp label2
      label label2
    

    The effects are instantiated when we run this program, here using runWriterT / runWriter and runStateT / runState (the order doesn't matter much, these two effects commute).

    type X86 = WriterT String (State Id)
    
    runX86 :: X86 () -> String
    runX86 gen = evalState (execWriterT gen) 1 -- start counting from 1
    -- evalState and execWriterT are wrappers around `runStateT` and `runWriterT`:
    -- - execWriterT: discards the result (of type ()), only keeping the generated code.
    -- - evalState: discards the final state, only keeping the generated code,
    --   and does some unwrapping after there are no effects to handle.
    

    You probably want to use this monad stack:

    type X86 a = StateT Integer (Writer String) a
    

    Since you have a state and a writer, you could also consider using RWS (reader-writer-state all in one):

    type X86 a = RWS () String Integer a
    

    Let's pick the first one for fun. I'd first define a helper function to increment the counter (monads cannot lawfully increment a counter "automatically"):

    instr :: X86 a -> X86 a
    instr i = do
        x <- i
        modify (+1)
        return x
    

    Then you could define jmp as:

    jmp :: String -> X86 ()
    jmp l = instr $ do
        lift (tell ("jmp " ++ l ++ ";n"))
           -- 'tell' is one of Writer's operations, and then we 'lift'
           -- it into StateT
    

    (The do there is superfluous, however I suspect there will be a pattern of starting instruction definitions with instr $ do )

    I would not roll my own monad for this -- it can be instructive to do so, but I think you'll get more mileage using the standard libraries for this one.


    As you probably now underestand from the other answers, the problem with your approach was that even though you were using the counter, you were still generating your labels locally. In particular

    label1 <- allocateUniqueLabel
    label label1
    

    was equivalent to

    X86 { code = "Label1:n", counter = 1, value = const () }    
    

    We need to assemble the whole code first, generate the labels, and only afterwards (in some sense) generate the actual code using the labels. And this is what the other answers are suggesting by having the counter stored in the State (or RWS ) monad.


    There is yet another issue that we can address: You want to be able to jump both forwards and backwards. This is most likely why you have separate allocateUniqueLabel and label functions. But this allows to set the same label twice.

    It is actually possible to use to do notation with "backwards" binding using MonadFix , which defines this monadic operation:

    mfix :: (a -> m a) -> m a
    

    Since both State and RWS have MonadFix instances, we can indeed write code like this:

    {-# LANGUAGE GeneralizedNewtypeDeriving, RecursiveDo #-}
    module X86
        ( X86()
        , runX86
        , label
        , jmp
        ) where
    
    import Control.Monad.RWS
    
    -- In production code it'll be much faster if we replace String with
    -- ByteString.
    newtype X86 a = X86 (RWS () String Int a)
        deriving (Functor, Applicative, Monad, MonadFix)
    
    runX86 :: X86 a -> String
    runX86 (X86 k) = snd (execRWS k () 1)
    
    newtype Label = Label { getLabel :: String }
    
    label :: X86 Label
    label = X86 $ do
        counter <- get
        let l = "Label" ++ show counter
        tell (l ++ ":n")
        modify (+1)
        return (Label l)
    
    jmp :: Label -> X86 ()
    jmp (Label l) = X86 . tell $ "jmp " ++ l ++ ";n"
    

    And use it like this:

    example :: X86 ()
    example = do
        rec l1 <- label
            jmp l2
            l2 <- label
        jmp l1
    

    There are a few things to note:

  • We need to use the RecursiveDo extension to enable the rec keyword.
  • Keyword rec delimits a block of mutually recursive definitions. In our case it could also start one line later ( rec jmp l2 ). GHC then translates it into using mfix internally. (Using the deprecated mdo keyword instead of rec would make the code somewhat more natural.)
  • We wrap the internals in the X86 newtype. First it's always good to hide the internal implementation, it allows easy refactorings later. Second, mfix requires that the function passed to it a -> ma isn't strict in its argument. The effect must not depend on the argument, otherwise mfix diverges. This is condition is satisfied for our functions, but if the internals are exposed, someone could define a contrived function like this:

    -- | Reset the counter to the specified label.
    evilReset :: Label -> X86 ()
    evilReset = X86 . put . read . drop 5 . getLabel
    

    Not only it breaks the uniqueness of labels, but also causes the following code to hang:

    diverge :: X86 ()
    diverge = do
        rec evilReset l2
            l2 <- label
        return ()
    

  • Another quite similar alternative would be to use the Rand monad and generate labels with the Random instance of UUID . Something like WriterT String Rand a , which also has a MonadFix instance.


    (From purely academic perspective it might be possible to construct an arrow instead of a monad, that'd implement ArrowLoop , but disallow state modifications that depend on values, such as in evilReset . But encapsulation of X86 achieves the same goal, keeping the much friendlier do syntax.)

    链接地址: http://www.djcxy.com/p/42922.html

    上一篇: Haskell有什么大惊小怪的?

    下一篇: 在Haskell中生成一个独特的值