Use of Haskell state monad a code smell?

God I hate the term "code smell", but I can't think of anything more accurate.

I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).

During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.

I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).

I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?


I'd say that state in general is not a code smell, so long as it's kept small and well controlled.

This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
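
To make the "monads are just assistance" point concrete, here is a minimal sketch of the label counter from the question, written first with explicit threading and then with a state monad. The names (`freshExplicit`, `fresh`, `twoLabels`) are made up for illustration, and the tiny `State` type is hand-rolled so the example needs nothing beyond base; the `State` from mtl plays the same role.

```haskell
-- Explicit threading: the counter goes in and the updated counter comes out.
freshExplicit :: Int -> (String, Int)
freshExplicit n = ("L" ++ show n, n + 1)

twoLabelsExplicit :: Int -> ((String, String), Int)
twoLabelsExplicit n0 =
  let (a, n1) = freshExplicit n0
      (b, n2) = freshExplicit n1  -- passing n0 here by mistake would reuse a label
  in ((a, b), n2)

-- A minimal hand-rolled state monad (an assumption for this sketch).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

-- The only function that touches the counter.
fresh :: State Int String
fresh = State freshExplicit

-- Same logic as twoLabelsExplicit, but the old counter can no longer be reused.
twoLabels :: State Int (String, String)
twoLabels = do
  a <- fresh
  b <- fresh
  pure (a, b)
```

Both `twoLabelsExplicit 0` and `runState twoLabels 0` evaluate to `(("L0","L1"),2)`; the monadic version simply removes the chance of passing a stale counter.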

A fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:

  • VM: the virtual machine that ran the simulation program
  • Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
  • Solution: generation of the solution file based on the output of the controllers
  • Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed

    Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.

    The key point here was that the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.

    All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).

    When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
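
One way to get this kind of isolation, sketched below under my own assumptions (this is not the contest code, and every name here is hypothetical), is to make the glue loop polymorphic in the state types. Because `simulate` knows nothing about `vm` or `ctl`, the type checker guarantees it can only hand each state back to the step function that owns it.

```haskell
-- Glue loop: polymorphic in both state types, so it cannot inspect either one.
simulate :: (vm -> Double -> (Double, vm))    -- VM step: control input to sensor output
         -> (ctl -> Double -> (Double, ctl))  -- controller step: sensor output to next input
         -> Int                               -- number of steps to run
         -> vm -> ctl -> [Double]             -- initial states to the list of VM outputs
simulate stepVM stepCtl = go 0
  where
    go _ 0 _ _ = []
    go input n vm ctl =
      let (output, vm')  = stepVM vm input
          (input', ctl') = stepCtl ctl output
      in output : go input' (n - 1) vm' ctl'

-- Toy VM: its private state is a position that accumulates thrust.
toyVM :: Double -> Double -> (Double, Double)
toyVM pos thrust = let pos' = pos + thrust in (pos', pos')

-- Toy controller: its private state is a step count; it pushes until position 10.
toyController :: Int -> Double -> (Double, Int)
toyController steps pos = (if pos < 10 then 1 else 0, steps + 1)
```

With these toy pieces, `simulate toyVM toyController 3 0 0` gives `[0.0,1.0,2.0]`. In a real program each step function would live in its own module with an abstract state type, which gives the same guarantee.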

    One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten into it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly and in one place all of a small amount of code that can change the state, and more easily reason about how it can be changed, again making it easier to deal with.
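
As a rough illustration of the parser point, here is a toy parser written from scratch (it is not any particular library, and the names are my own). Only `item`, `failP` and `orElse` look at the input string directly; `satisfy`, `many1` and `number` are built purely out of those.

```haskell
import Data.Char (isDigit)

-- The parse state is just the remaining input; Nothing means failure.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> Just (f a, s')

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing      -> Nothing
    Just (f, s') -> case pa s' of
      Nothing       -> Nothing
      Just (a, s'') -> Just (f a, s'')

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    Nothing      -> Nothing
    Just (a, s') -> runParser (k a) s'

-- Primitives: the only functions that handle the input directly.
item :: Parser Char
item = Parser $ \s -> case s of
  []       -> Nothing
  (c : cs) -> Just (c, cs)

failP :: Parser a
failP = Parser (const Nothing)

-- Try p; if it fails, run q on the original input.
orElse :: Parser a -> Parser a -> Parser a
orElse (Parser p) (Parser q) = Parser $ \s -> maybe (q s) Just (p s)

-- Derived combinators: none of these ever sees the input string.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = do
  c <- item
  if ok c then pure c else failP

many1 :: Parser a -> Parser [a]
many1 p = do
  x  <- p
  xs <- many1 p `orElse` pure []
  pure (x : xs)

number :: Parser Int
number = read <$> many1 (satisfy isDigit)
```

Here `runParser number "123abc"` yields `Just (123,"abc")`, and to audit how the input can change you only need to read the three primitives.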


    I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract: don't make it obvious you're using a monad.

    Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:

    emptyGraph   :: Graph
    mkLabel      :: Label -> Graph
    mkAssignment :: Assignment -> Graph  -- modify a register or memory
    mkTransfer   :: ControlTransfer -> Graph   -- any control transfer
    (<*>)        :: Graph -> Graph -> Graph
    

    But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:

    withFreshLabel :: (Label -> Graph) -> Graph
    mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
                 -> Graph   -- code in the 'then' branch
                 -> Graph   -- code in the 'else' branch 
                 -> Graph   -- resulting if-then-else construct
    

    The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.

    The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:

    type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
    

    or a bit more accurately

    type Graph = RealGraph -> State [Label] RealGraph
      -- a Graph is a monadic function from a successor RealGraph to a new RealGraph
    

    With the state monad hidden behind a layer of abstraction, it's not smelly at all!
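
For readers who want to see how such an interface could hang together, here is a small sketch built on the first, plain-function definition of `Graph`. It is not GHC's implementation: `Node`, `mkNode`, `runGraph` and the flat-list `RealGraph` are assumptions of mine, and the sequencing operator is called `<+>` so it does not clash with the Prelude's `<*>`.

```haskell
type Label = Int

-- Hypothetical concrete representation: a flat list of nodes.
data Node = LabelNode Label | Assign String | Goto Label
  deriving (Eq, Show)

type RealGraph = [Node]

-- Kept abstract from clients: takes the successor graph and a label supply,
-- returns the extended graph and the remaining supply.
newtype Graph = Graph (RealGraph -> [Label] -> (RealGraph, [Label]))

emptyGraph :: Graph
emptyGraph = Graph $ \rest ls -> (rest, ls)

mkNode :: Node -> Graph
mkNode n = Graph $ \rest ls -> (n : rest, ls)

mkLabel :: Label -> Graph
mkLabel = mkNode . LabelNode

infixr 5 <+>

-- Sequencing. The right operand is built first because it is the successor
-- of the left one, so labels are handed out right to left. That is harmless
-- here: only uniqueness matters.
(<+>) :: Graph -> Graph -> Graph
Graph f <+> Graph g = Graph $ \rest ls ->
  let (rest', ls') = g rest ls
  in f rest' ls'

-- The only function that consumes the label supply.
withFreshLabel :: (Label -> Graph) -> Graph
withFreshLabel k = Graph $ \rest ls -> case ls of
  []        -> error "withFreshLabel: label supply exhausted"
  (l : ls') -> let Graph f = k l in f rest ls'

-- Supplies the labels and extracts the concrete graph.
runGraph :: Graph -> RealGraph
runGraph (Graph f) = fst (f [] [0 ..])

-- Client code: no counter and no monad in sight.
loop :: Graph
loop = withFreshLabel $ \top ->
  mkLabel top <+> mkNode (Assign "x := x + 1") <+> mkNode (Goto top)
```

`runGraph loop` produces `[LabelNode 0,Assign "x := x + 1",Goto 0]`, and `runGraph (loop <+> loop)` gives the two loops distinct labels without the client ever threading a counter.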


    Have you looked at attribute grammars (AG)? There is more info on Wikipedia and an article in The Monad.Reader.

    With AG you can add attributes to a syntax tree. These attributes are divided into synthesized and inherited attributes.

    Synthesized attributes are things you generate (or synthesize) from your syntax tree. This could be the generated code, or all comments, or whatever else you're interested in.

    Inherited attributes are input to your syntax tree. This could be the environment, or a list of labels to use during code generation.

    At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates Haskell code (.hs files) from the provided .ag files.


    Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.

    In that case, you could manually write the sort of code that attribute grammars generate for you, for example:

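    -- Placeholder types so the sketch compiles; a real compiler would use richer ones.
    type Label    = String
    type Code     = [String]
    type Comments = [String]

    data AbstractSyntax = Literal Int | Block AbstractSyntax
                        | Comment String AbstractSyntax

    -- The label supply is an inherited attribute (passed down the tree);
    -- the (Code, Comments) pair holds the synthesized attributes (passed up).
    compile :: AbstractSyntax -> [Label] -> (Code, Comments)
    compile (Literal x) _      = (generateCode x, [])
    compile (Block ast) (l:ls) = let (code', comments) = compile ast ls
                                 in (labelCode l code', comments)
    compile (Block _) []       = error "compile: label supply exhausted"
    compile (Comment s ast) ls = let (code, comments') = compile ast ls
                                 in (code, s : comments')

    -- Placeholder implementations, for illustration only.
    generateCode :: Int -> Code
    generateCode x = ["push " ++ show x]
    labelCode :: Label -> Code -> Code
    labelCode l code = (l ++ ":") : code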
    