Advantages of strict fields in data types

This may now be a bit fuzzy, but I've been wondering that for a while. To my knowledge with ! , one can make sure a parameter for a data constructor is being evaluated before the value is constructed:

data Foo = Bar !Int !Float

I have often thought that laziness is a great thing. Now, when I go through sources, I see strict fields more often than the ! -less variant.

What is the advantage of this and why shouldn't I leave it lazy as it is?


Unless you're storing a large computation in the Int and Float fields, significant overhead can build up from lots of trivial computations building up in thunks. For instance, if you repeatedly add 1 to a lazy Float field in a data type, it will use up more and more memory until you actually force the field, calculating it.

Often, you want to store to expensive computation in a field. But if you know you won't be doing anything like that ahead of time, you can mark the field strict, and avoid having to manually add seq everywhere to get the efficiency you desire.

As an additional bonus, when given the flag -funbox-strict-fields GHC will unpack strict fields1 of data types directly into the data type itself, which is possible since it knows they will always be evaluated, and thus no thunk has to be allocated; in this case, a Bar value would contain the machine words comprising the Int and Float directly inside the Bar value in memory, rather than containing two pointers to thunks which contain the data.

Laziness is a very useful thing, but some of the time, it just gets in the way and impedes computation, especially for small fields that are always looked at (and thus forced), or that are modified often but never with very expensive computations. Strict fields help overcome these issues without having to modify all uses of the data type.

Whether it's more common than lazy fields or not depends on the type of code you're reading; you aren't likely to see any functional tree structures use strict fields extensively, for instance, because they benefit greatly from laziness.

Let's say you have an AST with a constructor for infix operations:

data Exp = Infix Op Exp Exp
         | ...

data Op = Add | Subtract | Multiply | Divide

You wouldn't want to make the Exp fields strict, as applying a policy like that would mean that the entire AST is evaluated whenever you look at the top-level node, which is clearly not what you want to benefit from laziness. However, the Op field is never going to contain an expensive computation that you want to defer to a later date, and the overhead of a thunk per infix operator might get expensive if you have really deeply-nested parse trees. So for the infix constructor, you'd want to make the Op field strict, but leave the two Exp fields lazy.

1 Only single-constructor types can be unpacked.


In addition to the information provided by other answers, keep in mind:

To my knowledge with ! , one can make sure a parameter for a data constructor is being evaluated before the value is constructed

It's interesting to look at how deep the parameter is evaluated - it's like with seq and $! evaluated to WHNF.

Given the datatypes

data Foo = IntFoo !Int | FooFoo !Foo | BarFoo !Bar
data Bar = IntBar Int

the expression

let x' = IntFoo $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value IntFoo 6 (== fully evaluated, == NF).
Additionally this expression

let x' = FooFoo $ IntFoo $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value FooFoo (IntFoo 6) (== fully evaluated, == NF).
However, this expression

let x' = BarFoo $ IntBar $ 1 + 2 + 3
in  x'

evaluated to WHNF produces value BarFoo (IntBar (1 + 2 + 3)) (!= fully evaluated, != NF).

Main point: The strictness of the !Bar parameter won't necessarily help if the data constructors of Bar don't contain strict parameters themselves.


There is an overhead associated with lazyness — the compiler has to create a thunk for the value to store the computation until the result is needed. If you know that you'll always need the result sooner or later, then it can make sense to force the evaluation of the result.

链接地址: http://www.djcxy.com/p/42966.html

上一篇: seq功能和严格性

下一篇: 数据类型中严格字段的​​优点