How to find out GHC's memory representations of data types?

Recently, blog posts such as "Computing the Size of a Hashmap" have explained how to reason about the space complexity of commonly used container types. Now I'm facing the question of how to actually "see" which memory layout my GHC version chooses (depending on compile flags and target architecture) for weird data types (constructors) such as

data BitVec257 = BitVec257 {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Bool
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64

data BitVec514 = BitVec514 {-# UNPACK #-} !BitVec257
                           {-# UNPACK #-} !BitVec257

In C there are the sizeof operator and the offsetof macro, which let me "see" what sizes and offsets (and hence what alignment) were chosen for the fields of a C struct.
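The nearest analogue in Haskell's base library is the Storable class in Foreign.Storable, which provides sizeOf and alignment. A minimal sketch follows; note that these functions describe the representation used when a value is marshalled to foreign memory for the FFI, not the layout of a GHC heap closure, and a type like BitVec257 has no Storable instance unless you write one. The names word64Size and word64Align are purely illustrative.

```haskell
import Data.Word (Word64)
import Foreign.Storable (alignment, sizeOf)

-- Storable's sizeOf/alignment describe the marshalled (C-side)
-- representation used by the FFI, not the layout of a heap closure.
word64Size, word64Align :: Int
word64Size  = sizeOf (0 :: Word64)     -- bytes occupied when poked into foreign memory
word64Align = alignment (0 :: Word64)  -- alignment required in foreign memory
```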

I've tried looking at GHC Core in the hope of finding some hint there, but I didn't know what to look for. Can somebody point me in the right direction?


My first idea was to use this neat little function, due to Simon Marlow:

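{-# LANGUAGE MagicHash,UnboxedTuples #-}
module Size where

import GHC.Exts
import Foreign

unsafeSizeof :: a -> Int
unsafeSizeof a =
  case unpackClosure# a of  -- (# info pointer, pointer fields, non-pointer payload #)
    (# x, ptrs, nptrs #) ->
      sizeOf (undefined::Int) + -- one word for the header
        I# (sizeofByteArray# (unsafeCoerce# ptrs)
             +# sizeofByteArray# nptrs)

data BitVec257 = BitVec257 {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Bool
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64

data BitVec514 = BitVec514 {-# UNPACK #-} !BitVec257
                           {-# UNPACK #-} !BitVec257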

Using it:

Prelude> :!ghc -c Size.hs

Size.hs:15:18:
    Warning: Ignoring unusable UNPACK pragma on the
             third argument of `BitVec257'
    In the definition of data constructor `BitVec257'
    In the data type declaration for `BitVec257'
Prelude Size> unsafeSizeof $! BitVec514 (BitVec257 1 2 True 3 4) (BitVec257 1 2 True 3 4)
74

(Note that GHC is telling you that it cannot unbox Bool since it's a sum type.)

The above function claims that your data type uses 74 bytes on a 64-bit machine. I find that hard to believe. I'd expect the data type to use 11 words = 88 bytes: one word for the header plus one word per field (eight Word64s and two Bools). Even Bools take one word, as they are pointers to (statically allocated) constructors. I'm not quite sure what's going on here.
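One possible explanation, offered as an assumption rather than something verified against the runtime source: an Array# stores its element count where a ByteArray# stores its byte count, so sizeofByteArray# applied to the coerced ptrs array may return the number of pointers (2) instead of 16 bytes. The arithmetic below (names hypothetical) shows that this reading reproduces the observed 74 exactly on a 64-bit machine, while the expected layout gives 88.

```haskell
import Foreign.Storable (sizeOf)

-- Machine word size in bytes (8 on a 64-bit build).
wordBytes :: Int
wordBytes = sizeOf (undefined :: Int)

-- Expected layout: one header word, two pointer fields (the Bools that
-- could not be unpacked) and eight unpacked Word64 fields.
expectedBytes :: Int
expectedBytes = wordBytes * (1 + 2 + 8)

-- Assumed behaviour of unsafeSizeof: header word + pointer *count* (2,
-- misread as bytes) + 8 words of non-pointer payload.
suspectedReading :: Int
suspectedReading = wordBytes + 2 + 8 * wordBytes
```

With wordBytes = 8 these evaluate to 88 and 74 respectively.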

As for alignment, I believe every field is word-aligned.


Memory footprints of Haskell Data Types

(The following applies to GHC; other compilers may use different storage conventions.)

Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.

A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.

So, for example, given

data Uno a = Uno a
data Due a b = Due a b

an Uno takes 2 words, and a Due takes 3.
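The rule of thumb can be written down as a small calculator. This is a sketch of the rule as stated above, not an API; constructorWords and wordsToBytes are hypothetical names. It counts only the constructor closure itself, not whatever its fields point to, which are separate closures unless unpacked.

```haskell
-- Rule-of-thumb size, in words, of one constructor application:
-- a one-word header plus one word per field. Nullary constructors are
-- shared static closures, so an individual use costs nothing extra.
constructorWords :: Int -> Int
constructorWords 0      = 0
constructorWords fields = 1 + fields

-- Convert a word count to bytes for a given word size (4 or 8).
wordsToBytes :: Int -> Int -> Int
wordsToBytes wordSize ws = wordSize * ws
```

For example, constructorWords 1 is 2 (an Uno), constructorWords 2 is 3 (a Due), and wordsToBytes 8 3 is 24 bytes for a Due on a 64-bit machine.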

Also, I believe it is possible to write a Haskell function that performs the same task as sizeof or offsetof.
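A sketch of the sizeof half, assuming GHC 8.6 or later with the ghc-heap package that ships with it: GHC.Exts.Heap.getClosureData decodes a closure, and for a constructor it reports the pointer fields (ptrArgs) and the non-pointer words (dataArgs) separately. Assumptions worth stating loudly: it measures only the outermost closure, it assumes a non-profiling build (one-word header), the value must already be evaluated (otherwise you will see a thunk or indirection instead), and whether the UNPACK pragmas take effect depends on optimisation flags, so compile with -O. It does not give field offsets. The helper constructorBytes is a hypothetical name.

```haskell
import Control.Exception (evaluate)
import Data.Word (Word64)
import Foreign.Storable (sizeOf)
import GHC.Exts.Heap (GenClosure (..), getClosureData)

data BitVec257 = BitVec257 {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64
                           !Bool  -- a sum type, so it cannot be unpacked
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64

data BitVec514 = BitVec514 {-# UNPACK #-} !BitVec257
                           {-# UNPACK #-} !BitVec257

-- Bytes used by the outermost constructor closure only
-- (header + pointer fields + non-pointer words); Nothing for other closures.
constructorBytes :: a -> IO (Maybe Int)
constructorBytes x = do
  c <- getClosureData x
  pure $ case c of
    ConstrClosure {ptrArgs = ps, dataArgs = ds} ->
      Just (wordSize * (1 + length ps + length ds))
    _ -> Nothing
  where
    wordSize = sizeOf (undefined :: Word)
```

Usage: bind v <- evaluate (BitVec514 (BitVec257 1 2 True 3 4) (BitVec257 1 2 True 3 4)) and then call constructorBytes v; with the fields unpacked, the result should agree with the 11-word estimate above.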
