Why are Haskell/GHC executables so large in filesize?

Possible Duplicate:
Small Haskell program compiled with GHC into huge binary

Recently I noticed how large Haskell executables are. Everything below was compiled on GHC 7.4.1 with -O2 on Linux.

  • Hello World ( main = putStrLn "Hello World!" ) is over 800 KiB. Running strip over it reduces the filesize to 500 KiB; even adding -dynamic to the compilation doesn't help much, leaving me with a stripped executable around 400 KiB.
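
    For reference, the commands were roughly the following (hello.hs is just a placeholder name for the one-liner; the sizes are the ones quoted above and will of course vary between systems):

    $ ghc -O2 hello.hs                          # statically linked: > 800 KiB
    $ strip hello                               # symbols removed:  ~ 500 KiB
    $ ghc -O2 -dynamic -fforce-recomp hello.hs  # link against shared libraries
    $ strip hello                               # still roughly 400 KiB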

  • Compiling a very primitive example involving Parsec yields a 1.7 MiB file.

    -- File: test.hs
    import qualified Text.ParserCombinators.Parsec as P
    import Data.Either (either)
    
    -- Parses a string of type "x y" to the tuple (x,y).
    testParser :: P.Parser (Char, Char)
    testParser = do
        a <- P.anyChar
        P.char ' '
        b <- P.anyChar
        return (a, b)
    
    -- Parse, print result.
    str = "1 2"
    main = print $ either (error . show) id . P.parse testParser "" $ str
    -- Output: ('1','2')
    

    Parsec may be a larger library, but I'm only using a tiny subset of it, and indeed the optimized core code generated by the above is dramatically smaller than the executable:

    $ ghc -O2 -ddump-simpl -fforce-recomp test.hs | wc -c
    49190 (bytes)
    

    Therefore, it's not the case that a huge amount of Parsec is actually found in the program, which was my initial assumption.
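
    A crude cross-check (beyond the core dump) is to look at the unstripped binary itself with standard binutils; GHC encodes module names into symbol names, so something like the following gives a rough idea of how much Parsec-related code is actually present and where the bulk of the binary lives:

    $ nm test | grep -c Parsec   # rough count of Parsec-related symbols (unstripped binary)
    $ size test                  # section sizes: text, data, bss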

  • Why are the executables of such an enormous size? Is there something I can do about it (except dynamic linking)?


    To effectively reduce the size of executables produced by the Glasgow Haskell Compiler, you should focus on:

  • using dynamic linking: pass the -dynamic option to ghc so that library code is provided by shared (dynamic) libraries instead of being bundled into the final executable. Shared versions of the GHC libraries must be installed on the system (see the command sketch after this list).
  • stripping debugging information from the final executable (for example with the strip tool from GNU binutils)
  • removing imports of unused modules (don't expect much gain from this when linking dynamically)

    With these steps, the simple Hello World example ends up at about 9 KiB and the Parsec test at about 28 KiB (both 64-bit Linux executables), which I find quite small and acceptable for such a high-level language implementation.
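
    A minimal sketch of these steps, assuming test.hs from the question and that your distribution ships the shared GHC libraries:

    $ ghc -O2 -dynamic test.hs   # needs the shared (dynamic) GHC libraries installed
    $ strip test                 # drop symbol and debug information
    $ ls -l test                 # on my machine this ends up around 28 KiB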


    My understanding is that if you use a single function from package X, the entire package gets statically linked in. I don't think GHC actually links function-by-function. (Unless you use the "split objects" hack, which "tends to freak the linker out".)

    But if you're linking dynamically, that ought to fix this. So I'm not sure what to suggest here...
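
    One sanity check that might help: with a -dynamic build, ldd shows which shared libraries the executable actually depends on, so you can verify that the Haskell libraries really are being resolved dynamically rather than baked in:

    $ ghc -O2 -dynamic test.hs
    $ ldd test                   # should list the shared GHC/Haskell libraries (libHS*.so)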

    (I'm pretty sure I saw a blog post when dynamic linking first came out, demonstrating Hello World compiled to a 2KB binary. Obviously I cannot find this blog post now... grr.)

    Consider also cross-module optimisation. If you're writing a Parsec parser, it's likely that GHC will inline all the parser definitions and simplify them down to the most efficient code. And, sure enough, your few lines of Haskell have produced 50KB of Core. Should that get 37x bigger when compiling to machine-code? I don't know. You could perhaps try looking at the STG and Cmm code produced in the next steps. (Sorry, I don't recall the compiler flags off the top of my head...)
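
    (For what it's worth, I believe the flags are -ddump-stg and -ddump-cmm, used the same way as -ddump-simpl above:)

    $ ghc -O2 -ddump-stg -fforce-recomp test.hs | wc -c
    $ ghc -O2 -ddump-cmm -fforce-recomp test.hs | wc -c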
