Load pure global variable from file

I have a file with some data in it. This data never changes and I want to make it available outside of the IO monad. How can I do that?

Example (note that this is just an example; my real data is not computable):

primes.txt:

2 3 5 7 13

code.hs:

import System.IO.Unsafe (unsafePerformIO)

primes :: [Int]
primes = map read . words . unsafePerformIO . readFile $ "primes.txt"

Is this a "legal" use of unsafePerformIO? Are there alternatives?


You could use Template Haskell to read the file at compile time. Its contents would then be embedded in the program as an ordinary string literal.

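In one module (Text/Literal/TH.hs in this example), define the following. It has to be a separate module, because GHC's stage restriction only lets a splice run functions imported from other modules: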

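module Text.Literal.TH (lit, litFile) where

import Language.Haskell.TH
import Language.Haskell.TH.Quote

-- Turn a String into a string-literal expression.
literally :: String -> Q Exp
literally = return . LitE . StringL

-- [lit|...|] inserts its body verbatim as a String literal.
-- It only makes sense in expression position, so the other fields fail.
lit :: QuasiQuoter
lit = QuasiQuoter
  { quoteExp  = literally
  , quotePat  = const (fail "lit: not usable in patterns")
  , quoteType = const (fail "lit: not usable in types")
  , quoteDec  = const (fail "lit: not usable in declarations")
  }

-- [litFile|path|] reads the file at compile time and inserts its contents.
litFile :: QuasiQuoter
litFile = quoteFile lit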

In your module, you can then do:

{-# LANGUAGE QuasiQuotes #-}
module MyModule where

import Text.Literal.TH (litFile)

primes :: [Int]
primes = map read . words $ [litFile|primes.txt|]

When you compile your program, GHC opens primes.txt (relative to the directory GHC is run from) and inserts its contents as a string literal in place of [litFile|primes.txt|].


Using unsafePerformIO in that way isn't great.

The declaration primes :: [Int] says that primes is a list of numbers: one particular list of numbers that doesn't depend on anything.

In fact, however, it depends on the state of the file "primes.txt" at the moment the definition happens to be evaluated. Someone could edit this file to change the value that primes appears to have, which shouldn't be possible according to its type.

In the presence of a hypothetical optimisation which decides that primes should be recomputed on demand rather than stored in memory in full (after all, its type says we'll get the same thing every time we recompute it), primes could even appear to have two different values during a single run of the program. This is the sort of problem that can come with using unsafePerformIO to lie to the compiler.

In practice, none of this is likely to be a problem.

But the theoretically correct approach is not to make primes a global constant at all (because it isn't constant). Instead, make the computation that needs it parameterised on it (i.e. take primes as an argument), and have the outer IO program read the file and pass the resulting pure value to that computation. You get the best of both worlds: you don't have to lie to the compiler, and you don't have to put your entire program in IO. If passing primes around by hand everywhere gets tedious, constructs such as the Reader monad can do it for you.
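A minimal sketch of that refactoring, assuming the file holds whitespace-separated integers in ascending order as in the question. The names parsePrimes, largestGap, countBelow, summary and loadSummary are hypothetical stand-ins for whatever pure code needs the data, and the Reader part assumes the mtl package (which ships with GHC) for Control.Monad.Reader.

```haskell
import Control.Monad.Reader (Reader, asks, runReader)

-- Pure parser for the file's text (assumes whitespace-separated Ints).
parsePrimes :: String -> [Int]
parsePrimes = map read . words

-- Hypothetical pure computation: it takes the data as an ordinary
-- argument instead of reaching for a global.
largestGap :: [Int] -> Int
largestGap ps = maximum (0 : zipWith (-) (drop 1 ps) ps)

-- Reader style: the list is passed implicitly and fetched with asks,
-- so intermediate functions don't have to thread it by hand.
-- Assumes the list is sorted in ascending order.
countBelow :: Int -> Reader [Int] Int
countBelow n = asks (length . takeWhile (< n))

summary :: Reader [Int] (Int, Int)
summary = do
  gap   <- asks largestGap
  small <- countBelow 10
  pure (gap, small)

-- The only impure step: read the file once in IO, then hand the
-- resulting pure value to the pure code.
loadSummary :: FilePath -> IO (Int, Int)
loadSummary path = runReader summary . parsePrimes <$> readFile path
```

With the example primes.txt, runReader summary [2, 3, 5, 7, 13] evaluates to (6, 4). Only loadSummary lives in IO; everything it calls can be tested with plain lists.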

So you can use unsafePerformIO if you want to just get on with it. It's theoretically wrong, but unlikely to cause issues in practice.

Or you can refactor your program to reflect what's really going on.

Or, if primes really is a global constant and you just don't want to literally include a huge chunk of data in your program source, you can use Template Haskell, as demonstrated by dflemstr.


Yes, it should be fine. You could add a {-# NOINLINE primes #-} pragma to be safe; I'm not sure whether GHC would ever inline a CAF.
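Applied to the question's code, that precaution might look like the sketch below. The pragma only controls inlining: the value still depends on what primes.txt contains when primes is first forced, and the path is resolved relative to the program's working directory at run time.

```haskell
import System.IO.Unsafe (unsafePerformIO)

-- Precaution: ask GHC never to inline primes, so every use refers to
-- the one shared top-level value rather than a copy of the
-- unsafePerformIO call.
{-# NOINLINE primes #-}
primes :: [Int]
primes = map read . words . unsafePerformIO . readFile $ "primes.txt"
```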

The only alternative I can think of is to do the same thing at compile time (using Template Haskell), essentially embedding the primes into the binary. However, I prefer your version. Note that the primes list will actually be read and created lazily!
