Repa performance versus lists

In the Numeric Haskell Repa Tutorial Wiki, there is a passage that reads (for context):

10.1 Fusion, and why you need it

Repa depends critically on array fusion to achieve fast code. Fusion is a fancy name for the combination of inlining and code transformations performed by GHC when it compiles your program. The fusion process merges the array filling loops defined in the Repa library, with the "worker" functions that you write in your own module. If the fusion process fails, then the resulting program will be much slower than it needs to be, often 10x slower an equivalent program using plain Haskell lists. On the other hand, provided fusion works, the resulting code will run as fast as an equivalent cleanly written C program. Making fusion work is not hard once you understand what's going on.

The part that I don't understand is this:

"If the fusion process fails, then the resulting program will be much slower than it needs to be, often 10x slower an equivalent program using plain Haskell lists."

I understand why it would run slower if stream fusion fails, but why does it run that much slower than lists?

Thanks!


Edit: This is not correct--see Don Nelson's comment (and his answer--he knows a lot more about the library than I do).

Immutable arrays cannot share components; disregarding fusion, any modification to an immutable array must reallocate the entire array. By contrast, while list operations are non-destructive, they can share parts: fi (h:t) = i:t , for example, replaces the head of a list in constant time by creating a new list in which the first cell points to the second cell of the original list. Moreover, because lists can be built incrementally, such functions as generators that build a list by repeated calls to a function can still run in O(n) time, while the equivalent function on an immutable array without fusion would need to reallocate the array with every call to the function, taking O(n^2) time.


Typically, because lists are lazy, and Repa arrays are strict.

If you fail to fuse a lazy list traversal, eg

map f . map g

you pay O(1) cost per value for leaving the intermediate (lazy) cons cell there.

If you fail to fuse the same traversal over a strict sequence, you pay at least O(n) per value for allocating a strict intermediate array.

Also, since fusion mangles your code into an unrecognizable Stream data type, to improve analysis, you can be left with code that has just too many constructors and other overheads.

链接地址: http://www.djcxy.com/p/43218.html

上一篇: 使用LLVM的大数运算(来自Haskell)

下一篇: 性能与列表的对比