Big number arithmetic using LLVM (from Haskell)

2018-06-15 05:23:20

An answer to my previous question indicated that Haskell represents plusWord2# as llvm.uadd.with.overflow . I'm looking to do long addition with carry, like how the x86 ADC instruction works. This instruction not only adds it's two arguments, but also adds the contents of the carry bit.

One can then add long numbers like the following:

ADD x1 y1
ADC x2 y2
ADC x3 y3
...

Resulting in one instruction per word (disregarding any surrounding moves etc required).

I looked at the GMP library, and how it did long addition itself, in its generic C code. Here's an excerpt from mpn/generic/add_n.c

sl = ul + vl;
cy1 = sl < ul;
rl = sl + cy;
cy2 = rl < sl;
cy = cy1 | cy2;

Note it saves the carry bits from both the original addition and the addition of the carry bit. Only one of these operations can carry, so or'ing the carries afterwards is sufficient.

Obviously GMP has specific assembly code for particular platforms, but I thought the generic code would be a good basis, as it would presumably be written to compile to reasonable code. The plusWord2# primitive operation means I don't need to do silly comparisons to get the carry bit, so I implemented the GMP generic code like the following in Haskell:

addWithCarry :: Word# -> Word# -> Word# -> (# Word#, Word# #)
addWithCarry x y c =
  let 
    (# c1, r1 #) = plusWord2# x y
    (# c2, r2 #) = plusWord2# r1 c
  in
    (# or# c1 c2, r2 #)

Unfortunately this results in x86 code that saves the carry bit into a register before then adding the carry bit on it's own, as well as adding the two numbers, resulting in three or four instructions per word instead of one.

So what I'm wondering is how I can combine llvm.uadd.with.overflow to create a chain of ADC instructions on x86 to implement multi-precision arithmetic. If I got LLVM code that produced efficient long addition I was hoping I could then convert it back into Haskell primitive ops to produce the efficient addition directly from Haskell code.

Notes:

I could of course use Haskell's FFI to call inline assembly or GMP, but that would stop inlining and I suspect be relatively slow as compared to inlined code for small (ie <=256 bit) operands.

I've found that 'clang' has __builtin_addc , a form of three argument addition that takes not only two numbers but a carry bit, but GHC doesn't compile via clang, so I don't see how this is useful.

The correct thing to do here is to make sure that your Haskell frontend represents your carrying arithmetic in LLVM's IR using the same patterns formed by Clang when using its __builtin_addc.

However, don't expect this to produce good code with LLVM today. See http://llvm.org/PR20748 which documents the absolutely horrible code we currently generate on x86 for trivial patterns like this. But once that bug is fixed, you should get the generated code you desire.

链接地址: http://www.djcxy.com/p/43220.html

上一篇: 定制Cabal库（我认为？）

下一篇: 使用LLVM的大数运算（来自Haskell）