Iterating over arrays in Cython: is a list faster than np.array?

TL;DR: in Cython, why (or when) is iterating over a Python list faster than iterating over a NumPy array?

More generally: I've used Cython before and was able to get tremendous speedups over naive Python implementations. However, figuring out exactly what needs to be done seems non-trivial.

Consider the following three implementations of a sum() function. They reside in a Cython module called cy (obviously there's np.sum(), but that's beside the point).

Naive python:

def sum_naive(A):
    s = 0
    for a in A:
        s += a
    return s

Cython with a function that expects a python list:

def sum_list(A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s

Cython with a function that expects a NumPy array:

cimport numpy as np  # needed for the np.ndarray[...] buffer syntax

def sum_np(np.ndarray[np.int64_t, ndim=1] A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s

I would expect the running times to rank as sum_np < sum_list < sum_naive; however, the following script demonstrates the contrary (for completeness, I added np.sum()):

import numpy as np

N = 1000000
v_np = np.array(range(N))
v_list = range(N)  # Python 2: range() returns a list

%timeit cy.sum_naive(v_list)
%timeit cy.sum_naive(v_np)
%timeit cy.sum_list(v_list)
%timeit cy.sum_np(v_np)
%timeit v_np.sum()

with results:

In [18]: %timeit cy.sum_naive(v_list)
100 loops, best of 3: 18.7 ms per loop

In [19]: %timeit cy.sum_naive(v_np)
1 loops, best of 3: 389 ms per loop

In [20]: %timeit cy.sum_list(v_list)
10 loops, best of 3: 82.9 ms per loop

In [21]: %timeit cy.sum_np(v_np)
1 loops, best of 3: 1.14 s per loop

In [22]: %timeit v_np.sum()
1000 loops, best of 3: 659 us per loop
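For reference, correctness is not in question here; a pure-Python sanity check (re-declaring sum_naive so the snippet is self-contained) confirms all variants agree on the result, so the difference is purely speed:

```python
import numpy as np

def sum_naive(A):
    s = 0
    for a in A:
        s += a
    return s

N = 1000  # smaller N than the benchmark, for a quick check
v_np = np.arange(N, dtype=np.int64)
v_list = list(range(N))

# All variants agree with the closed-form sum; only the speed differs.
assert sum_naive(v_list) == sum_naive(v_np) == v_np.sum() == N * (N - 1) // 2
```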

What's going on? Why is cython+numpy slow?

PS
I do use
#cython: boundscheck=False
#cython: wraparound=False


There is a better way to implement this in Cython that, at least on my machine, beats np.sum, because it avoids the type checking and other work that NumPy normally has to do when dealing with an arbitrary array:

# cython: wraparound=False
# cython: boundscheck=False
cimport numpy as np

def sum_np(np.ndarray[np.int64_t, ndim=1] A):
    cdef unsigned long s = 0
    for a in A:
        s += a
    return s

def sum_np2(np.int64_t[::1] A):
    cdef:
        unsigned long s = 0
        size_t k

    for k in range(A.shape[0]):
        s += A[k]

    return s

And then the timings:

import numpy as np

N = 1000000
v_np = np.array(range(N))
v_list = range(N)  # Python 2: range() returns a list

%timeit sum(v_list)
%timeit sum_naive(v_list)
%timeit np.sum(v_np)
%timeit sum_np(v_np)
%timeit sum_np2(v_np)
10 loops, best of 3: 19.5 ms per loop
10 loops, best of 3: 64.9 ms per loop
1000 loops, best of 3: 1.62 ms per loop
1 loops, best of 3: 1.7 s per loop
1000 loops, best of 3: 1.42 ms per loop

You don't want to iterate over the NumPy array Python-style; rather, access elements using indexing, since that can be translated into pure C instead of relying on the Python API.
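In pure-Python terms, the fast pattern is an index-based loop over a known length. This is a sketch of the loop shape that Cython can lower to raw C buffer reads once A is typed (in plain Python it is, of course, still slow):

```python
import numpy as np

def sum_indexed(A):
    # Index-based loop: with a typed memoryview in Cython, each A[k]
    # becomes a direct C buffer read instead of a Python API call.
    s = 0
    for k in range(A.shape[0]):
        s += A[k]
    return s

v = np.arange(100, dtype=np.int64)
assert sum_indexed(v) == v.sum() == 4950
```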


The loop variable a is untyped, so there will be lots of conversions from Python to C types and back. These can be slow.
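You can see this conversion cost from pure Python: each element access on an ndarray boxes the raw C value into a fresh np.int64 object, whereas a list simply hands back the object it already stores:

```python
import numpy as np

v_list = [10**10] * 3    # a list stores ready-made Python int objects
v_np = np.array(v_list)  # an ndarray stores raw C int64 values

# The list returns the very same stored object on every access...
assert v_list[0] is v_list[0]

# ...while the ndarray must box the raw value into a new object each time.
assert v_np[0] is not v_np[0]
assert type(v_np[0]) is np.int64
```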

JoshAdel correctly pointed out that instead of iterating over the array directly, you should iterate over a range and index into the array. Cython will convert the indexing to C, which is fast.


Using cython -a myfile.pyx will highlight these sorts of things for you; you want all of your loop logic to be white (no yellow Python-interaction highlighting) for maximum speed.

PS: Note that the np.ndarray[np.int64_t, ndim=1] buffer syntax is outdated and discouraged in favour of the faster and more general typed memoryview long[:] .
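For illustration: a typed memoryview such as long[:] accepts any object exposing the buffer protocol, not just np.ndarray. That same protocol is visible from pure Python via the built-in memoryview (Cython's typed memoryviews additionally compile element access down to C):

```python
import numpy as np

a = np.arange(5, dtype=np.int64)
mv = memoryview(a)  # zero-copy view through the buffer protocol

assert mv.shape == (5,)
assert mv.contiguous  # a C-contiguous buffer would also satisfy long[::1]
assert mv.tolist() == [0, 1, 2, 3, 4]
```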
