Cython either marginally faster or slower than pure Python

I am using several techniques ( NumPy , Weave and Cython ) to perform a Python performance benchmark. What the code basically does mathematically is C = AB , where A, B and C are N x N matrices ( NOTE: this is a matrix product and not an element-wise multiplication).

I have written 5 distinct implementations of the code:

  • Pure python (Loop over 2D Python lists)
  • NumPy (Dot product of 2D NumPy arrays)
  • Weave inline (C++ loop over 2D arrays)
  • Cython (Loop over 2D Python lists + static typing)
  • Cython-Numpy (Loop over 2D NumPy arrays + static typing)
  • My expectation is that implementations 2 through 5 will be substantially faster than implementation 1. My results however indicate otherwise. These are my normalised speed-up results relative to the pure Python implementation:

  • python_list: 1.00
  • numpy_array: 330.09
  • weave_inline: 30.72
  • cython_list: 2.80
  • cython_array: 0.14
  • I am quite happy with the performance of NumPy, however I am less enthusiastic about Weave's performance and Cython's performance makes me cry. My entire code is separated over two files. Everything is automated and you simply need to run the first file to see all results. Could someone please aid me by indicating what I could do to obtain better results?

    matmul.py:

    import time
    
    import numpy as np
    from scipy import weave
    from scipy.weave import converters
    
    import pyximport
    pyximport.install()
    import cython_matmul as cml
    
    
    def python_list_matmul(A, B):
        C = np.zeros(A.shape, dtype=float).tolist()
        A = A.tolist()
        B = B.tolist()
        for k in xrange(len(A)):
            for i in xrange(len(A)):
                for j in xrange(len(A)):
                    C[i][k] += A[i][j] * B[j][k]
        return C
    
    
    def numpy_array_matmul(A, B):
        return np.dot(A, B)
    
    
    def weave_inline_matmul(A, B):
        code = """
           int i, j, k;
           for (k = 0; k < N; ++k)
           {
               for (i = 0; i < N; ++i)
               {
                   for (j = 0; j < N; ++j)
                   {
                       C(i, k) += A(i, j) * B(j, k);
                   }
               }
           }
           """
    
        C = np.zeros(A.shape, dtype=float)
        weave.inline(code, ['A', 'B', 'C', 'N'], type_converters=converters.blitz, compiler='gcc')
        return C
    
    
    N = 100
    A = np.random.rand(N, N)
    B = np.random.rand(N, N)
    
    function = []
    function.append([python_list_matmul, 'python_list'])
    function.append([numpy_array_matmul, 'numpy_array'])
    function.append([weave_inline_matmul, 'weave_inline'])
    function.append([cml.cython_list_matmul, 'cython_list'])
    function.append([cml.cython_array_matmul, 'cython_array'])
    
    t = []
    for i in xrange(len(function)):
        t1 = time.time()
        C = function[i][0](A, B)
        t2 = time.time()
        t.append(t2 - t1)
        print function[i][1] + ' t: ' + '{:10.6f}'.format(t[0] / t[-1])
    

    cython_matmul.pyx:

    import numpy as np
    cimport numpy as np
    
    import cython
    cimport cython
    
    DTYPE = np.float
    ctypedef np.float_t DTYPE_t
    
    
    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.nonecheck(False)
    cpdef cython_list_matmul(A, B):
    
        cdef int i, j, k
        cdef int N = len(A)
    
        A = A.tolist()
        B = B.tolist()
        C = np.zeros([N, N]).tolist()
    
        for k in xrange(N):
            for i in xrange(N):
                for j in xrange(N):
                    C[i][k] += A[i][j] * B[j][k]
        return C
    
    
    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.nonecheck(False)
    cpdef cython_array_matmul(np.ndarray[DTYPE_t, ndim=2] A, np.ndarray[DTYPE_t, ndim=2] B):
    
        cdef int i, j, k, N = A.shape[0]
        cdef np.ndarray[DTYPE_t, ndim=2] C = np.zeros([N, N], dtype=DTYPE)
    
        for k in xrange(N):
            for i in xrange(N):
                for j in xrange(N):
                    C[i][k] += A[i][j] * B[j][k]
        return C
    

    Python lists and high performance math are incompatible, forget about cython_list_matmul .

    The only problem with your cython_array_matmul is incorrect usage of indexing. It should be

    C[i,k] += A[i,j] * B[j,k]
    

    That's how numpy arrays are indexed in Python and that's the syntax Cython optimizes. With this change you should get decent performance.

    Cython's annotation feature is really helpful in spotting optimization problems like this one. You could notice that A[i][j] produces a ton of Python API calls, while A[i,j] produces none.

    Also, if you initialize all entries by hand, np.empty is more appropriate than np.zeros .

    链接地址: http://www.djcxy.com/p/62926.html

    上一篇: 迭代cython中的数组,列表比np.array更快吗?

    下一篇: Cython比纯Python要稍快或慢一些