cython memoryview slower than expected

I've started using typed memoryviews in Cython to access NumPy arrays. According to the Cython documentation, one of their advantages is that they are considerably faster than the old NumPy buffer support: http://docs.cython.org/src/userguide/memoryviews.html#comparison-to-the-old-buffer-support

However, I have an example where the old NumPy buffer support is faster than memoryviews. How can this be? Am I using memoryviews correctly?

This is my test:

import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[np.uint8_t, ndim=2] image_box1(np.ndarray[np.uint8_t, ndim=2] im, 
                                               np.ndarray[np.float64_t, ndim=1] pd,  
                                               int box_half_size):
    cdef unsigned int p0 = <int>(pd[0] + 0.5)  
    cdef unsigned int p1 = <int>(pd[1] + 0.5)    
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size    
    cdef np.ndarray[np.uint8_t, ndim=2] box = im[top:bottom, left:right] 
    return box 

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.uint8_t[:, ::1] image_box2(np.uint8_t[:, ::1] im, 
                                    np.float64_t[:] pd,  
                                    int box_half_size):

    cdef unsigned int p0 = <int>(pd[0] + 0.5)  
    cdef unsigned int p1 = <int>(pd[1] + 0.5)    
    cdef unsigned int top = p1 - box_half_size
    cdef unsigned int left = p0 - box_half_size
    cdef unsigned int bottom = p1 + box_half_size
    cdef unsigned int right = p0 + box_half_size     
    cdef np.uint8_t[:, ::1] box = im[top:bottom, left:right]   
    return box 
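Separately from the speed question, note that `top`, `left`, `bottom` and `right` are declared `unsigned int` in both functions. If the rounded point lies closer than `box_half_size` to the top or left edge, `p1 - box_half_size` is evaluated in unsigned arithmetic and wraps around to a huge value instead of going negative, so the slice typically comes back empty rather than clipped. The plain-Python sketch below emulates that, assuming a 32-bit C `unsigned int`; `to_uint` and `box_bounds` are hypothetical helpers, not part of Cython or NumPy.

```python
UINT_BITS = 32  # assumption: C unsigned int is 32 bits on the target platform


def to_uint(value):
    """Emulate converting an integer to a 32-bit unsigned int (wraps modulo 2**32)."""
    return value % (1 << UINT_BITS)


def box_bounds(px, py, box_half_size):
    """Hypothetical helper mimicking the bounds computed in image_box1/image_box2."""
    # <int>(x + 0.5) truncates toward zero, so it only rounds correctly for x >= 0
    p0 = to_uint(int(px + 0.5))
    p1 = to_uint(int(py + 0.5))
    top = to_uint(p1 - box_half_size)
    left = to_uint(p0 - box_half_size)
    bottom = to_uint(p1 + box_half_size)
    right = to_uint(p0 + box_half_size)
    return top, bottom, left, right


# A 10x10 "image" stored as a list of rows
image = [[r * 10 + c for c in range(10)] for r in range(10)]

top, bottom, left, right = box_bounds(5.2, 5.7, 2)
print(top, bottom, left, right)  # 4 8 3 7
box = [row[left:right] for row in image[top:bottom]]
print(len(box), len(box[0]))  # 4 4

top, bottom, left, right = box_bounds(1.0, 1.0, 3)
print(top)                # 4294967294: 1 - 3 wrapped around
print(image[top:bottom])  # []: the box is silently empty
```

Declaring the bounds as a signed type such as `Py_ssize_t` would avoid the wraparound, but a negative start would then need explicit clipping, since NumPy interprets negative slice bounds as counting from the end.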

The timing results are:

image_box1: typed numpy: 100000 loops, best of 3: 11.2 us per loop

image_box2: memoryview: 100000 loops, best of 3: 18.1 us per loop

These measurements were made in IPython using %timeit image_box1(im, pd, box_half_size) and the equivalent call for image_box2.


Alright, I found the problem! As seberg pointed out, the memoryview version only appeared slower because the measurement included the automatic conversion of the NumPy arrays to memoryviews on every call.

I used the following function, defined inside the Cython module, to measure the times:

def test(params):
    import timeit
    im = params[0]
    pd = params[1]
    box_half_size = params[2]
    # typed numpy version: im and pd are passed as ndarrays
    t1 = timeit.Timer(lambda: image_box1(im, pd, box_half_size))
    print('image_box1: typed numpy:')
    print(min(t1.repeat(3, 10)))
    # convert the arrays to memoryviews once, outside the timed call
    cdef np.uint8_t[:, ::1] im2 = im
    cdef np.float64_t[:] pd2 = pd
    t2 = timeit.Timer(lambda: image_box2(im2, pd2, box_half_size))
    print('image_box2: memoryview:')
    print(min(t2.repeat(3, 10)))

Result (best of 3 runs, in seconds per 10 calls):

image_box1: typed numpy: 9.07607864065e-05

image_box2: memoryview: 5.81799904467e-05
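Since each figure covers 10 calls (`t.repeat(3, 10)`), dividing by 10 gives a per-call time that can be compared directly with the `%timeit` results. A small conversion helper (`per_call_us` is a hypothetical name) applied to the two numbers above:

```python
def per_call_us(total_seconds, number):
    """Convert a timeit total (seconds for `number` calls) to microseconds per call."""
    return total_seconds / number * 1e6


# Values printed by test(), each measured with number=10
print(round(per_call_us(9.07607864065e-05, 10), 2))  # 9.08 (image_box1, typed numpy)
print(round(per_call_us(5.81799904467e-05, 10), 2))  # 5.82 (image_box2, pre-converted memoryviews)
```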

So memoryviews are indeed faster!

Note that I converted im and pd to memoryviews before timing image_box2. If I skip that step and pass im and pd directly, image_box2 is slower:

image_box1: typed numpy: 9.12262257771e-05

image_box2: memoryview: 0.000185245087778
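The general pitfall, a conversion step hidden inside the timed call, can be reproduced without Cython. The sketch below is only an analogy using Python's built-in `memoryview`, not Cython's actual conversion path: the hypothetical `sum_first` wraps its input in a `memoryview` on every call unless it already receives one, so timing it with a raw `bytearray` also times that wrapping, just as timing `image_box2(im, pd, ...)` also timed the ndarray-to-memoryview conversion.

```python
import timeit


def sum_first(buf, n):
    """Sum the first n bytes; wraps buf in a memoryview if it is not one already."""
    view = buf if isinstance(buf, memoryview) else memoryview(buf)  # per-call conversion
    total = 0
    for i in range(n):
        total += view[i]
    return total


data = bytearray(range(256))
view = memoryview(data)  # convert once, outside the timed call

# Same result either way; only the work inside the timed call differs
assert sum_first(data, 16) == sum_first(view, 16) == sum(range(16))

t_raw = min(timeit.repeat(lambda: sum_first(data, 16), repeat=3, number=10000))
t_pre = min(timeit.repeat(lambda: sum_first(view, 16), repeat=3, number=10000))
print("converting on every call:", t_raw)
print("pre-converted view:      ", t_pre)
```

The absolute numbers depend on the machine; the point is that only the second measurement isolates the work done on an already-converted view.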
