trivial sums of outer products without temporaries in numpy

2018-06-25 22:58:14

The actual problem I wish to solve is this: given a set of N unit vectors and another set of M vectors, calculate for each of the unit vectors the average of the absolute value of its dot product with every one of the M vectors. Essentially this is calculating the outer product of the two matrices (every pairwise dot product) and then averaging, with an absolute value stuck in between.

For N and M not too large this is not hard and there are many ways to proceed (see below). The problem is that when N and M are large the temporaries created are huge and put a practical limit on these approaches. Can this calculation be done without creating temporaries? The main difficulty I have is due to the presence of the absolute value. Are there general techniques for "threading" such calculations?

As an example, consider the following code:

import numpy as np

N = 7
M = 5

# Create the unit vectors, just so we have some examples,
# this is not meant to be elegant
phi = np.random.rand(N)*2*np.pi
ctheta = np.random.rand(N)*2 - 1
stheta = np.sqrt(1-ctheta**2)
nhat = np.array([stheta*np.cos(phi), stheta*np.sin(phi), ctheta]).T

# Create the other vectors
m = np.random.rand(M,3)

# Calculate the quantity we desire, here using broadcasting.
S = np.average(np.abs(np.sum(nhat*m[:,np.newaxis,:], axis=-1)), axis=0)

This is great: S is now an array of length N and contains the desired results. Unfortunately, in the process we have created some potentially huge arrays. The result of

np.sum(nhat*m[:,np.newaxis,:], axis=-1)

is an M×N array (and the product inside it is M×N×3 before the sum). The final result, of course, is only of size N. Start increasing the sizes of N and M and we quickly run into a memory error.
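One way to keep memory bounded while staying in NumPy is to process m in blocks, so that only an N×chunk temporary exists at any one time. The following is a minimal sketch under that assumption; the function name mean_abs_dot_chunked, the chunk parameter, and the demo arrays are hypothetical and not part of the original post.

```python
import numpy as np

def mean_abs_dot_chunked(nhat, m, chunk=1024):
    """For each row of nhat, average |nhat[i] . m[j]| over all rows j of m."""
    total = np.zeros(nhat.shape[0])
    for start in range(0, m.shape[0], chunk):
        block = m[start:start + chunk]  # at most `chunk` rows of m
        # The temporary here is (N, chunk), not the full (N, M) array.
        total += np.abs(nhat @ block.T).sum(axis=1)
    return total / m.shape[0]

# Small demo data (illustrative only).
rng = np.random.default_rng(0)
nhat_demo = rng.normal(size=(7, 3))
nhat_demo /= np.linalg.norm(nhat_demo, axis=1, keepdims=True)
m_demo = rng.random((5, 3))

S_chunked = mean_abs_dot_chunked(nhat_demo, m_demo, chunk=2)
# The broadcasting expression from the question, for comparison.
S_direct = np.average(
    np.abs(np.sum(nhat_demo * m_demo[:, np.newaxis, :], axis=-1)), axis=0
)
```

The chunk size trades memory for the number of Python-level loop iterations; a few thousand rows per block is usually enough to keep the loop overhead small.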

As noted above, if the absolute value were not required then we could proceed as follows, now using einsum():

T = np.einsum('ik,jk,j', nhat, m, np.ones(M)) / M

This works, and works quickly, even for quite large N and M. For the specific problem I need to include the abs(), but a more general solution (perhaps a more general ufunc) would also be of interest.
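The reason the absolute value matters so much is that, without it, the sum over the M vectors can be pulled inside the dot product, so the whole calculation reduces to one dot product with the mean of m. A small check of that identity, with illustrative array names that are not from the original post:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 7, 5
nhat_demo = rng.normal(size=(N, 3))
m_demo = rng.random((M, 3))

# The einsum expression from the question (no absolute value).
T_einsum = np.einsum('ik,jk,j', nhat_demo, m_demo, np.ones(M)) / M

# Same quantity: average m first, then take N dot products.
T_mean = nhat_demo @ m_demo.mean(axis=0)
```

With abs() in between, the sum no longer commutes with the dot product, which is why no comparable shortcut exists for S.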

Based on some of the comments it seems that using Cython is the best way to go. I have foolishly never looked into using Cython. It turns out to be relatively easy to produce working code.

After some searching I put together the following Cython code. This is not the most general code, probably not the best way to write it, and can probably be made more efficient. Even so, it is only about 25% slower than the einsum() code in the original question, so it isn't too bad! It has been written to work explicitly with arrays created as in the original question (hence the assumed memory layouts of the input arrays). Despite the caveats it does provide a reasonably efficient solution to the original problem and can serve as a starting point in similar situations.

import numpy as np
cimport numpy as np
import cython
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef inline double d_abs (double a) : return a if a >= 0 else -a

@cython.boundscheck(False)
@cython.wraparound(False)
def process_vectors (np.ndarray[DTYPE_t, ndim=2, mode="fortran"] nhat not None,
                     np.ndarray[DTYPE_t, ndim=2, mode="c"] m not None) :
    if nhat.shape[1] != m.shape[1] :
        raise ValueError ("Arrays must contain vectors of the same dimension")
    cdef Py_ssize_t imax = nhat.shape[0]
    cdef Py_ssize_t jmax = m.shape[0]
    cdef Py_ssize_t kmax = nhat.shape[1] # same as m.shape[1]
    cdef np.ndarray[DTYPE_t, ndim=1] S = np.zeros(imax, dtype=DTYPE)
    cdef Py_ssize_t i, j, k
    cdef DTYPE_t val, tmp
    for i in range(imax) :
        val = 0
        for j in range(jmax) :
            tmp = 0
            for k in range(kmax) :
                tmp += nhat[i,k] * m[j,k]
            val += d_abs(tmp)
        S[i] = val / jmax
    return S

I don't think there is any easy way (outside of Cython and the like) to speed up your exact operation. But you may want to consider whether you really need to calculate what you are calculating. If, instead of the mean of the absolute values, you could use the root mean square, you would still be averaging magnitudes of inner products, but you could get it in a single shot as:

rms = np.sqrt(np.einsum('ij,il,kj,kl,k->i', nhat, nhat, m, m, np.ones(M)/M))

This is the same as doing:

rms_2 = np.sqrt(np.average(np.einsum('ij,kj->ik', nhat, m)**2, axis=-1))

Yes, it is not exactly what you asked for, but I am afraid it is as close as you will get with a vectorized approach. If you decide to go down this road, see how well np.einsum performs for large N and M: it has a tendency to bog down when passed too many operands and indices.
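If the root mean square is acceptable, the sum over m can also be done once up front, because the mean of (nhat_i · m_k)^2 equals nhat_i^T C nhat_i with C = m^T m / M, a small d×d matrix. The sketch below checks this against the two expressions above; names such as C, rms_small, and the demo arrays are illustrative and not from the original answer. It also shows that the RMS is an upper bound on the mean absolute value rather than a substitute that gives the same numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 7, 5
nhat_demo = rng.normal(size=(N, 3))
nhat_demo /= np.linalg.norm(nhat_demo, axis=1, keepdims=True)
m_demo = rng.random((M, 3))

# Second-moment matrix of m: shape (3, 3), independent of N and M.
C = m_demo.T @ m_demo / M
rms_small = np.sqrt(np.einsum('ij,jl,il->i', nhat_demo, C, nhat_demo))

# The two forms given in the answer, for comparison.
rms = np.sqrt(np.einsum('ij,il,kj,kl,k->i',
                        nhat_demo, nhat_demo, m_demo, m_demo, np.ones(M) / M))
rms_2 = np.sqrt(np.average(np.einsum('ij,kj->ik', nhat_demo, m_demo)**2, axis=-1))

# The quantity the question actually asks for.
mean_abs = np.average(np.abs(nhat_demo @ m_demo.T), axis=1)
```

Forming C first keeps every einsum operand small, which sidesteps the slowdown with many operands mentioned above.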

This is quite a bit slower, but doesn't create the large intermediate matrix.

# Accumulate |u . v| one pair at a time; no M-by-N temporary is created.
vals = np.zeros(N)
for i in range(N):
    u = nhat[i]
    for v in m:
        vals[i] += abs(np.dot(u, v))
    vals[i] /= M

edit: moved dividing by M outside of for loop.

edit2: new idea, keeping old one for posterity and relevant comment.

m2 = np.average(m, 0)  # mean of the M vectors
vals = np.zeros(N)
for i in range(N):
    u = nhat[i]
    vals[i] = abs(np.dot(u, m2))

This is fast, but sometimes gives different values. I am working out why, but maybe it can help in the meantime.

edit 3: Ah, it's the absolute value thing: the absolute value of the average is not the average of the absolute values. Hmm.

>>> S
array([ 0.28620962,  0.65337876,  0.37470707,  0.46500913,  0.49579837,
        0.29348924,  0.27444208,  0.74586928,  0.35789315,  0.3079964 ,
        0.298353  ,  0.42571445,  0.32535728,  0.87505053,  0.25547394,
        0.23964505,  0.44773271,  0.25235646,  0.4722281 ,  0.33003338])
>>> vals
array([ 0.2099343 ,  0.6532155 ,  0.33039334,  0.45366889,  0.48921527,
        0.20467291,  0.16585856,  0.74586928,  0.31234917,  0.22198642,
        0.21013519,  0.41422894,  0.26020981,  0.87505053,  0.1199069 ,
        0.06542492,  0.44145805,  0.08455833,  0.46824704,  0.28483342])

time to compute S: 0.000342130661011 seconds
time to compute vals: 7.29560852051e-05 seconds
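The mismatch between S and vals above appears whenever the dot products for a given unit vector have mixed signs, since terms then cancel inside the average before the absolute value is taken. A tiny hand-built case, assuming non-negative m as in the dummy data (the arrays here are illustrative, not from the original answer):

```python
import numpy as np

# A unit vector with one negative component and two non-negative m vectors.
u = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
m_demo = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Correct quantity: the two dot products are +1/sqrt(2) and -1/sqrt(2),
# so the mean of their absolute values is 1/sqrt(2).
mean_of_abs = np.mean(np.abs(m_demo @ u))

# Shortcut from edit 2: the dot products cancel before abs() is applied.
abs_of_mean = abs(np.dot(u, m_demo.mean(axis=0)))
```

Here mean_of_abs is about 0.707 while abs_of_mean is 0, so the shortcut can only underestimate the true value.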

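edit 4: Well, if most of your unit vectors have all non-negative components, this should run quicker, assuming the vectors in m are always non-negative like they are in your dummy data. In that case every dot product is non-negative, so the absolute value does nothing and the shortcut is exact.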

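m2 = np.average(m, 0)
vals = np.zeros(N)
for i in range(N):
    u = nhat[i]
    if u[0] >= 0 and u[1] >= 0 and u[2] >= 0:
        # All dot products are non-negative, so the shortcut is exact.
        vals[i] = abs(np.dot(u, m2))
    else:
        # Fall back to the explicit loop over m.
        for j in range(M):
            vals[i] += abs(np.dot(u, m[j]))
        vals[i] /= M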
