minimizing a multivariate, differentiable function using scipy.optimize

I'm trying to minimize the following function with scipy.optimize :

whose gradient is this:

在这里输入图像描述

(for those who are interested, this is the likelihood function of a Bradley-Terry-Luce model for pairwise comparisons. Very closely linked to logistic regression.)

It is fairly clear that adding a constant to all the parameters does not change the value of the function. Hence, I let theta_1 = 0. Here are the implementation the objective functions and the gradient in python (theta becomes x here):

def objective(x):
    x = np.insert(x, 0, 0.0)
    tiles = np.tile(x, (len(x), 1))
    combs = tiles.T - tiles
    exps = np.dstack((zeros, combs))
    return np.sum(cijs * scipy.misc.logsumexp(exps, axis=2))

def gradient(x):
    zeros = np.zeros(cijs.shape)
    x = np.insert(x, 0, 0.0)
    tiles = np.tile(x, (len(x), 1))
    combs = tiles - tiles.T
    one = 1.0 / (np.exp(combs) + 1)
    two = 1.0 / (np.exp(combs.T) + 1)
    mat = (cijs * one) + (cijs.T * two)
    grad = np.sum(mat, axis=0)
    return grad[1:]  # Don't return the first element

Here's an example of what cijs might look like:

[[ 0  5  1  4  6]
 [ 4  0  2  2  0]
 [ 6  4  0  9  3]
 [ 6  8  3  0  5]
 [10  7 11  4  0]]

This is the code I run to perform the minimization:

x0 = numpy.random.random(nb_items - 1)
# Let's try one algorithm...
xopt1 = scipy.optimize.fmin_bfgs(objective, x0, fprime=gradient, disp=True)
# And another one...
xopt2 = scipy.optimize.fmin_cg(objective, x0, fprime=gradient, disp=True)

However, it always fails in the first iteration:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 73.290610
         Iterations: 0
         Function evaluations: 38
         Gradient evaluations: 27

I can't figure out why it fails. The error gets displayed because of this line: https://github.com/scipy/scipy/blob/master/scipy/optimize/optimize.py#L853

So this "Wolfe line search" does not seem to succeed, but I have no idea how to proceed from here... Any help is appreciated!


As @pv. pointed out as a comment, I made a mistake in computing the gradient. First of all, the correct (mathematical) expression for the gradient of my objective function is:

在这里输入图像描述

(notice the minus sign.) Furthermore, my Python implementation was completely wrong, beyond the sign mistake. Here's my updated gradient:

def gradient(x):
    nb_comparisons = cijs + cijs.T
    x = np.insert(x, 0, 0.0)
    tiles = np.tile(x, (len(x), 1))
    combs = tiles - tiles.T
    probs = 1.0 / (np.exp(combs) + 1)
    mat = (nb_comparisons * probs) - cijs
    grad = np.sum(mat, axis=1)
    return grad[1:]  # Don't return the first element.

To debug it , I used:

  • scipy.optimize.check_grad : showed that my gradient function was producing results very far away from an approximated (finite difference) gradient.
  • scipy.optimize.approx_fprime to get an idea of the values should look like.
  • a few hand-picked simple examples that could be analyzed by hand if needed, and a few Wolfram Alpha queries for sanity-checking.

  • It seems you can transform it into a (non-linear) least-square problem. In this way you have to define intervals for each of the n variables and the number of sample points for each variable in order to build the coefficients' matrix.

    In this example I am using the same number of points and the same interval for all the variables:

    from scipy.optimize import leastsq
    from numpy import exp, linspace, zeros, ones
    
    n = 4
    npts = 1000
    xs = [linspace(0, 1, npts) for _ in range(n)]
    
    c = ones(n**2)
    
    a = zeros((n*npts, n**2))
    def residual(c):
        a.fill(0)
        for i in range(n):
            for j in range(n):
                for k in range(npts):
                    a[i+k*n, i*n+j] = 1/(exp(xs[i][k] - xs[j][k]) + 1)
                    a[i+k*n, j*n+i] = 1/(exp(xs[j][k] - xs[i][k]) + 1)
    
        return a.dot(c)
    
    popt, pconv = leastsq(residual, x0=c)
    print(popt.reshape(n, n))
    #[[ -1.24886411   1.07854552  -2.67212118   1.86334625]
    # [ -7.43330057   2.0935734   37.85989442   1.37005925]
    # [ -3.51761322 -37.49627917  24.90538136  -4.23103535]
    # [ 11.93000731   2.52750715 -14.84822686   1.38834225]]
    

    EDIT: more details about the coefficients matrix built above:

    链接地址: http://www.djcxy.com/p/20020.html

    上一篇: 图片上传裁剪调整大小保存到文件夹

    下一篇: 使用scipy.optimize最小化多变量,可微函数