Sort two-sided items based on the similarity of consecutive items

I'm looking for some kind of "Domino sort" algorithm that sorts a list of two-sided items based on the similarity of "tangent" sides of subsequent items.

Suppose the following list where items are represented by 2-tuples:

>>> items
[(0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.32, 0.52),
 (0.82, 0.43),
 (0.94, 0.64),
 (0.39, 0.95),
 (0.01, 0.72),
 (0.49, 0.41),
 (0.27, 0.60)]

The goal is to sort that list such that the sum of squared differences of the tangent ends of each two subsequent items (the loss) is minimal:

>>> loss = sum(
...     (items[i][1] - items[i+1][0])**2
...     for i in range(len(items)-1)
... )

For the above example, this can be computed by just working through all possible permutations, but for lists with more items this quickly becomes infeasible (O(n!)).
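For the ten items above, the brute-force check is still doable. A minimal sketch of it (the function name brute_force_sort is just illustrative; it reuses the loss definition above):

# Brute-force baseline: try every permutation and keep the cheapest, O(n!),
# only workable for roughly ten items.
from itertools import permutations

def brute_force_sort(items):
    def loss(seq):
        return sum((seq[i][1] - seq[i + 1][0]) ** 2 for i in range(len(seq) - 1))
    return list(min(permutations(items), key=loss))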

The approach of selecting the best match step by step, as sketched here,

def compute_loss(items):
    return sum((items[i][1] - items[i+1][0])**2 for i in range(len(items)-1))


def domino_sort(items):
    best_attempt = items
    best_score = compute_loss(best_attempt)
    for i in range(len(items)):
        copy = [x for x in items]
        attempt = [copy.pop(i)]
        for j in range(len(copy)):
            copy = sorted(copy, key=lambda x: abs(x[0] - attempt[-1][1]))
            attempt.append(copy.pop(0))
        score = compute_loss(attempt)
        if score < best_score:
            best_attempt = attempt
            best_score = score
    return best_attempt, best_score

gives the following result with a loss of 0.1381:

[(0.01, 0.72),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.49, 0.41),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.82, 0.43),
 (0.32, 0.52),
 (0.27, 0.6)]

This is, however, not the best solution, which would be

[(0.01, 0.72),
 (0.82, 0.43),
 (0.27, 0.6),
 (0.49, 0.41),
 (0.32, 0.52),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65)]

with a loss of 0.0842. Obviously, the above algorithm performs well for the first few items; however, the differences for the last ones grow so large that they dominate the loss.

Is there any algorithm that can perform this kind of sort in acceptable time (feasible for lists of hundreds of items)?

If it's not possible to do this kind of sort exactly in less than O(n!) time, are there any approximate approaches that are likely to return a good score (small loss)?


In general, this problem is about finding a Hamiltonian path of minimum length, which is closely related to the famous Travelling Salesman Problem (TSP). It does not look like one of the special cases of that problem that can be solved in polynomial time.

There are a huge number of heuristics and approximation algorithms for solving TSP. This Wikipedia article could be a good place to start.
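For completeness: the exact optimum can also be found without enumerating all n! permutations, using a Held–Karp style dynamic program over subsets. It is still exponential (O(n²·2ⁿ) time, O(n·2ⁿ) memory), so it only helps for roughly 20 items or fewer, but on the example list it should reproduce the 0.0842 ordering. This is a sketch under those assumptions, not part of the original answers; exact_domino_sort is an illustrative name.

def exact_domino_sort(items):
    n = len(items)
    # cost[i][j]: squared gap when item i is immediately followed by item j
    cost = [[(a[1] - b[0]) ** 2 for b in items] for a in items]
    INF = float('inf')
    # dp[mask][last]: minimal loss of a path visiting exactly the items in `mask`
    # and ending at item `last`
    dp = [[INF] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = 0.0
    for mask in range(1 << n):
        for last in range(n):
            if dp[mask][last] == INF:
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue
                cand = dp[mask][last] + cost[last][nxt]
                if cand < dp[mask | (1 << nxt)][nxt]:
                    dp[mask | (1 << nxt)][nxt] = cand
                    parent[mask | (1 << nxt)][nxt] = last
    full = (1 << n) - 1
    best_last = min(range(n), key=lambda i: dp[full][i])
    best_loss = dp[full][best_last]
    # walk the parent pointers back to recover the ordering
    order, mask, last = [], full, best_last
    while last != -1:
        order.append(items[last])
        prev = parent[mask][last]
        mask ^= 1 << last
        last = prev
    order.reverse()
    return order, best_loss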


A slightly more efficient version of the naive approach, using bisect (implementation kudos: https://stackoverflow.com/a/12141511/6163736):

# Domino Packing
from bisect import bisect_left
from pprint import pprint


def compute_loss(items):
    return sum((items[i][1] - items[i+1][0])**2 for i in range(len(items)-1))


def find_nearest(values, target):
    """
    Assumes values is sorted. Returns closest value to target.
    If two numbers are equally close, return the smallest number.
    """
    idx = bisect_left(values, target)
    if idx == 0:
        return 0
    if idx == len(values):
        return -1
    before = values[idx - 1]
    after = values[idx]
    if after - target < target - before:
        return idx      # after
    else:
        return idx - 1  # before


if __name__ == '__main__':

    dominos = [(0.72, 0.12),
               (0.11, 0.67),
               (0.74, 0.65),
               (0.32, 0.52),
               (0.82, 0.43),
               (0.94, 0.64),
               (0.39, 0.95),
               (0.01, 0.72),
               (0.49, 0.41),
               (0.27, 0.60)]

    # sort by left value so that x_values stays sorted (find_nearest relies on this)
    dominos = sorted(dominos, key=lambda x: x[0])
    x_values, y_values = [list(l) for l in zip(*dominos)]
    packed = list()
    idx = 0  # start from the domino with the smallest left value

    for _ in range(len(dominos)):
        x = x_values[idx]
        y = y_values[idx]
        del x_values[idx]
        del y_values[idx]

        # index of the remaining domino whose left side is closest to the current right side
        idx = find_nearest(x_values, y)
        packed.append((x, y))

    pprint(packed)
    print("loss :%f" % compute_loss(packed))

Output:

[(0.01, 0.72),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.49, 0.41),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.82, 0.43),
 (0.32, 0.52),
 (0.27, 0.6)]
loss :0.138100

The theoretical question was already discussed in other answers. I tried to improve your "nearest unvisited neighbor" algorithm.

Before I get into the algorithms, note that you can obviously replace sorted + pop(0) with pop(min_index):

min_index, _ = min(enumerate(copy), key=lambda i_x: abs(i_x[1][0] - attempt[-1][1]))
attempt.append(copy.pop(min_index))

Method 1: Improve the basic approach

I was guided by a very simple idea: instead of considering only the left side of the next domino to see if it matches the right side of the current sequence, why not add a constraint on its right side too?

I tried this: check whether the candidate's right side is close to the remaining dominos' left sides. I thought it would be easier to find the "next next" domino if the right side was close to the mean of the remaining left sides. Thus I made the following changes to your code:

mean = sum(x[0] for x in copy)/len(copy)
copy = sorted(copy, key=lambda x: abs(x[0] - attempt[-1][1]) + abs(x[1]-mean)) # give a bonus for being close to the mean.

But this was not an improvement. The cumulated loss for 100 random series of 100 items (all values between 0 and 1) was:

  • nearest unvisited neighbor: 132.73
  • nearest unvisited neighbor and right side close to mean: 259.13

After some tuning, I tried to transform the bonus into a penalty:

mean = sum(x[0] for x in copy)/len(copy)
copy = sorted(copy, key=lambda x: 2*abs(x[0] - attempt[-1][1]) - abs(x[1]-mean)) # note the 2 times and the minus

This time, there was a clear improvement:

  • nearest unvisited neighbor: 145.00
  • nearest unvisited neighbor and right side far from mean: 93.65

But why? I did a little research. Obviously, the original algorithm performs better at the beginning, but the new algorithm "consumes" the big dominos (with a large gap between left and right) and thus performs better at the end.

Hence I focused on the gap:

copy = sorted(copy, key=lambda x: 2*abs(x[0] - attempt[-1][1]) - abs(x[1]-x[0]))

The idea is clear: consume the big dominos before the others. This worked fine (a complete version of this variant is sketched right after these results):

  • nearest unvisited neighbor: 132.85
  • nearest unvisited neighbor and right side far from mean: 90.71
  • nearest unvisited neighbor and big dominos: 79.51
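For reference, here is a self-contained sketch of that last variant. It reuses compute_loss from above; the function name domino_sort_big_first is just illustrative, and the 2/−1 weighting is the tuning found above, not anything canonical.

def domino_sort_big_first(items):
    best_attempt, best_score = items, compute_loss(items)
    for i in range(len(items)):
        remaining = list(items)
        attempt = [remaining.pop(i)]
        while remaining:
            # prefer a left side close to the current right side, but give a
            # bonus to dominos with a large gap so they are consumed early
            idx, _ = min(
                enumerate(remaining),
                key=lambda e: 2 * abs(e[1][0] - attempt[-1][1]) - abs(e[1][1] - e[1][0]),
            )
            attempt.append(remaining.pop(idx))
        score = compute_loss(attempt)
        if score < best_score:
            best_attempt, best_score = attempt, score
    return best_attempt, best_score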
Method 2: Improve a given sequence

OK, now a more sophisticated heuristic. I took inspiration from the Lin–Kernighan heuristic. I tried to build sequences of swaps meeting the following condition: stop the sequence as soon as the last swap did not produce a decrease of the local loss for either of the swapped dominos. Every sequence of swaps is evaluated to find the best one.

The code will be clearer than a long explanation:

def improve_sort(items, N=4):
    """Take every pair of dominos and try to build a sequence that may reduce the loss.
    N is the threshold for the size of the subsequence"""
    ret = (items, compute_loss(items))
    for i in range(len(items)):
        for j in range(i+1, len(items)):
            # for every couple of indices
            r = try_to_find_better_swap_sequence(ret, [i, j], N)
            if r[1] < ret[1]:
                ret = r

    return ret

def try_to_find_better_swap_sequence(ret, indices, N):
    """Try to swap dominos until the local loss is greater than in the previous sequence"""
    stack = [(indices, ret[0])] # for an iterative DFS
    while stack:
        indices, items = stack.pop()

        # pop the last two indices
        j = indices.pop()
        i = indices.pop()
        # create a copy and swap the i-th and the j-th element
        items2 = list(items)
        items2[i] = items[j]
        items2[j] = items[i]
        loss = compute_loss(items2)
        if loss < ret[1]:
            ret = (items2, loss)
        if len(indices) <= N-3: # at most N indices in the sequence
            # continue if there is at least one local improvement
            if local_loss(items2, i) < local_loss(items, i): # i was improved
                stack.extend((indices+[i,j,k], items2) for k in range(j+1, len(items2)))
            if local_loss(items2, j) < local_loss(items, j): # j was improved
                stack.extend((indices+[j,i,k], items2) for k in range(i+1, len(items2)))

    return ret

def local_loss(items, i):
    """This is the loss on the left and the right of the domino"""
    if i == 0:
        return (items[i][1] - items[i+1][0])**2
    elif i == len(items)-1:
        return (items[i-1][1] - items[i][0])**2
    else:
        return (items[i-1][1] - items[i][0])**2 + (items[i][1] - items[i+1][0])**2
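
Assuming the functions above are in scope, a hypothetical run chaining the greedy start from the question with this improvement step could look like the following (domino_sort is the asker's function, improve_sort the one defined above):

greedy, greedy_loss = domino_sort(items)
improved, improved_loss = improve_sort(greedy)
print(greedy_loss, improved_loss)  # the improved loss should be <= the greedy one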
    
Cumulated loss for 100 random series of 100 items:

  • no sort + improve_sort: 46.72
  • nearest unvisited neighbor and big dominos: 78.37
  • nearest unvisited neighbor and big dominos + improve_sort: 46.68

Conclusion

The second method is still suboptimal (try it on the original items). It is obviously slower than the first one but gives far better results and doesn't even need a pre-sort. (Consider using a shuffle to avoid degenerate cases.)

You can also take a look at this. The method to score the next possible domino is to shuffle the remaining dominos a great number of times and sum the loss for every shuffle. The minimum cumulated loss will probably give you a good next domino. I did not try it...
