Find K nearest neighbors, starting from a distance matrix

I'm looking for a well-optimized function that accepts an n X n distance matrix and returns an n X k matrix with the indices of the k nearest neighbors of the ith datapoint in the ith row.

I find a gazillion different R packages that let you do KNN, but they all seem to include the distance computations along with the sorting algorithm within the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.

This is not exactly a daunting problem -- I obviously could just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k when k is small (less than 11) goes much faster, but unfortunately returns only sorted values rather than the desired indices.


Try to use FastKNN CRAN package (although it is not well documented). It offers k.nearest.neighbors function where an arbitrary distance matrix can be given. Below you have an example that computes the matrix you need.

# arbitrary data
train <- matrix(sample(c("a","b","c"),12,replace=TRUE), ncol=2) # n x 2
n = dim(train)[1]
distMatrix <- matrix(runif(n^2,0,1),ncol=n) # n x n

# matrix of neighbours
k=3
nn = matrix(0,n,k) # n x k
for (i in 1:n)
   nn[i,] = k.nearest.neighbors(i, distMatrix, k = k)

Notice: You can always check Cran packages list for Ctrl+F='knn' related functions: https://cran.r-project.org/web/packages/available_packages_by_name.html


For the record (I won't mark this as the answer), here is a quick-and-dirty solution. Suppose sd.dist is the special distance matrix. Suppose k.for.nn is the number of nearest neighbors.

n = nrow(sd.dist)
knn.mat = matrix(0, ncol = k.for.nn, nrow = n)
knd.mat = knn.mat
for(i in 1:n){
  knn.mat[i,] = order(sd.dist[i,])[1:k.for.nn]
  knd.mat[i,] = sd.dist[i,knn.mat[i,]]
}

Now knn.mat is the matrix with the indices of the k nearest neighbors in each row, and for convenience knd.mat stores the corresponding distances.

链接地址: http://www.djcxy.com/p/78880.html

上一篇: C#中的ValueType数组转到堆或堆栈?

下一篇: 从距离矩阵开始查找K个最近的邻居