Fast k-Nearest Neighbor classifier built upon ANN, a highly efficient C++
library for nearest neighbor searching.
fastknn(xtr, ytr, xte, k, method = "dist", normalize = NULL)
"dist"
(default) to compute probabilites from
the inverse of the nearest neighbor distances. This method works as
a shrinkage estimator and provides a better predictive performance in general.
Or you can choose "vote"
to compute probabilities from the frequency
of the nearest neighbor labels.normalize=NULL
. Normalization is
recommended if variables are not in the same units. It can be one of the
following:
Returns a list with predictions for the test set:

class: factor array of predicted classes.

prob: matrix with predicted probabilities.
There are two estimators for the class membership probabilities:

method = "vote": the classical estimator based on the label proportions
of the nearest neighbors. This estimator can be thought of as a voting
rule.

method = "dist": a shrinkage estimator based on the distances to the
nearest neighbors, so that neighbors closer to the test observation have
more influence on the predicted class label. This estimator can be
thought of as a weighted voting rule. In general, it reduces log-loss.
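The difference between the two estimators can be sketched for a single
test observation. The snippet below is illustrative only (it does not use
the package internals), and the labels, distances, and the small epsilon
guard are made-up assumptions:

```r
# Suppose the k = 5 nearest neighbors of a test point have these
# labels and distances (hypothetical values):
nn.labels <- factor(c("good", "good", "bad", "good", "bad"))
nn.dist   <- c(0.2, 0.5, 0.6, 1.1, 1.4)

# method = "vote": label proportions among the k neighbors
p.vote <- table(nn.labels) / length(nn.labels)

# method = "dist": weights proportional to the inverse distances, so
# closer neighbors count more (eps guards against a zero distance)
w <- 1 / (nn.dist + 1e-8)
p.dist <- tapply(w, nn.labels, sum) / sum(w)

p.vote  # bad 0.4, good 0.6
p.dist  # "good" gains weight because its neighbors are closer
```

Because the "good" neighbors happen to be the closest ones, the
distance-weighted estimate assigns "good" a higher probability than the
plain vote does, which is exactly the shrinkage effect described above.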
## Not run: ------------------------------------
library("mlbench")
library("caTools")
library("fastknn")

data("Ionosphere")

x <- data.matrix(subset(Ionosphere, select = -Class))
y <- Ionosphere$Class

set.seed(2048)
tr.idx <- which(sample.split(Y = y, SplitRatio = 0.7))
x.tr <- x[tr.idx, ]
x.te <- x[-tr.idx, ]
y.tr <- y[tr.idx]
y.te <- y[-tr.idx]

knn.out <- fastknn(xtr = x.tr, ytr = y.tr, xte = x.te, k = 10)

knn.out$class
knn.out$prob
## ---------------------------------------------