function varargout = kmeans(X, k, varargin)
%KMEANS K-means clustering.
% IDX = KMEANS(X, K) partitions the points in the N-by-P data matrix X
% into K clusters. This partition minimizes the sum, over all clusters, of
% the within-cluster sums of point-to-cluster-centroid distances. Rows of X
% correspond to points, columns correspond to variables. Note: when X is a
% vector, KMEANS treats it as an N-by-1 data matrix, regardless of its
% orientation. KMEANS returns an N-by-1 vector IDX containing the cluster
% indices of each point. By default, KMEANS uses squared Euclidean
% distances.
%
% KMEANS treats NaNs as missing data, and ignores any rows of X that
% contain NaNs.
%
% [IDX, C] = KMEANS(X, K) returns the K cluster centroid locations in
% the K-by-P matrix C.
%
% [IDX, C, SUMD] = KMEANS(X, K) returns the within-cluster sums of
% point-to-centroid distances in the K-by-1 vector sumD.
%
% [IDX, C, SUMD, D] = KMEANS(X, K) returns distances from each point
% to every centroid in the N-by-K matrix D.
%
% [ ... ] = KMEANS(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies
% optional parameter name/value pairs to control the iterative algorithm
% used by KMEANS. Parameters are:
%
% 'Distance' - Distance measure, in P-dimensional space, that KMEANS
% should minimize with respect to. Choices are:
% 'sqeuclidean' - Squared Euclidean distance (the default)
% 'cityblock' - Sum of absolute differences, a.k.a. L1 distance
% 'cosine' - One minus the cosine of the included angle
% between points (treated as vectors)
% 'correlation' - One minus the sample correlation between points
% (treated as sequences of values)
% 'hamming' - Percentage of bits that differ (only suitable
% for binary data)
%
% 'Start' - Method used to choose initial cluster centroid positions,
% sometimes known as "seeds". Choices are:
% 'plus' - The Default. Select K observations from X according
% to the k-means++ algorithm: the first cluster center
% is chosen uniformly at random from X, after which
% each subsequent cluster center is chosen randomly
% from the remaining data points with probability
% proportional to its distance from the point's
% closest existing cluster center.
% 'sample' - Select K observations from X at random.
% 'uniform' - Select K points uniformly at random from the range
% of X. Not valid for Hamming distance.
% 'cluster' - Perform preliminary clustering phase on random 10%
% subsample of X. This preliminary phase is itself
% initialized using 'sample'.
% matrix - A K-by-P matrix of starting locations. In this case,
% you can pass in [] for K, and KMEANS infers K from
% the first dimension of the matrix. You can also
% supply a 3D array, implying a value for 'Replicates'
% from the array's third dimension.
%
% 'Replicates' - Number of times to repeat the clustering, each with a