Explain:
%   'Distance' - Distance measure, in P-dimensional space, that KMEANS
%      should minimize with respect to. Choices are:
%         {'sqEuclidean'} - Squared Euclidean distance (the default)
%          'cosine'       - One minus the cosine of the included angle
%                           between points (treated as vectors). Each
%                           row of X SHOULD be normalized to unit length. If
%                           the initial center matrix is provided, it
%                           SHOULD also be normalized.
%
%   'Start' - Method used to choose initial cluster centroid positions,
%      sometimes known as "seeds". Choices are:
%         {'sample'} - Select K observations from X at random (the default)
%          'cluster' - Perform preliminary clustering phase on random 10%
%                      subsample of X. This preliminary phase is itself
%                      initialized using 'sample'. An additional parameter
%                      clusterMaxIter can be used to control the maximum
%                      number of iterations in each preliminary clustering
%                      problem.
%           matrix   - A K-by-P matrix of starting locations; or a K-by-1
%                      index vector indicating which K points in X
%                      should be used as the initial centers. In this case,
%                      you can pass in [] for K, and KMEANS infers K from
%                      the first dimension of the matrix.
%
%   'MaxIter' - Maximum number of iterations allowed. Default is 100.
%
%   'Replicates' - Number of times to repeat the clustering, each with a
%                  new set of initial centroids. Default is 1. If the
%                  initial centroids are provided, Replicates is
%                  automatically set to 1.
%
%   'clusterMaxIter' - Only useful when 'Start' is 'cluster'. Maximum number
%                      of iterations of the preliminary clustering phase.
%                      Default is 10.
%
Posted: 2023-06-26 22:04:05
This block is the parameter documentation for a K-means clustering function. The parameters are:
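1. Distance: the distance measure to minimize. Options are squared Euclidean distance ('sqEuclidean', the default) or cosine distance ('cosine'), defined as one minus the cosine similarity. With 'cosine', each row of X, and any initial center matrix you supply, should be normalized to unit length.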
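2. Start: the method for choosing the initial centroids ("seeds"). Options are 'sample', which picks K observations of X at random (the default); 'cluster', which runs a preliminary clustering on a random 10% subsample of X (itself seeded with 'sample') and uses the resulting centroids as the initial centroids; or an explicit matrix, either a K-by-P matrix of starting locations or a K-by-1 index vector selecting K rows of X. In the matrix case you can pass [] for K and it is inferred from the matrix.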
3. MaxIter: the maximum number of iterations allowed. Default is 100.
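4. Replicates: the number of times to repeat the clustering, each time with a new set of initial centroids. Default is 1. If the initial centroids are supplied explicitly, Replicates is automatically set to 1.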
5. clusterMaxIter: only used when Start is 'cluster'; it sets the maximum number of iterations of the preliminary clustering phase. Default is 10.
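A small check of the two Distance options, written as a pure-Python sketch; the function names (sq_euclidean, cosine_distance, normalize) are illustrative and not part of KMEANS. For unit-length vectors a and b, ||a - b||^2 = 2 - 2*cos(theta) = 2*(1 - cos(theta)), so on normalized rows the squared Euclidean distance is exactly twice the cosine distance. That identity is one plausible reason the documentation asks for normalized rows and centers with 'cosine', though the function's internals are not shown here.

```python
import math


def sq_euclidean(a, b):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the included angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


a = normalize([3.0, 4.0])  # unit vector [0.6, 0.8]
b = normalize([1.0, 0.0])  # unit vector [1.0, 0.0]

# On unit vectors: squared Euclidean distance == 2 * cosine distance.
print(round(cosine_distance(a, b), 6))  # 0.4
print(round(sq_euclidean(a, b), 6))     # 0.8
```

Cosine distance ignores vector length (parallel vectors of any magnitude are at distance 0), while squared Euclidean distance does not; normalizing the rows removes that difference.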
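To make the interplay of Start, MaxIter, Replicates and clusterMaxIter concrete, here is a minimal pure-Python sketch of Lloyd's k-means. It is not the documented KMEANS function: the names kmeans, lloyd, init_sample and init_cluster are hypothetical, only squared Euclidean distance is supported, and explicit seeds are accepted only as a K-by-P list (not the K-by-1 index vector). Two further assumptions: the best replicate is the one with the smallest total within-cluster sum of squares, and the 10% subsample is never allowed to contain fewer than K points.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def lloyd(X, centers, max_iter):
    """Lloyd iterations (max_iter >= 1). Returns (labels, centers, SSE)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
                      for x in X]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = mean(members)
    sse = sum(sq_dist(x, centers[lab]) for x, lab in zip(X, labels))
    return labels, centers, sse


def init_sample(X, k, rng):
    """'sample': pick k distinct observations of X at random."""
    return [list(x) for x in rng.sample(X, k)]


def init_cluster(X, k, rng, cluster_max_iter):
    """'cluster': k-means on a random 10% subsample (seeded with 'sample');
    its centroids become the seeds for the full run."""
    m = max(k, len(X) // 10)  # assumption: never fewer than k points
    subset = rng.sample(X, m)
    _, centers, _ = lloyd(subset, init_sample(subset, k, rng), cluster_max_iter)
    return centers


def kmeans(X, k, start="sample", max_iter=100, replicates=1,
           cluster_max_iter=10, seed=0):
    """Hypothetical wrapper mirroring the documented options."""
    rng = random.Random(seed)
    if not isinstance(start, str):  # explicit K-by-P seeds
        k, replicates = len(start), 1
    best = None
    for _ in range(replicates):
        if start == "sample":
            seeds = init_sample(X, k, rng)
        elif start == "cluster":
            seeds = init_cluster(X, k, rng, cluster_max_iter)
        else:
            seeds = start
        result = lloyd(X, seeds, max_iter)
        if best is None or result[2] < best[2]:
            best = result  # keep the replicate with the lowest SSE
    return best


X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers, sse = kmeans(X, 2, start="sample", replicates=5)
print(labels, round(sse, 4))
```

Because each replicate draws fresh seeds from the same generator, raising replicates only adds runs; forcing replicates to 1 for explicit seeds mirrors the documented rule, since repeating a deterministic start would just reproduce the same result.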