K-means++ Clustering of Multidimensional Data in MATLAB: Code and Model Performance Evaluation
Date: 2024-05-07 20:21:54
Below is MATLAB code for K-means++ clustering of multidimensional data, followed by methods for evaluating the resulting model:
```matlab
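% Load data. load() returns a struct, so extract the matrix from it.
% Assumption: data.mat stores an n-by-d numeric matrix in a variable named "data".
S = load('data.mat');
data = S.data;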
% Number of clusters to find
k = 3;
% Initialize centroids using K-means++
centroids = init_centroids(data, k);
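% Run K-means iterations, stopping early once the assignments stop changing
max_iters = 100;
idx = zeros(size(data, 1), 1);
for iter = 1:max_iters
    prev_idx = idx;
    % Assign each data point to the nearest centroid
    idx = find_closest_centroids(data, centroids);
    % Compute new centroids based on the assigned data points
    centroids = compute_centroids(data, idx, k);
    if isequal(idx, prev_idx)
        break;
    end
end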
% Plot the clusters
plot_clusters(data, idx, k);
% Function to initialize centroids using K-means++ seeding
function centroids = init_centroids(data, k)
    % Choose the first centroid uniformly at random from the data points
    centroids = data(randi(size(data, 1)), :);
    % Choose each subsequent centroid with probability proportional to the
    % squared distance from each point to its closest existing centroid
    for i = 2:k
        dists = pdist2(data, centroids);   % n-by-(i-1) Euclidean distances
        D = min(dists, [], 2);             % distance to the nearest chosen centroid
        c = cumsum(D.^2);                  % cumulative squared-distance weights
        next = find(c >= rand * c(end), 1);  % sample an index with probability D.^2 / sum(D.^2)
        centroids(i, :) = data(next, :);
    end
end
% Function to assign each data point to the nearest centroid
function idx = find_closest_centroids(data, centroids)
    dists = pdist2(data, centroids);   % n-by-k Euclidean distances
    [~, idx] = min(dists, [], 2);      % index of the closest centroid for each point
end
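% Function to compute new centroids based on the assigned data points
function centroids = compute_centroids(data, idx, k)
    centroids = zeros(k, size(data, 2));
    for i = 1:k
        members = data(idx == i, :);
        if isempty(members)
            % Empty cluster: re-seed with a random data point to avoid a NaN centroid
            centroids(i, :) = data(randi(size(data, 1)), :);
        else
            centroids(i, :) = mean(members, 1);
        end
    end
end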
% Function to plot the clusters (only the first two dimensions are shown)
function plot_clusters(data, idx, k)
    colors = lines(k);   % one RGB row per cluster, so k > 6 no longer fails
    hold on;
    for i = 1:k
        scatter(data(idx == i, 1), data(idx == i, 2), 36, colors(i, :));
    end
    hold off;
end
```
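As a sanity check on a hand-written implementation, it can help to compare against the built-in `kmeans` function from the Statistics and Machine Learning Toolbox, requesting K-means++ seeding explicitly with `'Start', 'plus'`. The sketch below is only an illustration: the synthetic matrix `X`, the seed, and the choice of five replicates are assumptions made here, not part of the original answer.

```matlab
% Reference run with the built-in kmeans (Statistics and Machine Learning Toolbox).
rng(0);                                  % fixed seed so the run is repeatable
X = [randn(50, 3); randn(50, 3) + 6];    % hypothetical data: two well-separated 3-D blobs

% 'Start','plus' selects K-means++ seeding; 'Replicates' reruns it and keeps the best result.
[idx_ref, C_ref, sumd] = kmeans(X, 2, 'Start', 'plus', 'Replicates', 5);

% sumd holds the per-cluster sums of squared distances (default distance is squared Euclidean).
sse_ref = sum(sumd);
fprintf('Reference SSE: %.3f\n', sse_ref);
```

If the hand-written loop is run on the same data, its final SSE should be close to `sse_ref`, although individual runs can differ because the seeding is random.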
The following metrics can be used to evaluate the clustering:
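1. SSE (Sum of Squared Errors): the sum of the squared distances from each data point to the centroid of its assigned cluster; for a fixed k, smaller is better;
2. Silhouette coefficient: measures how compact each cluster is and how well separated it is from the other clusters; it ranges from -1 to 1, and values closer to 1 are better.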
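For reference, the two metrics can be written out as follows. The notation is introduced here for illustration and does not appear in the original answer: $C_j$ is the set of points assigned to cluster $j$, $\mu_j$ is its centroid, $a(i)$ is the mean distance from point $i$ to the other points in its own cluster, and $b(i)$ is the smallest mean distance from point $i$ to the points of any other cluster.

$$
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
$$

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\mathrm{SC} = \frac{1}{n} \sum_{i=1}^{n} s(i)
$$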
These metrics can be computed with the following code:
```matlab
% SSE: sum of squared distances from each point to its nearest centroid
dists = pdist2(data, centroids);
[min_dists, idx] = min(dists, [], 2);
sse = sum(min_dists.^2);
% Silhouette coefficient (mean over all points)
s = silhouette(data, idx);
sc = mean(s);
```
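A small case that can be checked by hand is a convenient way to confirm that an SSE implementation really sums squared distances rather than plain distances. The four points, two centroids, and assignments below are made-up values chosen so that the arithmetic is easy to follow; the snippet needs no toolbox.

```matlab
% Toy data: four 2-D points forming two obvious clusters (hypothetical values)
data = [0 0; 0 4; 10 0; 10 4];
centroids = [0 2; 10 2];
idx = [1; 1; 2; 2];                    % cluster assignment of each point

% Offset of every point from its assigned centroid
diffs = data - centroids(idx, :);
sq_dists = sum(diffs.^2, 2);           % squared Euclidean distance per point (4 each)

sse = sum(sq_dists);                   % 4 + 4 + 4 + 4 = 16
sum_of_dists = sum(sqrt(sq_dists));    % 2 + 2 + 2 + 2 = 8, which is not the SSE
fprintf('SSE = %g, sum of distances = %g\n', sse, sum_of_dists);
```

Every point lies at distance 2 from its centroid, so the SSE is 16 while the unsquared sum is 8; a result of 8 would indicate that the squaring step is missing.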