K-means Clustering Algorithm in C
Date: 2023-10-23 08:13:09
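K-means is a common unsupervised learning algorithm that partitions a data set into K clusters. The basic flow of a C implementation is:
1. Randomly choose K initial cluster centers.
2. For each data point, compute its distance to each of the K cluster centers and assign it to the cluster whose center is nearest.
3. Compute the mean of the points in each cluster and use it as that cluster's new center.
4. Repeat steps 2 and 3 until the cluster centers stop changing (move less than a small tolerance) or the maximum number of iterations is reached.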
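Step 1 can be sketched as follows. This is a minimal sketch, assuming the initial centers are drawn from the data points themselves (often called Forgy initialization). The function name `init_centers_random` and its parameters are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double x;
    double y;
} Point;

// Hypothetical helper: copy k distinct data points into `centers`,
// chosen with a partial Fisher-Yates shuffle over an index array.
// Returns 0 on success, -1 if the index array cannot be allocated.
int init_centers_random(const Point *points, int num_points,
                        Point *centers, int k) {
    assert(k > 0 && k <= num_points);
    int *idx = malloc(sizeof(int) * num_points);
    if (idx == NULL) {
        return -1;
    }
    for (int i = 0; i < num_points; i++) {
        idx[i] = i;
    }
    for (int i = 0; i < k; i++) {
        // Swap a randomly chosen unused index into position i.
        // rand() % n is slightly biased, which is acceptable for a sketch.
        int j = i + rand() % (num_points - i);
        int tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
        centers[i] = points[idx[i]];
    }
    free(idx);
    return 0;
}
```

The caller would normally seed the generator once, for example with `srand((unsigned)time(NULL))`, before calling this function. Because the result depends on the random start, different runs may converge to different partitions.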
A complete C implementation follows:
```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MAX_ITERATIONS 100
#define EPSILON 0.00001

// A point in the 2D plane.
typedef struct {
    double x;
    double y;
} Point;

// A cluster: its center plus a caller-allocated buffer holding its members.
typedef struct {
    Point center;
    Point *points;
    int num_points;
} Cluster;

// Euclidean distance between two points.
static double distance(Point a, Point b) {
    double dx = a.x - b.x;
    double dy = a.y - b.y;
    return sqrt(dx * dx + dy * dy);
}

// Return the index of the cluster whose center is nearest to `point`.
// Ties go to the cluster with the lower index.
int get_closest_cluster(Point point, Cluster *clusters, int num_clusters) {
    int closest_cluster = 0;
    double min_distance = INFINITY;
    for (int i = 0; i < num_clusters; i++) {
        double d = distance(point, clusters[i].center);
        if (d < min_distance) {
            closest_cluster = i;
            min_distance = d;
        }
    }
    return closest_cluster;
}

// Move each center to the mean of its members and return the largest
// distance any center moved. Empty clusters keep their previous center.
double update_clusters(Cluster *clusters, int num_clusters) {
    double max_shift = 0.0;
    for (int i = 0; i < num_clusters; i++) {
        if (clusters[i].num_points == 0) {
            continue;
        }
        double sum_x = 0.0, sum_y = 0.0;
        for (int j = 0; j < clusters[i].num_points; j++) {
            sum_x += clusters[i].points[j].x;
            sum_y += clusters[i].points[j].y;
        }
        Point new_center = {sum_x / clusters[i].num_points,
                            sum_y / clusters[i].num_points};
        double shift = distance(clusters[i].center, new_center);
        if (shift > max_shift) {
            max_shift = shift;
        }
        clusters[i].center = new_center;
    }
    return max_shift;
}

// Each clusters[i].points buffer must have room for num_points entries.
void kmeans(Point *points, int num_points, Cluster *clusters, int num_clusters) {
    for (int iteration = 0; iteration < MAX_ITERATIONS; iteration++) {
        // Clear the memberships left over from the previous iteration
        for (int i = 0; i < num_clusters; i++) {
            clusters[i].num_points = 0;
        }
        // Assign points to clusters
        for (int i = 0; i < num_points; i++) {
            int closest_cluster = get_closest_cluster(points[i], clusters, num_clusters);
            clusters[closest_cluster].points[clusters[closest_cluster].num_points++] = points[i];
        }
        // Update cluster centers; converged once no center moved more than EPSILON
        if (update_clusters(clusters, num_clusters) <= EPSILON) {
            break;
        }
    }
}

int main(void) {
    Point points[] = {{1.0, 1.0}, {1.5, 2.0}, {3.0, 4.0}, {5.0, 7.0}, {3.5, 5.0}, {4.5, 5.0}, {3.5, 4.5}};
    int num_points = sizeof(points) / sizeof(points[0]);
    // The initial centers are fixed here so that the run is reproducible
    Cluster clusters[] = {{{2.0, 2.0}, NULL, 0}, {{5.0, 5.0}, NULL, 0}};
    int num_clusters = sizeof(clusters) / sizeof(clusters[0]);
    for (int i = 0; i < num_clusters; i++) {
        clusters[i].points = malloc(sizeof(Point) * num_points);
        if (clusters[i].points == NULL) {
            fprintf(stderr, "Out of memory\n");
            return 1;
        }
    }
    kmeans(points, num_points, clusters, num_clusters);
    for (int i = 0; i < num_clusters; i++) {
        printf("Cluster %d:\n", i);
        for (int j = 0; j < clusters[i].num_points; j++) {
            printf("(%f, %f)\n", clusters[i].points[j].x, clusters[i].points[j].y);
        }
    }
    for (int i = 0; i < num_clusters; i++) {
        free(clusters[i].points);
    }
    return 0;
}
```
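This example clusters 7 two-dimensional points into 2 clusters, starting from the fixed initial centers (2.0, 2.0) and (5.0, 5.0). With this data and these starting centers, K-means converges to the following partition:
```
Cluster 0:
(1.000000, 1.000000)
(1.500000, 2.000000)
Cluster 1:
(3.000000, 4.000000)
(5.000000, 7.000000)
(3.500000, 5.000000)
(4.500000, 5.000000)
(3.500000, 4.500000)
```
Cluster 0 contains the first two points and cluster 1 contains the remaining five. The point (3.0, 4.0) is equidistant from both initial centers and starts out in cluster 0, but it moves to cluster 1 once the centers have been updated. The final centers are (1.25, 1.5) and (3.9, 5.1).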
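One way to judge a partition like this is the within-cluster sum of squared distances, the quantity that the assignment and update steps of K-means never increase. The following is a minimal sketch, not part of the listing above. The function name `cluster_sse` is hypothetical, and it assumes a cluster's members are available as a plain `Point` array.

```c
#include <assert.h>

typedef struct {
    double x;
    double y;
} Point;

// Hypothetical helper: sum of squared Euclidean distances from each
// member to the mean of the members. An empty cluster contributes 0.
double cluster_sse(const Point *members, int count) {
    assert(count >= 0);
    if (count == 0) {
        return 0.0;
    }
    double mean_x = 0.0, mean_y = 0.0;
    for (int i = 0; i < count; i++) {
        mean_x += members[i].x;
        mean_y += members[i].y;
    }
    mean_x /= count;
    mean_y /= count;
    double sse = 0.0;
    for (int i = 0; i < count; i++) {
        double dx = members[i].x - mean_x;
        double dy = members[i].y - mean_y;
        sse += dx * dx + dy * dy;
    }
    return sse;
}
```

Summing `cluster_sse(clusters[i].points, clusters[i].num_points)` over all clusters gives the total for a partition. When two candidate partitions of the same data use the same K, the one with the lower total is the tighter clustering.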