K-means clustering of the iris dataset in C
Posted: 2023-09-11 20:08:29 · Views: 43
K-means is a widely used unsupervised learning algorithm that partitions samples into K clusters. Below is an example implementation of K-means in C that clusters iris (鸢尾花) data:
```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define MAX_ITERATIONS 100
#define NUM_CLUSTERS 3
/* A 2-D sample: here, sepal length (x) and sepal width (y) */
typedef struct {
    double x;
    double y;
} Point;

typedef struct {
    Point centroid;
    int num_points;
} Cluster;

/* Euclidean distance between two 2-D points */
double calculate_distance(Point p1, Point p2) {
    return sqrt(pow(p2.x - p1.x, 2) + pow(p2.y - p1.y, 2));
}
/* Return the index of the cluster whose centroid is closest to point */
int find_nearest_cluster(Point point, Cluster* clusters) {
    int nearest_cluster_index = 0;
    double min_distance = calculate_distance(point, clusters[0].centroid);
    for (int i = 1; i < NUM_CLUSTERS; i++) {
        double distance = calculate_distance(point, clusters[i].centroid);
        if (distance < min_distance) {
            min_distance = distance;
            nearest_cluster_index = i;
        }
    }
    return nearest_cluster_index;
}
/* Divide the accumulated coordinate sums by the member counts.
 * Guard against empty clusters to avoid division by zero. */
void update_centroids(Cluster* clusters) {
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        if (clusters[i].num_points > 0) {
            clusters[i].centroid.x /= clusters[i].num_points;
            clusters[i].centroid.y /= clusters[i].num_points;
        }
    }
}
void kmeans(Point* points, Cluster* clusters, int num_points) {
    /* Track which cluster each point belongs to; -1 = not yet assigned */
    int* assignments = malloc(num_points * sizeof(int));
    for (int i = 0; i < num_points; i++) {
        assignments[i] = -1;
    }
    int iterations = 0;
    int changed;
    do {
        changed = 0;
        /* Assignment step: attach each point to its nearest centroid */
        for (int i = 0; i < num_points; i++) {
            int nearest = find_nearest_cluster(points[i], clusters);
            if (nearest != assignments[i]) {
                assignments[i] = nearest;
                changed = 1;
            }
        }
        /* Update step: recompute each centroid as the mean of its members */
        for (int i = 0; i < NUM_CLUSTERS; i++) {
            clusters[i].centroid.x = 0;
            clusters[i].centroid.y = 0;
            clusters[i].num_points = 0;
        }
        for (int i = 0; i < num_points; i++) {
            clusters[assignments[i]].centroid.x += points[i].x;
            clusters[assignments[i]].centroid.y += points[i].y;
            clusters[assignments[i]].num_points++;
        }
        update_centroids(clusters);
        iterations++;
    } while (changed && iterations < MAX_ITERATIONS);
    free(assignments);
}
int main() {
    /* Sepal length and sepal width of 24 iris samples */
    Point points[] = {{5.1, 3.5}, {4.9, 3.0}, {4.7, 3.2}, {4.6, 3.1}, {5.0, 3.6}, {5.4, 3.9},
                      {4.6, 3.4}, {5.0, 3.4}, {4.4, 2.9}, {4.9, 3.1}, {5.4, 3.7}, {4.8, 3.4},
                      {4.8, 3.0}, {4.3, 3.0}, {5.8, 4.0}, {5.7, 4.4}, {5.4, 3.9}, {5.1, 3.5},
                      {5.7, 3.8}, {5.1, 3.8}, {5.4, 3.4}, {5.1, 3.7}, {4.6, 3.6}, {5.1, 3.3}};
    Cluster clusters[NUM_CLUSTERS];
    /* Initialize the centroids with the first NUM_CLUSTERS points */
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        clusters[i].centroid.x = points[i].x;
        clusters[i].centroid.y = points[i].y;
        clusters[i].num_points = 0;
    }
    kmeans(points, clusters, sizeof(points) / sizeof(points[0]));
    printf("Final cluster centroids:\n");
    for (int i = 0; i < NUM_CLUSTERS; i++) {
        printf("Cluster %d: (%f, %f)\n", i + 1, clusters[i].centroid.x, clusters[i].centroid.y);
    }
    return 0;
}
```
In this example, we take a subset of the iris dataset and partition it into three clusters. Each iteration computes the distance from every sample to each centroid, assigns the sample to the nearest cluster, and then recomputes the cluster centroids as the mean of their members. This repeats until no assignment changes or the maximum number of iterations is reached.
Note that this is only a simple example implementation and may need to be adapted and optimized for real requirements.