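Without using sklearn library functions, run k-means clustering on the breast cancer dataset and visualize the result, with the number of clusters set to 2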
Time: 2024-05-13 15:18:28 Views: 92
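Sure, here is an implementation. First, load the breast cancer dataset. scikit-learn is used here only to load its bundled copy of the dataset; the clustering itself is written with NumPy. The code is as follows: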
```python
from sklearn.datasets import load_breast_cancer
import numpy as np
data = load_breast_cancer()
X = data.data
```
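Because the clustering below relies on Euclidean distance, features with large numeric ranges can dominate the result. One common option, not part of the original answer, is to standardize each column first. The following is a minimal sketch; the helper name `standardize` and the toy array `X_demo` are hypothetical and used only for illustration.

```python
import numpy as np

def standardize(X):
    """Scale each column to zero mean and unit variance (z-score)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

# Toy data (hypothetical): the second column is far larger than the first
X_demo = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])
X_demo_scaled = standardize(X_demo)
print(X_demo_scaled.mean(axis=0))  # approximately 0 per column
print(X_demo_scaled.std(axis=0))   # approximately 1 per column
```

With the data loaded above, this would be applied as `X = standardize(X)` before clustering. Whether scaling improves the clusters depends on the data, so treat it as an option to try rather than a required step.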
Next, implement the k-means algorithm, using Euclidean distance as the distance metric. The code is as follows:
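```python
def calc_distance(point, centroid):
    # Euclidean distance between a sample and a centroid
    return np.sqrt(np.sum((point - centroid) ** 2))

def kmeans(X, k, max_iter=100):
    # Initialize centroids with k distinct samples chosen at random
    centroids = X[np.random.choice(len(X), k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Assignment step: label each sample with its nearest centroid
        for i, point in enumerate(X):
            distances = [calc_distance(point, centroid) for centroid in centroids]
            labels[i] = np.argmin(distances)
        # Update step: move each centroid to the mean of its members
        new_centroids = centroids.copy()
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid unchanged
                new_centroids[j] = np.mean(X[labels == j], axis=0)
        if np.allclose(new_centroids, centroids):  # stop once centroids stop moving
            break
        centroids = new_centroids
    return labels, centroids
```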
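The nested Python loops in the assignment step can be slow on larger datasets. As a sketch of one alternative, the same nearest-centroid assignment can be computed with NumPy broadcasting, and the within-cluster sum of squared distances (often called inertia) gives a single number for comparing runs. The function names `assign_labels` and `inertia` and the toy arrays are hypothetical, not part of the original answer.

```python
import numpy as np

def assign_labels(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    # diff has shape (n_samples, k, n_features) via broadcasting
    diff = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)  # squared Euclidean distances, shape (n_samples, k)
    return np.argmin(sq_dist, axis=1)

def inertia(X, labels, centroids):
    """Sum of squared distances from each sample to its assigned centroid."""
    return float(np.sum((X - centroids[labels]) ** 2))

# Toy data (hypothetical): two well-separated groups
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids_demo = np.array([[0.0, 0.5], [10.0, 10.5]])
labels_demo = assign_labels(X_demo, centroids_demo)
print(labels_demo)                                   # [0 0 1 1]
print(inertia(X_demo, labels_demo, centroids_demo))  # 1.0
```

Taking the argmin over squared distances gives the same labels as taking it over the distances themselves, so the square root can be skipped. Since the initial centroids are random, running the clustering several times and keeping the result with the lowest inertia is a common way to reduce the effect of a poor start.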
Finally, visualize the clustering result. The code is as follows:
```python
import matplotlib.pyplot as plt

labels, centroids = kmeans(X, k=2)
# Plot each cluster using the first two feature columns
plt.scatter(X[labels == 0][:, 0], X[labels == 0][:, 1], c='r', marker='o', label='cluster 1')
plt.scatter(X[labels == 1][:, 0], X[labels == 1][:, 1], c='b', marker='s', label='cluster 2')
# Mark the final centroids
plt.scatter(centroids[:, 0], centroids[:, 1], c='k', marker='*', s=150, label='centroids')
plt.xlabel(data.feature_names[0])
plt.ylabel(data.feature_names[1])
plt.legend()
plt.show()
```
Only the first two features (mean radius and mean texture) are plotted here; the clustering itself uses all 30 features.
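Plotting only two raw feature columns hides most of the information the clustering used. One possible alternative, sketched here with NumPy only, is to project the data onto its first two principal components before plotting. This is not what the original answer does; the helper `pca_2d` and the random demo array are hypothetical.

```python
import numpy as np

def pca_2d(X):
    """Project X onto its first two principal components using SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Demo on random data (hypothetical), standing in for the real feature matrix
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 5))
X_demo_2d = pca_2d(X_demo)
print(X_demo_2d.shape)  # (50, 2)
```

With the variables from the code above, `X_2d = pca_2d(X)` could replace the raw columns in the scatter calls, for example `plt.scatter(X_2d[labels == 0][:, 0], X_2d[labels == 0][:, 1])`. The centroids would need the same centering and projection before being drawn on that plot.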