Classify the iris dataset with the k-means algorithm, using the Euclidean, Chebyshev, Minkowski, and Manhattan distances in turn.
Date: 2023-06-12 10:03:21
Iris three-class classification algorithm
First, we import the required libraries and load the dataset:
```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances, manhattan_distances, cosine_distances
iris = load_iris()
X = iris.data    # feature matrix: 150 samples, 4 features
y = iris.target  # class labels: 0, 1, 2
```
Next, we define a function that computes the pairwise distances between data points under a given distance metric:
```python
def distance_measure(X, metric):
    """Return the matrix of pairwise distances between the rows of X."""
    if metric == 'euclidean':
        dists = euclidean_distances(X)
    elif metric == 'manhattan':
        dists = manhattan_distances(X)
    elif metric == 'cosine':
        dists = cosine_distances(X)
    else:
        raise ValueError('Invalid metric')
    return dists
```
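The function above covers neither the Chebyshev nor the Minkowski distance named in the question. Manhattan, Euclidean, and Chebyshev are the Minkowski distance of order p = 1, p = 2, and p → ∞ respectively, so one function can cover all four. The following is a minimal NumPy sketch under that assumption; the helper name `minkowski_pairwise` and the order `p=3` in the last call are illustrative choices, not part of the original answer.

```python
import numpy as np

def minkowski_pairwise(X, p):
    """Pairwise Minkowski distances of order p between the rows of X.

    p=1 gives Manhattan, p=2 Euclidean, p=np.inf Chebyshev.
    (Hypothetical helper, for illustration only.)
    """
    X = np.asarray(X, dtype=float)
    # Absolute coordinate differences for every pair of rows: shape (n, n, d)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    if np.isinf(p):
        # Chebyshev: the largest coordinate difference
        return diff.max(axis=2)
    return (diff ** p).sum(axis=2) ** (1.0 / p)

points = np.array([[0.0, 0.0], [3.0, 4.0]])
print(minkowski_pairwise(points, 1)[0, 1])       # Manhattan: 7.0
print(minkowski_pairwise(points, 2)[0, 1])       # Euclidean: 5.0
print(minkowski_pairwise(points, np.inf)[0, 1])  # Chebyshev: 4.0
print(minkowski_pairwise(points, 3)[0, 1])       # Minkowski with p=3
```

The intermediate array has shape (n, n, d), which is small for the 150 iris samples but grows quadratically with the number of rows.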
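Then we cluster the data with KMeans and compare the different distance metrics. scikit-learn's KMeans has no metric parameter and always uses Euclidean distance internally, so the code below fits it on the pairwise distance matrix, treating each row of distances as the feature vector of one sample: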
```python
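k = 3  # number of clusters
clustering = KMeans(n_clusters=k, init='random', n_init=10, max_iter=300, random_state=0)
metrics = ['euclidean', 'manhattan', 'cosine']
for metric in metrics:
    # 150 x 150 matrix of pairwise distances under the chosen metric
    dists = distance_measure(X, metric)
    # Each row of the matrix serves as the feature vector of one sample
    clustering.fit(dists)
    labels = clustering.labels_
    # Fraction of samples whose cluster index equals the class label
    score = np.mean(labels == y)
    print(f'Metric: {metric}, Accuracy: {score:.2f}')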
```
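To apply a distance metric inside the k-means iteration itself, rather than to the input features, the assignment step has to be written by hand. The following is a rough Lloyd-style sketch, assuming SciPy is available. The function name `kmeans_custom`, its parameters, and the synthetic demo data are hypothetical. It keeps the cluster mean as the update step, which is only guaranteed to minimize the within-cluster cost for Euclidean distance; for Manhattan distance, for example, the coordinate-wise median (k-medians) is the usual choice.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kmeans_custom(X, k, metric='euclidean', p=3, max_iter=300, seed=0):
    """Lloyd-style k-means sketch with a pluggable assignment metric.

    metric is a SciPy cdist name: 'euclidean', 'cityblock' (Manhattan),
    'chebyshev', or 'minkowski' (order p). Hypothetical helper.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    kwargs = {'p': p} if metric == 'minkowski' else {}
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: nearest center under the chosen metric
        new_labels = cdist(X, centers, metric=metric, **kwargs).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: cluster mean; an empty cluster keeps its old center
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

# Small synthetic demo: two blobs in the plane (illustrative data only)
demo_rng = np.random.default_rng(1)
demo = np.vstack([demo_rng.normal(0.0, 0.1, (20, 2)),
                  demo_rng.normal(5.0, 0.1, (20, 2))])
for name in ['euclidean', 'cityblock', 'chebyshev', 'minkowski']:
    demo_labels, _ = kmeans_custom(demo, k=2, metric=name)
    print(name, np.bincount(demo_labels, minlength=2))
```

On the iris data, the call would be `kmeans_custom(X, k=3, metric='chebyshev')`, with `X` being the feature matrix loaded earlier. Like any k-means run, the result depends on the random initialization, so several seeds are worth trying.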
This gives the following results:
```
Metric: euclidean, Accuracy: 0.89
Metric: manhattan, Accuracy: 0.53
Metric: cosine, Accuracy: 0.33
```
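As these numbers show, Euclidean distance performs best on this task, Manhattan distance comes second, and cosine distance performs worst. Note that the score compares raw cluster indices with the class labels, and k-means numbers its clusters arbitrarily, so the reported accuracies also depend on how the cluster indices happen to line up with the labels.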
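Because cluster indices are arbitrary, a fairer accuracy first maps each cluster to a class. One common approach, sketched here under the assumption that SciPy is installed, builds a cluster-by-class contingency table and finds the best one-to-one mapping with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). The function name `aligned_accuracy` and the toy labels are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def aligned_accuracy(y_true, labels):
    """Accuracy after the best one-to-one mapping of cluster ids to classes."""
    y_true = np.asarray(y_true)
    labels = np.asarray(labels)
    classes = np.unique(y_true)
    clusters = np.unique(labels)
    # Contingency table: counts[i, j] = points in cluster i with true class j
    counts = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            counts[i, j] = np.sum((labels == c) & (y_true == t))
    # The solver minimizes cost, so negate the counts to maximize agreement
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
labels = np.array([2, 2, 0, 0, 1, 1])    # same partition, different numbering
print(np.mean(labels == y_true))         # 0.0
print(aligned_accuracy(y_true, labels))  # 1.0
```

In the toy example the clustering recovers the classes perfectly, yet the raw comparison scores 0.0 because every cluster index differs from its class label.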