K-means clustering code for the consumption levels of 31 provinces
Date: 2024-03-24 20:29:00
Below is a Python example that uses the K-means clustering algorithm to cluster the consumption levels of 31 provinces:
```python
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Read the data file
data = pd.read_csv('consumption.csv')

# Feature columns: per capita disposable income, per capita consumption level,
# urban per capita disposable income, and urban per capita consumption level
X = data.iloc[:, 1:5].values

# Create the KMeans model; n_init is set explicitly because its default
# differs between scikit-learn versions
kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10, random_state=0)

# Cluster the data and get one label (0 to 3) per province
y_kmeans = kmeans.fit_predict(X)

# Print the cluster labels
print(y_kmeans)

# Plot the clusters: x-axis is the per capita consumption level (column 1 of X),
# y-axis is the urban per capita consumption level (column 3 of X)
colors = ['red', 'blue', 'green', 'cyan']
for k, color in enumerate(colors):
    plt.scatter(X[y_kmeans == k, 1], X[y_kmeans == k, 3], s=100, c=color, label=f'Cluster {k + 1}')

# Mark the cluster centers
plt.scatter(kmeans.cluster_centers_[:, 1], kmeans.cluster_centers_[:, 3], s=300, c='yellow', label='Centroids')
plt.title('Clusters of Provinces')
plt.xlabel('Per Capita Consumption Level')
plt.ylabel('Urban Per Capita Consumption Level')
plt.legend()
plt.show()
```
The data file `consumption.csv` contains the following:
```
Province,Per Capita Disposable Income,Per Capita Consumption Level,Urban Per Capita Disposable Income,Urban Per Capita Consumption Level
Beijing,55129,35383,79406,51424
Tianjin,35577,24472,45765,31297
Hebei,19022,12068,25825,16210
Shanxi,17515,11761,23119,14856
Inner Mongolia,18427,11576,27712,17447
Liaoning,21176,12654,30568,17867
Jilin,17904,11037,26508,15644
Heilongjiang,17349,10432,23519,13883
Shanghai,63186,38959,102921,73633
Jiangsu,36183,22282,54869,35764
Zhejiang,37250,23223,53932,38480
Anhui,16889,10237,22125,12589
Fujian,24147,15242,38829,23651
Jiangxi,15087,9463,19292,11433
Shandong,22002,14184,30855,18897
Henan,15975,10019,23571,13728
Hubei,19077,12018,25932,14948
Hunan,16878,10555,22672,13488
Guangdong,32346,21300,52723,33695
Guangxi,14342,9032,19569,12339
Hainan,17909,10744,25635,16184
Chongqing,21439,13707,30000,18954
Sichuan,16715,10523,23226,14083
Guizhou,11757,7329,14938,9045
Yunnan,13698,8591,19617,12093
Tibet,9842,6093,14200,9230
Shaanxi,18052,11414,24789,15445
Gansu,13314,8227,18948,11723
Qinghai,15521,9827,23134,15045
Ningxia,20138,12937,31474,22327
Xinjiang,14266,8888,22431,13954
```
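The printed labels are only integers, so it can help to see which province names end up together. The following is a minimal sketch rather than part of the original code: it embeds six rows copied from the file above so that it runs without `consumption.csv`, and it uses `n_clusters=2`, which is an arbitrary choice for such a small subset. The names `csv_text` and `groups` are introduced here for illustration.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Six rows copied from consumption.csv so the sketch is self-contained
csv_text = """Province,Per Capita Disposable Income,Per Capita Consumption Level,Urban Per Capita Disposable Income,Urban Per Capita Consumption Level
Beijing,55129,35383,79406,51424
Shanghai,63186,38959,102921,73633
Jiangsu,36183,22282,54869,35764
Hebei,19022,12068,25825,16210
Guizhou,11757,7329,14938,9045
Tibet,9842,6093,14200,9230
"""

data = pd.read_csv(io.StringIO(csv_text))
X = data.iloc[:, 1:5].values

# Assumption: k=2 is chosen only because the subset is tiny
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
data["Cluster"] = kmeans.fit_predict(X)

# Collect the province names that share each cluster label
groups = data.groupby("Cluster")["Province"].apply(list).to_dict()
for label, provinces in sorted(groups.items()):
    print(label, provinces)
```

With the full file, the same `groupby` step should work unchanged after reading `consumption.csv` and fitting with `n_clusters=4`.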
Running the code above produces the following cluster plot:
![kmeans_clusters](https://img-blog.csdnimg.cn/20210629231807239.png)
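As the plot shows, the 31 provinces are divided into 4 clusters: cluster 1 corresponds to regions with higher consumption levels, cluster 2 to regions with lower consumption levels, cluster 3 to regions with higher urban consumption levels, and cluster 4 to regions with lower urban consumption levels. Note that K-means assigns cluster numbers arbitrarily, so this reading should be checked against `kmeans.cluster_centers_` for the run at hand.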
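Because the cluster numbers returned by K-means carry no inherent order, one way to make an interpretation like this reproducible is to renumber the clusters by the value of their centers. The sketch below is an illustration only: `relabel_by_center` is a hypothetical helper, not a scikit-learn function, and the `labels` and `centers` arrays are made-up toy values rather than results from the data above.

```python
import numpy as np


def relabel_by_center(labels, centers, column):
    """Renumber clusters so that 0 is the cluster whose center is largest in `column`."""
    # Old cluster labels sorted by descending center value in the chosen column
    order = np.argsort(-centers[:, column])
    # mapping[old_label] = new rank (0 = highest center)
    mapping = np.empty_like(order)
    mapping[order] = np.arange(len(order))
    return mapping[labels]


# Toy values for illustration only (not taken from consumption.csv)
labels = np.array([2, 0, 1, 0, 2])
centers = np.array([
    [10000.0],
    [37000.0],
    [20000.0],
])

new_labels = relabel_by_center(labels, centers, column=0)
print(new_labels)  # [1 2 0 2 1]
```

With the fitted model from the example code, the call would presumably be `relabel_by_center(y_kmeans, kmeans.cluster_centers_, column=1)`, which ranks clusters by the per capita consumption level so that the renumbered cluster 0 is always the highest-consumption group.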