```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, auc
import seaborn as sns
import matplotlib.pyplot as plt

# Load the data
data = pd.read_excel('E:/桌面/预测脆弱性/20230523/预测样本/预测样本.xlsx')

# Split into training and validation sets
train_data = data.sample(frac=0.8, random_state=1)
test_data = data.drop(train_data.index)

# Define the feature columns and the target column
features = ['高程', '起伏度', '桥梁长', '道路长', '平均坡度', '平均地温', 'T小于0', '相态']
target = '交通风险'

# Train the random forest model
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(train_data[features], train_data[target])

# Predict on the validation set and compute precision, recall, F1, and related metrics
pred = rf.predict(test_data[features])
accuracy = accuracy_score(test_data[target], pred)
confusion_mat = confusion_matrix(test_data[target], pred)
classification_rep = classification_report(test_data[target], pred)
print('Accuracy:', accuracy)
print('Confusion matrix:')
print(confusion_mat)
print('Classification report:')
print(classification_rep)

# Plot the confusion matrix
sns.heatmap(confusion_mat, annot=True, cmap="Blues")
plt.show()

# Load the new data file and predict
new_data = pd.read_excel('E:/桌面/预测脆弱性/20230523/预测样本/预测结果/交通风险预测096.xlsx')
new_pred = rf.predict(new_data[features])
new_data['交通风险预测结果'] = new_pred
new_data.to_excel('E:/桌面/预测脆弱性/20230523/预测样本/预测结果/交通风险预测096结果.xlsx', index=False)
```

Output the confusion matrix image and the per-class precision.

Posted: 2024-02-18 19:59:01 Views: 56

import pandas as pd.docx

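Based on the provided document, the key points can be summarized as follows:

### I. Basic Pandas usage

#### 1. Importing Pandas and other libraries

```python
import pandas as pd
import numpy as np
```

- **Pandas**: a powerful library for data analysis and manipulation.
- **NumPy**: a Python library for numerical computation.

#### 2. Reading a CSV file into a DataFrame

```python
wine_data = pd.read_csv('/content/Wine_Dataset.csv')
```

- **pd.read_csv()**: reads data from a CSV file and creates a DataFrame object.
- **Path**: `/content/Wine_Dataset.csv` is the location of the CSV file.

#### 3. Showing the first rows of a DataFrame

```python
print(wine_data.head())
```

- **DataFrame.head()**: shows the first five rows of the DataFrame by default.
- **Purpose**: a quick check of the overall structure of the data.

#### 4. Getting descriptive statistics for a DataFrame

```python
print(wine_data.describe())
```

- **DataFrame.describe()**: returns descriptive statistics for the dataset, including count, mean, and standard deviation.

#### 5. Checking for missing values

```python
print(wine_data.isnull().sum())
```

- **DataFrame.isnull()**: checks whether each entry in the DataFrame is NaN.
- **DataFrame.sum()**: sums the resulting boolean flags, giving the number of missing values per column.

### II. Data visualization and analysis

#### 1. Computing the correlation matrix

```python
correlation_matrix = wine_data.corr()
```

- **DataFrame.corr()**: computes the pairwise correlation coefficients between the columns of the DataFrame.

#### 2. Visualizing the correlation matrix

```python
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.show()
```

- **Seaborn**: a Python data visualization library built on Matplotlib.
- **Heatmap**: a chart for displaying two-dimensional data, used here to show the correlation matrix.

#### 3. Outlier detection (Z-score method)

```python
from scipy import stats

z_scores = np.abs(stats.zscore(wine_data['alcohol']))
threshold = 3
# np.where returns a tuple of index arrays; [0] selects the array for the single dimension
outlier_indices = np.where(z_scores > threshold)[0]
```

- **scipy.stats.zscore()**: computes the standard score of each value.
- **Threshold**: set to 3 here, so any value whose absolute standard score exceeds 3 is treated as an outlier.
- **np.where()**: returns the indices at which the condition holds.

#### 4. Visualizing the outliers

```python
plt.scatter(range(len(wine_data['alcohol'])), wine_data['alcohol'])
plt.scatter(outlier_indices, wine_data['alcohol'].iloc[outlier_indices], color='r', label='Outliers')
plt.legend()  # needed for the 'Outliers' label to appear
plt.show()
```

- **Matplotlib**: a Python plotting library.
- **Scatter plot**: shows alcohol content against row index and highlights the outliers in red.

### III. Machine learning applications

#### 1. K-Means clustering

```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)
wine_data['segment'] = kmeans.fit_predict(wine_data)
```

- **KMeans**: a widely used clustering algorithm.
- **n_clusters**: the number of clusters, set to 3 here.
- **fit_predict()**: fits the clustering and returns the cluster label of each sample.

#### 2. Visualizing the clustering result

```python
plt.scatter(wine_data['alcohol'], wine_data['sulphates'], c=wine_data['segment'], cmap='viridis')
```

- **Scatter plot**: shows the data points by alcohol content and sulphates, colored by cluster label.

#### 3. Random forest regression

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X = wine_data.drop('alcohol', axis=1)
y = wine_data['alcohol']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
```

- **RandomForestRegressor**: an ensemble learning method for regression tasks.
- **train_test_split()**: splits the dataset into a training set and a test set.
- **mean_squared_error()**: computes the mean squared error between the predictions and the true values.

The above covers the main points of the given code snippets: how to process and analyze data with Pandas, how to visualize it with Seaborn and Matplotlib, and how to carry out simple machine learning tasks with Scikit-Learn. These skills are central to data analysis and to building predictive models.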
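The Z-score outlier rule described above can also be sketched without SciPy. This is a minimal illustration on made-up readings rather than the wine data; the function name `zscore_outliers` and the sample values are hypothetical. It uses the population standard deviation (`ddof=0`), which is also the default in `scipy.stats.zscore`.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Return the positions whose absolute Z-score exceeds `threshold`."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return np.array([], dtype=int)
    z_scores = np.abs((values - values.mean()) / std)
    return np.flatnonzero(z_scores > threshold)


# Hypothetical sample: twenty identical readings and one extreme value.
readings = np.array([10.0] * 20 + [50.0])
outliers = zscore_outliers(readings)
print("outlier positions:", outliers.tolist())
```

Note that with very few samples a single extreme value cannot reach a Z-score of 3, so the threshold is only meaningful on reasonably sized columns.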

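Sure. You want to output the confusion matrix image and the per-class precision. The code already computes the confusion matrix and the classification report, so both can be produced directly from those results:

```
# Plot the confusion matrix (fmt="d" prints the counts as integers)
sns.heatmap(confusion_mat, annot=True, fmt="d", cmap="Blues")
plt.show()

# Print the per-class precision, recall, and F1
print("Per-class metrics:")
print(classification_rep)
```

Add this snippet to the original code; running it displays the confusion matrix plot and prints the per-class metrics.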
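If the per-class figures are needed as numbers rather than as a printed report, precision and recall can be read straight off the confusion matrix. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predicted classes). The helper `per_class_metrics` and the 3×3 matrix are made up for illustration and are not results from the data above.

```python
import numpy as np


def per_class_metrics(cm):
    """Return (precision, recall) arrays, one entry per class.

    Assumes rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    true_pos = np.diag(cm)
    predicted_totals = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual_totals = cm.sum(axis=1)     # row sums: how often each class actually occurs
    # Classes that were never predicted (or never occur) get 0 instead of a division error.
    precision = np.divide(true_pos, predicted_totals,
                          out=np.zeros_like(true_pos), where=predicted_totals > 0)
    recall = np.divide(true_pos, actual_totals,
                       out=np.zeros_like(true_pos), where=actual_totals > 0)
    return precision, recall


# Hypothetical confusion matrix for three classes.
example_cm = [[8, 2, 0],
              [1, 6, 3],
              [0, 1, 9]]

precision, recall = per_class_metrics(example_cm)
overall_accuracy = np.trace(np.asarray(example_cm)) / np.sum(example_cm)

for label, (p, r) in enumerate(zip(precision, recall)):
    print(f"class {label}: precision={p:.3f}, recall={r:.3f}")
print(f"overall accuracy={overall_accuracy:.3f}")
```

In the question's code, the same function could be called as `per_class_metrics(confusion_mat)`. Per-class recall is the quantity sometimes reported as "per-class accuracy".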
