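1. Use a random forest to compute the training-set and test-set prediction accuracy on the Titanic dataset. 2. Plot the feature importances, ranked from most to least important. 3. Plot the random forest convergence curve. 4. List the code.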

1. The code for computing the training-set and test-set prediction accuracy on the Titanic dataset with a random forest is as follows:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the data
data = pd.read_csv('titanic.csv')

# Select features and label; .copy() avoids pandas' SettingWithCopyWarning
# when the columns are modified below
features = data[['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Sex', 'Embarked']].copy()
target = data['Survived']

# Feature processing: fill missing values and encode categorical columns
features['Age'] = features['Age'].fillna(features['Age'].mean())
features['Fare'] = features['Fare'].fillna(features['Fare'].mean())
features['Sex'] = features['Sex'].map({'male': 0, 'female': 1})
features['Embarked'] = features['Embarked'].fillna('S')
features['Embarked'] = features['Embarked'].map({'C': 0, 'Q': 1, 'S': 2})

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42)

# Random forest model
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(x_train, y_train)

# Predictions
y_train_pred = rf.predict(x_train)
y_test_pred = rf.predict(x_test)

# Accuracy
train_acc = accuracy_score(y_train, y_train_pred)
test_acc = accuracy_score(y_test, y_test_pred)
print("Training accuracy:", train_acc)
print("Test accuracy:", test_acc)
```

2. The code for plotting the feature-importance ranking is as follows:

```python
import numpy as np
import matplotlib.pyplot as plt

# Get the feature importances and sort them in descending order
importances = rf.feature_importances_
indices = np.argsort(importances)[::-1]

# Plot the ranked feature importances
plt.figure()
plt.title("Feature importances")
plt.bar(range(features.shape[1]), importances[indices], color="r", align="center")
plt.xticks(range(features.shape[1]), features.columns[indices], rotation=90)
plt.xlim([-1, features.shape[1]])
plt.show()
```

3. The code for plotting the random forest convergence curve is as follows:

```python
# Convergence: training and test accuracy as the number of trees grows
n_estimators = [1, 2, 4, 8, 16, 32, 64, 100, 200]
train_accs = []
test_accs = []
for n_estimator in n_estimators:
    # Use a separate name so the fitted `rf` from step 1 is not overwritten
    model = RandomForestClassifier(n_estimators=n_estimator, max_depth=5, random_state=42)
    model.fit(x_train, y_train)
    train_accs.append(model.score(x_train, y_train))
    test_accs.append(model.score(x_test, y_test))

plt.figure()
plt.plot(n_estimators, train_accs, '-o', label='train')
plt.plot(n_estimators, test_accs, '-o', label='test')
plt.xlabel('n_estimators')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```

4. The complete code is as follows:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the data
data = pd.read_csv('titanic.csv')

# Select features and label; .copy() avoids pandas' SettingWithCopyWarning
# when the columns are modified below
features = data[['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Sex', 'Embarked']].copy()
target = data['Survived']

# Feature processing: fill missing values and encode categorical columns
features['Age'] = features['Age'].fillna(features['Age'].mean())
features['Fare'] = features['Fare'].fillna(features['Fare'].mean())
features['Sex'] = features['Sex'].map({'male': 0, 'female': 1})
features['Embarked'] = features['Embarked'].fillna('S')
features['Embarked'] = features['Embarked'].map({'C': 0, 'Q': 1, 'S': 2})

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42)

# Random forest model
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(x_train, y_train)

# Predictions
y_train_pred = rf.predict(x_train)
y_test_pred = rf.predict(x_test)

# Accuracy
train_acc = accuracy_score(y_train, y_train_pred)
test_acc = accuracy_score(y_test, y_test_pred)
print("Training accuracy:", train_acc)
print("Test accuracy:", test_acc)

# Get the feature importances and sort them in descending order
importances = rf.feature_importances_
indices = np.argsort(importances)[::-1]

# Plot the ranked feature importances
plt.figure()
plt.title("Feature importances")
plt.bar(range(features.shape[1]), importances[indices], color="r", align="center")
plt.xticks(range(features.shape[1]), features.columns[indices], rotation=90)
plt.xlim([-1, features.shape[1]])
plt.show()

# Convergence: training and test accuracy as the number of trees grows
n_estimators = [1, 2, 4, 8, 16, 32, 64, 100, 200]
train_accs = []
test_accs = []
for n_estimator in n_estimators:
    # Use a separate name so the fitted `rf` above is not overwritten
    model = RandomForestClassifier(n_estimators=n_estimator, max_depth=5, random_state=42)
    model.fit(x_train, y_train)
    train_accs.append(model.score(x_train, y_train))
    test_accs.append(model.score(x_test, y_test))

plt.figure()
plt.plot(n_estimators, train_accs, '-o', label='train')
plt.plot(n_estimators, test_accs, '-o', label='test')
plt.xlabel('n_estimators')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```
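A random forest's convergence is also often read from its out-of-bag (OOB) error rather than from train/test accuracy. The following is a minimal sketch of that alternative, not part of the answer's code. It assumes synthetic data from `make_classification` as a stand-in for the preprocessed Titanic features (so it runs without `titanic.csv`), and the names `forest`, `tree_counts`, and `oob_errors` are chosen here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumption: synthetic stand-in for the 7 preprocessed Titanic features
# with a binary target, so the sketch is self-contained.
X, y = make_classification(n_samples=800, n_features=7, n_informative=4,
                           n_redundant=1, random_state=42)

# warm_start=True keeps the trees already grown and only adds new ones
# each time n_estimators is raised and fit() is called again.
forest = RandomForestClassifier(max_depth=5, oob_score=True,
                                warm_start=True, random_state=42)

tree_counts = list(range(20, 201, 20))
oob_errors = []
for n_trees in tree_counts:
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)
    # oob_score_ is the accuracy on samples left out of each tree's bootstrap
    oob_errors.append(1.0 - forest.oob_score_)

for n_trees, err in zip(tree_counts, oob_errors):
    print(f"{n_trees:>3} trees: OOB error = {err:.3f}")
```

To apply this to the Titanic data, replace `X, y` with `x_train, y_train` and pass `tree_counts` and `oob_errors` to `plt.plot` in the same way as the accuracy curves. The curve typically flattens once adding trees no longer changes the OOB error noticeably.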

