Write a high-accuracy Kaggle Titanic survivor prediction script in Python
Date: 2024-05-10 08:17:11 Views: 211
Sure. Here is an example Python script for predicting survivors in the Kaggle Titanic competition:
```python
# Import the required libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')


# Preprocessing
def data_preprocessing(data):
    data = data.copy()  # leave the caller's DataFrame untouched
    # Fill missing values
    data['Age'] = data['Age'].fillna(data['Age'].median())
    data['Fare'] = data['Fare'].fillna(data['Fare'].median())
    data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])
    # Feature engineering
    data['FamilySize'] = data['SibSp'] + data['Parch'] + 1
    data['IsAlone'] = (data['FamilySize'] == 1).astype(int)
    # Names look like "Braund, Mr. Owen Harris"; keep the title between ", " and "."
    data['Title'] = data['Name'].str.split(", ", expand=True)[1].str.split(".", expand=True)[0]
    # Feature encoding
    for column in ['Sex', 'Embarked', 'Title']:
        data[column] = LabelEncoder().fit_transform(data[column])
    # Feature selection
    features = ['Pclass', 'Sex', 'Age', 'Fare', 'Embarked', 'FamilySize', 'IsAlone', 'Title']
    return data[features]


# test.csv has no 'Survived' column, so the labels come from the training set only
y = train_data['Survived']
# Preprocess train and test together so that every category (notably Title)
# gets the same integer code in both sets. The fill values are therefore
# computed over both files.
combined = pd.concat([train_data.drop(columns='Survived'), test_data], ignore_index=True)
X_all = data_preprocessing(combined)
X = X_all.iloc[:len(train_data)]
X_test = X_all.iloc[len(train_data):]

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model (the default of 100 iterations may not converge on unscaled features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on the validation set
y_pred = model.predict(X_val)

# Compute the accuracy
accuracy = accuracy_score(y_val, y_pred)
print('Validation accuracy:', accuracy)

# Predict on the test set
y_test_pred = model.predict(X_test)

# Save the results
submission = pd.DataFrame({
    "PassengerId": test_data["PassengerId"],
    "Survived": y_test_pred
})
submission.to_csv('submission.csv', index=False)
```
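This code uses logistic regression as the classifier. Preprocessing covers missing-value imputation, feature engineering, feature encoding, and feature selection. The script then prints the accuracy on a held-out validation split, which is the number to check before calling the model accurate, and writes the test-set predictions to submission.csv. Save the code to a .py file and run it from the command line, with train.csv and test.csv in the working directory.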
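A single validation split can give a noisy accuracy figure, so one possible variation is to wrap the same kind of steps (imputation, encoding, logistic regression) in a scikit-learn `Pipeline` and score it with cross-validation. The sketch below is not part of the answer's script: it swaps label encoding for one-hot encoding, adds feature scaling, and runs on synthetic rows produced by a hypothetical helper, `make_toy_passengers`, so that it works without the Kaggle files. The accuracy it prints says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_passengers(n=200, seed=0):
    """Hypothetical stand-in for train.csv: synthetic rows, not real Titanic data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'Pclass': rng.integers(1, 4, n),
        'Sex': rng.choice(['male', 'female'], n),
        'Age': rng.uniform(1, 70, n),
        'Fare': rng.uniform(5, 100, n),
        'Embarked': rng.choice(['S', 'C', 'Q'], n),
    })
    # Blank out some values so the imputers have work to do
    df.loc[rng.random(n) < 0.2, 'Age'] = np.nan
    df.loc[rng.random(n) < 0.05, 'Embarked'] = np.nan
    # Synthetic label loosely tied to Sex, only so the model has some signal to find
    df['Survived'] = ((df['Sex'] == 'female') ^ (rng.random(n) < 0.2)).astype(int)
    return df


numeric = ['Pclass', 'Age', 'Fare']
categorical = ['Sex', 'Embarked']

# Imputation and encoding are fitted inside each fold, on that fold's training rows only
preprocess = ColumnTransformer([
    ('num', Pipeline([
        ('impute', SimpleImputer(strategy='median')),
        ('scale', StandardScaler()),
    ]), numeric),
    ('cat', Pipeline([
        ('impute', SimpleImputer(strategy='most_frequent')),
        ('onehot', OneHotEncoder(handle_unknown='ignore')),
    ]), categorical),
])

clf = Pipeline([
    ('preprocess', preprocess),
    ('model', LogisticRegression(max_iter=1000)),
])

df = make_toy_passengers()
X, y = df[numeric + categorical], df['Survived']

# 5-fold cross-validated accuracy
scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
print('Fold accuracies:', scores.round(3))
print('Mean accuracy:', scores.mean().round(3))
```

To try this on the competition data, replace the `make_toy_passengers()` call with `pd.read_csv('train.csv')`. Because the preprocessing lives inside the pipeline, calling `clf.fit(X, y)` and then `clf.predict(...)` on the test rows reuses the fill values and category codes learned from the training data.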