Cleaning the Titanic dataset with Python
Time: 2023-11-05 15:10:40
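The following example cleans the Titanic dataset with pandas. It assumes a file named `titanic.csv` in the working directory with the usual Kaggle column names: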
```python
import pandas as pd
# Read the data file
titanic = pd.read_csv('titanic.csv')
# Drop columns that are not used as features
titanic = titanic.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1)
# Fill missing values: median for Age, most frequent value for Embarked
titanic['Age'] = titanic['Age'].fillna(titanic['Age'].median())
titanic['Embarked'] = titanic['Embarked'].fillna(titanic['Embarked'].mode()[0])
# Convert the categorical variables to numeric codes
titanic['Sex'] = titanic['Sex'].map({'female': 0, 'male': 1}).astype(int)
embarked_mapping = {'S': 0, 'C': 1, 'Q': 2}
titanic['Embarked'] = titanic['Embarked'].map(embarked_mapping).astype(int)
# Inspect the first rows of the cleaned data
print(titanic.head())
```
Running the code prints:
```
   Survived  Pclass  Sex   Age  SibSp  Parch     Fare  Embarked
0         0       3    1  22.0      1      0   7.2500         0
1         1       1    0  38.0      1      0  71.2833         1
2         1       3    0  26.0      0      0   7.9250         0
3         1       1    0  35.0      1      0  53.1000         0
4         0       3    1  35.0      0      0   8.0500         0
```
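To check that the cleaning steps leave no missing values, the same logic can be wrapped in a function and run on a small in-memory sample. This is only a sketch: the function name `clean_titanic` and the sample rows in `SAMPLE_CSV` are hypothetical and made up for illustration, they are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows in the Kaggle Titanic column layout.
# The values are invented for illustration, not real passenger records.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Passenger A,male,22,1,0,T1,7.25,,S
2,1,1,Passenger B,female,,1,0,T2,71.28,C85,C
3,1,3,Passenger C,female,26,0,0,T3,7.92,,
4,0,3,Passenger D,male,30,0,0,T4,8.05,,S
"""


def clean_titanic(df):
    """Apply the same cleaning steps as above and return a new DataFrame."""
    # drop() returns a new DataFrame, so the input is left unchanged
    df = df.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'])
    # Fill missing Age with the median, missing Embarked with the mode
    df['Age'] = df['Age'].fillna(df['Age'].median())
    df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
    # Encode the categorical columns as integers
    df['Sex'] = df['Sex'].map({'female': 0, 'male': 1}).astype(int)
    df['Embarked'] = df['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)
    return df


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
cleaned = clean_titanic(raw)

# Missing values per column, before and after cleaning
print(raw.isna().sum())
print(cleaned.isna().sum())
```

Note that `map` yields `NaN` for any value missing from the mapping dictionary, and the following `astype(int)` would then raise an error, so an unexpected category in `Sex` or `Embarked` fails loudly instead of passing through silently.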