(1) Classify the Titanic dataset using the KNN algorithm (2) Test the trained model on the test set and print the accuracy
Posted: 2023-12-14 17:38:56 Views: 68
Sure, here is a walkthrough of classifying the Titanic dataset with the KNN algorithm.
First, import the required libraries and load the dataset:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load the dataset (assumes titanic.csv is in the working directory)
data = pd.read_csv('titanic.csv')
```
Next, preprocess the dataset: fill in missing values, select features, and normalize the features. Only a handful of the more important features are used here:
```python
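# Fill missing Age values with the median age
data['Age'] = data['Age'].fillna(data['Age'].median())
# Fill missing Embarked values with 'S'
data['Embarked'] = data['Embarked'].fillna('S')
# Keep the label (Survived) plus the selected features; .copy() avoids SettingWithCopyWarning
data = data[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']].copy()
# Encode the categorical columns as integers
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})
data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})
# Min-max normalize the feature columns only, so the Survived label stays an integer 0/1
features = data.columns.drop('Survived')
data[features] = (data[features] - data[features].min()) / (data[features].max() - data[features].min())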
```
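To make the normalization step concrete, here is a minimal sketch of the same min-max formula, `(x - min) / (max - min)`, written in plain Python. The function name `min_max_scale` and the toy values are illustrative assumptions, not part of the original answer or of any library. Scaling matters for KNN because it is distance-based: without it, a wide-ranged column such as `Fare` would dominate the distance.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has zero range; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy values for illustration only, not rows taken from the dataset
ages = [22.0, 38.0, 26.0, 35.0]
fares = [7.25, 71.28, 7.92, 53.10]

scaled_ages = min_max_scale(ages)
scaled_fares = min_max_scale(fares)
print(scaled_ages)
print(scaled_fares)
```

After scaling, the smallest value in each column becomes 0 and the largest becomes 1, so both columns contribute on a comparable scale.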
Then split the dataset into a training set and a test set, and train the KNN classifier:
```python
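# Split the data: 80% for training, 20% for testing (random_state fixed for reproducibility)
train, test = train_test_split(data, test_size=0.2, random_state=42)
# Train the KNN classifier (default n_neighbors=5); column 0 is the label, the rest are features
knn = KNeighborsClassifier()
knn.fit(train.iloc[:, 1:], train.iloc[:, 0])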
```
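For intuition about what the classifier does at prediction time, the following is a rough from-scratch sketch of k-nearest-neighbour voting with Euclidean distance. It is not scikit-learn's implementation (which uses optimized neighbour search and its own tie-breaking); `knn_predict` and the toy points are hypothetical names and data chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, already-normalized 2-D points forming two clusters (illustration only)
train_X = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2), (0.9, 1.0), (1.0, 0.9), (0.8, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]

pred_low = knn_predict(train_X, train_y, (0.15, 0.1), k=3)
pred_high = knn_predict(train_X, train_y, (0.95, 0.9), k=3)
print(pred_low, pred_high)
```

Note that "training" a KNN model amounts to storing the training data; the real work happens when distances are computed for each query.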
Finally, test the trained model on the test set and print the accuracy:
```python
# Evaluate the model on the held-out test set
predict = knn.predict(test.iloc[:, 1:])
accuracy = accuracy_score(test.iloc[:, 0], predict)
print('Accuracy:', accuracy)
```
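Accuracy here is simply the fraction of test samples whose predicted label matches the true label. As a small sketch of that calculation, the hypothetical helper below (`simple_accuracy`, with made-up labels) reproduces the arithmetic by hand:

```python
def simple_accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count positions where the prediction matches the true label
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration: 4 of the 5 predictions are correct
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0]

score = simple_accuracy(y_true, y_pred)
print('Accuracy:', score)
```

With 4 of 5 predictions correct, the score is 0.8.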
The complete code is as follows:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
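# Load the dataset (assumes titanic.csv is in the working directory)
data = pd.read_csv('titanic.csv')
# Fill missing Age values with the median age
data['Age'] = data['Age'].fillna(data['Age'].median())
# Fill missing Embarked values with 'S'
data['Embarked'] = data['Embarked'].fillna('S')
# Keep the label (Survived) plus the selected features; .copy() avoids SettingWithCopyWarning
data = data[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']].copy()
# Encode the categorical columns as integers
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})
data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2})
# Min-max normalize the feature columns only, so the Survived label stays an integer 0/1
features = data.columns.drop('Survived')
data[features] = (data[features] - data[features].min()) / (data[features].max() - data[features].min())
# Split the data: 80% for training, 20% for testing (random_state fixed for reproducibility)
train, test = train_test_split(data, test_size=0.2, random_state=42)
# Train the KNN classifier (default n_neighbors=5); column 0 is the label, the rest are features
knn = KNeighborsClassifier()
knn.fit(train.iloc[:, 1:], train.iloc[:, 0])
# Evaluate the model on the held-out test set
predict = knn.predict(test.iloc[:, 1:])
accuracy = accuracy_score(test.iloc[:, 0], predict)
print('Accuracy:', accuracy)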
```
Hope this helps!