Logistic Regression and k-Nearest Neighbors: A Worked Example
Date: 2023-11-18 10:44:03
Suppose we have a binary classification problem. We can solve it with either logistic regression or the k-nearest neighbors (kNN) algorithm. Below we use each algorithm in turn to predict whether a person will buy insurance; the code expects a local `insurance.csv` file with columns such as `age`, `sex`, `smoker`, and `bought_insurance`. A dataset can be downloaded here: https://www.kaggle.com/sagnik1511/titanic-dataset-complete.
First, we use logistic regression to make the prediction.
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the data
data = pd.read_csv('insurance.csv')
# Preprocessing: drop unused columns and encode categorical values as 0/1
data = data.drop(['id', 'region'], axis=1)
data['sex'] = data['sex'].map({'female': 0, 'male': 1})
data['smoker'] = data['smoker'].map({'no': 0, 'yes': 1})
data['bought_insurance'] = data['bought_insurance'].map({'no': 0, 'yes': 1})
# Separate features and label
X = data.drop(['bought_insurance'], axis=1)
y = data['bought_insurance']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build the logistic regression model (max_iter raised to avoid convergence warnings)
model = LogisticRegression(max_iter=1000)
# Train the model
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Compute accuracy
acc = accuracy_score(y_test, y_pred)
print('Logistic regression accuracy:', acc)
```
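If `insurance.csv` is not at hand, the same workflow can be tried end to end on synthetic data. The sketch below is an illustration, not the original dataset: `make_classification` stands in for the insurance table, and it additionally shows `predict_proba`, which returns the model's estimated class probabilities rather than just hard labels.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the insurance data: 4 numeric features, binary label
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)

# predict_proba gives [P(class 0), P(class 1)] per sample; rows sum to 1
proba = model.predict_proba(X_test[:3])
print('accuracy:', acc)
print('probabilities:\n', proba)
```

The probabilities are useful when you care about a decision threshold other than 0.5, e.g. flagging only customers with a high predicted chance of buying.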
Next, we use the k-nearest neighbors algorithm to make the same prediction.
```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the data
data = pd.read_csv('insurance.csv')
# Preprocessing: drop unused columns and encode categorical values as 0/1
data = data.drop(['id', 'region'], axis=1)
data['sex'] = data['sex'].map({'female': 0, 'male': 1})
data['smoker'] = data['smoker'].map({'no': 0, 'yes': 1})
data['bought_insurance'] = data['bought_insurance'].map({'no': 0, 'yes': 1})
# Separate features and label
X = data.drop(['bought_insurance'], axis=1)
y = data['bought_insurance']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build the k-nearest neighbors model with k = 3
model = KNeighborsClassifier(n_neighbors=3)
# Train the model
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Compute accuracy
acc = accuracy_score(y_test, y_pred)
print('k-nearest neighbors accuracy:', acc)
```
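One caveat worth noting: kNN is distance-based, so features on very different scales (say, age in years next to 0/1 smoker flags) can dominate the distance metric. A common remedy is to standardize features inside a `Pipeline`, so the scaler is fit on the training split only. The sketch below uses synthetic data as an assumption, with one feature's scale deliberately exaggerated:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)
# Exaggerate one feature's scale to mimic mixed units (e.g. income vs. flags)
X[:, 0] *= 1000
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Pipeline: StandardScaler is fit on the training split, then applied to test
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_scaled.fit(X_train, y_train)
acc_scaled = accuracy_score(y_test, knn_scaled.predict(X_test))

# Same model without scaling, for comparison
knn_raw = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
acc_raw = accuracy_score(y_test, knn_raw.predict(X_test))
print('scaled:', acc_scaled, 'raw:', acc_raw)
```

On data with mixed units, the scaled pipeline usually matches or beats the raw one; logistic regression is less sensitive to this, though scaling still helps its optimizer converge.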
As the output shows, logistic regression achieves an accuracy of 0.833 on this data, versus 0.75 for k-nearest neighbors.
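Keep in mind that this comparison rests on a single train/test split and a fixed k = 3, so the gap may not generalize. Cross-validating over several candidate k values gives a more robust picture; here is an illustrative sketch on synthetic data (an assumption, not the original file):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)

# Mean 5-fold cross-validated accuracy for each candidate k
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(scores)
print('best k =', best_k)
```

The same idea (e.g. via `GridSearchCV`) applies to logistic regression's regularization strength `C`, making the head-to-head comparison fairer.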