Writing code for insurance risk prediction based on logistic regression
Date: 2023-07-14 13:14:25
First, import the required libraries:
```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
```
Next, read the dataset and preprocess it:
```python
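data = pd.read_csv('insurance.csv')
# Fill missing values in numeric columns with the column mean
# (numeric_only=True skips the string columns, which mean() cannot average)
data.fillna(data.mean(numeric_only=True), inplace=True)
# Encode the categorical variables as numbers
data['sex'] = data['sex'].apply(lambda x: 1 if x == 'male' else 0)
data['smoker'] = data['smoker'].apply(lambda x: 1 if x == 'yes' else 0)
data['region'] = data['region'].map({'northeast': 0, 'northwest': 1, 'southeast': 2, 'southwest': 3})
# Split features and label
X = data.drop(['charges'], axis=1)
# LogisticRegression is a classifier and rejects a continuous target such as charges.
# Assumption: treat charges above the median as high risk (1), the rest as low risk (0).
y = (data['charges'] > data['charges'].median()).astype(int)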
```
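The integer codes assigned to `region` above imply an ordering (northeast < northwest < southeast < southwest) that a linear model will take literally. One common alternative is one-hot encoding. The following is a minimal sketch, assuming a small made-up frame in place of `insurance.csv`; the rows and the `sample` and `encoded` names are illustrative only.

```python
import pandas as pd

# Hypothetical rows standing in for insurance.csv; the values are made up.
sample = pd.DataFrame({
    'age': [19, 33, 45, 52],
    'region': ['southwest', 'northeast', 'southeast', 'northwest'],
})

# One 0/1 indicator column per region, so no artificial ordering is implied.
encoded = pd.get_dummies(sample, columns=['region'], prefix='region', dtype=int)
print(encoded.columns.tolist())
```

With this encoding, any new record passed to the model would need the same indicator columns instead of a single `region` code.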
Then split the data into a training set and a test set:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
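If the classes are imbalanced, a plain random split can leave the test set with a different class ratio than the training set. A hedged sketch of a stratified split follows, using the `stratify` parameter of `train_test_split`; the 80/20 labels and the `X_demo`/`y_demo` names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 low-risk (0) and 20 high-risk (1) records.
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo keeps the 80/20 class ratio in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
print('high-risk share in train:', y_tr.mean())
print('high-risk share in test:', y_te.mean())
```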
Next, train a logistic regression model and predict on the test set:
```python
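# Initialize the model; max_iter is raised because the unscaled features
# may not converge within the default 100 iterations
lr = LogisticRegression(max_iter=1000)
# Train the model
lr.fit(X_train, y_train)
# Predict on the test set
y_pred = lr.predict(X_test)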
```
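Logistic regression tends to converge more reliably when the features share a similar scale. One possible approach, sketched here on synthetic data, is to chain `StandardScaler` and `LogisticRegression` in a pipeline. The feature values and the labelling rule below are assumptions made only so the demo has something to learn; they do not come from the insurance dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three insurance features (age, bmi, smoker); values are made up.
rng = np.random.default_rng(0)
age = rng.integers(18, 65, size=200)
bmi = rng.normal(30, 6, size=200)
smoker = rng.integers(0, 2, size=200)
X_demo = np.column_stack([age, bmi, smoker])
# Hypothetical labelling rule, used only for this demonstration.
y_demo = ((smoker == 1) | (age > 50)).astype(int)

# Standardize each feature, then fit logistic regression on the scaled values.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_demo, y_demo)
train_accuracy = model.score(X_demo, y_demo)
print('Training accuracy: {:.2f}'.format(train_accuracy))
```

Because the scaler lives inside the pipeline, it is fitted on the training data only and reapplied automatically at prediction time.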
Several metrics can be used to evaluate the model's performance:
```python
# Compute the evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
# Print the evaluation metrics
print('Accuracy: {:.2f}'.format(accuracy))
print('Precision: {:.2f}'.format(precision))
print('Recall: {:.2f}'.format(recall))
print('F1 Score: {:.2f}'.format(f1))
```
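To make the four metrics concrete, the sketch below computes them by hand from confusion-matrix counts for the positive class. The ten labels are made up. The `average='weighted'` option used above instead averages such per-class values, weighted by how many samples each class has.

```python
# Made-up labels for ten policyholders: 1 = high risk, 0 = low risk.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_hat = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_hat))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 4))
```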
Finally, use the model to make a prediction for new data:
```python
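# Predict for a new record (encoded the same way as the training data)
new_data = pd.DataFrame({
    'age': [30],
    'sex': [1],
    'bmi': [25],
    'children': [1],
    'smoker': [0],
    'region': [2]
})
# predict returns a class label, not an insurance cost
prediction = lr.predict(new_data)[0]
print('Predicted risk class: {}'.format(prediction))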
```
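A class label alone hides how confident the model is. For a binary problem, scikit-learn's `predict_proba` returns the sigmoid of the linear score given by `decision_function`. The sketch below checks that relationship on a tiny made-up dataset; `X_demo`, `y_demo`, and `clf` are illustrative names, not part of the code above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: one feature, binary label.
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X_demo, y_demo)

x_new = np.array([[5.0]])
z = clf.decision_function(x_new)            # linear score w*x + b
p_manual = 1.0 / (1.0 + np.exp(-z))         # sigmoid of the score
p_sklearn = clf.predict_proba(x_new)[:, 1]  # probability of class 1

print('P(class 1):', p_sklearn[0])
print('Predicted class:', clf.predict(x_new)[0])
```

In a risk setting, the probability can be compared against a threshold other than 0.5 if false negatives and false positives carry different costs.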