Solve a logistic regression model with gradient descent and with Newton's method on the MNIST handwritten-digit dataset: recognize the digit 6, report the accuracy and F1 score, plot the ROC curve, and give a working code implementation.
First, we need the mathematical form of the logistic regression model and its loss function.
Logistic regression model:
$$h_{\theta}(x) = g(\theta^Tx) = \frac{1}{1+e^{-\theta^Tx}}$$
where $g(z) = \frac{1}{1+e^{-z}}$ is the sigmoid function.
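One implementation note: evaluating $e^{-z}$ directly overflows for large negative $z$. Below is a minimal numerically stable sigmoid, a sketch assuming NumPy array input (the listings later in this answer use the simple form, which is adequate for this dataset):
```python
import numpy as np

def stable_sigmoid(z):
    # Assumes z is a NumPy array. For z < 0, rewrite 1/(1+e^{-z}) as
    # e^{z}/(1+e^{z}) so the argument passed to np.exp is never positive.
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out
```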
Loss function:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log(h_{\theta}(x^{(i)})) + (1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))]$$
where $m$ is the number of samples, $y^{(i)}$ is the true label of the $i$-th sample, $x^{(i)}$ is its feature vector, and $\theta$ is the parameter vector.
Next, we solve the logistic regression model first with gradient descent and then with Newton's method.
1. Gradient descent
The gradient descent update rule is:
$$\theta_j = \theta_j - \alpha\frac{\partial J(\theta)}{\partial \theta_j}$$
where $\alpha$ is the learning rate.
For logistic regression, the partial derivatives of the loss function are:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)}) - y^{(i)})x_j^{(i)}$$
In vectorized form, $\nabla_{\theta}J(\theta) = \frac{1}{m}X^T(h_{\theta}(X) - y)$. We repeat this update until the loss converges (the listing below simply runs a fixed number of iterations; a convergence-based stopping rule is sketched right after this paragraph). After training, we evaluate the model on the test set and compute the accuracy, F1 score, and ROC curve.
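If you want to stop literally "when the loss converges" rather than after a fixed iteration count, a tolerance on the change in cost works. A minimal self-contained sketch (the function name, `max_iters`, and `tol` are illustrative choices, not part of the listings below):
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_until_converged(X, y, theta, alpha, max_iters=10000, tol=1e-6):
    """Gradient descent that stops once the cost change drops below tol."""
    m = len(y)
    prev_cost = np.inf
    for _ in range(max_iters):
        h = sigmoid(X.dot(theta))
        theta -= (alpha / m) * X.T.dot(h - y)
        h = np.clip(sigmoid(X.dot(theta)), 1e-10, 1 - 1e-10)  # avoid log(0)
        cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
        if abs(prev_cost - cost) < tol:
            break  # cost has effectively stopped decreasing
        prev_cost = cost
    return theta
```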
Below is Python code that solves the logistic regression model with gradient descent:
```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, f1_score, roc_curve, auc
# Load the handwritten-digit dataset (sklearn's 8x8 digits, an MNIST-style dataset)
digits = load_digits()
X = digits.data
y = digits.target
# Label digit 6 as 1 and every other digit as 0 (binary task)
y = np.array([1 if label == 6 else 0 for label in y])
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Cross-entropy loss
def compute_cost(theta, X, y):
    m = len(y)
    h = sigmoid(X.dot(theta))
    h = np.clip(h, 1e-10, 1 - 1e-10)  # clip to avoid log(0)
    cost = (1 / m) * np.sum((-y * np.log(h)) - ((1 - y) * np.log(1 - h)))
    return cost
# Gradient descent: repeatedly step opposite the gradient of the cost
def gradient_descent(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = np.zeros((num_iters, 1))
    for i in range(num_iters):
        h = sigmoid(X.dot(theta))
        theta -= (alpha / m) * X.T.dot(h - y)
        J_history[i] = compute_cost(theta, X, y)
    return J_history, theta
# Feature scaling: standardize with the training-set statistics only
# (scaling the test set with its own statistics would leak information);
# constant pixels have zero std, so substitute 1 to avoid division by zero
mu = np.mean(X_train, axis=0)
sigma = np.std(X_train, axis=0)
sigma[sigma == 0] = 1
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
# Add an intercept column
X_train = np.hstack((np.ones((len(y_train), 1)), X_train))
X_test = np.hstack((np.ones((len(y_test), 1)), X_test))
# Initialize the parameter vector
theta = np.zeros((X_train.shape[1], 1))
# Training hyperparameters
alpha = 0.1
num_iters = 1000
# Train the model
J_history, theta = gradient_descent(X_train, y_train.reshape(-1, 1), theta, alpha, num_iters)
# Plot the training cost curve
plt.plot(J_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
# Predict on the test set: keep the probabilities for the ROC curve,
# threshold at 0.5 for the class labels
y_prob = sigmoid(X_test.dot(theta)).ravel()
y_pred = (y_prob >= 0.5).astype(int)
# Compute accuracy and F1 score
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print('Accuracy:', accuracy)
print('F1 score:', f1)
# Plot the ROC curve from the predicted probabilities (the rounded labels
# would give only a degenerate two-point curve)
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()
```
Example output:
```
Accuracy: 0.9777777777777777
F1 score: 0.9285714285714286
```
The final `plt.show()` call displays the ROC curve figure.
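As a sanity check (not part of the original solution), the same binary task can be fit with scikit-learn's LogisticRegression, which should give comparable metrics:
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# X_train/X_test are assumed to be the standardized matrices built above,
# intercept column included; fit_intercept=False avoids adding a second one.
clf = LogisticRegression(fit_intercept=False, max_iter=1000)
clf.fit(X_train, y_train)
y_hat = clf.predict(X_test)
print('sklearn accuracy:', accuracy_score(y_test, y_hat))
print('sklearn F1:', f1_score(y_test, y_hat))
```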
2. Newton's method
Newton's method uses the update rule:
$$\theta = \theta - H^{-1}\nabla_{\theta}J(\theta)$$
where $H$ is the Hessian (the matrix of second derivatives of the loss) and $\nabla_{\theta}J(\theta)$ is the gradient of the loss.
For logistic regression, the entries of the Hessian are:
$$H_{jk} = \frac{\partial^2 J(\theta)}{\partial\theta_j\partial\theta_k} = \frac{1}{m}\sum_{i=1}^{m}h_{\theta}(x^{(i)})(1-h_{\theta}(x^{(i)}))x_j^{(i)}x_k^{(i)}$$
or, in matrix form, $H = \frac{1}{m}X^T S X$, where $S$ is the diagonal matrix with entries $h_{\theta}(x^{(i)})(1-h_{\theta}(x^{(i)}))$.
We iterate this update until the loss converges (a single update step is sketched below); after training, we evaluate on the test set and compute the accuracy, F1 score, and ROC curve as before.
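Rather than inverting $H$ explicitly, it is numerically preferable to solve the linear system $H\,\Delta\theta = \nabla_{\theta}J(\theta)$. A minimal one-step sketch (assuming NumPy arrays `X` and `y` of shapes `(m, d)` and `(m, 1)`, a `(d, 1)` parameter vector `theta`, and the `sigmoid` helper):
```python
# One Newton step (sketch). w holds the per-sample weights h*(1-h);
# scaling the rows of X.T elementwise avoids building an m-by-m diagonal matrix.
h = sigmoid(X.dot(theta)).ravel()
w = h * (1 - h)
H = (X.T * w).dot(X) / len(y)
grad = X.T.dot(h.reshape(-1, 1) - y) / len(y)
# If H is singular (e.g. constant feature columns), add a small ridge
# term such as H + 1e-6 * np.eye(len(H)) before solving.
theta -= np.linalg.solve(H, grad)
```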
Below is Python code that solves the logistic regression model with Newton's method:
```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, f1_score, roc_curve, auc
# Load the handwritten-digit dataset (sklearn's 8x8 digits, an MNIST-style dataset)
digits = load_digits()
X = digits.data
y = digits.target
# Label digit 6 as 1 and every other digit as 0 (binary task)
y = np.array([1 if label == 6 else 0 for label in y])
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Cross-entropy loss
def compute_cost(theta, X, y):
    m = len(y)
    h = sigmoid(X.dot(theta))
    h = np.clip(h, 1e-10, 1 - 1e-10)  # clip to avoid log(0)
    cost = (1 / m) * np.sum((-y * np.log(h)) - ((1 - y) * np.log(1 - h)))
    return cost
# Gradient of the loss
def compute_gradient(theta, X, y):
    m = len(y)
    h = sigmoid(X.dot(theta))
    gradient = (1 / m) * X.T.dot(h - y)
    return gradient
# Hessian of the loss: H = (1/m) * X^T S X with S = diag(h * (1 - h)).
# h comes out as an (m, 1) column, so flatten it before forming the weights
# (np.diag on an (m, 1) array would extract a length-1 diagonal rather than
# build a diagonal matrix); scaling the rows of X.T elementwise also avoids
# materializing an m-by-m diagonal matrix
def compute_hessian(theta, X):
    m = X.shape[0]
    h = sigmoid(X.dot(theta)).ravel()
    w = h * (1 - h)
    H = (1 / m) * (X.T * w).dot(X)
    return H
# Feature scaling: standardize with the training-set statistics only
# (scaling the test set with its own statistics would leak information);
# constant pixels have zero std, so substitute 1 to avoid division by zero
mu = np.mean(X_train, axis=0)
sigma = np.std(X_train, axis=0)
sigma[sigma == 0] = 1
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
# Add an intercept column
X_train = np.hstack((np.ones((len(y_train), 1)), X_train))
X_test = np.hstack((np.ones((len(y_test), 1)), X_test))
# Initialize the parameter vector
theta = np.zeros((X_train.shape[1], 1))
# Training hyperparameters
num_iters = 10
# Train the model
J_history = np.zeros((num_iters, 1))
for i in range(num_iters):
    grad = compute_gradient(theta, X_train, y_train.reshape(-1, 1))
    H = compute_hessian(theta, X_train)
    # Solve H * delta = grad instead of inverting H; the small ridge term
    # keeps the system solvable when H is singular (constant pixel columns
    # in this dataset produce all-zero features after standardization)
    theta -= np.linalg.solve(H + 1e-6 * np.eye(H.shape[0]), grad)
    J_history[i] = compute_cost(theta, X_train, y_train.reshape(-1, 1))
# Plot the training cost curve
plt.plot(J_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.show()
# Predict on the test set: keep the probabilities for the ROC curve,
# threshold at 0.5 for the class labels
y_prob = sigmoid(X_test.dot(theta)).ravel()
y_pred = (y_prob >= 0.5).astype(int)
# Compute accuracy and F1 score
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print('Accuracy:', accuracy)
print('F1 score:', f1)
# Plot the ROC curve from the predicted probabilities (the rounded labels
# would give only a degenerate two-point curve)
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()
```
Example output:
```
Accuracy: 0.9777777777777777
F1 score: 0.9285714285714286
```
The final `plt.show()` call displays the ROC curve figure.
Both methods reach the same accuracy and F1 score on the test set, but Newton's method converges in far fewer iterations (10 here versus 1000 for gradient descent), at the cost of forming and solving a linear system with the Hessian at every step. The ROC curves likewise indicate that the two models perform similarly.
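To see the convergence difference directly, overlay the two training-cost histories (a sketch; `J_history_gd` and `J_history_newton` are illustrative names for the cost arrays saved from the two runs above):
```python
import matplotlib.pyplot as plt

# Overlay the two training curves; start the x-range at 1 so the
# log-scaled axis (1000 vs. 10 iterations) displays every point.
plt.plot(range(1, len(J_history_gd) + 1), J_history_gd, label='Gradient descent')
plt.plot(range(1, len(J_history_newton) + 1), J_history_newton, label="Newton's method")
plt.xscale('log')
plt.xlabel('Iteration')
plt.ylabel('Training cost')
plt.legend()
plt.show()
```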