Implementing binary logistic regression on zhengqi_train.txt, from scratch and with a library, with accuracy
Posted: 2023-07-11 08:47:12 · Views: 136
The Kaggle industrial steam dataset
Below I walk through a from-scratch implementation of binary logistic regression and a library-based one, and compute the accuracy of each.
First, import the necessary libraries:
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
Next, read the dataset and preprocess it. Here I use pandas to load the file, split it into features and labels, and binarize the continuous target into 0 and 1 (thresholding at 0):
```python
# Load the dataset (tab-separated)
data = pd.read_csv('zhengqi_train.txt', sep='\t')
# Split into features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
# Binarize the continuous target: 1 if positive, else 0
y = np.where(y > 0, 1, 0)
```
Next, standardize the features so that each has zero mean and unit variance, using sklearn's StandardScaler class:
```python
from sklearn.preprocessing import StandardScaler
# Standardize the features
scaler = StandardScaler()
X = scaler.fit_transform(X)
```
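As a quick sanity check (on toy data, not the competition file), after `fit_transform` every column should have mean close to 0 and standard deviation close to 1 -- this is a minimal illustrative sketch, not part of the original script:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two toy columns with very different scales
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])
scaled = StandardScaler().fit_transform(toy)
print(scaled.mean(axis=0))  # each column's mean is ~0
print(scaled.std(axis=0))   # each column's std is ~1
```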
Next, train and evaluate the from-scratch binary logistic regression model and compute its accuracy:
```python
class MyLogisticRegression:  # renamed so it does not shadow sklearn's LogisticRegression import
    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.verbose = verbose

    def __add_intercept(self, X):
        # Prepend a column of ones for the bias term
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)

    def __sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def __loss(self, h, y):
        # Binary cross-entropy loss
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

    def fit(self, X, y):
        if self.fit_intercept:
            X = self.__add_intercept(X)
        self.theta = np.zeros(X.shape[1])
        for i in range(self.num_iter):
            # Full-batch gradient descent step
            z = np.dot(X, self.theta)
            h = self.__sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            self.theta -= self.lr * gradient
            if self.verbose and i % 10000 == 0:
                z = np.dot(X, self.theta)
                h = self.__sigmoid(z)
                print(f'loss: {self.__loss(h, y)}')

    def predict_prob(self, X):
        if self.fit_intercept:
            X = self.__add_intercept(X)
        return self.__sigmoid(np.dot(X, self.theta))

    def predict(self, X, threshold=0.5):
        return (self.predict_prob(X) >= threshold).astype(int)

# Train and predict with the from-scratch model
model = MyLogisticRegression()
model.fit(X, y)
y_pred = model.predict(X)
# Compute accuracy (on the training data)
accuracy = accuracy_score(y, y_pred)
print(f"From-scratch accuracy: {accuracy}")
```
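One caveat about the sigmoid above: for large negative `z`, `np.exp(-z)` overflows float64 and NumPy emits a RuntimeWarning (the result still evaluates to ~0, since `1/(1+inf)` is 0). A common workaround, sketched below, is to clip `z` before exponentiating, or to use `scipy.special.expit` (the bound 500 is an arbitrary safe choice, not from the original post):

```python
import numpy as np
from scipy.special import expit

def stable_sigmoid(z):
    # Clip keeps exp() within float64 range; 500 is an arbitrary safe bound
    z = np.clip(z, -500, 500)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-1000.0, 0.0, 1000.0])
print(stable_sigmoid(z))  # close to [0, 0.5, 1], no overflow warning
print(expit(z))           # scipy's built-in numerically stable sigmoid
```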
Finally, train and evaluate with sklearn's LogisticRegression class and compute its accuracy:
```python
# Use sklearn's LogisticRegression, imported under an alias so it cannot
# clash with the from-scratch class defined above
from sklearn.linear_model import LogisticRegression as SkLogisticRegression
clf = SkLogisticRegression()
clf.fit(X, y)
y_pred = clf.predict(X)
# Compute accuracy (on the training data)
accuracy = accuracy_score(y, y_pred)
print(f"Library accuracy: {accuracy}")
```
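Note that both accuracies above are computed on the same data the models were fit on, which tends to be optimistic. A fairer estimate comes from a held-out split. The sketch below uses synthetic data for self-containment; with the real script one would pass `X` and `y` instead, and the resulting numbers would differ from the training-set figures:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for (X, y): 200 samples, 5 features,
# label determined by a linear rule on the first two features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)

# Hold out 20% of the data for evaluation
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)
clf = LogisticRegression().fit(X_tr, y_tr)
test_acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {test_acc:.3f}")
```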
In summary, the from-scratch and library implementations of binary logistic regression reach accuracies of 0.8775 and 0.8883 respectively (both measured on the training data).