Multiclass classification of the iris dataset with logistic regression written using only numpy, with visualization
Time: 2023-06-29 20:05:52
Sure, here is how to do it.
First, load the Iris dataset. The logistic regression model itself uses only numpy; scikit-learn's `load_iris` function is used here only to load the data:
```python
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
```
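`y` holds integer labels 0, 1 and 2 (setosa, versicolor, virginica). Training a softmax model with cross-entropy compares the predicted probability matrix of shape `(n_samples, n_classes)` against a one-hot label matrix of the same shape, so the integer labels need to be converted. A minimal numpy sketch (the helper name `one_hot` is hypothetical, not a library function):

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Convert integer class labels 0..k-1 to a one-hot matrix of shape (n, k)."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[labels]

Y = one_hot(np.array([0, 2, 1, 2]))
print(Y)
```

Indexing `np.eye(k)` with the label array picks one identity row per sample, which avoids an explicit Python loop.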
Next, split the dataset into a training set and a test set (80% / 20%) with `train_test_split`:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
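Since the title asks for numpy only, the split can also be done without scikit-learn. The sketch below (the function name `train_test_split_np` is hypothetical) shuffles the sample indices with `np.random.default_rng` and slices off the test part. It will not reproduce the exact split that scikit-learn produces for `random_state=42`, and unlike the `stratify` option of scikit-learn it does not preserve class proportions.

```python
import numpy as np

def train_test_split_np(X, y, test_size=0.2, seed=42):
    """Shuffle the samples and split them into train/test parts (numpy only)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)              # random order of sample indices
    n_test = int(round(n * test_size))    # number of test samples
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Index X and y with the same permutation so rows stay aligned with labels
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10) % 3
X_tr, X_te, y_tr, y_te = train_test_split_np(X_demo, y_demo, test_size=0.2)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```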
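Then standardize the features with the `StandardScaler` class. The scaler is fit on the training set only, and the same transformation is then applied to the test set: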
```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
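The same standardization is easy to write in numpy: compute the per-feature mean and standard deviation on the training set, then apply them to both sets. This is a sketch with hypothetical helper names; like `StandardScaler`, it uses the population standard deviation (`ddof=0`, the `np.std` default) and leaves constant features unscaled.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-feature mean and std on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)        # population std (ddof=0)
    std[std == 0] = 1.0              # avoid division by zero for constant features
    return mean, std

def apply_standardizer(X, mean, std):
    """Scale any set (train or test) with the training statistics."""
    return (X - mean) / std

X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[4.0, 40.0]])
mean, std = fit_standardizer(X_tr)
X_tr_s = apply_standardizer(X_tr, mean, std)
X_te_s = apply_standardizer(X_te, mean, std)
```

Reusing the training statistics on the test set keeps test information out of the preprocessing step.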
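Now we can implement the logistic regression algorithm. Because this is a multiclass problem, the model uses the softmax function (multinomial logistic regression) instead of the sigmoid to turn the per-class scores into probabilities. The code is as follows: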
```python
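import numpy as np


class LogisticRegression:
    """Multinomial (softmax) logistic regression trained with batch gradient descent."""

    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.verbose = verbose

    def __add_intercept(self, X):
        # Prepend a column of ones so the first row of theta acts as the bias
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)

    def __softmax(self, z):
        # Subtract the row-wise max before exp to avoid overflow
        z = z - np.max(z, axis=1, keepdims=True)
        exp_z = np.exp(z)
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def __loss(self, h, y_onehot):
        # Mean cross-entropy over samples; the small constant guards against log(0)
        return -np.sum(y_onehot * np.log(h + 1e-12)) / y_onehot.shape[0]

    def fit(self, X, y):
        if self.fit_intercept:
            X = self.__add_intercept(X)
        self.num_classes = len(np.unique(y))
        # The softmax output has shape (n_samples, n_classes), so the integer
        # labels (assumed to be 0..k-1) are converted to one-hot vectors
        y_onehot = np.eye(self.num_classes)[y]
        self.theta = np.zeros((X.shape[1], self.num_classes))
        for i in range(self.num_iter):
            h = self.__softmax(X @ self.theta)
            # Gradient of the mean cross-entropy: X^T (h - Y) / n_samples
            gradient = X.T @ (h - y_onehot) / X.shape[0]
            self.theta -= self.lr * gradient
            if self.verbose and i % 10000 == 0:
                h = self.__softmax(X @ self.theta)
                loss = self.__loss(h, y_onehot)
                print(f'loss: {loss}')
        return self

    def predict_proba(self, X):
        if self.fit_intercept:
            X = self.__add_intercept(X)
        return self.__softmax(X @ self.theta)

    def predict(self, X):
        # Pick the class with the highest predicted probability
        return np.argmax(self.predict_proba(X), axis=1)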
```
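For softmax outputs `P = softmax(X @ theta)` and one-hot labels `Y`, the gradient of the mean cross-entropy loss `-sum(Y * log(P)) / n` with respect to `theta` is `X.T @ (P - Y) / n`. A finite-difference check is a cheap way to confirm a hand-derived gradient like this. The sketch below (all helper names are hypothetical) compares the analytic gradient against central differences on small random data:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(theta, X, Y):
    """Mean cross-entropy of softmax(X @ theta) against one-hot Y."""
    P = softmax(X @ theta)
    return -np.sum(Y * np.log(P)) / X.shape[0]

def analytic_grad(theta, X, Y):
    P = softmax(X @ theta)
    return X.T @ (P - Y) / X.shape[0]

def numeric_grad(theta, X, Y, eps=1e-6):
    """Central differences, one entry of theta at a time."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = eps
        grad[idx] = (loss(theta + step, X, Y) - loss(theta - step, X, Y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_chk = rng.normal(size=(6, 3))
Y_chk = np.eye(3)[rng.integers(0, 3, size=6)]
theta_chk = rng.normal(size=(3, 3))
max_diff = np.abs(analytic_grad(theta_chk, X_chk, Y_chk)
                  - numeric_grad(theta_chk, X_chk, Y_chk)).max()
print(max_diff)  # should be very small if the formula is right
```

If the labels were passed as integers instead of one-hot rows, `P - Y` would not have matching shapes, which is why the one-hot conversion matters.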
Now we can train the logistic regression model on the training data and predict the labels of the test data:
```python
model = LogisticRegression(lr=0.1, num_iter=100000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```
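To check how well the model does on the test set, accuracy and a confusion matrix can be computed with numpy alone. A minimal sketch (the function names are hypothetical):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when (row, col) pairs repeat
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0])
print(accuracy(y_true_demo, y_pred_demo))  # 0.6666666666666666
print(confusion_matrix_np(y_true_demo, y_pred_demo, 3))
```

With the variables above, `accuracy(y_test, y_pred)` and `confusion_matrix_np(y_test, y_pred, 3)` evaluate the trained model on the test split.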
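Finally, we can visualize the result with matplotlib. Because a decision boundary can only be drawn in a 2-D plane, the plot uses only the first two features (sepal length and sepal width):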
```python
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

h = .02  # step size in the mesh
# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

# `model` was trained on all 4 features, so it cannot classify 2-D mesh points.
# Train a second model on the first two standardized features instead.
model_2d = LogisticRegression(lr=0.1, num_iter=100000)
model_2d.fit(X_train[:, :2], y_train)
X_plot = scaler.transform(X)[:, :2]  # same scaling as the training data

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X_plot[:, 0].min() - 1, X_plot[:, 0].max() + 1
y_min, y_max = X_plot[:, 1].min() - 1, X_plot[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = model_2d.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light, shading='auto')
# Plot also all data points (training and test)
plt.scatter(X_plot[:, 0], X_plot[:, 1], c=y, cmap=cmap_bold,
            edgecolor='k', s=20)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xlabel('sepal length (standardized)')
plt.ylabel('sepal width (standardized)')
plt.title("Iris dataset classification")
plt.show()
```
This completes multiclass classification of the iris data with logistic regression, together with its visualization.