Implementing Naive Bayes Classification in Python
Date: 2023-10-08 22:13:39
The following example code implements a Gaussian naive Bayes classifier in Python:
```python
import numpy as np


class NaiveBayes:
    """Gaussian naive Bayes classifier for continuous features."""

    def __init__(self, X, y):
        self.X = X
        self.y = y
        self.n_samples, self.n_features = X.shape
        self.classes = np.unique(y)
        self.n_classes = len(self.classes)
        # Per-class feature means, variances, and class priors, filled in by fit()
        self.mean = np.zeros((self.n_classes, self.n_features), dtype=np.float64)
        self.var = np.zeros((self.n_classes, self.n_features), dtype=np.float64)
        self.priors = np.zeros(self.n_classes, dtype=np.float64)

    def fit(self):
        # Index by position (idx), not by label value, so labels need not be 0..n_classes-1
        for idx, c in enumerate(self.classes):
            X_c = self.X[self.y == c]
            self.mean[idx, :] = X_c.mean(axis=0)
            # Small constant avoids division by zero for constant features
            self.var[idx, :] = X_c.var(axis=0) + 1e-9
            self.priors[idx] = X_c.shape[0] / float(self.n_samples)

    def predict(self, X):
        y_pred = [self._predict(x) for x in X]
        return np.array(y_pred)

    def _predict(self, x):
        # Log posterior (up to a shared constant) for each class
        posteriors = []
        for idx, c in enumerate(self.classes):
            prior = np.log(self.priors[idx])
            class_conditional = np.sum(np.log(self._pdf(idx, x)))
            posterior = prior + class_conditional
            posteriors.append(posterior)
        return self.classes[np.argmax(posteriors)]

    def _pdf(self, class_idx, x):
        # Gaussian probability density of each feature under the given class
        mean = self.mean[class_idx]
        var = self.var[class_idx]
        numerator = np.exp(-(x - mean)**2 / (2 * var))
        denominator = np.sqrt(2 * np.pi * var)
        return numerator / denominator
```
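Here, `X` is the feature matrix of the training set and `y` is its label vector; both are passed to the constructor. The `fit()` method fits the model, and `predict(X)` predicts labels for new samples.
Fitting the model means computing the per-feature mean and variance and the prior probability for each class. The `_pdf()` method evaluates the probability density function of the normal distribution. The `_predict()` method computes, for each class, the log prior plus the sum of the log densities over all features (the log posterior up to a constant shared by all classes), and returns the label of the class with the highest value.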
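Taking the log of a density that was computed directly can fail when a feature value lies far from the class mean: the density underflows to 0 and its log becomes `-inf`. A minimal sketch of one common workaround, assuming the same Gaussian model, is to evaluate the log density in closed form. The function names `gaussian_log_pdf` and `gaussian_pdf` are illustrative and are not part of the class above.

```python
import numpy as np


def gaussian_log_pdf(x, mean, var):
    """Log of the normal density, computed without exponentiating first."""
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)


def gaussian_pdf(x, mean, var):
    """Direct density, using the same formula as the `_pdf()` method."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


# Hypothetical per-feature statistics for a single class
mean = np.array([0.0, 1.0])
var = np.array([1.0, 0.25])

# Near the mean, both routes agree.
x_near = np.array([0.5, 1.2])
agree = np.allclose(np.log(gaussian_pdf(x_near, mean, var)),
                    gaussian_log_pdf(x_near, mean, var))

# Far from the mean, the direct density underflows to 0, so its log is -inf,
# while the log-space form stays finite.
x_far = np.array([100.0, 1.0])
with np.errstate(divide="ignore"):
    direct = np.log(gaussian_pdf(x_far, mean, var))
stable = gaussian_log_pdf(x_far, mean, var)

print("agree near the mean:", agree)
print("direct:", direct)
print("log-space:", stable)
```

If this form were adopted, `_predict()` could sum the log densities directly instead of calling `np.log` on the output of `_pdf()`.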
The following example shows how to use this class for classification:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model and predict
nb = NaiveBayes(X_train, y_train)
nb.fit()
y_pred = nb.predict(X_test)
# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
Output:
```
Accuracy: 1.0
```
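The example above shows how to use the naive Bayes classifier to classify the iris dataset. On this particular split (30 test samples, `random_state=42`), the classifier reaches 100% accuracy; a different split may give a different result.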
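Because a single train/test split can be optimistic, cross-validation gives a steadier estimate. The sketch below is one way to do that, with an assumption worth stating: it uses scikit-learn's `GaussianNB` in place of the `NaiveBayes` class above, because `cross_val_score` expects an estimator that follows the `fit(X, y)` / `predict(X)` convention, which the class above does not.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# GaussianNB stands in for the hand-written class; it fits the same kind of
# model (per-class Gaussian for each feature) but is not the same code.
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```

To evaluate the hand-written class the same way, it would need to accept the data in `fit(X, y)` instead of in the constructor.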