Semi-Naive Bayes Classification: A Python Implementation
Date: 2023-10-25 18:09:33
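Semi-naive Bayes classification is a refinement of naive Bayes classification. Naive Bayes assumes that all features are conditionally independent given the class. Semi-naive Bayes relaxes that assumption by letting each feature depend on a small number of other features, typically at most one (a one-dependent estimator), so that some correlation information is captured without having to estimate the full joint distribution. Below is a Python implementation of semi-naive Bayes classification.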
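The relaxed independence assumption can be written out explicitly. The formulas below are a sketch in common textbook notation, not taken from the original post: $d$ is the number of features, $pa_i$ denotes the single parent feature that $x_i$ is allowed to depend on, and $x_p$ is a "super-parent" shared by all features (the SPODE variant).

```latex
% Naive Bayes: every feature depends on the class only
P(c \mid \mathbf{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c)

% One-dependent estimator: x_i may also depend on one parent feature pa_i
P(c \mid \mathbf{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c, pa_i)

% SPODE: all features share a single super-parent x_p
P(c \mid \mathbf{x}) \propto P(c, x_p) \prod_{i \neq p} P(x_i \mid c, x_p)
```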
First, import the required libraries:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```
Next, define the semi-naive Bayes classifier:
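```python
class SemiNaiveBayes():
    """Semi-naive Bayes (SPODE variant) for discrete features.

    Each feature depends on the class and on one shared "super-parent"
    feature. `parent` and `alpha` are illustrative defaults.
    """
    def __init__(self, parent=0, alpha=1.0):
        self.parent = parent    # column index of the super-parent feature
        self.alpha = alpha      # Laplace smoothing constant
        self.classes_ = None    # sorted class labels
        self.values_ = None     # distinct training values of each feature
        self.joint = None       # counts of (class, parent value)
        self.cond = None        # counts of (feature, class, parent value, value)
        self.n_samples = None
        self.n_features = None
        self.n_classes = None

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.n_samples, self.n_features = X.shape
        self.classes_ = np.unique(y)
        self.n_classes = len(self.classes_)
        self.values_ = [np.unique(X[:, j]) for j in range(self.n_features)]
        self.joint, self.cond = {}, {}
        for xi, yi in zip(X, y):
            ctx = (yi, xi[self.parent])
            self.joint[ctx] = self.joint.get(ctx, 0) + 1
            for j in range(self.n_features):
                key = (j,) + ctx + (xi[j],)
                self.cond[key] = self.cond.get(key, 0) + 1
        return self

    def predict(self, X):
        X = np.asarray(X)
        a, p = self.alpha, self.parent
        # number of possible (class, parent value) combinations
        n_ctx = self.n_classes * len(self.values_[p])
        y_pred = []
        for xi in X:
            log_probs = np.zeros(self.n_classes)
            for k, c in enumerate(self.classes_):
                n_cp = self.joint.get((c, xi[p]), 0)
                # log P(c, x_p), Laplace-smoothed
                log_probs[k] = np.log((n_cp + a) / (self.n_samples + a * n_ctx))
                for j in range(self.n_features):
                    if j == p:
                        continue
                    n_cpv = self.cond.get((j, c, xi[p], xi[j]), 0)
                    # log P(x_j | c, x_p), Laplace-smoothed
                    log_probs[k] += np.log((n_cpv + a) / (n_cp + a * len(self.values_[j])))
            y_pred.append(self.classes_[np.argmax(log_probs)])
        return np.array(y_pred, dtype=self.classes_.dtype)
```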
Here the `fit()` method trains the model and the `predict()` method predicts labels for new samples.
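To make the training step more concrete, the sketch below shows the kind of quantity a one-dependent classifier has to estimate: a Laplace-smoothed conditional probability of one feature value given the class and the value of a parent feature. The function name `smoothed_cond_prob`, its parameters, and the toy data are illustrative assumptions, not part of the original post.

```python
import numpy as np

def smoothed_cond_prob(X, y, j, parent, c, parent_value, value, alpha=1.0):
    """Estimate P(x_j = value | y = c, x_parent = parent_value) with Laplace smoothing."""
    X, y = np.asarray(X), np.asarray(y)
    # Rows that belong to class c and have the given parent value
    in_context = (y == c) & (X[:, parent] == parent_value)
    n_context = np.sum(in_context)
    # Of those rows, how many also have x_j == value
    n_match = np.sum(in_context & (X[:, j] == value))
    # Number of distinct values feature j takes in the data
    n_values = len(np.unique(X[:, j]))
    return (n_match + alpha) / (n_context + alpha * n_values)

# Hypothetical toy data: two binary features, two classes
X_toy = np.array([[0, 0], [0, 1], [0, 1], [1, 0], [1, 1], [1, 0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# Three rows have class 0 and x_0 = 0; two of them have x_1 = 1,
# so the estimate is (2 + 1) / (3 + 1 * 2) = 0.6
p = smoothed_cond_prob(X_toy, y_toy, j=1, parent=0, c=0, parent_value=0, value=1)
print(p)
```

Smoothing keeps the estimate away from zero for combinations that never occur in the training data, which would otherwise wipe out the whole product for that class.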
Next, we can test it on the iris dataset:
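```python
# Load the data
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
# Split into features and labels, then into training and test sets
X = iris.iloc[:, :-1].values
y = iris.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# The classifier counts discrete values, so bin each continuous feature into
# three equal-frequency bins (an illustrative choice); edges come from the training set only
edges = np.quantile(X_train, [1 / 3, 2 / 3], axis=0)
def discretize(X):
    return np.column_stack([np.digitize(X[:, j], edges[:, j]) for j in range(X.shape[1])])
# Train the model
model = SemiNaiveBayes()
model.fit(discretize(X_train), y_train)
# Predict and compute the accuracy
y_pred = model.predict(discretize(X_test))
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```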
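The post reports the following output (the exact figure depends on the train/test split and on how the continuous features are handled):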
```
Accuracy: 0.9666666666666667
```
Going by this reported accuracy, the semi-naive Bayes classifier performs well on the iris dataset.