Implement a simple multi-valued Bayes classifier
Time: 2024-01-24 15:05:30
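A multi-valued Bayes classifier (naive Bayes over features that can take many values) is a common classification algorithm that handles problems with several discrete or continuous features. Below is a simple Python implementation: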
```python
from collections import Counter
import math


class NaiveBayesClassifier:
    """Naive Bayes for mixed features: str values are categorical, numbers are Gaussian."""

    def __init__(self):
        self.classes = []
        self.class_prior_probs = {}
        self.class_feature_probs = {}
        self.feature_values = {}  # feature index -> set of categories seen in training

    def fit(self, X, y):
        y = list(y)
        self.classes = list(dict.fromkeys(y))  # unique labels, first-seen order
        # Reset learned state so that fit() can be called more than once.
        self.class_prior_probs = {}
        self.class_feature_probs = {}
        self.feature_values = {}
        n_samples = len(y)
        for c in self.classes:
            self.class_prior_probs[c] = y.count(c) / n_samples
            c_samples = [X[i] for i, label in enumerate(y) if label == c]
            c_features = list(zip(*c_samples))
            self.class_feature_probs[c] = {}
            for i, feature in enumerate(c_features):
                if isinstance(feature[0], str):
                    # Discrete feature: store raw counts per category.
                    self.class_feature_probs[c][i] = dict(Counter(feature))
                    self.feature_values.setdefault(i, set()).update(feature)
                else:
                    # Continuous feature: store mean and population standard deviation.
                    mean = sum(feature) / len(feature)
                    var = sum((x - mean) ** 2 for x in feature) / len(feature)
                    self.class_feature_probs[c][i] = {'mean': mean, 'std': math.sqrt(var)}

    def predict(self, X):
        predictions = []
        for sample in X:
            max_log_prob = -math.inf
            pred_class = None
            for c in self.classes:
                # Sum log-probabilities so that many small factors do not underflow to 0.
                log_prob = math.log(self.class_prior_probs[c])
                for i, feature in enumerate(sample):
                    stats = self.class_feature_probs[c][i]
                    if isinstance(feature, str):
                        # Laplace (add-one) smoothing over all categories seen for feature i.
                        count = stats.get(feature, 0)
                        n_values = len(self.feature_values[i])
                        log_prob += math.log((count + 1) / (sum(stats.values()) + n_values))
                    else:
                        mean = stats['mean']
                        std = max(stats['std'], 1e-9)  # guard against zero variance
                        log_prob += (-math.log(std * math.sqrt(2 * math.pi))
                                     - (feature - mean) ** 2 / (2 * std ** 2))
                if log_prob > max_log_prob:
                    max_log_prob = log_prob
                    pred_class = c
            predictions.append(pred_class)
        return predictions
```
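This classifier handles both discrete and continuous features. In `fit()`, we compute the prior probability of each class and, for each feature, the statistics needed for its class-conditional probability: value counts when the feature is discrete (a string), and the mean and standard deviation of a Gaussian distribution when it is continuous. In `predict()`, we apply Bayes' rule under the naive assumption that features are independent given the class: each class is scored by combining its prior with the conditional probabilities of the sample's feature values, with add-one (Laplace) smoothing applied to the discrete counts, and the class with the highest score is returned as the prediction.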
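To make the two kinds of conditional-probability estimate concrete, here is a minimal sketch of the per-feature terms on a tiny made-up data set. The helper names `smoothed_prob` and `gaussian_pdf`, the sample values, the assumed three colour categories, and the prior of 0.5 are all hypothetical and chosen only for illustration; they are not part of the implementation above.

```python
import math
from collections import Counter


def smoothed_prob(value, observed, n_categories):
    """Add-one (Laplace) estimate of P(value | class) from the values observed in that class."""
    counts = Counter(observed)
    return (counts[value] + 1) / (len(observed) + n_categories)


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))


# Hypothetical training values for one class: a colour feature and a weight feature.
colours = ["red", "red", "green"]
weights = [1.0, 2.0, 3.0]

mean = sum(weights) / len(weights)
std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))

p_red = smoothed_prob("red", colours, n_categories=3)    # (2 + 1) / (3 + 3)
p_blue = smoothed_prob("blue", colours, n_categories=3)  # unseen value: (0 + 1) / (3 + 3)
p_weight = gaussian_pdf(2.0, mean, std)

# Unnormalised score for the class: prior times the per-feature terms,
# accumulated as a sum of logs to avoid floating-point underflow.
prior = 0.5  # assumed class prior
log_score = math.log(prior) + math.log(p_red) + math.log(p_weight)
```

Because of the smoothing, a value never seen in a class (here `"blue"`) still gets a small non-zero probability, so a single unseen category does not force the whole class score to zero. Comparing `log_score` across classes selects the same class as comparing the raw products, since the logarithm is monotonic.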