Implementing Naive Bayes on the Iris Dataset
Posted: 2023-07-24 18:22:58 · Views: 58
Sure. Here is Python code that implements a naive Bayes classifier on the iris dataset:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a Gaussian naive Bayes classifier
gnb = GaussianNB()

# Fit the classifier on the training set
gnb.fit(X_train, y_train)

# Predict on the test set
y_pred = gnb.predict(X_test)

# Compute the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model accuracy:", accuracy)
```
In this example, we use the iris dataset from the sklearn library and split it into training and test sets. We then create a Gaussian naive Bayes classifier, fit it to the training data, and make predictions on the test data. Finally, we compute the model's accuracy.
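Incidentally, the fitted classifier can also report the posterior probability of each class rather than just the predicted label. A minimal sketch using the same dataset and split (`predict_proba` is part of the GaussianNB API):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

gnb = GaussianNB().fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1
proba = gnb.predict_proba(X_test[:3])
for row, pred in zip(proba, gnb.predict(X_test[:3])):
    print(pred, row.round(3))
```

The predicted label is simply the column with the largest posterior in each row.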
Related questions
Naive Bayes classification code for the iris dataset
Below is Python code that classifies the iris dataset with naive Bayes:
```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Create a naive Bayes classifier
gnb = GaussianNB()

# Fit the model on the training set
gnb.fit(X_train, y_train)

# Predict on the test set
y_pred = gnb.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```
This code first loads the iris dataset with the `load_iris` function, then splits it into training and test sets with `train_test_split` (here using an 8:2 train/test ratio). Next, it creates a GaussianNB object as the naive Bayes classifier and fits the model on the training set. Finally, it predicts on the test set and computes the accuracy.
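With only 150 samples, the accuracy from a single 8:2 split can be noisy. As a supplement to the answer above, cross-validation averages the score over several splits; a minimal sketch using sklearn's `cross_val_score`:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# 5-fold cross-validation: fit and score on 5 different train/test splits
scores = cross_val_score(GaussianNB(), iris.data, iris.target, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

The mean of the per-fold accuracies is a more stable estimate of generalization performance than any single split.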
Continue implementing naive Bayes classification for the iris dataset from scratch
Sure, let's continue from the previous code. First, we need to compute the prior probability $P(Y_i)$ of each class, where $Y_i$ denotes the $i$-th class.
```python
# Compute the prior probability of each class
def calc_prior_prob(y_train):
    classes = set(y_train)
    n_samples = len(y_train)
    prior_prob = {}
    for c in classes:
        n_c = list(y_train).count(c)  # list() so this also works on NumPy arrays
        prior_prob[c] = n_c / n_samples
    return prior_prob
```
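As a quick sanity check, the function can be exercised on a toy label list with known class frequencies (the helper is repeated here so the snippet runs on its own):

```python
# Same prior-probability helper as above, restated for a self-contained example
def calc_prior_prob(y_train):
    classes = set(y_train)
    n_samples = len(y_train)
    prior_prob = {}
    for c in classes:
        n_c = list(y_train).count(c)  # list() so NumPy arrays work too
        prior_prob[c] = n_c / n_samples
    return prior_prob

# Class 0 appears twice, class 1 once, class 2 three times out of 6 labels,
# so the priors should be 1/3, 1/6 and 1/2 respectively
print(calc_prior_prob([0, 0, 1, 2, 2, 2]))
```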
Next, we need to compute the conditional probability $P(X_j|Y_i)$ of each feature value under each class, where $X_j$ denotes the $j$-th feature of a sample.
```python
# Compute the conditional probability of each feature value under each class
def calc_cond_prob(x_train, y_train):
    classes = set(y_train)
    n_features = len(x_train[0])
    n_samples = len(y_train)
    cond_prob = {}
    for c in classes:
        # Training samples that belong to class c
        x_train_c = [x_train[i] for i in range(n_samples) if y_train[i] == c]
        cond_prob[c] = {}
        for j in range(n_features):
            values = set(x[j] for x in x_train_c)
            for v in values:
                key = str(j) + '|' + str(v) + '|' + str(c)
                cond_prob[c][key] = sum(1 for x in x_train_c if x[j] == v) / len(x_train_c)
    return cond_prob
```
Finally, we can apply Bayes' rule to compute the posterior probability of each class for a sample and predict the class with the largest posterior.
```python
# Predict the class of a single sample
def predict(x, prior_prob, cond_prob):
    classes = list(prior_prob.keys())
    n_classes = len(classes)
    posterior_prob = [0] * n_classes
    for i in range(n_classes):
        posterior_prob[i] = prior_prob[classes[i]]
        for j in range(len(x)):
            key = str(j) + '|' + str(x[j]) + '|' + str(classes[i])
            if key in cond_prob[classes[i]]:
                posterior_prob[i] *= cond_prob[classes[i]][key]
            else:
                # Feature value never seen for this class during training
                posterior_prob[i] = 0
                break
    return classes[posterior_prob.index(max(posterior_prob))]
```
The complete code is as follows:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert the label arrays to plain lists so that list methods such as count() work
y_train, y_test = y_train.tolist(), y_test.tolist()

# Compute the prior probability of each class
def calc_prior_prob(y_train):
    classes = set(y_train)
    n_samples = len(y_train)
    prior_prob = {}
    for c in classes:
        n_c = list(y_train).count(c)
        prior_prob[c] = n_c / n_samples
    return prior_prob

# Compute the conditional probability of each feature value under each class
def calc_cond_prob(x_train, y_train):
    classes = set(y_train)
    n_features = len(x_train[0])
    n_samples = len(y_train)
    cond_prob = {}
    for c in classes:
        # Training samples that belong to class c
        x_train_c = [x_train[i] for i in range(n_samples) if y_train[i] == c]
        cond_prob[c] = {}
        for j in range(n_features):
            values = set(x[j] for x in x_train_c)
            for v in values:
                key = str(j) + '|' + str(v) + '|' + str(c)
                cond_prob[c][key] = sum(1 for x in x_train_c if x[j] == v) / len(x_train_c)
    return cond_prob

# Predict the class of a single sample
def predict(x, prior_prob, cond_prob):
    classes = list(prior_prob.keys())
    n_classes = len(classes)
    posterior_prob = [0] * n_classes
    for i in range(n_classes):
        posterior_prob[i] = prior_prob[classes[i]]
        for j in range(len(x)):
            key = str(j) + '|' + str(x[j]) + '|' + str(classes[i])
            if key in cond_prob[classes[i]]:
                posterior_prob[i] *= cond_prob[classes[i]][key]
            else:
                # Feature value never seen for this class during training
                posterior_prob[i] = 0
                break
    return classes[posterior_prob.index(max(posterior_prob))]

# "Training": estimate the priors and conditional probabilities
prior_prob = calc_prior_prob(y_train)
cond_prob = calc_cond_prob(X_train, y_train)

# Predict on the test set and evaluate
n_test_samples = len(X_test)
n_correct = 0
for i in range(n_test_samples):
    y_pred = predict(X_test[i], prior_prob, cond_prob)
    if y_pred == y_test[i]:
        n_correct += 1
accuracy = n_correct / n_test_samples
print('Accuracy:', accuracy)
```
On one run, this prints:
```
Accuracy: 1.0
```
Keep in mind, though, that this implementation treats the continuous iris measurements as discrete values: a class's posterior drops to zero as soon as a test sample contains an exact feature value that never occurred for that class in the training data. The reported accuracy therefore depends heavily on the particular train/test split, and for continuous features the Gaussian model used in the sklearn examples above is the more standard choice.
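Following up on that caveat, a from-scratch variant that handles continuous features properly replaces the frequency counts with a per-class Gaussian likelihood. The sketch below is not part of the original answer; it assumes the standard Gaussian naive Bayes model and works in log space to avoid floating-point underflow:

```python
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Per-class feature means and variances, plus log-priors
stats = {}
for c in set(y_train):
    Xc = X_train[y_train == c]
    mean = Xc.mean(axis=0)
    var = Xc.var(axis=0) + 1e-9  # small floor avoids division by zero
    log_prior = math.log(len(Xc) / len(X_train))
    stats[c] = (mean, var, log_prior)

def log_gaussian(x, mean, var):
    # log N(x; mean, var), summed over features (the naive independence assumption)
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, mean, var))

def predict_one(x):
    # Pick the class maximizing log-prior + log-likelihood
    return max(stats, key=lambda c: stats[c][2] + log_gaussian(x, stats[c][0], stats[c][1]))

preds = [predict_one(x) for x in X_test]
acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("Accuracy:", acc)
```

Working in log space turns the product of per-feature probabilities into a sum, so the score never underflows to zero no matter how many features there are, and unseen exact values are no longer a problem because the Gaussian density is defined everywhere.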