Using watermelon quality as a machine-learning example, write a naive Bayes program that classifies a watermelon as good or bad
Posted: 2023-05-26 08:02:12
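Suppose a watermelon is described by the following five features, each encoded as an integer:
1. Color: green (0), dark (1), pale (2)
2. Root: curled (0), slightly curled (1), stiff (2)
3. Knock sound: crisp (0), muffled (1), dull (2)
4. Texture: clear (0), slightly blurry (1), blurry (2)
5. Navel: sunken (0), slightly sunken (1), flat (2)
A good melon is labeled 1 and a bad melon is labeled 0.
The training set is as follows: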
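The integer encoding above can be captured in a small lookup table. The sketch below is illustrative only: the English value names, the `FEATURES` table, and the `encode` helper are assumptions introduced here and are not part of the original answer.

```python
# Hypothetical English names for the feature values listed above;
# the position of each name in its list is its integer code.
FEATURES = {
    "color":   ["green", "dark", "pale"],
    "root":    ["curled", "slightly curled", "stiff"],
    "knock":   ["crisp", "muffled", "dull"],
    "texture": ["clear", "slightly blurry", "blurry"],
    "navel":   ["sunken", "slightly sunken", "flat"],
}

def encode(melon):
    """Turn a dict of feature name -> value name into a list of integer codes."""
    return [values.index(melon[name]) for name, values in FEATURES.items()]

sample = {"color": "green", "root": "stiff", "knock": "crisp",
          "texture": "blurry", "navel": "slightly sunken"}
print(encode(sample))  # [0, 2, 0, 2, 1]
```

Keeping the value names in one place makes it harder to mix up codes when entering new samples by hand.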
| No. | Color | Root | Knock | Texture | Navel | Good melon |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|1 | 2 | 2 | 0 | 2 | 0 | 0 |
|2 | 2 | 2 | 0 | 2 | 0 | 0 |
|3 | 1 | 2 | 0 | 2 | 0 | 1 |
|4 | 0 | 2 | 0 | 2 | 0 | 1 |
|5 | 0 | 1 | 0 | 2 | 0 | 1 |
|6 | 0 | 0 | 0 | 1 | 0 | 0 |
|7 | 1 | 0 | 0 | 1 | 0 | 1 |
|8 | 2 | 1 | 1 | 1 | 1 | 1 |
|9 | 0 | 0 | 1 | 1 | 1 | 1 |
|10 | 1 | 1 | 0 | 0 | 2 | 1 |
|11 | 0 | 1 | 1 | 0 | 2 | 1 |
|12 | 1 | 1 | 0 | 1 | 2 | 1 |
|13 | 1 | 2 | 2 | 2 | 1 | 0 |
|14 | 2 | 1 | 1 | 0 | 2 | 1 |
|15 | 0 | 1 | 1 | 2 | 1 | 0 |
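Naive Bayes starts from the class priors, so it can help to tally the label column of the table first. This is a minimal sketch, assuming the priors are estimated as relative frequencies in the training set; the variable names are illustrative.

```python
from collections import Counter

# "Good melon" column of the training table, rows 1 to 15
labels = [0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0]

counts = Counter(labels)
# Relative frequency of each class in the training set
priors = {label: n / len(labels) for label, n in counts.items()}

print(counts[1], counts[0])  # 10 5
print(priors)
```

With this table the estimated priors are 10/15 for good melons and 5/15 for bad melons, so a uniform prior of 0.5 per class would be a separate modeling assumption rather than something the data implies.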
The code is as follows:
```
# Mappings from feature codes to value names (for reference; not used below)
# Color
color_dict = {0: 'qing_lv', 1: 'wu_hei', 2: 'qian_bai'}
# Root
root_dict = {0: 'quan_su', 1: 'shao_quan', 2: 'ying_ting'}
# Knock sound
knock_dict = {0: 'qing_cui', 1: 'zhuo_xiang', 2: 'chen_men'}
# Texture
texture_dict = {0: 'qing_xi', 1: 'shao_hu', 2: 'mo_hu'}
# Navel
navel_dict = {0: 'ao_xian', 1: 'shao_ao', 2: 'ping_tan'}

# Training set; each row is [color, root, knock, texture, navel]
X_train = [[2, 2, 0, 2, 0],
           [2, 2, 0, 2, 0],
           [1, 2, 0, 2, 0],
           [0, 2, 0, 2, 0],
           [0, 1, 0, 2, 0],
           [0, 0, 0, 1, 0],
           [1, 0, 0, 1, 0],
           [2, 1, 1, 1, 1],
           [0, 0, 1, 1, 1],
           [1, 1, 0, 0, 2],
           [0, 1, 1, 0, 2],
           [1, 1, 0, 1, 2],
           [1, 2, 2, 2, 1],
           [2, 1, 1, 0, 2],
           [0, 1, 1, 2, 1]]
# Labels of the training samples (1 = good melon, 0 = bad melon)
y_train = [0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0]
# Watermelon to classify
x_pred = [0, 2, 0, 2, 1]


def class_score(label):
    """Return P(label) * prod_i P(x_pred[i] | label), estimated by counting."""
    # Training rows that belong to this class
    rows = [x for x, y in zip(X_train, y_train) if y == label]
    # Prior: fraction of training samples with this label
    score = len(rows) / len(y_train)
    # Multiply in one conditional probability per feature
    for i, value in enumerate(x_pred):
        score *= sum(1 for row in rows if row[i] == value) / len(rows)
    return score


# Unnormalized posteriors of the two classes
posterior_bad = class_score(0)
posterior_good = class_score(1)
# Decide whether the melon is good or bad
if posterior_bad > posterior_good:
    print('Bad melon')
else:
    print('Good melon')
```
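One practical caveat with counting-based estimates: if a feature value never appears together with a class in the training set, its conditional probability is zero and wipes out the whole product. The variant below is a sketch of a common remedy, add-alpha (Laplace) smoothing combined with log-probabilities. It is not part of the original answer; the names `log_score`, `predict`, `N_VALUES`, and the default `alpha=1.0` are assumptions chosen for illustration, and only the likelihoods are smoothed, not the priors.

```python
import math

# Same training data as the table above: [color, root, knock, texture, navel]
X_train = [[2, 2, 0, 2, 0], [2, 2, 0, 2, 0], [1, 2, 0, 2, 0],
           [0, 2, 0, 2, 0], [0, 1, 0, 2, 0], [0, 0, 0, 1, 0],
           [1, 0, 0, 1, 0], [2, 1, 1, 1, 1], [0, 0, 1, 1, 1],
           [1, 1, 0, 0, 2], [0, 1, 1, 0, 2], [1, 1, 0, 1, 2],
           [1, 2, 2, 2, 1], [2, 1, 1, 0, 2], [0, 1, 1, 2, 1]]
y_train = [0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0]
N_VALUES = 3  # every feature has three possible codes: 0, 1, 2

def log_score(x, label, alpha=1.0):
    """log P(label) + sum_i log P(x[i] | label), with add-alpha smoothing."""
    rows = [row for row, y in zip(X_train, y_train) if y == label]
    score = math.log(len(rows) / len(y_train))
    for i, value in enumerate(x):
        matches = sum(1 for row in rows if row[i] == value)
        # Smoothed estimate: never exactly zero, so the log is always defined
        score += math.log((matches + alpha) / (len(rows) + alpha * N_VALUES))
    return score

def predict(x):
    """Return 1 (good melon) or 0 (bad melon), whichever scores higher."""
    return max((0, 1), key=lambda label: log_score(x, label))

print(predict([0, 2, 0, 2, 1]))  # 0, i.e. bad melon
```

Summing logarithms instead of multiplying probabilities also avoids numerical underflow when the number of features grows.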