Implement the Gaussian Naive Bayes algorithm with scikit-learn on the iris dataset: use 80% of the data for training, predict the classes of the test-set samples and report the accuracy, then vary the random_state value and compare how the accuracy changes.
Posted: 2024-12-21 21:21:43
In Python, you can use the scikit-learn library to implement Gaussian Naive Bayes, a probabilistic classifier. First load the iris dataset, then split it into a training set and a test set, using 80% of the data for training and the remaining 20% for testing.
The following steps implement this process:
1. Import the necessary libraries:
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
import numpy as np
```
2. Load and preprocess the data:
```python
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Split the data into training and test sets (80% / 20%)
random_state = 42  # change this value to try different random splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=random_state)
```
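One caveat not required by the exercise: a purely random split can leave the test set with unbalanced class counts. `train_test_split` also accepts a `stratify` argument that preserves the class proportions in both splits — a minimal sketch:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# stratify=y keeps each class's share identical in both splits;
# iris has 50 samples per class, so the 30-sample test set gets 10 of each
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(np.bincount(y_test))  # prints [10 10 10]
```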
3. Create and train the Gaussian Naive Bayes model:
```python
gnb = GaussianNB()
gnb.fit(X_train, y_train)
```
4. Predict and compute the accuracy:
```python
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on the test set with random_state {random_state}: {accuracy*100:.2f}%")
```
5. Vary the `random_state` value and repeat step 4 — for example from 0 to 99 — to observe how the accuracy changes:
```python
accuracies = []
for rs in range(100):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=rs)
    gnb.fit(X_train, y_train)
    y_pred = gnb.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))
mean_accuracy = np.mean(accuracies)
std_accuracy = np.std(accuracies)
print(f"Average accuracy across different random states: {mean_accuracy * 100:.2f}% (std dev: {std_accuracy * 100:.2f}%)")
```
This prints the model's average accuracy across the different random states together with its standard deviation, which helps show how much the choice of split alone affects the result.
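Beyond the mean and standard deviation, it can also be instructive to see the spread — which split gave the best and worst accuracy. A self-contained sketch building on the loop above (the `best`/`worst` names are illustrative, not part of the exercise):

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Collect test accuracy for each of 100 candidate random_state values
accuracies = []
for rs in range(100):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=rs)
    gnb = GaussianNB().fit(X_train, y_train)
    accuracies.append(accuracy_score(y_test, gnb.predict(X_test)))

best = int(np.argmax(accuracies))   # random_state with the highest test accuracy
worst = int(np.argmin(accuracies))  # random_state with the lowest test accuracy
print(f"Best split:  random_state={best}, accuracy {accuracies[best]*100:.2f}%")
print(f"Worst split: random_state={worst}, accuracy {accuracies[worst]*100:.2f}%")
```

The gap between the two extremes makes the point of the exercise concrete: with only 30 test samples, a single misclassification moves the accuracy by over 3 percentage points.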