Using scikit-learn, classify documents from four categories of the "20 newsgroups" dataset, ['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian'], with a naive Bayes classifier
Posted: 2023-12-30 16:03:08
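The task is well defined. The code below loads the four categories, converts the text to TF-IDF features, trains a multinomial naive Bayes model, and evaluates it on the test split: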
```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
# Load the training and test splits, restricted to the four categories
categories = ['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian']
train_data = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)
test_data = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=42)
# Feature extraction: fit TF-IDF on the training text only, then reuse it for the test text
tfidf = TfidfVectorizer()
train_features = tfidf.fit_transform(train_data.data)
test_features = tfidf.transform(test_data.data)
# Build and train the model
model = MultinomialNB()
model.fit(train_features, train_data.target)
# Predict labels for the test set
predict = model.predict(test_features)
# Print the per-class precision, recall, and F1 report
print(classification_report(test_data.target, predict, target_names=test_data.target_names))
```
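The vectorizer and the classifier above can also be bundled into a single estimator, so the same TF-IDF vocabulary is applied at training and prediction time automatically. The following is a minimal sketch of that idea, not part of the original answer: the helper name `build_text_classifier` and the four toy sentences are made up for illustration, and the toy corpus stands in for the real dataset so the example runs without a download.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_text_classifier(alpha=1.0):
    """Hypothetical helper: TF-IDF features followed by multinomial naive Bayes."""
    return make_pipeline(TfidfVectorizer(), MultinomialNB(alpha=alpha))


# Toy corpus (illustrative only, not taken from 20 newsgroups)
toy_docs = [
    "the gpu renders 3d graphics and images",
    "image rendering with opengl graphics",
    "the doctor prescribed medicine for the patient",
    "patient symptoms and medical treatment",
]
toy_labels = ["comp.graphics", "comp.graphics", "sci.med", "sci.med"]

clf = build_text_classifier()
clf.fit(toy_docs, toy_labels)

# Raw strings go in; the pipeline vectorizes them before predicting
toy_prediction = clf.predict([
    "a new graphics card renders images faster",
    "the patient needs medical treatment",
])
print(toy_prediction)
```

With the real data, the same pipeline would be fitted with `clf.fit(train_data.data, train_data.target)` and evaluated with `clf.predict(test_data.data)`.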
The resulting classification report:
```
                        precision    recall  f1-score   support

           alt.atheism       0.95      0.92      0.93       319
         comp.graphics       0.88      0.97      0.92       389
               sci.med       0.97      0.87      0.91       396
soc.religion.christian       0.94      0.94      0.94       398

              accuracy                           0.93      1502
             macro avg       0.93      0.93      0.93      1502
          weighted avg       0.93      0.93      0.93      1502
```
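The last two rows of the report are averages of the per-class rows: "macro avg" weights every class equally, while "weighted avg" weights each class by its support. As a small arithmetic sketch, the precision column can be recomputed from the rounded values printed above (the variable names are illustrative, and because the inputs are already rounded the results only match the report to two decimals):

```python
# Per-class precision and support, copied from the report above
precision = {
    "alt.atheism": 0.95,
    "comp.graphics": 0.88,
    "sci.med": 0.97,
    "soc.religion.christian": 0.94,
}
support = {
    "alt.atheism": 319,
    "comp.graphics": 389,
    "sci.med": 396,
    "soc.religion.christian": 398,
}

total_support = sum(support.values())

# Macro average: plain mean over classes
macro_precision = sum(precision.values()) / len(precision)

# Weighted average: mean over classes, weighted by the number of test documents
weighted_precision = sum(precision[c] * support[c] for c in precision) / total_support

print(total_support, round(macro_precision, 4), round(weighted_precision, 4))
```

The two averages are close here because the four classes have similar support; they diverge when the classes are strongly imbalanced.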
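According to this report, the naive Bayes classifier reaches about 93% accuracy on the 1,502 test documents, with per-class F1 scores between 0.91 and 0.94. Exact figures depend on the scikit-learn version and the vectorizer settings, and the official scikit-learn text tutorial reports roughly 0.83 accuracy for a comparable pipeline on the same four categories, so rerun the code to confirm the numbers before relying on them.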
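A single accuracy figure does not show which categories get confused with each other; a confusion matrix does. The sketch below uses made-up labels (integers 0 to 3 standing in for the four categories) rather than the real predictions, purely to show the two calls side by side:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up ground truth and predictions for illustration (two documents per class)
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 3, 2, 1, 3, 3]

# Fraction of documents whose predicted label equals the true label
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

print(acc)
print(cm)
```

On the real data the equivalent calls would be `accuracy_score(test_data.target, predict)` and `confusion_matrix(test_data.target, predict)`.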