Code for news classification with naive Bayes
Date: 2024-12-11 21:13:45
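Naive Bayes is a common approach to text classification tasks such as sorting news articles by topic. The classifier applies Bayes' theorem under the assumption that features are conditionally independent given the class. The following is an example of news classification in Python: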
```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load four categories of the 20 Newsgroups dataset (downloaded on first use)
categories = ['comp.graphics', 'sci.med', 'rec.sport.baseball', 'talk.politics.guns']
newsgroups = fetch_20newsgroups(subset='all', categories=categories)

# Feature extraction: convert the raw text into TF-IDF vectors
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(newsgroups.data)
y = newsgroups.target

# Split into training (80%) and test (20%) sets.
# X is a SciPy sparse matrix, so len(X) raises a TypeError;
# train_test_split handles sparse input and shuffles the rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Predict labels for the test set
y_pred = clf.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=newsgroups.target_names))
```
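This code loads the data with `fetch_20newsgroups` from the `scikit-learn` library, extracts features with `TfidfVectorizer`, uses `MultinomialNB` as the naive Bayes classifier, and evaluates the result with `classification_report` and `accuracy_score`.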
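To make the independence assumption more concrete, here is a minimal from-scratch sketch of a multinomial naive Bayes classifier on a tiny made-up corpus. It is not how `MultinomialNB` is implemented internally. The function names `train_nb` and `predict_nb`, the toy documents, and the labels are all hypothetical and chosen only for illustration. The sketch uses raw word counts instead of TF-IDF weights and add-one (Laplace) smoothing with `alpha=1.0`.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class log word likelihoods."""
    vocab = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)

    # Count how often each word appears in the documents of each class
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Additive smoothing keeps unseen (word, class) pairs from getting probability 0
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def predict_nb(doc, log_prior, log_like):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for c in log_prior:
        # Independence assumption: the per-word log likelihoods simply add up.
        # Words that never appeared in training are skipped.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in doc.split() if w in log_like[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus, for illustration only
toy_docs = [
    "pitcher throws the ball",
    "team wins the baseball game",
    "doctor treats the patient",
    "patient takes the medicine",
]
toy_labels = ["sport", "sport", "med", "med"]

log_prior, log_like = train_nb(toy_docs, toy_labels)
prediction = predict_nb("the doctor and the patient", log_prior, log_like)
print(prediction)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters for documents with thousands of words.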