Naive Bayes classification on sklearn's 20newsgroups dataset
Posted: 2023-07-24 20:20:56
Sure. Here is example code for naive Bayes classification on the 20newsgroups dataset using sklearn:
```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
# Load the training and test splits, restricted to four categories
categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
train_data = fetch_20newsgroups(subset='train', categories=categories)
test_data = fetch_20newsgroups(subset='test', categories=categories)
# Feature extraction and training: the pipeline applies TF-IDF to the raw
# text and then fits a multinomial naive Bayes classifier, so no separate
# vectorizer step is needed
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_data.data, train_data.target)
# Predict a category index for every test document
predicted = model.predict(test_data.data)
# Print the first 50 characters of each test document with its predicted category
for doc, category in zip(test_data.data, predicted):
    print('%r => %s' % (doc[:50], train_data.target_names[category]))
```
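This example loads four categories of the 20newsgroups dataset (alt.atheism, talk.religion.misc, comp.graphics, sci.space). A pipeline converts the raw text into TF-IDF features with TfidfVectorizer and trains a MultinomialNB classifier on them. The fitted model then predicts a category for every document in the test set, and each prediction is printed next to the first 50 characters of its document.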
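To make it clearer what MultinomialNB does with the features, here is a minimal pure-Python sketch of multinomial naive Bayes with Laplace smoothing. It is an illustration only, not sklearn's implementation: it uses raw word counts instead of TF-IDF weights, splits on whitespace, and the function names `train_multinomial_nb` and `predict` as well as the toy corpus are made up for this example. The smoothing value `alpha=1.0` matches MultinomialNB's default.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class log-priors and smoothed per-class word log-likelihoods."""
    vocab = {word for doc in docs for word in doc.split()}
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    # log P(class): fraction of training documents in each class
    log_prior = {c: math.log(n / len(docs)) for c, n in class_doc_counts.items()}

    # log P(word | class) with additive (Laplace) smoothing
    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the label with the highest log-score; unseen words are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc.split() if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy corpus (made up for illustration, not taken from 20newsgroups)
train_docs = [
    "rocket launch orbit satellite",
    "nasa orbit moon mission",
    "pixel render image shader",
    "image polygon render texture",
]
train_labels = ["sci.space", "sci.space", "comp.graphics", "comp.graphics"]

log_prior, log_likelihood = train_multinomial_nb(train_docs, train_labels)
prediction = predict("moon orbit launch", log_prior, log_likelihood)
print(prediction)
```

The classifier picks the class that maximizes the log-prior plus the sum of the log-likelihoods of the words in the document. Working in log space avoids numerical underflow when many small probabilities are multiplied.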