Classifying news data in Python with the naive Bayes algorithm
Date: 2024-02-17 16:04:35
Sure, here is a Python implementation:
```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load the dataset (the CSV is expected to have 'text' and 'label' columns)
data = pd.read_csv('news.csv')

# Split into training (80%) and test (20%) sets
train_data = data.sample(frac=0.8, random_state=42)
test_data = data.drop(train_data.index)

# Feature extraction: turn each text into a vector of word counts.
# The vocabulary is learned from the training set only.
vectorizer = CountVectorizer(stop_words='english')
train_features = vectorizer.fit_transform(train_data['text'])
test_features = vectorizer.transform(test_data['text'])

# Train the model
clf = MultinomialNB()
clf.fit(train_features, train_data['label'])

# Predict on the test set and evaluate model performance
pred = clf.predict(test_features)
accuracy = accuracy_score(test_data['label'], pred)
print('Accuracy:', accuracy)
```
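The code first reads the news dataset and splits it into a training set (80%) and a test set (20%). It then uses `CountVectorizer` for feature extraction, converting each text into a vector of word counts. Next, it trains a naive Bayes classifier with `MultinomialNB` and predicts labels for the test set. Finally, it computes accuracy with `accuracy_score` as the evaluation metric.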
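To make it clearer what the classifier does with those word counts, here is a minimal pure-Python sketch of multinomial naive Bayes with add-one (Laplace) smoothing. It is an illustration only, not scikit-learn's actual implementation. The helper names (`tokenize`, `train_nb`, `predict_nb`) and the four-sentence toy corpus are made up for this example, and the whitespace tokenizer is a simplification of what `CountVectorizer` does (no stop-word removal, no punctuation handling).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical helper: lowercase and split on whitespace.
    # A much simpler stand-in for CountVectorizer's tokenization.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)
        # log P(class), estimated from class frequencies
        log_prior = math.log(n_docs / len(docs))
        # log P(word | class), smoothed so unseen words never get probability 0
        log_likelihood = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict_nb(model, text):
    """Return the label with the highest log posterior score."""
    tokens = tokenize(text)
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        # Words outside the training vocabulary are skipped,
        # just as a fitted vectorizer would ignore them.
        scores[label] = log_prior + sum(
            log_likelihood[w] for w in tokens if w in log_likelihood
        )
    return max(scores, key=scores.get)


# Toy corpus, invented purely for illustration
train_docs = [
    "stocks rally as markets rise",
    "central bank raises interest rates",
    "team wins the championship game",
    "striker scores twice in final match",
]
train_labels = ["business", "business", "sports", "sports"]

model = train_nb(train_docs, train_labels)
prediction_sports = predict_nb(model, "the team scores in the final game")
prediction_business = predict_nb(model, "bank rates rise")
print(prediction_sports, prediction_business)  # sports business
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term `alpha` plays the same role as the `alpha` parameter of `MultinomialNB`, whose default is also 1.0.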