Using sklearn to classify text data with naive Bayes and compute the accuracy of each prediction
Date: 2023-12-20 13:07:59
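First, the text data has to be converted into numeric features, using either a bag-of-words model or a TF-IDF model. A naive Bayes classifier is then trained on those features, used to predict labels for the test data, and scored by accuracy.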
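To make the first step concrete, here is a minimal sketch of what the two vectorizers produce for a tiny corpus. The variable names (`docs`, `count_vec`, `tfidf_vec`) are illustrative assumptions, and both vectorizers are used with their default settings.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Tiny illustrative corpus (same sentences as the training data in this answer)
docs = ['this is good', 'that is bad', 'happy day', 'unhappy day']

# Bag-of-words: each column is a vocabulary word, each cell a raw count
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)

# TF-IDF: same columns, but counts are reweighted by how rare a word is
# across documents, and each row is L2-normalized by default
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(docs)

# Recover the column order from the fitted vocabulary mapping
vocab = sorted(count_vec.vocabulary_, key=count_vec.vocabulary_.get)
print(vocab)
print(counts.toarray())
print(tfidf.toarray().round(2))
```

Both matrices have one row per document and one column per vocabulary word, so either can be passed to `MultinomialNB`.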
Below is a simple complete example:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# Training and test data
train_data = ['this is good', 'that is bad', 'happy day', 'unhappy day']
train_label = ['positive', 'negative', 'positive', 'negative']
test_data = ['good day', 'bad day', 'happy and good']
# Convert text to numeric features (fit on the training set only, then reuse on the test set)
vectorizer = TfidfVectorizer()
train_features = vectorizer.fit_transform(train_data)
test_features = vectorizer.transform(test_data)
# Train a naive Bayes classifier and predict labels for the test set
clf = MultinomialNB()
clf.fit(train_features, train_label)
pred_label = clf.predict(test_features)
# Compute accuracy against the true test labels
true_label = ['positive', 'negative', 'positive']
accuracy = accuracy_score(true_label, pred_label)
print('Accuracy:', accuracy)
```
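With scikit-learn's default settings this prints `Accuracy: 1.0`, i.e. an accuracy of 100%: every test sentence contains a word seen in only one class during training ("good" and "happy" in positive, "bad" in negative), so all three predictions match the true labels. "and" is not in the training vocabulary and is ignored.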
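`accuracy_score` gives a single number for the whole test set. If a score per prediction is wanted, one possible reading is the class probability the model assigns to each predicted label, which `MultinomialNB.predict_proba` exposes. The sketch below assumes that interpretation; the `results` list and its tuple layout are illustrative choices, not part of the original answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

train_data = ['this is good', 'that is bad', 'happy day', 'unhappy day']
train_label = ['positive', 'negative', 'positive', 'negative']
test_data = ['good day', 'bad day', 'happy and good']
true_label = ['positive', 'negative', 'positive']

vectorizer = TfidfVectorizer()
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(train_data), train_label)

# One row per test sample, one column per class (ordered as clf.classes_)
proba = clf.predict_proba(vectorizer.transform(test_data))
pred_label = clf.classes_[proba.argmax(axis=1)]

# Collect (text, predicted label, predicted-class probability, correct?)
results = []
for text, truth, pred, p in zip(test_data, true_label, pred_label, proba.max(axis=1)):
    results.append((text, str(pred), float(p), bool(pred == truth)))
    print(f'{text!r}: predicted={pred}, probability={p:.3f}, correct={pred == truth}')
```

Note that these probabilities are model confidences rather than measured accuracies, and naive Bayes confidences are often poorly calibrated. For per-class precision and recall, `sklearn.metrics.classification_report(true_label, pred_label)` is another option.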