Text classification in Python with a Naive Bayes classifier
Date: 2023-05-26 18:01:23
Code example:
First, import the required libraries:
```
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
```
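Next, define a training set and a test set. The training set holds the text of several documents together with their labels. Here we use five training documents, each labeled either "spam" (junk mail) or "ham" (legitimate mail), plus two unlabeled test documents: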
```
# Five labeled training documents
train_documents = ['send us your password', 'send us your review',
                   'review your password and review your notes',
                   'review us', 'send your password']
train_labels = ['spam', 'ham', 'ham', 'spam', 'spam']
# Two unlabeled documents to classify
test_documents = ['send us your feedback', 'review your notes']
```
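Next, convert the text into feature vectors. Here we use CountVectorizer: `fit_transform` learns the vocabulary from the training documents and counts how often each word occurs in each document, while `transform` encodes the test documents with that same vocabulary, so words never seen during training (such as "feedback") are ignored: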
```
vectorizer = CountVectorizer()
train_features = vectorizer.fit_transform(train_documents)
test_features = vectorizer.transform(test_documents)
```
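Now train a multinomial Naive Bayes classifier (MultinomialNB, which models word-count features) on the training set and predict labels for the test documents: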
```
classifier = MultinomialNB()
classifier.fit(train_features, train_labels)
predictions = classifier.predict(test_features)
```
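As an illustration of what `fit` and `predict` compute, here is a from-scratch sketch of the textbook multinomial Naive Bayes decision rule with Laplace smoothing (alpha = 1.0, which is MultinomialNB's default). It is not scikit-learn's source code; the names `alpha`, `log_prior`, `log_likelihood` and `scores` are illustrative. It checks its intermediate results against the fitted attributes `class_log_prior_` and `feature_log_prob_`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_documents = ['send us your password', 'send us your review',
                   'review your password and review your notes',
                   'review us', 'send your password']
train_labels = ['spam', 'ham', 'ham', 'spam', 'spam']
test_documents = ['send us your feedback', 'review your notes']

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_documents).toarray()
X_test = vectorizer.transform(test_documents).toarray()
y = np.array(train_labels)

alpha = 1.0                # Laplace smoothing, same as MultinomialNB's default
classes = np.unique(y)     # sorted: ['ham', 'spam'], same order as classifier.classes_

# log P(c): fraction of training documents in each class
log_prior = np.log(np.array([np.mean(y == c) for c in classes]))

# log P(w | c): smoothed per-class word counts, normalized over the vocabulary
counts = np.array([X_train[y == c].sum(axis=0) for c in classes]) + alpha
log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))

# Joint log score per (document, class): log P(c) + sum_w count(w) * log P(w | c)
scores = X_test @ log_likelihood.T + log_prior
manual_predictions = classes[np.argmax(scores, axis=1)]

classifier = MultinomialNB().fit(X_train, train_labels)
print(manual_predictions)
print(classifier.predict(X_test))
print(np.allclose(log_prior, classifier.class_log_prior_),
      np.allclose(log_likelihood, classifier.feature_log_prob_))
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on longer documents.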
Finally, print the classification results:
```
# Pair each test document with its predicted label
for i, prediction in enumerate(predictions):
    print('Prediction for "{0}" is "{1}"'.format(test_documents[i], prediction))
```
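If you also want to know how confident the classifier is, MultinomialNB provides `predict_proba`, which returns one probability per class in the order of `classifier.classes_`. The sketch below repeats the setup so it runs on its own; the `probabilities` name and the print format are illustrative choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_documents = ['send us your password', 'send us your review',
                   'review your password and review your notes',
                   'review us', 'send your password']
train_labels = ['spam', 'ham', 'ham', 'spam', 'spam']
test_documents = ['send us your feedback', 'review your notes']

vectorizer = CountVectorizer()
train_features = vectorizer.fit_transform(train_documents)
test_features = vectorizer.transform(test_documents)
classifier = MultinomialNB().fit(train_features, train_labels)

# Shape (n_documents, n_classes); each row sums to 1
probabilities = classifier.predict_proba(test_features)
for document, row in zip(test_documents, probabilities):
    scores = ', '.join(f'{label}={p:.3f}'
                       for label, p in zip(classifier.classes_, row))
    print(f'"{document}": {scores}')
```

With only five training documents these probabilities are rough estimates, so treat them as a relative ranking rather than calibrated confidence.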
The complete code:
```
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Labeled training documents and unlabeled test documents
train_documents = ['send us your password', 'send us your review',
                   'review your password and review your notes',
                   'review us', 'send your password']
train_labels = ['spam', 'ham', 'ham', 'spam', 'spam']
test_documents = ['send us your feedback', 'review your notes']

# Learn the vocabulary from the training set and encode each document as word counts;
# the test set reuses that vocabulary, so unseen words are ignored
vectorizer = CountVectorizer()
train_features = vectorizer.fit_transform(train_documents)
test_features = vectorizer.transform(test_documents)

# Train a multinomial Naive Bayes classifier and predict labels for the test set
classifier = MultinomialNB()
classifier.fit(train_features, train_labels)
predictions = classifier.predict(test_features)

# Print each test document with its predicted label
for i, prediction in enumerate(predictions):
    print('Prediction for "{0}" is "{1}"'.format(test_documents[i], prediction))
```
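One possible variation on the complete code, not part of the original: scikit-learn's `make_pipeline` chains the vectorizer and the classifier into a single estimator. Calling `fit` on the pipeline runs `fit_transform` on the vectorizer and then `fit` on the classifier, and `predict` runs `transform` and then `predict`, so the test documents can never accidentally be used to rebuild the vocabulary. The `model` name is illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_documents = ['send us your password', 'send us your review',
                   'review your password and review your notes',
                   'review us', 'send your password']
train_labels = ['spam', 'ham', 'ham', 'spam', 'spam']
test_documents = ['send us your feedback', 'review your notes']

# Vectorizer + classifier as one estimator that accepts raw strings
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_documents, train_labels)
predictions = model.predict(test_documents)

for document, prediction in zip(test_documents, predictions):
    print('Prediction for "{0}" is "{1}"'.format(document, prediction))
```

Because the pipeline takes raw text as input, it can also be passed directly to utilities such as cross-validation without re-vectorizing by hand.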