Naive Bayes spam email classification code
Date: 2023-07-05 21:10:29
Here is a simple example of spam classification with a naive Bayes classifier:
```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the data; the file is expected to have a label column 'v1' and a text column 'v2'
data = pd.read_csv('spam.csv', encoding='ISO-8859-1')
data = data[['v1', 'v2']]
data = data.rename(columns={'v1': 'label', 'v2': 'text'})

# Convert the labels to numbers: 1 for spam, 0 for everything else
data['label'] = np.where(data['label'] == 'spam', 1, 0)

# Split the dataset: first 80% of the rows for training, the rest for testing
# (the rows are not shuffled, so this assumes the file is not sorted by label)
train_size = int(len(data) * 0.8)
train_data = data.iloc[:train_size]
test_data = data.iloc[train_size:]

# Feature extraction: build the vocabulary on the training text only,
# then reuse it to turn the test text into word-count vectors
vectorizer = CountVectorizer()
train_features = vectorizer.fit_transform(train_data['text'])
test_features = vectorizer.transform(test_data['text'])

# Train the model
clf = MultinomialNB()
clf.fit(train_features, train_data['label'])

# Predict on both splits
train_pred = clf.predict(train_features)
test_pred = clf.predict(test_features)

# Evaluate the model; confusion matrix rows are true labels, columns are predictions
print('Train Accuracy:', accuracy_score(train_data['label'], train_pred))
print('Test Accuracy:', accuracy_score(test_data['label'], test_pred))
print('Confusion Matrix:\n', confusion_matrix(test_data['label'], test_pred))
```
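For intuition about what a multinomial naive Bayes classifier computes, here is a minimal from-scratch sketch in plain Python. It is not how scikit-learn implements `MultinomialNB` internally; it only illustrates the idea of class priors plus Laplace-smoothed word likelihoods. The toy messages, the whitespace tokenization, and the helper names `fit` and `predict` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for illustration (1 = spam, 0 = ham)
train = [
    ("win a free prize now", 1),
    ("free cash offer click now", 1),
    ("are we still meeting for lunch", 0),
    ("see you at the meeting tomorrow", 0),
]

def fit(samples, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for text, label in samples:
        word_counts[label].update(text.split())  # naive whitespace tokenization
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(samples)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Adding alpha to every count keeps unseen (word, class) pairs above zero
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_like, vocab

def predict(text, log_prior, log_like, vocab):
    """Return the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for c in log_prior:
        # Words that never appeared in training are skipped
        scores[c] = log_prior[c] + sum(log_like[c][w] for w in text.split() if w in vocab)
    return max(scores, key=scores.get)

log_prior, log_like, vocab = fit(train)
print(predict("free prize offer", log_prior, log_like, vocab))        # prints 1
print(predict("lunch meeting tomorrow", log_prior, log_like, vocab))  # prints 0
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together, which matters for longer messages.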
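Note that this is only a simple example. Real-world spam classification needs more sophisticated feature extraction (for example TF-IDF weighting or n-gram features) and model tuning.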
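As one possible direction for the feature extraction and tuning mentioned above, the sketch below combines `TfidfVectorizer` and `MultinomialNB` in a scikit-learn `Pipeline` and searches a small parameter grid with `GridSearchCV`. The toy corpus, the grid values, and the choice of F1 as the scoring metric are assumptions for illustration, not recommendations drawn from the original example; with real data you would pass the `text` and `label` columns instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration (1 = spam, 0 = ham)
texts = [
    "win a free prize now",
    "free cash offer click here",
    "claim your free reward today",
    "urgent offer win cash now",
    "click here for a free gift",
    "you won a cash prize claim now",
    "exclusive offer free entry today",
    "win free tickets click now",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the report",
    "thanks for your help yesterday",
    "let us meet after the class",
    "please call me when you arrive",
    "the meeting is moved to friday",
    "did you finish the homework",
]
labels = [1] * 8 + [0] * 8

# Shuffled, stratified split so both classes appear in train and test
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# The pipeline refits the vectorizer inside each CV fold, avoiding vocabulary leakage
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Hypothetical grid: unigrams vs. unigrams+bigrams, and three smoothing strengths
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 0.5, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
test_f1 = search.score(X_test, y_test)  # F1 of the refitted best pipeline
print("Test F1:", test_f1)
```

On a corpus this small the scores are not meaningful; the point is only the structure: vectorizer and classifier are tuned together, and the test split is touched once at the end.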