Write a binary spam classifier
Date: 2024-05-08 13:15:52
Here is a simple binary spam classifier that uses the naive Bayes algorithm:
```python
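import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

# Load the data; keep only the label column (v1) and the message text column (v2)
data = pd.read_csv('spam.csv', encoding='latin-1')
data = data[['v1', 'v2']]
data = data.rename(columns={'v1': 'label', 'v2': 'text'})
# Encode the labels as integers: ham (not spam) -> 0, spam -> 1
data['label'] = data['label'].map({'ham': 0, 'spam': 1})

# Split into training and test sets (train_test_split holds out 25% by default)
X_train, X_test, y_train, y_test = train_test_split(data['text'], data['label'], random_state=0)

# Feature extraction: bag-of-words counts with English stop words removed.
# Fit the vocabulary on the training set only, then reuse it for the test set.
cv = CountVectorizer(stop_words='english')
X_train_cv = cv.fit_transform(X_train)
X_test_cv = cv.transform(X_test)

# Train the model and predict on the test set
nb = MultinomialNB()
nb.fit(X_train_cv, y_train)
y_pred = nb.predict(X_test_cv)

# Print the accuracy (fraction of test messages classified correctly)
accuracy = np.mean(y_pred == y_test)
print(f'Accuracy: {accuracy:.2%}')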
```
The dataset is [spam.csv](https://www.kaggle.com/uciml/sms-spam-collection-dataset) from Kaggle, features are extracted with CountVectorizer (word counts per message), and the model is MultinomialNB (multinomial naive Bayes).
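To make clearer what the word-count features and the multinomial naive Bayes model compute, here is a minimal from-scratch sketch using only the standard library. It is an illustration, not scikit-learn's actual implementation: the helpers `tokenize`, `train`, and `predict` are hypothetical, the tokenizer is a plain whitespace split (much simpler than CountVectorizer's, and without stop-word removal), and the four toy messages are made up for the example. The smoothing parameter `alpha=1.0` is assumed to correspond to Laplace smoothing.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(texts, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    classes = sorted(set(labels))
    vocab = {w for t in texts for w in tokenize(t)}

    # Count how often each word occurs in each class.
    word_counts = {c: Counter() for c in classes}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))

    # Log prior: fraction of training messages in each class.
    doc_counts = Counter(labels)
    log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in classes}

    # Log likelihood of each vocabulary word given the class, smoothed so
    # that a word unseen in one class never gets probability zero.
    log_likelihood = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(model, text):
    """Return the class with the highest log posterior score."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for w in tokenize(text):
            # Words outside the training vocabulary are ignored.
            if w in log_likelihood[c]:
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Made-up toy data for illustration: 1 = spam, 0 = ham.
toy_texts = [
    "win a free prize now",
    "free cash offer click now",
    "are we still meeting for lunch",
    "see you at the meeting tomorrow",
]
toy_labels = [1, 1, 0, 0]

model = train(toy_texts, toy_labels)
print(predict(model, "free prize offer"))        # 1 (spam)
print(predict(model, "lunch meeting tomorrow"))  # 0 (ham)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which is why the sketch sums logarithms instead of multiplying raw probabilities.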