Implement in Python: naive Bayes classification of spam SMS messages.
Date: 2023-08-04 20:08:10
Below is an example Python implementation of naive Bayes spam SMS classification:
```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load the data and keep only the label column (v1) and the message text column (v2)
data = pd.read_csv('spam.csv', encoding='latin-1')
data = data[['v1', 'v2']]
data.columns = ['label', 'message']

# Preprocess: replace every non-letter character with a space, then lowercase
data['message'] = data['message'].apply(lambda x: re.sub(r'[^a-zA-Z]', ' ', x).lower())

# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    data['message'], data['label'], test_size=0.2, random_state=42)

# Feature extraction: fit the vocabulary on the training set only,
# then reuse that vocabulary to transform the test set
cv = CountVectorizer()
X_train_cv = cv.fit_transform(X_train)
X_test_cv = cv.transform(X_test)

# Train the naive Bayes classifier and predict on the test set
nb = MultinomialNB()
nb.fit(X_train_cv, y_train)
y_pred = nb.predict(X_test_cv)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
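The code uses the MultinomialNB naive Bayes classifier from the sklearn library, together with CountVectorizer to convert the text into word-count vectors. It reads the spam.csv dataset; adjust the dataset path and the preprocessing steps to suit your own data.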
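To make it clearer what these two components do, here is a minimal pure-Python sketch of the same idea: count words per class, then score a message with log priors plus add-one (Laplace) smoothed log likelihoods. It illustrates the technique and is not sklearn's actual implementation. The `tokenize` helper, the `TinyMultinomialNB` class, and the four training messages are all hypothetical, made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of two or more letters (an assumed, simplified tokenizer)."""
    return re.findall(r"[a-z]{2,}", text.lower())


class TinyMultinomialNB:
    """Illustrative multinomial naive Bayes with add-one smoothing (hypothetical helper class)."""

    def fit(self, messages, labels):
        # Number of training messages per class, used for the class priors
        self.class_counts = Counter(labels)
        # Word frequencies per class (a bag-of-words count for each label)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every word seen in any training message
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        # Words outside the training vocabulary are ignored
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior: fraction of training messages with this label
            score = math.log(n_docs / total_docs)
            # Smoothed denominator: words in this class plus vocabulary size
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data, for illustration only
train_messages = [
    "win a free prize now",
    "free cash call now",
    "are we meeting for lunch",
    "see you at home tonight",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TinyMultinomialNB().fit(train_messages, train_labels)
print(model.predict("free prize call"))
print(model.predict("see you at lunch"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and add-one smoothing keeps a word that never appeared in one class from driving that class's probability to zero.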