Spam SMS Classification with Naive Bayes
Date: 2023-05-18 21:06:51
The following code implements a Naive Bayes spam SMS classifier using scikit-learn:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Sample SMS dataset (spam and legitimate messages)
messages = [
"Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005.",
"URGENT! You have won a 1 week FREE membership in our $100,000 Prize Jackpot!",
"Hi, this is John from XYZ Corp. We would like to offer you a job interview.",
"Congratulations! You have been selected to receive a free gift. Please click the link to claim.",
"Your package has been delivered. Please confirm receipt by replying to this message."
]
# Labels: 0 = legitimate message, 1 = spam
labels = [1, 1, 0, 1, 0]
# Convert the text into bag-of-words count vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
# Train the Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X, labels)
# Predict whether a new message is spam
new_messages = ["You have won a free iPhone. Click the link to claim."]
new_X = vectorizer.transform(new_messages)
predicted_labels = clf.predict(new_X)
print(predicted_labels)
```
The output is `[1]`, meaning the new message is classified as spam.
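A hard label hides how confident the classifier is, so it can help to look at the class probabilities as well. The sketch below is one possible variation, not part of the original answer: it wraps the same vectorizer and classifier in a scikit-learn pipeline (`make_pipeline`) and calls `predict_proba`. The variable names `model`, `spam_index`, and `spam_probability` are chosen here for illustration. With only five training messages the probabilities are not meaningful estimates; the code only demonstrates the mechanics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Same toy dataset as above: 0 = legitimate message, 1 = spam
messages = [
    "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005.",
    "URGENT! You have won a 1 week FREE membership in our $100,000 Prize Jackpot!",
    "Hi, this is John from XYZ Corp. We would like to offer you a job interview.",
    "Congratulations! You have been selected to receive a free gift. Please click the link to claim.",
    "Your package has been delivered. Please confirm receipt by replying to this message.",
]
labels = [1, 1, 0, 1, 0]

# Chain vectorization and classification so raw text can be passed directly
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

new_messages = ["You have won a free iPhone. Click the link to claim."]

# predict_proba returns one row per message and one column per class;
# classes_ on the final step gives the column order
probabilities = model.predict_proba(new_messages)
spam_index = list(model.named_steps["multinomialnb"].classes_).index(1)
spam_probability = probabilities[0][spam_index]

print(f"P(spam) = {spam_probability:.3f}")
```

Using a pipeline also ensures that new messages always go through the vectorizer fitted on the training data, so `fit_transform` cannot be called on them by mistake.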