Write spam email classification code for me based on the decision tree algorithm
Time: 2024-05-15 18:18:23
Sure. Below is a simple example of spam email classification based on a decision tree:
```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
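# Load the dataset: column v1 holds the label ('ham'/'spam'), column v2 the message text
data = pd.read_csv('spam.csv', encoding='latin-1')
# Drop the empty trailing columns if present, keeping only label and text
data = data.drop(columns=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], errors='ignore')
data = data.rename(columns={'v1': 'label', 'v2': 'text'})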
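# Split into training and test sets; stratify keeps the spam/ham ratio the same in both,
# and a fixed random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['label'], test_size=0.2, random_state=42, stratify=data['label'])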
# Feature extraction: learn the vocabulary on the training set only, then reuse it for the test set
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)
# Train the model (fixed random_state for reproducible results)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Evaluate the model on the held-out test set
accuracy = clf.score(X_test, y_test)
print('Accuracy:', accuracy)
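# Classify a new message using the same fitted vectorizer
message = 'You have won $100000! Click the link to claim your prize!'
features = vectorizer.transform([message])
prediction = clf.predict(features)[0]  # predict returns an array; take the single label
if prediction == 'ham':
    print('This is a legitimate email.')
else:
    print('This is a spam email.')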
```
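This code uses scikit-learn's CountVectorizer to convert each message into a bag-of-words count vector, then trains a DecisionTreeClassifier on those vectors to classify messages. To use your own data, replace the file name 'spam.csv' with the path to your dataset (and adjust the column names v1/v2 if yours differ), then run the script to train the model, evaluate it on the test split, and classify new messages.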