Code for news classification with a decision tree
Date: 2024-06-22 19:03:41
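Decision trees are a commonly used machine learning algorithm for news classification: they assign each article to a category based on features extracted from its text. The following is a simple Python example that uses scikit-learn's decision tree classifier to classify news articles: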
```python
# Import the required libraries
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
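# Assumes a file named news_dataset.csv with a 'content' column (article text) and a 'label' column (category)
data = pd.read_csv('news_dataset.csv')
# Convert the text to bag-of-words count vectors. CountVectorizer does not remove stop words
# by default (pass stop_words=...), and unsegmented text such as Chinese needs word segmentation first
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(data['content'])  # text -> sparse count matrix
y = data['label']  # category labels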
# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the decision tree model (random_state fixed so runs are reproducible)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Predict on the test set
y_pred = clf.predict(X_test)
# Evaluate model performance
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
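# To display the tree structure (requires the graphviz Python package and the Graphviz binaries):
import graphviz
from sklearn import tree
dot_data = tree.export_graphviz(
    clf,
    out_file=None,
    max_depth=3,  # draw only the top levels; a full tree over a large vocabulary is hard to read
    feature_names=vectorizer.get_feature_names_out(),  # get_feature_names() was removed in scikit-learn 1.2
    class_names=[str(c) for c in clf.classes_],  # must follow the classifier's class order, not order of appearance
    filled=True,
    rounded=True,
)
graphviz.Source(dot_data).view()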
```