Use the LightGBM algorithm to make predictions on the data from sklearn.datasets.fetch_20newsgroups, showing the Python code and the results
Posted: 2024-05-06 20:19:07
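First, install the dependencies scikit-learn, LightGBM, and pandas. The leading `!` runs each command from a Jupyter notebook cell; drop it when typing the commands into a regular shell: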
```
!pip install scikit-learn
!pip install lightgbm
!pip install pandas
```
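Before moving on, it can help to confirm that the three packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of the original answer.

```python
import importlib.util

# pip package names and import names can differ:
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = {"scikit-learn": "sklearn", "lightgbm": "lightgbm", "pandas": "pandas"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the list is not empty, rerun the matching `pip install` command for each name it reports.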
Next is the Python code that uses LightGBM to classify the 20 Newsgroups dataset:
```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import lightgbm as lgb
import pandas as pd

# Load the full dataset and hold out 20% of it as the test set
newsgroups = fetch_20newsgroups(subset='all')
X_train, X_test, y_train, y_test = train_test_split(
    newsgroups.data, newsgroups.target, test_size=0.2, random_state=42)

# Convert the raw text to TF-IDF features.
# Fit on the training split only, then reuse the fitted objects on the test split.
vectorizer = CountVectorizer(stop_words='english')
tfidf_transformer = TfidfTransformer()
X_train_counts = vectorizer.fit_transform(X_train)
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
X_test_counts = vectorizer.transform(X_test)
X_test_tfidf = tfidf_transformer.transform(X_test_counts)

# Train a multiclass gradient-boosted tree model (lgb.train defaults to 100 boosting rounds)
params = {
    'boosting_type': 'gbdt',
    'objective': 'multiclass',
    'num_class': len(newsgroups.target_names),
    'metric': 'multi_logloss',
}
train_data = lgb.Dataset(X_train_tfidf, label=y_train)
model = lgb.train(params, train_data)

# predict() returns one row of class probabilities per document;
# take the index of the largest probability as the predicted label
y_pred = model.predict(X_test_tfidf).argmax(axis=1)

# Compute accuracy on the test set
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

# Save the test texts with their true and predicted labels to a CSV file
test_df = pd.DataFrame({'text': X_test, 'true_label': y_test, 'pred_label': y_pred})
test_df.to_csv('newsgroups_pred.csv', index=False)
```
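Running the code above prints the accuracy on the test set, reported here as roughly 0.67 to 0.68; the exact value may vary with the library versions and parameters used. The script also saves the test texts, true labels, and predicted labels to `newsgroups_pred.csv`.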
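To make the last two steps concrete: with the `multiclass` objective, `model.predict` returns one row of class probabilities per document, the predicted label is the position of the largest value in that row, and accuracy is the fraction of predicted labels that match the true ones. Here is a small pure-Python sketch of that logic. The probabilities and labels are made-up illustrative numbers, not output from the model above, and the helper names `probabilities_to_labels` and `accuracy` are hypothetical.

```python
def probabilities_to_labels(prob_rows):
    """Return, for each row, the index of its largest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]


def accuracy(true_labels, pred_labels):
    """Return the fraction of positions where the two label lists agree."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


# Made-up probabilities for 4 documents and 3 classes (illustration only)
probs = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
    [0.40, 0.35, 0.25],
]
true = [0, 2, 1, 1]

pred = probabilities_to_labels(probs)
print(pred)                  # [0, 2, 1, 0]
print(accuracy(true, pred))  # 0.75
```

The last document is misclassified (predicted 0, true label 1), so three of four predictions are correct.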