Python code for neural-network text classification on the THUCNews dataset
Time: 2024-02-22 07:55:08  Views: 94
Sure. Below is a Python implementation that classifies the THUCNews dataset with a neural network:
```python
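import os

import jieba
import pandas as pd
from sklearn.model_selection import train_test_split
# Tokenizer belongs to the legacy Keras 2 preprocessing API and was removed in
# Keras 3, so this script assumes Keras 2.x (for example keras==2.15)
from keras.layers import Dense, Embedding, LSTM
from keras.models import Sequential
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer

MAX_WORDS = 5000  # vocabulary size kept by the tokenizer
MAX_LEN = 100     # every text is padded/truncated to this many tokens

# Load the dataset (assumes a CSV with "content" and "category" columns)
data_path = "/path/to/data"
df = pd.read_csv(os.path.join(data_path, "THUCNews.csv"))
df = df.dropna(subset=["content", "category"])
# Chinese word segmentation; words are joined with spaces so Tokenizer can split them
df["content"] = df["content"].astype(str).apply(lambda x: " ".join(jieba.cut(x)))
# Build the vocabulary
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(df["content"])
# Convert each text to a sequence of word indices, padded/truncated to MAX_LEN
X = tokenizer.texts_to_sequences(df["content"])
X = pad_sequences(X, maxlen=MAX_LEN)
# One-hot encode the labels; derive the class count instead of hard-coding 10
y = pd.get_dummies(df["category"], dtype="float32").values
num_classes = y.shape[1]
# Split into training and test sets, keeping class proportions equal in both
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=df["category"])
# Build the model
model = Sequential()
model.add(Embedding(input_dim=MAX_WORDS, output_dim=32, input_length=MAX_LEN))
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(num_classes, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# Train, holding out 10% of the training data for validation so the test set stays unseen
model.fit(X_train, y_train, batch_size=128, epochs=10, validation_split=0.1)
# Evaluate on the held-out test set
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print("Test accuracy: {:.4f}".format(acc))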
```
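To make the preprocessing steps concrete, here is a dependency-free sketch of what tokenizing and padding do to already-segmented text. The helper names (`build_word_index`, `texts_to_sequences`, `pad`) are hypothetical, and the sample sentences are made up, with spaces standing in for jieba's output. The logic is meant to approximate the defaults of Keras's `Tokenizer(num_words=...)` and `pad_sequences`: indices are ranked by word frequency starting at 1, 0 is reserved for padding, words whose index is `num_words` or higher are dropped, and both padding and truncation happen at the front. It skips Keras's lowercasing and punctuation filtering.

```python
from collections import Counter


def build_word_index(texts):
    """Rank words by frequency; index 0 is left free for padding."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    # sorted() is stable and Counter keeps first-seen order, so ties keep that order
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index, num_words):
    """Map words to indices, dropping unknown words and indices >= num_words."""
    return [
        [word_index[w] for w in text.split()
         if w in word_index and word_index[w] < num_words]
        for text in texts
    ]


def pad(sequences, maxlen, value=0):
    """Pad on the left and keep only the last `maxlen` items of long sequences."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


# Hypothetical, already-segmented sample texts (words separated by spaces)
texts = ["股市 今天 上涨", "球队 今天 赢 了 比赛", "股市 下跌"]
word_index = build_word_index(texts)
seqs = texts_to_sequences(texts, word_index, num_words=6)
print(pad(seqs, maxlen=4))  # [[0, 1, 2, 3], [0, 4, 2, 5], [0, 0, 0, 1]]
```

Because truncation happens at the front by default, long articles keep only their last `MAX_LEN` tokens. Passing `truncating="post"` to `pad_sequences` keeps the opening of each article instead, which may suit news text better.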
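This solution uses an LSTM as the backbone of the model. An Embedding layer turns each word index into a dense vector, the LSTM reads the sequence of vectors, and a fully connected (Dense) layer with softmax outputs the class probabilities. During training, dropout on the LSTM's inputs and recurrent connections helps prevent overfitting. Finally, the model's accuracy is evaluated on the test set.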
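To illustrate this data flow, here is a NumPy sketch of a single forward pass at inference time. It performs an embedding lookup, applies the standard LSTM cell equations token by token, and takes a softmax over the final hidden state. The weights are random rather than trained. The layer sizes copy the model (vocabulary 5000, embedding 32, 64 LSTM units), and `CLASSES = 10` is only a placeholder for the number of categories. Dropout is left out because it is only active during training.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, HIDDEN = 5000, 32, 64
CLASSES = 10  # placeholder: the real value is the number of label categories


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


# Randomly initialised parameters (illustrative only, not a trained model)
E = rng.normal(0, 0.1, (VOCAB, EMB))          # embedding table
W = rng.normal(0, 0.1, (EMB, 4 * HIDDEN))     # input weights for the 4 gates
U = rng.normal(0, 0.1, (HIDDEN, 4 * HIDDEN))  # recurrent weights for the 4 gates
b = np.zeros(4 * HIDDEN)
W_out = rng.normal(0, 0.1, (HIDDEN, CLASSES))  # dense output layer
b_out = np.zeros(CLASSES)


def forward(token_ids):
    x = E[token_ids]  # (seq_len, EMB): embedding lookup
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:HIDDEN])                   # input gate
        f = sigmoid(z[HIDDEN:2 * HIDDEN])         # forget gate
        g = np.tanh(z[2 * HIDDEN:3 * HIDDEN])     # candidate cell state
        o = sigmoid(z[3 * HIDDEN:])               # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    # Only the final hidden state is classified, as with LSTM(return_sequences=False)
    return softmax(h @ W_out + b_out)


probs = forward(np.array([0, 0, 17, 42, 4999]))  # a padded sequence of word indices
print(probs.shape)  # (10,)
```

The Keras model does not set `mask_zero=True` on its `Embedding` layer, so the padding zeros are fed through the LSTM just as they are in this sketch. Enabling it would make the LSTM skip the padded steps.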