
Text classification with TensorFlow: Python code

Date: 2023-07-24 08:25:47 Views: 95

Artificial intelligence, recurrent neural networks: CNN-RNN Chinese text classification, based on TensorFlow

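Environment: Python 2/3 (thanks to howie.hu for testing the Python 2 environment), TensorFlow 1.3 or later, numpy, scikit-learn, scipy. Dataset: training and testing use a subset of THUCNews. Please download the dataset yourself from THUCTC (an efficient Chinese text classification toolkit) and follow the data provider's open-source license. This run uses 10 of the categories, with 6,500 samples per category.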

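The following Python example shows text classification with TensorFlow:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Prepare the data
train_text = ['example text 1', 'example text 2', 'example text 3']
# Binary labels (0 or 1); stored as a NumPy array, which model.fit handles
# more reliably than a plain Python list
train_labels = np.array([0, 1, 1])

# Create the tokenizer, keeping at most the 10,000 most frequent words
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=10000)
tokenizer.fit_on_texts(train_text)

# Convert the text to integer sequences
train_sequences = tokenizer.texts_to_sequences(train_text)

# Pad the sequences at the end to a fixed length of 50
train_padded_sequences = tf.keras.preprocessing.sequence.pad_sequences(
    train_sequences, maxlen=50, padding='post')

# Build the model (input_length is omitted: newer Keras versions deprecate it)
model = tf.keras.Sequential([
    layers.Embedding(10000, 16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_padded_sequences, train_labels, epochs=10)

# Predict on new data, reusing the tokenizer fitted on the training text
test_text = ['example text 4']
test_sequences = tokenizer.texts_to_sequences(test_text)
test_padded_sequences = tf.keras.preprocessing.sequence.pad_sequences(
    test_sequences, maxlen=50, padding='post')
predictions = model.predict(test_padded_sequences)  # sigmoid outputs in [0, 1]
```

This example uses a simple text classification model made up of an Embedding layer, a GlobalAveragePooling1D layer, and two Dense layers. It is a binary toy example on English placeholder text, not the 10-category THUCNews setup. You can modify the model structure and hyperparameters to suit your needs.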
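The Keras `Tokenizer` used in this example splits on whitespace by default, so Chinese text, which has no spaces between words, would usually need word segmentation or character-level encoding first. The sketch below is a simplified pure-Python illustration of character-level encoding followed by end padding. It is not the Keras implementation, and the helper names (`build_char_vocab`, `encode`, `pad_post`) and the sample sentences are hypothetical.

```python
from collections import Counter


def build_char_vocab(texts, num_words=None):
    """Map each character to a 1-based index, most frequent first.

    Index 0 is left unused so it can serve as the padding value.
    """
    counts = Counter(ch for text in texts for ch in text if not ch.isspace())
    return {ch: idx
            for idx, (ch, _) in enumerate(counts.most_common(num_words), start=1)}


def encode(text, vocab):
    """Convert a string to a list of indices, skipping unknown characters."""
    return [vocab[ch] for ch in text if ch in vocab]


def pad_post(seq, maxlen, value=0):
    """Keep the last `maxlen` items, then pad at the end with `value`.

    Assumes maxlen > 0.
    """
    seq = seq[-maxlen:]
    return seq + [value] * (maxlen - len(seq))


# Hypothetical sample sentences, for illustration only
texts = ["今天天气好", "天气不好"]
vocab = build_char_vocab(texts)
padded = [pad_post(encode(t, vocab), maxlen=6) for t in texts]
print(padded)
```

The resulting fixed-length integer lists play the same role as `train_padded_sequences` above and could be converted with `np.array(padded)` before being passed to an Embedding layer. Whether character-level or word-level input works better depends on the dataset.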
