Code for training word vectors with a bidirectional LSTM

Date: 2023-10-24 22:16:30

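Below is a Python example that trains word vectors with a bidirectional LSTM. The model embeds each token, runs a bidirectional LSTM over the sequence, and is trained to predict the token id at every position; the weights of the `Embedding` layer are then read out as the word vectors. Because a bidirectional LSTM also sees the token it is asked to predict, this reconstruction objective should be treated as a toy setup rather than a recipe for high-quality embeddings.

```python
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences


# Build the model: Embedding -> BiLSTM -> per-token softmax over the vocabulary
def build_model(vocab_size, embedding_dim, lstm_units):
    inputs = Input(shape=(None,))
    x = Embedding(vocab_size, embedding_dim, name='embedding')(inputs)
    # return_sequences=True keeps one output per time step, so the
    # predictions line up with the token ids used as targets
    x = Bidirectional(LSTM(lstm_units, return_sequences=True))(x)
    outputs = Dense(vocab_size, activation='softmax')(x)
    return Model(inputs=inputs, outputs=outputs)


# Train word vectors
def train_word_embeddings(texts, embedding_dim=100, lstm_units=128, batch_size=64, epochs=10):
    # Fit the tokenizer (index 0 is reserved for padding)
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(texts)

    # Convert the texts to padded integer sequences
    sequences = tokenizer.texts_to_sequences(texts)
    padded_sequences = pad_sequences(sequences)

    # Build and compile the model
    model = build_model(len(tokenizer.word_index) + 1, embedding_dim, lstm_units)
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

    # Train the model to reconstruct the token id at each position
    model.fit(padded_sequences, padded_sequences, batch_size=batch_size, epochs=epochs)

    # Read the word vectors out of the Embedding layer
    embeddings = model.get_layer('embedding').get_weights()[0]
    word_embeddings = {word: embeddings[idx] for word, idx in tokenizer.word_index.items()}
    return word_embeddings
```

Usage:

```python
texts = ['I like to eat apple', 'He hates to eat banana', 'She loves to eat orange']
word_embeddings = train_word_embeddings(texts)
```

Here `texts` is the training corpus and `word_embeddings` is a dictionary mapping each word to its trained vector. `Tokenizer` lowercases text by default, so the keys are lowercase (for example `'apple'`, `'i'`).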
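Once a `word_embeddings` dictionary is available, a common next step is to compare words by cosine similarity. The following is a minimal sketch of that idea, not part of the original answer: the helper names `cosine_similarity` and `most_similar` are hypothetical, and `toy_embeddings` holds hand-written vectors chosen purely for illustration so the block runs without TensorFlow. The helpers only iterate over the vectors, so they should also work on the NumPy rows returned by the training function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(word, word_embeddings, top_k=3):
    """Return up to top_k (word, similarity) pairs, most similar first."""
    query = word_embeddings[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in word_embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hand-written stand-in vectors (illustrative only, not trained values)
toy_embeddings = {
    "apple": [1.0, 0.0, 0.0],
    "banana": [0.9, 0.1, 0.0],
    "hates": [0.0, 1.0, 0.0],
}

neighbours = most_similar("apple", toy_embeddings, top_k=2)
print(neighbours)
```

With vectors trained on only three short sentences, the nearest neighbours are unlikely to be meaningful; a much larger corpus would be needed before such comparisons say anything about word meaning.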
