Code for training word embeddings with a bidirectional LSTM
Date: 2023-10-24 18:16:30
The following Python example trains word embeddings with a bidirectional LSTM:
```python
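import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense
from tensorflow.keras.models import Model
# Legacy Keras text utilities; Tokenizer is deprecated in recent TensorFlow releases
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences


# Build the model: Embedding -> BiLSTM -> per-token softmax over the vocabulary
def build_model(vocab_size, embedding_dim, lstm_units):
    inputs = Input(shape=(None,))
    # Index 0 is reserved for padding, so it is masked out
    x = Embedding(vocab_size, embedding_dim, mask_zero=True, name='embedding')(inputs)
    # return_sequences=True keeps one output vector per time step
    x = Bidirectional(LSTM(lstm_units, return_sequences=True))(x)
    outputs = Dense(vocab_size, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    return model


# Train the word embeddings
def train_word_embeddings(texts, embedding_dim=100, lstm_units=128, batch_size=64, epochs=10):
    # Fit the tokenizer (it lowercases text by default)
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(texts)
    # Build the training data: integer id sequences, zero-padded on the right
    sequences = tokenizer.texts_to_sequences(texts)
    padded_sequences = pad_sequences(sequences, padding='post')
    # Build the model; +1 because word indices start at 1 and 0 is padding
    model = build_model(len(tokenizer.word_index) + 1, embedding_dim, lstm_units)
    # Assumption: a toy objective that reconstructs each input token id.
    # Integer targets of shape (batch, time) pair with sparse categorical
    # cross-entropy against the (batch, time, vocab_size) softmax output.
    # A practical setup would more likely predict masked or neighbouring words.
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
    # Train the model
    model.fit(padded_sequences, padded_sequences, batch_size=batch_size, epochs=epochs)
    # Extract the embedding matrix, shape (vocab_size, embedding_dim)
    embeddings = model.get_layer('embedding').get_weights()[0]
    word_index = tokenizer.word_index
    # Map each (lowercased) word to its row in the embedding matrix
    word_embeddings = {word: embeddings[idx] for word, idx in word_index.items()}
    return word_embeddings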
```
Usage:
```python
texts = ['I like to eat apple', 'He hates to eat banana', 'She loves to eat orange']
word_embeddings = train_word_embeddings(texts)
```
Here `texts` is the training corpus, and `word_embeddings` is the result: a dictionary that maps each word to its trained vector.
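Once you have a word-to-vector dictionary like `word_embeddings`, a common next step is to compare words by cosine similarity. The following is only a minimal sketch: `cosine_similarity`, `most_similar`, and the `toy_embeddings` values are hypothetical names and stand-in data introduced here for illustration (they are not part of the code above), so that the example runs without training a model.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom > 0 else 0.0


def most_similar(word, word_embeddings, top_n=3):
    """Rank every other word by cosine similarity to `word`."""
    query = word_embeddings[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in word_embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical stand-in vectors; in practice pass the dictionary
# returned by train_word_embeddings(texts) instead.
toy_embeddings = {
    'apple': np.array([1.0, 0.0]),
    'banana': np.array([0.9, 0.1]),
    'eat': np.array([0.0, 1.0]),
}
print(most_similar('apple', toy_embeddings, top_n=2))
```

With only three short sentences as training data, the vectors from `train_word_embeddings` will not carry meaningful similarity structure; a much larger corpus is needed before rankings like this become informative.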