2.1 Unsupervised Feature-based Approaches
Learning widely applicable representations of words has been an active area of
research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005;
Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods.
Pre-trained word embeddings are an integral part of modern NLP systems, offering
significant improvements over embeddings learned from scratch (Turian et al., 2010). To
pre-train word embedding vectors, left-to-right language modeling objectives have been
used (Mnih and Hinton, 2009), as well as objectives to discriminate correct from
incorrect words in left and right context (Mikolov et al., 2013).
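To make the discriminative objective of Mikolov et al. (2013) concrete, the sketch below illustrates a negative-sampling loss in which a true context word must be scored above randomly drawn "incorrect" words. This is a minimal sketch, not the reference implementation: the vocabulary size, embedding dimension, number of negatives, and the uniform negative sampler are all illustrative assumptions.

```python
import numpy as np

# Sketch of a word2vec-style negative-sampling objective: the model
# learns to score a true context word above K randomly sampled words.
# All sizes and the uniform sampler are illustrative, not values from
# the cited work.

rng = np.random.default_rng(0)
VOCAB, DIM, K = 10_000, 100, 5             # vocab size, embedding dim, negatives

W_in = rng.normal(0, 0.1, (VOCAB, DIM))    # center-word embeddings
W_out = rng.normal(0, 0.1, (VOCAB, DIM))   # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center_id, context_id):
    """-log sigma(u_c . v_w) - sum_k log sigma(-u_k . v_w)"""
    v = W_in[center_id]
    pos = sigmoid(W_out[context_id] @ v)
    negs = rng.integers(0, VOCAB, size=K)  # uniform negatives (simplified)
    neg = sigmoid(-W_out[negs] @ v)
    return -np.log(pos) - np.log(neg).sum()

print(neg_sampling_loss(center_id=42, context_id=7))
```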
These approaches have been generalized to coarser granularities, such as sentence
embeddings (Kiros et al., 2015; Logeswaran and Lee, 2018) or paragraph embeddings
(Le and Mikolov, 2014). To train sentence representations, prior work has used objectives
to rank candidate next sentences (Jernite et al., 2017; Logeswaran and Lee, 2018), left-to-
right generation of next sentence words given a representation of the previous sentence
(Kiros et al., 2015), or denoising autoencoder derived objectives (Hill et al., 2016).
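The next-sentence ranking objective of Jernite et al. (2017) and Logeswaran and Lee (2018) can be sketched as a softmax over candidate sentences: given an encoding of the current sentence, the model must assign the highest score to the true next sentence. In the sketch below, the mean-of-word-vectors encoder is a hypothetical stand-in (the cited work uses recurrent encoders), and all dimensions are illustrative.

```python
import numpy as np

# Sketch of a next-sentence ranking objective: rank the true next
# sentence above other candidates via a softmax over dot-product
# scores. The averaging encoder is a simplified stand-in.

rng = np.random.default_rng(0)
DIM = 64

def encode(word_vecs):
    """Hypothetical encoder: average the word vectors of a sentence."""
    return word_vecs.mean(axis=0)

def ranking_loss(current, candidates, true_index):
    """Cross-entropy over scores of candidate next sentences."""
    scores = np.array([encode(current) @ encode(c) for c in candidates])
    m = scores.max()                                    # logsumexp trick
    log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
    return -log_probs[true_index]

cur = rng.normal(size=(5, DIM))                         # 5 toy word vectors
cands = [rng.normal(size=(7, DIM)) for _ in range(4)]   # 4 candidate sentences
print(ranking_loss(cur, cands, true_index=2))
```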
ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word
embedding research along a different dimension. They extract context-sensitive features
from a left-to-right and a right-to-left language model. The contextual representation of
each token is the concatenation of the left-to-right and right-to-left representations. When
integrating contextual word embeddings with existing task-specific architectures, ELMo
advances the state of the art for several major NLP benchmarks (Peters et al., 2018a)
including question answering (Rajpurkar et al., 2016), sentiment analysis (Socher et al.,
2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003). Melamud
et al. (2016) proposed learning contextual representations through a task to predict a
single word from both left and right context using LSTMs. Similar to ELMo, their model
is feature-based and not deeply bidirectional. Fedus et al. (2018) show that the Cloze
task can be used to improve the robustness of text generation models.
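The concatenation step that defines ELMo's contextual features can be illustrated with a small sketch. The recurrent cells below are simplified tanh stand-ins for ELMo's LSTM language models, and all dimensions are illustrative; the point is only that each token's representation is the concatenation of a left-to-right state and a right-to-left state.

```python
import numpy as np

# Sketch of ELMo-style feature extraction: a token's contextual
# representation is [h_forward ; h_backward]. The simple tanh
# recurrences are stand-ins for the biLM's LSTMs.

rng = np.random.default_rng(0)
DIM, HID = 32, 16
W_f = rng.normal(0, 0.1, (DIM + HID, HID))   # forward cell weights (illustrative)
W_b = rng.normal(0, 0.1, (DIM + HID, HID))   # backward cell weights

def run(tokens, W, reverse=False):
    """One-directional pass; returns one hidden state per token."""
    seq = tokens[::-1] if reverse else tokens
    h, states = np.zeros(HID), []
    for x in seq:
        h = np.tanh(np.concatenate([x, h]) @ W)
        states.append(h)
    return states[::-1] if reverse else states

def contextual_reps(tokens):
    fwd = run(tokens, W_f)                   # left-to-right states
    bwd = run(tokens, W_b, reverse=True)     # right-to-left states
    # Each token's representation is the concatenation of both.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

toks = [rng.normal(size=DIM) for _ in range(6)]   # 6 toy token embeddings
print(contextual_reps(toks)[0].shape)             # (32,) = 2 * HID
```

Because each direction is trained and run independently, every token sees its left context in one half of the vector and its right context in the other; no single representation conditions on both sides jointly, which is why such models are feature-based rather than deeply bidirectional.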