Universal Sentence Encoder
Daniel Cer^a, Yinfei Yang^a, Sheng-yi Kong^a, Nan Hua^a, Nicole Limtiaco^b, Rhomni St. John^a, Noah Constant^a, Mario Guajardo-Céspedes^a, Steve Yuan^c, Chris Tar^a, Yun-Hsuan Sung^a, Brian Strope^a, Ray Kurzweil^a

^a Google Research, Mountain View, CA
^b Google Research, New York, NY
^c Google, Cambridge, MA
Abstract
We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
1 Introduction
Limited amounts of training data are available for many NLP tasks. This presents a challenge for data hungry deep learning methods. Given the high cost of annotating supervised training data, very large training sets are usually not available for most research or industry NLP tasks. Many models address the problem by implicitly performing limited transfer learning through the use of pre-trained word embeddings such as those produced by word2vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014). However, recent work has demonstrated strong transfer task performance using pre-trained sentence level embeddings (Conneau et al., 2017).

Figure 1: Sentence similarity scores using embeddings from the universal sentence encoder.
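The kind of similarity scores shown in Figure 1 can be reproduced, in outline, by embedding each sentence and comparing the resulting vectors pairwise. A minimal sketch follows; it assumes `embeddings` is a matrix of sentence vectors produced by the encoder, and it uses plain cosine similarity as the comparison, since the figure does not specify the exact score:

    import numpy as np

    def pairwise_similarity(embeddings: np.ndarray) -> np.ndarray:
        """Cosine similarity between every pair of row vectors."""
        # Normalize each sentence vector to unit length, then take dot products.
        unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        return unit @ unit.T

    # Stand-in for encoder output; the released models emit 512-dim vectors.
    embeddings = np.random.rand(3, 512)
    scores = pairwise_similarity(embeddings)  # 3x3 matrix of similarity scores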
In this paper, we present two models for producing sentence embeddings that demonstrate good transfer to a number of other NLP tasks. We include experiments with varying amounts of transfer task training data to illustrate the relationship between transfer task performance and training set size. We find that our sentence embeddings can be used to obtain surprisingly good task performance with remarkably little task specific training data. The sentence encoding models are made publicly available on TF Hub.
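As a minimal sketch of how the released encoder can be loaded and applied, using the TensorFlow 1.x style tensorflow_hub API of the paper's era (newer tensorflow_hub versions load modules with hub.load instead):

    import tensorflow as tf
    import tensorflow_hub as hub

    # Load the pre-trained encoder from TF Hub.
    embed = hub.Module(
        "https://tfhub.dev/google/universal-sentence-encoder/1")
    embeddings = embed(["The quick brown fox jumps over the lazy dog."])

    with tf.Session() as session:
        session.run([tf.global_variables_initializer(),
                     tf.tables_initializer()])
        vectors = session.run(embeddings)  # one 512-dim vector per sentence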
Engineering characteristics of models used for transfer learning are an important consideration. We discuss modeling trade-offs regarding memory requirements as well as compute time on CPU and GPU. Resource consumption comparisons are made for sentences of varying lengths.