Deep Multi-Task Learning with Shared Memory
Pengfei Liu, Xipeng Qiu∗, Xuanjing Huang
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
School of Computer Science, Fudan University
825 Zhangheng Road, Shanghai, China
{pfliu14,xpqiu,xjhuang}@fudan.edu.cn
∗ Corresponding author.
Abstract
Neural network based models have achieved impressive results on various specific tasks. However, in previous work, most models are learned separately based on single-task supervised objectives, which often suffer from insufficient training data. In this paper, we propose two deep architectures which can be trained jointly on multiple related tasks. More specifically, we augment the neural model with an external memory, which is shared by several tasks. Experiments on two groups of text classification tasks show that our proposed architectures can improve the performance of a task with the help of other related tasks.
1 Introduction
Neural network based models have been shown to achieve impressive results on various NLP tasks, rivaling or in some cases surpassing traditional models, such as text classification (Kalchbrenner et al., 2014; Socher et al., 2013; Liu et al., 2015a), semantic matching (Hu et al., 2014; Liu et al., 2016a), parsing (Chen and Manning, 2014), and machine translation (Bahdanau et al., 2014).
Usually, due to their large number of parameters, these neural models need a large-scale corpus. It is hard to train a deep neural model that generalizes well with size-limited data, while building large-scale resources for some NLP tasks is also a challenge. To overcome this problem, these models often involve an unsupervised pre-training phase. The final model is then fine-tuned on the specific task with respect
to a supervised training criterion. However, most pre-training methods are based on unsupervised objectives (Collobert et al., 2011; Turian et al., 2010; Mikolov et al., 2013), which are effective for improving the final performance but do not directly optimize the desired task.
Multi-task learning is an approach to learning multiple related tasks simultaneously, which can significantly improve performance relative to learning each task independently. Inspired by the success of multi-task learning (Caruana, 1997), several neural network based models (Collobert and Weston, 2008; Liu et al., 2015b) have been proposed for NLP tasks, which utilize multi-task learning to jointly learn several tasks with the aim of mutual benefit. The characteristic of these multi-task architectures is that they share some lower layers to learn common features; after the shared layers, the remaining layers are split into the multiple specific tasks.
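To make this shared-lower-layer paradigm concrete, the following is a minimal sketch of hard parameter sharing in PyTorch. It is illustrative only and not the architecture of any cited work; the layer sizes, the two-task setup, and the class SharedBottomMTL are hypothetical names introduced here.

# Illustrative sketch (not from the cited works): hard parameter sharing.
# A shared lower layer extracts common features; each task keeps its own
# classification head. All sizes and the two-task setup are hypothetical.
import torch
import torch.nn as nn

class SharedBottomMTL(nn.Module):
    def __init__(self, input_size=100, hidden_size=64, task_classes=(2, 5)):
        super().__init__()
        # Lower layer shared by all tasks (common features).
        self.shared = nn.Sequential(nn.Linear(input_size, hidden_size), nn.Tanh())
        # Task-specific upper layers.
        self.heads = nn.ModuleList(nn.Linear(hidden_size, c) for c in task_classes)

    def forward(self, x, task_id):
        return self.heads[task_id](self.shared(x))

# Joint training alternates over tasks, so gradients from every task
# update the shared layer, which is the source of the mutual benefit.
model = SharedBottomMTL()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for task_id, (x, y) in enumerate([(torch.randn(8, 100), torch.randint(0, 2, (8,))),
                                  (torch.randn(8, 100), torch.randint(0, 5, (8,)))]):
    loss = nn.functional.cross_entropy(model(x, task_id), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()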
In this paper, we propose two deep architectures for sharing information among several tasks in a multi-task learning framework. All the related tasks are integrated into a single system which is trained jointly. More specifically, inspired by the Neural Turing Machine (NTM) (Graves et al., 2014) and memory networks (Sukhbaatar et al., 2015), we equip a task-specific long short-term memory (LSTM) neural network (Hochreiter and Schmidhuber, 1997) with an external shared memory. The external memory has the capability to store long-term information and knowledge shared by several related tasks. Different from the NTM, we use a deep fusion strategy to integrate the information from the external memory into the task-specific LSTM, in which a fusion gate controls the
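As a rough illustration of this idea, the sketch below equips an LSTM with a read over an external memory matrix and a sigmoid fusion gate that mixes the read vector into the hidden state. The paper's actual read/write operations and fusion equations are defined in later sections; the attention-based read, the gating form, the class name MemoryFusionLSTM, and all dimensions here are assumptions made purely for illustration.

# Generic sketch only (not the paper's exact formulation): an LSTM that reads
# from an external memory matrix via content-based attention, with a sigmoid
# fusion gate mixing the read vector into the hidden state.
import torch
import torch.nn as nn

class MemoryFusionLSTM(nn.Module):
    def __init__(self, input_size=50, hidden_size=64, mem_slots=16, mem_width=64):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)              # task-specific LSTM
        self.memory = nn.Parameter(torch.randn(mem_slots, mem_width) * 0.1)  # external memory
        self.key = nn.Linear(hidden_size, mem_width)                  # query for memory read
        self.fuse = nn.Linear(hidden_size + mem_width, hidden_size)   # fusion gate

    def forward(self, inputs):                     # inputs: (seq_len, batch, input_size)
        batch = inputs.size(1)
        h = inputs.new_zeros(batch, self.cell.hidden_size)
        c = inputs.new_zeros(batch, self.cell.hidden_size)
        for x_t in inputs:
            h, c = self.cell(x_t, (h, c))
            # Content-based read: attention weights over memory slots.
            attn = torch.softmax(self.key(h) @ self.memory.t(), dim=-1)
            read = attn @ self.memory
            # Fusion gate decides how much shared information to keep.
            g = torch.sigmoid(self.fuse(torch.cat([h, read], dim=-1)))
            h = g * h + (1 - g) * read
        return h

reader = MemoryFusionLSTM()
out = reader(torch.randn(10, 4, 50))   # final fused hidden state, shape (4, 64)

In the multi-task setting described above, the memory matrix would be shared across several such task-specific modules, so that all tasks read from and write to the same store of common knowledge.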