A Unified Architecture for Natural Language Processing:
Deep Neural Networks with Multitask Learning
Ronan Collobert collober@nec-labs.com
Jason Weston jasonw@nec-labs.com
NEC Labs America, 4 Independence Way, Princeton, NJ 08540 USA
Abstract
We describe a single convolutional neural net-
work architecture that, given a sentence, out-
puts a host of language processing predic-
tions: part-of-speech tags, chunks, named en-
tity tags, semantic roles, semantically similar
words and the likelihood that the sentence
makes sense (grammatically and semanti-
cally) using a language model. The entire
network is trained jointly on all these tasks
using weight-sharing, an instance of multitask
learning. All the tasks use labeled data ex-
cept the language model, which is learnt from
unlabeled text and represents a novel form of
semi-supervised learning for the shared tasks.
We show how both multitask learning and
semi-supervised learning improve the general-
ization of the shared tasks, resulting in state-
of-the-art performance.
1. Introduction
The field of Natural Language Processing (NLP) aims
to convert human language into a formal representa-
tion that is easy for computers to manipulate. Current
end applications include information extraction, ma-
chine translation, summarization, search and human-
computer interfaces.
While complete semantic understanding is still a far-
distant goal, researchers have taken a divide-and-conquer
approach and identified several sub-tasks useful
for application development and analysis. These range
from the syntactic, such as part-of-speech tagging,
chunking and parsing, to the semantic, such as word-
sense disambiguation, semantic role labeling, named
entity extraction and anaphora resolution.
Currently, most research analyzes those tasks sepa-
rately. Many systems possess few characteristics that
would help develop a unified architecture, which would
presumably be necessary for deeper semantic tasks. In
particular, many systems possess three failings in this
regard: (i) they are shallow in the sense that the clas-
sifier is often linear; (ii) for good performance with
a linear classifier they must incorporate many hand-
engineered features specific to the task; and (iii) they
cascade features learnt separately from other tasks,
thus propagating errors.
In this work we attempt to define a unified architecture
for Natural Language Processing that learns features
that are relevant to the tasks at hand given very lim-
ited prior knowledge. This is achieved by training a
deep neural network, building on the work of Bengio
and Ducharme (2001) and Collobert and Weston (2007). We
define a rather general convolutional network architec-
ture and describe its application to many well known
NLP tasks including part-of-speech tagging, chunking,
named-entity recognition, learning a language model
and semantic role labeling.
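For concreteness, the following is a minimal sketch of a convolutional sentence encoder of this general kind. It is written in PyTorch, and every size (vocabulary, embedding width, hidden width, window) as well as the max-over-time pooling is an illustrative assumption, not the configuration used in this paper:

```python
import torch
import torch.nn as nn

class ConvSentenceEncoder(nn.Module):
    """Sketch: lookup table + windowed convolution over word vectors."""

    def __init__(self, vocab_size=30000, embed_dim=50, hidden_dim=100, window=3):
        super().__init__()
        # Lookup table mapping word indices to learned feature vectors.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolution over a sliding window of adjacent word vectors.
        self.conv = nn.Conv1d(embed_dim, hidden_dim, kernel_size=window,
                              padding=window // 2)

    def forward(self, word_ids):      # word_ids: (batch, seq_len) of int indices
        x = self.embed(word_ids)      # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)         # (batch, embed_dim, seq_len)
        h = torch.tanh(self.conv(x))  # (batch, hidden_dim, seq_len)
        # Max over time yields a fixed-size representation of the sentence.
        return h.max(dim=2).values    # (batch, hidden_dim)
```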
All of these tasks are integrated into a single system
which is trained jointly. All the tasks except the lan-
guage model are supervised tasks with labeled training
data. The language model is trained in an unsuper-
vised fashion on the entire Wikipedia website. Train-
ing this task jointly with the other tasks comprises a
novel form of semi-supervised learning.
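To make the weight-sharing concrete, here is a sketch of how such joint training might be wired up, continuing the encoder sketch above. The task names, label counts and uniform task sampling are illustrative assumptions; the margin-ranking criterion for the language model reflects the general idea of scoring a genuine text window above a corrupted copy:

```python
import random
import torch
import torch.nn as nn
import torch.optim as optim

# Shared encoder (from the sketch above) plus one small head per task.
encoder = ConvSentenceEncoder()
heads = nn.ModuleDict({
    "pos": nn.Linear(100, 45),    # part-of-speech tags (illustrative counts)
    "chunk": nn.Linear(100, 23),  # chunk labels
    "ner": nn.Linear(100, 9),     # named-entity tags
    "srl": nn.Linear(100, 67),    # semantic roles
})
lm_head = nn.Linear(100, 1)       # language model: scalar score for a window
opt = optim.SGD(list(encoder.parameters()) + list(heads.parameters())
                + list(lm_head.parameters()), lr=0.01)
xent = nn.CrossEntropyLoss()

def train_step(batches):
    """One stochastic step: draw a task, update its head and the shared encoder.

    `batches` is assumed to map task names to iterators over minibatches.
    """
    task = random.choice(list(heads.keys()) + ["lm"])
    opt.zero_grad()
    if task == "lm":
        # Unsupervised ranking criterion: a real window of unlabeled text
        # should score higher, by a margin, than a corrupted copy of it.
        real, corrupt = next(batches["lm"])
        loss = torch.clamp(1.0 - lm_head(encoder(real))
                           + lm_head(encoder(corrupt)), min=0.0).mean()
    else:
        word_ids, labels = next(batches[task])
        loss = xent(heads[task](encoder(word_ids)), labels)
    loss.backward()   # gradients reach the shared encoder from every task
    opt.step()
```

Because every task backpropagates through the same lookup table and convolution, the word features are shaped by all tasks at once, including the unlabeled-text ranking task; this is the sense in which joint training acts as semi-supervised learning.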
We focus on what is, in our opinion, the most difficult of
these tasks: the semantic role labeling problem. We
show that both (i) multitask learning and (ii) semi-
supervised learning significantly improve performance
on this task in the absence of hand-engineered features.
We also show how the combined tasks, and in par-
ticular the unsupervised task, learn powerful features
with clear semantic information given no human su-
pervision other than the (labeled) data from the tasks
(see Table 1).