Deep Multi-Task Learning with Shared Memory
Pengfei Liu, Xipeng Qiu∗, Xuanjing Huang
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
School of Computer Science, Fudan University
825 Zhangheng Road, Shanghai, China
{pfliu14,xpqiu,xjhuang}@fudan.edu.cn
∗ Corresponding author.
Abstract
Neural network based models have achieved impressive results on various specific tasks. However, in previous work, most models are learned separately based on single-task supervised objectives, which often suffer from insufficient training data. In this paper, we propose two deep architectures which can be trained jointly on multiple related tasks. More specifically, we augment the neural model with an external memory, which is shared by several tasks. Experiments on two groups of text classification tasks show that our proposed architectures can improve the performance of a task with the help of other related tasks.
1 Introduction
Neural network based models have been shown to achieve impressive results on various NLP tasks, rivaling or in some cases surpassing traditional models, such as text classification (Kalchbrenner et al., 2014; Socher et al., 2013; Liu et al., 2015a), semantic matching (Hu et al., 2014; Liu et al., 2016a), parsing (Chen and Manning, 2014), and machine translation (Bahdanau et al., 2014).
Usually, due to their large number of parameters, these neural models need a large-scale corpus. It is hard to train a deep neural model that generalizes well with size-limited data, while building large-scale resources for some NLP tasks is also a challenge. To overcome this problem, these models often involve an unsupervised pre-training phase. The final model is then fine-tuned on the specific task with respect
to a supervised training criterion. However, most pre-training methods are based on unsupervised objectives (Collobert et al., 2011; Turian et al., 2010; Mikolov et al., 2013), which are effective for improving the final performance but do not directly optimize the desired task.
Multi-task learning is an approach to learning multiple related tasks simultaneously, which can significantly improve performance relative to learning each task independently. Inspired by the success of multi-task learning (Caruana, 1997), several neural network based models (Collobert and Weston, 2008; Liu et al., 2015b) have been proposed for NLP tasks, which utilize multi-task learning to jointly learn several tasks with the aim of mutual benefit. The characteristic of these multi-task architectures is that they share some lower layers to learn common features; after the shared layers, the remaining layers are split into the multiple specific tasks.
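To make this shared-lower-layer paradigm concrete, the following is a minimal sketch of hard parameter sharing in PyTorch. It is illustrative only and not the architecture of any cited work; the layer sizes, the two-task setup, and the class SharedBottomMTL are hypothetical names introduced here.

# Illustrative sketch (not from the cited works): hard parameter sharing.
# A shared lower layer extracts common features; each task keeps its own
# classification head. All sizes and the two-task setup are hypothetical.
import torch
import torch.nn as nn

class SharedBottomMTL(nn.Module):
    def __init__(self, input_size=100, hidden_size=64, task_classes=(2, 5)):
        super().__init__()
        # Lower layer shared by all tasks (common features).
        self.shared = nn.Sequential(nn.Linear(input_size, hidden_size), nn.Tanh())
        # Task-specific upper layers.
        self.heads = nn.ModuleList(nn.Linear(hidden_size, c) for c in task_classes)

    def forward(self, x, task_id):
        return self.heads[task_id](self.shared(x))

# Joint training alternates over tasks, so gradients from every task
# update the shared layer, which is the source of the mutual benefit.
model = SharedBottomMTL()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for task_id, (x, y) in enumerate([(torch.randn(8, 100), torch.randint(0, 2, (8,))),
                                  (torch.randn(8, 100), torch.randint(0, 5, (8,)))]):
    loss = nn.functional.cross_entropy(model(x, task_id), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()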
In this paper, we propose two deep architectures for sharing information among several tasks in a multi-task learning framework. All the related tasks are integrated into a single system which is trained jointly. More specifically, inspired by the Neural Turing Machine (NTM) (Graves et al., 2014) and memory networks (Sukhbaatar et al., 2015), we equip a task-specific long short-term memory (LSTM) neural network (Hochreiter and Schmidhuber, 1997) with an external shared memory. The external memory has the capability to store long-term information and knowledge shared by several related tasks. Different from the NTM, we use a deep fusion strategy to integrate the information from the external memory into the task-specific LSTM, in which a fusion gate controls the
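As a rough illustration of this idea, the sketch below equips an LSTM with a read over an external memory matrix and a sigmoid fusion gate that mixes the read vector into the hidden state. The paper's actual read/write operations and fusion equations are defined in later sections; the attention-based read, the gating form, the class name MemoryFusionLSTM, and all dimensions here are assumptions made purely for illustration.

# Generic sketch only (not the paper's exact formulation): an LSTM that reads
# from an external memory matrix via content-based attention, with a sigmoid
# fusion gate mixing the read vector into the hidden state.
import torch
import torch.nn as nn

class MemoryFusionLSTM(nn.Module):
    def __init__(self, input_size=50, hidden_size=64, mem_slots=16, mem_width=64):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)              # task-specific LSTM
        self.memory = nn.Parameter(torch.randn(mem_slots, mem_width) * 0.1)  # external memory
        self.key = nn.Linear(hidden_size, mem_width)                  # query for memory read
        self.fuse = nn.Linear(hidden_size + mem_width, hidden_size)   # fusion gate

    def forward(self, inputs):                     # inputs: (seq_len, batch, input_size)
        batch = inputs.size(1)
        h = inputs.new_zeros(batch, self.cell.hidden_size)
        c = inputs.new_zeros(batch, self.cell.hidden_size)
        for x_t in inputs:
            h, c = self.cell(x_t, (h, c))
            # Content-based read: attention weights over memory slots.
            attn = torch.softmax(self.key(h) @ self.memory.t(), dim=-1)
            read = attn @ self.memory
            # Fusion gate decides how much shared information to keep.
            g = torch.sigmoid(self.fuse(torch.cat([h, read], dim=-1)))
            h = g * h + (1 - g) * read
        return h

reader = MemoryFusionLSTM()
out = reader(torch.randn(10, 4, 50))   # final fused hidden state, shape (4, 64)

In the multi-task setting described above, the memory matrix would be shared across several such task-specific modules, so that all tasks read from and write to the same store of common knowledge.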