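Multi-modal Bilinear Pooling for Textbook Question Answering

This research paper presents Essay-anchor Attentive Multi-modal Bilinear pooling (EAMB), a method for Textbook Question Answering (TQA), the task of answering complex questions drawn from middle school curricula. TQA requires understanding long essays in addition to images. Bilinear models are good at learning high-level associations between questions and images, but they handle long essays inefficiently.

EAMB encodes the long essay into the joint space of the questions and images. Keywords are extracted and embedded as "essay-anchors", which represent the essay information in a latent space. To focus on the keywords in the question, the paper proposes a network architecture that pays special attention to those keywords during multi-modal fusion.

Specifically, EAMB uses an attention mechanism to give more weight to the key information in the question, so that image and text information are combined more effectively. This is intended to improve the model's ability to understand and answer complex questions, especially when long essays are involved. The multi-modal bilinear pooling captures interactions between the modalities while keeping the handling of long text efficient.

The authors are Juzheng Li, Hang Su, Jun Zhu, and Bo Zhang, of the Department of Computer Science and Technology at Tsinghua University and the Beijing National Research Center for Information Science and Technology. Their work suggests that EAMB may offer clear advantages on the TQA task and points to a research direction for intelligent question answering systems in education. In short, the paper applies multi-modal learning to answering middle school textbook questions and uses EAMB to address the difficulty of understanding long essays.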
ESSAY-ANCHOR ATTENTIVE MULTI-MODAL BILINEAR POOLING FOR
TEXTBOOK QUESTION ANSWERING
Juzheng Li, Hang Su, Jun Zhu, Bo Zhang∗
Department of Computer Science and Technology, Tsinghua Lab of Brain and Intelligence
Beijing National Research Center for Information Science and Technology, BNRist Lab
Tsinghua University, 100084, China
lijuzheng09@gmail.com; {suhangss, dcszj, dcszb}@tsinghua.edu.cn
ABSTRACT
Textbook Question Answering (TQA) [1] is a newly proposed
task to answer arbitrary questions in middle school curricula.
Its particular challenge is to understand long essays in
addition to images. Bilinear models [2, 3] are effective at
learning high-level associations between questions and images,
but are inefficient at handling long essays. In this paper, we
propose Essay-anchor Attentive Multi-modal Bilinear pooling
(EAMB), a novel method to encode the long essays into the
joint space of the questions and images. The essay-anchors,
embedded from the keywords, represent the essay information
in a latent space. We propose a novel network architecture
that pays special attention to the keywords in the questions,
thereby encoding the essay information into the question
features and thus into the joint space with the images. We
then use bilinear models to extract the multi-modal
interactions and obtain the answers. EAMB exploits the
redundancy of the pre-trained word embedding space to
represent the essay-anchors, which avoids the extra learning
difficulty of introducing large network structures.
Quantitative and qualitative experiments show that EAMB
achieves superior performance on the TQA dataset.
Index Terms: Textbook Question Answering, Word Embedding,
Multi-Modal Bilinear Pooling, Attention Mechanisms
1. INTRODUCTION
The computer vision community has witnessed great progress
on Visual Question Answering (VQA) tasks in recent years.
With large multi-modal datasets [4, 5] and methods [4, 6]
available, machines are able to answer short questions about
given images. However, VQA tasks are far from real-world
situations. Humans answer a question not only from the
current scene, but also with abundant background knowledge.

∗ The work is supported by the National NSF of China (Nos. 61571261,
61620106010, 61621136008, 61332007, and U1611461), Beijing Natural
Science Foundation (No. L172037), Tsinghua Tiangong Institute for
Intelligent Computing and the NVIDIA NVAIL Program, and partially
funded by Microsoft Research Asia and Tsinghua-Intel Joint Research
Institute.

[Figure 1 (image not reproduced). Labels shown: Question (Question
Stem, Options, Right Answer); Visual Context; Textual Context (Long
Essay: Title, Subheads, Contents; Supplementary Materials). Other
labels: Introduction, Lesson Objectives, Lesson Summary, Points to
Consider, Apply Concepts. Example essay title: "Erosion and
Deposition by Flowing Water"; subheads: "How Flowing Water Causes
Erosion and Deposition", "Water Speed and Erosion", "Particle Size
and Erosion". Example question: "How many actions are depicted in
the diagram?" with options a. 6, b. 4, c. 8, d. 7.]
Fig. 1: An example question of the TQA task. It consists of a question stem
and several candidate answers. A textual context is always given to explain
the background, including a long essay and some supplementary materials.
In this paper, we merge the supplementary materials into the long essay.
A visual context usually includes an image.

Textbook Question Answering (TQA) is a newly proposed task
that aims to bring QA closer to the real world [1]. The TQA
dataset is drawn from middle school curricula. A TQA question
consists of a long essay, a short question stem, an image, and
several candidate answers (Fig. 1).
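As an illustration of the question structure described above, one TQA item could be held in a small record like the following. This is only a sketch: the class name `TQAQuestion`, its field names, and the file name `diagram.png` are hypothetical and do not come from the TQA dataset or from this paper. The option values are the ones shown in Fig. 1; the right answer is left unset because it is not recoverable from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TQAQuestion:
    """One TQA item: textual context, visual context, and the question."""
    essay: str                    # long essay (textual context)
    image_path: Optional[str]     # visual context; None if no image is given
    stem: str                     # short question stem
    options: Dict[str, str]       # candidate answers keyed by option label
    answer: Optional[str] = None  # label of the right answer, if known

    def is_valid(self) -> bool:
        """Require a stem, at least two options, and a consistent answer label."""
        if not self.stem or len(self.options) < 2:
            return False
        return self.answer is None or self.answer in self.options


# Example built from the question in Fig. 1. The essay text and image path
# are placeholders (assumptions), not data from the dataset.
example = TQAQuestion(
    essay="Erosion and Deposition by Flowing Water ...",
    image_path="diagram.png",
    stem="How many actions are depicted in the diagram?",
    options={"a": "6", "b": "4", "c": "8", "d": "7"},
)
```

Keeping the essay, image, and question in one record makes it explicit that a model has to consume all three inputs to choose among the candidate answers.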
The TQA task is challenging because the multi-modal context
includes long essays. Recent multi-modal methods usually
encode the visual and textual data into a joint space to learn
their interactions. For the TQA task, however, recurrent
neural networks (RNNs) are not capable of encoding such long
essays. Moreover, recent attention and memory mechanisms
[6, 1] usually require large add-on network structures, which
reduces learning efficiency.
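To make the idea of a joint space concrete, the sketch below fuses a question feature and an image feature with a low-rank bilinear form (an element-wise product of two linear projections), a form commonly used in VQA models. It is a minimal illustration under stated assumptions, not the authors' implementation and not necessarily the models cited as [2, 3]; the function names, toy dimensions, and random features are all hypothetical.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def low_rank_bilinear_pool(q, v, U, V):
    """Fuse a question feature q and an image feature v.

    U has shape (k, len(q)) and V has shape (k, len(v)). Output element i is
    (U[i] . q) * (V[i] . v), which equals the bilinear form q^T W v with the
    rank-one weight matrix W = U[i] V[i]^T.
    """
    return [a * b for a, b in zip(matvec(U, q), matvec(V, v))]


def full_bilinear(q, v, W):
    """Reference: explicit bilinear form q^T W v for one weight matrix W."""
    return sum(q[i] * W[i][j] * v[j]
               for i in range(len(q)) for j in range(len(v)))


rng = random.Random(0)
dq, dv, k = 4, 3, 2  # toy sizes, chosen only for illustration
q = [rng.uniform(-1, 1) for _ in range(dq)]  # stand-in question feature
v = [rng.uniform(-1, 1) for _ in range(dv)]  # stand-in image feature
U = [[rng.uniform(-1, 1) for _ in range(dq)] for _ in range(k)]
V = [[rng.uniform(-1, 1) for _ in range(dv)] for _ in range(k)]

joint = low_rank_bilinear_pool(q, v, U, V)   # k-dimensional joint feature
```

Each output element couples every question dimension with every image dimension, which is why bilinear forms can capture cross-modal interactions. The cost depends on the feature sizes rather than on text length, so a long essay must first be compressed into a fixed-size feature before such a model can use it.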
In this paper, we propose Essay-anchor Attentive Multi-modal
Bilinear pooling (EAMB) to address the long-essay issue of the
TQA task. EAMB embeds the long essays into a continuous space
represented collectively by the essay-anchors. Each
essay-anchor corresponds to a keyword
978-1-5386-1737-3/18/$31.00 © 2018 IEEE