Text Level Graph Neural Network for Text Classification
Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang and Houfeng WANG
MOE Key Lab of Computational Linguistics, Peking University, Beijing, 100871, China
{hlz, madehong, lisujian, zxdcs, wanghf}@pku.edu.cn
Abstract
Recently, research has explored graph neural network (GNN) techniques for text classification, since GNNs do well in handling complex structures and preserving global information. However, previous GNN-based methods face two practical problems: a fixed corpus level graph structure that does not support online testing, and high memory consumption. To tackle these problems, we propose a new GNN-based model that builds a graph for each input text, with parameters shared globally, instead of a single graph for the whole corpus. This method removes the dependence of an individual text on the entire corpus, which supports online testing while still preserving global information. Besides, we build graphs with much smaller windows in the text, which not only extracts more local features but also significantly reduces the number of edges and the memory consumption. Experiments show that our model outperforms existing models on several text classification datasets, even while consuming less memory.
1 Introduction
Text classification is a fundamental problem in natural language processing (NLP), with many applications such as spam detection and news filtering (Jindal and Liu, 2007; Aggarwal and Zhai, 2012). The essential step for text classification is text representation learning.
With the development of deep learning, neural networks like Convolutional Neural Networks (CNN) (Kim, 2014) and Recurrent Neural Networks (RNN) (Hochreiter and Schmidhuber, 1997) have been employed for text representation. Recently, a new kind of neural network named Graph Neural Network (GNN) has attracted wide attention (Battaglia et al., 2018). GNN was first proposed in (Scarselli et al., 2009) and has been used in many NLP tasks, including text classification (Defferrard et al., 2016), sequence labeling (Zhang et al., 2018a), neural machine translation (Bastings et al., 2017), and relational reasoning (Battaglia et al., 2016). Defferrard et al. (2016) first employed the Graph Convolutional Neural Network (GCN) in the text classification task and outperformed traditional CNN models. Further, Yao et al. (2019) improved Defferrard et al. (2016)'s work by applying article nodes and weighted edges in the graph, and their model outperformed the state-of-the-art text classification methods.
However, these GNN-based models usually build one graph for the whole corpus, which causes the following problems in practice. First, they require high memory consumption due to numerous edges: because such models build a single graph for the whole corpus and use edges with fixed weights, which considerably limits the expressive power of edges, they have to use a large connection window to obtain a global representation. Second, it is difficult for such models to conduct online testing, because the structure and parameters of their graph depend on the corpus and cannot be modified after training.
To address the above problems, we propose a new GNN-based method for text classification. Instead of building a single corpus level graph, we produce a text level graph for each input text. For a text level graph, we connect word nodes within a reasonably small window in the text rather than fully connecting all the word nodes. The representations of the same nodes and the weights of edges are shared globally and can be updated in the text level graphs through a message passing mechanism, where a node takes in information from neighboring nodes to update its representation.
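To make this concrete, the following minimal Python sketch, based only on our reading of the description above, builds a text level graph with small-window connections and runs one round of message passing. The names (build_text_graph, node_table, edge_table, window_size) and the max-over-neighbors aggregation are illustrative assumptions, not the authors' released implementation.

from collections import defaultdict

def build_text_graph(token_ids, window_size=2):
    """Connect each token to the tokens within window_size
    positions of it, instead of fully connecting all nodes."""
    edges = set()
    n = len(token_ids)
    for i in range(n):
        for j in range(max(0, i - window_size), min(n, i + window_size + 1)):
            if i != j:
                edges.add((token_ids[i], token_ids[j]))
    return edges

# Globally shared parameters (stand-ins for trainable tensors):
# one embedding per word id and one weight per directed word pair,
# shared by every text graph in which they appear.
node_table = defaultdict(lambda: [1.0, 1.0, 1.0, 1.0])
edge_table = defaultdict(lambda: 0.5)

def message_passing(token_ids, edges):
    """One round: each node aggregates its neighbors' representations,
    scaled by the shared edge weights (max-pooling as the reduction)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
    updated = {}
    for node in set(token_ids):
        msgs = [[edge_table[(node, nb)] * x for x in node_table[nb]]
                for nb in neighbors[node]]
        updated[node] = [max(dim) for dim in zip(*msgs)] if msgs else node_table[node]
    return updated

tokens = [3, 7, 7, 2, 9]           # word ids of one input text
reps = message_passing(tokens, build_text_graph(tokens))

In a full model, node_table and edge_table would be trainable parameters shared across all text graphs, which is what lets an individual text graph preserve global information.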
Finally, we summarize the representations of all the