Large-Scale Hierarchical Text Classification with
Recursively Regularized Deep Graph-CNN
Hao Peng*, Jianxin Li*, Yu He*, Yaopeng Liu*, Mengjiao Bao*, Lihong Wang†, Yangqiu Song‡, Qiang Yang‡
*School of Computer Science & Engineering, Beihang University, Beijing, China
†National Computer Network Emergency Response Technical Team/Coordination Center of China
‡Department of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong
{penghao,lijx,heyu,liuyp,baomj}@act.buaa.edu.cn, wlh@isc.org.cn, {yqsong,qyang}@cse.ust.hk
ABSTRACT
Text classification to a hierarchical taxonomy of topics is a common and practical problem. Traditional approaches simply use bag-of-words and have achieved good results. However, when there are many labels with different topical granularities, the bag-of-words representation may not be enough. Deep learning models have proven effective at automatically learning different levels of representation for image data, so it is interesting to study what the best way to represent texts is. In this paper, we propose a graph-CNN based deep learning model that first converts texts to graphs-of-words and then uses graph convolution operations to convolve the word graph. The graph-of-words representation of texts has the advantage of capturing non-consecutive and long-distance semantics, while CNN models have the advantage of learning different levels of semantics. To further leverage the hierarchy of labels, we regularize the deep architecture with the dependencies among labels. Our results on both the RCV1 and NYTimes datasets show that we significantly improve large-scale hierarchical text classification over traditional hierarchical text classification methods and existing deep models.
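For a concrete picture of the graph-of-words step summarized above, the following minimal sketch (our own illustration, not the exact preprocessing used in this paper) turns a token sequence into a word co-occurrence graph with a sliding window; the window size and the toy sentence are assumptions made purely for illustration.

from collections import defaultdict

def graph_of_words(tokens, window=4):
    """Build an undirected co-occurrence graph: {word: {neighbor: count}}.

    Two words are linked whenever they appear within the same sliding window.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:  # words within the window after w
            if v != w:
                graph[w][v] += 1
                graph[v][w] += 1
    return graph

doc = "a great sandwich on the menu and the restaurant menu changes daily".split()
g = graph_of_words(doc)
print(sorted(g["menu"]))  # neighbors of "menu" include both "sandwich" and "restaurant"

Each unique word becomes a node, so non-consecutive occurrences of the same word (e.g., "menu") are merged, which is what allows distant words such as "restaurant" and "sandwich" to be connected through shared neighbors.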
1 INTRODUCTION
Topical text classification is a fundamental text mining problem for many applications, such as news classification [18], question answering [28], search result organization [11], online advertising [2], etc. When there are many labels, hierarchical categorization of texts has been recognized as a natural and effective way to organize texts, and it has been well studied in the past two decades [5, 6, 13, 30, 39, 43, 44]. Most of the above traditional approaches represent text as sparse lexical features such as bag-of-words (BOW) and/or n-grams due to their simplicity and effectiveness [1]. Different kinds of feature engineering, such as noun-phrases or key-phrases, have been shown to bring no significant improvement on their own, while majority voting over the results of different features is significantly better [36].
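For concreteness, such sparse lexical features amount to little more than counting words and consecutive word tuples; the short generic sketch below (our own simplification, not any specific system from the cited work) illustrates bag-of-words and n-gram features for a single document.

from collections import Counter

def bow_and_ngrams(text, n=2):
    """Return unigram (bag-of-words) counts and n-gram counts for one document."""
    tokens = text.lower().split()
    bow = Counter(tokens)
    ngrams = Counter(zip(*[tokens[i:] for i in range(n)]))  # consecutive n-word tuples
    return bow, ngrams

bow, bigrams = bow_and_ngrams("The menu lists a great sandwich and a great soup")
print(bow["great"], bigrams[("a", "great")])  # prints: 2 2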
Recently, deep learning has proven effective for end-to-end learning of hierarchical feature representations, and has made groundbreaking progress on object recognition in computer vision and on speech recognition [24]. Two popular deep learning architectures have attracted particular attention for text data, i.e., recurrent neural networks (RNNs) [3, 17]¹ and convolutional neural networks (CNNs) [8, 25]. RNNs are more powerful for short messages and for word-level syntax and semantics [3]. When they are applied to long documents, hierarchical RNNs can be developed [40]. However, hierarchical RNNs assume that documents and sentences provide natural boundaries defining the hierarchy, a constraint that only regular texts and formal languages satisfy. Different from RNNs, CNNs use convolutional masks to sequentially convolve over the data. For texts, a simple mechanism is to recursively convolve the nearby lower-level vectors in the sequence to compose higher-level vectors [8]. This way of using CNNs simply evaluates the semantic compositionality of consecutive words, which corresponds to the n-grams used in traditional text modeling [1]. As with images, such convolution can naturally represent the different levels of semantics exhibited by text data: higher levels represent semantics captured by larger n-grams.
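As a rough sketch of this mechanism (our illustration with toy random embeddings and arbitrary filter sizes, not the architecture proposed in this paper), stacking 1-D convolutions over a sequence of word vectors composes features over progressively larger effective n-grams.

import numpy as np

def conv1d(seq, filters, k):
    """Convolve a (length, dim) sequence with (k*dim, out_dim) filters.

    Each output position mixes k consecutive word vectors, i.e., a k-gram.
    """
    length, dim = seq.shape
    out = []
    for i in range(length - k + 1):
        window = seq[i:i + k].reshape(-1)            # concatenate k word vectors
        out.append(np.maximum(window @ filters, 0))  # linear map + ReLU
    return np.stack(out)

rng = np.random.default_rng(0)
words = ["it", "has", "a", "great", "sandwich"]
embeddings = rng.normal(size=(len(words), 8))                    # toy word vectors
layer1 = conv1d(embeddings, rng.normal(size=(3 * 8, 16)), k=3)   # 3-gram features
layer2 = conv1d(layer1, rng.normal(size=(2 * 16, 16)), k=2)      # each feature spans 4 words
print(layer1.shape, layer2.shape)                                # (3, 16) (2, 16)

With five input words, the first layer has three positions and the second has two, and each second-layer feature covers four consecutive words, i.e., a larger effective "n".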
For document-level topical classification of texts, the sequential information of words might not be as important as it is for language models [3] or sentiment analysis [45]. For example, when we write "I love this restaurant! I think it is good. It has a great sandwich. But the service may not be very efficient since there are always a lot of people...", we can easily identify its topic as "food," but sentiment analysis has to be conducted more carefully because of the word "but." For topic classification, the key words, phrases, and their composition are more important. In this case, rather than sequential information, non-consecutive phrases and long-distance word dependencies are more important for computing the composition of semantics. For example, in a document, the words "restaurant" and "sandwich" may not co-occur in a small window. However, "menu" may co-occur with both of them somewhere else in the document, and the composition of all three words is a very strong signal for classifying the document into "food"-related topics. Therefore, a more
¹ Here we ignore the discussion of recursive neural networks [38] since they require knowing the tree structure of the text, which is not as efficient as the other models when dealing with large-scale data.