NEZHA: NEURAL CONTEXTUALIZED REPRESENTATION FOR
CHINESE LANGUAGE UNDERSTANDING
TECHNICAL REPORT
Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao,
Yasheng Wang, Jiashu Lin∗, Xin Jiang, Xiao Chen, Qun Liu
Noah's Ark Lab, ∗HiSilicon, Huawei Technologies
{wei.junqiu1, renxiaozhe, lixiaoguang11, wenyong.huang, liao.yi,
wangyasheng, linjiashu, jiang.xin, chen.xiao2, qun.liu}@huawei.com
September 6, 2019
ABSTRACT
Pre-trained language models have achieved great success in various natural language understanding
(NLU) tasks due to their capacity to capture deep contextualized information in text by pre-training
on large-scale corpora. In this technical report, we present our practice of pre-training language
models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding)
on Chinese corpora and fine-tuning for Chinese NLU tasks. The current version of NEZHA is based
on BERT [1] with a collection of proven improvements, which include Functional Relative Positional
Encoding as an effective positional encoding scheme, the Whole Word Masking strategy, Mixed
Precision Training and the LAMB optimizer in training the models. The experimental results show
that NEZHA achieves state-of-the-art performance when fine-tuned on several representative Chinese
tasks, including named entity recognition (People's Daily NER), sentence matching (LCQMC),
Chinese sentiment classification (ChnSenti) and natural language inference (XNLI).
Keywords Pre-trained Language Models · NEZHA · Chinese Language Understanding
1 Introduction
Pre-trained language models such as ELMo [2], BERT [1], ERNIE-Baidu [3, 4], ERNIE-Tsinghua [5], XLNet [6],
RoBERTa [7] and MegatronLM¹ have demonstrated remarkable success in modeling contextualized word representations
by utilizing massive amounts of training text. As a fundamental technique in natural language processing (NLP),
language models pre-trained on text can be easily transferred to downstream NLP tasks with fine-tuning, achieving
state-of-the-art performance on many tasks including sentiment analysis, machine reading comprehension, sentence
matching, named entity recognition and natural language inference.
The existing pre-trained language models are mostly learned from English corpora (e.g., BooksCorpus and English
Wikipedia). There have been several attempts to train models specifically for the Chinese language, including Google's
BERT [1] for Chinese, ERNIE-Baidu [3, 4] and BERT-WWM [8]. All of these models are based on the Transformer [9]
and trained on two unsupervised tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). In the
MLM task, the model learns to recover the masked words in the training sentences. In the NSP task, it tries to predict
whether one sentence is the next sentence of the other. One of the main differences among the Chinese models lies in
their word masking strategy in the MLM task. Google's BERT masks each Chinese character or WordPiece token [10]
independently. ERNIE-Baidu further makes the MLM task more challenging by masking the entities or phrases in a
sentence as a whole, where each entity or phrase may contain multiple characters or tokens. BERT-WWM takes a
similar strategy called Whole Word Masking (WWM), which enforces that all the tokens belonging to a Chinese word
should be masked together. In addition, the most recently published ERNIE-Baidu 2.0 [4] incorporates further
pre-training tasks such as Token-Document Relation Prediction and Sentence Reordering.
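To make these masking strategies concrete, the following minimal sketch contrasts BERT-style independent token
masking with Whole Word Masking. The tokenization, word segmentation and 15% masking rate are illustrative
assumptions, not the exact implementation used by any of the cited models.

import random

MASK, PROB = "[MASK]", 0.15  # illustrative masking rate, as in BERT

def char_level_masking(tokens):
    # BERT-style: each character/WordPiece token is masked independently.
    return [MASK if random.random() < PROB else t for t in tokens]

def whole_word_masking(tokens, word_spans):
    # WWM: if a segmented word is selected, all of its tokens are masked together.
    # word_spans holds (start, end) token indices produced by a Chinese word segmenter.
    masked = list(tokens)
    for start, end in word_spans:
        if random.random() < PROB:
            masked[start:end] = [MASK] * (end - start)
    return masked

# Toy example: "华为" spans two character tokens and is masked as a unit under WWM.
tokens = ["华", "为", "发", "布", "预", "训", "练", "模", "型"]
word_spans = [(0, 2), (2, 4), (4, 7), (7, 9)]  # hypothetical segmentation
print(whole_word_masking(tokens, word_spans))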
¹ https://nv-adlr.github.io/MegatronLM