NJU-Parser: Achievements and Methods in Improving Chinese Semantic Dependency Parsing

Research paper


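"NJU-Parser: Achievements on Semantic Dependency Parsing"

In this paper, the authors present NJU-Parser, their system for SemEval-2012 Task 5: Chinese Semantic Dependency Parsing. The system is built on MSTParser (a maximum spanning tree parser) and proposes two effective methods: splitting sentences at punctuation marks and using the last character of a word as its lemma. Experiments show that combining the two methods raises LAS (labeled attachment score) by about one percent, and the system placed second among nine participating systems. The authors also tried to handle multi-level labels, but saw no improvement.

1. Introduction

Task 5 of SemEval-2012 looks for ways to improve Chinese semantic dependency parsing. Semantic dependency parsing is an important area of natural language processing that aims to recover the deeper semantic relations between the words of a sentence, such as the dependencies between a verb and its subject or object. The task is especially challenging for Chinese because of its distinctive grammatical structure and its lack of inflectional morphology.

2. Improvement strategies in NJU-Parser

- **Sentence splitting**: Splitting sentences at punctuation marks (commas and semicolons in the paper) is a common preprocessing step. It simplifies the structure the parser has to handle, which helps most with long and complex sentences.
- **Lemma extraction**: Using the last character of a word as its lemma is a treatment specific to Chinese. Chinese words usually consist of several characters, and the last character usually carries the main sense or semantic class of the word.

3. Experimental results

The results show that applying both strategies improves LAS by about one percent. LAS, the labeled attachment score, is a standard measure of parser performance: the proportion of tokens whose predicted head and dependency label both match the manual annotation.

4. Attempt at handling multi-level labels

Although NJU-Parser succeeded at the basic parsing task, its attempt to handle multi-level labels, by parsing the different levels separately and merging the outputs, brought no improvement. This may be due to the complexity of multi-level labels and the limitations of the existing model, and it needs further study.

5. Conclusion

NJU-Parser shows that methods tailored to Chinese can be effective for semantic dependency parsing. Challenges such as multi-level labels remain, but the work is a useful reference for future research, particularly on optimizing parsing algorithms and adapting them to the characteristics of Chinese.

6. Outlook

Future work could explore the characteristics and structure of Chinese in more depth and combine deep learning or transition-based parsing techniques to improve the handling of multi-level labels and further raise parsing performance. Extending these methods to other languages is another direction worth studying.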


First Joint Conference on Lexical and Computational Semantics (*SEM), pages 519–523, Montréal, Canada, June 7-8, 2012. ©2012 Association for Computational Linguistics

NJU-Parser: Achievements on Semantic Dependency Parsing

Guangchao Tang, Bin Li (1,2), Shuaishuai Xu, Xinyu Dai, Jiajun Chen

(1) State Key Lab for Novel Software Technology, Nanjing University
(2) Research Center of Language and Informatics, Nanjing Normal University
Nanjing, Jiangsu, China

{tanggc, lib, xuss, dxy, chenjj}@nlp.nju.edu.cn

Abstract

In this paper, we introduce our work on SemEval-2012 Task 5: Chinese Semantic Dependency Parsing. Our system is based on MSTParser, and we propose two effective methods: splitting sentences at punctuation marks and using the last character of each word as its lemma. Experiments show that combining the two methods improves LAS by about one percent, and the system placed second among nine participating systems. We also tried to handle multi-level labels, but this brought no improvement.

1 Introduction

Task 5 of SemEval-2012 seeks approaches to improve Chinese semantic dependency parsing (SDP). SDP is a kind of dependency parsing. Many dependency parsers are currently available, such as Eisner's probabilistic dependency parser (Eisner, 1996), McDonald's MSTParser (McDonald et al., 2005a; McDonald et al., 2005b) and Nivre's MaltParser (Nivre, 2006).

Despite elaborate models, many problems remain in dependency parsing. For example, sentence length has been shown to have a great impact on parsing performance. Li et al. (2010) used a two-stage approach based on sentence fragments for high-order graph-based dependency parsing. A lack of linguistic knowledge is also blamed.

This paper proposes three methods to improve performance: splitting sentences at commas and semicolons, using the last character of a word as its lemma, and handling multi-level labels. The first two methods yield improvements, while the third does not.

2 Overview of Our System

Our system is based on MSTParser, one of the state-of-the-art parsers. MSTParser seeks the maximum spanning tree of a sentence. For projective parsing, it uses Eisner's algorithm (Eisner, 1996) to obtain the dependency tree in $O(n^3)$ time. For non-projective parsing, it applies the Chu-Liu-Edmonds algorithm (Chu and Liu, 1965), which takes $O(n^2)$ time.
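To make the projective case concrete, here is a minimal sketch of first-order Eisner decoding in Python. It is a textbook-style illustration, not MSTParser's own code: the function name `eisner_parse` and the dense score matrix are assumptions made for this example, and a real parser would compute the arc scores from learned feature weights.

```python
def eisner_parse(scores):
    """First-order projective parsing with Eisner's O(n^3) dynamic program.

    scores[h][m] is the score of an arc from head h to modifier m;
    token 0 is an artificial root. Returns heads, with heads[0] == -1.
    """
    n = len(scores)
    neg_inf = float("-inf")
    # complete[s][t][d] / incomplete[s][t][d]: best score of the span s..t.
    # d == 1 means the head is s (arcs point right); d == 0 means the head is t.
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[neg_inf, neg_inf] for _ in range(n)] for _ in range(n)]
    back_c = [[[-1, -1] for _ in range(n)] for _ in range(n)]
    back_i = [[[-1, -1] for _ in range(n)] for _ in range(n)]

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete span: join two complete halves, then add an arc
            # between s and t in either direction.
            r = max(range(s, t),
                    key=lambda r: complete[s][r][1] + complete[r + 1][t][0])
            joined = complete[s][r][1] + complete[r + 1][t][0]
            incomplete[s][t][0] = joined + scores[t][s]
            incomplete[s][t][1] = joined + scores[s][t]
            back_i[s][t][0] = back_i[s][t][1] = r
            # Complete span headed by t: complete left part + incomplete span.
            r = max(range(s, t),
                    key=lambda r: complete[s][r][0] + incomplete[r][t][0])
            complete[s][t][0] = complete[s][r][0] + incomplete[r][t][0]
            back_c[s][t][0] = r
            # Complete span headed by s: incomplete span + complete right part.
            r = max(range(s + 1, t + 1),
                    key=lambda r: incomplete[s][r][1] + complete[r][t][1])
            complete[s][t][1] = incomplete[s][r][1] + complete[r][t][1]
            back_c[s][t][1] = r

    heads = [-1] * n

    def backtrack(s, t, d, is_complete):
        """Follow the stored split points and record the arc of each span."""
        if s == t:
            return
        if is_complete:
            r = back_c[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = back_i[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads


# Toy scores for a three-word sentence (index 0 is the root).
toy_scores = [
    [0, 0, 10, 0],  # root -> word 2 is strongly preferred
    [0, 0, 0, 0],
    [0, 5, 0, 5],   # word 2 -> word 1 and word 2 -> word 3
    [0, 0, 0, 0],
]
print(eisner_parse(toy_scores))  # [-1, 2, 0, 2]
```

The three nested choices (span width, start position, split point) are what give the cubic running time mentioned above.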

Three methods are applied to MSTParser in our system:

1) Sentences are split into sub-sentences at commas and semicolons, in one of two ways. Our primary system splits at every comma and semicolon. Our contrast system uses a classifier to decide whether a given comma or semicolon may be used to split the sentence. In both systems, the original (proto) sentences and the sub-sentences are trained and tested separately, and the outputs are merged at the end.

2) In a Chinese word, the last character usually carries the main sense or semantic class. We treat the last character of a word as its lemma and find that this gives a slight improvement in the experiments.

3) To address the problem of multi-level labels, we ran an experiment that parses the different levels separately and then merges the outputs.
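The splitting step of the primary system in method 1 can be sketched as follows. This is only an illustration under stated assumptions: the name `split_by_punctuation`, the choice to keep each punctuation mark at the end of its sub-sentence, and the returned start offsets are hypothetical, and the sketch does not cover how the parser outputs are merged afterwards.

```python
# Full-width and ASCII commas and semicolons (an assumed mark set).
SPLIT_MARKS = {"，", "；", ",", ";"}


def split_by_punctuation(tokens):
    """Split a segmented sentence at every comma and semicolon.

    Returns a list of (start_offset, sub_sentence_tokens) pairs so that
    token positions in a sub-sentence can be mapped back to the
    original sentence.
    """
    pieces = []
    current = []
    start = 0
    for index, token in enumerate(tokens):
        current.append(token)
        if token in SPLIT_MARKS:
            pieces.append((start, current))
            current = []
            start = index + 1
    if current:  # trailing material after the last split mark
        pieces.append((start, current))
    return pieces


sentence = ["我", "喜欢", "苹果", "，", "他", "喜欢", "梨", "。"]
for offset, piece in split_by_punctuation(sentence):
    print(offset, piece)
```

A contrast-style system would call a classifier inside the loop and split only when it approves the mark. That classifier is not sketched here.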

The experimental results show that each of the first two methods enhances system performance, and that combining them in our submitted systems yields further improvement.
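The last-character lemma of method 2 is simple enough to show in a few lines. The helper name `last_char_lemma` and the example words are assumptions chosen for illustration. How the lemma is written into the parser's input is not shown.

```python
def last_char_lemma(word):
    """Return the last character of a word to use as its lemma.

    Empty strings are returned unchanged.
    """
    return word[-1] if word else word


# Words ending in the same character share a lemma, which lets the
# parser generalize across them.
for word in ["工程师", "教师", "科学家", "图书馆"]:
    print(word, "->", last_char_lemma(word))
```

In this toy list, 工程师 (engineer) and 教师 (teacher) both map to 师, so features built on the lemma are shared between the two words.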
