NiuParser: A Chinese Syntactic and Semantic Parsing Toolkit
Jingbo Zhu, Muhua Zhu∗, Qiang Wang, Tong Xiao
Natural Language Processing Lab.
Northeastern University
zhujingbo@mail.neu.edu.cn zhumuhua@gmail.com
wangqiangneu@gmail.com xiaotong@mail.neu.edu.cn
Abstract
We present NiuParser, a new toolkit for Chinese syntactic and semantic analysis. It handles a wide range of Natural Language Processing (NLP) tasks in Chinese, including word segmentation, part-of-speech tagging, named entity recognition, chunking, constituent parsing, dependency parsing, and semantic role labeling. The NiuParser system runs fast and shows state-of-the-art performance on several benchmarks. It is also easy to use for both research and industrial purposes. Advanced features include Software Development Kit (SDK) interfaces and a multi-threaded implementation for system speed-up.
1 Introduction
Chinese has long been one of the most widely used languages in the world. Because of its complexity and diverse underlying structures, processing Chinese is challenging and has become an important part of Natural Language Processing (NLP). Many tasks have been proposed to analyze and understand Chinese, ranging from word segmentation to syntactic and semantic parsing, and these tasks can benefit a wide range of natural language applications. To date, several systems have been developed for Chinese word segmentation, part-of-speech tagging, and syntactic parsing (examples include Stanford CoreNLP¹, FudanNLP², and LTP³), though some of them are not optimized for Chinese.
∗ This work was done during his Ph.D. study at Northeastern University.
¹ http://nlp.stanford.edu/software/corenlp.shtml
² http://fudannlp.googlecode.com
³ http://www.ltp-cloud.com/intro/en/
In this paper we present a new toolkit for Chinese syntactic and semantic analysis, which we call NiuParser⁴. Unlike previous systems, the NiuParser toolkit can handle most Chinese parsing-related tasks, including word segmentation, part-of-speech tagging, named entity recognition, chunking, constituent parsing, dependency parsing, and semantic role labeling. To the best of our knowledge, we are the first to report a single NLP package that supports all seven of these functions.
All subsystems in NiuParser are based on statistical models learned automatically from data. We also optimize these systems for Chinese in several ways, including handcrafted rules used in pre- and post-processing, heuristics used in various algorithms, and a number of tuned features. The systems are implemented in C++ and run fast. On several benchmarks, we demonstrate state-of-the-art performance in both accuracy/F1 score and speed.
In addition, NiuParser is suited to the large-scale tasks that are common in both research-oriented experiments and industrial applications. Several useful utilities are distributed with NiuParser, such as Software Development Kit (SDK) interfaces and a multi-threaded implementation for system speed-up.
The rest of the demonstration is organized as follows. Section 2 describes the implementation details of each subsystem, including the statistical approaches and some enhancements with handcrafted rules and dictionaries. Section 3 presents the ways to use the toolkit. Section 4 reports the performance of the system. Finally, Section 5 concludes the demonstration and points out future work on NiuParser.
⁴ http://www.niuparser.com/index.en.html