A Language-Independent Hybrid Approach for Multi-Word Expression Extraction
Yinghong Liang, Department of Software Engineering, Jingling Institute of Technology, Nanjing, China, liangyh@jit.edu.cn
Hongye Tan, Department of Computer and Information Technology, Shanxi University, Shanxi, China, hytan_2006@126.com
Hui Li, Department of Software Engineering, Jingling Institute of Technology, Nanjing, China, lihui@jit.edu.cn
Zhigang Wang, Department of Software Engineering, Jingling Institute of Technology, Nanjing, China, friend@jit.edu.cn
Wenming Gui, Department of Software Engineering, Jingling Institute of Technology, Nanjing, China, gwm@jit.edu.cn
Abstract—Failing to identify multi-word expressions (MWEs) may cause serious problems for many Natural Language Processing (NLP) tasks. Previous approaches depend heavily on language-specific knowledge and pre-existing NLP tools. However, many languages (including Chinese) have fewer such resources and tools than English, so an approach that automatically learns effective features from corpora, without relying on language-specific resources, is needed. In this paper, we develop a hybrid approach that combines a Bi-directional Long Short-Term Memory (Bi-LSTM) network, word correlation degree calculation, and weakly supervised K-means clustering to capture both the sequence information and the correlation degree of phrases from specific contexts, and we use them to train a multi-word expression detector for multiple languages without any manually encoded features. Experimental results show that the Chinese and English MWEs extracted by this hybrid approach are better than those of the baseline algorithms, which verifies that the hybrid approach is effective.
Keywords—Multi-Word Expression; Bi-LSTM; Language-Independent
I. INTRODUCTION
As research in natural language processing has deepened, researchers have found that a major factor affecting performance improvement is the accurate extraction of multi-word expressions (MWEs). Most researchers adopt the definition of MWE given by Sag et al. in 2002 [1]: a single meaning unit that combines two or more words. For example:
English S1: I only [want some more] [white coffee].
Chinese S2: 早晨[洗完澡]，他怀着[忐忑不安]的心情与导师见了一面。
Translation S2: He met his tutor [in a rather nervous state] [after a bath] in the morning.
In S1, [want some more] is a compound verb and [white coffee] is a compound noun.
In S2, [洗完澡] [after a bath] is a verb phrase with a loose structure, and [忐忑不安] [in a rather nervous state] is an idiom.
MWE extraction is a special case of phrase recognition, and it is regarded as a difficult bottleneck problem in the field of Natural Language Processing. Constructing Chinese MWE data is time-consuming work. To avoid this problem, most researchers have used English-Chinese parallel corpora to extract Chinese MWEs [2-6]. Other researchers have labeled small-scale corpora and then used them to extract Chinese MWEs.
Most previous methods treated multi-word expression (MWE) extraction as a classification problem and designed many lexical and syntactic features. These features are often derived from language-specific resources, which makes such methods difficult to apply to different languages.
For example, in S1, when predicting the type of the compound-verb candidate "want some more", forward sequence information such as "I" can help the classifier label "want" as the beginning of a compound verb. In addition, considering S2, "洗完澡" [after a bath] is a verb phrase with a loose structure: "洗澡" is a phrase, and "洗" is related to the following context "澡". However, for feature engineering methods it is hard to establish a relation between "洗" and "澡", or between "I" and "want", because there is no direct dependency path between them.
Recently, deep learning techniques have been widely used to model complex structures and have proven effective for many NLP tasks, such as relation extraction [7] and sentiment analysis [8]. A key advantage of these neural architectures is that they can capture the meaning of linguistic phenomena ranging from individual words [9] to longer-range linguistic contexts at the sentence level [10]. The Bi-directional Long Short-Term Memory (Bi-LSTM) model [11] is a two-way recurrent neural network (RNN) [12] that can capture both the preceding and the following context of each word.
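To make this concrete, the sketch below (our illustration, not the authors' actual model; the hyperparameters and the B/I/O tag set are assumptions) shows how a Bi-LSTM can score a tag for every token, so that an MWE span such as [want some more] would correspond to the tag sequence B I I:

import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Illustrative per-token tagger; tags 0/1/2 stand for hypothetical B/I/O labels."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_tags=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True lets each position see both its preceding
        # and its following context, as described above.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Forward and backward hidden states are concatenated (2 * hidden_dim).
        self.fc = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embed(token_ids)
        outputs, _ = self.lstm(embedded)   # (batch, seq_len, 2 * hidden_dim)
        return self.fc(outputs)            # per-token tag scores

# Toy usage on an 8-token sentence such as S1.
model = BiLSTMTagger(vocab_size=1000)
tokens = torch.randint(0, 1000, (1, 8))   # stand-in word indices
tag_scores = model(tokens)
print(tag_scores.shape)                   # torch.Size([1, 8, 3])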
In this work, we present a Bi-LSTM neural network to model sequence information from specific contexts, which does not require manual effort to find the best relevant features. The sequence is a language-independent structure for MWE extraction. Taking advantage of word semantic