A Study of the Transitivity of Hypernym-Hyponym Relations in Data-Driven Lexical Taxonomies


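This paper examines the transitivity of the hypernym-hyponym relation in data-driven lexical taxonomies. Taxonomies, and the hypernym-hyponym structure in particular, play a central role in natural language understanding and knowledge organization. In recent years, large-scale, usage-based, data-driven lexical taxonomies such as Probase, YAGO, and WikiTaxonomy have been extracted automatically from web corpora and Wikipedia, in contrast to manually crafted resources such as WordNet, Cyc, and Freebase. The hypernym-hyponym relation is regarded as the backbone of these taxonomies: it is used to categorize data and it enables generalization.

The paper focuses on one key property of this relation, transitivity: if A is a hypernym of B and B is a hypernym of C, then A is normally expected to be a hypernym of C. This property matters for applications such as information retrieval, natural language generation, question answering, and machine translation. Unlike human-crafted ontologies and taxonomies, however, data-driven lexical taxonomies do not always satisfy it. Possible causes include bias inherent in data collection, the diversity of contexts, and the complexity of word usage. Data-driven methods capture a large number of hypernym-hyponym relations from real corpora, but in some cases those relations are not transitive in the strict mathematical sense.

To address this, Jiaqing Liang, Yi Zhang, Yanghua Xiao, and their co-authors introduce a supervised approach that detects whether transitivity holds for a given pair of hypernym-hyponym relationships, and they use transitivity to derive new hypernym-hyponym relationships for data-driven lexical taxonomies. The results help assess the reliability of data-driven lexical taxonomies more precisely, and they are a reminder that inference over automatically built resources has limits that depend on context.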

On the Transitivity of Hypernym-Hyponym Relations in Data-Driven Lexical Taxonomies

Jiaqing Liang, Yi Zhang, Yanghua Xiao∗

Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University

l.j.q.light@gmail.com, {z

yi11, shawyh}@fudan.edu.cn

Haixun Wang

Facebook, USA

haixun@gmail.com

Wei Wang

School of Computer Science, Fudan University

weiwang1@fudan.edu.cn

Pinpin Zhu

Xiaoi Research, Shanghai Xiaoi Robot Technology Co. LTD., China

pp@xiaoi.com

Abstract

Taxonomy is indispensable in understanding natural language. A variety of large-scale, usage-based, data-driven lexical taxonomies have been constructed in recent years. The hypernym-hyponym relationship, which is considered the backbone of lexical taxonomies, can not only be used to categorize data but also enables generalization. In particular, we focus on one of the most prominent properties of the hypernym-hyponym relationship, namely transitivity, which has significant implications for many applications. We show that, unlike human-crafted ontologies and taxonomies, transitivity does not always hold in data-driven lexical taxonomies. We introduce a supervised approach to detect whether transitivity holds for any given pair of hypernym-hyponym relationships. Besides solving the inferencing problem, we also use transitivity to derive new hypernym-hyponym relationships for data-driven lexical taxonomies. We conduct extensive experiments to show the effectiveness of our approach.

Introduction

Knowledge bases are playing an increasingly important role in many applications. Most knowledge bases, including WordNet (Miller 1995), Cyc (Lenat and Guha 1989), and Freebase (Bollacker et al. 2008), are manually crafted by human experts or community efforts. The coverage of manual knowledge bases, such as WordNet, is far from complete (Sang 2007). For example, the concepts and instances below Animals and People in WordNet are quite limited (Pantel and Pennacchiotti 2006; Hovy, Kozareva, and Riloff 2009).

Much attention has thus been paid to deriving knowledge bases by automatic extraction from big corpora. Data-driven approaches have produced many knowledge bases, such as KnowItAll (Etzioni et al. 2004), NELL (Mitchell et al. 2015), and Probase (Wu et al. 2012). Data-driven knowledge bases are in general larger than manual knowledge bases, covering more entities and concepts as well as their relationships. For example, Freebase has thousands of types, while Probase has millions of concepts. With larger coverage, data-driven knowledge bases are better at supporting large-scale text understanding and many other tasks.

∗ Corresponding author. This paper was supported by the National Key Basic Research Program of China under No. 2015CB358800, by the National NSFC (No. 61472085, U1509213), by the Shanghai Municipal Science and Technology Commission foundation key project under No. 15JC1400900, and by the Shanghai Municipal Science and Technology project under No. 16511102102.

© 2017, Association for the Advancement of Artificial Intelligence

Data-driven Lexical Taxonomy

In this paper, we concentrate on a particular kind of knowledge base: lexical taxonomies built by data-driven approaches. A lexical taxonomy consists of the hypernym-hyponym relations between terms. A term A is a hypernym of another term B if A's meaning covers the meaning of B or is much broader (Sang 2007). For example, furniture is a hypernym of chair. The opposite term for hypernym is hyponym, so chair is a hyponym of furniture. We use the expression hyponym(A, B) for a hypernym-hyponym relationship, which means A is a hyponym of B.
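As a rough illustration of the hyponym(A, B) notation, a lexical taxonomy can be held as a set of (hyponym, hypernym) pairs, and two pairs that share a middle term can be composed into a candidate third pair. This is only a minimal sketch: the function names and the armchair and table terms are made up for illustration and do not come from the paper. The composed pairs are candidates to be verified, not facts, since transitivity is not guaranteed to hold in a data-driven taxonomy.

```python
from collections import defaultdict

# Illustrative pairs only; each tuple reads hyponym(A, B): "A is a hyponym of B".
# "armchair" and "table" are hypothetical additions, not taken from the paper.
PAIRS = {
    ("chair", "furniture"),
    ("armchair", "chair"),
    ("table", "furniture"),
}

def build_index(pairs):
    """Map each term to the set of its direct hypernyms."""
    index = defaultdict(set)
    for hypo, hyper in pairs:
        index[hypo].add(hyper)
    return index

def transitive_candidates(pairs):
    """Compose hyponym(A, B) and hyponym(B, C) into candidate hyponym(A, C).

    The results are only candidates: in a data-driven taxonomy they still
    need to be checked, because transitivity does not always hold.
    """
    index = build_index(pairs)
    candidates = set()
    for a, b in pairs:
        for c in index.get(b, ()):
            if a != c and (a, c) not in pairs:
                candidates.add((a, c))
    return candidates

print(transitive_candidates(PAIRS))
```

Storing pairs in a set keeps the membership test `(a, c) not in pairs` cheap, so only relationships that are not already present are proposed.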

Hypernym-hyponym relations are the backbone of text understanding. They hold such significance because they enable generalization, which lies at the core of human cognition as well as of machine inferencing for text understanding. For instance, hyponym(iphone, smart phone) enables a machine to understand the search intent of iphone (i.e., smart phone), and hyponym(galaxy s4, smart phone) further allows it to recommend the related keyword galaxy s4.
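The keyword recommendation described here can be sketched as a lookup of terms that share a hypernym with the query. The helper name `related_terms` is hypothetical, and the paper does not say that recommendation is implemented this way; the sketch only shows the kind of generalization that two hyponym pairs make possible.

```python
from collections import defaultdict

def related_terms(query, pairs):
    """Return terms that share at least one direct hypernym with `query`."""
    hypernyms = defaultdict(set)   # term -> its direct hypernyms
    hyponyms = defaultdict(set)    # hypernym -> its direct hyponyms
    for hypo, hyper in pairs:
        hypernyms[hypo].add(hyper)
        hyponyms[hyper].add(hypo)
    related = set()
    for hyper in hypernyms.get(query, ()):
        related |= hyponyms[hyper]
    related.discard(query)         # a term is not a recommendation for itself
    return sorted(related)

# The two relationships used as examples in the text.
pairs = [
    ("iphone", "smart phone"),
    ("galaxy s4", "smart phone"),
]
print(related_terms("iphone", pairs))  # ['galaxy s4']
```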

Many automatically harvested lexical taxonomies, such as Probase, YAGO (Suchanek, Kasneci, and Weikum 2007), and WikiTaxonomy (Ponzetto and Strube 2008), are extracted from web corpora or Wikipedia by certain syntactic patterns (such as Hearst patterns (Hearst 1992)) or heuristic rules. For example, the sentence "... famous basketball players such as Michael Jordan ..." is considered evidence for the claim that the term Michael Jordan is a hyponym of the term famous basketball player, because the sentence follows one of the Hearst patterns.
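A single "such as" pattern of this kind can be approximated with a regular expression. This is a deliberately naive sketch, not the extraction pipeline of Probase or any other system: it assumes the hypernym phrase is a run of lowercase words and the hyponym is a run of capitalized words, and it singularizes by dropping a trailing "s". Real extractors would rely on noun-phrase chunking and many more patterns.

```python
import re

# One Hearst-style pattern: "<hypernym phrase> such as <Capitalized Name>".
# Assumption: lowercase words form the hypernym phrase and capitalized words
# form the hyponym; this is far cruder than real noun-phrase chunking.
SUCH_AS = re.compile(
    r"(?P<hyper>[a-z]+(?: [a-z]+)*) such as "
    r"(?P<hypo>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)

def naive_singular(phrase):
    """Very rough singularization: drop one trailing 's' (but not from 'ss')."""
    if phrase.endswith("s") and not phrase.endswith("ss"):
        return phrase[:-1]
    return phrase

def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the single pattern above."""
    return [
        (m.group("hypo"), naive_singular(m.group("hyper")))
        for m in SUCH_AS.finditer(sentence)
    ]

print(extract_pairs("... famous basketball players such as Michael Jordan ..."))
# [('Michael Jordan', 'famous basketball player')]
```

Because the hypernym group greedily takes every lowercase word before "such as", a longer sentence would pull in unrelated leading words, which is one reason pattern-based extraction yields noisy pairs.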

Problem Statement

In this paper, we focus on one of the most important properties of the hypernym-hyponym relationship: transitivity. For

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)
