A Bootstrapping Approach to Object Coreferencing on the Semantic Web

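This document discusses "Bootstrapping Object Coreferencing on the Semantic Web" by Wei Hu, Yuzhong Qu, and Xingzhi Sun, published in the Journal of Computer Science and Technology, 2011, Vol. 26, No. 4, pp. 662-674. The work addresses object coreference on the Semantic Web: identifying the different URIs that different parties assign to the same object, so that information on the Web of data can be integrated more consistently. Object coreference is a key challenge on the Semantic Web because the same entity may be denoted by different URIs in different contexts.

The authors propose a bootstrapping approach. It first defines a kernel for an object URI. The kernel consists of URIs that are semantically coreferent with it through same-as links, functional properties (including inverse functional properties), and cardinality restrictions. The aim is to identify groups of potentially coreferent URIs from intrinsic links and logical rules, without relying on external resources. The steps are:

1. **Kernel construction**: For an object URI, form the initial kernel from the URIs related to it through same-as links, (inverse) functional properties, and cardinality restrictions.
2. **Extension and refinement**: Use the identified kernel to find further related URIs, possibly through link structure, domain knowledge, or coreference resolution techniques, so as to broaden and refine the kernel.
3. **Iterative optimization**: Iterate repeatedly, updating and enlarging the kernel while discarding candidate URIs that do not fit it, until the result is stable or a preset precision threshold is met.
4. **Evaluation and validation**: Assess the method by comparing the coreferent groups it outputs with manually labeled results, and by testing it in practical applications.
5. **Application and improvement**: Apply the bootstrapping method in real Semantic Web applications, and adjust the algorithm according to feedback as the Web and its data change.

The work offers a way to address data fusion on the Semantic Web, which can improve the accuracy and efficiency of information retrieval. It also gives later researchers a reference framework for coreference identification over large-scale, heterogeneous data.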

664 J. Comput. Sci. & Technol., July 2011, Vol.26, No.4

Technically speaking, the architecture of our approach is similar to [24]. The study in [24] dedicated a large-scale clustering to ontology terms. It looked up synonyms to establish a kernel, and extended the kernel with identical terms in labels or identifiers. There are two differences between our approach and [24]: 1) we adopt OWL built-in vocabulary elements to build the kernel, which have standard semantics and usually are trustworthy, while [24] depended on thesauri, which may be imprecise in some cases; and 2) we propose ranking methods for the coreferent URIs, while [24] used a uniform threshold to filter wrong URIs, which is hard to decide across different domains.

Furthermore, our work is relevant to the problem known as duplicate detection and coreference resolution in the database and NLP fields, which have been extensively studied in the past several decades [3,13,15,25-26]. These methods are treated as similarity-based due to lack of formal semantics to define equivalence. On the Semantic Web, however, OWL provides well-defined semantics for the equivalence relation, which must be carefully considered. There also exist many works that address instance matching in ontology (or schema) matching [27-28]. But they have not been aware of the characteristics of the Web, while our method uses the dereferenceability of URIs to classify the coreference confidence.

3 Finding Coreferent URIs

In this section, we introduce our algorithm BOCr for bootstrapping object coreferencing.

The overview of BOCr is illustrated in Fig.2, which consists of two major iterations. Taking an object URI u as input, the algorithm initializes an empty queue Q and pushes u into Q. E is defined to record the equivalence relations between URIs. For building the kernel (Lines 4-19), BOCr iteratively picks up each unchecked URI v in Q, and starts four parallel threads: CorefBySameAs(), CorefByIFP(), CorefByFP() and CorefByCard(), in order to perform coreferencing on v (see Lines 6-13). Different equivalence relations with different marks (e.g., “same-as”) are put into E through AddEquivRel() for further ranking, while U′ keeps the newly found URIs in each iteration to avoid duplicate coreferencing in Q. The kernel iteration converges when there is no URI in Q. Then, the normalized textual descriptions of the URIs in the kernel are extracted for extension. ExtendByDesc() searches the objects with the same descriptions as the ones in the kernel (Line 22). BOCr returns a set of URIs U that denote the same object as u, and a set of same-as, IFP, FP and cardinality relations E.
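
The description-based extension step can be sketched as follows. This is a minimal illustration, not the authors' implementation. The excerpt does not say how descriptions are normalized, so `normalize` (lower-casing, mapping punctuation and underscores to spaces, collapsing whitespace) is an assumption. The helper `build_index`, the `labels` mapping and the `ex:` URIs are hypothetical names introduced here.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def normalize(text: str) -> str:
    # Assumed normalization (not from the paper): lower-case, map punctuation
    # and underscores to spaces, collapse runs of whitespace.
    return " ".join(re.sub(r"[\W_]+", " ", text.lower()).split())


def build_index(labels: Dict[str, str]) -> Dict[str, Set[str]]:
    # Map each normalized description to the URIs that carry it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for uri, label in labels.items():
        index[normalize(label)].add(uri)
    return index


def desc(uri: str, labels: Dict[str, str]) -> str:
    # Counterpart of Desc(v): normalized description, or "" if none is known.
    return normalize(labels.get(uri, ""))


def extend_by_desc(s: str, index: Dict[str, Set[str]]) -> Set[str]:
    # Counterpart of ExtendByDesc(s): URIs with exactly the same description.
    return set(index.get(s, set())) if s else set()


# Hypothetical labels, for illustration only.
labels = {
    "ex:a": "Chris Bizer",
    "ex:b": "chris_bizer",
    "ex:c": "Christian Bizer",
}
index = build_index(labels)
print(extend_by_desc(desc("ex:a", labels), index))  # URIs sharing the label of ex:a
```

Exact matching on normalized descriptions is cheap, but it can also pull in unrelated objects that happen to share a name, so the matches it returns are candidates rather than confirmed coreferences.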

Input: A URI u that denotes an object.
Output: A coreferent URI set U, and a set of same-as, IFP, FP and cardinality relations E.

 1  U ← {u};
 2  Q.Push(u);                            /* Q is a queue */
 3  E ← ∅;
 4  while Q ≠ ∅ do                        /* Kernel */
 5      v ← Q.Pop();
 6      U₁ ← CorefBySameAs(v);
 7      AddEquivRel(E, v, U₁, “same-as”);
 8      U₂ ← CorefByIFP(v);
 9      AddEquivRel(E, v, U₂, “IFP”);
10      U₃ ← CorefByFP(v);
11      AddEquivRel(E, v, U₃, “FP”);
12      U₄ ← CorefByCard(v);
13      AddEquivRel(E, v, U₄, “cardinality”);
14      U′ ← (U₁ ∪ U₂ ∪ U₃ ∪ U₄) \ U;
15      for each v′ ∈ U′ do
16          Q.Push(v′);
17      end
18      U ← U ∪ U′;
19  end
20  for each v ∈ U do                     /* Extension */
21      s ← Desc(v);
22      U′ ← ExtendByDesc(s);
23      U ← U ∪ U′;
24  end
25  return U, E;

Fig.2. Algorithm BOCr.
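
Read as ordinary sequential code, Fig.2 might look like the sketch below. It is an illustration under assumptions, not the authors' implementation. The four coreferencing routines, Desc() and ExtendByDesc() are passed in as callables because the excerpt does not define them, and the four parallel threads are replaced by a plain loop. The names `bocr`, `finders`, `desc` and `extend_by_desc`, as well as the toy data at the end, are introduced here for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# (source URI, found URI, mark), e.g. ("a", "b", "same-as").
EquivRel = Tuple[str, str, str]
Finder = Callable[[str], Set[str]]


def bocr(u: str,
         finders: Dict[str, Finder],
         desc: Callable[[str], str],
         extend_by_desc: Callable[[str], Set[str]]) -> Tuple[Set[str], List[EquivRel]]:
    coreferent: Set[str] = {u}        # U
    queue = deque([u])                # Q
    relations: List[EquivRel] = []    # E

    # Kernel: repeat until no unchecked URI is left in the queue.
    while queue:
        v = queue.popleft()
        found: Set[str] = set()
        for mark, finder in finders.items():   # sequential stand-in for the threads
            hits = finder(v)
            relations.extend((v, w, mark) for w in sorted(hits))
            found |= hits
        new = found - coreferent      # only URIs not seen before are queued
        queue.extend(sorted(new))
        coreferent |= new

    # Extension: iterate over a snapshot, so only kernel URIs are expanded.
    for v in sorted(coreferent):
        coreferent |= extend_by_desc(desc(v))
    return coreferent, relations


# Toy stand-ins (hypothetical data) for the routines the paper defines later.
same_as = {"a": {"b"}, "b": {"a"}}
ifp = {"b": {"c"}}
labels = {"a": "x", "b": "x", "c": "y"}
by_label = {"x": {"a", "b", "d"}, "y": {"c"}}

uris, rels = bocr(
    "a",
    {"same-as": lambda v: same_as.get(v, set()),
     "IFP": lambda v: ifp.get(v, set())},
    lambda v: labels.get(v, ""),
    lambda s: by_label.get(s, set()),
)
print(sorted(uris))  # ['a', 'b', 'c', 'd']
```

In the toy run, "b" and "c" enter through the kernel and carry a recorded relation, while "d" enters only through a matching description and has no entry in the relation list. That difference is the kind of information a later ranking step can use.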

Example. Fig.3 illustrates a set of coreferent URIs regarding Chris Bizer at the Free University Berlin, which are represented with the solid pattern for the kernel and the dotted one for the extension. Suppose that sw:chris-bizer is the URI to start from. By searching for the objects linked with owl:sameAs, several coreferent URIs are discovered, such as ontoworld:Chris Bizer. At the same time, if we have knowledge of some IFPs, which are shown with asterisks (∗) in the figure, we find the objects having the same values as well, e.g., bizer:chris. Notice that the construction of the kernel is an iterative process, and the same-as or IFP relations (see the solid lines in the figure) are recorded. For instance, dblp:Christian Bizer is reached from bizer:chris, and they have a same-as relation. Next, the kernel is extended by using the descriptions of the URIs from the kernel (e.g., “chris bizer”, “chris” and “christian bizer”). This may cause errors, so ranking strategies to reflect their confidences are required. We will introduce our methods in Section 4.
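
A toy version of the kernel part of this example is sketched below. The URIs are those named in the text. The triples themselves, the mailbox value, and the choice of foaf:mbox as the IFP are assumptions made for illustration. The function names mirror CorefBySameAs() and CorefByIFP() but are not the authors' code.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples; only the URIs themselves come from the example.
TRIPLES: List[Triple] = [
    ("sw:chris-bizer", "owl:sameAs", "ontoworld:Chris Bizer"),
    ("sw:chris-bizer", "foaf:mbox", "mailto:chris@example.org"),
    ("bizer:chris", "foaf:mbox", "mailto:chris@example.org"),
    ("dblp:Christian Bizer", "owl:sameAs", "bizer:chris"),
]
IFPS = {"foaf:mbox"}  # properties assumed to be known as inverse functional


def coref_by_same_as(v: str, triples: List[Triple]) -> Set[str]:
    """URIs linked to v by owl:sameAs, in either direction."""
    found = set()
    for s, p, o in triples:
        if p == "owl:sameAs":
            if s == v:
                found.add(o)
            elif o == v:
                found.add(s)
    return found


def coref_by_ifp(v: str, triples: List[Triple], ifps: Set[str]) -> Set[str]:
    """URIs that share a value with v on some inverse functional property."""
    values = {(p, o) for s, p, o in triples if s == v and p in ifps}
    return {s for s, p, o in triples if (p, o) in values and s != v}


# First round, starting from sw:chris-bizer.
print(coref_by_same_as("sw:chris-bizer", TRIPLES))    # {'ontoworld:Chris Bizer'}
print(coref_by_ifp("sw:chris-bizer", TRIPLES, IFPS))  # {'bizer:chris'}
# Second round: the newly found bizer:chris leads on to the DBLP URI.
print(coref_by_same_as("bizer:chris", TRIPLES))       # {'dblp:Christian Bizer'}
```

The second round shows why the kernel has to be built iteratively: dblp:Christian Bizer has no direct link to the starting URI and is found only after bizer:chris has joined the kernel.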

3.1 Coreferencing by Same-As

Let U be a set of URI references, B be a set of blank
