没有合适的资源?快使用搜索试试~ 我知道了~
首页Probase+:解决概念分类中缺失链接的问题
Probase+:解决概念分类中缺失链接的问题
3 下载量 28 浏览量
更新于2024-07-15
收藏 1.21MB PDF 举报
"Probase +:推断概念分类法中的缺失链接"这篇研究论文深入探讨了在构建大规模概念分类体系或语义网络时遇到的一个关键问题——缺失链接。当前,许多研究集中在自动从大量文本语料库中生成这些知识结构,但Probase +的研究者发现,这些分类法中存在的缺失链接成为阻碍它们在实际应用中广泛采纳的主要障碍。因为缺失的连接会破坏概念分类法声称支持的推理能力。 传统的Probase是一个先进的数据驱动型概念分类系统,它试图通过自动化的手段来构建和扩展。然而,论文指出,尽管Probase在初始构建中取得了显著成就,但在实际应用中,如知识库的完整性提升或推理过程中,那些未被捕捉的链接关系至关重要。为了克服这一挑战,作者提出了一个协作过滤框架,该框架旨在利用文本语料库中的信息来推断出缺失的链接。这种方法不仅能够填充这些空白,还力求在推断过程中保持高准确度,例如,他们将Probase扩展到包含约510万条额外的联系(大约增加了30%),且准确性达到了90%以上。 实验部分展示了修订后的概念分类法的质量,这表明通过Probase +技术,即使在面对缺失链接时,也能够有效地增强知识表示的完整性和实用性。研究的关键术语包括知识库完成(knowledge base completion)和协作过滤(collaborative filtering),这两种技术在本工作中起到了核心作用,旨在提升概念分类法在现实世界中的实用价值和影响力。这篇论文提供了一个有效的策略来解决概念分类法中的缺失链接问题,对于推动此类知识图谱在实际场景中的应用具有重要意义。
资源详情
资源推荐
to a term c (representing an entity or a concept). Specifically, if
most similar terms of c have h as the hypernym, c is likely to have
the hypernym h. We propose an iterative framework to realize
this idea. Algorithm 1 outlines the framework. In each itera-
tion, we use CF-based approach to find missing hypernyms
for each term c in the current taxonomy. Specifically, for each
c, we first find top-K terms that are similar to c (line 3). Each
hypernym (h)ofthesetop-K terms but c is a candidate hyper-
nym (line 4). We rank candidate hypernyms by aggregating
the votes from the top-K similar terms of c by a scoring func-
tion (line 5). We add the candidates with a score larger than
the threshold u as the missing hypernyms of c (line 6). Fig. 3
illustrates the key roles in the CF-based recommendation
model, where TðcÞ is the intersection of c’s existing hyper-
nyms and c’s similar terms’ hypernyms. TðcÞ will be used as
prior knowledge to estimate the frequency of newly discov-
ered isA relations (Section 3.3) and to tune a threshold for the
selection of the final hypernyms (Section 4.2).
Algorithm 1. CF-Based Missing isA Relationship Finding
Input: Taxonomy T , parameters K; u
1: while iteration < max
iteration do
2: for term c 2 T do
3: SimðcÞ top-K similar terms of c;
4: CandidateðcÞ ½
S
s2SimðcÞ
hypeðsÞ hype ðcÞ;
5: Rank candidates in CandidateðcÞ by a scoring
function f;
6: Update T by attaching c to any x 2 CandidateðcÞ s.t.
fðx Þu;
7: end for
8: iteration iteration þ 1;
9: end while
Relationship to User-Based KNN Recommendation. Our solu-
tion is essentially the user-based KNN approach, which is
widely used in recommender systems. There are two types of
recommendation algorithms: user-based [24], [25] and item-
based [26], [27]. In out settings, the user is the term for which
we need to find its missing hypernyms as well as its similar
terms, while item means the missing hypernyms. Item-based
algorithms in general are not applicable in our problem
because a similar hypernym is not necessarily an appropriate
hypernym for a specific instance, as illustrated in Example 2.
Although recommender systems have been extensively stud-
ied, we still need great efforts to tailor the generic user-based
KNN approach to our problems because of a completely dif-
ferent setting.
1.4 Challenges and Contributions
Using CF to solve our problem is not trivial. In general, we
still have the following obstacles to overcome.
C1: Similarity metric. We need an effective semantic simi-
larity metric to find similar terms. Since Probase has ambigu-
ous words or phrases, and noisy or missing isA relationships,
it is not easy to ensure the effectiveness of the metric. We
will address this issue in Section 2.
C2: Relationship frequency. In Probase, each isA relationship
is associated with a frequency that the relationship is observed
from corpora. The frequency is critical for the successful usage
of Probase in real applications [23], [28]. While the generic CF
framework can only give a likelihood that a concept is a right
hypernym of a term. We still need great efforts to estimate an
appropriate frequency for the predicted missing relationships.
In Section 3 we propose a regression model to estimate the
frequency of newly found missing relationships.
C3: Parameter tuning. There are two key parameters in our
solution (as shown in Algorithm 1): K (used for the selection
of the similar concepts of c)andu (used for the selection of
final missing hypernyms). In general, a fixed setting is not an
optimal choice due to the heterogeneity of data. We propose
heuristics to set a best K and u for different c in Section 4.
C4: Efficiency. Probase has millions of terms and tens of
millions of relationships. A straightforward solution is not
efficient. In Section 2.5, we identify the bottleneck (the com-
putation of similarity metrics). We further propose a sum-
mary based pruning strategy to speedup the computation
in Section 5.
In summary, our contributions are as follows:
We propose a collaborative filtering based inferenc-
ing framework to find missing isA relationships in a
data-driven taxonomy.
We propose effective similarity metrics (to measure a
pair of terms) and ranking mechanisms (of candidate
missing hypernyms) to enable the effective inference
of missing isA relationships under the CF framework.
We identify the performance bottleneck and propose a
corresponding speedup strategy. We also give param-
eter tuning solutions to improve the effectiveness.
We conduct extensive experiments to justify the pro-
posed models and solutions. We build a taxonomy
Probase+ containing 5.1 million (about 30 percent)
more isA relationships than its prototype Probase, and
its accuracy is above 90 percent. As far as we know, it
is the largest conceptual taxonomy ever known.
The rest of the paper is organized as follows. In Section 2,
we give the similarity metrics to find similar terms. In
Section 3, we elaborate the ranking scores of candidate
hypernyms. In Section 4, we describe our method of param-
eter tuning. In Section 5, we optimize the performance of
our solution. In Section 6, we report the effectiveness of our
approach based on comprehensive experiments. We discuss
related works in Section 7, and conclude in Section 8.
2SIMILARITY METRICS
To apply collaborative filtering, we need to find terms that
are similar to a given term. In general, our framework is
Fig. 2. Probase fragment: “car” and “automobile.”
Fig. 3. The CF model.
LIANG ET AL.: PROBASE+: INFERRING MISSING LINKS IN CONCEPTUAL TAXONOMIES 1283
剩余14页未读,继续阅读
weixin_38735119
- 粉丝: 7
- 资源: 876
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 构建Cadence PSpice仿真模型库教程
- VMware 10.0安装指南:步骤详解与网络、文件共享解决方案
- 中国互联网20周年必读:影响行业的100本经典书籍
- SQL Server 2000 Analysis Services的经典MDX查询示例
- VC6.0 MFC操作Excel教程:亲测Win7下的应用与保存技巧
- 使用Python NetworkX处理网络图
- 科技驱动:计算机控制技术的革新与应用
- MF-1型机器人硬件与robobasic编程详解
- ADC性能指标解析:超越位数、SNR和谐波
- 通用示波器改造为逻辑分析仪:0-1字符显示与电路设计
- C++实现TCP控制台客户端
- SOA架构下ESB在卷烟厂的信息整合与决策支持
- 三维人脸识别:技术进展与应用解析
- 单张人脸图像的眼镜边框自动去除方法
- C语言绘制图形:余弦曲线与正弦函数示例
- Matlab 文件操作入门:fopen、fclose、fprintf、fscanf 等函数使用详解
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功