没有合适的资源?快使用搜索试试~ 我知道了~
首页l-SkyDiv查询:提升天际线实用性的新方法
l-SkyDiv查询:提升天际线实用性的新方法
0 下载量 12 浏览量
更新于2024-07-15
收藏 584KB PDF 举报
“l-SkyDiv查询:有效提高天际线的实用性” 在数据库管理和数据分析领域,Skyline查询是一种用于找出多维数据集中不受其他对象支配的最优对象的查询方式。天际线查询的结果集合,即天际线,对于理解和识别数据集中的关键特征至关重要。然而,随着数据集基数(对象数量)和维度的增加,天际线集合可能会变得极其庞大,这不仅增加了计算复杂性,而且对用户来说可能过于复杂,难以消化和利用。 针对这一问题,文章提出了l-SkyDiv查询,这是一种创新的方法,旨在返回具有最大多样性的l个天际线对象,从而提高查询结果的实用性和用户可理解性。作者通过理论分析证明了l-SkyDiv查询是一个NP-Hard问题,这意味着在最坏情况下,找到最优解的计算难度会随问题规模的增加而呈指数级增长。 为了解决这个问题,文章提出了三种启发式算法,这些算法的时间复杂度是多项式的,能够在保证效率的同时,实现l-SkyDiv查询。启发式算法通常是在无法找到全局最优解时,寻找接近最优解的策略,它们在实际应用中表现出良好的性能。 作者进行了详尽的理论分析和大量实验,验证了所提出的启发式算法的有效性和效率。实验结果表明,这些算法能够在保持计算速度的同时,提供有价值的、精简的天际线结果,这对于用户理解和利用数据集的关键信息至关重要。 l-SkyDiv查询和相关的启发式算法为大数据环境下的多维数据分析提供了一个新的视角,它简化了用户获取和理解天际线信息的过程,提高了数据探索的效率。这项工作对数据库查询优化、数据挖掘以及信息检索等领域有着重要的理论与实践意义,有助于推动天际线查询技术的发展,使其更好地服务于实际应用。
资源详情
资源推荐
HUANG ZhenHua, et al. Sci China Inf Sci September 2010 Vol. 53 No. 9 1787
Refs. [7–11] dealt with all-subspace skyline computation. Refs. [7, 8] discussed all-subspace skyline
computation primarily from the viewpoint of the query semantics. Ref. [9] first proposed the concept
of skycube, which is a union of skyline results over all non-empty subsets of the k dimensions. Ref.
[9] also investigated efficient implementation of the skycube computation and presented two approaches,
i.e. bottom-up and top-down algorithms, to compute the skyline results of all 2
k
− 1 subspaces. Ref.
[10] investigates the important issue of updating the skycube in a dynamic environment. In order to
balance the query cost and update cost, ref. [10] proposed a new structure, the compressed skycube
which concisely represented the complete skycube. Also, ref. [10] thoroughly explored the properties of
the compressed skycube and provided an efficient object-aware update scheme. Ref. [11] first proposed
an efficient algorithm, called IMASCIR, to maintain the skycube views.
The fixed-subspace and all-subspace skyline computation algorithms have some serious drawbacks, and
hence arbitrary-subspace skyline computation [12–15] has attracted much attention recently. Refs. [12,
13] first proposed an index-based method, SUBSKY, to compute skylines for arbitrary single subspace.
Based on the data distribution, the SUBSKY algorithm creates an anchor point for each cluster, and builds
aB
+
-tree on the L
∞
distance between each tuple to its corresponding anchor. Ref. [14] proposed novel
notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality
space and utilizes these concepts as the foundation for realizing arbitrary-subspace skyline computation.
Query processing involves mostly simple pruning operations while skyline computation is done only on
a small subset of candidate skylines in the subspace. Ref. [15] presented an efficient cell-dominance
computation algorithm (CDCA) for processing arbitrary single subspace skyline query. The CDCA
algorithm uses the regular grid index [19] and prunes all the cells which are dominated by any other ones,
and hence it can dramatically reduce the number of comparisons between objects.
However, the algorithms in [1–15] return the whole skyline set, and therefore the number of skylines
will grow exponentially as the cardinality and dimensionality of input objects increase. Clearly, in real
applications, the result returned by the existing algorithms is completely useless to users.
3Thel-SkyDiv query
In order to let users focus on the most diverse skylines, in this section, we present a new type of l-SkyDiv
query. Also, we propose to extend SQL with an l-SKYDIV OF keyword, which can efficiently enhance
the query engine functions of RDBMSs.
Definition 1 (Dominance relationship). Give two k-dimensional objects p and r. p dominates r if they
satisfy the following conditions:
1) ∀i ∈ [1,k],p[i] r[i];
2) ∃j ∈ [1,k],p[j] <r[j].
For simplicity, the relationship that p dominates r is denoted by r ≺ p
Definition 2 (Skyline set). Let AD be the set of k-dimensional objects. Then, the skyline set of AD
canbeexpressedas∇(AD)={p|p ∈ AD ∧¬∃r ∈ AD, p ≺ r}.
If an object p belongs to ∇(AD), then p is a skyline (object).
Definition 3 (Skyline distance). Let AD be the set of k-dimensional objects, and |AD| be the car-
dinality of AD. Then for any two skylines p(p[1],p[2],...,p[k]) and r(r[1],r[2],...,r[k]), their skyline
distance sd(p, r) can be expressed as
sd(p, r)=
q
1ik
|p[i] − r[i]|
q
, (1)
where q is an integer defining the nature of the distance function (q=2 for Euclidean distance).
Based on Definition 3, for a set Ω which consists of l skylines (called l-SkySet), we can define its skyline
distance sd(Ω) below:
sd(Ω) =
∀p∈Ω
∀r∈Ω∧r=p
sd(p, r). (2)
剩余14页未读,继续阅读
Syndergaard
- 粉丝: 6
- 资源: 938
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 计算机人脸表情动画技术发展综述
- 关系数据库的关键字搜索技术综述:模型、架构与未来趋势
- 迭代自适应逆滤波在语音情感识别中的应用
- 概念知识树在旅游领域智能分析中的应用
- 构建is-a层次与OWL本体集成:理论与算法
- 基于语义元的相似度计算方法研究:改进与有效性验证
- 网格梯度多密度聚类算法:去噪与高效聚类
- 网格服务工作流动态调度算法PGSWA研究
- 突发事件连锁反应网络模型与应急预警分析
- BA网络上的病毒营销与网站推广仿真研究
- 离散HSMM故障预测模型:有效提升系统状态预测
- 煤矿安全评价:信息融合与可拓理论的应用
- 多维度Petri网工作流模型MD_WFN:统一建模与应用研究
- 面向过程追踪的知识安全描述方法
- 基于收益的软件过程资源调度优化策略
- 多核环境下基于数据流Java的Web服务器优化实现提升性能
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功