Optimizing Dominance Class Computation: A Fast Algorithm


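"This paper proposes a fast algorithm for computing dominance classes in ordered information systems, with the aim of making attribute reduction and rule extraction more efficient. Traditional rough set theory relies on equivalence relations and cannot handle preference-ordered data, whereas rough sets based on dominance relations can. The paper identifies the computation of dominance classes as the key factor in computational cost. The new algorithm speeds this computation up considerably by reducing the search space, and its effectiveness is verified on 10 UCI data sets, where it proves particularly suitable for large-scale data."

Traditional rough set theory (TRS) is an effective tool for handling uncertain and incomplete information. It defines the upper and lower approximations of a concept through an equivalence relation. One limitation of TRS, however, is that it only considers whether attribute values can be distinguished and ignores the preference information they contain. Real-world data often carry a preference ordering, such as product sales volumes or user rating levels, so a different theory is needed to handle such data properly.

The dominance-based rough set approach was developed for this purpose. It can process preference-ordered data and better reflects the priorities found in real situations. A dominance relation considers not only whether attribute values can be distinguished but also which of two values is better, so it describes the internal structure of the data more precisely. Computing dominance classes, however, is usually expensive, and this cost directly limits the efficiency of attribute reduction and rule extraction. It remains an open problem in the field.

To address it, the paper proposes a fast algorithm for computing dominance classes in ordered information systems, where the data are ordered or hierarchical (for example, time series or ranking data). The core idea is to shrink the search space dynamically: as the computation proceeds, inferior classes are progressively eliminated, which lowers the computational complexity.

In the experiments, the authors use 10 data sets from the UCI Machine Learning Repository. The results show that the proposed algorithm computes dominance classes significantly faster than the traditional method, and the gain grows with data set size. This suggests that the algorithm has practical value for decision making and rule extraction in complex information systems with many attributes and instances.

In summary, the paper contributes a fast method for computing dominance classes in ordered information systems. By optimizing the search strategy, it lowers the computational complexity and makes preference-ordered data faster to process. This matters for knowledge discovery and decision support on large data, and it offers a new direction for extending rough set theory in practical applications.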

HK.NCCP International Journal of Intelligent Information and Management Science

ISSN: 2307-0692, Volume 5, Issue 6, December, 2016

A Fast Algorithm for Computing Dominance Classes

Yan LI, Qun YU

Key Lab. of Machine Learning and Computational Intelligence, College of Mathematics and Information Science

Hebei University, Baoding, 071002, China

Abstract: Traditional rough set theory (TRS) uses the equivalence relation to define the upper and lower approximation sets of a given target concept, so it can represent the uncertainty in an information system. Because it relies on equivalence relations, TRS only considers whether attribute values are distinguishable and ignores the preference information they contain. Rough sets based on dominance relations solve this problem effectively and can deal with preference-ordered data. In these dominance-based approaches, the cost of computing dominance classes strongly affects the efficiency of attribute reduction and rule extraction. This paper presents an efficient method for computing dominance classes in an ordered information system by rapidly reducing the search space. Based on the definition of the dominance class, the inferior class of an object is gradually removed from the universe as more attributes are processed. Experiments on ten UCI data sets show that the proposed algorithm significantly improves the efficiency of computing dominance classes, especially for large-scale data.
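The abstract describes the method only at a high level. The Python sketch below is one possible reading of the idea, not the authors' implementation. For an object $x$, the search space starts as the whole universe. Each attribute then removes the objects that are worse than $x$ on it, so later attributes are compared only on the survivors. The sketch assumes numeric criteria where larger values are better. The names `dominating_class` and `all_dominating_classes` and the toy table are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# table[i][j] holds f(x_i, a_j); larger values are assumed to be better.
Table = Sequence[Sequence[float]]


def dominating_class(table: Table, x: int, attrs: Sequence[int]) -> Tuple[Set[int], int]:
    """Return the objects at least as good as x on every attribute in attrs,
    plus the number of value comparisons spent on finding them."""
    candidates: List[int] = list(range(len(table)))  # search space starts as U
    comparisons = 0
    for a in attrs:
        threshold = table[x][a]
        comparisons += len(candidates)
        # Objects worse than x on attribute a leave the search space for good,
        # so they are never compared on the remaining attributes.
        candidates = [y for y in candidates if table[y][a] >= threshold]
        if len(candidates) == 1:  # only x itself is left; stop early
            break
    return set(candidates), comparisons


def all_dominating_classes(table: Table, attrs: Sequence[int]) -> Dict[int, Set[int]]:
    """Dominating class of every object in the universe."""
    return {x: dominating_class(table, x, attrs)[0] for x in range(len(table))}


if __name__ == "__main__":
    toy = [[3, 2, 3], [1, 1, 2], [3, 3, 3], [2, 1, 1]]  # hypothetical data
    for x, cls in all_dominating_classes(toy, [0, 1, 2]).items():
        print(x, sorted(cls))
    print(sum(dominating_class(toy, x, [0, 1, 2])[1] for x in range(len(toy))))  # 36
```

On this toy table the sketch needs 36 value comparisons for all four dominating classes. A pairwise scan of every attribute needs 4 × 4 × 3 = 48. The dominated class can be obtained symmetrically by replacing `>=` with `<=`.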

Keywords: Rough set, dominance class, ordered information systems, fast algorithm

1. Introduction

Rough set theory [1] is a mathematical theory proposed by Professor Z. Pawlak in 1982 to deal with imprecise, incomplete and incompatible knowledge. It has been widely used in machine learning, data mining, pattern recognition [2, 3] and other fields. In practical problems, especially in multi-criteria decision analysis, some attribute values are preference-ordered. For example, the attribute "score" can be numerical or can be divided into three values: high, medium and low. Attributes of this type are often used to evaluate the objects in the universe, for example, to score a student in several subjects. The order information contained in their values must then be taken into account for more accurate decision making. For this reason, Greco et al. first proposed the dominance-based rough set approach (DRSA) in 1999 [4, 5]. DRSA replaces the equivalence relations of TRS with dominance relations and takes into account the preference order among objects. It is very useful for practical problems with continuous-valued, partially ordered attributes. In the literature [6, 7, 12], information systems with dominance relations are referred to as ordered information systems.
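To make the contrast concrete, the following Python sketch scores four hypothetical students in two subjects on the high/medium/low scale mentioned above. It is an illustration, not taken from the paper. Under TRS an object is grouped only with objects that have identical values. Under a dominance relation it is related to every object that is at least as good on all attributes. The numeric coding of the scale and the student data are assumptions.

```python
from typing import Dict, List, Set

# Illustrative ordinal coding of the scale (assumption): low < medium < high.
LEVEL = {"low": 0, "medium": 1, "high": 2}

# Hypothetical students scored in two subjects.
students: Dict[str, List[str]] = {
    "s1": ["high", "medium"],
    "s2": ["medium", "medium"],
    "s3": ["high", "high"],
    "s4": ["medium", "medium"],
}


def equivalence_class(x: str) -> Set[str]:
    """TRS view: objects with exactly the same values as x (order ignored)."""
    return {y for y, v in students.items() if v == students[x]}


def dominating_set(x: str) -> Set[str]:
    """Dominance view: objects at least as good as x in every subject."""
    return {
        y
        for y, v in students.items()
        if all(LEVEL[vy] >= LEVEL[vx] for vy, vx in zip(v, students[x]))
    }


if __name__ == "__main__":
    print(sorted(equivalence_class("s2")))  # ['s2', 's4']
    print(sorted(dominating_set("s2")))     # ['s1', 's2', 's3', 's4']
```

For `s2`, the equivalence class is only `{s2, s4}`. The dominating set also contains `s1` and `s3`, which are at least as good in both subjects. This order information is exactly what the equivalence relation discards.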

Many scholars have studied dominance-based rough sets extensively [8-11], and most of this work focuses on attribute reduction under dominance relations. The computation of dominance classes is a necessary step in most related algorithms. Traditional algorithms [12, 13] compare the values of every attribute for all samples, which consumes a large amount of time and memory. This greatly reduces the efficiency of computing dominance classes and, in turn, of computing approximation sets, attribute reduction [18] and rule extraction [19]. Efficient processing of large-scale data with dominance relations has become a major concern [20, 21]. However, most existing acceleration algorithms are designed for computing equivalence classes [14-17]. We therefore propose a fast method that improves the efficiency of computing dominance classes by rapidly reducing the search space. The method can also be applied to attribute reduction, rule extraction and other related algorithms.
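For reference, a brute-force computation of the kind described above might look like the following Python sketch. It is a generic baseline, not necessarily the exact algorithms of [12, 13]. Every pair of objects is compared on every attribute, so the number of value comparisons is always $|U|^2$ times the number of attributes considered. The function name and the table layout are assumptions: `table[i][j]` holds the value of object $i$ on criterion $j$, and larger is better.

```python
from typing import Dict, Sequence, Set, Tuple


def naive_dominance_classes(
    table: Sequence[Sequence[float]], attrs: Sequence[int]
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]], int]:
    """Brute-force baseline: compare every pair of objects on every attribute.

    Returns the dominating classes, the dominated classes and the number of
    attribute-value comparisons performed.
    """
    n = len(table)
    dominating: Dict[int, Set[int]] = {x: set() for x in range(n)}
    dominated: Dict[int, Set[int]] = {x: set() for x in range(n)}
    comparisons = 0
    for x in range(n):
        for y in range(n):
            # y dominates x if y is at least as good on every attribute.
            # No early exit: every attribute of every pair is examined.
            ok = True
            for a in attrs:
                comparisons += 1
                if table[y][a] < table[x][a]:
                    ok = False
            if ok:
                dominating[x].add(y)  # y is in the dominating class of x
                dominated[y].add(x)   # x is in the dominated class of y
    return dominating, dominated, comparisons


if __name__ == "__main__":
    table = [[3, 2, 3], [1, 1, 2], [3, 3, 3], [2, 1, 1]]  # hypothetical data
    _, _, cost = naive_dominance_classes(table, [0, 1, 2])
    print(cost)  # 48 = 4 objects * 4 objects * 3 attributes
```

The cost of this baseline depends only on the table's dimensions and not on the data, which is the overhead a search-space reduction aims to avoid.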

2. Basic Concepts

Definition 2.1 (Information system) An information system is a 4-tuple $S = (U, A, V, f)$, where $U = \{x_1, x_2, \ldots, x_n\}$ is a non-empty finite set of objects; $A = \{a_1, \ldots, a_m\}$ is a non-empty finite set of attributes; $V = \bigcup_{a \in A} V_a$ is the value set of attributes, $V_a$ being the domain of attribute $a$; and $f: U \times A \rightarrow V$ is an information function that specifies the attribute value of each object $x$ in $U$, that is, $f(x, a) \in V_a$ for every $x \in U$, $a \in A$.
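Definition 2.1 maps naturally onto a small data structure. The Python sketch below is illustrative only. The class name `InformationSystem`, the dictionary encoding of $f$, and the validation rules are assumptions. They are chosen to mirror the definition: non-empty $U$ and $A$, and $f(x, a) \in V_a$ for every object and attribute.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Tuple


@dataclass
class InformationSystem:
    """S = (U, A, V, f) as in Definition 2.1 (illustrative encoding)."""

    U: FrozenSet[Hashable]                             # objects x_1, ..., x_n
    A: FrozenSet[Hashable]                             # attributes a_1, ..., a_m
    V: Dict[Hashable, FrozenSet[Hashable]]             # V_a for every attribute a
    values: Dict[Tuple[Hashable, Hashable], Hashable]  # (x, a) -> f(x, a)

    def __post_init__(self) -> None:
        if not self.U or not self.A:
            raise ValueError("U and A must be non-empty finite sets")
        if set(self.V) != set(self.A):
            raise ValueError("V must give a value set V_a for every a in A")
        # f must be defined on all of U x A and satisfy f(x, a) in V_a.
        for x in self.U:
            for a in self.A:
                if (x, a) not in self.values:
                    raise ValueError(f"f is undefined at ({x!r}, {a!r})")
                if self.values[(x, a)] not in self.V[a]:
                    raise ValueError(f"f({x!r}, {a!r}) is not in V_{a}")

    def f(self, x: Hashable, a: Hashable) -> Hashable:
        """Information function f: U x A -> V."""
        return self.values[(x, a)]


if __name__ == "__main__":
    S = InformationSystem(
        U=frozenset({"x1", "x2"}),
        A=frozenset({"score"}),
        V={"score": frozenset({"low", "medium", "high"})},
        values={("x1", "score"): "high", ("x2", "score"): "low"},
    )
    print(S.f("x1", "score"))  # high
```

Storing $f$ as a dictionary keyed by `(x, a)` keeps lookups simple for symbolic values. Once objects and attributes are indexed by integers, a dense two-dimensional array is the more usual choice.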

For a given information system $S$, if there is a partial order relation "$\geq$" on the range of the attribute $a \in A$, we call $a$ a criterion. For $x, y \in U$, $x \geq_a y$ means that $x$ is at least as good as $y$ under criterion $a$, that is, $x$ is better than …
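The criterion condition can be checked mechanically on a finite value set. The following Python sketch is hypothetical. It brute-forces all pairs and triples, so it suits only small $V_a$. It tests whether a given relation on the values is reflexive, antisymmetric and transitive, that is, a partial order, and then compares two objects under a criterion. The ordinal coding of "score" used in the demo is an assumption.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Tuple

Relation = Callable[[Hashable, Hashable], bool]


def is_partial_order(values: Iterable[Hashable], geq: Relation) -> bool:
    """Brute-force check that geq is reflexive, antisymmetric and transitive."""
    vs = list(values)
    reflexive = all(geq(v, v) for v in vs)
    antisymmetric = all(
        u == v or not (geq(u, v) and geq(v, u)) for u, v in product(vs, repeat=2)
    )
    transitive = all(
        geq(u, w) or not (geq(u, v) and geq(v, w))
        for u, v, w in product(vs, repeat=3)
    )
    return reflexive and antisymmetric and transitive


def at_least_as_good(
    f: Dict[Tuple[Hashable, Hashable], Hashable],
    x: Hashable,
    y: Hashable,
    a: Hashable,
    geq: Relation,
) -> bool:
    """x >=_a y: the value of x on criterion a is at least as good as y's."""
    return geq(f[(x, a)], f[(y, a)])


if __name__ == "__main__":
    rank = {"low": 0, "medium": 1, "high": 2}  # assumed scale: low < medium < high

    def geq(u: Hashable, v: Hashable) -> bool:
        return rank[u] >= rank[v]

    print(is_partial_order(rank, geq))  # True, so "score" qualifies as a criterion
    f = {("x1", "score"): "high", ("x2", "score"): "medium"}
    print(at_least_as_good(f, "x1", "x2", "score", geq))  # True
```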
