Ensemble of Cost-Sensitive Hypernetworks for
Class-Imbalance Learning
Jin Wang, Ping-li Huang, Kai-wei Sun, Bao-lin Cao, Rui Zhao
Chongqing Key Laboratory of Computational Intelligence
Chongqing University of Posts and Telecommunications
Chongqing 400065, PR China
wangjin@cqupt.edu.cn
Abstract—A hypernetwork is a probabilistic graphical model of
learning and memory inspired by biomolecular networks, which
is very useful for discovering higher-order correlations among
multiple attributes. However, like many traditional machine
learning algorithms, hypernetworks may be biased towards the
majority class, thus producing poor predictive accuracy over the
minority class when learning with imbalanced datasets. In this
paper, three hypernetwork-based models are proposed: an ensemble
of cost-sensitive hypernetworks (EN-CS-HN), an ensemble of
cost-sensitive hypernetworks with under-sampling (EN-CS-HN-UNDE),
and an ensemble of cost-sensitive hypernetworks with the
synthetic minority over-sampling technique (EN-CS-HN-SMOTE).
To examine the performance of the
proposed schemes, we conduct experiments on ten imbalanced
datasets collected from the UCI Machine Learning Repository,
wherein the proposed methods are compared with various state-
of-the-art approaches using three metrics: G-Mean, F-Measure
and area under the receiver operating characteristic curve (AUC-
ROC). Experimental results show that the proposed methods are
able to surpass or match the previously known best algorithms on
most of the ten datasets.
Keywords-imbalanced classification; hypernetworks; ensemble
learning; cost-sensitive learning; under-sampling; SMOTE
I. INTRODUCTION
Imbalanced data classification is one of the most
challenging problems in knowledge discovery and real-world
data mining [1]. It refers to the classification of datasets
wherein some classes have far fewer instances than other
classes. We assume that the positive class is the minority class,
and the negative class is the majority class. Class imbalance
has a serious impact on the performance of classifiers. When
learning from imbalanced datasets, traditional machine learning
algorithms usually produce high classification accuracy over
the negative class while obtaining poor results over the positive class.
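To make this effect concrete, the following minimal sketch computes accuracy, G-Mean, and F-Measure from confusion-matrix counts, using the standard textbook definitions (which may differ in detail from the ones adopted in the experiments). The function name `imbalance_metrics` and the counts are hypothetical and chosen only for illustration: a classifier that labels every instance as negative on a dataset with 990 negative and 10 positive instances.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Return (accuracy, g_mean, f_measure) from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # G-Mean: geometric mean of the per-class recall values.
    g_mean = math.sqrt(recall * specificity)
    # F-Measure: harmonic mean of precision and recall on the positive class.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return accuracy, g_mean, f_measure

# Hypothetical counts: every instance is predicted as negative.
acc, g, f = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)
print(acc, g, f)
```

In this illustrative case the accuracy is 99% although no positive instance is recognized, while G-Mean and F-Measure both drop to zero, which is why such metrics are preferred over plain accuracy for imbalanced data.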
Over the past few years, several approaches have been
proposed for dealing with imbalanced data classification [1, 2].
The existing methods can be divided into two categories:
data-oriented strategies and algorithm-related approaches. At the
data level, re-sampling strategies such as under-sampling [3]
and over-sampling [4] have been extensively explored.
Algorithm-related approaches include ensemble learning [5, 6],
cost-sensitive learning [7] and so on. However, none of these
methods alone can address the class imbalance problem
effectively. For example, the under-sampling strategy may lead
to information loss since many potentially useful samples are
discarded. The over-sampling strategy suffers from
long training time and overfitting when a lot of synthetic
samples are added. In most cases of cost-sensitive learning, the
misclassification costs are difficult to define.
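The two re-sampling strategies can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of [3] or [4]: the function names, the toy data, and the parameters `n_new` and `k` are hypothetical, samples are assumed to be numeric tuples, and the over-sampler only mimics the interpolation idea of SMOTE (it requires at least two minority samples).

```python
import random

def random_under_sample(majority, minority, rng):
    """Keep a random subset of the majority class, equal in size to the minority."""
    kept = rng.sample(majority, k=min(len(minority), len(majority)))
    return kept, list(minority)

def smote_like_over_sample(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample, excluding itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

rng = random.Random(0)
majority = [(float(i), float(i % 7)) for i in range(100)]   # toy majority class
minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (1.0, 3.0)]  # toy minority class
kept, _ = random_under_sample(majority, minority, rng)
new_points = smote_like_over_sample(minority, n_new=6, k=2, rng=rng)
print(len(kept), len(new_points))  # prints: 4 6
```

The sketch also shows the drawbacks named above: under-sampling discards 96 of the 100 majority samples here, and every synthetic point lies on a segment between existing minority samples, which can encourage overfitting when many such points are added.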
A hypernetwork is a bio-inspired probabilistic graphical
model based on undirected graphs [8]. Generally speaking, a
hypernetwork is a hypergraph whose hyperedges are weighted.
Unlike an edge in an ordinary graph, which connects at most two
vertices, a hyperedge in a hypergraph can connect more
than two vertices. In this way, higher-order correlations among
vertices are explicitly represented in hyperedges. Up to now,
hypernetworks have been successfully used to solve various
machine learning problems [8, 9].
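As a rough illustration of this structure, a weighted hypergraph can be represented as below. This is only a sketch of the general data structure; the class and field names (`Hyperedge`, `Hypernetwork`, `order`) and the vertex labels are hypothetical, and the hypernetwork classifier of [8] attaches further information to each hyperedge that is not modelled here.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Hyperedge:
    vertices: frozenset   # may contain any number of vertices, not just two
    weight: float = 1.0

    @property
    def order(self):
        """Number of vertices connected by this hyperedge."""
        return len(self.vertices)

@dataclass
class Hypernetwork:
    vertices: set = field(default_factory=set)
    hyperedges: list = field(default_factory=list)

    def add_hyperedge(self, vertices, weight=1.0):
        edge = Hyperedge(frozenset(vertices), weight)
        self.vertices |= edge.vertices   # register any new vertices
        self.hyperedges.append(edge)
        return edge

net = Hypernetwork()
net.add_hyperedge({"x1", "x2"}, weight=0.5)         # an ordinary two-vertex edge
net.add_hyperedge({"x1", "x3", "x4"}, weight=2.0)   # connects three vertices at once
print([e.order for e in net.hyperedges])  # prints: [2, 3]
```

The second hyperedge encodes a joint relation among three vertices directly, which an ordinary graph could only approximate with several pairwise edges.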
Hypernetworks assume that the class distribution of
datasets is balanced. In the process of hypernetwork learning,
hyperedges which are critical for differentiating classes will be
copied and added, and hyperedges with poor distinguishing
ability will be discarded, aiming to extract hyperedges that can
cover as many samples as possible. However, within the
context of the class-imbalance learning problem, most samples
in the minority class are usually viewed as noise. Therefore,
the number of hyperedges corresponding to the majority class
significantly surpasses that of hyperedges corresponding to the
minority class. As a result, most of the minority samples are
misclassified in a traditional hypernetwork.
In this paper, a modified hypernetwork is proposed to deal
with the class imbalance problem in three ways:
1. Building a cost-sensitive hypernetwork model. By
assigning a higher misclassification cost to false negatives than
to false positives, hypernetworks are driven to focus on
learning the minority class, which turns the original
hypernetwork into a cost-sensitive one.
2. Introducing an ensemble strategy to the cost-sensitive
hypernetworks. In many cost-sensitive learning cases, the
actual misclassification cost information is generally
unavailable. In this paper, a genetic algorithm (GA)-based
This work was partially supported by the National Natural Science
Foundation of China (61203308, 61075019), and the Natural Science
Foundation Project of CQ CSTC under Grant No. cstc2013jcyjA40063, No.
cstc2012jjA40034.
2013 IEEE International Conference on Systems, Man, and Cybernetics
978-1-4799-0652-9/13 $31.00 © 2013 IEEE
DOI 10.1109/SMC.2013.324