on Kernel Principal Component Analysis (KPCA) and a differential
evolution (DE) based SVM classifier. The HSEC performs feature
engineering by selecting features that correspond to the time sequence
and by reducing the dimensionality of the electricity price data
features. The HFS fuses two feature selectors based on GCA, rather than
relying on one, to give an appropriate selection of features. Unlike
Principal Component Analysis (PCA), which is not suitable for
high-dimensional non-linear data [28], KPCA uses a kernel function to
handle such data. The main contributions of this paper are summarized
as follows:
• We propose an integrated electricity price forecasting framework for
  accurate big data forecasting in the smart grid. To the best of our
  knowledge, this is the first attempt to integrate feature selection,
  extraction, and classification in a single framework for the studied
  problem.
• To realize this framework, we first propose a GCA-based HFS that
  combines the Relief-F algorithm and Random Forest (RF) to calculate
  feature importance and control the feature selection. For feature
  extraction, we use KPCA to further reduce the redundancy among the
  selected features. We are the first to study the redundancy among the
  selected features in the electricity price forecasting field. We also
  design a DE-SVM algorithm to tune the hyperparameters of SVM, which
  achieves higher accuracy than existing classifiers.
• We evaluate the performance of our proposal through extensive
  simulations based on real-world data traces of grid price and
  workload. The numerical results show that our proposal outperforms
  the benchmark approaches.
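To make the kernel idea behind the KPCA step more concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes an RBF kernel with a hypothetical width `gamma`, uses made-up toy points, and approximates only the first kernel principal component by power iteration. All function names are illustrative.

```python
import math
import random


def rbf_kernel_matrix(rows, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2) for every pair of samples."""
    n = len(rows)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
             for j in range(n)] for i in range(n)]


def center_kernel(k):
    """Center the data in feature space: K - 1K - K1 + 1K1 (K is symmetric)."""
    n = len(k)
    row_mean = [sum(r) / n for r in k]
    total_mean = sum(row_mean) / n
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_eigenpair(m, iters=500, seed=0):
    """Approximate the dominant eigenpair of a symmetric PSD matrix
    by power iteration from a seeded random start vector."""
    n = len(m)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v estimates the eigenvalue.
    eigval = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v


# Toy non-linear data (points on a parabola); values are illustrative only.
points = [(x / 2.0, (x / 2.0) ** 2) for x in range(-3, 6)]
centered = center_kernel(rbf_kernel_matrix(points, gamma=0.5))
eigval, eigvec = top_eigenpair(centered)

# Projection of each training sample onto the first kernel component.
scores = [math.sqrt(max(eigval, 0.0)) * v for v in eigvec]
```

Linear PCA would eigen-decompose the covariance of the raw coordinates. Here the decomposition is applied to the centered kernel matrix, which is what allows non-linear structure to be captured.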
The rest of this paper is organized as follows. Section 2
gives an overview of the proposed price forecasting framework.
Sections 3 and 4 describe the feature selection and feature
extraction, respectively. Section 5 presents the enhanced SVM
classifier based on DE. Section 6 reports the experimental
results that verify our proposed framework. Finally, Section 7
concludes the paper.
2 SYSTEM FRAMEWORK
Fig. 1 depicts the system framework of HSEC. The framework
consists of three modules: feature selection and feature
extraction, which together form the feature engineering stage,
and classification.
2.1 Design Goals
The goal of our framework is to forecast the electricity price
efficiently and accurately. To achieve this, we need to process
the raw data, determine which features to select, and carefully
tune the classifier. The following metrics are therefore
important for the performance of our proposed framework.
• Accuracy of classification: This is the core goal of our
  framework design.
• Dimensionality reduction rate: In this framework, the
  performance of feature engineering directly influences the
  accuracy of classification.
• Time efficiency: Because it is applied to electricity price
  forecasting, the framework should run fast.
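The three metrics above can be pictured with a minimal sketch. It assumes that the dimensionality reduction rate means the share of original features removed, which this section does not define precisely. The labels and feature counts are made-up illustrative values, and the function names are hypothetical.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def dimension_reduction_rate(n_original, n_kept):
    """Assumed definition: share of the original features that were removed."""
    return 1.0 - n_kept / n_original


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Illustrative values only, not results from the paper.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
acc, seconds = timed(accuracy, y_true, y_pred)
rate = dimension_reduction_rate(n_original=20, n_kept=5)
```

In practice the timing wrapper would be applied to the whole pipeline (selection, extraction, and classification) rather than to a single metric call.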
2.2 Framework Overview
The primary issue in electricity price forecasting is accuracy.
However, various factors influence the electricity price, which
makes training the classifier difficult. To enhance the accuracy
of the proposed framework, we develop a parallelized HFS, a
KPCA-based feature extraction, and a DE-SVM based classifier.
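As a rough picture of what a parallelized hybrid selector might look like, the sketch below runs two scoring functions concurrently and keeps the features whose combined score exceeds a threshold. The fusion rule (sum of min-max normalized scores) is an assumption made for illustration. The two scorers are stubs that return made-up numbers in place of real Relief-F and RF importances, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def min_max(scores):
    """Rescale scores to [0, 1] so that the two selectors are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_select(score_fns, data, threshold):
    """Run every scorer concurrently, sum the normalized scores per feature,
    and return (indices of features above the threshold, fused scores)."""
    with ThreadPoolExecutor(max_workers=len(score_fns)) as pool:
        raw = list(pool.map(lambda fn: fn(data), score_fns))
    fused = [sum(col) for col in zip(*(min_max(r) for r in raw))]
    selected = [i for i, s in enumerate(fused) if s > threshold]
    return selected, fused


def relief_f_stub(data):
    """Placeholder for a Relief-F importance vector (made-up values)."""
    return [0.90, 0.10, 0.50, 0.05]


def random_forest_stub(data):
    """Placeholder for an RF importance vector (made-up values)."""
    return [0.40, 0.05, 0.35, 0.20]


selected, fused = hybrid_select(
    [relief_f_stub, random_forest_stub], data=None, threshold=1.0)
```

Because the two scorers share no state, they can run independently. For CPU-bound scorers in Python, a process pool or separate machines would be needed to obtain a real speed-up.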
The HSEC begins with standardizing the raw data, which
corresponds to the first part in Fig. 1. This standardization
process is crucial for the implementation of the whole
framework. Second, the data flow into the GCA-based HFS, where
they are used to train Relief-F and RF in parallel. Fig. 2
illustrates the details of the HFS. This feature selector decides
whether a feature is retained according to an index, called
feature importance, which is given by Relief-F and RF. Because
the two selectors are decoupled, this process can be executed in
a distributed manner. Third, KPCA is performed on the selected
features to further remove redundant features. In our proposed
framework, factors are incorporated according to their feature
importance and redundancy. For example, the weather condition
may affect the generation of solar and wind energy; this
constraint is reflected in the redundancy among the weather,
solar, and wind features. Finally, the processed data are used to
build the SVM. Since the SVM is controlled by several
hyperparameters, we use the DE algorithm to tune them. Table 1
introduces the major notations used in this paper. We describe
the details of these modules in the next three sections.
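The final tuning step can be illustrated with a minimal differential evolution sketch. It is not the DE-SVM algorithm of this paper: it uses the generic DE/rand/1/bin scheme, assumes two hyperparameters searched on a log scale (a penalty `C` and a kernel width `gamma`, neither of which is named in this section), and replaces the cross-validated SVM error with a toy quadratic objective so that the code stays self-contained.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimize objective over box bounds with the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(ind) for ind in pop]
    for _ in range(generations):
        new_pop, new_cost = list(pop), list(cost)
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                new_pop[i], new_cost[i] = trial, trial_cost
        pop, cost = new_pop, new_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def toy_validation_error(params):
    """Stand-in for cross-validated SVM error (hypothetical);
    its minimum sits at log10(C) = 1 and log10(gamma) = -2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = differential_evolution(
    toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

To tune a real classifier, `toy_validation_error` would be replaced by a function that trains the SVM with the candidate hyperparameters and returns its validation error. The search loop itself would stay the same.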
3 GCA BASED HYBRID FEATURE SELECTOR
This section describes the feature selection process. We
propose a new parallel HFS based on GCA that fuses RF and
Relief-F and is controlled by a newly proposed threshold m.
Fusing RF and Relief-F yields a more accurate feature
selection. Relief-F and RF each produce a feature importance
score, and both approaches are efficient. Features are first
roughly selected by GCA,
Fig. 2. HFS structure.
36 IEEE TRANSACTIONS ON BIG DATA, VOL. 5, NO. 1, JANUARY-MARCH 2019