0278-0046 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIE.2019.2922941, IEEE
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS
Abstract—Batch-end quality modeling is used to predict
the quality by using batch measurements and generally
involves a large number of predictor variables. However,
not all of the variables are beneficial for the prediction.
Conventional multiway partial least squares (PLS) may not
function properly for batch-end quality modeling because
of many irrelevant predictor variables. This study proposes
an optimized sparse PLS (OSPLS) modeling approach for
simultaneous batch-end quality prediction and
relevant-variable selection. The effect of irrelevant
variables on the quality-prediction performance is analyzed,
and the importance of the relevant-variable selection is
emphasized. Then, an OSPLS batch-end quality modeling
approach is developed by incorporating the variable
resolution optimization and sparse PLS modeling. The
quality-prediction accuracy and modeling interpretability
are improved because only quality-relevant variables are
selected, and quality-irrelevant variables are eliminated.
Based on the selected quality-relevant variables, a statistic
is established for monitoring the quality status. The
proposed OSPLS-based modeling and monitoring
approach is applied to a fed-batch penicillin fermentation
process and an industrial injection molding process. The
results are compared with those of state-of-the-art methods to
verify the effectiveness of the OSPLS approach.
Index Terms—Sparse modeling, optimized sparse partial
least squares, batch-end quality prediction, batch processes,
soft sensing
This work was supported in part by National Natural Science
Foundation of China under Grants 61603138 and 21878081, in part by
Shanghai Pujiang Program under Grant 17PJD009, in part by Hong
Kong Research Grant Council Project under Grant 16207717, and in
part by the Programme of Introducing Talents of Discipline to
Universities (the 111 Project) under Grant B17017. (Corresponding
author: X. Yan)
Q. Jiang and X. Yan are with the Key Laboratory of Advanced Control
and Optimization for Chemical Processes of Ministry of Education, East
China University of Science and Technology, Shanghai 200237, P.R.
China (e-mail: qchjiang@ecust.edu.cn; xfyan@ecust.edu.cn).
H. Yi is with the College of Electronic Engineering and Control
Science, Nanjing University of Technology, Nanjing 211816, P.R. China
(e-mail: jsyihui@126.com).
F. Gao is with the Department of Chemical and Biomolecular
Engineering, The Hong Kong University of Science and Technology,
Clear Water Bay, Kowloon, Hong Kong (e-mail: kefgao@ust.hk).
I. INTRODUCTION
LARGE portions of value-added products are produced in
chemical and pharmaceutical industries by batch processes.
Generally, a batch process consists of several phases, and the
variables in a batch run are expected to follow a pre-defined
recipe. Owing to variations in environmental conditions,
reaction depths, or raw materials, the variable trajectories
may deviate from the recipe, and the final product quality may be
unsatisfactory. Thus, timely assessment of the process state and
estimation of the final product quality are important [1, 2].
However, the quality variable is generally obtained with some
delay because of technical or economic limitations.
Establishing a soft-sensor model for quality prediction is
therefore important. Quality modeling and monitoring techniques are
typically classified into two types, namely, mechanism
(white-box) models and data-driven (black-box) models [3-7].
On the one hand, establishing a mathematical model is difficult
because the reactions in a process are generally complex. On
the other hand, abundant historical data are stored owing to the
rapid advancement of sensing techniques. Consequently, data-driven
modeling and monitoring techniques are gaining increasing
attention [8-13].
Least squares (LS) is the basic linear regression method for
quality or key-performance-indicator modeling [14]. However,
LS generally fails when dealing with high-dimensional and
highly correlated data because the regression coefficients become
unstable and the computation becomes inefficient. To handle
high-dimensional and highly correlated data, partial least
squares (PLS) was proposed and is among the most popular
data-driven soft-sensor development methods [15]. For batch
processes, multiway PLS (MPLS), which unfolds the
three-way data into two-way data, is generally used [16].
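The unfolding step can be illustrated with a minimal sketch. It assumes batch-wise unfolding, the usual choice for MPLS, so that each batch becomes one row. The array sizes, the function name `unfold_batchwise`, and the column ordering (all variables at the first time point, then all variables at the second, and so on) are illustrative assumptions and are not taken from this paper.

```python
import numpy as np

def unfold_batchwise(X):
    """Unfold a (batches, variables, times) array into (batches, times * variables).

    Assumed layout: each row holds all variables at time 0, then all
    variables at time 1, and so on.
    """
    n_batches, n_vars, n_times = X.shape
    # Reorder to (batches, times, variables), then flatten the last two axes.
    return X.transpose(0, 2, 1).reshape(n_batches, n_times * n_vars)

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
X3 = rng.normal(size=(5, 3, 4))   # 5 batches, 3 variables, 4 time points
X2 = unfold_batchwise(X3)
print(X2.shape)                   # (5, 12)
```

Each unfolded row has one column per variable and time point, so the number of columns grows with the product of the two while the number of rows stays equal to the number of batches.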
However, the classical MPLS method has the following defects,
which may degrade the prediction performance. After data
unfolding, the number of predictor variables can be remarkably
large, whereas the number of observations (batches) is
generally small. For example, for a batch process with 10
variables and 200 measurements in each batch and a data set
of 100 batches, the number of predictor variables is