information of all samples made up the original database. The
spectral information generally exhibited evident noise and
other interference signals, which were adverse to the subsequent
discrimination analysis. Appropriate data pre-processing
methods were therefore needed to reduce this irrelevant
variation. First, the near-infrared range of
948.17–1649.20 nm, which had a high signal-to-noise ratio, was
retained by removing the wavebands at both ends of the spectrum
(Chen et al., 2019; Hinton et al., 2012; Wang et al., 2015).
In total, 209 waveband variables were retained. The Savitzky-
Golay (SG) smoothing algorithm was then applied to the
average spectrum of every ROI to further suppress
interference (Ruffin & King, 1999). To obtain more stable and
realistic spectral results, the SG parameters, namely the
number of smoothing points, the polynomial order, and the
derivative order, were set to 15, 1, and 0, respectively.
After this processing of the original database, the final
dataset comprised the 1D mean spectra of all 18,931 samples,
each with 209 wavebands. The entire procedure of spectral
extraction and pre-processing was performed using a series of
MATLAB programs.
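The two pre-processing steps described above (band trimming and SG
smoothing) can be illustrated with a minimal Python sketch. It is not
the authors' MATLAB code. The function names, the synthetic
wavelength axis, and the edge-padding choice are assumptions made for
illustration; only the retained range and the SG settings (15 points,
polynomial order 1, derivative order 0) come from the text.

```python
from math import factorial

import numpy as np

LOW_NM, HIGH_NM = 948.17, 1649.20  # retained range stated in the text


def trim_bands(spectra, wavelengths, low=LOW_NM, high=HIGH_NM):
    """Keep only the columns whose wavelength lies inside [low, high]."""
    keep = (wavelengths >= low) & (wavelengths <= high)
    return spectra[:, keep], wavelengths[keep]


def savgol_weights(window, polyorder, deriv=0):
    """Savitzky-Golay weights for the centre point of an odd-length window."""
    if window % 2 == 0 or polyorder >= window:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Column j of the Vandermonde matrix holds offsets**j.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse is the least-squares estimate of the
    # deriv-th polynomial coefficient; deriv! turns it into a derivative
    # (per band index, not per nanometre).
    return np.linalg.pinv(vander)[deriv] * factorial(deriv)


def sg_filter(spectra, window=15, polyorder=1, deriv=0):
    """Apply the SG filter to every row (one row per mean ROI spectrum)."""
    half = window // 2
    weights = savgol_weights(window, polyorder, deriv)
    # Assumption: edges are handled by repeating the first/last value.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # Reversing the weights makes np.convolve compute a correlation.
    return np.apply_along_axis(
        lambda row: np.convolve(row, weights[::-1], mode="valid"), 1, padded
    )


# Synthetic stand-in data: 6 noisy spectra on a hypothetical wavelength axis.
rng = np.random.default_rng(0)
wavelengths = np.linspace(900.0, 1700.0, 240)
raw = np.sin(wavelengths / 120.0) + 0.05 * rng.standard_normal((6, wavelengths.size))

trimmed, kept_nm = trim_bands(raw, wavelengths)
smoothed = sg_filter(trimmed)
```

With polynomial order 1 and derivative order 0, the SG weights are all
equal to 1/15, so away from the edges this setting acts as a 15-point
moving average of each spectrum.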
2.3. Multivariate data analysis
2.3.1. Visual clustering identification
Before the formal modeling analysis, principal component
analysis (PCA) and linear discriminant analysis (LDA) were
applied qualitatively to check whether there were
significant differences or patterns in the spectral
information of the 18 hybrid okra varieties. Both are classic
dimension-reduction algorithms; PCA is unsupervised, whereas
LDA is supervised.
PCA transforms groups of potentially correlated spectral
variables into uncorrelated ones known as principal
components (PCs). The leading PCs, which account for the
largest proportions of the explained variance and thus retain
most of the information in the original variables, were
selected. The scores of these PCs were plotted to identify the
presence of groups among okra varieties. The other dimension-
reduction method, LDA, has rarely been used. Nevertheless,
some studies consider its transformed variables better suited
to this task, because LDA maximizes the scatter between
different cultivars relative to the scatter within the same
cultivar. This suggests that LDA can achieve better
clustering results than PCA (Martínez & Kak, 2001). Both
methods were applied and compared to determine which gave
clearer separation among hybrid okra varieties, by visually
inspecting scatter plots of the transformed variables. The
PCA and LDA scatter plots were generated with the relevant
functions in MATLAB.
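The contrast between the two projections can be illustrated with a
minimal NumPy sketch. It is not the authors' MATLAB workflow. The
function names and the synthetic three-class data are assumptions
made for illustration, and LDA is written here in its textbook
Fisher form (eigenvectors of the within-class scatter inverse times
the between-class scatter), which may differ in detail from the
routine the authors called.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project mean-centred data onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # explained-variance ratio per PC
    return Xc @ vt[:n_components].T, explained[:n_components]


def lda_scores(X, y, n_components=2):
    """Project data onto the leading Fisher discriminant directions."""
    grand_mean = X.mean(axis=0)
    n_features = X.shape[1]
    s_within = np.zeros((n_features, n_features))
    s_between = np.zeros((n_features, n_features))
    for label in np.unique(y):
        Xk = X[y == label]
        class_mean = Xk.mean(axis=0)
        s_within += (Xk - class_mean).T @ (Xk - class_mean)
        diff = (class_mean - grand_mean)[:, None]
        s_between += len(Xk) * (diff @ diff.T)
    # Directions that maximise between-class relative to within-class scatter.
    evals, evecs = np.linalg.eig(np.linalg.pinv(s_within) @ s_between)
    order = np.argsort(evals.real)[::-1][:n_components]
    return (X - grand_mean) @ evecs[:, order].real


def fisher_ratio(z, y):
    """Between-class over within-class scatter of a 1D score vector."""
    labels = np.unique(y)
    between = sum(np.sum(y == k) * (z[y == k].mean() - z.mean()) ** 2 for k in labels)
    within = sum(np.sum((z[y == k] - z[y == k].mean()) ** 2) for k in labels)
    return between / within


# Synthetic stand-in for labelled spectra: 3 classes, 10 "bands".
rng = np.random.default_rng(1)
y = np.repeat(np.arange(3), 40)
X = rng.standard_normal((y.size, 10))
X[:, 0] *= 10.0     # high-variance band carrying no class information
X[:, 1] += 3.0 * y  # class differences sit in a lower-variance band

pca_z, explained = pca_scores(X)
lda_z = lda_scores(X, y)
print("Fisher ratio, PC1:", fisher_ratio(pca_z[:, 0], y))
print("Fisher ratio, LD1:", fisher_ratio(lda_z[:, 0], y))
```

In this toy case the first PC follows the high-variance but
uninformative band, whereas the first discriminant axis follows the
band that separates the classes. Plotting the two columns of `pca_z`
or `lda_z`, coloured by `y`, gives the kind of score scatter plot
described above. Note that LDA yields at most one fewer axis than
the number of classes, so 18 varieties allow at most 17.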
2.3.2. Neural network models
Essentially, the establishment of discrimination models was
of paramount importance to identify okra seeds of different
varieties quantitatively. As mentioned in the introduction, the
intention of our research was to analyze DL-powered
approaches to demonstrate their capability for the identification of
Fig. 1 – Hyperspectral imaging system and okra samples for each variety. (a) okra seeds appearance, (b) number of hybrid
okra samples used, and (c) hyperspectral imaging system.
biosystems engineering 212 (2021) 46–61 49