to describe the probability of a strategy. They compared their SLPSO with eight PSO variants on
26 numerical optimization problems with different characteristics and an economic load dispatch
problem in power systems. Their results indicate that SLPSO can update the best known solution records.
In recent years, Xue et al. (2014b, 2017) proposed several improved self-adaptive EC techniques to
solve continuous and discrete optimization problems.
Recently, EC methods with self-adaptive mechanisms have been proposed to solve large-
scale continuous optimization problems, and the experimental results show that these algorithms
have clear advantages on high-dimensional continuous numerical optimization problems
(Xue et al. 2014b). However, to the best of our knowledge, although EC methods with self-adaptive
mechanisms have been employed for large-scale feature selection in clustering (Bharti and
Singh 2016), they have not been applied to feature selection in classification, let alone
large-scale feature selection in classification. In this article, we investigate a self-adaptive
PSO algorithm to see whether it can achieve good performance for feature selection in classifica-
tion, especially large-scale feature selection in classification.
3 SELF-ADAPTIVE PARTICLE SWARM OPTIMIZATION FOR FEATURE SELECTION
3.1 Representation of Solutions
There are several representation schemes for feature selection in the literature (Xue et al. 2016).
In this article, feature selection is transformed into a “0” and “1” combinatorial optimization prob-
lem, in the same manner as in (Xue et al. 2014a). Thus, the representation of a solution is a
binary string. This string has D dimensions, where D is the total number of features. We use
continuous encoding in PSO, and the range of each dimension of the position vector is limited to
[0, 1]. To convert a continuous position vector into a binary string, a threshold θ is set in advance.
If the value of the dth dimension of the position is greater than θ, the corresponding value in the
binary vector is set to 1, which represents that the dth feature is selected. Otherwise, the value in
the binary vector is set to 0, which represents that the dth feature is not selected.
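For illustration, this mapping from a continuous position vector to a binary feature mask can be sketched as follows. The threshold value used here is a placeholder assumption, since the article only requires that θ be set in advance.

```python
import numpy as np

THETA = 0.6  # hypothetical threshold; the paper only assumes theta is fixed in advance

def position_to_feature_mask(position, theta=THETA):
    """Convert a continuous PSO position in [0, 1]^D to a binary feature mask.

    A dimension greater than theta means the corresponding feature is selected.
    """
    position = np.asarray(position)
    return (position > theta).astype(int)

# Example: a 5-dimensional position vector selects features 0 and 3.
mask = position_to_feature_mask([0.8, 0.2, 0.5, 0.9, 0.1])
print(mask)  # [1 0 0 1 0]
```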
3.2 Methods for Designing Strategy Pool
Different from other PSO variants that use only one CSGS to generate new parti-
cles, the SaPSO algorithm uses multiple CSGSs to generate new particles. In the SaPSO algorithm,
the multiple CSGSs are maintained in a specific component termed the strategy pool. To
design the strategy pool for SaPSO, we first implemented 25 commonly used and representative
CSGSs from the PSO literature (detailed information on the 25 CSGSs can be found in
the supplementary materials). The strategy pool does not consist of all 25 CSGSs; only the
suitable CSGSs among the 25 are put into the strategy pool. In
this subsection, a method for selecting CSGSs is introduced.
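As a minimal sketch of the strategy-pool idea, the code below keeps a set of CSGSs and samples one to generate a particle's next update. The CSGS functions are hypothetical placeholders (not the 25 CSGSs from the supplementary materials), and uniform sampling is a simplification; SaPSO governs the choice with self-adaptive selection probabilities.

```python
import random

def csgs_global_best(particle, swarm):
    """Placeholder CSGS: would update velocity toward the global best."""
    ...

def csgs_local_best(particle, swarm):
    """Placeholder CSGS: would update velocity toward a neighborhood best."""
    ...

class StrategyPool:
    """Holds the subset of CSGSs chosen for SaPSO and samples one per particle update."""

    def __init__(self, strategies):
        self.strategies = list(strategies)

    def sample(self):
        # Uniform sampling for illustration only; SaPSO adapts the selection
        # probabilities of the CSGSs during the search.
        return random.choice(self.strategies)

pool = StrategyPool([csgs_global_best, csgs_local_best])
update_rule = pool.sample()  # CSGS used to generate the next particle position
```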
The choice of CSGSs to form the strategy pool involves two questions: (1) How many CSGSs
should be selected to form the pool? (2) Which CSGSs should be selected? Underlying both is a basic
question: how do we identify which CSGSs are effective? This would be straightforward
if there were only one dataset. However, there are a large number of datasets, and we expect the
selected CSGSs to perform well on large-scale datasets. The only information that can be obtained is
the performance of each CSGS on each dataset, measured experimentally. Hence, we need a method to
comprehensively evaluate the performance of the CSGSs.
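To illustrate the kind of cross-dataset evidence such a method has to work with, the sketch below assembles hypothetical per-dataset accuracies into a CSGS-by-dataset matrix and computes a simple average-rank baseline. The numbers are invented for illustration only, and the article itself aggregates these results with AHP (introduced next), not with this baseline.

```python
import numpy as np

# Rows: CSGSs; columns: datasets; entries: classification accuracy from experiments.
# All values here are hypothetical.
performance = np.array([
    [0.91, 0.85, 0.78],   # CSGS 1 on datasets A, B, C
    [0.88, 0.90, 0.80],   # CSGS 2
    [0.86, 0.83, 0.82],   # CSGS 3
])

# Simple baseline aggregation: average rank across datasets (lower is better).
ranks = (-performance).argsort(axis=0).argsort(axis=0) + 1
print(ranks.mean(axis=1))  # mean rank of each CSGS over all datasets
```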
The analytic hierarchy process (AHP) is a well-known multicriteria decision-making technique (Aguaron
et al. 2016; Saaty 1990). The main characteristics of this approach are as follows: the modeling of
the problem using a hierarchical structure that reflects all the relevant aspects of the problem; the
use of pairwise comparisons to incorporate the preferences of decision makers; the derivation of