A Python Code Example of Feature Selection with a Genetic Algorithm
Date: 2023-11-07 10:04:46
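A genetic algorithm solves optimization problems by simulating biological evolution: a population of candidate solutions is repeatedly selected, recombined, and mutated. Feature selection is an important task in machine learning, where the goal is to pick, from a large set of features, the subset with the greatest predictive power for the target variable. Below is a Python code example that uses a genetic algorithm for feature selection: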
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Create a sample dataset
X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Fitness function: training accuracy of a logistic regression model
# fitted on the features selected by the individual
def fitness_function(individual, X, y):
    mask = individual.astype(bool)  # 0/1 genes -> boolean column mask
    if not mask.any():
        return 0.0  # an empty feature subset cannot be fitted
    selected_features = X[:, mask]
    clf = LogisticRegression()
    clf.fit(selected_features, y)
    return clf.score(selected_features, y)


# Genetic algorithm operators
def initialize_population(population_size, chromosome_length):
    # Each row is one individual; each gene is 0 (drop) or 1 (keep)
    return np.random.randint(2, size=(population_size, chromosome_length))


def crossover(parent1, parent2):
    # Single-point crossover; the cut leaves at least one gene on each side
    crossover_point = np.random.randint(1, len(parent1))
    child1 = np.concatenate((parent1[:crossover_point], parent2[crossover_point:]))
    child2 = np.concatenate((parent2[:crossover_point], parent1[crossover_point:]))
    return child1, child2


def mutation(individual, mutation_rate):
    # Flip each gene independently with probability mutation_rate
    for i in range(len(individual)):
        if np.random.rand() < mutation_rate:
            individual[i] = 1 - individual[i]
    return individual


def selection(population, fitness_values):
    # Keep the fitter half of the population
    sorted_indices = np.argsort(fitness_values)[::-1]
    return population[sorted_indices][:len(population) // 2]


# Genetic algorithm parameters
population_size = 100
chromosome_length = X_train.shape[1]
mutation_rate = 0.01
num_generations = 50

# Initialize the population
population = initialize_population(population_size, chromosome_length)

# Evolution loop; the best individual seen so far is tracked explicitly,
# because fitness_values always refers to the population it was computed from
best_individual, best_fitness = None, -1.0
for generation in range(num_generations):
    # Compute fitness values
    fitness_values = np.array([fitness_function(individual, X_train, y_train) for individual in population])
    best_index = np.argmax(fitness_values)
    if fitness_values[best_index] > best_fitness:
        best_fitness = fitness_values[best_index]
        best_individual = population[best_index].copy()
    # Select the fittest individuals
    selected_population = selection(population, fitness_values)
    # Generate the next generation
    new_population = []
    while len(new_population) < population_size:
        # np.random.choice needs a 1-D input, so sample row indices instead of rows
        i, j = np.random.choice(len(selected_population), size=2, replace=False)
        child1, child2 = crossover(selected_population[i], selected_population[j])
        child1 = mutation(child1, mutation_rate)
        child2 = mutation(child2, mutation_rate)
        new_population.append(child1)
        new_population.append(child2)
    population = np.array(new_population[:population_size])

# Evaluate the best feature subset on the test set
best_mask = best_individual.astype(bool)
clf = LogisticRegression()
clf.fit(X_train[:, best_mask], y_train)
accuracy = clf.score(X_test[:, best_mask], y_test)
print("Selected feature indices:", np.flatnonzero(best_mask))
print("Selected features accuracy:", accuracy)
```
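This code demonstrates how to use a genetic algorithm for feature selection. It first creates a sample dataset and splits it into training and test sets. It then defines the fitness function, which fits a logistic regression model on the selected feature subset and returns its accuracy on the training data as the fitness value. Next come the genetic algorithm operators: population initialization, crossover, mutation, and selection. Finally, the population is evolved over multiple generations, the best individual is chosen, and the accuracy of its feature subset is evaluated on the test set.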
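To make the encoding concrete, here is a minimal sketch of how a single 0/1 chromosome maps to a set of columns. The array `X_demo` and the chromosome values are hypothetical toy data chosen for illustration, not part of the example above. The sketch also shows why the chromosome should be converted to a boolean mask before indexing: NumPy treats an integer array of 0s and 1s as column indices, not as a keep/drop mask.

```python
import numpy as np

# Hypothetical toy data: 3 samples, 5 features (values are illustrative only)
X_demo = np.arange(15).reshape(3, 5)

# One chromosome: 1 = keep the feature, 0 = drop it
chromosome = np.array([1, 0, 0, 1, 1])

# Convert the genes to a boolean mask, then select the matching columns
mask = chromosome.astype(bool)
selected_indices = np.flatnonzero(mask)  # column indices of the kept features
X_selected = X_demo[:, mask]

print(selected_indices)   # [0 3 4]
print(X_selected.shape)   # (3, 3)

# Pitfall: indexing with the raw 0/1 integers picks columns 1, 0, 0, 1, 1
X_wrong = X_demo[:, chromosome]
print(X_wrong.shape)      # (3, 5)
```

With the boolean mask, the number of selected columns equals the number of 1 genes. With raw integer indexing, the result always has as many columns as the chromosome has genes, and only columns 0 and 1 are ever used.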