cols = sum([(categorical_cols[i] + '_' + ohe.categories_[i][1:]).tolist() for i in range(len(categorical_cols))],[]) + numeric_cols
Time: 2024-05-19 13:12:08
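This code builds a single list of feature names by combining the column names produced by One-Hot Encoding with the numeric column names, so that the list can be used later with the model. Here `categorical_cols` holds the categorical feature columns of the original dataset, `ohe` is the encoder fitted on those columns (presumably a scikit-learn `OneHotEncoder`, whose `categories_` attribute stores the category values found for each column), and `numeric_cols` holds the numeric feature columns. For each categorical column, the code joins the column name to each of its category values with `_`, skipping the first category via `[1:]`, which matches an encoder that drops the first category of each column. `sum(..., [])` then flattens the per-column lists into one list, and `numeric_cols` is appended to it. The resulting `cols` list contains the names of all feature columns.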
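To make the name construction concrete, here is a minimal sketch in plain Python. It assumes `categories` is a stand-in for `ohe.categories_` (one list of category values per categorical column) and that the first category of each column was dropped during encoding, which is what the `[1:]` slice implies. The helper name `build_feature_names` and the sample column names are hypothetical.

```python
def build_feature_names(categorical_cols, categories, numeric_cols):
    """Return encoded column names followed by the numeric column names.

    `categories` plays the role of `ohe.categories_` (an assumption for this
    sketch): one sequence of category values per categorical column, in order.
    """
    names = []
    for col, cats in zip(categorical_cols, categories):
        # Skip the first category, mirroring the [1:] slice in the one-liner.
        names.extend(f"{col}_{cat}" for cat in cats[1:])
    return names + list(numeric_cols)


# Hypothetical sample data, not taken from the original question.
cols = build_feature_names(
    ["color", "size"],
    [["blue", "green", "red"], ["L", "M"]],
    ["price", "weight"],
)
print(cols)
```

An explicit loop avoids `sum(..., [])`, which copies the accumulated list on every addition. If `ohe` is a fitted scikit-learn `OneHotEncoder`, its `get_feature_names_out` method may produce these names directly.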
Related questions
Optimize this code: for i in range(14): x_train_YS, y_train_YS = data.iloc[0:418+20*i,list(range(0))+list(range(8,95))],data.iloc[0:418+20*i,95] x_test_YS, y_test_YS = data.iloc[418+20*i:438+20*i,list(range(0))+list(range(8,95))],data.iloc[418+20*i:438+20*i,95] x_train_YS_1,y_train_YS_1 = x_train_YS.iloc[1:418+20*i,1:87].values,y_train_YS.iloc[1:418+20*i,87].values x_test_YS_1,y_test_YS_1 = x_test_YS.iloc[418+20*i:438+20*i,1:87].values,y_test_YS.iloc[418+20*i:438+20*i,87].values
The code can be optimized as follows:
```
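# data is assumed to be a pandas DataFrame with at least 96 columns
cols = list(range(8, 95))    # list(range(0)) is empty, so it adds nothing
cols_1 = list(range(1, 87))  # positions 1..86 within the selected feature columns
train_data = []
test_data = []
for i in range(14):
    train_end = 418 + 20 * i   # the training window grows by 20 rows per iteration
    test_end = train_end + 20  # the test window is the next 20 rows
    x_train_YS, y_train_YS = data.iloc[:train_end, cols], data.iloc[:train_end, 95]
    x_test_YS, y_test_YS = data.iloc[train_end:test_end, cols], data.iloc[train_end:test_end, 95]
    # Skip the first training row, as the original slice 1:418+20*i does.
    # y_train_YS and y_test_YS are Series, so they take a single indexer.
    x_train_YS_1, y_train_YS_1 = x_train_YS.iloc[1:, cols_1].values, y_train_YS.iloc[1:].values
    # The test slices already hold exactly the 20 test rows, so all rows are kept.
    x_test_YS_1, y_test_YS_1 = x_test_YS.iloc[:, cols_1].values, y_test_YS.values
    train_data.append((x_train_YS_1, y_train_YS_1))
    test_data.append((x_test_YS_1, y_test_YS_1))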
```
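Defining the column lists once, outside the loop, avoids rebuilding them on every iteration, and collecting each iteration's arrays in `train_data` and `test_data` makes them easy to use in later steps.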
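The slices `0:418+20*i` and `418+20*i:438+20*i` in the question describe an expanding-window split: the training window always starts at row 0 and grows by 20 rows per iteration, and the test window is the 20 rows that follow it. The sketch below computes only those row boundaries in plain Python. The function name `expanding_window_bounds` and its parameter names are hypothetical; the defaults (14, 418, 20) are taken from the question.

```python
def expanding_window_bounds(n_folds=14, initial_train=418, step=20):
    """Yield (train_end, test_end) row positions for each fold.

    Training rows are [0, train_end); test rows are [train_end, test_end).
    """
    for i in range(n_folds):
        train_end = initial_train + step * i
        yield train_end, train_end + step


bounds = list(expanding_window_bounds())
for train_end, test_end in bounds[:3]:
    print(f"train rows [0, {train_end}), test rows [{train_end}, {test_end})")
```

Keeping the boundary arithmetic in one place makes it easier to check that each test window starts exactly where the training window ends.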
Optimize this code: x_train_YS, y_train_YS = data.iloc[0:418+20*i,list(range(0))+list(range(8,95))],data.iloc[0:418+20*i,95] x_test_YS, y_test_YS = data.iloc[418+20*i:438+20*i,list(range(0))+list(range(8,95))],data.iloc[418+20*i:438+20*i,95] x_train_YS_1,y_train_YS_1 = x_train_YS.iloc[1:418+20*i,1:87].values,y_train_YS.iloc[1:418+20*i,87].values x_test_YS_1,y_test_YS_1 = x_test_YS.iloc[418+20*i:438+20*i,1:87].values,y_test_YS.iloc[418+20*i:438+20*i,87].values
The code can be optimized as follows:
```
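# i is the loop index from the surrounding code; data is assumed to be a pandas DataFrame
train_end = 418 + 20 * i     # end of the training window
test_end = train_end + 20    # the test window is the next 20 rows
cols = list(range(8, 95))    # list(range(0)) is empty, so it adds nothing
cols_1 = list(range(1, 87))  # positions 1..86 within the selected feature columns
x_train_YS = data.iloc[:train_end, cols]
y_train_YS = data.iloc[:train_end, 95]
x_test_YS = data.iloc[train_end:test_end, cols]
y_test_YS = data.iloc[train_end:test_end, 95]
# Skip the first training row, as the original slice 1:418+20*i does.
x_train_YS_1 = x_train_YS.iloc[1:, cols_1].values
# y_train_YS and y_test_YS are Series, so they take a single indexer.
y_train_YS_1 = y_train_YS.iloc[1:].values
x_test_YS_1 = x_test_YS.iloc[:, cols_1].values
y_test_YS_1 = y_test_YS.values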
```
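This avoids repeating the same slice expressions, which makes the code easier to read and maintain. Storing the boundaries and column lists in variables also saves a few redundant computations, although the effect on run time is likely to be small.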