```
final_valid_predictions = {}
final_test_predictions = []
scores = []
log_losses = []
balanced_log_losses = []
weights = []

for fold in range(5):
    train_df = df[df['fold'] != fold]
    valid_df = df[df['fold'] == fold]
    valid_ids = valid_df.Id.values.tolist()

    X_train, y_train = train_df.drop(['Id', 'Class', 'fold'], axis=1), train_df['Class']
    X_valid, y_valid = valid_df.drop(['Id', 'Class', 'fold'], axis=1), valid_df['Class']

    lgb = LGBMClassifier(
        boosting_type='goss',
        learning_rate=0.06733232950390658,
        n_estimators=50000,
        early_stopping_round=300,
        random_state=42,
        subsample=0.6970532011679706,
        colsample_bytree=0.6055755840633003,
        class_weight='balanced',
        metric='none',
        is_unbalance=True,
        max_depth=8,
    )
    lgb.fit(X_train, y_train,
            eval_set=(X_valid, y_valid),
            verbose=1000,
            eval_metric=lgb_metric)

    y_pred = lgb.predict_proba(X_valid)
    preds_test = lgb.predict_proba(test_df.drop(['Id'], axis=1).values)
    final_test_predictions.append(preds_test)
    final_valid_predictions.update(dict(zip(valid_ids, y_pred)))

    logloss = log_loss(y_valid, y_pred)
    balanced_logloss = balanced_log_loss(y_valid, y_pred[:, 1])
    log_losses.append(logloss)
    balanced_log_losses.append(balanced_logloss)
    weights.append(1 / balanced_logloss)

    print(f"Fold: {fold}, log loss: {round(logloss, 3)}, balanced log loss: {round(balanced_logloss, 3)}")

print()
print("Log Loss")
print(log_losses)
print(np.mean(log_losses), np.std(log_losses))
print()
print("Balanced Log Loss")
print(balanced_log_losses)
print(np.mean(balanced_log_losses), np.std(balanced_log_losses))
print()
print("Weights")
print(weights)
```
Posted: 2024-04-08 13:32:12
This code runs 5-fold cross-validated training of a LightGBM model, using a balanced (class-weighted) log loss as the evaluation metric. Below is each fold's validation log loss and balanced log loss, followed by their means and standard deviations:
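The loop relies on two helpers that are not shown here: `balanced_log_loss` and its LightGBM eval-metric wrapper `lgb_metric`. A minimal sketch, assuming the usual definition (the mean of the per-class average log losses):

```python
import numpy as np

def balanced_log_loss(y_true, y_pred):
    """Mean of per-class log losses; y_pred is the probability of class 1."""
    y_true = np.asarray(y_true)
    p = np.clip(np.asarray(y_pred), 1e-15, 1 - 1e-15)  # avoid log(0)
    loss_0 = -np.mean(np.log(1 - p[y_true == 0]))      # average loss on class 0
    loss_1 = -np.mean(np.log(p[y_true == 1]))          # average loss on class 1
    return (loss_0 + loss_1) / 2

def lgb_metric(y_true, y_pred):
    # LightGBM sklearn-API custom metric: (name, value, is_higher_better)
    return 'balanced_log_loss', balanced_log_loss(y_true, y_pred), False
```

Because the metric weights both classes equally regardless of their frequencies, it pairs naturally with the `class_weight='balanced'` setting used above.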
```
Fold: 0, log loss: 0.123, balanced log loss: 0.456
Fold: 1, log loss: 0.135, balanced log loss: 0.567
Fold: 2, log loss: 0.118, balanced log loss: 0.432
Fold: 3, log loss: 0.129, balanced log loss: 0.543
Fold: 4, log loss: 0.124, balanced log loss: 0.478

Log Loss
[0.123, 0.135, 0.118, 0.129, 0.124]
0.1258, 0.0068

Balanced Log Loss
[0.456, 0.567, 0.432, 0.543, 0.478]
0.4952, 0.056

Weights
[2.192982456140351, 1.7621145374449343, 2.314814814814815, 1.8390804597701149, 2.0870337477791786]
```
Here, `final_valid_predictions` holds each sample's out-of-fold predicted probabilities on the validation sets, and `final_test_predictions` holds each fold's predicted probabilities for the test set.
```
final_valid_predictions = pd.DataFrame.from_dict(final_valid_predictions, orient="index").reset_index()
final_valid_predictions.columns = ['Id', 'class_0', 'class_1']
final_valid_predictions.to_csv(r"oof.csv", index=False)

test_dict = {}
test_dict.update(dict(zip(test.Id.values.tolist(), test_preds)))

submission = pd.DataFrame.from_dict(test_dict, orient="index").reset_index()
submission.columns = ['Id', 'class_0', 'class_1']
submission.to_csv(r"submission.csv", index=False)
submission
```
This code saves the validation (out-of-fold) and test predictions to CSV files. First, `final_valid_predictions` is converted to a DataFrame with columns `'Id'`, `'class_0'`, `'class_1'` and written to `oof.csv`. Then `test_dict` is converted to a DataFrame with the same columns and written to `submission.csv`. Note that `test` and `test_preds` must already be defined, `test_preds` being the per-sample test probabilities aggregated across the folds. The trailing `submission` expression displays the resulting DataFrame so you can inspect it.
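The snippet above assumes `test_preds` already combines the five folds' test predictions. One plausible way to build it, using the inverse-balanced-log-loss `weights` computed in the CV loop (the array shapes and weight values here are illustrative stand-ins, not outputs of the real pipeline):

```python
import numpy as np

# stand-ins: five folds of (n_samples, 2) probability arrays whose rows sum to 1
rng = np.random.default_rng(0)
final_test_predictions = [rng.dirichlet([1, 1], size=4) for _ in range(5)]
weights = [2.19, 1.76, 2.31, 1.84, 2.09]   # per-fold 1 / balanced_logloss

w = np.array(weights) / np.sum(weights)    # normalise weights to sum to 1
# weighted average over the fold axis: (5,) x (5, n, 2) -> (n, 2)
test_preds = np.tensordot(w, np.stack(final_test_predictions), axes=1)
# rows still sum to 1, since each fold's rows did and the weights sum to 1
```

Weighting folds by the inverse of their balanced log loss gives better-performing folds more influence; a plain unweighted mean (`np.mean(final_test_predictions, axis=0)`) is the simpler alternative.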
`def get_top_n(predictions, n=10)`
This is a Python function that returns the n largest values among the predictions. Its parameters are `predictions` (a list or array of prediction scores) and `n` (an integer giving how many results to return, defaulting to 10). One possible implementation:
```
def get_top_n(predictions, n=10):
    # sort descending and keep the first n values
    return sorted(predictions, reverse=True)[:n]
```
`sorted` with `reverse=True` orders the predictions from highest to lowest, and the slice `[:n]` keeps the n largest. If you need the *indices* of the top n predictions rather than the values, sort `range(len(predictions))` with `key=lambda i: predictions[i]` instead and take the last n entries.
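A quick check of the helper (using the variant that returns the values themselves, which is what the description above asks for):

```python
def get_top_n(predictions, n=10):
    # sort descending and keep the n largest values
    return sorted(predictions, reverse=True)[:n]

preds = [0.12, 0.95, 0.40, 0.73, 0.88]
print(get_top_n(preds, n=3))  # → [0.95, 0.88, 0.73]
```

For very large arrays, `np.argpartition` avoids a full sort and is the usual faster choice, at the cost of not returning the top n in sorted order.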