school_df = school_df.dropna(thresh=len(school_df)*0.9, axis=1)
Time: 2024-05-19 18:15:58 Views: 139
This line of code drops every column of 'school_df' in which more than 10% of the rows are missing. The parameter 'thresh' sets the minimum number of non-null values a column must have to be retained, here 90% of the row count, and 'axis=1' tells dropna to evaluate and drop columns rather than rows. A column with exactly 10% missing values still meets the threshold and is kept. The result is assigned back to 'school_df', which cleans the dataset by removing sparsely populated columns.
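The threshold behavior described above can be checked on a small example. This is a minimal sketch with made-up data: the column names and values are hypothetical, and rounding the threshold up with math.ceil is an assumption (pandas documents 'thresh' as an integer), not part of the original line.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: 10 rows, three columns with 0, 1, and 2 missing values.
school_df = pd.DataFrame({
    "full": range(10),
    "one_missing": [np.nan] + list(range(9)),
    "two_missing": [np.nan, np.nan] + list(range(8)),
})

# Minimum number of non-null values a column needs in order to be kept.
min_non_null = math.ceil(len(school_df) * 0.9)  # 9 for 10 rows

# axis=1: evaluate each column and drop those below the threshold.
cleaned = school_df.dropna(thresh=min_non_null, axis=1)
kept_columns = list(cleaned.columns)
print(kept_columns)
```

Here 'two_missing' has only 8 non-null values and is dropped, while 'one_missing' sits exactly on the 10% boundary and survives.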
Related questions
    param = {'num_leaves': 31, 'min_data_in_leaf': 20, 'objective': 'binary', 'learning_rate': 0.06, "boosting": "gbdt", "metric": 'None', "verbosity": -1}
    trn_data = lgb.Dataset(trn, trn_label)
    val_data = lgb.Dataset(val, val_label)
    num_round = 666
    # clf = lgb.train(param, trn_data, num_round, valid_sets=[trn_data, val_data], verbose_eval=100,
    #                 early_stopping_rounds=300, feval=win_score_eval)
    clf = lgb.train(param, trn_data, num_round)
    # oof_lgb = clf.predict(val, num_iteration=clf.best_iteration)
    test_lgb = clf.predict(test, num_iteration=clf.best_iteration)

    thresh_hold = 0.5
    oof_test_final = test_lgb >= thresh_hold
    print(metrics.accuracy_score(test_label, oof_test_final))
    print(metrics.confusion_matrix(test_label, oof_test_final))
    tp = np.sum(((oof_test_final == 1) & (test_label == 1)))
    pp = np.sum(oof_test_final == 1)
    print('accuracy1:%.3f'% (tp/(pp)))

    test_postive_idx = np.argwhere(oof_test_final == True).reshape(-1)
    # test_postive_idx = list(range(len(oof_test_final)))
    test_all_idx = np.argwhere(np.array(test_data_idx)).reshape(-1)
    stock_info['trade_date_id'] = stock_info['trade_date'].map(date_map)
    stock_info['trade_date_id'] = stock_info['trade_date_id'] + 1

    tmp_col = ['ts_code', 'trade_date', 'trade_date_id', 'open', 'high', 'low', 'close', 'ma5', 'ma13', 'ma21', 'label_final', 'name']
    stock_info.iloc[test_all_idx[test_postive_idx]]
    tmp_df = stock_info[tmp_col].iloc[test_all_idx[test_postive_idx]].reset_index()
    tmp_df['label_prob'] = test_lgb[test_postive_idx]
    tmp_df['is_limit_up'] = tmp_df['close'] == tmp_df['high']
    buy_df = tmp_df[(tmp_df['is_limit_up']==False)].reset_index()
    buy_df.drop(['index', 'level_0'], axis=1, inplace=True)

    buy_df['buy_flag'] = 1
    stock_info_copy['sell_flag'] = 0

    tmp_idx = (index_df['trade_date'] == test_date_min+1)
    close1 = index_df[tmp_idx]['close'].values[0]
    test_date_max = 20220829
    tmp_idx = (index_df['trade_date'] == test_date_max)
    close2 = index_df[tmp_idx]['close'].values[0]

    tmp_idx = (stock_info_copy['trade_date'] >= test_date_min) & (stock_info_copy['trade_date'] <= test_date_max)
    tmp_df = stock_info_copy[tmp_idx].reset_index(drop=True)

    from importlib import reload  # the imp module was removed in Python 3.12
    import Account
    reload(Account)
    money_init = 200000
    account = Account.Account(money_init, max_hold_period=20, stop_loss_rate=-0.07, stop_profit_rate=0.12)
    account.BackTest(buy_df, tmp_df, index_df, buy_price='open')

    tmp_df2 = buy_df[['ts_code', 'trade_date', 'label_prob', 'label_final']]
    tmp_df2 = tmp_df2.rename(columns={'trade_date':'buy_date'})
    tmp_df = account.info
    tmp_df['buy_date'] = tmp_df['buy_date'].apply(lambda x: int(x))
    tmp_df = tmp_df.merge(tmp_df2, on=['ts_code', 'buy_date'], how='left')

What is the final tmp_df? And which stocks does tmp_df[tmp_df['label_final']==1] select?
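Going through the code line by line, the final tmp_df is a DataFrame of backtest results joined with the model's buy signals. buy_df holds the test-set rows that the model predicted as positive (test_lgb >= 0.5), excluding rows where close equals high, which the code treats as limit-up on that day. account.BackTest(buy_df, tmp_df, index_df, buy_price='open') then runs the backtest on those signals, taking the stock_info_copy rows between test_date_min and test_date_max and index_df as inputs; nothing is merged at this step. Afterwards tmp_df is reassigned to account.info, presumably the trade records produced by the backtest (the Account class is not shown), and left-merged with the label_prob and label_final columns of buy_df on ts_code and buy_date. Every backtest record is therefore kept, with the predicted probability and the label of its buy signal attached.
tmp_df[tmp_df['label_final']==1] selects the records whose label_final is 1. label_final is a column of stock_info rather than a model output, so it appears to be the ground-truth label; the model's own output is label_prob. Under that reading, the filter picks the traded stocks that the model predicted to rise and that were in fact labeled as rising, that is, the correct predictions among the trades.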
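The final merge and filter can be illustrated on toy data. This is only a sketch: the Account class is not shown in the question, so the 'trades' frame below is a hypothetical stand-in for account.info, and the 'profit_rate' column and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for account.info: one row per trade the backtest executed.
trades = pd.DataFrame({
    "ts_code": ["000001.SZ", "600000.SH"],
    "buy_date": [20220801.0, 20220802.0],  # floats, mirroring the int() cast in the question
    "profit_rate": [0.12, -0.07],          # invented column
})

# Hypothetical stand-in for buy_df: one row per buy signal from the model.
signals = pd.DataFrame({
    "ts_code": ["000001.SZ", "600000.SH", "000002.SZ"],
    "trade_date": [20220801, 20220802, 20220803],
    "label_prob": [0.81, 0.64, 0.55],
    "label_final": [1, 0, 1],
})

signals = signals.rename(columns={"trade_date": "buy_date"})
trades["buy_date"] = trades["buy_date"].astype(int)

# Left merge: keep every executed trade and attach the signal's probability and label.
result = trades.merge(signals, on=["ts_code", "buy_date"], how="left")

# Trades whose label_final is 1.
hits = result[result["label_final"] == 1]
print(result)
print(hits["ts_code"].tolist())
```

Because the merge is a left join on the trade records, the signal for 000002.SZ, which was never traded in this toy data, does not appear in the result, and the filter keeps only the traded row labeled 1.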
vm.swappiness=0
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=6144
net.ipv4.neigh.default.gc_thresh3=8192
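These are common Linux kernel parameters (sysctl settings) that tune memory and networking behavior. Their meanings are as follows:
- vm.swappiness=0: controls how strongly the kernel prefers swapping out process memory over reclaiming page cache. Setting it to 0 tells the kernel to avoid swapping for as long as possible.
- net.ipv4.ip_forward=1: enables IPv4 packet forwarding, allowing the Linux system to forward IP packets between interfaces like a router.
- net.bridge.bridge-nf-call-iptables=1: passes IPv4 traffic that crosses a Linux bridge through iptables, so that iptables rules can filter bridged packets. The setting is available only while the br_netfilter kernel module is loaded.
- net.bridge.bridge-nf-call-ip6tables=1: does the same for bridged IPv6 traffic and ip6tables.
- net.ipv4.neigh.default.gc_thresh1=4096: the minimum number of entries to keep in the IPv4 neighbor (ARP) cache. The garbage collector does not run while the cache holds fewer entries than this.
- net.ipv4.neigh.default.gc_thresh2=6144: the soft maximum. The cache may exceed this number of entries for about 5 seconds before garbage collection is performed.
- net.ipv4.neigh.default.gc_thresh3=8192: the hard maximum. The garbage collector always runs when the cache holds more entries than this.
The right values depend on the system's workload and network environment. Make sure you understand what each parameter does and its possible side effects before changing it.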
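As a sanity check before applying such settings, the three gc_thresh values should increase from minimum to soft limit to hard limit. The sketch below parses the settings from a string and verifies that ordering; parse_sysctl is a hypothetical helper written for this example, not part of any library, and it only reads text without touching the running kernel.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines into a dict, skipping blank lines and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


CONFIG = """
vm.swappiness=0
net.ipv4.ip_forward=1
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.neigh.default.gc_thresh1=4096
net.ipv4.neigh.default.gc_thresh2=6144
net.ipv4.neigh.default.gc_thresh3=8192
"""

settings = parse_sysctl(CONFIG)
prefix = "net.ipv4.neigh.default.gc_thresh"
thresholds = [int(settings[prefix + str(i)]) for i in (1, 2, 3)]

# The thresholds should be strictly increasing: minimum, soft limit, hard limit.
thresholds_ordered = thresholds[0] < thresholds[1] < thresholds[2]
print(thresholds, thresholds_ordered)
```

The parser tolerates spaces around the equals sign, so the inconsistently spaced ip6tables line from the question is read the same way as the others.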