In the code `data = pd.read_csv('data.csv'); train, test = train_test_split(data, test_size=0.3, random_state=42); oot = data.loc[data['month'] == '202107']; psi = data.loc[data['month'] == '202106']`, the training set includes the oot and psi samples. Is that the mistake?
Posted: 2023-06-01 18:02:56 Views: 158
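Yes. `train_test_split` draws the training set at random from all rows of `data`, so it includes rows from the oot month (202107) and the psi month (202106), which is a mistake. The oot and psi samples should be held out as the test set and validation set respectively, and the training set should contain only the remaining data. The original code: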
```
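import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('data.csv')
# Random split over ALL rows: train ends up containing 202106 and 202107 rows
train, test = train_test_split(data, test_size=0.3, random_state=42)
oot = data.loc[data['month'] == '202107']
psi = data.loc[data['month'] == '202106']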
```
Change it to:
```
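import pandas as pd

# Read month as a string so comparisons with '202107' match even if the CSV stores it as a number
data = pd.read_csv('data.csv', dtype={'month': str})
oot = data.loc[data['month'] == '202107']  # out-of-time sample, used as the test set
psi = data.loc[data['month'] == '202106']  # psi sample, used as the validation set
# Train only on the remaining months so no oot or psi rows leak in
train = data.loc[~data['month'].isin(['202106', '202107'])]
test = oot  # the test set is the same rows as oot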
```
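With this change, the oot and psi samples serve as the test set and validation set respectively, and the training set contains only the remaining data.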
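The difference between the two approaches can be checked on a small example. The following is a minimal sketch, not the asker's actual pipeline: the DataFrame is synthetic (the month values other than 202106 and 202107, the column `x`, and the names `holdout_months`, `leaky_train`, and `leaked` are assumptions made for illustration). It also shows one possible variant in which the remaining months are still split at random into train and test after the oot and psi months are carved out.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data.csv (hypothetical values): 5 months, 20 rows each
months = ['202103', '202104', '202105', '202106', '202107']
data = pd.DataFrame({
    'month': [m for m in months for _ in range(20)],
    'x': range(100),
})

holdout_months = ['202106', '202107']

# Original approach: random split over all rows, so holdout months leak into train
leaky_train, _ = train_test_split(data, test_size=0.3, random_state=42)
leaked = int(leaky_train['month'].isin(holdout_months).sum())

# Time-based approach: carve out the holdout months first, then split the rest
is_holdout = data['month'].isin(holdout_months)
oot = data.loc[data['month'] == '202107']
psi = data.loc[data['month'] == '202106']
train, test = train_test_split(data.loc[~is_holdout], test_size=0.3, random_state=42)
clean = int(train['month'].isin(holdout_months).sum())

print(f"holdout rows in random-split train: {leaked}")
print(f"holdout rows in time-based train:   {clean}")
```

In the random split the training set necessarily contains some 202106 or 202107 rows, while in the time-based split it contains none. Whether to keep an in-time random test set alongside oot, as done here, is a design choice that the original answer does not make.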