How can I split data with train_test_split in Python without importing the sklearn library?
In Python, if you want to split a dataset into training and test sets without depending on the `sklearn` library, you can use the built-in `random` module. The `train_test_split` function normally comes from `sklearn.model_selection`, but we can implement similar functionality by hand. Here is a simple example:
```python
import random

def train_test_split(data, targets, test_size=0.2, random_state=None):
    """
    Split a dataset into a training set and a test set.

    Parameters:
        data (list or array): input feature data
        targets (list or array): target variable or labels
        test_size (float, optional): proportion of the test set, default 0.2
        random_state (int, optional): random seed for reproducible results, default None

    Returns:
        tuple: (train_data, train_labels, test_data, test_labels)
    """
    assert len(data) == len(targets), "data and targets must have the same length"
    # Shuffle the indices so the split is random
    indices = list(range(len(data)))
    random.Random(random_state).shuffle(indices)
    # Everything before the split point goes to training, the rest to test
    split_index = int(len(indices) * (1 - test_size))
    train_indices = indices[:split_index]
    test_indices = indices[split_index:]
    train_data = [data[i] for i in train_indices]
    train_labels = [targets[i] for i in train_indices]
    test_data = [data[i] for i in test_indices]
    test_labels = [targets[i] for i in test_indices]
    return train_data, train_labels, test_data, test_labels

# Example
data = ...    # your list of feature data
labels = ...  # your list of labels
train_data, train_labels, test_data, test_labels = train_test_split(data, labels)
```
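As a quick sanity check, here is a hedged usage sketch with made-up toy data (the feature values and labels below are purely illustrative, not from the original post):

```python
# Toy data for illustration only -- replace with your own dataset
data = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 2.2], [5.0, 4.1],
        [6.0, 3.3], [7.0, 5.0], [8.0, 4.4], [9.0, 6.1], [10.0, 5.5]]
labels = [0, 0, 0, 1, 1, 0, 1, 1, 1, 0]

train_data, train_labels, test_data, test_labels = train_test_split(
    data, labels, test_size=0.3, random_state=42)

print(len(train_data), len(test_data))  # 7 3
```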
In this example, we first generate a shuffled list of indices and then split it into training and test indices according to the given test-set proportion. Note that this custom version lacks advanced features such as the `stratify` option for preserving class balance; if you need those, the `sklearn` library is still recommended.
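If you do need rough class balance without pulling in `sklearn`, one possible approach is to shuffle and split the indices of each class separately and then merge them. The sketch below is only a minimal illustration of that idea (the function name `stratified_train_test_split` is my own, not a standard API) and assumes the labels are hashable:

```python
import random
from collections import defaultdict

def stratified_train_test_split(data, targets, test_size=0.2, random_state=None):
    """Split while keeping roughly the same label proportions in both sets."""
    rng = random.Random(random_state)
    # Group sample indices by their label
    by_label = defaultdict(list)
    for i, label in enumerate(targets):
        by_label[label].append(i)

    train_indices, test_indices = [], []
    for label, indices in by_label.items():
        # Split each class independently so both sets keep its proportion
        rng.shuffle(indices)
        split_index = int(len(indices) * (1 - test_size))
        train_indices.extend(indices[:split_index])
        test_indices.extend(indices[split_index:])

    # Shuffle again so samples of the same class are not grouped together
    rng.shuffle(train_indices)
    rng.shuffle(test_indices)
    return ([data[i] for i in train_indices], [targets[i] for i in train_indices],
            [data[i] for i in test_indices], [targets[i] for i in test_indices])
```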