Splitting a very large CSV file with Python
Date: 2024-01-10 17:40:03
A very large CSV file can be split into smaller files with a short Python script. The following example shows one way to do it:
```python
import pandas as pd
from pathlib import Path

res_file_path = Path(r"UserBehavior.csv")  # path of the file to split
split_size = 1000000  # maximum number of rows per output file
base_name = res_file_path.name.split(".")[0]
tar_dir = res_file_path.parent / ("split_" + base_name)
if not tar_dir.exists():
    tar_dir.mkdir()
    print("Created directory:\t" + str(tar_dir))
print("Target directory:\t" + str(tar_dir))
print("File to split:\t" + str(res_file_path))
print("Rows per file:\t" + "{:,}".format(split_size))

# Read a few rows only to obtain the column names.
tmp = pd.read_csv(res_file_path, nrows=10)
columns = tmp.columns.to_list()
idx = 0
while True:
    # Skip the header line plus every data row already written.
    start = 1 + (idx * split_size)
    try:
        tmp = pd.read_csv(res_file_path, header=None, names=columns,
                          skiprows=start, nrows=split_size)
    except pd.errors.EmptyDataError:
        break  # guard in case pandas raises when reading past the end of the file
    if len(tmp) <= 0:
        break
    file_name = base_name + "_{}_{}".format(start, start + len(tmp)) + ".csv"
    file_path = tar_dir / file_name
    tmp.to_csv(file_path, index=False)
    idx += 1
    print(file_name + "\tsaved")
```
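The code above splits a very large CSV file into multiple smaller files, each containing at most `split_size` rows. To use it, replace `res_file_path` with the path to your CSV file and adjust `split_size` as needed. The output files are saved in a folder named "split_" plus the original file name (without its extension), located in the same directory as the original file. Note that every pass re-scans the file from the beginning in order to skip the rows already written, so the run time grows with the number of output files. [1][2][3]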
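For comparison, here is a minimal single-pass sketch that uses only the standard-library `csv` module instead of pandas. It is not the method from the answer above, just an alternative under stated assumptions: the function name `split_csv`, the `_partNNNN` file naming, and the UTF-8 encoding are all hypothetical choices made for illustration, and the first line of the input is assumed to be the header.

```python
import csv
from itertools import islice
from pathlib import Path


def split_csv(src, rows_per_file, out_dir=None):
    """Split src into part files holding at most rows_per_file data rows each.

    Returns the list of part-file paths that were written.
    """
    src = Path(src)
    # Assumption: mirror the "split_<name>" folder convention used above.
    out_dir = Path(out_dir) if out_dir else src.parent / ("split_" + src.stem)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Assumption: the file is UTF-8 encoded; change encoding if it is not.
    with src.open(newline="", encoding="utf-8") as fin:
        reader = csv.reader(fin)
        header = next(reader, None)
        if header is None:  # completely empty file, nothing to split
            return written
        part = 0
        while True:
            # Pull the next batch of rows; the reader keeps its position,
            # so the input is read only once from start to finish.
            rows = list(islice(reader, rows_per_file))
            if not rows:
                break
            part_path = out_dir / f"{src.stem}_part{part:04d}.csv"
            with part_path.open("w", newline="", encoding="utf-8") as fout:
                writer = csv.writer(fout)
                writer.writerow(header)  # repeat the header in every part
                writer.writerows(rows)
            written.append(part_path)
            part += 1
    return written
```

Calling `split_csv("UserBehavior.csv", 1_000_000)` would write the parts into a `split_UserBehavior` folder next to the input. Only one batch of `rows_per_file` rows is held in memory at a time, and values are copied as text rather than parsed into typed columns, which may or may not be what you want.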
#### References
- *1* [Notes: splitting a very large CSV file with Python and extracting random data](https://blog.csdn.net/yanliar/article/details/128110499)
- *2* *3* [Splitting large CSV files with Python (using NGSIM data as an example)](https://blog.csdn.net/Mrcomj/article/details/122433104)