Complete Python code: uploading a 17 GB file to S3 with boto3's upload_part
When handling a large file (such as 17 GB) with the Boto3 library in Python, the file cannot be loaded into memory in one go; instead, S3's multipart upload feature should be used to send the file in fixed-size parts. The following is a complete example showing how to upload a large file with `boto3`:
```python
import math
import os
import time

import boto3
from botocore.exceptions import ClientError


def upload_large_file_to_s3(file_path, bucket_name, object_key,
                            part_size=8 * 1024 * 1024, max_attempts=5):
    s3 = boto3.client('s3')
    file_size = os.path.getsize(file_path)
    part_count = math.ceil(file_size / part_size)

    # Initiate the multipart upload
    mpu = s3.create_multipart_upload(Bucket=bucket_name, Key=object_key)
    upload_id = mpu['UploadId']

    uploaded_parts = []
    start_time = time.time()

    try:
        with open(file_path, 'rb') as f:
            for part_number in range(1, part_count + 1):
                # Seek to the offset of the current part and read at most part_size bytes
                f.seek((part_number - 1) * part_size)
                data = f.read(part_size)

                # Retry each part up to max_attempts times
                for attempt in range(1, max_attempts + 1):
                    try:
                        response = s3.upload_part(
                            Bucket=bucket_name,
                            Key=object_key,
                            UploadId=upload_id,
                            PartNumber=part_number,
                            Body=data
                        )
                        uploaded_parts.append({'PartNumber': part_number, 'ETag': response['ETag']})
                        break
                    except ClientError as e:
                        print(f"Error uploading part {part_number}/{part_count} "
                              f"(attempt {attempt}/{max_attempts}): {e}")
                        if attempt == max_attempts:
                            raise
                        print("Retrying after 5 seconds...")
                        time.sleep(5)

        # Complete the multipart upload
        s3.complete_multipart_upload(
            Bucket=bucket_name,
            Key=object_key,
            UploadId=upload_id,
            MultipartUpload={'Parts': uploaded_parts}
        )
    except Exception:
        # Abort the multipart upload on failure so the already-uploaded parts
        # do not keep occupying storage
        s3.abort_multipart_upload(Bucket=bucket_name, Key=object_key, UploadId=upload_id)
        raise

    end_time = time.time()
    print(f"File '{file_path}' uploaded successfully. Total time taken: {end_time - start_time:.1f} seconds.")


# Example call: the local file is big_file_17gb.txt, the bucket is your_bucket, the object key is file_key
upload_large_file_to_s3('big_file_17gb.txt', 'your_bucket', 'file_key')
```
In this code each part is 8 MB (S3 requires every part except the last to be at least 5 MB) and each part is retried up to 5 times; if a part still fails after that, the multipart upload is aborted and the exception is re-raised. Remember to substitute your actual file path, S3 bucket name, and object key.
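If the process is killed before `complete_multipart_upload` (or `abort_multipart_upload`) runs, the parts that were already uploaded stay in the bucket and continue to incur storage charges. A minimal cleanup sketch, assuming the same hypothetical bucket name `your_bucket`:

```python
import boto3

s3 = boto3.client('s3')

# List multipart uploads that were started but never completed or aborted
response = s3.list_multipart_uploads(Bucket='your_bucket')
for upload in response.get('Uploads', []):
    print(f"Aborting incomplete upload for {upload['Key']} ({upload['UploadId']})")
    s3.abort_multipart_upload(
        Bucket='your_bucket',
        Key=upload['Key'],
        UploadId=upload['UploadId']
    )
```

In practice, a bucket lifecycle rule with `AbortIncompleteMultipartUpload` is often configured to do this cleanup automatically.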
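As an alternative to driving `upload_part` by hand, boto3's high-level transfer manager can perform the multipart upload (including chunking, threading, and retries) automatically. A short sketch, assuming the same hypothetical file, bucket, and key names as above:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# Upload in 8 MB parts, up to 10 parts in parallel
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    multipart_chunksize=8 * 1024 * 1024,
    max_concurrency=10,
    use_threads=True
)

# upload_file streams the file from disk, so the 17 GB file is never held in memory at once
s3.upload_file('big_file_17gb.txt', 'your_bucket', 'file_key', Config=config)
```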