Reading PostgreSQL data in time-based batches with Python, then grouping and sorting the combined result

Date: 2023-05-21 22:04:30 Views: 282

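You can connect to the PostgreSQL database with Python's psycopg2 library, filter rows by time in the SQL query and read them in batches, then use pandas to group and sort the combined result. The query values are passed as parameters to `cursor.execute` rather than formatted into the SQL string, so psycopg2 handles the quoting. Sorting on both columns leaves the rows grouped by `your_column` and ordered by `your_column2` within each group. For example:

```python
import psycopg2
import pandas as pd

# Connection parameters (placeholders, replace with your own)
conn_params = {
    'host': 'your_host',
    'port': 5432,
    'database': 'your_database',
    'user': 'your_username',
    'password': 'your_password',
}

# Parameterized query: psycopg2 quotes the values itself,
# so the %s placeholders are not wrapped in quotes
sql = """
    SELECT *
    FROM your_table
    WHERE created_at BETWEEN %s AND %s  -- filter by time
    ORDER BY id ASC                     -- stable order for paging
    LIMIT %s OFFSET %s                  -- paged read
"""

batch_size = 1000  # number of rows per batch
offset = 0         # starting position of the current page

# Read the data batch by batch
df_list = []
conn = psycopg2.connect(**conn_params)
try:
    with conn.cursor() as cursor:
        while True:
            cursor.execute(sql, ('2022-01-01', '2022-01-31', batch_size, offset))
            rows = cursor.fetchall()
            if not rows:
                break
            # Convert the batch to a DataFrame using the cursor's column names
            columns = [desc[0] for desc in cursor.description]
            df_list.append(pd.DataFrame(rows, columns=columns))
            offset += batch_size
finally:
    conn.close()

# Combine all batches, then order rows by group and within each group
if df_list:
    result = (
        pd.concat(df_list, ignore_index=True)
          .sort_values(['your_column', 'your_column2'])
    )
else:
    result = pd.DataFrame()
```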
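The question asks for batching by time, while the query above pages through one fixed date range with `LIMIT`/`OFFSET`. One possible alternative is to split the range into consecutive time windows and run one query per window. The sketch below only generates the windows and uses the standard library alone; the helper name `time_windows`, the 7-day step, and the `WINDOW_SQL` text are illustrative assumptions, not part of the original answer. It uses half-open windows (`>= start AND < end`) so that no row falls into two batches. Note also that if `created_at` is a timestamp column, `BETWEEN '2022-01-01' AND '2022-01-31'` stops at midnight on January 31, which a half-open upper bound of February 1 avoids.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step):
    """Yield half-open (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current = start
    while current < end:
        # Clamp the last window so it never runs past the overall end
        upper = min(current + step, end)
        yield current, upper
        current = upper


# Hypothetical per-window query; table and column names are placeholders
WINDOW_SQL = """
    SELECT *
    FROM your_table
    WHERE created_at >= %s AND created_at < %s
    ORDER BY id ASC
"""

# Split January 2022 into 7-day windows (the step size is an assumption)
windows = list(
    time_windows(datetime(2022, 1, 1), datetime(2022, 2, 1), timedelta(days=7))
)
for lo, hi in windows:
    print(lo.date(), "->", hi.date())
```

Each `(lo, hi)` pair could then be passed as the parameters of `cursor.execute(WINDOW_SQL, (lo, hi))`, since psycopg2 adapts Python `datetime` values, with the resulting DataFrames collected and combined as in the answer's code.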
