How to write pandas data to Hive

PyHive can be used to write pandas data to Hive. Note that `DataFrame.to_sql` only accepts a SQLAlchemy connectable (or a `sqlite3` connection), so a raw `hive.Connection` cannot be passed as `con`. The example below therefore goes through the `hive://` SQLAlchemy dialect that PyHive registers, which requires SQLAlchemy to be installed alongside PyHive:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Connection details are placeholders; adjust host, port, user and database.
# The hive:// URL scheme is provided by PyHive's SQLAlchemy dialect.
engine = create_engine('hive://hiveuser@localhost:10000/mydatabase')

# Create the target table if it does not exist yet
create_table = """
CREATE TABLE IF NOT EXISTS mytable (
    col1 STRING,
    col2 INT
)
"""
with engine.connect() as conn:
    conn.execute(text(create_table))

# Append the DataFrame rows to the table; method='multi' sends
# several rows per INSERT statement instead of one statement per row
df = pd.DataFrame({'col1': ['foo', 'bar'], 'col2': [1, 2]})
df.to_sql(name='mytable', con=engine, if_exists='append', index=False, method='multi')

# Release the connections held by the engine
engine.dispose()
```

The connection details must be adjusted to your environment. Hive session settings (`SET hive....`) are not required for a plain append to an unpartitioned table like this one. `to_sql` has no `partition_by` parameter; to fill a partitioned table, write the DataFrame to a staging table first and then run an `INSERT ... PARTITION (...) SELECT ...` statement in Hive, with `hive.exec.dynamic.partition.mode=nonstrict` set if the partition values are determined dynamically.
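For a partitioned table, one common pattern is to load the DataFrame into a non-partitioned staging table and then let Hive move the rows with a dynamic-partition insert. The following is a minimal sketch of a helper that builds such a statement. The function name `build_partition_insert`, the table names `mytable_partitioned` and `mytable_staging`, and the partition column `dt` are all hypothetical and chosen only for illustration.

```python
def build_partition_insert(target, staging, columns, partition_cols):
    """Return HiveQL that copies rows from a staging table into a partitioned table.

    Hive expects dynamic partition columns to come last in the SELECT list,
    in the same order as in the PARTITION clause.
    """
    if not partition_cols:
        raise ValueError("at least one partition column is required")
    select_list = ", ".join(list(columns) + list(partition_cols))
    partition_spec = ", ".join(partition_cols)
    return (
        f"INSERT INTO TABLE {target} PARTITION ({partition_spec}) "
        f"SELECT {select_list} FROM {staging}"
    )


# Hypothetical table and column names, for illustration only
statement = build_partition_insert(
    target="mytable_partitioned",
    staging="mytable_staging",
    columns=["col1", "col2"],
    partition_cols=["dt"],
)
print(statement)
```

The resulting string would typically be executed through a Hive cursor or SQLAlchemy connection after `SET hive.exec.dynamic.partition.mode=nonstrict`. Because the identifiers are interpolated directly into the statement, this sketch should only be used with trusted table and column names.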
