python连接hdfs和hive,将hdfs中的csv文件导入hive的数据表中
时间: 2023-10-24 19:25:09 浏览: 175
可以使用PyHive库连接Hive和HDFS,并使用HiveQL语句将HDFS中的CSV文件导入Hive数据表中。下面是一个示例代码:
```python
from pyhive import hive
import pandas as pd
# 连接Hive
conn = hive.connect(host='localhost', port=10000, username='username')
# 创建Hive表
create_table_qry = """
CREATE TABLE IF NOT EXISTS my_table (
col1 STRING,
col2 INT,
col3 FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
"""
with conn.cursor() as cur:
cur.execute(create_table_qry)
# 从HDFS中导入数据到Hive表
load_data_qry = """
LOAD DATA INPATH '/path/to/csv/file.csv' INTO TABLE my_table
"""
with conn.cursor() as cur:
cur.execute(load_data_qry)
# 查询导入的数据
select_qry = """
SELECT * FROM my_table
"""
df = pd.read_sql(select_qry, conn)
print(df.head())
# 关闭连接
conn.close()
```
其中,需要将`host`和`username`参数替换为实际的Hive主机名和用户名,将`/path/to/csv/file.csv`替换为实际的HDFS中CSV文件的路径。在上述代码中,使用Pandas库读取Hive数据表中的数据并打印前5行。
阅读全文