partitioned BY (pdate STRING) stored AS parquet
时间: 2024-10-13 10:12:53 浏览: 47
`partitioned BY (pdate STRING)` 是一种数据库表设计模式,它用于Apache Parquet文件格式中,特别是Hive SQL或Spark SQL这样的大数据处理框架中。`partitioned`关键字表示这个表是分区的,`BY`后面跟着的是分区键(如 `pdate STRING`),这是一种日期类型的列,用于将大表的数据按照不同的日期分成小的逻辑部分,每个分区都是独立可读的。
`stored AS parquet` 指明这些分区数据是以Parquet格式存储的。Parquet是一种高效的列式存储格式,它允许对数据进行压缩并支持列级别的选择性读取,这对于大规模数据分析非常有利。
具体操作示例如下:
```sql
CREATE TABLE sales (
id INT,
product STRING,
amount DECIMAL(10,2),
pdate STRING)
PARTITIONED BY (pdate STRING)
STORED AS PARQUET;
```
在这个例子中,`sales` 表会被按照`pdate`字段分区,这意味着你可以单独查询特定日期范围内的销售数据,而无需扫描整个表。
相关问题
create table if not exists MODEL_BACK.DCV_TB_LXY_ADXY_0(`ad_s rv_id` DOUBLE,`ad_acc_nbr` STRING,`acc_nbr2` STRING,`end_date` STRING,`fee_xy` STRING,`updatetime` STRING) partitioned by (updatetime string) stored as parquet impala的这个建表语句报Duplicate column name: updatetime这个错误,这是什么原因
Impala的建表语句中,分区字段是表数据的逻辑划分,不能和表的列名重复。在你的建表语句中,updatetime既是列名,又是分区字段,导致了该错误的发生。解决方法是将分区字段的名称修改,例如将分区字段改为partition_date,建表语句可以修改为:
```
create table if not exists MODEL_BACK.DCV_TB_LXY_ADXY_0(
`ad_s rv_id` DOUBLE,
`ad_acc_nbr` STRING,
`acc_nbr2` STRING,
`end_date` STRING,
`fee_xy` STRING,
`updatetime` STRING
) partitioned by (partition_date string) stored as parquet;
```
需要注意的是,修改分区字段名称后,查询该表时需要使用新的分区字段名进行查询。
以下HOLODESK建表语句正确的有( ) 2分 CREATE FACT TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (TRANS_DATE) CLUSTERED BY (ID) STORED AS HOLODESK; CREATE FACT TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (PARTITION_DATE DATE) CLUSTERED BY (ID) STORED AS HOLODESK; CREATE TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (PARTITION_DATE DATE) CLUSTERED BY (ID) INTO 3 BUCKETS STORED AS HOLODESK; CREATE TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (PARTITION_DATE DATE) CLUSTERED BY (ID) STORED AS HOLODESK;
以下HOLODESK建表语句正确的有:
1. CREATE FACT TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (TRANS_DATE) CLUSTERED BY (ID) STORED AS HOLODESK;
2. CREATE TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (PARTITION_DATE DATE) CLUSTERED BY (ID) STORED AS HOLODESK;
第一条语句是创建一个事实表WECHAT_HISTORY,使用TRANS_DATE进行分区,使用ID进行聚集,并将数据存储为HOLODESK格式。
第二条语句是创建一个表WECHAT_HISTORY,使用PARTITION_DATE进行分区,使用ID进行聚集,并将数据存储为HOLODESK格式。
第三条语句中的BUCKETS是无效的关键字,不符合HOLODESK建表语法规范,因此是错误的。
第四条语句只缺少了FACT关键字,应该是CREATE FACT TABLE WECHAT_HISTORY,其他部分是正确的。
所以,正确的建表语句有两个:
1. CREATE FACT TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (TRANS_DATE) CLUSTERED BY (ID) STORED AS HOLODESK;
2. CREATE TABLE WECHAT_HISTORY( ID STRING, STORE STRING, TRANS_NO STRING, TRANS_DATE DATE, TRANS_TYPE STRING) PARTITIONED BY (PARTITION_DATE DATE) CLUSTERED BY (ID) STORED AS HOLODESK;
阅读全文