Two ways to implement case...when...then... in Spark
Date: 2023-05-02 12:01:07
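In Spark SQL queries, a CASE expression provides if-then-else logic. The expression has four parts:

- It starts with the CASE keyword.
- Each WHEN is followed by a condition, and THEN gives the value returned when that condition is true.
- ELSE gives the value returned when no condition matches. Without ELSE, the result is NULL.
- END closes the expression.

There are two ways to write it:

- The SQL form: `CASE WHEN ... THEN ... ELSE ... END`.
- The DataFrame API form: `when(condition, value).otherwise(value)`, using the `when` function from `org.apache.spark.sql.functions` (`pyspark.sql.functions` in Python). The DataFrame API has no `.then()` method; the result value is passed as the second argument of `when`.

Both forms produce the same if-then-else result. Which one to use depends on the query requirements and personal preference.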
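Below is a minimal sketch of the SQL form. It runs through Python's built-in `sqlite3` module as a stand-in for Spark SQL, since `CASE WHEN ... THEN ... ELSE ... END` is written the same way in both. The table `scores` and the function `classify_scores` are hypothetical.

In PySpark, the same column could presumably be built as `F.when(F.col("score") >= 90, "A").when(F.col("score") >= 60, "pass").otherwise("fail")`, where `F` is `pyspark.sql.functions`.

```python
import sqlite3


def classify_scores(rows):
    """Label each (name, score) row using CASE WHEN ... THEN ... ELSE ... END.

    Hypothetical example: the `scores` table exists only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    # WHEN branches are tried in order and the first true condition wins;
    # ELSE supplies the fallback value and END closes the expression.
    query = """
        SELECT name,
               CASE WHEN score >= 90 THEN 'A'
                    WHEN score >= 60 THEN 'pass'
                    ELSE 'fail'
               END AS grade
        FROM scores
        ORDER BY name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(classify_scores([("alice", 95), ("bob", 72), ("carol", 40), ("dave", None)]))
    # [('alice', 'A'), ('bob', 'pass'), ('carol', 'fail'), ('dave', 'fail')]
```

The last row shows that a NULL score matches no WHEN branch and falls through to ELSE.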
Related questions
mismatched input 'count' expecting <EOF>(line 1, pos 379) == SQL == insert overwrite table spark_odc_data.D_RE_BA_RAD_MAP_H partition(p_hour = '2023060600') select '202306060000' TIME_ID, a.IMSI, a.GJ, a.YYS, count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end) as MRWZGXQQCS, count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 and b.result<>1 then a.IMSI else null end) as MRWZGXCGCS, case when count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end)=0 then 0 else count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 and b.result<>1 then a.IMSI else null)/count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end) as MRWZGXCGL, sum(c.MRZJCS) as MRZJCS, sum(c.MRZJHJCGCS) as MRZJHJCGCS, sum(d.MRBJCS) as MRBJCS, sum(d.MRBJHJCGCS) as MRBJHJCGCS, case when sum(c.MRZJCS)=0 then 0 else sum(c.MRZJHJCGCS)/sum(c.MRZJCS) end as MRZJWLJTL, case when sum(c.MRZJCS)=0 then 0 else sum(d.MRBJHJCGCS)/sum(d.MRBJCS) end as MRBJWLJTL, count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end) as MCWZGXQQCS, count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 and b.result<>1 then a.IMSI else null end) as MCWZGXCGCS, case when count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end)=0 then 0 else count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 and b.result<>1 then a.IMSI else null end)/ count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end) as MCWZGXCGL, now() LOAD_TIME
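This is an `insert overwrite` statement. It writes into partition `p_hour = '2023060600'` of the table `spark_odc_data.D_RE_BA_RAD_MAP_H`, computing the metrics MRWZGXQQCS, MRWZGXCGCS, MRWZGXCGL, MRZJCS, MRZJHJCGCS, MRZJWLJTL, MRBJCS, MRBJHJCGCS, MRBJWLJTL, MCWZGXQQCS, MCWZGXCGCS and MCWZGXCGL, plus a LOAD_TIME column that records when the data was loaded.

The parse error comes from unbalanced CASE expressions:

- In the MRWZGXCGL column, the inner `case when ... then a.IMSI else null)` is missing its `end`.
- The outer `case when count(...)=0 then 0 else ...` of that column has no `end` before `as MRWZGXCGL`.
- The MCWZGXCGL column is also missing its outer `end`.

Because the MRWZGXCGL expression cannot be parsed, Spark stops at the `count` right after `case when` in that column, which is where pos 379 falls. Adding the missing `end` keywords fixes the parse error.

Two other problems remain in the statement as posted:

- It has no `from` clause.
- MRBJWLJTL tests `sum(c.MRZJCS)=0` but divides by `sum(d.MRBJCS)`.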
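Below is a minimal sketch of the missing-`end` problem, again using `sqlite3` as a stand-in parser. SQLite reports a generic `syntax error` instead of Spark's `mismatched input` message, but it rejects the same unclosed CASE.

The table `b`, its columns, and the names `SETUP`, `BROKEN`, `FIXED` and `parses` are hypothetical. The expressions are simplified versions of the MRWZGXCGL column, with the IMSI prefix filter left out.

```python
import sqlite3

# Hypothetical table standing in for spark_odc_dwd.D_ENS_GMAP_MM (alias b).
SETUP = "CREATE TABLE b (imsi TEXT, operate_code INTEGER, result INTEGER)"

# As posted: the inner CASE ends in "else null)" and the outer CASE has no END.
BROKEN = """
SELECT case when count(case when operate_code=2 then imsi else null end)=0 then 0
       else count(case when operate_code=2 and result<>1 then imsi else null)
            / count(case when operate_code=2 then imsi else null end) as MRWZGXCGL
FROM b
"""

# Fixed: each CASE, inner and outer, is closed by its own END.
FIXED = """
SELECT case when count(case when operate_code=2 then imsi else null end)=0 then 0
       else count(case when operate_code=2 and result<>1 then imsi else null end)
            / count(case when operate_code=2 then imsi else null end)
       end as MRWZGXCGL
FROM b
"""


def parses(sql: str) -> bool:
    """Return True if SQLite accepts the statement, False on a syntax error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SETUP)
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print(parses(BROKEN), parses(FIXED))  # False True
```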
select a.IMSI, a.GJ, a.YYS, count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end) as MRWZGXQQCS, count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 and b.result<>1 then a.IMSI else null end) as MRWZGXQQCS, case when count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end)=0 then 0 else count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 and b.result<>1 then a.IMSI else null)/ count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end) as MRWZGXCGL, sum(c.MRZJCS) as MRZJCS, sum(c.MRZJHJCGCS) as MRZJHJCGCS, sum(d.MRBJCS) as MRBJCS, sum(d.MRBJHJCGCS) as MRBJHJCGCS, case when sum(c.MRZJCS)=0 then 0 else sum(c.MRZJHJCGCS)/sum(c.MRZJCS) end as MRZJWLJTL, case when sum(c.MRZJCS)=0 then 0 else sum(d.MRBJHJCGCS)/sum(d.MRBJCS) end as MRBJWLJTL, count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end) as MCWZGXQQCS, count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 and b.result<>1 then a.IMSI else null end) as MCWZGXCGCS, case when count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end)=0 then 0 else count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 and b.result<>1 then a.IMSI else null end)/ count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end) as MCWZGXCGL from (select * from TAB_A union select * from TAB_B union select * from TAB_C)a left join (select * from spark_odc_dwd.D_ENS_GMAP_MM where p_hour='#{time yyyyMMddHH}') b on a.IMSI=b.IMSI left join TAB_D c on a.IMSI=c.calling_imsi left join TAB_E c on a.IMSI=c.called_imsi group by a.IMSI, a.GJ, a.YYS
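This SQL statement has four problems:
1. The second count column, `count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 and b.result<>1 then a.IMSI else null end)`, reuses the alias `MRWZGXQQCS` of the first count column. It should get its own alias, `MRWZGXCGCS`, as in the earlier INSERT version of the query.
2. The MRWZGXCGL and MCWZGXCGL columns have unbalanced CASE expressions. In MRWZGXCGL the inner `case when ... then a.IMSI else null)` is missing its `end`, and both outer `case when count(...)=0 then 0 else ...` expressions are missing the `end` before `as`. The division itself is fine; it only has to stay inside a properly closed outer CASE so that 0 is returned when the denominator count is 0.
3. MRBJWLJTL tests `sum(c.MRZJCS)=0` but divides by `sum(d.MRBJCS)`, so the guard does not protect the division. It should test `sum(d.MRBJCS)=0`.
4. TAB_D and TAB_E are both aliased `c`, while the select list refers to `d.MRBJCS` and `d.MRBJHJCGCS`. TAB_E should be aliased `d`.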
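Below is a small sketch of problem 3, again with `sqlite3` standing in for Spark SQL. The table `t` and the function `bj_ratio` are hypothetical; one row stands in for the summed TAB_D and TAB_E values.

When MRZJCS is non-zero but MRBJCS is zero, the two guards behave differently:

- Guarding on MRZJCS lets the division by zero through, and the result is NULL. SQLite returns NULL for division by zero; Spark SQL returns NULL or raises an error depending on `spark.sql.ansi.enabled`.
- Guarding on MRBJCS returns 0.

```python
import sqlite3


def bj_ratio(mrbjcs: int, mrbjhjcgcs: int, mrzjcs: int, guard_column: str):
    """Compute MRBJHJCGCS / MRBJCS, returning 0 when sum(guard_column) is 0.

    Hypothetical helper: one row of table t stands in for the summed
    TAB_D (MRZJCS) and TAB_E (MRBJCS, MRBJHJCGCS) values.
    """
    # Only these two column names are ever interpolated into the SQL text.
    if guard_column not in ("MRZJCS", "MRBJCS"):
        raise ValueError("guard_column must be MRZJCS or MRBJCS")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (MRZJCS INTEGER, MRBJCS INTEGER, MRBJHJCGCS INTEGER)")
    conn.execute("INSERT INTO t VALUES (?, ?, ?)", (mrzjcs, mrbjcs, mrbjhjcgcs))
    # 1.0 * forces real division; SQLite divides integers as integers.
    sql = f"""
        SELECT case when sum({guard_column})=0 then 0
                    else 1.0 * sum(MRBJHJCGCS) / sum(MRBJCS)
               end
        FROM t
    """
    value = conn.execute(sql).fetchone()[0]
    conn.close()
    return value


if __name__ == "__main__":
    # MRZJCS is non-zero but MRBJCS is zero.
    print(bj_ratio(0, 0, 5, "MRZJCS"))  # None: the wrong guard lets x/0 through
    print(bj_ratio(0, 0, 5, "MRBJCS"))  # 0: the guard matches the denominator
```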
Here is the corrected SQL statement:
```sql
select
    a.IMSI,
    a.GJ,
    a.YYS,
    count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end) as MRWZGXQQCS,
    -- alias changed from the duplicated MRWZGXQQCS
    count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 and b.result<>1 then a.IMSI else null end) as MRWZGXCGCS,
    -- success rate: inner and outer CASE each closed with END; 0 when the denominator count is 0
    case when count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end)=0 then 0
         else count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 and b.result<>1 then a.IMSI else null end)
              / count(case when substr(b.IMSI,1,5)<>46000 and b.operate_code=2 then a.IMSI else null end)
    end as MRWZGXCGL,
    sum(c.MRZJCS) as MRZJCS,
    sum(c.MRZJHJCGCS) as MRZJHJCGCS,
    sum(d.MRBJCS) as MRBJCS,
    sum(d.MRBJHJCGCS) as MRBJHJCGCS,
    case when sum(c.MRZJCS)=0 then 0
         else sum(c.MRZJHJCGCS)/sum(c.MRZJCS)
    end as MRZJWLJTL,
    -- guard now tests the same denominator that is divided by
    case when sum(d.MRBJCS)=0 then 0
         else sum(d.MRBJHJCGCS)/sum(d.MRBJCS)
    end as MRBJWLJTL,
    count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end) as MCWZGXQQCS,
    count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 and b.result<>1 then a.IMSI else null end) as MCWZGXCGCS,
    -- outer CASE now closed with END
    case when count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end)=0 then 0
         else count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 and b.result<>1 then a.IMSI else null end)
              / count(case when substr(b.msc,1,2)<>86 and a.GJ is not null and b.operate_code=2 then a.IMSI else null end)
    end as MCWZGXCGL
from
    (select * from TAB_A union select * from TAB_B union select * from TAB_C) a
left join
    (select * from spark_odc_dwd.D_ENS_GMAP_MM where p_hour='#{time yyyyMMddHH}') b on a.IMSI=b.IMSI
left join
    TAB_D c on a.IMSI=c.calling_imsi
-- TAB_E gets its own alias d (it was a second c)
left join
    TAB_E d on a.IMSI=d.called_imsi
group by
    a.IMSI,
    a.GJ,
    a.YYS
```
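Below is a trimmed, runnable sketch of the first three metric columns of the corrected query, once more with `sqlite3` standing in for Spark SQL. The tables `a` and `b`, their rows, and the function `mr_location_update_stats` are hypothetical. The union of TAB_A/TAB_B/TAB_C, the `p_hour` filter and the TAB_D/TAB_E joins are left out.

Two adjustments are needed for SQLite only:

- The prefix is compared with the string literal `'46000'`, because SQLite does not convert the `substr()` result to a number for the comparison.
- The ratio is multiplied by `1.0`, because SQLite divides integers as integers while Spark SQL's `/` returns a fractional result.

```python
import sqlite3


def mr_location_update_stats(a_rows, b_rows):
    """Per-IMSI request count, success count and success rate, mirroring the
    MRWZGXQQCS / MRWZGXCGCS / MRWZGXCGL columns (hypothetical tables a and b)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (IMSI TEXT, GJ TEXT, YYS TEXT)")
    conn.execute("CREATE TABLE b (IMSI TEXT, operate_code INTEGER, result INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", a_rows)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", b_rows)
    # count() skips the NULLs produced by "else null", so it counts only matching rows.
    query = """
        SELECT a.IMSI,
               count(case when substr(b.IMSI,1,5)<>'46000' and b.operate_code=2
                          then a.IMSI else null end) AS MRWZGXQQCS,
               count(case when substr(b.IMSI,1,5)<>'46000' and b.operate_code=2 and b.result<>1
                          then a.IMSI else null end) AS MRWZGXCGCS,
               case when count(case when substr(b.IMSI,1,5)<>'46000' and b.operate_code=2
                                    then a.IMSI else null end)=0 then 0
                    else 1.0 * count(case when substr(b.IMSI,1,5)<>'46000' and b.operate_code=2 and b.result<>1
                                          then a.IMSI else null end)
                         / count(case when substr(b.IMSI,1,5)<>'46000' and b.operate_code=2
                                      then a.IMSI else null end)
               end AS MRWZGXCGL
        FROM a
        LEFT JOIN b ON a.IMSI = b.IMSI
        GROUP BY a.IMSI, a.GJ, a.YYS
        ORDER BY a.IMSI
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    a_rows = [("23415999", "UK", "Y"), ("46000777", "CN", "Z"), ("46001123", "CN", "X")]
    b_rows = [
        ("23415999", 2, 0), ("23415999", 2, 1), ("23415999", 2, 0),
        ("23415999", 2, 0), ("23415999", 1, 0), ("46000777", 2, 0),
    ]
    for row in mr_location_update_stats(a_rows, b_rows):
        print(row)
```

An IMSI with no matching `b` rows, or whose `b` rows all start with 46000, gets counts of 0 and a rate of 0 rather than NULL. This happens because `count` ignores the NULLs produced by the inner CASE, and the outer CASE catches the zero denominator.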