SELECT t2.city_name, COUNT(DISTINCT t1.cuid) AS count FROM ( SELECT cuid 代码中仍然有distinct FROM map_huiyan_short_stay WHERE loc_x > 12993523 AND loc_x < 13012467 AND loc_y > 4773079 AND loc_y < 4789463 AND event_day = 20230605 ) sub WHERE rn = 1 GROUP BY cuid ) t1 JOIN ( SELECT cuid, city_name FROM map_huiyan_long_stay_with_area WHERE event_day = 20230605 AND event_type = 'home' AND event_os = 'android' ) t2 ON t1.cuid = t2.cuid GROUP BY t2.city_name;
时间: 2024-04-08 10:28:56 浏览: 154
count_distinct:在PostgreSQL中扩展以COUNT(DISTINCT ...)聚合的替代方法
5星 · 资源好评率100%
在Hive中,由于不支持在GROUP BY子句中使用DISTINCT关键字,你可以使用子查询和窗口函数来实现类似的功能。以下是修改后的代码:
```sql
SELECT t2.city_name, COUNT(DISTINCT t1.cuid) AS count
FROM (
SELECT cuid
FROM (
SELECT cuid, ROW_NUMBER() OVER (PARTITION BY cuid ORDER BY cuid) AS rn
FROM map_huiyan_short_stay
WHERE loc_x > 12993523 AND loc_x < 13012467 AND loc_y > 4773079 AND loc_y < 4789463 AND event_day = 20230605
) sub
WHERE rn = 1
) t1
JOIN (
SELECT cuid, city_name
FROM map_huiyan_long_stay_with_area
WHERE event_day = 20230605 AND event_type = 'home' AND event_os = 'android'
) t2 ON t1.cuid = t2.cuid
GROUP BY t2.city_name;
```
在这个修改后的代码中,我们使用了窗口函数ROW_NUMBER()来为每个cuid分配一个行号,并使用PARTITION BY子句将数据分区。然后,我们在子查询中筛选出行号为1的记录,以确保每个cuid只出现一次。最后,我们通过使用COUNT(DISTINCT)来计算每个城市的唯一cuid数量。
阅读全文