hive多表关键后取最新一条数据
时间: 2023-10-18 20:13:09 浏览: 86
假设有两个表table1和table2,它们有相同的关键字key,需要取得每个key对应的最新一条数据。
可以使用Hive的窗口函数和子查询实现:
```
SELECT t1.*
FROM (
SELECT key, MAX(timestamp) AS max_timestamp
FROM (
SELECT key, timestamp FROM table1
UNION ALL
SELECT key, timestamp FROM table2
) t
GROUP BY key
) subq
JOIN (
SELECT key, timestamp, col1, col2, ...
FROM table1
UNION ALL
SELECT key, timestamp, col1, col2, ...
FROM table2
) t1
ON subq.key = t1.key AND subq.max_timestamp = t1.timestamp;
```
首先,使用UNION ALL将两个表合并成一个结果集,然后使用子查询计算每个key对应的最新时间戳max_timestamp,最后再将结果集与原始表进行JOIN操作,取得每个key对应的最新一条数据。
阅读全文