select substring(page_type,1,6) as page_type, count(*) as count_num, round((count(*)/411665.0)*100,4) as weights from law_part where page_type_part=101 group by substring(page_type,1,6) order by count_num desc;改成sparksql
时间: 2024-03-05 12:55:15 浏览: 182
mastring-spark-sql
下面是将该SQL语句转换为Spark SQL的代码:
```
SELECT SUBSTRING(page_type,1,6) AS page_type, COUNT(*) AS count_num, ROUND((COUNT(*)/411665.0)*100,4) AS weights
FROM law_part
WHERE page_type_part=101
GROUP BY SUBSTRING(page_type,1,6)
ORDER BY count_num DESC;
```
转换为Spark SQL后的代码:
```scala
import org.apache.spark.sql.functions._
val resultDF = law_part
.filter($"page_type_part" === 101)
.groupBy(substring($"page_type", 1, 6).as("page_type"))
.agg(count("*").as("count_num"))
.withColumn("weights", round((col("count_num")/411665.0)*100, 4))
.orderBy(desc("count_num"))
```
其中,`law_part`是你的数据表,可以通过`spark.read.jdbc()`或其他方式加载数据。
阅读全文