Grouping by season and computing averages in Scala (Spark)
Posted: 2024-01-15 10:04:28
Suppose you have a DataFrame with the following contents:
```
+-------+-------+-------+-------+
| id | season| value | date |
+-------+-------+-------+-------+
| 1 | 2019 | 10.0 | 201901|
| 2 | 2019 | 20.0 | 201902|
| 3 | 2020 | 15.0 | 202001|
| 4 | 2020 | 25.0 | 202002|
+-------+-------+-------+-------+
```
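For reference, the sample DataFrame above could be built like this. This is a minimal sketch that assumes an existing `SparkSession` named `spark`; the variable name `df` matches the snippet below.

```scala
// Assumes an active SparkSession bound to the name `spark`.
import spark.implicits._

// Build the sample data as tuples, then name the columns.
val df = Seq(
  (1, 2019, 10.0, 201901),
  (2, 2019, 20.0, 201902),
  (3, 2020, 15.0, 202001),
  (4, 2020, 25.0, 202002)
).toDF("id", "season", "value", "date")
```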
You can group by `season` and then aggregate to compute the average:
```scala
import org.apache.spark.sql.functions._
val result = df.groupBy("season")
.agg(avg("value").as("average_value"))
.orderBy("season")
result.show()
```
The output is as follows:
```
+-------+--------------+
| season| average_value|
+-------+--------------+
| 2019| 15.0|
| 2020| 20.0|
+-------+--------------+
```
Here, `avg("value")` computes the average of the `value` column, `as("average_value")` names the result column `average_value`, and `.orderBy("season")` sorts the result by `season` in ascending order.
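The same `agg` call can combine several aggregate functions in one pass. As a sketch (the column aliases `n` and `max_value` are illustrative names, not from the original):

```scala
import org.apache.spark.sql.functions._

// One groupBy, several aggregates: average, row count, and maximum per season.
val stats = df.groupBy("season")
  .agg(
    avg("value").as("average_value"),
    count("value").as("n"),
    max("value").as("max_value")
  )
  .orderBy("season")

stats.show()
```

Computing all aggregates in a single `agg` avoids scanning the DataFrame once per statistic.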