Scala: grouping a DataFrame by season and computing the average of the other three columns
Suppose you have a DataFrame with four columns: season, col1, col2, and col3. You can group by the season column and then use the agg function to compute the averages of col1, col2, and col3. Here is a simple Scala example:
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Create a SparkSession (not needed in spark-shell, where `spark` already exists)
val spark = SparkSession.builder().appName("SeasonAvg").master("local[*]").getOrCreate()
import spark.implicits._   // required for toDF on a local Seq

// Create a sample DataFrame
val df = Seq(
  ("Spring", 10, 20, 30),
  ("Spring", 20, 30, 40),
  ("Summer", 15, 25, 35),
  ("Summer", 25, 35, 45),
  ("Fall", 10, 15, 20),
  ("Fall", 20, 25, 30),
  ("Winter", 5, 10, 15),
  ("Winter", 15, 20, 25)
).toDF("season", "col1", "col2", "col3")

// Group by season and compute the average of col1, col2, and col3
val result = df.groupBy("season")
  .agg(avg("col1").alias("avg_col1"), avg("col2").alias("avg_col2"), avg("col3").alias("avg_col3"))

// Show the result
result.show()
```
The output should look something like this:
```
+------+--------+--------+--------+
|season|avg_col1|avg_col2|avg_col3|
+------+--------+--------+--------+
|Winter| 10.0| 15.0| 20.0|
|Spring| 15.0| 25.0| 35.0|
|Summer| 20.0| 30.0| 40.0|
| Fall| 15.0| 20.0| 25.0|
+------+--------+--------+--------+
```
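If the DataFrame has more value columns, the list of aggregations can also be built programmatically instead of spelling out each column. This is a minimal sketch, assuming the same df as above and that every column other than season is numeric:
```scala
// Build one avg aggregation per value column (sketch; uses the df defined above)
val valueCols = df.columns.filter(_ != "season")
val aggExprs = valueCols.map(c => avg(c).alias(s"avg_$c"))

// agg takes the first expression separately from the remaining varargs
val resultAll = df.groupBy("season").agg(aggExprs.head, aggExprs.tail: _*)
resultAll.show()
```
This produces the same result as the explicit version, and only the filter on the grouping column needs to change if more columns are added.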