Using Spark RDD to find all users who only gave ratings of 1 or below and who reviewed more than 5 times, in Scala
Time: 2024-03-23 15:39:42 Views: 68
Sure. Here is Scala code that uses Spark RDDs to find all users who only gave ratings of 1 or below and who reviewed more than 5 times:
```scala
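import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("user analysis")
val sc = new SparkContext(conf)

// Load the ratings file and drop its header row.
val data = sc.textFile("ratings.csv")
val header = data.first()

val users = data.filter(line => line != header)
  .map(line => line.split(","))
  // fields(0) is the user ID, fields(2) is the rating.
  // Per user, track (highest rating seen, number of ratings).
  .map(fields => (fields(0), (fields(2).toFloat, 1)))
  .reduceByKey((x, y) => (math.max(x._1, y._1), x._2 + y._2))
  // "Only rated 1 or below" means the highest rating is <= 1.
  // An average <= 1 would also admit users who gave some higher ratings.
  .filter { case (_, (maxRating, count)) => maxRating <= 1.0f && count > 5 }
  .collect()

println("Users who only gave ratings of 1 or below and reviewed more than 5 times:")
for ((userId, (maxRating, count)) <- users) {
  println(s"User $userId - Highest rating: $maxRating, Review count: $count")
}

sc.stop()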
```
This code follows the same approach as the Python code; it simply uses Scala syntax.
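The condition "only gave ratings of 1 or below" applies to every rating a user gave, which is stricter than a condition on the user's average rating. The following is a minimal sketch using plain Scala collections (no Spark required) that shows how the two filters can disagree. The sample data and the names `sample`, `byUser`, `byAverage`, and `byMax` are made up for illustration and do not come from `ratings.csv`.

```scala
// Hypothetical sample data as (userId, rating) pairs.
val u1 = Seq.fill(6)("u1" -> 1.0f)                      // six ratings, all 1.0
val u2 = Seq.fill(5)("u2" -> 0.5f) :+ ("u2" -> 3.0f)    // six ratings, one of them 3.0
val u3 = Seq.fill(3)("u3" -> 0.5f)                      // only three ratings
val sample: Seq[(String, Float)] = u1 ++ u2 ++ u3

// Group the ratings by user, keeping only the rating values.
val byUser: Map[String, Seq[Float]] =
  sample.groupBy(_._1).map { case (user, rows) => user -> rows.map(_._2) }

// Filter on the average rating: u2 passes because its mean is below 1.
val byAverage: Set[String] =
  byUser.filter { case (_, rs) => rs.sum / rs.size <= 1.0f && rs.size > 5 }.keySet

// Filter on the highest rating: u2 is rejected because of its single 3.0.
val byMax: Set[String] =
  byUser.filter { case (_, rs) => rs.max <= 1.0f && rs.size > 5 }.keySet

println(s"Average-based filter: $byAverage")
println(s"Max-based filter: $byMax")
```

In this sample, `u2` has five ratings of 0.5 and one of 3.0, so its average stays below 1 even though it gave a rating above 1. Only a per-user maximum (or an equivalent "all ratings <= 1" check) excludes such users.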