Using Scala, print the first three words of each line of a text of space-separated English words
Posted: 2023-05-12 13:04:56
The Scala code is as follows:
```scala
val text = "This is a sample text\nfor testing purposes only"
// For each line, split on spaces and keep the first three words
text.linesIterator
  .map(_.split(" ").take(3).mkString(" "))
  .foreach(println)
```
The output is:
```
This is a
for testing purposes
```
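If the input may contain runs of spaces or tabs, splitting on a literal `" "` produces empty tokens. A minimal sketch (the sample line here is an assumption) that splits on any whitespace instead:

```scala
// Split on runs of whitespace so repeated spaces don't yield empty tokens
val line = "This  is \t a sample"
val firstThree = line.split("\\s+").take(3).mkString(" ")
println(firstThree) // prints "This is a"
```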
Related question
Suppose a user continuously enters lines of text (each line contains several words separated by spaces). Write Spark programs to perform the following statistics: (1) count the number of words appearing in each 10s interval (computed every 10s); (2) count the occurrences of each word over the last 30s (computed every 10s); (3) maintain the cumulative count of each word and save the results to an HDFS file (or a local file) (computed every 10s).
Spark Streaming implementations of the three tasks:
(1) Count the number of words appearing in each 10s interval
```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming._

val sparkConf = new SparkConf().setAppName("WordCountPer10s").setMaster("local[2]")
// A 10s batch interval means each batch covers exactly one 10s window
val ssc = new StreamingContext(sparkConf, Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
// count() gives the total number of words in each 10s batch
words.count().print()
ssc.start()
ssc.awaitTermination()
```
(2) Count the occurrences of each word over the last 30s
```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming._

val sparkConf = new SparkConf().setAppName("WordCountLast30s").setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// countByValueAndWindow uses an inverse reduce internally, which requires checkpointing
ssc.checkpoint("checkpoint")
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
// 30s window sliding every 10s: per-word counts over the last 30s
val wordCounts = words.countByValueAndWindow(Seconds(30), Seconds(10))
wordCounts.print()
ssc.start()
ssc.awaitTermination()
```
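To see what a 30s window sliding every 10s computes, here is a plain-Scala sketch (no Spark; the batch contents are made up for illustration): at any point the window covers the last three 10s micro-batches.

```scala
// Three consecutive 10s micro-batches; the 30s window spans all three
val batches = Seq(
  Seq("a", "b", "a"), // words received in (0s, 10s]
  Seq("b", "c"),      // (10s, 20s]
  Seq("a")            // (20s, 30s]
)
// Count each word across the whole window, like countByValueAndWindow
val windowCounts = batches.flatten.groupBy(identity).map { case (w, ws) => (w, ws.size) }
println(windowCounts("a")) // prints 3
```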
(3) Maintain the cumulative count of each word and save the results to an HDFS or local file
```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming._

val sparkConf = new SparkConf().setAppName("CumulativeWordCount").setMaster("local[2]")
val ssc = new StreamingContext(sparkConf, Seconds(10))
// updateStateByKey requires a checkpoint directory
ssc.checkpoint("checkpoint")
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
// Cumulative count per word since the start of the stream
val wordCounts = words.map((_, 1)).updateStateByKey[Int] {
  (newValues: Seq[Int], state: Option[Int]) => Some(newValues.sum + state.getOrElse(0))
}
// Each batch is written to a new timestamped directory under the given prefix
wordCounts.saveAsTextFiles("hdfs://...")
ssc.start()
ssc.awaitTermination()
```
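Cumulative counts are usually maintained with Spark's `updateStateByKey`, whose update function can be checked in isolation. A plain-Scala sketch of how the new per-key state is derived from the current batch's values and the previous state:

```scala
// new count = this batch's occurrences + previously accumulated count (0 if none)
def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
  Some(newValues.sum + state.getOrElse(0))

println(updateCount(Seq(1, 1, 1), None))  // prints Some(3)
println(updateCount(Seq(1, 1), Some(3))) // prints Some(5)
```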
Note: the code above is only an example; adapt it to your actual requirements.