Using pyspark groupBy to group ['hello', 'you', 'Spark', 'Flink', 'hello', 'me', 'hello', 'she', 'Spark']
Sure, this can be done. PySpark's RDD groupBy can group the elements of this list by value. The code is as follows:
```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("groupby_example").getOrCreate()

data = ['hello', 'you', 'Spark', 'Flink', 'hello', 'me', 'hello', 'she', 'Spark']
rdd = spark.sparkContext.parallelize(data)

# groupBy keys each element by itself, yielding (key, iterable of occurrences) pairs
grouped = rdd.groupBy(lambda x: x)

for key, value in grouped.collect():
    print("{0}: {1}".format(key, list(value)))
```
The output is as follows (the order of the groups may vary between runs, since RDD results carry no ordering guarantee):
```
hello: ['hello', 'hello', 'hello']
you: ['you']
Spark: ['Spark', 'Spark']
Flink: ['Flink']
me: ['me']
she: ['she']
```