PySpark WordCount Example

Date: 2023-08-04 16:08:28

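Here is a basic PySpark word count example:

```python
from pyspark import SparkConf, SparkContext

# Create a SparkConf object and set the application name
conf = SparkConf().setAppName("WordCount")

# Create a SparkContext object
sc = SparkContext(conf=conf)

# Read the input file as an RDD of lines
input_file_path = "input.txt"
input_file = sc.textFile(input_file_path)

# Split each line on whitespace and flatten the result into one RDD of words
words = input_file.flatMap(lambda line: line.split())

# Count the occurrences of each word
word_counts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

# Save the counts; Spark creates a directory with this name that holds part files
output_file_path = "output.txt"
word_counts.saveAsTextFile(output_file_path)

# Stop the SparkContext
sc.stop()
```

The code first creates a `SparkConf` object to set the application name, then creates a `SparkContext`, which connects to the Spark cluster. `textFile` reads the input file as an RDD of lines, and `flatMap` splits each line into words and flattens the result into a single RDD of words. `map` then turns each word into a `(word, 1)` pair, and `reduceByKey` sums the values for each distinct word. Finally, `saveAsTextFile` writes the counts out and `sc.stop()` shuts down the `SparkContext`. Note that `saveAsTextFile` creates a directory named `output.txt` containing one `part-*` file per partition rather than a single text file.

Suppose the input file `input.txt` contains the following:

```
Hello World Hello PySpark PySpark World Count
```

The part files in the output directory then contain the following records (the order may vary):

```
('PySpark', 2)
('Count', 1)
('Hello', 2)
('World', 2)
```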
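To check the expected counts without a Spark installation, the `flatMap`, `map`, and `reduceByKey` steps above can be mimicked in plain Python. This is only a minimal sketch: `word_count` and `sample_lines` are hypothetical names introduced for illustration, not part of PySpark, and the code runs on a single machine rather than across a cluster.

```python
from collections import Counter
from itertools import chain


def word_count(lines):
    """Count words in an iterable of text lines (hypothetical helper, not a PySpark API)."""
    # Counterpart of flatMap: split every line on whitespace and flatten into one stream of words.
    words = chain.from_iterable(line.split() for line in lines)
    # Counterpart of map + reduceByKey: tally one occurrence per word.
    return Counter(words)


# Sample input taken from the example above.
sample_lines = ["Hello World Hello PySpark PySpark World Count"]
counts = word_count(sample_lines)

# Print the pairs in sorted order so the output is deterministic.
for word, n in sorted(counts.items()):
    print((word, n))
```

On the sample input, each of `Hello`, `World`, and `PySpark` is counted twice and `Count` once, which matches the totals the Spark job should produce.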
