Example of implementing Spark's RDD join operator in Java, Scala, and Python
Date: 2024-05-26 19:10:29
The following code demonstrates Spark's RDD join operator in each of the three languages:
Java:
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Only one SparkContext is allowed per JVM; create it once and reuse it
// (the original snippet incorrectly built a new JavaSparkContext for each RDD).
JavaSparkContext sc = new JavaSparkContext(
    new SparkConf().setAppName("JoinExample").setMaster("local[*]"));

JavaPairRDD<Integer, String> rdd1 = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>(1, "apple"),
    new Tuple2<>(2, "banana"),
    new Tuple2<>(3, "cherry")
));
JavaPairRDD<Integer, Integer> rdd2 = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>(1, 10),
    new Tuple2<>(2, 20),
    new Tuple2<>(3, 30)
));
// Inner join by key: each value is a Tuple2 of (rdd1 value, rdd2 value).
JavaPairRDD<Integer, Tuple2<String, Integer>> result = rdd1.join(rdd2);
result.foreach(t -> System.out.println(t));
Scala (in spark-shell, where sc is the predefined SparkContext):
val rdd1 = sc.parallelize(Array(
  (1, "apple"),
  (2, "banana"),
  (3, "cherry")
))
val rdd2 = sc.parallelize(Array(
  (1, 10),
  (2, 20),
  (3, 30)
))
// Inner join by key: values become (String, Int) pairs.
val result = rdd1.join(rdd2)
result.foreach(println)
Python (in pyspark, where sc is the predefined SparkContext):
rdd1 = sc.parallelize([
    (1, "apple"),
    (2, "banana"),
    (3, "cherry")
])
rdd2 = sc.parallelize([
    (1, 10),
    (2, 20),
    (3, 30)
])
# Inner join by key. Note that foreach runs on the executors, so on a
# cluster the output appears in executor logs, not the driver console.
result = rdd1.join(rdd2)
result.foreach(print)
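For reference, RDD.join performs an inner join on keys: for each key present in both RDDs, it emits one pair per combination of values. The semantics for the sample data above can be sketched in plain Python (no Spark required) as a simple hash join; the function name hash_join is illustrative, not a Spark API:

```python
from collections import defaultdict

def hash_join(left, right):
    # Build a multimap from the right-side pairs, then probe it with the
    # left side, mirroring the inner-join semantics of RDD.join.
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, (v, w)) for k, v in left for w in index[k]]

rdd1_data = [(1, "apple"), (2, "banana"), (3, "cherry")]
rdd2_data = [(1, 10), (2, 20), (3, 30)]
print(hash_join(rdd1_data, rdd2_data))
# [(1, ('apple', 10)), (2, ('banana', 20)), (3, ('cherry', 30))]
```

Keys that appear in only one side (e.g. a key 4 in rdd2_data alone) are dropped, which is exactly what the three Spark snippets above produce.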