Flink: packaged, executable code to compute the top 10 counts for one column of a document
Below is packaged, executable code that uses Flink to count the values in one column of a document and output the 10 largest results:
```java
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class Top10Words {

    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the input file line by line
        DataSet<String> text = env.readTextFile("path/to/input/file");

        // Split each line into words, emit (word, 1) pairs,
        // and sum the counts per word
        DataSet<Tuple2<String, Integer>> counts = text
                .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                    for (String word : line.split(" ")) {
                        if (word.length() > 0) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                // Lambdas with a generic output type need an explicit type hint
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .groupBy(0)
                .reduce((ReduceFunction<Tuple2<String, Integer>>) (t1, t2) ->
                        new Tuple2<>(t1.f0, t1.f1 + t2.f1));

        // Sort all counts in a single partition and keep the 10 largest
        DataSet<Tuple2<String, Integer>> topCounts = counts
                .sortPartition(1, Order.DESCENDING)
                .setParallelism(1)
                .first(10);

        topCounts.print();
    }
}
```
In this example, Flink's DataSet API reads the text file, maps every word to a (word, 1) pair, and adds up the counts per word in a reduce step. The result is then sorted by count and the 10 largest entries are printed. To run the job, the code has to be packaged into a jar file and submitted to the cluster, for example with the `flink run` command.
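As a side note, the per-word summation does not have to be a hand-written reduce; the DataSet API's built-in field aggregation does the same thing more compactly. The sketch below assumes the same `text` source and type hint as in the program above:
```java
// Equivalent counting step: group by the word (field 0)
// and sum the per-word counts (field 1) with the built-in aggregation.
DataSet<Tuple2<String, Integer>> counts = text
        .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
            for (String word : line.split(" ")) {
                if (!word.isEmpty()) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        })
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .groupBy(0)
        .sum(1);
```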
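When the job is submitted to a cluster, writing the result to a file is usually more useful than printing it to the client. A minimal sketch of that variant, with the output path as a placeholder, replaces the final `print()` call of the program above; note that, unlike `print()`, a file sink does not trigger execution by itself:
```java
// Write the top-10 result as "word count" lines into a single file
// ("path/to/output/top10" is a placeholder path).
topCounts
        .writeAsCsv("path/to/output/top10", "\n", " ")
        .setParallelism(1);

// A file sink does not start the job on its own, so execute explicitly.
env.execute("Top10Words");
```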