MapReduce program: word frequency count

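Sure. Word frequency counting is a classic MapReduce example. Here is a Java version:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the Combiner): sums the counts for one word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] is the input path, args[1] the output path (must not exist yet).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

In this example, the Mapper splits each line of input into whitespace-separated tokens and emits a (word, 1) pair for every token. The Combiner then pre-aggregates the pairs for the same word on the map side by summing their values. It is only an optimization: Hadoop may run it zero or more times, which is safe here because addition is associative and commutative, so the same class can serve as both Combiner and Reducer. Finally, the Reducer sums all values for each word and writes the pair (word, total). At run time, Hadoop divides the input into splits (by default roughly one per HDFS block, so they are not necessarily equal in size and the last split of a file is usually smaller), runs one map task per split in parallel, shuffles the map output to the reducers by key, and writes the results to the output directory, one file per reduce task.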
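To make the map, combine, and reduce flow described above easier to follow, here is a minimal in-memory sketch that does not use Hadoop at all. The class `WordCountFlowSketch` and its methods are hypothetical names introduced only for illustration; each inner list stands in for one input split, and the sketch assumes the combiner runs exactly once per split, which real Hadoop does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration: simulates map -> combine -> reduce in memory.
// This is not Hadoop code and ignores partitioning, sorting, and I/O.
final class WordCountFlowSketch {

    // Map step: emit (word, 1) for every whitespace-separated token,
    // mirroring what TokenizerMapper does for one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(Map.entry(itr.nextToken(), 1));
        }
        return out;
    }

    // Sum the values per key, mirroring IntSumReducer.
    // Used for both the combine step and the final reduce step.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Each inner list plays the role of one input split.
    static Map<String, Integer> run(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
            for (String line : split) {
                mapped.addAll(map(line));
            }
            // Combine step: local pre-aggregation of one split's map output
            // (assumed to run exactly once per split in this sketch).
            shuffled.addAll(sumByKey(mapped).entrySet());
        }
        // Reduce step: merge the partial sums from all splits.
        return sumByKey(shuffled);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of(
                List.of("hello world", "hello hadoop"),
                List.of("hello mapreduce")));
        System.out.println(counts);
        // prints {hadoop=1, hello=3, mapreduce=1, world=1}
    }
}
```

In the first split the combiner collapses the two (hello, 1) pairs into (hello, 2) before anything is "shuffled", so the reduce step only has to add 2 and 1 for that key. Because summing partial sums gives the same total as summing the raw ones, the final counts would be identical if the combine step were skipped.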
