Write a word-frequency count with MapReduce

Sure. Here is an example program that counts word frequencies with MapReduce:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: input is (byte offset, line of text); output is (word, 1).
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line on whitespace and emit (word, 1) for each token.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] is the input path, args[1] is the output path.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The program is a single MapReduce job with two phases. In the map phase, the Mapper emits every word in the input as a key with the value 1. In the reduce phase, the Reducer adds up the values that share a key and writes the total as a key-value pair. The program also uses a Combiner, an optional optimization that merges the intermediate results locally on each map node after the map phase, which reduces the amount of data transferred to the reducers and the work they have to do. The Reducer class can double as the Combiner here because addition is associative and commutative, so summing partial sums gives the same total.
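To make the map, combine, and reduce steps described above easier to follow, here is a minimal plain-Java sketch that imitates the same data flow in memory. It does not use Hadoop at all: the class name `LocalWordCountSketch`, its helper methods, and the idea of passing each "input split" as a list of lines are assumptions made purely for illustration, not part of the Hadoop API or of the program above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical in-memory imitation of the WordCount data flow; no Hadoop classes involved. */
final class LocalWordCountSketch {

    /** Map step: emit a (word, 1) pair for every whitespace-separated token in the line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    /** Combine/reduce step: sum the values that share a key. */
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    /** Each inner list stands in for one map task's input split (an assumption of this sketch). */
    static Map<String, Integer> countWords(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
            for (String line : split) {
                mapped.addAll(map(line));
            }
            // Combiner: pre-aggregate locally so fewer pairs are passed on.
            shuffled.addAll(sumByKey(mapped).entrySet());
        }
        // Reducer: sum the partial counts from all splits.
        return sumByKey(shuffled);
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("hello world hello"),
                Arrays.asList("hello hadoop"));
        System.out.println(countWords(splits)); // prints {hadoop=1, hello=3, world=1}
    }
}
```

In this sketch the first split produces three `(word, 1)` pairs, but the combine step passes on only two partial sums (`hello=2`, `world=1`), which is the same saving the Combiner aims for on a real cluster.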


