Implementing frequency counting with MapReduce in Java
Date: 2023-07-12 12:10:47
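A MapReduce-based frequency count can be written in Java as a Hadoop MapReduce program. Here is a simple example:
Suppose we have a text file of words and want to count how many times each word occurs in it.
First, we write the Mapper class. It maps each word in the input text to a key-value pair in which the key is the word itself and the value is 1. The Mapper class looks like this: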
```java
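import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text(); // reused for every emitted key

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Trim first so leading whitespace does not produce an empty token.
        for (String token : value.toString().trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line: nothing to emit
            }
            word.set(token);
            context.write(word, ONE);
        }
    }
}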
```
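To make the mapper's output concrete, here is a minimal plain-Java sketch of the same tokenize-and-emit step. It does not use Hadoop at all; the class name `MapPhaseSketch` and the sample sentence are assumptions made purely for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is not Hadoop code.
class MapPhaseSketch {
    // Mimics the mapper: emit (token, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(map("to be or not to be"));
    }
}
```

Note that repeated words are emitted once per occurrence; nothing is summed at this stage.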
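Next, we write the Reducer class. It receives all the values emitted for a given word and sums them to get that word's total count. The Reducer class looks like this: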
```java
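import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable(); // reused output value

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum every count received for this word.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}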
```
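Between the map and reduce phases, the framework groups the emitted values by key and hands each group to one `reduce` call. The following plain-Java sketch imitates that grouping and the summation. It is not Hadoop code: `ShuffleReduceSketch` is a hypothetical name, and the `TreeMap` only approximates the sorted key order a reducer sees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; this is not Hadoop code.
class ShuffleReduceSketch {
    // Shuffle: collect one "1" per emitted word, grouped by word, keys sorted.
    static Map<String, List<Integer>> shuffle(List<String> emittedWords) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String w : emittedWords) {
            grouped.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce: sum all values that arrived for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> grouped =
                shuffle(Arrays.asList("to", "be", "or", "not", "to", "be"));
        // Prints one "word<TAB>count" line per key: be 2, not 1, or 1, to 2
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```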
Finally, we write the Driver class, which wires the Mapper and Reducer together and runs them as a single MapReduce job. The Driver class looks like this:
```java
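import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCountDriver <input path> <output path>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        // Summation is associative and commutative, so the reducer can double as the combiner.
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        // Exit with 0 on success, 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}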
```
The code above sets the input and output paths and starts the MapReduce job.
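The driver registers `WordCountReducer` as the combiner as well as the reducer. That is safe here because summing partial sums gives the same total as summing the raw 1s. A small plain-Java sketch of that property, using a hypothetical `CombinerSketch` class and made-up per-task counts:

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is not Hadoop code.
class CombinerSketch {
    // The same summation the reducer performs.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumption: one word was emitted 3 times by one map task and 2 times by another.
        List<Integer> task1 = Arrays.asList(1, 1, 1);
        List<Integer> task2 = Arrays.asList(1, 1);

        // Without a combiner, the reducer receives all five 1s.
        int direct = sum(Arrays.asList(1, 1, 1, 1, 1));
        // With a combiner, each task sends one partial sum instead.
        int combined = sum(Arrays.asList(sum(task1), sum(task2)));

        System.out.println(direct + " " + combined); // prints "5 5"
    }
}
```

An operation that is not associative and commutative, such as averaging, could not reuse its reducer as a combiner in this way.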
Once the classes are packaged into a jar, the job can be run with the Hadoop command-line tool, for example:
```
hadoop jar WordCount.jar WordCountDriver input output
```
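This reads the input files from the "input" directory and writes the results to the "output" directory, which must not already exist when the job starts. The output lists each word together with its number of occurrences.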
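With the default text output format, each result line holds a word and its count separated by a tab. As a rough sketch of post-processing such lines, the plain-Java snippet below parses them and orders the words by frequency. The class name `OutputSketch` and the sample lines are assumptions for illustration, not actual job output.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is not Hadoop code.
class OutputSketch {
    // Parses "word<TAB>count" lines and returns them sorted by count, highest first.
    static List<Map.Entry<String, Integer>> sortByCount(List<String> lines) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            entries.add(new SimpleEntry<>(parts[0], Integer.parseInt(parts[1])));
        }
        // List.sort is stable, so words with equal counts keep their input order.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical result lines in the shape the job is expected to write.
        List<String> lines = Arrays.asList("be\t2", "not\t1", "or\t1", "to\t2");
        System.out.println(sortByCount(lines)); // prints [be=2, to=2, not=1, or=1]
    }
}
```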