编写一个最基本的wordcount单词统计的mapreduce,要求写出自定义的maper类和reducer类.
时间: 2024-03-01 19:49:35 浏览: 71
mapreduce-wordcount
好的,下面是一个最基本的WordCount单词统计的MapReduce程序,包括自定义的Mapper类和Reducer类。
Mapper类:
```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
private Text word = new Text();
private LongWritable one = new LongWritable(1);
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] words = line.split("\\s+");
for (String w : words) {
word.set(w);
context.write(word, one);
}
}
}
```
Reducer类:
```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
private LongWritable count = new LongWritable();
@Override
public void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
long sum = 0;
for (LongWritable val : values) {
sum += val.get();
}
count.set(sum);
context.write(key, count);
}
}
```
在Driver类中,需要进行如下配置:
```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: WordCount <input path> <output path>");
System.exit(-1);
}
Job job = new Job();
job.setJarByClass(WordCount.class);
job.setJobName("Word Count");
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
```
在运行程序时,需要指定输入文件路径和输出文件路径,例如:
```shell
$ hadoop jar WordCount.jar input output
```
程序会读取输入路径下的文件,统计每个单词出现的次数,并将结果输出到输出路径下的文件中。
阅读全文