Write Hadoop MapReduce word-count source code for this file
Sure! Below is a simple Hadoop MapReduce program that counts word frequencies in a `word.txt` file. It consists of a Mapper class, a Reducer class, and a driver, and uses only the core Hadoop API.
### Mapper Class
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reusable Writable objects, so we don't allocate new ones on every call
    private static final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split on commas; if word.txt separates words with whitespace
        // instead, change the delimiter to a regex such as "\\s+"
        String[] words = line.split(",");
        for (String w : words) {
            String trimmed = w.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty tokens produced by stray delimiters
            }
            word.set(trimmed);
            context.write(word, one); // emit (word, 1) for each occurrence
        }
    }
}
```
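For example, given the input line `hello,world,hello`, this Mapper emits the pairs `(hello, 1)`, `(world, 1)`, and `(hello, 1)`. The framework then sorts and groups these pairs by key before handing them to the Reducer.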
### Reducer Class
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the 1s emitted by the mappers for this word
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum)); // emit (word, total count)
    }
}
```
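Continuing the example above, the Reducer receives `hello → [1, 1]` and `world → [1]` and writes out `hello 2` and `world 1`, one line per distinct word.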
### Driver Program
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Word Count");

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Output key/value types for both the map and reduce phases
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Block until the job finishes and use its status as the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
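One optional addition (not in the listing above, but a standard Hadoop optimization): since summing counts is associative and commutative, the same Reducer class can be registered as a combiner to pre-aggregate `(word, 1)` pairs on the map side and cut shuffle traffic. In the driver, before submitting the job:

```java
// Optional: run the reducer logic map-side as a combiner.
// Safe for word count because integer addition is associative and commutative.
job.setCombinerClass(WordCountReducer.class);
```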
### How to Run
1. **Compile the code**: Save the three classes above into matching Java files (`WordCountMapper.java`, `WordCountReducer.java`, and `WordCountDriver.java`) and compile them (see the command sketch after this list).
2. **Package**: Bundle the compiled classes into a JAR file.
3. **Run the MapReduce job**: Submit the job with the Hadoop command-line tool. Assuming the input file sits at `/input/word.txt` on HDFS and the output directory is `/output`, the command is:
```sh
hadoop jar your-jar-file.jar WordCountDriver /input/word.txt /output
```
4. **Check the results**: The output lands in the specified output directory, typically in a file named `part-r-00000`.
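For steps 1 and 2, a minimal command sketch (assuming the `hadoop` binary is on your PATH; the JAR name `wordcount.jar` and the local directory layout are placeholders, so adjust them for your setup):

```sh
# Compile against the Hadoop client libraries
mkdir -p classes
javac -classpath "$(hadoop classpath)" -d classes \
    WordCountMapper.java WordCountReducer.java WordCountDriver.java

# Package the compiled classes into a JAR
jar cf wordcount.jar -C classes .

# After the job finishes (step 4), inspect the output
hdfs dfs -cat /output/part-r-00000
```

One practical note: Hadoop refuses to start a job whose output directory already exists, so delete `/output` (for example with `hdfs dfs -rm -r /output`) before re-running.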
Hope this helps! Feel free to ask if you have any questions.