Demo Case 1: Counting Words with Hadoop
Demo Case 1: Suppose we want to use the Hadoop framework to count how many times each word appears in a text file. Hadoop is a distributed computing platform suited to processing large-scale data, and for this task we can apply the MapReduce model:
First, we write the Mapper stage. The Mapper reads one split of the input file and breaks each line into words. In Java, for example:
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on whitespace and emit (word, 1) for each token
        String[] words = value.toString().split("\\s+");
        for (String w : words) {
            if (w.isEmpty()) continue; // skip the empty token left by leading whitespace
            word.set(w);
            context.write(word, one);
        }
    }
}
```
Next, the Reducer stage takes over the Mapper's output and totals the count for each word:
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all partial counts received for this word and emit (word, total)
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```
Finally, you configure a Job that registers the Mapper and Reducer, then submit it and collect the results; a driver sketch follows below. This example shows how Hadoop's parallel processing can be used to count words efficiently across large volumes of text.
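Here is a minimal driver sketch using the standard org.apache.hadoop.mapreduce.Job API; the class name WordCountDriver and the convention of taking input and output paths from command-line arguments are illustrative assumptions, not part of the original example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class wiring together the Mapper and Reducer above
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        // Optionally run the reducer as a combiner, since summing is associative
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        // Key/value types emitted by the reducer (and here the mapper as well)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output HDFS paths taken from the command line (assumption)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job, wait for it to finish, and exit non-zero on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

You would then package these classes into a jar and launch it with something like `hadoop jar wordcount.jar WordCountDriver /input /output` (paths hypothetical); the per-word counts appear in the part-r-* files under the output directory.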