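Using the Hadoop MapReduce framework covered in class, write a Map class and a Reduce class. (3) For the weather data, count the number of temperature readings for each city ID.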
Date: 2024-03-20 17:40:21
Assume each line of the weather data contains a city ID and a temperature, separated by a comma. For example:
```
101,18
102,23
101,20
103,15
102,21
101,16
```
Below is a simple MapReduce job that counts the number of temperature readings for each city ID:
Map class:
```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TemperatureCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused output objects, so no new Writable is allocated per record.
    private final static IntWritable one = new IntWritable(1);
    private Text cityId = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is expected to look like "cityId,temperature".
        String line = value.toString();
        String[] parts = line.split(",");
        // Skip lines that do not have exactly two fields.
        if (parts.length == 2) {
            cityId.set(parts[0]);
            // Emit (cityId, 1): one count per temperature reading.
            context.write(cityId, one);
        }
    }
}
```
Reduce class:
```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TemperatureCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reused output object that holds the total for the current key.
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this city ID.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        // Emit (cityId, number of temperature readings).
        context.write(key, result);
    }
}
```
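In this MapReduce job, the city ID is stored as `Text` and the count as `IntWritable`. In the map phase, the city ID is extracted from each input line and emitted as the key with a value of 1. In the reduce phase, the values for each city ID are summed and the total is written out.
The complete code is available at the link below: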
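To check the logic without a cluster, the same map, group, and reduce steps can be sketched in plain Java on the sample data above. This is only an illustration and not part of the Hadoop job: the class name `TemperatureCountSketch` and the method `countByCity` are hypothetical, and a sorted in-memory map stands in for Hadoop's shuffle phase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop dependency) of the map -> shuffle -> reduce flow.
// Class and method names are hypothetical, chosen for illustration only.
class TemperatureCountSketch {

    static Map<String, Integer> countByCity(List<String> lines) {
        // Map step plus a stand-in for the shuffle: emit a 1 per valid line
        // and group the emitted values by city ID (TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length == 2) { // same validity check as the Mapper
                grouped.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(1);
            }
        }

        // Reduce step: sum the grouped values for each city ID.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "101,18", "102,23", "101,20", "103,15", "102,21", "101,16");
        // Prints one "cityId<TAB>count" line per city.
        countByCity(sample).forEach((city, n) -> System.out.println(city + "\t" + n));
    }
}
```

For the six sample lines, this yields a count of 3 for city 101, 2 for city 102, and 1 for city 103, which is what the Hadoop job should also report for that input.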
https://github.com/ChitGPT/MapReduce-Examples/tree/main/Temperature-Count