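Use a MapReduce (MR) program to count the number of employees hired in each year. The final result must meet these requirements:
1. Output format: Year:1980 Count:xxx Year:1981 Count:xxx .......
2. Two partitions: partition 0 stores hire years < 1982; partition 1 stores hire years >= 1982.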
Posted: 2023-12-10 10:41:25 Views: 81
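The steps below count hires per year with a MapReduce program:
1. Mapper stage:
The mapper reads each line of the input file, extracts the hire year as the key, sets the value to 1, and emits the key-value pair.
2. Reducer stage:
The reducer receives the key-value pairs from the mapper, sums the values that share the same year, and writes out the total.
3. Partitioner stage:
The partitioner runs on the mapper's output, before the reduce, and routes each key to partition 0 (hire year < 1982) or partition 1 (hire year >= 1982).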
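Before looking at the Hadoop classes, the three steps can be traced in plain Java. This is only an in-memory sketch of the data flow, not how Hadoop executes a job; the class name `HireYearFlowSketch` and the `name,hireYear` sample records are hypothetical, and the input is assumed to carry the hire year in the second comma-separated field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the map -> partition -> reduce flow (no Hadoop needed). */
final class HireYearFlowSketch {

    /** Map step: emit the hire year found in the second comma-separated field. */
    static int mapToYear(String line) {
        return Integer.parseInt(line.split(",")[1].trim());
    }

    /** Partition step: years before 1982 go to partition 0, the rest to partition 1. */
    static int partitionFor(int year) {
        return year < 1982 ? 0 : 1;
    }

    /** Runs all three steps and returns one sorted year -> count map per partition. */
    static List<Map<Integer, Integer>> run(List<String> lines) {
        List<Map<Integer, Integer>> partitions = new ArrayList<>();
        partitions.add(new TreeMap<>());
        partitions.add(new TreeMap<>());
        for (String line : lines) {
            int year = mapToYear(line);
            // Reduce step: summing the emitted 1s per key is a merge with Integer::sum.
            partitions.get(partitionFor(year)).merge(year, 1, Integer::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Hypothetical sample records: name,hireYear
        List<String> sample = List.of("smith,1980", "allen,1981", "ward,1981", "scott,1982");
        List<Map<Integer, Integer>> result = run(sample);
        System.out.println("partition 0: " + result.get(0));
        System.out.println("partition 1: " + result.get(1));
    }
}
```

Each map in the returned list plays the role of one reducer's output file, so the boundary year 1982 lands in the second map.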
Example Hadoop code follows:
Mapper class:
```
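import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EmployeeMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    // Reused across calls to avoid allocating new Writables per record.
    private final IntWritable year = new IntWritable();
    private final IntWritable one = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumes comma-separated input whose second field is the hire year.
        String[] fields = value.toString().split(",");
        int yearValue = Integer.parseInt(fields[1].trim());
        year.set(yearValue);
        context.write(year, one);
    }
}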
```
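The mapper above assumes every line has a clean integer year in its second field, so a header row or a full date such as `1981-02-20` would make `Integer.parseInt` throw. The question does not show the input layout, so the following is only a hedged sketch of a more tolerant parser: the class `HireYearParser`, the method `parseYear`, and both accepted field formats are assumptions for illustration.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for pulling a hire year out of one input line. */
final class HireYearParser {

    // A four-digit year, optionally followed by a non-digit and anything else.
    private static final Pattern LEADING_YEAR = Pattern.compile("^(\\d{4})(\\D.*)?$");

    /**
     * Returns the hire year from the given field, or empty if the line is
     * malformed. Accepts a bare year ("1981") or a date that starts with a
     * four-digit year ("1981-02-20"); both formats are assumptions.
     */
    static OptionalInt parseYear(String line, int fieldIndex) {
        String[] fields = line.split(",");
        if (fieldIndex < 0 || fieldIndex >= fields.length) {
            return OptionalInt.empty();
        }
        Matcher m = LEADING_YEAR.matcher(fields[fieldIndex].trim());
        if (!m.matches()) {
            // Header rows and corrupt values end up here and can be skipped.
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }
}
```

Inside `map`, a caller could check `isPresent()` and simply return without writing anything when the year is missing, which keeps one bad record from failing the whole task.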
Reducer class:
```
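import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class EmployeeReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this hire year.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result); // written as "<year>\t<count>"
    }
}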
```
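Note that the reducer above writes an `IntWritable` pair, which the default text output renders as a tab-separated line such as `1980	3` rather than the labeled format the question asks for. One possible way to close that gap, sketched here under assumptions, is to build the line as a string and have the reducer emit it as a `Text` key with a `NullWritable` value; the driver's output key and value classes would then change to match, with the map output classes set separately via `setMapOutputKeyClass` and `setMapOutputValueClass`. The class `HireYearLineFormat` and the English labels (standing in for the question's 年份 and 人数) are hypothetical.

```java
/** Hypothetical formatter for one output line of the job. */
final class HireYearLineFormat {

    /** Builds a labeled line from a hire year and its head count. */
    static String formatLine(int year, int count) {
        return "Year:" + year + " Count:" + count;
    }

    public static void main(String[] args) {
        // Example call with made-up numbers.
        System.out.println(formatLine(1980, 3));
    }
}
```

Keeping the formatting in a small pure function makes it easy to unit test without a cluster, and the partitioner is unaffected because it still sees the integer year emitted by the mapper.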
Partitioner class:
```
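import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Partitioner;

public class EmployeePartitioner extends Partitioner<IntWritable, IntWritable> {
    @Override
    public int getPartition(IntWritable key, IntWritable value, int numPartitions) {
        // Partition 0: hire year before 1982; partition 1: 1982 or later.
        if (key.get() < 1982) {
            return 0;
        } else {
            return 1;
        }
    }
}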
```
Driver program:
```
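import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EmployeeCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "employee count");
        job.setJarByClass(EmployeeCount.class);
        job.setMapperClass(EmployeeMapper.class);
        job.setPartitionerClass(EmployeePartitioner.class);
        job.setReducerClass(EmployeeReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        // Two reduce tasks, one per partition returned by EmployeePartitioner.
        job.setNumReduceTasks(2);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input file or directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}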
```
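When running the program, pass the input file and the output directory as arguments (preceded by the main class name unless the JAR manifest already declares one). For example: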
```
hadoop jar EmployeeCount.jar EmployeeCount input/employee.csv output
```
The results are saved in the output directory; use the cat command to view them:
```
hadoop fs -cat output/part-r-00000
hadoop fs -cat output/part-r-00001
```
Here, part-r-00000 is the output of partition 0 (hire years before 1982) and part-r-00001 is the output of partition 1 (1982 and later).