Given the following data:
7369,SMITH,CLERK,7902,1980-12-17,800,null,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02,2975,null,20
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30
7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10
7788,SCOTT,ANALYST,7566,1987-04-19,3000,null,20
7839,KING,PRESIDENT,null,1981-11-17,5000,null,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23,1100,null,20
7900,JAMES,CLERK,7698,1981-12-03,950,null,30
7902,FORD,ANALYST,7566,1981-12-02,3000,null,20
7934,MILLER,CLERK,7782,1982-01-23,1300,null,10
Use a MapReduce program to count the number of employees hired in each year.
Time: 2023-07-27 15:09:47
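First, the year has to be extracted from the hire date of each record. This can be done with Hadoop's Java MapReduce API.
Mapper phase:
1. Split each line on commas to get an array of strings.
2. Take the hire date field (the fifth field, index 4) from the array and extract the year with a regular expression.
3. Emit the year as the key and 1 as the value.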
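The mapper steps above can be tried on a single record without Hadoop. This is a minimal sketch in plain Java; the class `YearExtractor` and the method `extractYear` are hypothetical names used only for illustration, and the sketch assumes the hire date is always the fifth comma-separated field in `yyyy-MM-dd` form.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearExtractor {
    // Assumption: the year is the four digits at the start of a yyyy-MM-dd date.
    private static final Pattern YEAR = Pattern.compile("^(\\d{4})-");

    /** Returns the hire year of one CSV record, or empty if the line is malformed. */
    static Optional<String> extractYear(String line) {
        String[] fields = line.split(",");
        if (fields.length < 5) {
            return Optional.empty(); // too few columns, nothing to emit
        }
        Matcher m = YEAR.matcher(fields[4].trim());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String record = "7369,SMITH,CLERK,7902,1980-12-17,800,null,20";
        // The mapper would emit the pair (year, 1) for this record.
        extractYear(record).ifPresent(y -> System.out.println(y + "\t1"));
    }
}
```

Returning an empty `Optional` for short or unparsable lines corresponds to the mapper simply emitting nothing for that record.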
Reducer phase:
1. Sum the values that share the same year to get the total number of hires for that year.
2. Emit the year together with its hire count.
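To make the reducer steps concrete, the sketch below imitates them in plain Java without Hadoop: the emitted years are first grouped by key, roughly what the framework's shuffle does, and each group is then summed. The class `YearCountReduce` and its methods are hypothetical names, and the four input years are made-up illustration values rather than the full data set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class YearCountReduce {
    /** Groups the emitted (year, 1) pairs by year, sorted by key. */
    static Map<String, List<Integer>> shuffle(List<String> emittedYears) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String year : emittedYears) {
            grouped.computeIfAbsent(year, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    /** Sums the values of a single key, mirroring one reduce() call. */
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("1981", "1987", "1981", "1980");
        shuffle(emitted).forEach((year, ones) ->
                System.out.println(year + "\t" + reduce(ones)));
    }
}
```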
Below is the Java code for the Mapper and the Reducer:
```java
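// EmployeeMapper.java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EmployeeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    // Compiled once: a four-digit year at the start of the hire date.
    private static final Pattern YEAR = Pattern.compile("^\\d{4}");
    private final Text year = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 5) {
            return; // skip blank or malformed lines
        }
        Matcher matcher = YEAR.matcher(fields[4].trim());
        if (matcher.find()) {
            year.set(matcher.group());
            context.write(year, ONE); // emit (year, 1)
        }
    }
}

// EmployeeReducer.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class EmployeeReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get(); // add up the 1s emitted for this year
        }
        context.write(key, new IntWritable(sum));
    }
}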
```
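Finally, in a driver class, register the Mapper and Reducer on a `Job` (`setMapperClass`, `setReducerClass`), set the output key and value types to `Text` and `IntWritable`, set the input and output paths with `FileInputFormat.addInputPath` and `FileOutputFormat.setOutputPath`, and call `job.waitForCompletion(true)`. Running the program then produces the number of employees hired in each year.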
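The counts the job should report for the sample data can be cross-checked locally. The following is a sketch in plain Java streams rather than MapReduce; the class `HireYearCheck` and the method `countByYear` are hypothetical names, and the records are copied from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class HireYearCheck {
    // The 14 sample records from the question.
    static final List<String> RECORDS = Arrays.asList(
        "7369,SMITH,CLERK,7902,1980-12-17,800,null,20",
        "7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30",
        "7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30",
        "7566,JONES,MANAGER,7839,1981-04-02,2975,null,20",
        "7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30",
        "7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30",
        "7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10",
        "7788,SCOTT,ANALYST,7566,1987-04-19,3000,null,20",
        "7839,KING,PRESIDENT,null,1981-11-17,5000,null,10",
        "7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30",
        "7876,ADAMS,CLERK,7788,1987-05-23,1100,null,20",
        "7900,JAMES,CLERK,7698,1981-12-03,950,null,30",
        "7902,FORD,ANALYST,7566,1981-12-02,3000,null,20",
        "7934,MILLER,CLERK,7782,1982-01-23,1300,null,10");

    /** Counts records per hire year, taken from the first four characters of field 5. */
    static Map<String, Long> countByYear(List<String> records) {
        return records.stream()
                .map(line -> line.split(","))
                // keep only rows whose fifth field looks like yyyy-MM-dd
                .filter(f -> f.length >= 5 && f[4].matches("\\d{4}-\\d{2}-\\d{2}"))
                .collect(Collectors.groupingBy(
                        f -> f[4].substring(0, 4), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Prints: 1980 1, 1981 10, 1982 1, 1987 2 (one tab-separated pair per line)
        countByYear(RECORDS).forEach((year, n) -> System.out.println(year + "\t" + n));
    }
}
```

For this input the MapReduce job should report the same four counts: 1 hire in 1980, 10 in 1981, 1 in 1982, and 2 in 1987.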