A student is to independently complete a project, "Salary Data Analysis for a Company". The project has two parts: 1. master the principles of HDFS and MapReduce; 2. use Eclipse to implement the department data-analysis tasks: (1) compute the total salary of each department; (2) compute the head count and average salary of each department; (3) find the name of the highest-paid employee in each department.

Data description: column 1 is the employee code, column 2 the employee name, column 3 the employee's department, column 4 the code of the employee's manager, column 5 the date the employee joined the company, column 6 the employee's salary, column 7 unknown (unused), column 8 the department code. Note that some records omit column 7, so the department code is always the last field of a record.

Sample data:
7369,SMITH,CLERK,7902,17-Dec-80,800,20
7499,ALLEN,SALESMAN,7698,20-Feb-81,1600,300,30
7521,WARD,SALESMAN,7698,22-Feb-81,1250,500,30
7566,JONES,MANAGER,7839,02-Apr-81,2975,20
7654,MARTIN,SALESMAN,7698,28-Sep-81,1250,1400,30

Requirements: master the MapReduce execution model and understand how the Map and Reduce phases work. Write a program that analyzes the company's salary data, and explain the main mapper and reducer code. Implement the following and give the result of each task:
(1) compute the total salary of each department;
(2) compute the head count and average salary of each department;
(3) find the name of the highest-paid employee in each department.
Posted: 2023-08-24 12:05:11 · Views: 67
Hadoop principle analysis
OK, I understand your question. Let me explain the MapReduce execution model and give the main program code for this task.
MapReduce is a distributed computing framework designed for processing large-scale data. It works as follows: the input data is first split into chunks, and each chunk is processed by a map task that emits a series of key-value pairs. All key-value pairs are then grouped by key (the shuffle phase), each group is processed by a reduce task, and the reduce output forms the final result.
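The map → shuffle → reduce flow described above can be illustrated with a small in-memory simulation in plain Java (no Hadoop required). This is only a sketch of the principle using a word-count example; `MiniMapReduce` and its methods are illustrative names, not part of the assignment:

```java
import java.util.*;

public class MiniMapReduce {
    // Map phase: each input line yields (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split("\\s+"))
            out.add(new AbstractMap.SimpleEntry<>(w, 1));
        return out;
    }

    // Shuffle: group all emitted values by key; Reduce: sum each group.
    static Map<String, Integer> run(String[] lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(new String[]{"a b a", "b c"}));  // → {a=2, b=2, c=1}
    }
}
```

In a real Hadoop job the map and reduce steps run in parallel on different nodes, and the shuffle moves data across the network, but the logical data flow is the same.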
For this task, we group the input records by department code and then perform the required computation on each group. The implementation is as follows:
1. Compute the total salary of each department
Mapper code:
```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SalaryMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    private Text outKey = new Text();
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] line = value.toString().split(",");
        // Some records omit the commission column, so the department code
        // must be read as the LAST field rather than at a fixed index 7.
        outKey.set(line[line.length - 1]);
        outValue.set(Double.parseDouble(line[5]));  // column 6: salary
        context.write(outKey, outValue);
    }
}
```
Reducer code:
```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SalaryReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double totalSalary = 0;
        for (DoubleWritable value : values) {
            totalSalary += value.get();  // sum all salaries for this department
        }
        outValue.set(totalSalary);
        context.write(key, outValue);
    }
}
```
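As a quick sanity check, the same group-and-sum logic can be run on the five sample records in plain Java without a cluster (a sketch; `DeptTotal` is an illustrative name). Department 20 should total 800 + 2975 = 3775, and department 30 should total 1600 + 1250 + 1250 = 4100:

```java
import java.util.*;

public class DeptTotal {
    static final String[] DATA = {
        "7369,SMITH,CLERK,7902,17-Dec-80,800,20",
        "7499,ALLEN,SALESMAN,7698,20-Feb-81,1600,300,30",
        "7521,WARD,SALESMAN,7698,22-Feb-81,1250,500,30",
        "7566,JONES,MANAGER,7839,02-Apr-81,2975,20",
        "7654,MARTIN,SALESMAN,7698,28-Sep-81,1250,1400,30",
    };

    // Same logic as the mapper/reducer pair above: key by the last
    // column (department code) and sum column 6 (salary).
    static Map<String, Double> totalByDept(String[] rows) {
        Map<String, Double> totals = new TreeMap<>();
        for (String row : rows) {
            String[] f = row.split(",");
            totals.merge(f[f.length - 1], Double.parseDouble(f[5]), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(totalByDept(DATA));  // → {20=3775.0, 30=4100.0}
    }
}
```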
2. Compute the head count and average salary of each department

Mapper code:
```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    private Text outKey = new Text();
    private DoubleWritable outValue = new DoubleWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] line = value.toString().split(",");
        // Emit the salary (not just a 1) so the reducer can count records
        // and sum salaries in a single pass.
        outKey.set(line[line.length - 1]);  // department code (last column)
        outValue.set(Double.parseDouble(line[5]));
        context.write(outKey, outValue);
    }
}
```
Reducer code:
```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountReducer extends Reducer<Text, DoubleWritable, Text, Text> {
    private Text outValue = new Text();

    @Override
    public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        double totalSalary = 0;
        for (DoubleWritable value : values) {
            count++;
            totalSalary += value.get();  // accumulate, so the average is not 0
        }
        outValue.set("count: " + count + ", average salary: " + (totalSalary / count));
        context.write(key, outValue);
    }
}
```
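The count-and-average logic can likewise be checked on the sample rows in plain Java (a sketch; `DeptAverage` is an illustrative name). On the five sample records, department 20 has 2 employees averaging 1887.5, and department 30 has 3 employees averaging about 1366.67:

```java
import java.util.*;

public class DeptAverage {
    static final String[] DATA = {
        "7369,SMITH,CLERK,7902,17-Dec-80,800,20",
        "7499,ALLEN,SALESMAN,7698,20-Feb-81,1600,300,30",
        "7521,WARD,SALESMAN,7698,22-Feb-81,1250,500,30",
        "7566,JONES,MANAGER,7839,02-Apr-81,2975,20",
        "7654,MARTIN,SALESMAN,7698,28-Sep-81,1250,1400,30",
    };

    // Mirrors the count/sum reducer: returns dept -> {head count, average salary}.
    static Map<String, double[]> stats(String[] rows) {
        Map<String, double[]> acc = new TreeMap<>();  // dept -> {count, salary sum}
        for (String row : rows) {
            String[] f = row.split(",");
            double[] a = acc.computeIfAbsent(f[f.length - 1], k -> new double[2]);
            a[0] += 1;
            a[1] += Double.parseDouble(f[5]);
        }
        for (double[] a : acc.values())
            a[1] /= a[0];  // turn the salary sum into an average in place
        return acc;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : stats(DATA).entrySet())
            System.out.println(e.getKey() + ": count=" + (int) e.getValue()[0]
                    + ", avg=" + e.getValue()[1]);
    }
}
```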
3. Find the name of the highest-paid employee in each department

Mapper code:
```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxSalaryMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Text outKey = new Text();
    private Text outValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] line = value.toString().split(",");
        outKey.set(line[line.length - 1]);  // department code (last column)
        // Emit "name,salary" together: if only the salary were emitted,
        // the reducer could not recover the employee's name.
        outValue.set(line[1] + "," + line[5]);
        context.write(outKey, outValue);
    }
}
```
Reducer code:
```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxSalaryReducer extends Reducer<Text, Text, Text, Text> {
    private Text outValue = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        double maxSalary = Double.NEGATIVE_INFINITY;
        String employeeName = "";
        for (Text value : values) {
            String[] parts = value.toString().split(",");
            double salary = Double.parseDouble(parts[1]);
            if (salary > maxSalary) {
                maxSalary = salary;
                employeeName = parts[0];  // keep the employee's name, not the key
            }
        }
        outValue.set(employeeName);
        context.write(key, outValue);
    }
}
```
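The max-salary logic can also be verified on the sample rows in plain Java (a sketch; `DeptMax` is an illustrative name). On the five sample records, the highest-paid employee is JONES (2975) in department 20 and ALLEN (1600) in department 30:

```java
import java.util.*;

public class DeptMax {
    static final String[] DATA = {
        "7369,SMITH,CLERK,7902,17-Dec-80,800,20",
        "7499,ALLEN,SALESMAN,7698,20-Feb-81,1600,300,30",
        "7521,WARD,SALESMAN,7698,22-Feb-81,1250,500,30",
        "7566,JONES,MANAGER,7839,02-Apr-81,2975,20",
        "7654,MARTIN,SALESMAN,7698,28-Sep-81,1250,1400,30",
    };

    // Mirrors the max-salary reducer: track the best (name, salary) per department.
    static Map<String, String> maxByDept(String[] rows) {
        Map<String, Double> best = new HashMap<>();
        Map<String, String> names = new TreeMap<>();
        for (String row : rows) {
            String[] f = row.split(",");
            String dept = f[f.length - 1];
            double sal = Double.parseDouble(f[5]);
            if (sal > best.getOrDefault(dept, Double.NEGATIVE_INFINITY)) {
                best.put(dept, sal);
                names.put(dept, f[1]);  // column 2: employee name
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(maxByDept(DATA));  // → {20=JONES, 30=ALLEN}
    }
}
```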
That is the main program code for this task. Packaged with a standard driver class, it can be run on a Hadoop cluster to produce the results for each of the three subtasks.