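Using the provided dataset, write a MapReduce Java program that computes the highest, lowest, and average score of each course (key: course name; value: max: xx, min: xx, avg: xx), export it as a jar, and upload it to the cluster to run (10 points)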
Time: 2024-03-09 11:46:27
Below is a MapReduce program that computes the highest, lowest, and average score of each course:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ScoreStatistics {

    // Mapper: emits (courseName, score) for every input line.
    // Assumes comma-separated lines where field 1 is the course name and field 2 is the score.
    public static class ScoreMapper extends Mapper<Object, Text, Text, DoubleWritable> {
        private final Text course = new Text();
        private final DoubleWritable score = new DoubleWritable();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                return; // skip blank or malformed lines instead of throwing
            }
            try {
                score.set(Double.parseDouble(fields[2].trim()));
            } catch (NumberFormatException e) {
                return; // skip a header row or a non-numeric score
            }
            course.set(fields[1].trim());
            context.write(course, score);
        }
    }

    // Reducer: receives all scores of one course and emits "max: xx, min: xx, avg: xx".
    public static class ScoreReducer extends Reducer<Text, DoubleWritable, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException {
            // Double.MIN_VALUE is the smallest POSITIVE double, not the most negative one,
            // so the running max/min start from the infinities instead.
            double maxScore = Double.NEGATIVE_INFINITY;
            double minScore = Double.POSITIVE_INFINITY;
            double sumScore = 0;
            int count = 0;
            for (DoubleWritable score : values) {
                double s = score.get();
                maxScore = Math.max(maxScore, s);
                minScore = Math.min(minScore, s);
                sumScore += s;
                count++;
            }
            // count >= 1 here: reduce() is only called for keys that received at least one value
            double avgScore = sumScore / count;
            String result = "max: " + maxScore + ", min: " + minScore + ", avg: " + avgScore;
            context.write(key, new Text(result));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: ScoreStatistics <input path> <output path>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Score Statistics");
        job.setJarByClass(ScoreStatistics.class);
        job.setMapperClass(ScoreMapper.class);
        job.setReducerClass(ScoreReducer.class);
        // The map output types (Text, DoubleWritable) differ from the final output types
        // (Text, Text), so both pairs must be declared explicitly.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DoubleWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
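The program consists of one Mapper and one Reducer. The Mapper extracts the course name and the score from each record and emits them as the key and value of its output pair. Between the two phases, the framework groups all values that share a key, so each call to the Reducer receives every score of one course; the Reducer then computes the highest, lowest, and average score and writes the result.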
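To check the per-course logic without a cluster, the map, group-by-key, and reduce steps described above can be simulated in plain Java. This is a minimal sketch, not part of the original answer: the class `LocalScoreStatistics`, its methods, and the sample rows are hypothetical, and it assumes, as `ScoreMapper` does, that the course name is the second comma-separated field and the score the third (Java 9+ for `List.of`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the ScoreStatistics job (no Hadoop dependency).
public class LocalScoreStatistics {

    // "Map" + "shuffle": parse each line and group scores by course name.
    // TreeMap keeps keys sorted, roughly like the sort done before the reduce phase.
    static Map<String, List<Double>> mapAndGroup(List<String> lines) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue; // skip malformed lines
            }
            try {
                double score = Double.parseDouble(fields[2].trim());
                grouped.computeIfAbsent(fields[1].trim(), k -> new ArrayList<>()).add(score);
            } catch (NumberFormatException e) {
                // skip a header row or a non-numeric score
            }
        }
        return grouped;
    }

    // "Reduce": same max/min/avg logic and output format as ScoreReducer.
    static String reduce(List<Double> scores) {
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        double sum = 0;
        for (double s : scores) {
            max = Math.max(max, s);
            min = Math.min(min, s);
            sum += s;
        }
        return "max: " + max + ", min: " + min + ", avg: " + (sum / scores.size());
    }

    // Runs the whole pipeline and returns course -> "max: xx, min: xx, avg: xx".
    static Map<String, String> run(List<String> lines) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : mapAndGroup(lines).entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows, including a header line that gets skipped.
        List<String> sample = List.of("id,course,score", "1,Math,90", "2,Math,60", "3,English,85");
        // Prints:
        // English<TAB>max: 85.0, min: 85.0, avg: 85.0
        // Math<TAB>max: 90.0, min: 60.0, avg: 75.0
        run(sample).forEach((course, stats) -> System.out.println(course + "\t" + stats));
    }
}
```

Printing key, tab, value mirrors how the real job's default text output lays out each result line.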
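The program takes two arguments: the first is the input data path and the second is the output path. The following commands compile the program, package it into a jar, upload the data, and run the job on the cluster: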
```
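$ mkdir -p ScoreStatistics
$ javac -classpath $(hadoop classpath) -d ScoreStatistics ScoreStatistics.java
$ jar -cvf ScoreStatistics.jar -C ScoreStatistics/ .
$ hdfs dfs -mkdir -p /input
$ hdfs dfs -put scores.csv /input    # scores.csv is a placeholder for the provided dataset file
$ hadoop jar ScoreStatistics.jar ScoreStatistics /input /output
$ hdfs dfs -cat /output/part-r-00000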
```
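Here `/input` is the HDFS path of the input data and `/output` is the path for the results. `/output` must not exist before the job starts; otherwise the job fails, because `FileOutputFormat` refuses to overwrite an existing output directory.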
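With the default `TextOutputFormat`, each line of the result files (`part-r-00000`, `part-r-00001`, and so on, one per reducer) is the key, a tab character, then the value. If another program needs to read the results back, a line can be parsed roughly as in this sketch. `ResultLineParser` and `CourseStats` are hypothetical names, and the parser assumes the default tab separator and exactly the `max: xx, min: xx, avg: xx` value format produced by `ScoreReducer`.

```java
// Hypothetical helper: parses one result line written by the default TextOutputFormat.
public class ResultLineParser {

    // The three statistics of one course.
    static final class CourseStats {
        final String course;
        final double max;
        final double min;
        final double avg;

        CourseStats(String course, double max, double min, double avg) {
            this.course = course;
            this.max = max;
            this.min = min;
            this.avg = avg;
        }
    }

    // Assumed line layout: <course>\tmax: <x>, min: <y>, avg: <z>
    static CourseStats parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator: " + line);
        }
        String course = line.substring(0, tab);
        String[] parts = line.substring(tab + 1).split(",\\s*");
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected value format: " + line);
        }
        return new CourseStats(course,
                numberAfter(parts[0], "max"),
                numberAfter(parts[1], "min"),
                numberAfter(parts[2], "avg"));
    }

    // Checks that a fragment looks like "label: number" and returns the number.
    private static double numberAfter(String part, String label) {
        String prefix = label + ": ";
        if (!part.startsWith(prefix)) {
            throw new IllegalArgumentException("expected '" + prefix + "' in: " + part);
        }
        return Double.parseDouble(part.substring(prefix.length()));
    }

    public static void main(String[] args) {
        CourseStats s = parse("Math\tmax: 90.0, min: 60.0, avg: 75.0");
        System.out.println(s.course + " max=" + s.max + " min=" + s.min + " avg=" + s.avg);
    }
}
```

Splitting on the first tab keeps the course name intact even if it contains commas; if the job sets a different output separator, the `indexOf('\t')` call would need to change accordingly.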