MapReduce: find the maximum value of one column in comma-separated data
Date: 2024-02-24 16:53:39
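Below is a MapReduce implementation that finds the maximum value of one column in a dataset and writes it out followed by a comma. The input is assumed to be CSV: each line consists of several fields separated by commas. The job computes the maximum of the third column, and the output is a single line of the form `max_value,`, that is, the maximum followed by a comma.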
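Mapper code:
```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxValueMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Every record shares one key so that a single reduce call sees all values.
    private static final Text OUT_KEY = new Text("max_value");
    private final IntWritable outValue = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 3) {
            return; // line has no third column
        }
        try {
            outValue.set(Integer.parseInt(fields[2].trim()));
        } catch (NumberFormatException e) {
            return; // skip header rows and non-integer values
        }
        context.write(OUT_KEY, outValue);
    }
}
```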
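The per-line extraction in the mapper can be tried without a Hadoop cluster. The following is a minimal sketch in plain Java, assuming that lines with fewer than three fields or a non-integer third field should simply be ignored; the class `ThirdColumnParser` and the method `parseThirdColumn` are hypothetical names used only for illustration.

```java
import java.util.OptionalInt;

public class ThirdColumnParser {
    // Hypothetical helper: returns the third field as an int, or an empty
    // OptionalInt if the line is too short or the field is not an integer.
    public static OptionalInt parseThirdColumn(String line) {
        String[] fields = line.split(",");
        if (fields.length < 3) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(fields[2].trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseThirdColumn("a,b,42"));        // OptionalInt[42]
        System.out.println(parseThirdColumn("id,name,score")); // OptionalInt.empty
        System.out.println(parseThirdColumn("a,b"));           // OptionalInt.empty
    }
}
```

Keeping the parsing rule in one small function makes it easy to decide what counts as a valid record before the value is emitted to the reducer.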
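Reducer code:
```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxValueReducer extends Reducer<Text, IntWritable, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        // TextOutputFormat writes only the key for a NullWritable value,
        // so the output line is exactly "<max>," with no tab separator.
        context.write(new Text(maxValue + ","), NullWritable.get());
    }
}
```
Driver code:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxValueDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "MaxValue");
        job.setJarByClass(MaxValueDriver.class);
        job.setMapperClass(MaxValueMapper.class);
        job.setReducerClass(MaxValueReducer.class);
        // Map output types differ from the final output types, so declare both.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```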
When running the program, pass the input path and the output path as arguments:
```bash
$ hadoop jar MaxValue.jar MaxValueDriver input output
```
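Here `MaxValue.jar` is the packaged program, `MaxValueDriver` is the entry-point class, `input` is the path of the input data, and `output` is the path for the results. The output directory must not already exist, because Hadoop refuses to start a job whose output path is already present.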
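To know what result to expect before submitting the job, the same computation can be reproduced locally on a few sample lines. This is only a sketch in plain Java, not part of the job itself: the class `LocalMaxCheck`, the method `maxOfThirdColumn`, and the sample rows are all made up for illustration, and the output format follows the `max_value,` form stated at the top.

```java
import java.util.Arrays;
import java.util.List;

public class LocalMaxCheck {
    // Mirrors the job's logic: read the third field of each line, keep the
    // largest value, and format the result as "<max>,".
    // Assumes every line with at least three fields has an integer third field.
    public static String maxOfThirdColumn(List<String> lines) {
        int max = Integer.MIN_VALUE;
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length >= 3) {
                max = Math.max(max, Integer.parseInt(fields[2].trim()));
            }
        }
        return max + ",";
    }

    public static void main(String[] args) {
        // Hypothetical sample rows: id, name, score
        List<String> sample = Arrays.asList("1,alice,30", "2,bob,45", "3,carol,27");
        System.out.println(maxOfThirdColumn(sample)); // prints 45,
    }
}
```

Comparing this value with the contents of the job's output file is a quick way to confirm that the column index and the output format are what was intended.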