Complete MapReduce code for counting the distinct phones that access each network
Time: 2024-02-22 18:09:51 Views: 79
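The following is a complete MapReduce example that counts, with duplicates removed, how many phones access each network. Each input line holds a network type and a phone number separated by a tab.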
Mapper:
```java
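import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (network, phone) so that the shuffle groups all phones by network.
public class AccessMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input line is expected to be "network<TAB>phone".
        String[] fields = value.toString().split("\t");
        if (fields.length < 2) {
            return; // skip malformed lines
        }
        String network = fields[0]; // network type
        String phone = fields[1];   // phone number
        outKey.set(network);
        outValue.set(phone);
        context.write(outKey, outValue);
    }
}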
```
Reducer:
```java
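import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AccessReducer extends Reducer<Text, Text, Text, IntWritable> {
    private final IntWritable outValue = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // A HashSet drops duplicates, so its size is the distinct count for this key.
        Set<String> distinct = new HashSet<>();
        for (Text value : values) {
            // Hadoop reuses the Text instance across iterations, so copy it to a String.
            distinct.add(value.toString());
        }
        outValue.set(distinct.size());
        context.write(key, outValue);
    }
}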
```
Driver:
```java
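import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AccessDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Access Count");
        job.setJarByClass(AccessDriver.class);
        job.setMapperClass(AccessMapper.class);
        job.setReducerClass(AccessReducer.class);
        // Map output is (Text, Text); final output is (Text, IntWritable).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}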
```
To use it, package the code above into a jar and submit the job with the following command:
```shell
$HADOOP_HOME/bin/hadoop jar access.jar AccessDriver /input /output
```
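Here `access.jar` is the packaged jar, `/input` is the directory that holds the input data, and `/output` is the directory that receives the output. The output directory must not already exist, otherwise the job fails at submission.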
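
To sanity-check the expected result before running on a cluster, the same group, deduplicate, and count logic can be sketched in plain Java without any Hadoop dependency. The class name `DistinctPhoneCountSketch`, the method `countDistinctPhones`, and the sample lines are hypothetical and exist only for illustration; this is a rough local model of the job, not a replacement for it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DistinctPhoneCountSketch {
    // Mirrors the job: group phones by network, deduplicate, then count.
    static Map<String, Integer> countDistinctPhones(List<String> lines) {
        Map<String, Set<String>> phonesByNetwork = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue; // skip malformed lines
            }
            // "Map" step: key = network, value = phone.
            phonesByNetwork.computeIfAbsent(fields[0], k -> new HashSet<>()).add(fields[1]);
        }
        // "Reduce" step: the size of each set is the distinct phone count.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : phonesByNetwork.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input in "network<TAB>phone" form.
        List<String> lines = Arrays.asList(
                "4G\t13800000001",
                "4G\t13800000002",
                "4G\t13800000001",
                "WiFi\t13800000001");
        System.out.println(countDistinctPhones(lines)); // prints {4G=2, WiFi=1}
    }
}
```

The phone `13800000001` appears twice under `4G` but is counted once, which is the behavior the job's output file should show for the same input.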