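Write Hadoop code: given the per-bit data distribution of each file, aggregate the distributions, then find the Top 10 entries in the aggregated distribution together with the files they come from.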

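Below is a Hadoop MapReduce implementation. It assumes each input line is a comma-separated list of counts, where the value at position i is the count for bit i + 1.

Mapper1:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class BitCountMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
            return; // nothing to parse on a blank line
        }
        // The input split (and so the file name) is only visible on the map side.
        String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
        String[] tokens = line.split(",");
        for (int i = 0; i < tokens.length; i++) {
            // Position i holds the count for bit i + 1; tag it with the source file.
            context.write(new IntWritable(i + 1), new Text(tokens[i].trim() + "\t" + fileName));
        }
    }
}
```

Reducer1:

```java
import java.io.IOException;
import java.util.TreeSet;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class BitCountReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
    @Override
    protected void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int totalCount = 0;
        TreeSet<String> files = new TreeSet<>(); // distinct, sorted file names
        for (Text value : values) {
            String[] parts = value.toString().split("\t", 2); // count, file name
            totalCount += Integer.parseInt(parts[0]);
            files.add(parts[1]);
        }
        // Output line: bit <TAB> totalCount <TAB> file1,file2,...
        context.write(key, new Text(totalCount + "\t" + String.join(",", files)));
    }
}
```

Mapper2:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Top10Mapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Job 1 wrote: bit <TAB> totalCount <TAB> file1,file2,...
        String[] tokens = value.toString().split("\t", 3);
        int count = Integer.parseInt(tokens[1].trim());
        // Key by count; keep the bit and its files together as the value.
        context.write(new IntWritable(count), new Text(tokens[0].trim() + "\t" + tokens[2]));
    }
}
```

Reducer2:

```java
import java.io.IOException;
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.PriorityQueue;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Top10Reducer extends Reducer<IntWritable, Text, IntWritable, Text> {
    private static final int TOP_N = 10;
    private int n;
    // Min-heap on the count: the head is the smallest entry currently kept.
    private PriorityQueue<Map.Entry<Integer, String>> queue;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        n = context.getConfiguration().getInt("n", TOP_N);
        queue = new PriorityQueue<>(n + 1, (a, b) -> Integer.compare(a.getKey(), b.getKey()));
    }

    @Override
    protected void reduce(IntWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Copy the contents: Hadoop reuses the key and value objects.
            queue.offer(new AbstractMap.SimpleEntry<>(key.get(), value.toString()));
            if (queue.size() > n) {
                queue.poll(); // drop the smallest so only the top n remain
            }
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // The heap yields ascending counts; push onto a stack to emit largest first.
        Deque<Map.Entry<Integer, String>> ordered = new ArrayDeque<>();
        while (!queue.isEmpty()) {
            ordered.push(queue.poll());
        }
        for (Map.Entry<Integer, String> entry : ordered) {
            // Output line: totalCount <TAB> bit <TAB> file1,file2,...
            context.write(new IntWritable(entry.getKey()), new Text(entry.getValue()));
        }
    }
}
```

Driver:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Top10BitCount {
    public static void main(String[] args) throws Exception {
        // args: <input dir> <job 1 output dir> <job 2 output dir>
        Configuration conf1 = new Configuration();
        Job job1 = Job.getInstance(conf1, "BitCount");
        job1.setJarByClass(Top10BitCount.class);
        job1.setMapperClass(BitCountMapper.class);
        job1.setReducerClass(BitCountReducer.class);
        job1.setOutputKeyClass(IntWritable.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path(args[1]));
        if (!job1.waitForCompletion(true)) {
            System.exit(1); // do not start job 2 on incomplete output
        }

        Configuration conf2 = new Configuration();
        conf2.setInt("n", 10);
        Job job2 = Job.getInstance(conf2, "Top10");
        job2.setJarByClass(Top10BitCount.class);
        job2.setMapperClass(Top10Mapper.class);
        job2.setReducerClass(Top10Reducer.class);
        job2.setNumReduceTasks(1); // a global Top N needs a single reducer
        job2.setOutputKeyClass(IntWritable.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job2, new Path(args[1]));
        FileOutputFormat.setOutputPath(job2, new Path(args[2]));
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}
```

The code above aggregates the per-bit counts across all input files, then finds the Top 10 bits by total count together with the files that contributed to each. Mapper1 and Reducer1 perform the aggregation: only a mapper can see its input split, so Mapper1 tags every count with its source file name and Reducer1 carries the file list forward. Mapper2 and Reducer2 select and output the Top 10, which requires the second job to run with a single reducer. Finally, the Driver chains the two jobs, starting the second only if the first succeeds.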
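The Top 10 selection in Reducer2 relies on a bounded min-heap, and that part can be tried without a cluster. The following is a minimal sketch in plain Java under stated assumptions: the class name `TopNSketch`, the method `topN`, and the use of string labels such as `"bit3"` are hypothetical and chosen only for illustration; they are not part of the answer's code or of any Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, for illustration only (not part of Hadoop).
class TopNSketch {
    /** Returns the n entries with the largest counts, largest first. */
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        // Min-heap on the count: the head is the smallest entry currently kept.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Store a copy so the heap does not depend on the map's own entry objects.
            heap.offer(new AbstractMap.SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
            if (heap.size() > n) {
                heap.poll(); // evict the smallest, so at most n entries remain
            }
        }
        // The heap's internal order is not sorted; sort the survivors descending.
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

For example, with totals `bit1=40`, `bit2=75`, `bit3=90`, `bit4=12` and `n = 2`, `topN` returns `bit3` followed by `bit2`. The heap never holds more than n + 1 entries, so memory stays bounded by n regardless of how many totals are scanned, which is why the same idea suits a single reducer.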
