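Using a U.S. population statistics data file and the MapReduce algorithm, complete the following tasks: 1) compute the total population of each state for each year in the data file; 2) compute the population density of every state for 2008 and sort the results in descending order.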

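Both tasks can be handled with Hadoop's MapReduce framework. The code below assumes each input line has the layout `year,state,population`; adjust the field indices if your file is laid out differently. The steps are as follows:

1. Compute the total population of each state for each year

First, write a Mapper that turns each line of the data file into a key-value pair <year_state, population>, so that records are grouped by year and state. Example code:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PopulationMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final Text outputKey = new Text();
    private final LongWritable outputValue = new LongWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed line layout: year,state,population
        String[] fields = value.toString().split(",");
        if (fields.length < 3) {
            return; // skip blank or truncated lines
        }
        long population;
        try {
            population = Long.parseLong(fields[2].trim());
        } catch (NumberFormatException e) {
            return; // skip a header row or a non-numeric population
        }
        outputKey.set(fields[0].trim() + "_" + fields[1].trim());
        outputValue.set(population);
        context.write(outputKey, outputValue);
    }
}
```

Next, write a Reducer. The framework groups the Mapper output by key, so each call receives all population values for one year and state; adding them up gives the total population of that state in that year. Example code:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class PopulationReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable outputValue = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get(); // add up every record for this year_state key
        }
        outputValue.set(sum);
        context.write(key, outputValue);
    }
}
```

Finally, the Driver configures the job's input and output paths and its Mapper and Reducer classes, then submits the job to the Hadoop cluster. Example code:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PopulationJob {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "PopulationJob");
        job.setJarByClass(PopulationJob.class);
        job.setMapperClass(PopulationMapper.class);
        job.setReducerClass(PopulationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input file or directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

2. Compute the 2008 population density of every state, sorted in descending order

As before, start with a Mapper. This one keeps only the 2008 records and outputs the key-value pair <state, population>. Example code:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DensityMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final Text outputKey = new Text();
    private final LongWritable outputValue = new LongWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed line layout: year,state,population
        String[] fields = value.toString().split(",");
        if (fields.length < 3 || !"2008".equals(fields[0].trim())) {
            return; // keep 2008 records only
        }
        try {
            outputValue.set(Long.parseLong(fields[2].trim()));
        } catch (NumberFormatException e) {
            return; // skip a non-numeric population
        }
        outputKey.set(fields[1].trim());
        context.write(outputKey, outputValue);
    }
}
```

Then write a Reducer that sums the population of each state and divides the total by the state's area. Population density is total population divided by area, so the area of each state has to come from somewhere; dividing by a fixed constant such as 1,000,000 would not give a density. The sketch below assumes the areas are passed in through the job configuration as properties named `area.<state>` (a naming convention made up for this example). If your data file already contains an area column, read it in the Mapper instead. Example code:

```java
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DensityReducer extends Reducer<Text, LongWritable, Text, DoubleWritable> {
    private final DoubleWritable outputValue = new DoubleWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable value : values) {
            sum += value.get();
        }
        // Assumption: the driver supplies each state's area (e.g. in square miles)
        // as a configuration property named area.<state>.
        double area = context.getConfiguration().getDouble("area." + key.toString(), 0.0);
        if (area <= 0) {
            return; // no area known for this state, so no density can be computed
        }
        outputValue.set(sum / area); // density = total population / area
        context.write(key, outputValue);
    }
}
```

Finally, the Driver runs two jobs in sequence: the first computes the densities, and the second reads that output and sorts it by value in descending order. Because the Mapper and Reducer output types differ in both jobs, the map output classes must be set explicitly, and the sort job uses a single reducer so that the result is one globally ordered file. Example code:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DensityJob {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Assumption: args[3] is a local Hadoop XML configuration file that
        // defines one area.<state> property per state.
        conf.addResource(new Path(args[3]));

        Job job = Job.getInstance(conf, "DensityJob");
        job.setJarByClass(DensityJob.class);
        job.setMapperClass(DensityMapper.class);
        job.setReducerClass(DensityReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // unsorted densities
        if (!job.waitForCompletion(true)) {
            System.exit(1); // do not sort if the density job failed
        }

        Job sortJob = Job.getInstance(conf, "DensitySortJob");
        sortJob.setJarByClass(DensityJob.class);
        sortJob.setMapperClass(InverseMapper.class);
        sortJob.setReducerClass(DescendingSortReducer.class);
        sortJob.setNumReduceTasks(1); // one reducer yields one globally sorted file
        sortJob.setMapOutputKeyClass(DoubleWritable.class);
        sortJob.setMapOutputValueClass(Text.class);
        sortJob.setOutputKeyClass(Text.class);
        sortJob.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(sortJob, new Path(args[1]));
        FileOutputFormat.setOutputPath(sortJob, new Path(args[2])); // sorted result
        System.exit(sortJob.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the descending sort also needs an InverseMapper class and a DescendingSortReducer class. MapReduce sorts keys in ascending order, so the Mapper emits the negated density as the key and the Reducer negates it again to restore the original value. This InverseMapper is our own class, not the one of the same name that ships with Hadoop. Example code:

```java
// Each public class goes in its own .java file; both are shown together for brevity.
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class InverseMapper extends Mapper<LongWritable, Text, DoubleWritable, Text> {
    private final DoubleWritable outputKey = new DoubleWritable();
    private final Text outputValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The first job writes lines of the form state<TAB>density
        String[] fields = value.toString().split("\t");
        if (fields.length < 2) {
            return; // skip malformed lines
        }
        outputKey.set(-Double.parseDouble(fields[1])); // negate so ascending order becomes descending
        outputValue.set(fields[0]);
        context.write(outputKey, outputValue);
    }
}

public class DescendingSortReducer extends Reducer<DoubleWritable, Text, Text, DoubleWritable> {
    private final DoubleWritable outputValue = new DoubleWritable();

    @Override
    protected void reduce(DoubleWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        outputValue.set(-key.get()); // negate again to restore the original density
        for (Text state : values) {
            context.write(state, outputValue);
        }
    }
}
```

That completes both tasks. I hope the code helps!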
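Before submitting anything to a cluster, it can help to check the grouping, summing, and descending-sort logic on a few lines in plain Java. The following is only a minimal sketch, not part of the Hadoop jobs: the class `LocalPopulationCheck`, its method names, the `year,state,population` layout without a header row, and the sample states and numbers are all invented for illustration. The areas are passed in as a map because population density needs each state's area.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalPopulationCheck {

    // Task 1: sum the population for every "year_state" key.
    public static Map<String, Long> totalByYearAndState(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 3) {
                continue; // skip blank or truncated lines
            }
            String key = fields[0].trim() + "_" + fields[1].trim();
            totals.merge(key, Long.parseLong(fields[2].trim()), Long::sum);
        }
        return totals;
    }

    // Task 2: density = total population / area for one year, highest first.
    public static List<Map.Entry<String, Double>> densityDescending(
            List<String> lines, String year, Map<String, Double> areaByState) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 3 || !year.equals(fields[0].trim())) {
                continue; // keep only the requested year
            }
            totals.merge(fields[1].trim(), Long.parseLong(fields[2].trim()), Long::sum);
        }
        List<Map.Entry<String, Double>> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            Double area = areaByState.get(e.getKey());
            if (area == null || area <= 0) {
                continue; // no usable area, so no density
            }
            result.add(Map.entry(e.getKey(), e.getValue() / area));
        }
        result.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        // Invented sample values, not real census figures.
        List<String> sample = List.of(
                "2007,StateA,900",
                "2008,StateA,600",
                "2008,StateA,400",
                "2008,StateB,300");
        Map<String, Double> area = Map.of("StateA", 100.0, "StateB", 10.0);
        System.out.println(totalByYearAndState(sample));
        System.out.println(densityDescending(sample, "2008", area));
    }
}
```

With this sample, StateB ranks ahead of StateA for 2008 because 300 / 10 = 30 is greater than 1000 / 100 = 10, even though StateA has the larger population.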
