Java code example: using MapReduce to read a txt file and load it into Hive
Reading a text file with the Hadoop MapReduce framework in Java and loading the result into a Hive table generally involves the following steps:
First, the Mapper class reads the file contents and turns each line into a key-value pair. Assume the text file contains colon-separated pairs, one per line, such as `key1:value1`, `key2:value2`, and so on.
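For instance, an input file in this assumed format might look like:

```
key1:value1
key2:value2
key3:value3
```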
```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TextFileToHiveMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Each input line is expected to look like "key1:value1"
        String[] parts = line.toString().split(":", 2);
        if (parts.length != 2) {
            return; // skip malformed lines
        }
        outKey.set(parts[0].trim());
        outValue.set(parts[1].trim());
        context.write(outKey, outValue); // hand the pair to the reduce phase
    }
}
```
Next, the Reducer formats each pair as a single delimited line that Hive can read directly, such as `key1,value1`, and writes it to the job output.
```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TextFileToHiveReducer extends Reducer<Text, Text, Text, NullWritable> {

    private final Text outLine = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Emit one "key,value" line per pair; a comma-delimited
        // Hive table can read these lines directly
        for (Text val : values) {
            outLine.set(key.toString() + "," + val.toString());
            context.write(outLine, NullWritable.get());
        }
    }
}
```
Finally, configure the Job and run it:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TextFileToHiveJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "TextFileToHive");
        job.setJarByClass(TextFileToHiveJob.class);
        job.setMapperClass(TextFileToHiveMapper.class);
        job.setReducerClass(TextFileToHiveReducer.class);
        job.setMapOutputValueClass(Text.class); // the map phase emits Text values
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // The output directory will later back the Hive table
        FileInputFormat.addInputPath(job, new Path("hdfs://.../input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://.../output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Note: this is only a basic example. Real applications usually need additional logic such as error handling, delimiter selection, and field mapping. You also need a working Hadoop and Hive installation that can interact: package the classes into a jar, submit it with `hadoop jar`, and make sure the job output lands where Hive can read it.
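Once the job output is in HDFS, one way to expose it to Hive is as an external table. Below is a minimal sketch using the Hive JDBC driver; the HiveServer2 URL, credentials, table name `kv_table`, and column names are assumptions for illustration, not part of the job above:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RegisterHiveTable {
    public static void main(String[] args) throws Exception {
        // Assumed HiveServer2 endpoint and credentials; adjust for your cluster
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Point an external table at the MapReduce output directory,
            // using the same comma delimiter the reducer wrote
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS kv_table ("
                    + "k STRING, v STRING) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                    + "STORED AS TEXTFILE "
                    + "LOCATION 'hdfs://.../output'");
        }
    }
}
```

Alternatively, you can create the table once in the Hive CLI and move the job output into it with `LOAD DATA INPATH`.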