Garbled Hadoop output files
Posted: 2023-07-07 22:27:11 Views: 222
Garbled Hadoop output files are usually caused by an encoding mismatch, not by the output format itself: Hadoop's `Text` type always serializes as UTF-8, so `TextOutputFormat` writes UTF-8 bytes, and there is no `mapreduce.output.fileoutputformat.encoding` property to set. (Shuffle and memory tuning properties such as `mapreduce.task.io.sort.mb` or `mapreduce.reduce.shuffle.parallelcopies` have no effect on character encoding, and setting a compression codec while `mapreduce.output.fileoutputformat.compress` is `false` does nothing.) A typical job setup for plain-text UTF-8 output looks like this:
```
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setOutputFormatClass(TextOutputFormat.class);
// Separator between key and value in each output line.
job.getConfiguration().set("mapreduce.output.textoutputformat.separator", ",");
// Disable output compression so the files can be inspected as plain text.
job.getConfiguration().setBoolean("mapreduce.output.fileoutputformat.compress", false);
FileOutputFormat.setOutputPath(job, outputPath);
```
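If the input data itself was written in another charset (GBK is common for Chinese data), decoding it as UTF-8 garbles the text before it ever reaches the output. A minimal sketch of the fix, re-decoding the raw bytes with the source charset; the class name, the `GBK` assumption, and the sample string are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkToUtf8 {
    // Re-decode raw bytes produced by a GBK-encoded source.
    // The (array, 0, length) form mirrors Hadoop's Text, whose
    // getBytes() returns a backing array longer than getLength().
    public static String decodeGbk(byte[] raw, int length) {
        return new String(raw, 0, length, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String original = "你好,Hadoop"; // assumed sample line from a GBK file
        byte[] gbkBytes = original.getBytes(Charset.forName("GBK"));
        // Decoding GBK bytes as UTF-8 garbles the Chinese characters:
        String wrong = new String(gbkBytes, StandardCharsets.UTF_8);
        // Decoding with the correct source charset recovers the text:
        String right = decodeGbk(gbkBytes, gbkBytes.length);
        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```

Inside a mapper the same idea applies: decode the incoming line with `new String(value.getBytes(), 0, value.getLength(), "GBK")`, then emit ordinary `Text` objects, which are written as UTF-8.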
If the output still appears garbled, check the read side as well: open the file with an explicitly specified charset instead of the platform default, for example:
```
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
```