Reading and Writing Parquet Files in Java
Date: 2024-02-27 17:50:51
Java can read and write Parquet files using the Apache Parquet library. The example code below shows how to read and write a Parquet file via its Avro binding:
1. First, make sure the Apache Parquet dependency is on your classpath. In a Maven project, add the following (note that parquet-avro also requires the Hadoop client libraries, e.g. hadoop-common, at runtime):
```xml
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-avro</artifactId>
    <version>1.12.0</version>
</dependency>
```
2. Reading a Parquet file:
```java
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class ParquetReaderExample {
    public static void main(String[] args) {
        Path path = new Path("path/to/parquet/file.parquet");
        // try-with-resources closes the reader even if an exception is thrown
        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(path).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                // Process each row here
                System.out.println(record);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
3. Writing a Parquet file:
```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetWriterExample {
    public static void main(String[] args) {
        try {
            Path path = new Path("path/to/output/file.parquet");
            // Load the Avro schema that describes the records being written
            Schema schema = new Schema.Parser().parse(new File("path/to/schema.avsc"));
            try (ParquetWriter<GenericRecord> writer =
                    AvroParquetWriter.<GenericRecord>builder(path)
                            .withSchema(schema)
                            .withCompressionCodec(CompressionCodecName.SNAPPY)
                            .build()) {
                GenericRecord record = new GenericData.Record(schema);
                // Set the record's field values
                record.put("field1", "value1");
                record.put("field2", 123);
                writer.write(record);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
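The writer example loads its schema from `path/to/schema.avsc`. As an illustration (the record name and field types here are assumptions chosen to match the `field1`/`field2` values set above), that file could look like:

```json
{
  "type": "record",
  "name": "Example",
  "fields": [
    {"name": "field1", "type": "string"},
    {"name": "field2", "type": "int"}
  ]
}
```

Alternatively, `new Schema.Parser().parse(String)` accepts the same JSON as a string, which avoids the external file entirely.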
Note that the paths in the examples above need to be replaced with real ones. Also, when reading a Parquet file you may need to process each record according to its actual Avro schema.