Flink: write to upsert-kafka without emitting delete records

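When the sink is the `upsert-kafka` connector, each change is encoded according to its `RowKind`. `INSERT` and `UPDATE_AFTER` rows are written as normal Kafka messages, and `DELETE` rows are written as messages with a `null` value (a tombstone for that key). The producer and the serializer cannot turn this off: `FlinkKafkaProducer.Semantic` has no `UPSERT` value (it only offers `EXACTLY_ONCE`, `AT_LEAST_ONCE` and `NONE`), and `JsonRowSerializationSchema` has no option for skipping deletes. To keep delete records out of the topic, filter them out before the sink. In the DataStream API this looks like:

```java
import org.apache.flink.formats.json.JsonRowSerializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;

// Keep only INSERT and UPDATE_AFTER rows; DELETE rows never reach the sink
DataStream<Row> upserts = dataStream.filter(
        row -> row.getKind() == RowKind.INSERT || row.getKind() == RowKind.UPDATE_AFTER);
// Create the FlinkKafkaProducer (its Semantic enum has no UPSERT value)
FlinkKafkaProducer<Row> kafkaProducer = new FlinkKafkaProducer<>(
        "topic",
        JsonRowSerializationSchema.builder().withTypeInfo(rowTypeInfo).build(), // Row -> JSON
        kafkaProducerConfig);
// Write the filtered stream to Kafka
upserts.addSink(kafkaProducer);
```

Here `dataStream` is a changelog stream of `Row`s (for example the result of `tableEnv.toChangelogStream(table)`), `rowTypeInfo` is its `TypeInformation<Row>`, and `kafkaProducerConfig` holds the producer `Properties`. If the sink is an `upsert-kafka` table in SQL, apply the same filter between `toChangelogStream(...)` and `fromChangelogStream(..., ChangelogMode.upsert())` (declaring the primary key in the `Schema`), then insert the resulting table into the sink. Either way the topic receives only insert and update records and no delete (tombstone) records. As a consequence, downstream consumers no longer learn that a key was deleted.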
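The following plain-Java model (no Flink dependency) illustrates where delete records come from and why dropping them upstream removes them. It follows the upsert-kafka connector documentation: the sink writes `INSERT` and `UPDATE_AFTER` changes as normal keyed messages and `DELETE` changes as keyed messages with a `null` value. `ChangeKind`, `Change`, `KafkaRecord`, `encode` and `write` are hypothetical names made up for this sketch. They mirror, but are not, Flink's `RowKind` or the connector's internal serializer. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class UpsertKafkaModel {

    // The three change kinds of an upsert changelog (a stand-in for Flink's RowKind, not the real class)
    public enum ChangeKind { INSERT, UPDATE_AFTER, DELETE }

    // One change for a primary key; value is the new row payload (null for DELETE)
    public record Change(ChangeKind kind, String key, String value) {}

    // A Kafka message as the sink would write it; value == null means tombstone
    public record KafkaRecord(String key, String value) {}

    // Model of the upsert-kafka encoding: DELETE becomes a tombstone, other kinds a normal message
    public static KafkaRecord encode(Change c) {
        return c.kind() == ChangeKind.DELETE
                ? new KafkaRecord(c.key(), null)
                : new KafkaRecord(c.key(), c.value());
    }

    // Encode a changelog, optionally dropping DELETE changes before they reach the "sink"
    public static List<KafkaRecord> write(List<Change> changelog, boolean dropDeletes) {
        List<KafkaRecord> out = new ArrayList<>();
        for (Change c : changelog) {
            if (dropDeletes && c.kind() == ChangeKind.DELETE) {
                continue; // filtered out upstream, so no tombstone is produced
            }
            out.add(encode(c));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Change> changelog = List.of(
                new Change(ChangeKind.INSERT, "id1", "{\"cnt\":1}"),
                new Change(ChangeKind.UPDATE_AFTER, "id1", "{\"cnt\":2}"),
                new Change(ChangeKind.DELETE, "id1", null));

        // Without the filter: three records, the last one a tombstone (value=null)
        System.out.println(write(changelog, false));
        // With the filter: only the insert and the update remain
        System.out.println(write(changelog, true));
    }
}
```

The trade-off is visible in the second list. Once deletes are dropped, a consumer of a compacted topic keeps the last value of a deleted key indefinitely, so this is only appropriate when downstream systems do not need to observe deletions.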

Please write a Java demo of Flink consuming from Kafka and writing to a Hive cluster

Below is a simple Java demo that uses Flink to consume a Kafka topic and write the records into a Hive table. Rather than writing files through the Hadoop and Hive APIs directly, it registers the Hive Metastore as a `HiveCatalog` and lets Flink's Hive connector do the writing through the Table API. It assumes the Flink Hive connector and Kafka SQL connector jars (matching your Flink and Hive versions) are on the classpath, and that the `hive-site.xml` in `HIVE_CONF_DIR` sets `hive.metastore.uris` (for example `thrift://localhost:9083`). Adapt the names to your environment and add error handling as needed.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.SqlDialect;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class FlinkKafkaHiveDemo {
    private static final String KAFKA_BOOTSTRAP_SERVERS = "localhost:9092";
    private static final String KAFKA_TOPIC = "test";
    private static final String HIVE_CATALOG = "myhive";
    private static final String HIVE_DATABASE = "default";
    private static final String HIVE_TABLE = "test";
    // Assumed directory containing hive-site.xml (sets hive.metastore.uris = thrift://localhost:9083)
    private static final String HIVE_CONF_DIR = "/etc/hive/conf";

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The streaming Hive sink only commits finished files when a checkpoint completes
        env.enableCheckpointing(60_000L);
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Register the Hive Metastore as a catalog and make it the current catalog and database
        HiveCatalog hiveCatalog = new HiveCatalog(HIVE_CATALOG, HIVE_DATABASE, HIVE_CONF_DIR);
        tEnv.registerCatalog(HIVE_CATALOG, hiveCatalog);
        tEnv.useCatalog(HIVE_CATALOG);
        tEnv.useDatabase(HIVE_DATABASE);

        // Create the external Hive table using Hive DDL syntax
        tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
        tEnv.executeSql("CREATE EXTERNAL TABLE IF NOT EXISTS " + HIVE_TABLE + " (col1 STRING)"
                + " STORED AS TEXTFILE"
                + " LOCATION '/user/hive/warehouse/" + HIVE_DATABASE + ".db/" + HIVE_TABLE + "'");

        // Switch back to Flink's default dialect for the Kafka source and the INSERT
        tEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
        tEnv.executeSql("CREATE TEMPORARY TABLE kafka_source (col1 STRING) WITH ("
                + " 'connector' = 'kafka',"
                + " 'topic' = '" + KAFKA_TOPIC + "',"
                + " 'properties.bootstrap.servers' = '" + KAFKA_BOOTSTRAP_SERVERS + "',"
                + " 'properties.group.id' = 'flink_consumer',"
                + " 'scan.startup.mode' = 'earliest-offset',"
                + " 'format' = 'raw')"); // each Kafka message value becomes one STRING column

        // Put data transformations in the SELECT; await() blocks until the streaming job ends
        tEnv.executeSql("INSERT INTO " + HIVE_TABLE + " SELECT col1 FROM kafka_source").await();
    }
}
```
