Using Flink 1.14.6 to index data into Elasticsearch 7.17 with a custom schema and configurable shard and replica counts: a complete example program
Read data from a file and save it to Elasticsearch, using the Flink framework
First, add the Flink Elasticsearch connector dependency (the `_2.12` suffix is the Scala version the Flink distribution is built against):
```xml
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-elasticsearch7_2.12</artifactId>
    <version>1.14.6</version>
</dependency>
```
Then, the following program first creates the index with the custom schema (mapping) and the desired shard/replica counts through the Elasticsearch REST high-level client (pulled in transitively by the connector), and then streams data into it with an `ElasticsearchSink`:
```java
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.settings.Settings;

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexToElasticsearchWithSchema {

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final String indexName = params.get("indexName", "my_index");
        final String esHost = params.get("esHost", "localhost");
        final int esPort = params.getInt("esPort", 9200);
        final int numberOfShards = params.getInt("numberOfShards", 3);
        final int numberOfReplicas = params.getInt("numberOfReplicas", 1);
        final int parallelism = params.getInt("parallelism", 1);

        // Step 1: create the index up front with a custom schema (mapping)
        // and explicit shard/replica counts.
        createIndexWithSchema(indexName, esHost, esPort, numberOfShards, numberOfReplicas);

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(parallelism);

        DataStream<Map<String, Object>> dataStream = env.fromElements(
                document("Alice"), document("Bob"), document("Charlie"));

        // Step 2: build the Elasticsearch sink.
        List<HttpHost> httpHosts = Collections.singletonList(new HttpHost(esHost, esPort, "http"));
        ElasticsearchSink.Builder<Map<String, Object>> esSinkBuilder = new ElasticsearchSink.Builder<>(
                httpHosts,
                new ElasticsearchSinkFunction<Map<String, Object>>() {
                    @Override
                    public void process(Map<String, Object> element, RuntimeContext ctx, RequestIndexer indexer) {
                        indexer.add(createIndexRequest(indexName, element));
                    }
                });

        // Flush a bulk request after 100 actions, 10 MB, or 1 second,
        // whichever comes first; back off exponentially when requests are rejected.
        esSinkBuilder.setBulkFlushMaxActions(100);
        esSinkBuilder.setBulkFlushMaxSizeMb(10);
        esSinkBuilder.setBulkFlushInterval(1000L);
        esSinkBuilder.setBulkFlushBackoff(true);
        esSinkBuilder.setBulkFlushBackoffType(ElasticsearchSinkBase.FlushBackoffType.EXPONENTIAL);
        esSinkBuilder.setBulkFlushBackoffRetries(3);
        esSinkBuilder.setBulkFlushBackoffDelay(1000L);

        dataStream.addSink(esSinkBuilder.build());
        env.execute("Index to Elasticsearch with Schema");
    }

    // Builds one example document with the current timestamp attached.
    private static Map<String, Object> document(String name) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", name);
        doc.put("timestamp", System.currentTimeMillis());
        return doc;
    }

    private static IndexRequest createIndexRequest(String indexName, Map<String, Object> document) {
        // Elasticsearch 7 uses the fixed "_doc" type, so only the index name
        // and the document source are needed.
        return new IndexRequest(indexName).source(document);
    }

    // Creates the index with the desired shard/replica counts and mapping,
    // unless it already exists.
    private static void createIndexWithSchema(String indexName, String esHost, int esPort,
                                              int numberOfShards, int numberOfReplicas) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost(esHost, esPort, "http")))) {
            if (client.indices().exists(new GetIndexRequest(indexName), RequestOptions.DEFAULT)) {
                return; // keep the existing settings
            }
            CreateIndexRequest request = new CreateIndexRequest(indexName)
                    .settings(Settings.builder()
                            .put("index.number_of_shards", numberOfShards)
                            .put("index.number_of_replicas", numberOfReplicas));
            // Custom schema: "name" as a keyword, "timestamp" as an epoch-millis date.
            Map<String, Object> timestampField = new HashMap<>();
            timestampField.put("type", "date");
            timestampField.put("format", "epoch_millis");
            Map<String, Object> properties = new HashMap<>();
            properties.put("name", Collections.singletonMap("type", "keyword"));
            properties.put("timestamp", timestampField);
            request.mapping(Collections.singletonMap("properties", properties));
            client.indices().create(request, RequestOptions.DEFAULT);
        }
    }
}
```
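The question summary above mentions reading the data from a file rather than hard-coding it. As a minimal sketch under assumptions not in the original answer (the path `file:///tmp/names.txt` and a one-name-per-line layout are illustrative), the `fromElements` source can be swapped for a text-file source; everything else, including `esSinkBuilder`, stays as in the program above, and `org.apache.flink.api.common.functions.MapFunction` is additionally imported:
```java
// Sketch: feed the same sink from a text file instead of fromElements.
// The path and the one-name-per-line format are illustrative assumptions.
DataStream<Map<String, Object>> fileStream = env
        .readTextFile("file:///tmp/names.txt")
        .map(new MapFunction<String, Map<String, Object>>() {
            @Override
            public Map<String, Object> map(String line) {
                Map<String, Object> doc = new HashMap<>();
                doc.put("name", line.trim());
                doc.put("timestamp", System.currentTimeMillis());
                return doc;
            }
        });
fileStream.addSink(esSinkBuilder.build());
```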
In this program the index name, shard count, replica count, and parallelism are all read from the command line (defaulting to `my_index`, 3 shards, 1 replica, and parallelism 1), so they can be changed without recompiling; for example, pass `--numberOfShards 5 --numberOfReplicas 2 --parallelism 3` to write with three parallel sink tasks. The custom schema is applied as the index mapping when the index is first created, and documents are indexed with the fixed `_doc` type that Elasticsearch 7 uses.
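To double-check that the shard and replica settings were actually applied, the values can be read back with the same REST high-level client. A minimal sketch, assuming the job's default host, port, and index name, and additionally importing `GetSettingsRequest`/`GetSettingsResponse` from `org.elasticsearch.action.admin.indices.settings.get`:
```java
// Read the live settings of the index the job created.
try (RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
    GetSettingsRequest request = new GetSettingsRequest().indices("my_index");
    GetSettingsResponse response = client.indices().getSettings(request, RequestOptions.DEFAULT);
    System.out.println("shards   = " + response.getSetting("my_index", "index.number_of_shards"));
    System.out.println("replicas = " + response.getSetting("my_index", "index.number_of_replicas"));
}
```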