亿级规模ES查询优化实战:策略与案例分析

需积分: 50 21 下载量 81 浏览量 更新于2024-07-19 2 收藏 5.53MB PDF 举报
亿级规模的ES查询优化实战是由何金城进行的分享,他作为塔布数据的高级项目经理,拥有丰富的实践经验,包括参与大型项目如舆情监控爬虫和家电企业用户画像的构建。分享主要围绕如何在大数据环境下,针对ES(Elasticsearch)查询进行高效优化,以满足亿级用户和海量关联数据(20亿条)的性能需求。 在背景部分,何金城介绍了公司的内销渠道管理系统、电子商务系统等涉及的多个平台,以及庞大的用户群体、数据量(500GB)和实时数据展现的需求。查询性能的问题开始于测试阶段,发现平均耗时远超客户要求的2000ms,具体表现为在startTime和product相关的字段上执行聚合操作时出现瓶颈。 问题的深入分析聚焦于以下几个方面: 1. Badcases: 分析了两个具体的案例,一是startTime字段采用date_range聚合导致性能下降;二是product.time和province字段的nested聚合结构,通过嵌套多个层次的date_histogram和terms聚合,进一步降低了查询效率。 2. 原因探讨:何金城质疑可能存在的问题,如字段选择的合理性、查询语法的准确性、聚合操作的复杂性,以及内存使用是否有效。 优化方案主要包括: 1. 转换范围为术语:通过增加冗余字段startTimeStr,将date类型转换为字符串形式,简化date_range聚合为terms,从而减少查询的复杂度。 2. 字段配置优化:对特定字段如"product"进行针对性配置,可能是调整索引或分析器设置,以提高查询速度。 此外,他还可能讨论了其他优化策略,如使用合适的数据模型、创建索引策略、使用缓存、分片和副本的数量调整等,这些都是处理大规模数据和高并发查询时的关键考虑因素。 在整个分享中,何金城不仅分享了实际问题的解决过程,还强调了查询优化的重要性,并给出了实用的技巧和思考方法,对于从事大数据和搜索引擎优化的专业人士来说,这是一份极具价值的实战指南。

select lot_hs.lot_id as lot_id,lot_type,lot_hs.mainpd_id, created_time,COMPLETE_TIME, value(bank.banktime,0) as banktime , round ( ( 1.00*(days(COMPLETE_TIME)-days(created_time)) + (hour(COMPLETE_TIME)-hour(created_time))*1.00/24 + (minute(COMPLETE_TIME)-minute(created_time))*1.00/24/60 + (second(COMPLETE_TIME)-second(created_time))1.00/24/60/60) - value(bank.banktime,0),3) as use_days, customer_id, coalesce(cc.cust_id_define,lot_hs.customer_id) as cust_id2, cc.cycletime_target as ct_target, date,layer, round(count() over(partition by coalesce(cc.cust_id_define,lot_hs.customer_id),cc.cycletime_target)*0.9,0) cnt, row_number() over(partition by coalesce(cc.cust_id_define,lot_hs.customer_id),cc.cycletime_target order by ( ( days(COMPLETE_TIME)-days(created_time) + (hour(COMPLETE_TIME)-hour(created_time))*1.00/24 + (minute(COMPLETE_TIME)-minute(created_time))*1.00/24/60 + (second(COMPLETE_TIME)-second(created_time))*1.00/24/60/60) - value(bank.banktime,0))/layer) id From (select date(a.claim_time) as date, a.lot_id, a.lot_type,a.mainpd_id,a.prodspec_id,a.custprod_id, case when(date(b.created_time) <= '2009-01-05') then b.created_time + 21 days else b.created_time end as created_time, CASE WHEN A.CUST_id in ('MCA','NPA','SET') THEN a.COMPLETE_TIME ELSE a.COMPLETE_TIME END COMPLETE_TIME, a.cust_id as customer_id, a.ope_category, c.layer From f3rpt.F3_TB_DAILY_FABOUT a, f3rpt.fvlot b, (select mainpd_id, sum(masks)layer from f3rpt.ASMCRPT_VW_MAINPD_MASKS_ALL group by mainpd_id) as c, (select * from (select lot_id, max(claim_time)claim_time, count(case when(ope_category='Ship')then lot_id else null end) as LS, count(case when(ope_category='Unship') then lot_id else null end) as LUS from f3rpt.F3_TB_DAILY_FABOUT where substr(lot_id,1,2) not in('CA','CW','ES','E0','EM') and lot_type = 'Production' AND LOT_ID NOT LIKE 'H%' and substr(lot_id,7,4)='.00F' and ope_category in ('Ship','Unship') and year(claim_time) = year(current date - 1 days) and month(claim_time) = month(current date - 1 days) group by lot_id) as a where LS - LUS > 0 ) as lot Where a.lot_id = b.lot_id and b.mainpd_id = c.mainpd_id and a.lot_id = lot.LOT_ID and a.claim_time = lot.claim_time and a.ope_category = 'Ship' and a.cust_id in ('SM','BOE','GSC','NPA','GTA') ) as lot_hs left outer join (select lot_id,max(bankin_time) banktime from f3rpt.asmc_dpm where bankin_time>0 group by lot_id) bank on (lot_hs.lot_id = bank.lot_id) left join f3cim.f3cim_cfg_cust_rule cc on case when lot_hs.customer_id='WXM' THEN 'WII'||SUBSTR(lot_hs.mainpd_id,6,1) else lot_hs.customer_id end = cc.cust_id and locate(cc.mainpd_id,lot_hs.mainpd_id)>0 and locate(cc.prodspec_id,lot_hs.prodspec_id)>0 and locate(cc.custprod_id_45,substr(lot_hs.custprod_id,3,3))>0 where lot_hs.ope_category = 'Ship' ;以上sql如何优化

2023-06-07 上传