大数据技术的关键挑战与理解

0 下载量 93 浏览量 更新于2024-06-28 收藏 30MB PPT 举报
"大数据技术领域若干关键问题(ppt版).ppt" 大数据技术是当前信息技术领域的重要研究方向,它涉及处理和分析海量、复杂、多样性的数据。在这个领域中,"大数据"这一概念并不是单纯指数据的体积庞大,而是涵盖了数据的体积(Volume)、速度(Velocity)、多样性和价值(Value)这四大特性,通常被称为4V模型。 大数据技术的兴起源于对数据处理能力的需求提升。随着互联网、物联网、社交媒体和移动设备的快速发展,数据的生成速度呈指数级增长,传统的数据管理系统已经无法有效地处理这些数据。例如,"超大规模数据库"(VLDB)在20世纪70年代用来描述数百万条记录的数据库,而在21世纪初,随着数据规模的急剧扩大,"海量数据"这个术语被引入,以适应非结构化和半结构化数据的处理需求,比如文本、图像和视频等。 非结构化数据的管理是大数据技术中的一个核心挑战。由于这类数据无法通过传统的关系型数据库进行有效管理和查询,因此,分布式文件系统(如Hadoop的HDFS)和并行计算框架(如MapReduce)应运而生。它们能够处理PB级别的数据,提供高效率的数据存储和处理能力,使得快速访问和分析大规模数据成为可能。 大数据不仅仅关注数据的存储,还包括数据的分析和挖掘,以提取有价值的信息。科学、商业、医疗、交通等多个领域都广泛应用大数据技术来发现模式、预测趋势和优化决策。例如,科研中,大数据可以帮助科学家处理PB级别的数据,揭示新的科学发现;在商业上,企业利用大数据分析消费者行为,提升市场营销策略。 随着大数据技术的不断发展,新的挑战也随之而来,如数据安全、隐私保护、实时分析、数据质量保证以及跨领域的数据融合等问题。这些问题要求我们在技术上不断创新,同时,也需要法规和政策的支持,以确保大数据的合理、合规和有效利用。 大数据技术领域若干关键问题包括但不限于数据的收集、存储、处理、分析、可视化以及相关的法律和伦理问题。随着技术的不断进步,大数据将继续发挥其在推动科技进步和社会发展中的重要作用。

SELECT PIS.SHOW_FLT_DETAIL AS SHOW_FLT_DETAIL -- new , PIS.SHOW_AWB_DETAIL AS SHOW_AWB_DETAIL -- new , PIS.DISPLAY_AIRLINE_CODE AS CARRIER_CODE , DECODE(PIS.REVERT_FLOW,'N',PIS.FLOW_TYPE,DECODE(PIS.FLOW_TYPE,'I','E','I')) AS FLOW_TYPE , PIS.SHIP_TO_LOCATION AS SHIP_TO_LOCATION , PIS.INVOICE_SEQUENCE AS INVOICE_SEQUENCE , PFT.FLIGHT_DATE AS FLIGHT_DATE , PFT.FLIGHT_CARRIER_CODE AS FLIGHT_CARRIER_CODE , PFT.FLIGHT_SERIAL_NUMBER AS FLIGHT_SERIAL_NUMBER , PFT.FLOW_TYPE AS AIRCRAFT_FLOW , FAST.AIRCRAFT_SERVICE_TYPE AS AIRCRAFT_SERVICE_TYPE , PPT.AWB_NUMBER AS AWB_NUMBER , PPT.WEIGHT AS WEIGHT , PPT.CARGO_HANDLING_OPERATOR AS CARGO_HANDLING_OPERATOR , PPT.SHIPMENT_PACKING_TYPE AS SHIPMENT_PACKING_TYPE , PPT.SHIPMENT_FLOW_TYPE AS SHIPMENT_FLOW_TYPE , PPT.SHIPMENT_BUILD_TYPE AS SHIPMENT_BUILD_TYPE , PPT.SHIPMENT_CARGO_TYPE AS SHIPMENT_CARGO_TYPE , PPT.REVENUE_TYPE AS REVENUE_TYPE , PFT.JV_FLIGHT_CARRIER_CODE AS JV_FLIGHT_CARRIER_CODE , PPT.PORT_TONNAGE_UID AS PORT_TONNAGE_UID , PPT.AWB_UID AS AWB_UID , PIS.INVOICE_SEPARATION_UID AS INVOICE_SEPARATION_UID , PFT.FLIGHT_TONNAGE_UID AS FLIGHT_TONNAGE_UID FROM PN_FLT_TONNAGES PFT , FZ_AIRLINES FA , PN_TONNAGE_FLT_PORTS PTFP , PN_PORT_TONNAGES PPT , FF_AIRCRAFT_SERVICE_TYPES FAST , SR_PN_INVOICE_SEPARATIONS PIS --new , SR_PN_INVOICE_SEP_DETAILS PISD--new , SR_PN_INV_SEP_PORT_TONNAGES PISPT --new WHERE PFT.FLIGHT_OPERATION_DATE >= trunc( CASE :rundate WHEN TO_DATE('01/01/1900', 'DD/MM/YYYY') THEN ADD_MONTHS(SYSDATE,-1) ELSE ADD_MONTHS(:rundate,-1) END, 'MON') AND PFT.FLIGHT_OPERATION_DATE < trunc( CASE :rundate WHEN TO_DATE('01/01/1900', 'DD/MM/YYYY') THEN TRUNC(SYSDATE) ELSE TRUNC(:rundate) END, 'MON') AND PFT.TYPE IN ('C', 'F') AND PFT.RECORD_TYPE = 'M' AND (PFT.TERMINAL_OPERATOR NOT IN ('X', 'A') OR (PFT.TERMINAL_OPERATOR <> 'X' AND FA.CARRIER_CODE IN (SELECT * FROM SPECIAL_HANDLING_AIRLINE) AND PPT.REVENUE_TYPE IN (SELECT * FROM SPECIAL_REVENUE_TYPE) AND PPT.SHIPMENT_FLOW_TYPE IN (SELECT * FROM SPECIAL_SHIPMENT_FLOW_TYPE) AND PFT.FLIGHT_OPERATION_DATE >= (select EFF_DATE from SPECIAL_HANDLING_EFF_DATE) )) AND PFT.DELETING_DATETIME IS NULL AND FA.AIRLINE_UID = PFT.AIRLINE_UID AND FA.DELETING_DATETIME IS NULL AND PTFP.FLIGHT_TONNAGE_UID = PFT.FLIGHT_TONNAGE_UID AND PTFP.RECORD_TYPE = 'M' AND PTFP.DELETING_DATETIME IS NULL AND PPT.TONNAGE_FLIGHT_PORT_UID (+)= PTFP.TONNAGE_FLIGHT_PORT_UID AND PPT.RECORD_TYPE (+)= 'M' AND PPT.DISCREPANCY_TYPE (+)= 'NONE' AND PPT.ADJUSTMENT_INC_FLAG (+)= 'Y' AND PPT.DELETING_DATETIME (+) IS NULL AND FAST.AIRCRAFT_SERVICE_TYPE_UID = PFT.AIRCRAFT_SERVICE_TYPE_UID AND FAST.DELETING_DATETIME IS NULL AND PIS.TEMPORAL_NAME = TO_CHAR((CASE :rundate --new WHEN TO_DATE('01/01/1900', 'DD/MM/YYYY') THEN TRUNC(SYSDATE) ELSE TRUNC(:rundate) END ), 'YYYYMM') || '00' AND PIS.INVOICE_SEPARATION_UID = PISD.INVOICE_SEPARATION_UID --new AND PISD.INVOICE_SEP_DETAIL_UID = PISPT.INVOICE_SEP_DETAIL_UID --new AND PISPT.PORT_TONNAGE_UID = PPT.PORT_TONNAGE_UID --new AND PIS.PRINT_SUPPORTING_DOC = 'Y';上面是oracle的写法,请转成spark SQL的写法。

2023-06-02 上传