掌握四种偏最小二乘分组方法的MATLAB实现

版权申诉
0 下载量 197 浏览量 更新于2024-12-04 收藏 1KB ZIP 举报
偏最小二乘是一种在多元统计分析中广泛使用的降维技术,尤其适用于数据集中自变量(解释变量)和因变量(响应变量)之间存在多重共线性时的数据分析。它通过建立数据的成分(或称因子),旨在找到自变量和因变量之间的最佳投影方向,从而有效提取出反映自变量和因变量关系的主要信息。 在统计学和机器学习领域,偏最小二乘回归(PLSR)是偏最小二乘分析的一个重要分支,主要用于回归分析。除了PLSR,偏最小二乘还广泛应用于分类(PLS-DA)、生存数据分析以及其他多种数据分析场景。偏最小二乘的分组方法主要有以下几种: 1. 交叉验证法(Cross-validation):该方法通过将数据集分成多个小组,轮流使用其中一部分作为测试集,其余部分作为训练集,来评估模型的预测能力。这种方法可以有效避免模型过拟合,并帮助选择最适宜的PLS成分数目。 2. 留一法(Leave-one-out):这是一种特殊形式的交叉验证,每一次迭代只留下一个样本作为测试集,其余样本作为训练集。留一法对数据量较小的数据集较为适用,但计算量相对较大。 3. 随机分组(Random Grouping):在这种方法中,数据被随机分成若干组。不同的分组方案可能会产生不同的PLS模型,通过分析不同模型的统计特性可以得出更为稳定和可靠的分析结果。 4. 数据集划分法(Dataset Splitting):这种方法将数据集明确地划分为训练集和测试集。通过调整训练集和测试集的比例,可以找到最佳的模型复杂度与泛化能力之间的平衡。 了解和掌握偏最小二乘方法及其分组技术对于进行数据分析和建模是非常重要的。通过本压缩包中的文件,用户可以具体学习到如何在MATLAB环境下实现这些方法,并进行数据分析。尽管压缩包中的文件列表名为“fz”,但由于缺乏具体文件内容的描述,我们无法确定“fz”具体指代什么。不过,根据标题和描述,可以合理推断这些文件应该包含了MATLAB代码示例、数据集以及可能的分析结果和图形展示,以便用户能够更直观地理解偏最小二乘分析的过程及其效果。"

SELECT PIS.SHOW_FLT_DETAIL AS SHOW_FLT_DETAIL -- new , PIS.SHOW_AWB_DETAIL AS SHOW_AWB_DETAIL -- new , PIS.DISPLAY_AIRLINE_CODE AS CARRIER_CODE , DECODE(PIS.REVERT_FLOW,'N',PIS.FLOW_TYPE,DECODE(PIS.FLOW_TYPE,'I','E','I')) AS FLOW_TYPE , PIS.SHIP_TO_LOCATION AS SHIP_TO_LOCATION , PIS.INVOICE_SEQUENCE AS INVOICE_SEQUENCE , PFT.FLIGHT_DATE AS FLIGHT_DATE , PFT.FLIGHT_CARRIER_CODE AS FLIGHT_CARRIER_CODE , PFT.FLIGHT_SERIAL_NUMBER AS FLIGHT_SERIAL_NUMBER , PFT.FLOW_TYPE AS AIRCRAFT_FLOW , FAST.AIRCRAFT_SERVICE_TYPE AS AIRCRAFT_SERVICE_TYPE , PPT.AWB_NUMBER AS AWB_NUMBER , PPT.WEIGHT AS WEIGHT , PPT.CARGO_HANDLING_OPERATOR AS CARGO_HANDLING_OPERATOR , PPT.SHIPMENT_PACKING_TYPE AS SHIPMENT_PACKING_TYPE , PPT.SHIPMENT_FLOW_TYPE AS SHIPMENT_FLOW_TYPE , PPT.SHIPMENT_BUILD_TYPE AS SHIPMENT_BUILD_TYPE , PPT.SHIPMENT_CARGO_TYPE AS SHIPMENT_CARGO_TYPE , PPT.REVENUE_TYPE AS REVENUE_TYPE , PFT.JV_FLIGHT_CARRIER_CODE AS JV_FLIGHT_CARRIER_CODE , PPT.PORT_TONNAGE_UID AS PORT_TONNAGE_UID , PPT.AWB_UID AS AWB_UID , PIS.INVOICE_SEPARATION_UID AS INVOICE_SEPARATION_UID , PFT.FLIGHT_TONNAGE_UID AS FLIGHT_TONNAGE_UID FROM PN_FLT_TONNAGES PFT , FZ_AIRLINES FA , PN_TONNAGE_FLT_PORTS PTFP , PN_PORT_TONNAGES PPT , FF_AIRCRAFT_SERVICE_TYPES FAST , SR_PN_INVOICE_SEPARATIONS PIS --new , SR_PN_INVOICE_SEP_DETAILS PISD--new , SR_PN_INV_SEP_PORT_TONNAGES PISPT --new WHERE PFT.FLIGHT_OPERATION_DATE >= trunc( CASE :rundate WHEN TO_DATE('01/01/1900', 'DD/MM/YYYY') THEN ADD_MONTHS(SYSDATE,-1) ELSE ADD_MONTHS(:rundate,-1) END, 'MON') AND PFT.FLIGHT_OPERATION_DATE < trunc( CASE :rundate WHEN TO_DATE('01/01/1900', 'DD/MM/YYYY') THEN TRUNC(SYSDATE) ELSE TRUNC(:rundate) END, 'MON') AND PFT.TYPE IN ('C', 'F') AND PFT.RECORD_TYPE = 'M' AND (PFT.TERMINAL_OPERATOR NOT IN ('X', 'A') OR (PFT.TERMINAL_OPERATOR <> 'X' AND FA.CARRIER_CODE IN (SELECT * FROM SPECIAL_HANDLING_AIRLINE) AND PPT.REVENUE_TYPE IN (SELECT * FROM SPECIAL_REVENUE_TYPE) AND PPT.SHIPMENT_FLOW_TYPE IN (SELECT * FROM SPECIAL_SHIPMENT_FLOW_TYPE) AND PFT.FLIGHT_OPERATION_DATE >= (select EFF_DATE from SPECIAL_HANDLING_EFF_DATE) )) AND PFT.DELETING_DATETIME IS NULL AND FA.AIRLINE_UID = PFT.AIRLINE_UID AND FA.DELETING_DATETIME IS NULL AND PTFP.FLIGHT_TONNAGE_UID = PFT.FLIGHT_TONNAGE_UID AND PTFP.RECORD_TYPE = 'M' AND PTFP.DELETING_DATETIME IS NULL AND PPT.TONNAGE_FLIGHT_PORT_UID (+)= PTFP.TONNAGE_FLIGHT_PORT_UID AND PPT.RECORD_TYPE (+)= 'M' AND PPT.DISCREPANCY_TYPE (+)= 'NONE' AND PPT.ADJUSTMENT_INC_FLAG (+)= 'Y' AND PPT.DELETING_DATETIME (+) IS NULL AND FAST.AIRCRAFT_SERVICE_TYPE_UID = PFT.AIRCRAFT_SERVICE_TYPE_UID AND FAST.DELETING_DATETIME IS NULL AND PIS.TEMPORAL_NAME = TO_CHAR((CASE :rundate --new WHEN TO_DATE('01/01/1900', 'DD/MM/YYYY') THEN TRUNC(SYSDATE) ELSE TRUNC(:rundate) END ), 'YYYYMM') || '00' AND PIS.INVOICE_SEPARATION_UID = PISD.INVOICE_SEPARATION_UID --new AND PISD.INVOICE_SEP_DETAIL_UID = PISPT.INVOICE_SEP_DETAIL_UID --new AND PISPT.PORT_TONNAGE_UID = PPT.PORT_TONNAGE_UID --new AND PIS.PRINT_SUPPORTING_DOC = 'Y';上面是oracle的写法,请转成spark SQL的写法。

250 浏览量
2023-06-05 上传
194 浏览量