Mining Quantitative Associations

Techniques can be categorized by how numerical attributes, such as age or salary, are treated:
1. Static discretization based on predefined concept hierarchies (data cube methods)
2. Dynamic discretization based on data distribution (quantitative rules, e.g., Srikant & Agrawal@SIGMOD'96)
3. Clustering: distance-based association (e.g., Miller & Yang@SIGMOD'97)
   • One-dimensional clustering, then association
4. Deviation (e.g., Aumann & Lindell@KDD'99): Gender = female => Wage: mean = $7/hr (overall mean = $9)
   • Z-test (or t-test)

Translate and explain.

Time: 2024-04-01 11:31:32 Views: 120

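This passage covers techniques for mining quantitative associations. They fall into the following types according to how numerical attributes (such as age or salary) are handled:
1. Static discretization based on predefined concept hierarchies (data cube methods).
2. Dynamic discretization based on the data distribution (quantitative rules), for example the method proposed by Srikant & Agrawal at SIGMOD'96.
3. Clustering: distance-based association, for example the method proposed by Miller & Yang at SIGMOD'97, which performs one-dimensional clustering first and then mines associations.
4. Deviation: for example the method proposed by Aumann & Lindell at KDD'99, which compares the mean wage of a subgroup (here, people whose gender is female) with the overall mean wage and uses a Z-test (or t-test) to check whether the difference is significant.

A Z-test is a statistical method for testing whether a sample mean differs significantly from a population mean. It suits large samples (or cases where the population variance is known). A t-test suits small samples whose population variance is unknown.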
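The deviation check described above can be sketched as a one-sample Z-test. This is only an illustration, not necessarily the exact procedure of the cited paper: the wage values, the population standard deviation of 2.0, and the function name `one_sample_z` are all assumptions chosen so that the subgroup mean is $7/hr against an overall mean of $9/hr.

```python
import math

def one_sample_z(sample, population_mean, population_std):
    """Return (z, two-sided p-value) for H0: subgroup mean == population mean."""
    n = len(sample)
    sample_mean = sum(sample) / n
    standard_error = population_std / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)),
    # which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical hourly wages for the subgroup "Gender = female" (mean is 7.0).
female_wages = [6.0, 7.5, 6.5, 8.0, 7.0, 6.0, 7.5, 7.5, 6.5, 7.5]

# Assumed population parameters: overall mean 9.0, standard deviation 2.0.
z, p = one_sample_z(female_wages, population_mean=9.0, population_std=2.0)
print(f"z = {z:.3f}, p = {p:.4f}")

# The rule is reported as interesting only if the deviation is significant.
is_significant = p < 0.05
```

With a small sample and an unknown population variance, a t-test with the sample standard deviation would be the more appropriate choice.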

The LULC simulation data we utilized to create future EN maps was produced by X. Liu et al. (2017), which was conducted at the national level. The reason we apply national-level simulated data to a local area is as follows. Firstly, China has a top-down land use planning system (also known as spatial planning) with five levels. The quantitative objectives in national plans are handed down to county-level plans through provincial and prefectural level plans (Zhong et al., 2014). That means land use patterns of nine cities in WUA are required to reflect relevant upper-level plans, for example, to satisfy the land use quota made by Hubei provincial plans and the national plans. Secondly, there are interdependencies across places so what happens in one region produces effects not only on this location but on other regions (Overman et al., 2010). And the increase of construction land in one place will shift protection pressure on natural ecosystems elsewhere for a sustainable goal. The land use simulation at the national level allocated land resources from a top-down perspective and links land use changes in a region to events taking place in other locations through global simulation. However, the Kappa coefficient of the simulated data in WUA is 0.55 and the overall accuracy is 0.71, which is lower than the statistic value at the national-level data. Although the Kappa between 0.4~0.6 is moderate and at an acceptable level (Appiah et al., 2015; Ding et al., 2013; Ku, 2016), the simulated accuracy of the land use data needs to be improved. Future work on exploring the impact of LULC dynamics on EN will develop based on the high-accuracy simulated data and updating the initial simulated time to 2020, by integrating the impacts of socioeconomic factors, climate change, regional planning, land use policy, etc.

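Answer: The LULC simulation data we used to create the future EN maps were produced by X. Liu et al. (2017) at the national level. We apply national-level simulated data to a local area for two reasons. First, China has a top-down land use planning system (also known as spatial planning) with five levels. Quantitative targets in the national plans are passed down through provincial and prefectural plans to county-level plans (Zhong et al., 2014). The land use patterns of the nine cities in WUA must therefore reflect the relevant upper-level plans, for example by satisfying the land use quotas set by the Hubei provincial plans and the national plans. Second, places are interdependent, so what happens in one region affects not only that region but others as well (Overman et al., 2010). An increase in construction land in one place shifts the pressure to protect natural ecosystems onto other places if the sustainability goal is to be met. A national-level land use simulation allocates land resources from a top-down perspective and, through global simulation, links land use changes in one region to events in other locations. However, the Kappa coefficient of the simulated data in WUA is 0.55 and the overall accuracy is 0.71, both lower than the corresponding national-level statistics. Although a Kappa between 0.4 and 0.6 is moderate and acceptable (Appiah et al., 2015; Ding et al., 2013; Ku, 2016), the accuracy of the simulated land use data needs to be improved. Future work on the impact of LULC dynamics on EN will be based on higher-accuracy simulated data, with the initial simulation year updated to 2020, and will integrate the effects of socioeconomic factors, climate change, regional planning, land use policy, and other drivers.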
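For readers unfamiliar with the two accuracy measures quoted above, the following minimal sketch computes overall accuracy and Cohen's Kappa from a confusion matrix of simulated versus reference land use classes. The 3-class matrix is entirely hypothetical and does not reproduce the 0.71 and 0.55 reported for WUA; the function name `accuracy_and_kappa` is likewise an assumption.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's Kappa for a square confusion matrix.

    Rows are reference classes, columns are simulated classes.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cells on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical counts for three land use classes (illustration only).
confusion = [
    [40, 5, 5],
    [6, 30, 4],
    [4, 6, 20],
]

overall_accuracy, kappa = accuracy_and_kappa(confusion)
print(f"overall accuracy = {overall_accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than overall accuracy because it discounts the agreement that would occur by chance given the class proportions.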

Mining Multi-Dimensional Association

• Single-dimensional rules: buys(X, "milk") ⇒ buys(X, "bread")
• Multi-dimensional rules: ≥ 2 dimensions or predicates
  • Inter-dimension assoc. rules (no repeated predicates): age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  • Hybrid-dimension assoc. rules (repeated predicates): age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
• Categorical attributes: finite number of possible values, no ordering among values; data cube approach
• Quantitative attributes: numeric, implicit ordering among values; discretization, clustering, and other approaches

Translate and explain.

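This passage is about mining multi-dimensional associations. A single-dimensional rule involves only one predicate, for example "whoever buys milk also buys bread". A multi-dimensional rule relates two or more predicates, for example "people aged 19-25 whose occupation is student buy coke". Among multi-dimensional rules, inter-dimension association rules contain no repeated predicate, whereas hybrid-dimension association rules repeat a predicate (in the slide's example, buys appears on both sides of the rule). Categorical attributes have a finite number of possible values with no ordering among them and can be handled with the data cube approach. Quantitative attributes are numeric with an implicit ordering among values and can be handled with discretization, clustering, and other approaches.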
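To make the two rule types concrete, here is a small sketch that computes support and confidence for the inter-dimension and hybrid-dimension rules from the slide. The five records, the field names, and the helper `support_confidence` are hypothetical and exist only for illustration.

```python
# Hypothetical customer records: two dimension attributes plus a set of purchases.
records = [
    {"age": "19-25", "occupation": "student", "buys": {"coke", "popcorn"}},
    {"age": "19-25", "occupation": "student", "buys": {"coke"}},
    {"age": "19-25", "occupation": "engineer", "buys": {"popcorn", "coke"}},
    {"age": "26-35", "occupation": "student", "buys": {"milk", "bread"}},
    {"age": "19-25", "occupation": "student", "buys": {"milk"}},
]

def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def buys_coke(r):
    return "coke" in r["buys"]

# Inter-dimension rule: age, occupation, and buys each appear once.
inter = support_confidence(
    records,
    lambda r: r["age"] == "19-25" and r["occupation"] == "student",
    buys_coke,
)

# Hybrid-dimension rule: the buys predicate appears on both sides.
hybrid = support_confidence(
    records,
    lambda r: r["age"] == "19-25" and "popcorn" in r["buys"],
    buys_coke,
)

print("inter-dimension (support, confidence):", inter)
print("hybrid-dimension (support, confidence):", hybrid)
```

Here age is already given as the interval "19-25"; a raw numeric age would first have to be discretized or clustered, as the passage notes for quantitative attributes.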
