Mining Quantitative Associations
Techniques can be categorized by how numerical attributes, such as age or salary, are treated:
1. Static discretization based on predefined concept hierarchies (data cube methods)
2. Dynamic discretization based on data distribution (quantitative rules, e.g., Srikant & Agrawal@SIGMOD’96)
3. Clustering: distance-based association (e.g., Miller & Yang@SIGMOD’97)
   • One-dimensional clustering, then association
4. Deviation (e.g., Aumann & Lindell@KDD’99): Gender = female ⇒ Wage: mean = $7/hr (overall mean = $9)
Z-test (or t-test). Translate and explain.
Posted: 2024-04-01 11:31:32 Views: 120
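This passage covers techniques for mining quantitative associations. They fall into the following types according to how numerical attributes (such as age or salary) are handled:
1. Static discretization based on predefined concept hierarchies (data cube methods);
2. Dynamic discretization based on the data distribution (quantitative rules), e.g. the method Srikant & Agrawal presented at SIGMOD'96;
3. Clustering: distance-based association, e.g. the method Miller & Yang presented at SIGMOD'97, which clusters along one dimension first and then mines associations;
4. Deviation: e.g. the method Aumann & Lindell presented at KDD'99, which compares the mean wage of a subgroup (here Gender = female, $7/hr) with the overall mean wage ($9/hr) and uses a Z-test (or t-test) to check whether the difference is significant.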
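The Z-test checks whether a sample mean differs significantly from a population mean; it applies when the population variance is known or the sample is large. The t-test is the usual choice when the population variance is unknown and has to be estimated from a small sample.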
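As a rough illustration of the deviation check, the sketch below computes a one-sample Z statistic for the wage example. The two means ($7 and $9) come from the slide; the population standard deviation (4.0) and the subgroup size (100) are hypothetical values chosen only so the arithmetic can run, and this is a plain Z-test, not a reproduction of the Aumann & Lindell procedure.

```python
import math


def z_statistic(subgroup_mean, population_mean, population_std, n):
    """One-sample Z statistic for a subgroup mean against a population mean."""
    standard_error = population_std / math.sqrt(n)
    return (subgroup_mean - population_mean) / standard_error


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Means are taken from the slide; std and n are hypothetical.
z = z_statistic(subgroup_mean=7.0, population_mean=9.0,
                population_std=4.0, n=100)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With these assumed numbers the statistic is far from zero, so the rule would count as a significant deviation; with a small subgroup and an estimated standard deviation, a t-test would be the more appropriate check.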
Related questions
The LULC simulation data we utilized to create future EN maps was produced by X. Liu et al. (2017), which was conducted at the national level. The reason we apply national-level simulated data to a local area is as follows. Firstly, China has a top-down land use planning system (also known as spatial planning) with five levels. The quantitative objectives in national plans are handed down to county-level plans through provincial and prefectural level plans (Zhong et al., 2014). That means land use patterns of nine cities in WUA are required to reflect relevant upper-level plans, for example, to satisfy the land use quota made by Hubei provincial plans and the national plans. Secondly, there are interdependencies across places so what happens in one region produces effects not only on this location but on other regions (Overman et al., 2010). And the increase of construction land in one place will shift protection pressure on natural ecosystems elsewhere for a sustainable goal. The land use simulation at the national level allocated land resources from a top-down perspective and links land use changes in a region to events taking place in other locations through global simulation. However, the Kappa coefficient of the simulated data in WUA is 0.55 and the overall accuracy is 0.71, which is lower than the statistic value at the national-level data. Although the Kappa between 0.4~0.6 is moderate and at an acceptable level (Appiah et al., 2015; Ding et al., 2013; Ku, 2016), the simulated accuracy of the land use data needs to be improved. Future work on exploring the impact of LULC dynamics on EN will develop based on the high-accuracy simulated data and updating the initial simulated time to 2020, by integrating the impacts of socioeconomic factors, climate change, regional planning, land use policy, etc.
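Answer: The LULC simulation data we used to create the future EN maps was produced by X. Liu et al. (2017) at the national level. We apply national-level simulated data to a local area for two reasons. First, China has a top-down, five-level land use planning system (also known as spatial planning). Quantitative targets in national plans are passed down to county-level plans through provincial and prefectural plans (Zhong et al., 2014). The land use patterns of the nine cities in WUA must therefore reflect the relevant upper-level plans, for example by meeting the land use quotas set by the Hubei provincial plan and the national plans. Second, places are interdependent: what happens in one region affects not only that region but others as well (Overman et al., 2010). An increase in construction land in one place shifts the pressure to protect natural ecosystems onto other places if the sustainability goal is to be met. A national-level land use simulation allocates land resources from a top-down perspective and, by simulating the whole area at once, links land use change in one region to events elsewhere. However, the Kappa coefficient of the simulated data in WUA is 0.55 and the overall accuracy is 0.71, both lower than the national-level values. Although a Kappa between 0.4 and 0.6 is moderate and acceptable (Appiah et al., 2015; Ding et al., 2013; Ku, 2016), the accuracy of the simulated land use data still needs to improve. Future work on the impact of LULC dynamics on EN will build on higher-accuracy simulated data, update the initial simulation year to 2020, and integrate socioeconomic factors, climate change, regional planning, land use policy, and other influences.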
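For readers unfamiliar with the two accuracy figures quoted above, the following sketch shows how overall accuracy and Cohen's Kappa are commonly derived from a confusion matrix of simulated versus observed land use classes. The 2×2 matrix is hypothetical and is not the WUA data; the function names are illustrative only.

```python
def overall_accuracy(matrix):
    """Share of cells on the diagonal (correctly simulated samples)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's Kappa: agreement beyond what chance alone would give."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix: rows = observed class, columns = simulated class.
confusion = [[40, 10],
             [20, 30]]
accuracy = overall_accuracy(confusion)
kappa = cohen_kappa(confusion)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than overall accuracy here because it discounts the agreement that the class proportions alone would produce, which is also why a Kappa of 0.55 can accompany an overall accuracy of 0.71.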
Mining Multi-Dimensional Association
• Single-dimensional rules: buys(X, "milk") ⇒ buys(X, "bread")
• Multi-dimensional rules: ≥ 2 dimensions or predicates
  • Inter-dimension association rules (no repeated predicates): age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  • Hybrid-dimension association rules (repeated predicates): age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
• Categorical attributes: finite number of possible values, no ordering among values (data cube approach)
• Quantitative attributes: numeric, implicit ordering among values (discretization, clustering, and other approaches)
Translate and explain.
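This passage is about mining multi-dimensional associations. A single-dimensional rule uses only one predicate, e.g. "whoever buys milk also buys bread", where buys appears on both sides. A multi-dimensional rule involves two or more predicates, e.g. "people aged 19-25 whose occupation is student buy coke".
Multi-dimensional rules come in two kinds. Inter-dimension association rules contain no repeated predicate. Hybrid-dimension association rules repeat at least one predicate, e.g. age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke"), where buys occurs twice.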
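The distinction between the three rule types can be sketched as a small check on which predicates a rule uses. The representation of a rule as lists of (predicate, value) pairs and the function name classify_rule are assumptions made for this illustration; the three sample rules are the ones from the slide.

```python
def classify_rule(antecedent, consequent):
    """Classify a rule by the predicates it uses.

    Each side is a list of (predicate, value) pairs.
    """
    predicates = [name for name, _ in antecedent + consequent]
    distinct = set(predicates)
    if len(distinct) == 1:
        return "single-dimensional"
    if len(distinct) == len(predicates):
        return "inter-dimension"  # several predicates, none repeated
    return "hybrid-dimension"     # several predicates, at least one repeated


single = classify_rule([("buys", "milk")], [("buys", "bread")])
inter = classify_rule([("age", "19-25"), ("occupation", "student")],
                      [("buys", "coke")])
hybrid = classify_rule([("age", "19-25"), ("buys", "popcorn")],
                       [("buys", "coke")])
print(single, inter, hybrid, sep="\n")
```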
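Categorical attributes have a finite number of possible values with no ordering among them; they can be handled with the data cube approach. Quantitative attributes are numeric with an implicit ordering among values; they can be handled with discretization, clustering, and other approaches.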
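To make the discretization step concrete, here is a minimal sketch of equal-frequency binning, one simple way to turn a quantitative attribute such as age into interval labels like "19-25" that a rule miner can treat as categorical values. It is a simplification and not the Srikant & Agrawal algorithm; the age values and helper names are hypothetical.

```python
def equal_frequency_bins(values, num_bins):
    """Split values into num_bins intervals holding roughly equal counts.

    Returns a list of (low, high) inclusive intervals.
    """
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(num_bins):
        start = i * n // num_bins
        end = (i + 1) * n // num_bins  # exclusive index
        if start < end:
            bins.append((ordered[start], ordered[end - 1]))
    return bins


def label_for(value, bins):
    """Return the interval label for a value, or None if no interval covers it."""
    for low, high in bins:
        if low <= value <= high:
            return f"{low}-{high}"
    return None


ages = [19, 21, 22, 25, 31, 34, 38, 45, 52, 60, 63, 70]  # hypothetical data
bins = equal_frequency_bins(ages, 3)
print(bins)
print(label_for(22, bins))
```

Because the interval boundaries follow the data rather than a fixed hierarchy, this corresponds to the dynamic style of discretization; a static approach would instead use predefined ranges.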