Cluster Patterns in Data: The K-means Algorithm and Multiple Centers


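"cluster_lecture.pdf" is a set of lecture notes on data clustering for a machine learning course (ML 2022). It examines cluster patterns in data. Clustering is a simple pattern common to many phenomena: it means looking for a central tendency or grouping in the data. The core concepts are:

1. Targets and deviation: in natural phenomena there may be a central value or target. Individual instances aim for the target but tend to miss by a typical amount. The normal distribution models this behavior by describing how individuals spread around the mean.
2. Adjustment and complexity: in more complicated data sets, scale factors may need to be taken into account. Quantities such as income or height can reflect several combined factors, so a single average is not enough.
3. Multiple targets and related processes: sometimes there is more than one center. Each center represents a different process or subgroup, and these may be related, for example several market segments or user types.
4. The k-means algorithm: a widely used clustering method that partitions the data into k non-overlapping clusters, so that points within a cluster are similar to each other and differ more from points in other clusters.
5. A one-dimensional clustering model: the lecture illustrates clustering in one dimension. The average number of children per family, for example, is not a value that any family has exactly, yet it is accepted as a "typical" or "representative" value.

The lecture explains the basic principles of cluster analysis and how they apply to finding structure and patterns in data. By working with algorithms such as k-means, students learn how to discover and use the internal organization of data in practical problems.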

Cluster Patterns in Data

ML 2022: Machine Learning

https://people.sc.fsu.edu/~jburkardt/classes/ml_2022/cluster_lecture/cluster_lecture.pdf

Sometimes Nature takes shots at a target, but her aim isn’t perfect!

Cluster Patterns

Given new data, we can search for evidence of a central tendency, or clustering.

• clustering is a simple pattern common to many phenomena;

• there may be a “target” or central value;

• individual examples “aim” for the target, but tend to miss by a typical amount;

• the normal distribution is a model of this behavior;

• in more complicated examples, we may need to adjust for scale factors;

• we may also discover that there are several targets, associated with separate but related processes;

• the kmeans algorithm can partition data into k separate clusters;
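The last bullet can be made concrete with a minimal sketch of k-means in one dimension, written here as Lloyd's iteration (alternate between assigning each value to its nearest center and moving each center to the mean of its members). This is an illustration under stated assumptions, not the lecture's own code: the function name `kmeans_1d`, the evenly spread initialization, and the sample data are all hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Partition a list of numbers into k clusters using Lloyd's iteration."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Assumed initialization: spread the starting centers across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its current members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # the centers stopped moving, so the partition is stable
        centers = new_centers
    return centers, labels


# Hypothetical data with three visible groups, near 1, 5 and 9.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.1, 8.9, 9.0]
centers, labels = kmeans_1d(data, k=3)
print(centers)
print(labels)
```

Each returned center plays the role of one "target", and the labels say which target each value is taken to be aiming for. The result of Lloyd's iteration depends on the starting centers, so practical implementations usually try several initializations.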

1 A Model of Clustering in One Dimension

We are used to the idea that there is an average height, an average number of children per family, an average income, an average number of miles traveled by car each year. The average number of children, as computed in 2019, is 1.93, and we would NOT expect any family to have exactly that number. We simply regard the average as some kind of “target” or “central value”, so that actual values usually tend to be close (and, in this case, integers!).

In some cases, such as for the height of men 20 years and older, we can say more. Measured values of height will tend to be near the average value of 5 feet and 9 inches (1.75 meters) in a way that approximates a mathematical shape called the normal distribution. The normal distribution, with parameters µ and σ², determines the probability density of a value x as:

$$N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Here µ is the mean or average value and σ² is the variance. The square root σ is known as the standard deviation. A plot of the normal distribution will show why it is known as the “bell curve”, since it has a bell-shaped central peak. Changing the value of µ simply moves the curve left or right, so that its peak
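The normal density with mean µ and standard deviation σ can be evaluated directly, which makes the "aim for a target, miss by a typical amount" picture easy to check numerically. The following is a minimal sketch using only the Python standard library. The mean of 1.75 meters comes from the text, while the standard deviation of 0.07 meters, the sample size, and the function name `normal_pdf` are assumptions chosen purely for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    variance = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


mu = 1.75     # average height in meters, as quoted in the text
sigma = 0.07  # ASSUMED standard deviation in meters, for illustration only

# The density is symmetric about mu and is largest at x = mu.
peak = normal_pdf(mu, mu, sigma)
left = normal_pdf(mu - sigma, mu, sigma)
right = normal_pdf(mu + sigma, mu, sigma)

# Midpoint-rule estimate of the area under the curve, which should be close to 1.
step = 0.001
start = mu - 8.0 * sigma
count = int(round(16.0 * sigma / step))
area = sum(normal_pdf(start + (i + 0.5) * step, mu, sigma) for i in range(count)) * step

# Simulated "shots at the target": samples cluster around mu with spread sigma.
rng = random.Random(0)
samples = [rng.gauss(mu, sigma) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)

print(peak, left, right)
print(area)
print(sample_mean)
```

The simulated sample mean lands near the target µ even though almost no individual sample hits it exactly, which is the one-dimensional clustering behavior described in this section.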
