The Traditional FCM Clustering Algorithm
Date: 2024-12-28 15:20:41
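### Principle of the traditional FCM clustering algorithm
Fuzzy C-Means (FCM) is a soft clustering method: a data point may belong partially to several clusters at once. This makes FCM particularly suitable for datasets whose cluster boundaries are not clear-cut. FCM seeks to minimize the objective function: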
\[ J_m(U,V)=\sum_{i=1}^{n}\sum_{j=1}^{c}(u_{ij})^m d^2(x_i,v_j), \]
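where \( u_{ij} \) is the degree of membership of the \( i \)-th sample in the \( j \)-th cluster, with \( \sum_{j=1}^{c} u_{ij} = 1 \) for every sample; \( v_j \) is the center of the \( j \)-th cluster; \( m > 1 \) is the fuzzifier that controls how spread out the memberships are (values close to 1 give nearly hard assignments, larger values give fuzzier ones); and \( d(\cdot,\cdot) \) is the distance measure, which in most cases is the Euclidean distance.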
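To make the notation concrete, the following is a minimal sketch of how the objective could be evaluated with NumPy. The function name `fcm_objective` and the array layout (samples in rows) are assumptions made for illustration, and the squared Euclidean distance is assumed for \( d^2 \).

```python
import numpy as np


def fcm_objective(X, U, V, m=2.0):
    """Evaluate J_m(U, V) for data X (n, d), memberships U (n, c), centers V (c, d).

    Hypothetical helper: assumes the squared Euclidean distance.
    """
    # Squared distance between every sample and every center, shape (n, c)
    dist_sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    # Weight each squared distance by the fuzzified membership and sum everything
    return float(((U ** m) * dist_sq).sum())


# Tiny hand-checkable case: two 1-D points, two centers, crisp memberships
X = np.array([[0.0], [2.0]])
V = np.array([[0.0], [1.0]])
U = np.array([[1.0, 0.0], [0.0, 1.0]])
J = fcm_objective(X, U, V)
```

In this toy case the first point sits exactly on its center and the second is one unit from its center, so only the second point contributes to the sum.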
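To minimize this objective function, two quantities are updated alternately in an iterative loop: the membership matrix \( U=[u_{ij}] \) and the cluster centers \( V=\{v_1,...,v_c\} \)[^1].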
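For reference, the alternating updates are usually written in the following standard form, obtained by holding one of \( U \), \( V \) fixed and minimizing \( J_m \) over the other under the constraint \( \sum_{j=1}^{c} u_{ij} = 1 \). The index \( k \) is introduced here only as a summation variable, and the center formula assumes the Euclidean distance:
\[ u_{ij}=\frac{1}{\sum_{k=1}^{c}\left(\dfrac{d(x_i,v_j)}{d(x_i,v_k)}\right)^{2/(m-1)}}, \qquad v_j=\frac{\sum_{i=1}^{n}(u_{ij})^m x_i}{\sum_{i=1}^{n}(u_{ij})^m}. \]
The exponent \( 2/(m-1) \) is why \( m = 1 \) has to be excluded. If a sample coincides with a center, that distance is zero and the membership formula needs special handling, for example a small lower bound on the distances.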
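### Key steps in the implementation
#### Initialization
During initialization, the initial center of each cluster is chosen at random, and each object is given a degree of membership in every cluster, with the degrees of one object summing to 1. This step matters because FCM only converges to a local minimum: the starting point directly affects both how fast the later iterations converge and the quality of the final result.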
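One common way to carry out this step is to draw only the membership matrix at random and let the first center update derive the centers from it. The sketch below follows that variant; the helper name `init_memberships` and the `seed` parameter are assumptions added for reproducibility, not part of the original text.

```python
import numpy as np


def init_memberships(n_samples, n_clusters, seed=None):
    """Return a random membership matrix whose rows each sum to 1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Uniform draws in [0, 1), one row per sample and one column per cluster
    U = rng.random((n_samples, n_clusters))
    # Normalize each row so the memberships of one sample sum to 1
    return U / U.sum(axis=1, keepdims=True)


U0 = init_memberships(5, 3, seed=0)
```

Fixing the seed makes runs repeatable, which helps when comparing results across several random restarts.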
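#### Iterative optimization
The main loop then repeats the following operations until the convergence condition is met:
- compute the new membership weights;
- update the coordinates of each cluster center.
The algorithm is considered to have reached a stable state, and stops, when the change between two consecutive iterations falls below a predefined threshold.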
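The loop and its stopping rule can be sketched as follows. This is an illustrative version under stated assumptions: the helper `fcm_step`, the threshold `tol`, and the choice to measure change as the largest absolute difference between consecutive membership matrices are all choices made here; measuring the movement of the centers is an equally common criterion.

```python
import numpy as np


def fcm_step(X, U, m=2.0):
    """One FCM iteration: centers from memberships, then memberships from centers."""
    W = U ** m                                   # fuzzified weights, shape (n, c)
    V = (W.T @ X) / W.sum(axis=0)[:, None]       # weighted means, shape (c, d)
    dist_sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    dist_sq = np.fmax(dist_sq, np.finfo(float).eps)  # guard against division by zero
    inv = dist_sq ** (-1.0 / (m - 1.0))          # same as d ** (-2 / (m - 1))
    return V, inv / inv.sum(axis=1, keepdims=True)


# Two well-separated synthetic groups of 20 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U = rng.random((40, 2))
U /= U.sum(axis=1, keepdims=True)

tol = 1e-5  # illustrative threshold
for iteration in range(100):
    V, U_new = fcm_step(X, U)
    change = np.abs(U_new - U).max()  # largest change in any membership value
    U = U_new
    if change < tol:
        break
```

Capping the number of iterations guards against slow convergence; the threshold trades accuracy against run time.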
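The following simple Python code shows how this classic version of FCM can be implemented with NumPy; `sklearn` is used only to generate a synthetic dataset for the example: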
```python
import numpy as np


def fcm(data, n_clusters, m=2., error=1e-5, max_iter=1000):
    """Implementation of the standard Fuzzy C-Means clustering algorithm."""
    # Initialize the membership matrix randomly in [0, 1) and normalize each row to sum to one.
    memship = np.random.rand(len(data), n_clusters)
    memship /= np.sum(memship, axis=1)[:, None]
    prev_centers = None
    for _ in range(max_iter):
        # Compute cluster centers from the current memberships
        centers = compute_centers(data, memship, m)
        # Stop once the centers have moved less than `error` (Frobenius norm)
        if prev_centers is not None and np.linalg.norm(centers - prev_centers) < error:
            break
        prev_centers = centers.copy()
        # Update membership values according to the new center positions
        dists_sqrd = calc_distance_matrix(data, centers)
        # Avoid division by zero when a point coincides with a center
        dists_sqrd = np.fmax(dists_sqrd, np.finfo(float).eps)
        # u_ij is proportional to (d_ij^2)^(-1/(m-1)), i.e. d_ij^(-2/(m-1))
        numerator = dists_sqrd ** (-1. / (m - 1))
        denominator = numerator.sum(axis=1, keepdims=True)
        memship = numerator / denominator
    # Harden the fuzzy partition: each point gets its highest-membership cluster
    return memship.argmax(axis=-1)


def compute_centers(X, M, fuzziness_factor):
    """Compute updated centroids from data X using the given membership degrees."""
    weights = M ** fuzziness_factor       # shape (n_samples, n_clusters)
    weighted_sum = (X.T @ weights).T      # shape (n_clusters, n_features)
    return weighted_sum / weights.sum(axis=0).reshape(-1, 1)


def calc_distance_matrix(points, centers):
    """Calculate squared Euclidean distances between points and cluster centers."""
    dists_sqrd = []
    for c in centers:
        diff = points - c
        sq_dist = np.einsum('ij,ij->i', diff, diff)
        dists_sqrd.append(sq_dist)
    return np.array(dists_sqrd).T


if __name__ == "__main__":
    # Example usage with a synthetic dataset
    from sklearn.datasets import make_blobs
    X, y_true = make_blobs(n_samples=300, centers=4,
                           cluster_std=0.60, random_state=0)
    labels = fcm(X, n_clusters=4)
```
This program implements the logic of FCM in its basic form[^2].