Could you provide a concrete Python code example that implements the OPTICS (density-based clustering) algorithm?
Time: 2024-10-28 21:14:57 Views: 7
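Sure. OPTICS (Ordering Points To Identify the Clustering Structure) is a density-based spatial clustering algorithm. Rather than assigning cluster labels directly, it produces an ordering of the points together with a reachability distance for each one; clusters show up as valleys in that sequence, and points that are not density-reachable from any core point can be treated as noise or outliers. A simple Python implementation based on `numpy` and `scipy` follows.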
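Before the from-scratch code, a reference point may help. scikit-learn ships its own implementation as `sklearn.cluster.OPTICS`, and running it on the same six example points gives something to compare a hand-written version against. This is only a minimal sketch: it assumes scikit-learn is installed, and the values `min_samples=2` and `eps=2.0` are chosen here for this tiny data set; they are not part of the original answer.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Same six example points as in the hand-written version
data = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], dtype=float)

# cluster_method="dbscan" cuts the reachability sequence at a fixed eps;
# min_samples=2 and eps=2.0 are illustrative values for this small data set
model = OPTICS(min_samples=2, cluster_method="dbscan", eps=2.0)
labels = model.fit_predict(data)

print("labels:      ", labels)                 # -1 marks noise points
print("ordering:    ", model.ordering_)        # order in which points were visited
print("reachability:", model.reachability_[model.ordering_])
```

With these settings the three points near (2, 2) should share one label, the two points near (8, 8) another, and the far-away point (25, 80) should be marked as noise.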
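```python
import heapq

import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Example 2-D data set
data = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], dtype=float)


def optics_algorithm(X, eps=0.5, min_samples=5):
    """Return the OPTICS ordering, reachability distances and core point indices."""
    n = len(X)
    # pdist returns a condensed vector; squareform expands it to an n x n matrix
    dist_matrix = squareform(pdist(X))
    # Core distance: distance to the min_samples-th nearest point (the point
    # itself counts as the first), or inf if that distance exceeds eps
    core_dist = np.full(n, np.inf)
    if min_samples <= n:
        core_dist = np.sort(dist_matrix, axis=1)[:, min_samples - 1]
        core_dist[core_dist > eps] = np.inf

    reachability = np.full(n, np.inf)
    processed = np.zeros(n, dtype=bool)
    ordering = []
    for start in range(n):
        if processed[start]:
            continue
        # Priority queue of (reachability, index); stale entries are skipped on pop
        seeds = [(np.inf, start)]
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue
            processed[p] = True
            ordering.append(p)
            if np.isinf(core_dist[p]):
                continue  # not a core point, so it does not expand the cluster
            # Lower the reachability of unprocessed points within eps of p
            for q in np.where((dist_matrix[p] <= eps) & ~processed)[0]:
                new_reach = max(core_dist[p], dist_matrix[p, q])
                if new_reach < reachability[q]:
                    reachability[q] = new_reach
                    heapq.heappush(seeds, (new_reach, int(q)))
    core_sample_indices = np.where(np.isfinite(core_dist))[0]
    return np.array(ordering), reachability, core_sample_indices


# The defaults (eps=0.5, min_samples=5) yield no core points on six samples,
# so looser values are passed explicitly for this small example
eps = 2.0
ordering, reachability, core_samples = optics_algorithm(data, eps=eps, min_samples=2)

# Visualize: reachability plot (left) and the data with core points circled (right)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Undefined (infinite) reachability is drawn at eps so the bar stays visible
plot_reach = np.where(np.isfinite(reachability[ordering]), reachability[ordering], eps)
ax1.bar(range(len(ordering)), plot_reach, color='gray')
ax1.set(xlabel='position in ordering', ylabel='reachability distance')
ax2.scatter(data[:, 0], data[:, 1], s=20, c='gray')
ax2.scatter(data[core_samples, 0], data[core_samples, 1], s=100, edgecolors='black', facecolors='none')
plt.show()
```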