python dbscan算法的实现
时间: 2023-09-08 07:14:51 浏览: 50
DBSCAN(Density-Based Spatial Clustering of Applications with Noise)是一种基于密度的聚类算法,它能够将具有足够密度的区域划分为簇,并可以发现任意形状的簇,同时可以将噪声点识别出来。下面是Python实现DBSCAN算法的代码示例:
``` python
from typing import List, Tuple
import numpy as np
def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
"""计算欧几里得距离"""
return np.sqrt(np.sum((x - y) ** 2))
def region_query(X: np.ndarray, point_id: int, eps: float) -> List[int]:
"""计算半径为eps内的点的id列表"""
neighbors = []
for i, point in enumerate(X):
if euclidean_distance(X[point_id], point) <= eps:
neighbors.append(i)
return neighbors
def expand_cluster(X: np.ndarray, point_id: int, neighbors_ids: List[int], cluster_id: int, eps: float, min_samples: int, labels: np.ndarray) -> bool:
"""将点加入簇中并判断是否可扩展"""
labels[point_id] = cluster_id
while len(neighbors_ids) > 0:
current_point_id = neighbors_ids[0]
current_neighbors = region_query(X, current_point_id, eps)
if len(current_neighbors) >= min_samples:
for neighbor_id in current_neighbors:
if labels[neighbor_id] == -1:
labels[neighbor_id] = cluster_id
elif labels[neighbor_id] == 0:
labels[neighbor_id] = cluster_id
neighbors_ids.append(neighbor_id)
neighbors_ids = neighbors_ids[1:]
return True
def dbscan(X: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
"""DBSCAN算法实现函数"""
labels = np.zeros(X.shape[0])
cluster_id = 0
for point_id, point in enumerate(X):
if labels[point_id] != 0:
continue
neighbors_ids = region_query(X, point_id, eps)
if len(neighbors_ids) < min_samples:
labels[point_id] = -1 # 标记为噪声点
else:
cluster_id += 1
expand_cluster(X, point_id, neighbors_ids, cluster_id, eps, min_samples, labels)
return labels
```
在使用该算法时,需要传入数据集X、半径eps和最小样本数min_samples三个参数,其中eps和min_samples是算法的超参数,需要根据具体问题进行调整。返回的labels数组记录了每个点所属的簇的编号,如果编号为0表示该点为未分类点,如果编号为-1表示该点为噪声点。