DBSCAN algorithm: Python code
Date: 2023-07-23 19:38:01
The following is a Python implementation of the DBSCAN clustering algorithm:
```python
from collections import deque

import numpy as np
from sklearn.neighbors import NearestNeighbors


class DBSCAN:
    def __init__(self, eps=0.5, min_samples=5):
        self.eps = eps                  # neighborhood radius
        self.min_samples = min_samples  # min. points (incl. the point itself) for a core point
        self.core_indices_ = None
        self.labels_ = None

    def fit(self, X):
        X = np.asarray(X)
        n_samples = X.shape[0]
        # All neighbors within eps of every point (each point is its own neighbor)
        nbrs = NearestNeighbors(radius=self.eps, metric='euclidean').fit(X)
        neighborhoods = nbrs.radius_neighbors(X, return_distance=False)
        # Core points have at least min_samples points in their eps-neighborhood
        self.core_indices_ = np.array(
            [len(nb) >= self.min_samples for nb in neighborhoods], dtype=bool)

        labels = np.full(n_samples, -1, dtype=int)  # -1 = noise / not yet assigned
        current_label = 0
        for i in range(n_samples):
            # Start a new cluster only from a core point that has no label yet
            if not self.core_indices_[i] or labels[i] != -1:
                continue
            labels[i] = current_label
            queue = deque(neighborhoods[i])
            # BFS: expand the cluster through density-reachable points
            while queue:
                j = queue.popleft()
                if labels[j] != -1:  # already assigned to a cluster
                    continue
                labels[j] = current_label
                # Only core points propagate the cluster further
                if self.core_indices_[j]:
                    queue.extend(neighborhoods[j])
            current_label += 1
        self.labels_ = labels
        return self.labels_
```
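Here `eps` and `min_samples` are the two hyperparameters of DBSCAN: the neighborhood radius and the minimum number of samples. A point is a core point when its `eps`-neighborhood contains at least `min_samples` samples (the point itself included), or equivalently when its core distance, the distance to its `min_samples`-th nearest neighbor, is at most `eps`. The `fit` method first determines which samples are core points, then runs a BFS (breadth-first search) from each unvisited core point, giving the current cluster label to every point reachable through a chain of core-point neighbors and marking it as visited. Points not reached from any core point are treated as noise. Finally, the method returns the cluster label of every sample.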
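As a sanity check, the sketch below (assuming NumPy and scikit-learn are installed) computes core points both ways described above, via the core distance and via the neighborhood count, and compares them with the reference `sklearn.cluster.DBSCAN`, which labels noise as `-1`. The helper names `core_points_by_distance` / `core_points_by_count` and the toy data are hypothetical, chosen only for illustration; brute-force pairwise distances are used for clarity, not speed.

```python
import numpy as np
from sklearn.cluster import DBSCAN as SklearnDBSCAN  # aliased to avoid clashing with the class above


def pairwise_distances(X):
    """Brute-force Euclidean distance matrix (fine for small toy data)."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def core_points_by_distance(X, eps, min_samples):
    """Hypothetical helper: core distance = distance to the min_samples-th
    nearest neighbor, where the point itself counts as the 1st."""
    kth = np.sort(pairwise_distances(X), axis=1)[:, min_samples - 1]
    return kth <= eps


def core_points_by_count(X, eps, min_samples):
    """Hypothetical helper: at least min_samples points (self included)
    lie within distance eps."""
    return (pairwise_distances(X) <= eps).sum(axis=1) >= min_samples


# Illustrative toy data: two tight blobs plus one far-away outlier
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2)),
    [[20.0, 20.0]],
])
eps, min_samples = 0.5, 5

by_dist = core_points_by_distance(X, eps, min_samples)
by_count = core_points_by_count(X, eps, min_samples)
print("criteria agree:", np.array_equal(by_dist, by_count))

# Reference clustering from scikit-learn
ref = SklearnDBSCAN(eps=eps, min_samples=min_samples).fit(X)
ref_core = np.zeros(len(X), dtype=bool)
ref_core[ref.core_sample_indices_] = True
print("matches sklearn core points:", np.array_equal(ref_core, by_count))
print("labels:", ref.labels_)
```

The two criteria are interchangeable because the sorted distance row includes the point's own zero distance, so "the `min_samples`-th smallest distance is at most `eps`" holds exactly when at least `min_samples` distances are at most `eps`.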