Python demo code: sentiment text clustering with DBSCAN
Date: 2023-09-08 08:10:29
Below is example code that uses Python and the DBSCAN algorithm to cluster sentiment texts:
```python
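import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Define the dataset: short product reviews with positive and negative sentiment
sentences = [
    "I love this product, it's amazing!",
    "This is terrible, I would never recommend it to anyone.",
    "The customer service was great, very helpful.",
    "I had a bad experience with this product, it didn't work as expected.",
    "This is the best purchase I've ever made, highly recommended!",
    "I was disappointed with this product, it didn't meet my expectations.",
    "The shipping was fast, I received my order within two days.",
    "This product exceeded my expectations, I'm extremely satisfied with my purchase.",
]

# Convert the texts to TF-IDF vectors (English stop words removed, rows L2-normalized)
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(sentences)

# Cluster with DBSCAN using cosine distance (1 - cosine similarity).
# eps is data-dependent: these short sentences share few words, so pairwise
# distances are large, and eps=0.5 with the default Euclidean metric labels
# every point as noise. 0.8 is an illustrative value for this toy dataset.
dbscan = DBSCAN(eps=0.8, min_samples=2, metric='cosine')
dbscan.fit(X)

# Print the clustering result; -1 marks noise points
labels = dbscan.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = int(np.sum(labels == -1))
print('Estimated number of clusters: %d' % n_clusters_)
print('Estimated number of noise points: %d' % n_noise_)
print("Cluster labels:", labels)

# Show which sentences fell into each cluster (or into noise)
for label in sorted(set(labels)):
    print('noise:' if label == -1 else f'cluster {label}:')
    for sentence, assigned in zip(sentences, labels):
        if assigned == label:
            print('   ', sentence)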
```
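In the code above, we first define a small sentiment text dataset, `sentences`, and use a TF-IDF vectorizer to convert each text into a vector. We then cluster the vectors with DBSCAN and print the result. In the labels, `-1` marks a noise point (one not assigned to any cluster), and a non-negative integer (0, 1, ...) is the index of the cluster the point belongs to. Keep in mind that TF-IDF captures word overlap rather than sentiment, so the clusters group sentences that share vocabulary. The result is also very sensitive to `eps` and `min_samples`: short sentences share few terms, so their pairwise distances are large, and if `eps` is too small every point is labeled noise.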
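To make the meaning of the `-1` label more concrete, here is a minimal pure-Python sketch of the DBSCAN procedure. It is not scikit-learn's implementation, which uses efficient neighbor searches; this version is O(n²) for readability. The function name `dbscan`, the `dist` callback and the toy 1-D data are illustrative assumptions. It follows scikit-learn's conventions as documented: a point's neighborhood includes the point itself when compared against `min_samples`, and noise is labeled `-1`.

```python
from collections import deque
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

NOISE = -1
UNVISITED = -2  # internal marker, never returned


def dbscan(points: Sequence[T], eps: float, min_samples: int,
           dist: Callable[[T, T], float]) -> List[int]:
    """Minimal DBSCAN sketch: one label per point, -1 for noise."""
    n = len(points)
    labels = [UNVISITED] * n

    def region_query(i: int) -> List[int]:
        # All points within eps of point i, including i itself
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # i is a core point: start a new cluster and expand it breadth-first
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    absdiff = lambda a, b: abs(a - b)
    # Two dense groups and one isolated point
    print(dbscan([0.0, 0.1, 0.2, 5.0, 5.1, 9.0], eps=0.15, min_samples=2, dist=absdiff))
    # → [0, 0, 0, 1, 1, -1]
    # Point 1.0 is first marked noise, then absorbed as a border point of cluster 0
    print(dbscan([0.0, 1.0, 1.1, 1.2], eps=0.15, min_samples=3, dist=absdiff))
    # → [-1, 0, 0, 0]
```

For text, `points` would be the TF-IDF vectors and `dist` a cosine distance; the second example shows why a point can start as noise and still end up in a cluster, because it lies within `eps` of a core point even though it is not dense enough to be a core point itself.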