Which other Python packages implement the three algorithms TF-IDF, TextRank, and LSA? Please demonstrate keyword extraction with working examples.
Date: 2024-05-03 07:20:52
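scikit-learn covers two of the three algorithms. `TfidfVectorizer` implements TF-IDF, and `TfidfVectorizer` followed by `TruncatedSVD` implements LSA. scikit-learn has no TextRank, so the example approximates it: it builds a word co-occurrence graph from `CountVectorizer` counts and ranks the words with PageRank from `networkx`. Other packages offer these algorithms directly, for example gensim (`TfidfModel`, `LsiModel`) and, for Chinese text, jieba (`jieba.analyse.extract_tags` for TF-IDF and `jieba.analyse.textrank`). Example code (requires scikit-learn 1.0 or later):
```python
import networkx as nx
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ['This is the first document.', 'This is the second document.',
          'And this is the third one.', 'Is this the first document?']
top_k = 2

# TF-IDF: the highest-weighted terms of each document are its keywords
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)
terms = vectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
for i, row in enumerate(tfidf.toarray()):
    print(i, terms[row.argsort()[::-1][:top_k]])

# TextRank (approximation): scikit-learn has no TextRank, so build a word graph whose
# edge weights count the documents two words share, then rank the words with PageRank
count_vec = CountVectorizer(binary=True)
tf = count_vec.fit_transform(corpus)
cooc = (tf.T @ tf).toarray()  # term-by-term co-occurrence counts
np.fill_diagonal(cooc, 0)     # drop self-loops
graph = nx.from_numpy_array(cooc)
scores = nx.pagerank(graph, alpha=0.85)  # alpha plays the role of TextRank's damping factor d
words = count_vec.get_feature_names_out()
ranked = sorted(scores, key=scores.get, reverse=True)
print([words[i] for i in ranked[:top_k]])

# LSA: factor the TF-IDF matrix with truncated SVD; each component is a latent topic
svd = TruncatedSVD(n_components=2, random_state=0)
lsa = svd.fit_transform(tfidf)  # document-topic matrix
print(lsa)
for k, component in enumerate(svd.components_):
    # terms with the largest loadings on this topic
    print('topic', k, terms[component.argsort()[::-1][:top_k]])
```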
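For reference, here is a plain-Python sketch of the weighting `TfidfVectorizer` applies with its default settings (`smooth_idf=True`, `norm='l2'`, raw counts as term frequency). The idf is idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t. Each document vector is count * idf, then L2-normalized. The regex reuses the vectorizer's default `token_pattern`. `tfidf_keywords` is a hypothetical helper name, not part of any library.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Same regex as TfidfVectorizer's default token_pattern, applied after lowercasing
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf_keywords(corpus, top_k=2):
    """Return the top_k terms of each document by smoothed, L2-normalized TF-IDF."""
    docs = [Counter(tokenize(doc)) for doc in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # number of documents containing each term
    idf = {t: math.log((1 + n) / (1 + df_t)) + 1 for t, df_t in df.items()}
    result = []
    for doc in docs:
        weights = {t: count * idf[t] for t, count in doc.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        weights = {t: w / norm for t, w in weights.items()}
        # Highest weight first; ties broken alphabetically so the output is stable
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        result.append(ranked[:top_k])
    return result, idf

corpus = ['This is the first document.', 'This is the second document.',
          'And this is the third one.', 'Is this the first document?']
keywords, idf = tfidf_keywords(corpus)
for doc, kws in zip(corpus, keywords):
    print(doc, '->', kws)
```

For "This is the second document." the top keyword is "second". It appears in only one of the four documents, so its idf (ln(5/2) + 1 ≈ 1.92) is the largest. The alphabetical tie-break is an arbitrary choice for reproducible output.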
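Of course, these are only minimal examples, and real use needs adjustment. For example, you may need to remove stop words (`TfidfVectorizer(stop_words='english')` for English text). Chinese text should be segmented into words first, because the default tokenizer treats an unbroken run of Chinese characters as a single token.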
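One such adjustment concerns TextRank. As originally formulated, it builds the word graph from co-occurrence inside a sliding window over the text. It then iterates WS(Vi) = (1 - d) + d * sum of WS(Vj) / deg(Vj) over the neighbours Vj of Vi until the scores settle. The sketch below implements that unweighted variant in plain Python and assumes the text is already tokenized with stop words removed. `textrank_keywords`, `window=2` (adjacent words only) and the fixed 50 iterations are illustrative assumptions, not a library API.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, d=0.85, iterations=50, top_k=3):
    """Rank words with unweighted TextRank over a sliding co-occurrence window."""
    # Undirected graph: link distinct words that fall inside the same window of `window` consecutive tokens
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # WS(Vi) = (1 - d) + d * sum(WS(Vj) / deg(Vj)) over neighbours Vj; uses the previous scores
        scores = {
            word: (1 - d) + d * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
    return ranked[:top_k], scores

tokens = ['python', 'tfidf', 'python', 'textrank', 'python', 'lsa']
top, scores = textrank_keywords(tokens)
print(top)
```

Here "python" is linked to every other word and ranks first. The three remaining words end up with identical scores and are ordered by the alphabetical tie-break.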