Extracting Keywords with TextRank
Posted: 2023-09-09 14:12:02
TextRank is a graph-based method for extracting keywords from text: words become nodes, co-occurring words are linked by weighted edges, and a PageRank-style iteration scores each node. The following steps implement TextRank-style keyword extraction in Python:
1. Import the required libraries
```python
import jieba
import networkx as nx
```
2. Load the text and segment it into words
```python
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text; keeping only tokens longer than one character
# drops most punctuation and single-character stopwords
words = [w for w in jieba.cut(text) if len(w) > 1]
```
3. Build the word graph with co-occurrence edge weights
```python
# Build a co-occurrence graph: two words are connected if they appear
# within a sliding window of each other, and the edge weight counts
# how often that happens
WINDOW = 5
g = nx.Graph()
for i, word1 in enumerate(words):
    for word2 in words[i + 1:i + WINDOW]:
        if word1 == word2:
            continue
        if g.has_edge(word1, word2):
            g[word1][word2]['weight'] += 1
        else:
            g.add_edge(word1, word2, weight=1)
```
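In the standard TextRank formulation, edge weights come from word co-occurrence within a small sliding window over the token sequence. The counting itself needs nothing beyond the standard library; a minimal sketch (the tokens and window size here are toy values):

```python
from collections import defaultdict

# Toy token list and window size (illustrative values)
tokens = ["graph", "rank", "graph", "text", "rank"]
WINDOW = 3

# Count how often each unordered pair co-occurs within the window
cooc = defaultdict(int)
for i, w1 in enumerate(tokens):
    for w2 in tokens[i + 1:i + WINDOW]:
        if w1 != w2:
            cooc[tuple(sorted((w1, w2)))] += 1

print(dict(cooc))
# ("graph", "rank") co-occurs most often, so that edge gets the largest weight
```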
4. Score the words with PageRank
```python
# PageRank scores each word from the weighted structure of the graph
pr = nx.pagerank(g, weight='weight')
# Sort words by score, highest first
keywords = sorted(pr, key=pr.get, reverse=True)
```
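Under the hood, `nx.pagerank` runs a power iteration: every node repeatedly shares its score with its neighbours until the scores stabilize. A rough stdlib-only sketch of that iteration on a toy unweighted graph (the graph and the 0.85 damping factor are illustrative; the real implementation also handles weights, dangling nodes, and convergence checks):

```python
# Toy undirected graph as adjacency lists
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

damping = 0.85
scores = {node: 1.0 / len(graph) for node in graph}

# Power iteration: each node splits its score equally among its neighbours
for _ in range(50):
    new_scores = {}
    for node in graph:
        incoming = sum(scores[nb] / len(graph[nb])
                       for nb in graph if node in graph[nb])
        new_scores[node] = (1 - damping) / len(graph) + damping * incoming
    scores = new_scores

# "c" has the most connections, so it ends up with the highest score
top = max(scores, key=scores.get)
print(top, round(scores[top], 3))
```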
5. Output the keywords
```python
# Print the top 10 keywords
print(keywords[:10])
```
The complete code:
```python
import jieba
import networkx as nx

# Load the text and segment it into words
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()
words = [w for w in jieba.cut(text) if len(w) > 1]

# Build a co-occurrence graph within a sliding window
WINDOW = 5
g = nx.Graph()
for i, word1 in enumerate(words):
    for word2 in words[i + 1:i + WINDOW]:
        if word1 == word2:
            continue
        if g.has_edge(word1, word2):
            g[word1][word2]['weight'] += 1
        else:
            g.add_edge(word1, word2, weight=1)

# Score the words with PageRank and sort by score
pr = nx.pagerank(g, weight='weight')
keywords = sorted(pr, key=pr.get, reverse=True)

# Print the top 10 keywords
print(keywords[:10])
```