TextRank implementation in Python
Time: 2023-09-30 17:09:05 Views: 139
Sure, here is a TextRank implementation in Python:
``` python
import itertools
import networkx as nx
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from string import punctuation

# Requires the NLTK tokenizer and stop-word data (see nltk.download).

def preprocess(text):
    # Lowercase, tokenize, and drop English stop words and punctuation.
    stop_words = set(stopwords.words('english') + list(punctuation))
    words = [word.lower() for word in word_tokenize(text)]
    return [word for word in words if word not in stop_words]

def similarity(sent1, sent2):
    # Number of shared words divided by the summed sizes of both word sets.
    words1 = set(sent1)
    words2 = set(sent2)
    total = len(words1) + len(words2)
    return len(words1 & words2) / total if total else 0.0

def textrank(text, top_n):
    sentences = sent_tokenize(text)
    sentence_tokens = [preprocess(sent) for sent in sentences]
    sim_graph = nx.Graph()
    # Register every sentence as a node so isolated sentences still get a score.
    sim_graph.add_nodes_from(range(len(sentences)))
    # Compare each unordered pair of sentences, identified by position.
    for i, j in itertools.combinations(range(len(sentences)), 2):
        weight = similarity(sentence_tokens[i], sentence_tokens[j])
        if weight > 0:
            sim_graph.add_edge(i, j, weight=weight)
    # PageRank uses the 'weight' edge attribute by default.
    scores = nx.pagerank(sim_graph)
    sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [sentences[idx] for idx, score in sorted_scores[:top_n]]
```
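The code above implements extractive text summarization based on the TextRank algorithm: it splits the text into sentences, builds a graph whose edges are weighted by the similarity between sentences, runs PageRank on that graph to obtain an importance score for each sentence, and returns the top_n sentences in descending order of score as the summary.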
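To make the PageRank step less of a black box, the same pipeline can be sketched without NLTK or networkx. This is only an illustrative sketch under several assumptions: the regex sentence splitter and tokenizer stand in for NLTK, no stop words are removed, and the helper names (`split_sentences`, `tokenize`, `overlap`, `weighted_pagerank`, `textrank_sketch`), the damping factor 0.85, and the fixed 50 iterations are all choices made here, not part of the answer above. Its scores will not match `nx.pagerank` exactly.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens as a set; no stop-word removal in this sketch.
    return set(re.findall(r'[a-z0-9]+', sentence.lower()))


def overlap(words1, words2):
    # Shared words divided by the summed sizes of both word sets.
    total = len(words1) + len(words2)
    return len(words1 & words2) / total if total else 0.0


def weighted_pagerank(weights, damping=0.85, iterations=50):
    # weights[i][j] is the symmetric edge weight between nodes i and j.
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Score held by nodes without edges is spread uniformly over all nodes.
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            incoming = sum(scores[j] * weights[j][i] / out_sum[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n
                              + damping * (incoming + dangling / n))
        scores = new_scores
    return scores


def textrank_sketch(text, top_n):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Dense similarity matrix with a zero diagonal (no self-loops).
    weights = [[overlap(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    scores = weighted_pagerank(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_n]]


if __name__ == "__main__":
    sample = ("Cats chase mice. Cats like mice and milk. "
              "The stock market fell today.")
    print(textrank_sketch(sample, 2))
```

In the sample text the two sentences about cats share words and therefore reinforce each other, while the unrelated third sentence has no edges and only receives the uniform baseline score, so it ranks last.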