TextRank implementation in Python

Posted: 2023-09-30 17:09:05 · Views: 139

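Sure, here is a TextRank implementation in Python. It needs `networkx` and `nltk`, and the NLTK tokenizer models and stopword list must be downloaded beforehand with `nltk.download`.

``` python
import itertools
from string import punctuation

import networkx as nx
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

# Build the stop-word set once instead of on every call to preprocess().
STOP_WORDS = set(stopwords.words('english') + list(punctuation))


def preprocess(text):
    """Lowercase and tokenize a sentence, dropping stop words and punctuation."""
    words = [word.lower() for word in word_tokenize(text)]
    return [word for word in words if word not in STOP_WORDS]


def similarity(sent1, sent2):
    """Shared-word count divided by the sum of the two vocabulary sizes."""
    words1 = set(sent1)
    words2 = set(sent2)
    if not words1 or not words2:
        return 0.0  # avoid dividing by zero when a sentence has no content words
    return len(words1 & words2) / (len(words1) + len(words2))


def textrank(text, top_n):
    sentences = sent_tokenize(text)
    sentence_tokens = [preprocess(sent) for sent in sentences]

    # Use sentence indices as graph nodes. Looking a sentence up by its
    # preprocessed text would fail, because preprocessing changes the string.
    sim_graph = nx.Graph()
    sim_graph.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        weight = similarity(sentence_tokens[i], sentence_tokens[j])
        if weight > 0:
            sim_graph.add_edge(i, j, weight=weight)

    # nx.pagerank uses the 'weight' edge attribute by default.
    scores = nx.pagerank(sim_graph)
    sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [sentences[idx] for idx, score in sorted_scores[:top_n]]
```

The code above implements extractive text summarization based on the TextRank algorithm. It splits the text into sentences, builds a graph whose edges are weighted by the similarity between each pair of sentences, and runs PageRank on that graph to obtain an importance score for every sentence. It then returns the `top_n` highest-scoring sentences as the summary, ordered by descending score rather than by their position in the text.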
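To make the PageRank step less of a black box, here is a minimal sketch of weighted PageRank by power iteration in plain Python, with no `networkx` dependency. It is an illustration under stated assumptions, not what `nx.pagerank` does internally: the function name `pagerank_power_iteration`, its parameters, and the three-sentence similarity matrix are all hypothetical and do not come from the answer above.

```python
def pagerank_power_iteration(weights, damping=0.85, tol=1e-8, max_iter=200):
    """Score the nodes of an undirected weighted graph.

    `weights` is a symmetric matrix (list of lists) where weights[i][j]
    is the similarity between sentence i and sentence j.
    """
    n = len(weights)
    if n == 0:
        return []
    # Total edge weight attached to each node, used to normalize its votes.
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Nodes without edges spread their score evenly over all nodes.
        dangling = sum(scores[j] for j in range(n) if out_weight[j] == 0)
        new_scores = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append(
                (1 - damping) / n + damping * (incoming + dangling / n)
            )
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Hypothetical similarity matrix for three sentences: sentence 1 shares
# words with both others, while sentences 0 and 2 share nothing.
example_weights = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
example_scores = pagerank_power_iteration(example_weights)
ranking = sorted(
    range(len(example_scores)), key=lambda i: example_scores[i], reverse=True
)
```

In this toy graph the middle sentence is connected to both others, so it ends up first in `ranking`. This is the same intuition behind picking summary sentences: a sentence that overlaps with many other sentences collects the most score.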
