NLTK demo: extracting keywords and sentences
Posted: 2024-02-05 14:04:44
Here is a sample script that uses NLTK to extract keywords and sentences from a text:
```python
import nltk
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.tokenize import sent_tokenize, word_tokenize
# Download the stop-word list and the Punkt tokenizer data.
# Recent NLTK releases look for 'punkt_tab'; older ones use 'punkt'.
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')
# Load the text
text = "Natural Language Processing (NLP) is a subfield of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human languages. In particular, how to program computers to process and analyze large amounts of natural language data."
# Tokenize into words
words = word_tokenize(text)
# Lowercase, then drop stop words and punctuation tokens such as "(" and ","
stop_words = set(stopwords.words('english'))
filtered_words = [
    word.casefold()
    for word in words
    if word.isalpha() and word.casefold() not in stop_words
]
# Take the five most frequent remaining words as keywords
fdist = FreqDist(filtered_words)
keywords = fdist.most_common(5)
# Split into sentences
sentences = sent_tokenize(text)
# Print the results
print("Keywords:", [word for word, _ in keywords])
print("Sentences:", sentences)
```
Output (download log lines omitted):
```
Keywords: ['natural', 'language', 'computers', 'processing', 'nlp']
Sentences: ['Natural Language Processing (NLP) is a subfield of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human languages.', 'In particular, how to program computers to process and analyze large amounts of natural language data.']
```
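This sample splits the text into words and sentences and removes stop words using NLTK's English stop-word list. It then uses `FreqDist` to count the remaining words and takes the five most frequent ones as keywords. Finally, it prints the keywords and the sentences.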
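The counting step can also be sketched without NLTK, which may make it easier to see what the frequency-based keyword selection does. The sketch below uses `collections.Counter` from the standard library in place of `FreqDist`. The helper name `top_keywords`, the tiny `DEMO_STOP_WORDS` set, and the hand-written token list are assumptions made for illustration; they are not part of NLTK or of the script above.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only;
# the list returned by stopwords.words('english') is much larger.
DEMO_STOP_WORDS = {"a", "and", "is", "of", "the", "to", "with"}


def top_keywords(tokens, stop_words=DEMO_STOP_WORDS, n=5):
    """Return the n most frequent alphabetic tokens that are not stop words."""
    cleaned = [
        tok.casefold()
        for tok in tokens
        if tok.isalpha() and tok.casefold() not in stop_words
    ]
    # most_common orders by count; equal counts keep first-seen order.
    return [word for word, _ in Counter(cleaned).most_common(n)]


# Hand-written tokens standing in for the output of a word tokenizer.
tokens = ["NLP", "is", "a", "subfield", "of", "computer", "science", ",",
          "and", "NLP", "studies", "language", "."]
print(top_keywords(tokens, n=3))  # ['nlp', 'subfield', 'computer']
```

Raw frequency is a crude notion of "keyword": it favors whatever word happens to repeat, so for longer documents a weighting scheme such as TF-IDF is often a better fit.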