Analyzing text and extracting keywords with NLTK
Time: 2023-10-24 16:05:55 Views: 130
Keyword extraction
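Sure. You can use the following code as a reference for analyzing text and extracting keywords with NLTK.
First, install NLTK and download the data packages it needs: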
```python
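import nltk

# Tokenizer models used by word_tokenize(). Newer NLTK releases may ask for
# 'punkt_tab' instead; download that if word_tokenize() raises a LookupError.
nltk.download('punkt')
# Stop word lists used in the last step
nltk.download('stopwords')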
```
Next, use NLTK's `word_tokenize()` function to split the text into tokens:
```python
from nltk.tokenize import word_tokenize
text = "Hello, how are you doing today?"
tokens = word_tokenize(text)
print(tokens)
```
Output:
```
['Hello', ',', 'how', 'are', 'you', 'doing', 'today', '?']
```
Then use NLTK's `PorterStemmer` or `LancasterStemmer` to reduce words to their stems:
```python
from nltk.stem import PorterStemmer, LancasterStemmer
ps = PorterStemmer()
ls = LancasterStemmer()
words = ['running', 'ran', 'runner']
for word in words:
    # Print the Porter stem and the Lancaster stem side by side
    print(ps.stem(word), ls.stem(word))
```
Output:
```
run run
ran ran
runner run
```
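One reason to stem before counting keywords is that inflected forms of the same word collapse into a single entry. The sketch below illustrates this by grouping words under their Porter stem. The helper name `group_by_stem` and the sample word list are made up for illustration and are not part of NLTK.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words):
    """Map each Porter stem to the original words that reduce to it."""
    stemmer = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


# Hypothetical sample words, chosen only for illustration
groups = group_by_stem(["connect", "connected", "connecting", "connection"])
print(groups)
```

Stems are not always dictionary words, so it can help to keep the original forms around (as this sketch does) when showing keywords to a reader.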
Finally, use NLTK's `stopwords` corpus to remove common stop words:
```python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# A set makes the membership test below fast
stop_words = set(stopwords.words('english'))
text = "This is a sample sentence, showing off the stop words filtration."
tokens = word_tokenize(text)
filtered_tokens = []
for token in tokens:
    # Compare in lowercase, because the stop word list is all lowercase
    if token.lower() not in stop_words:
        filtered_tokens.append(token)
print(filtered_tokens)
```
Output:
```
['sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']
```
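The steps above stop at a filtered token list, and the output still contains punctuation. One simple way to turn such a list into keywords is to drop non-alphabetic tokens and rank what remains by frequency. The following is a minimal sketch of that idea, not the only way to extract keywords. It uses only the standard library so that it runs without any NLTK data; the function name `top_keywords`, the token list, and the tiny stop list are all hypothetical stand-ins for the results of `word_tokenize()` and `stopwords.words('english')`.

```python
from collections import Counter


def top_keywords(tokens, stop_words, n=3):
    """Return the n most frequent tokens that are neither stop words nor punctuation."""
    candidates = [
        token.lower()
        for token in tokens
        # isalpha() drops punctuation, but also numbers and tokens such as "n't"
        if token.isalpha() and token.lower() not in stop_words
    ]
    return Counter(candidates).most_common(n)


# Hypothetical inputs: tokens shaped like word_tokenize() output,
# and a small hand-written stop list used only for this example
tokens = ["Keyword", "extraction", "with", "NLTK", ":", "keyword", "extraction",
          "starts", "with", "tokens", ",", "and", "the", "keyword", "list",
          "follows", "."]
stop_words = {"with", "and", "the"}

print(top_keywords(tokens, stop_words, n=2))
# [('keyword', 3), ('extraction', 2)]
```

In a real pipeline you would pass in the tokens and the stop word set built in the previous step. NLTK's `FreqDist` is a `Counter` subclass, so it can be used in place of `Counter` here if you prefer to stay within NLTK.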
I hope the code above helps!