Write code that searches and analyzes the words in Chapter 1 of the book "Natural Language Processing with Python" and finds the words with similar forms.
Date: 2024-05-05 22:14:55 Views: 99
Finding words with Python
Happy to help! Here is the code:
```python
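import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

# Requires the NLTK tokenizer, POS tagger, stopwords and WordNet resources
# (install them once with nltk.download).

lemmatizer = WordNetLemmatizer()

# Assumes the chapter text has already been read into the variable `text`

# Stop-word list
stop_words = set(stopwords.words('english'))


def wordnet_pos(tag):
    """Map a Penn Treebank tag to the WordNet POS code that lemmatize() expects."""
    return {'J': 'a', 'V': 'v', 'R': 'r'}.get(tag[0], 'n')


# Split into sentences, then look up and analyze the words of each sentence
for sentence in nltk.sent_tokenize(text):
    # Tokenize
    words = nltk.word_tokenize(sentence)
    # Tag each word with its part of speech
    tagged_words = nltk.pos_tag(words)
    # Lemmatize each word using its tag; without a POS argument,
    # lemmatize() treats every word as a noun and leaves verb forms unchanged
    lemmatized_words = [lemmatizer.lemmatize(word, wordnet_pos(pos))
                        for word, pos in tagged_words]
    # Collect nouns, filtering out stop words
    nouns = [word for word, pos in tagged_words
             if pos.startswith('N') and word.lower() not in stop_words]
    # Collect verbs, filtering out stop words
    verbs = [word for word, pos in tagged_words
             if pos.startswith('V') and word.lower() not in stop_words]
    # Collect adjectives, filtering out stop words
    adjectives = [word for word, pos in tagged_words
                  if pos.startswith('J') and word.lower() not in stop_words]
    # Print each word with its lemmatized form
    for word, lemmatized_word in zip(words, lemmatized_words):
        print(f'{word} -> {lemmatized_word}')
    # Print the noun, verb and adjective lists
    print(f'Nouns: {nouns}')
    print(f'Verbs: {verbs}')
    print(f'Adjectives: {adjectives}')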
```
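This code splits the text into sentences, tokenizes each sentence, lemmatizes and POS-tags the words, then picks out the nouns, verbs and adjectives and prints every word next to its lemmatized form.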
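The code above reduces each word to its lemma but does not list which words resemble each other. If "words with similar forms" is read as "words with similar spelling" (an assumption; the question does not define it), a minimal sketch using only the standard library's `difflib.get_close_matches` could look like the following. The function name `similar_word_forms`, the parameters `cutoff`, `max_matches` and `min_len`, and the sample sentence are all hypothetical, and the regex tokenizer is a crude stand-in for `nltk.word_tokenize`.

```python
import re
from difflib import get_close_matches


def similar_word_forms(text, cutoff=0.8, max_matches=5, min_len=4):
    """Map each distinct word in `text` to the other words spelled similarly.

    Hypothetical helper: `cutoff` is the minimum difflib similarity ratio,
    and `min_len` drops very short words, which tend to match spuriously.
    """
    # Lowercased alphabetic tokens, deduplicated and sorted for stable output
    tokens = re.findall(r"[a-z]+", text.lower())
    vocab = sorted({t for t in tokens if len(t) >= min_len})
    similar = {}
    for word in vocab:
        # Compare the word against every other word in the vocabulary
        candidates = [w for w in vocab if w != word]
        matches = get_close_matches(word, candidates, n=max_matches, cutoff=cutoff)
        if matches:
            similar[word] = matches
    return similar


# Made-up sample text; replace it with the chapter text held in `text`
sample = ("The walker walks to work. Yesterday they walked home. "
          "People walk and talk about language and languages.")
for word, matches in similar_word_forms(sample).items():
    print(f"{word}: {matches}")
```

Spelling similarity is only a rough proxy for shared word forms: it pairs `language` with `languages`, but it can also pair unrelated words that happen to look alike, so the `cutoff` value likely needs tuning for a real chapter. Comparing every word with every other word is also quadratic in vocabulary size, which is fine for a single chapter but slow for a large corpus.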