How to use Python to count word frequencies for the nouns, verbs, and adjectives in an article and draw a word cloud
Date: 2024-01-22 07:18:22
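To count word frequencies and draw a word cloud in Python, you can use the Natural Language Toolkit (NLTK) together with the WordCloud library. The basic steps below target English text and pool nouns, verbs, and adjectives into a single frequency count:
1. Install the NLTK and WordCloud libraries:
```python
# Jupyter notebook syntax; in a terminal, drop the leading "!"
!pip install nltk
!pip install wordcloud
```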
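2. Import the required libraries (and fetch the NLTK data on first run):
```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# One-time download of the tokenizer, stop word, and tagger data.
# Resource names depend on the NLTK version: recent releases use
# 'punkt_tab' and 'averaged_perceptron_tagger_eng', older ones do not.
for resource in ('punkt', 'punkt_tab', 'stopwords',
                 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng'):
    nltk.download(resource, quiet=True)
```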
3. Load the article and preprocess it:
```python
# Load the article
with open('article.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Tokenize and POS-tag the original text first: the tagger uses
# capitalization and function words such as "the" or "to" as context,
# so lowercasing and stop word removal come afterwards
pos_tags = nltk.pos_tag(word_tokenize(text))
# Penn Treebank tags for nouns, verbs, and adjectives
selected_tags = {'NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'JJ', 'JJR', 'JJS'}
# Keep the selected parts of speech, lowercase them, and drop stop words
stop_words = set(stopwords.words('english'))
selected_tokens = [
    word.lower() for word, tag in pos_tags
    if tag in selected_tags and word.lower() not in stop_words
]
```
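The question asks for separate counts per part of speech, whereas the step above pools everything into `selected_tokens`. The following is a minimal sketch of one way to split the counts. It assumes the input is a list of `(word, tag)` pairs in the format returned by `nltk.pos_tag`. The names `POS_PREFIXES` and `count_by_pos` are hypothetical helpers introduced here, and the sample is hand-tagged so the block runs without any NLTK data.

```python
from collections import Counter

# Penn Treebank tag prefixes for the three categories of interest:
# NN* covers nouns, VB* covers verbs, JJ* covers adjectives
POS_PREFIXES = {"nouns": "NN", "verbs": "VB", "adjectives": "JJ"}

def count_by_pos(tagged_tokens, stop_words=frozenset()):
    """Return one Counter per category from (word, tag) pairs."""
    counts = {name: Counter() for name in POS_PREFIXES}
    for word, tag in tagged_tokens:
        word = word.lower()
        if word in stop_words:
            continue
        for name, prefix in POS_PREFIXES.items():
            if tag.startswith(prefix):
                counts[name][word] += 1
                break
    return counts

# Hand-tagged sample in the (word, tag) format that nltk.pos_tag returns
sample = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
          ("over", "IN"), ("lazy", "JJ"), ("dogs", "NNS"), ("fox", "NN")]
counts = count_by_pos(sample)
print(counts["nouns"].most_common(2))  # [('fox', 2), ('dogs', 1)]
```

With real data, `count_by_pos(nltk.pos_tag(word_tokenize(text)), stop_words)` would give three frequency tables, each of which can be passed to `generate_from_frequencies` on its own.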
4. Count word frequencies and draw the word cloud:
```python
# Count word frequencies
fdist = FreqDist(selected_tokens)
# Draw the word cloud
wordcloud = WordCloud(width=800, height=800, background_color='white', max_words=200).generate_from_frequencies(fdist)
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()
```
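The complete code is as follows:
```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# One-time download of the tokenizer, stop word, and tagger data.
# Resource names depend on the NLTK version: recent releases use
# 'punkt_tab' and 'averaged_perceptron_tagger_eng', older ones do not.
for resource in ('punkt', 'punkt_tab', 'stopwords',
                 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng'):
    nltk.download(resource, quiet=True)

# Load the article
with open('article.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Tokenize and POS-tag the original text first: the tagger uses
# capitalization and function words as context
pos_tags = nltk.pos_tag(word_tokenize(text))
# Penn Treebank tags for nouns, verbs, and adjectives
selected_tags = {'NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'JJ', 'JJR', 'JJS'}
# Keep the selected parts of speech, lowercase them, and drop stop words
stop_words = set(stopwords.words('english'))
selected_tokens = [
    word.lower() for word, tag in pos_tags
    if tag in selected_tags and word.lower() not in stop_words
]
# Count word frequencies
fdist = FreqDist(selected_tokens)
# Draw the word cloud
wordcloud = WordCloud(width=800, height=800, background_color='white', max_words=200).generate_from_frequencies(fdist)
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()
```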