Scraping web text with Python and turning it into a word cloud
Date: 2023-08-20 10:06:57 · Views: 58
Sure, I can help with that. First, write a Python crawler to fetch the text data from the web page you are interested in. Then use the third-party library WordCloud to generate the word cloud image.
Here is a simple example that scrapes the text from one page and generates a word cloud:
```python
import requests
from bs4 import BeautifulSoup
from wordcloud import WordCloud

# Fetch the text content of the page
url = 'https://www.example.com'
response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')
text = soup.get_text()

# Generate the word cloud and save it as a PNG file
wordcloud = WordCloud(width=800, height=800, background_color='white').generate(text)
wordcloud.to_file('wordcloud.png')
```
In this example, the requests library fetches the page's HTML, BeautifulSoup parses it and extracts the text, and WordCloud turns that text into a word cloud saved as a PNG image.
Of course, this is only a minimal example; you will need to adapt both the crawler and the word cloud settings to your own needs (for Chinese text, for instance, you must pass a CJK font via font_path and segment the text first).
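One caveat with the example above: calling soup.get_text() on a full page also returns the contents of &lt;script&gt; and &lt;style&gt; tags, which would pollute the word cloud with JavaScript and CSS. A minimal sketch of stripping those tags first (the inline HTML here is a made-up stand-in for a fetched page):

```python
from bs4 import BeautifulSoup

html = '<html><head><script>var x = 1;</script></head><body><p>Hello word cloud</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# Remove script/style tags so their code does not end up in the word cloud text
for tag in soup(['script', 'style']):
    tag.decompose()

text = soup.get_text(separator=' ', strip=True)
print(text)  # Hello word cloud
```

decompose() removes the tag and its contents from the parse tree entirely, so the subsequent get_text() only sees visible page text.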
Related questions
Scraping Douban movie reviews into a word cloud with Python
Here are the steps to scrape Douban movie reviews with Python and build a word cloud:
1. Import the required libraries and modules
```python
import requests
from bs4 import BeautifulSoup
import jieba
from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
```
2. Fetch the page source and parse it
```python
url = 'https://movie.douban.com/subject/26363254/comments?status=P'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
```
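Douban only returns about 20 reviews per page; further pages are reached through the start query parameter. A hedged sketch of building the page URLs up front (the parameter names follow the URL above, but verify them against the live site, and note that Douban rate-limits aggressive crawlers):

```python
# Douban review pages are paginated via `start` (20 reviews per page)
base = 'https://movie.douban.com/subject/26363254/comments'
page_urls = [f'{base}?start={i * 20}&limit=20&status=P' for i in range(3)]
for u in page_urls:
    print(u)
```

Each URL can then be fetched in turn (ideally with a delay between requests) and its reviews appended to one text corpus.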
3. Extract the review text and tokenize it
```python
comments = soup.find_all('span', class_='short')
comment_text = ''
for comment in comments:
    comment_text += comment.text
words = jieba.cut(comment_text)
```
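jieba.cut also yields single characters and punctuation, which carry little meaning on their own. Before counting, it can help to filter tokens against a stopword set as well as by length; a small sketch (the sample token list and stopwords here are illustrative, standing in for jieba's output):

```python
# Stand-in for the iterator returned by jieba.cut on a review
words = ['这', '部', '电影', '非常', '好看', '，', '剧情', '紧凑']
stopwords = {'这', '部', '非常', '，'}

# Keep tokens that are at least two characters long and not stopwords
filtered = [w for w in words if len(w) > 1 and w not in stopwords]
print(filtered)  # ['电影', '好看', '剧情', '紧凑']
```

In practice you would load a full Chinese stopword list from a file rather than hard-coding a handful of entries.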
4. Count word frequencies and generate the word cloud
```python
word_counts = {}
for word in words:
    # Skip single-character tokens (mostly stopwords and punctuation)
    if len(word) == 1:
        continue
    word_counts[word] = word_counts.get(word, 0) + 1
wordcloud = WordCloud(font_path='msyh.ttc', background_color='white', max_words=200, max_font_size=100, width=800, height=600)
wordcloud.generate_from_frequencies(word_counts)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```
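The manual counting loop above can also be written with collections.Counter from the standard library, which additionally gives you most_common() for a quick look at the top words:

```python
from collections import Counter

# Stand-in for the filtered jieba tokens
words = ['电影', '好看', '电影', '剧情', '好看', '电影']

# Count tokens longer than one character
word_counts = Counter(w for w in words if len(w) > 1)
print(word_counts.most_common(2))  # [('电影', 3), ('好看', 2)]
```

Counter is a dict subclass, so the result can be passed to generate_from_frequencies() unchanged.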
5. Generate a word cloud shaped and colored by an image
```python
mask = np.array(Image.open('movie.png'))  # 'movie.png' must exist locally; its shape and colors drive the cloud
image_colors = ImageColorGenerator(mask)
wordcloud = WordCloud(font_path='msyh.ttc', background_color='white', max_words=200, max_font_size=100, width=800, height=600, mask=mask)
wordcloud.generate_from_frequencies(word_counts)
plt.imshow(wordcloud.recolor(color_func=image_colors), interpolation='bilinear')
plt.axis('off')
plt.show()
```
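If no ready-made mask image is at hand, a mask can also be built directly as a NumPy array: WordCloud treats pure-white (255) cells as masked out and draws words only in the non-white region. A minimal sketch building a circular mask (the 200×200 size is an arbitrary choice for illustration):

```python
import numpy as np

# Build a 200x200 mask: white (255) background, black (0) circle in the middle
h, w = 200, 200
y, x = np.ogrid[:h, :w]
inside_circle = (x - w // 2) ** 2 + (y - h // 2) ** 2 <= (w // 2) ** 2

mask = np.full((h, w), 255, dtype=np.uint8)
mask[inside_circle] = 0  # words will be drawn only inside the circle

print(mask.shape, mask.dtype)
```

This array can be passed as mask= to WordCloud exactly like the image-derived array above (ImageColorGenerator, of course, only makes sense with a real color image).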
Scraping a word cloud for Wolf Warrior 2 (《战狼2》) with Python
Sure, here is a simple example crawler that fetches a page of text and generates a word cloud for《战狼2》. Note that the URL below is only a placeholder; replace it with a page that actually hosts the script:
```python
import requests
from bs4 import BeautifulSoup
from wordcloud import WordCloud
import jieba

# Fetch the page text (replace the URL with one that actually hosts the script)
url = 'https://www.zybuluo.com/cmd/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
text = soup.get_text()

# Segment the Chinese text with jieba
words = jieba.cut(text)

# Generate the word cloud; a CJK font (here simhei.ttf) is required to render Chinese
wordcloud = WordCloud(font_path='simhei.ttf', background_color='white', width=800, height=600).generate(' '.join(words))
wordcloud.to_file('zhanlang2_wordcloud.png')
```
This code fetches the page's text, segments it with jieba, and saves the resulting word cloud as a PNG image. Adapt it to your own needs, starting by pointing the URL at a page that really contains the《战狼2》script.
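Joining the jieba tokens with spaces before calling generate() matters because WordCloud's built-in tokenizer splits on whitespace and, by default, only keeps tokens of two or more word characters. A quick illustration of that default pattern (the sample sentence is made up, and the regex mirrors what I understand WordCloud's default regexp to be; check your installed version's documentation):

```python
import re

# A jieba-segmented sentence, already joined with spaces
segmented = '吴京 主演 的 战狼2 票房 很 高'

# WordCloud's default token pattern: a word character followed by at least one more
tokens = re.findall(r"\w[\w']+", segmented)
print(tokens)  # ['吴京', '主演', '战狼2', '票房']
```

Note how the single-character tokens 的, 很, and 高 are dropped; if you need them kept, pass a custom regexp= to WordCloud.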