A more complex Python web scraping and data visualization example
Date: 2023-09-04 11:16:45 Views: 112
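Sure. Here is a slightly more complex example of Python web scraping and data visualization. It scrapes text from a page, computes per-item word counts and character lengths with pandas, and then draws word clouds, histograms, and a scatter plot: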
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from PIL import Image
import numpy as np
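# Scrape the data
url = 'https://www.example.com'  # placeholder URL; replace with the target page
r = requests.get(url, timeout=10)
r.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
soup = BeautifulSoup(r.content, 'html.parser')
data = []
for item in soup.select('some-selector'):  # placeholder CSS selector
    data.append(item.text)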
# Process the data
df = pd.DataFrame(data, columns=['column_name'])
df['column_name'] = df['column_name'].str.lower()  # convert the text to lowercase
df['word_count'] = df['column_name'].apply(lambda x: len(x.split()))  # number of words in each text item
df['word_length'] = df['column_name'].str.len()  # number of characters in each text item
# Visualize the data
# Draw the word cloud
wordcloud = WordCloud(background_color='white', max_words=200, max_font_size=40, scale=3, random_state=1).generate(' '.join(df['column_name']))
plt.figure(figsize=(12,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
# Plot the distribution of word counts per text item
plt.hist(df['word_count'], bins=50)
plt.title('Word Count Distribution')
plt.xlabel('Word Count')
plt.ylabel('Frequency')
plt.show()
# Plot the distribution of text lengths (characters per item)
plt.hist(df['word_length'], bins=50)
plt.title('Text Length Distribution')
plt.xlabel('Text Length (characters)')
plt.ylabel('Frequency')
plt.show()
# Scatter plot of text length against word count
plt.scatter(df['word_length'], df['word_count'])
plt.title('Text Length vs. Word Count')
plt.xlabel('Text Length (characters)')
plt.ylabel('Word Count')
plt.show()
# Draw a word cloud shaped by a mask image
mask = np.array(Image.open('mask.png'))  # 'mask.png' must exist; pure white areas are masked out
wordcloud = WordCloud(background_color='white', max_words=200, max_font_size=40, scale=3, random_state=1, mask=mask).generate(' '.join(df['column_name']))
plt.figure(figsize=(12,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```
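Note that this is only a moderately complex example. The URL, `'some-selector'`, and `'mask.png'` are placeholders that must be replaced with real values before the code will run. Real applications may call for more sophisticated data processing and visualization, and the code should be adjusted and optimized for the data at hand.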
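One adjustment that may be useful: the `word_count` column in the example holds the number of words in each scraped item, not how often each word occurs. If per-word frequencies are what you need, a minimal sketch using only the standard library might look like the following. The function name `word_frequencies`, the simple regex tokenizer, and the sample texts are assumptions for illustration and are not part of the original answer.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each word occurs across all texts (case-insensitive)."""
    counts = Counter()
    for text in texts:
        # \w+ is a simple tokenizer: runs of letters, digits, or underscores
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Hypothetical sample standing in for df['column_name'].tolist()
sample = ["Python is fun", "Scraping with Python", "fun with data"]
freq = word_frequencies(sample)
print(freq.most_common(3))
```

The resulting counter behaves like a dict of word to count, so it could likely be passed to `WordCloud.generate_from_frequencies` in place of the joined text, or converted to a pandas Series for a bar chart of the most common words.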