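Code to scrape comment data for Jay Chou's 花海 (Hua Hai) and produce a visual analysis
Time: 2023-10-01 10:09:44 Views: 119
Visual analysis of song comment data using Python web scraping
Below is the Python code for scraping the comment data for Jay Chou's 花海: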
```python
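import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Douban short-comment page; `start` is the paging offset (20 comments per page)
url = 'https://movie.douban.com/subject/33420285/comments?start={}&limit=20&sort=new_score&status=P'
# Assumption: a browser-like User-Agent is sent, since Douban tends to reject bare requests
headers = {'User-Agent': 'Mozilla/5.0'}

comments_list = []
for i in range(0, 200, 20):
    res = requests.get(url.format(i), headers=headers, timeout=10)
    if res.status_code != 200:
        break  # stop if the request is blocked or the page is missing
    soup = BeautifulSoup(res.text, 'html.parser')
    comments = soup.select('.comment-item .comment-content span')
    for comment in comments:
        comments_list.append(comment.text.strip())
    time.sleep(1)  # pause between pages to avoid hammering the site

# Save all collected comments in a single-column CSV file
df = pd.DataFrame(comments_list, columns=['comments'])
df.to_csv('jay_comments.csv', index=False)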
```
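This code scrapes the short comments from the Douban Movies page for 《周杰伦的告白》 and saves them to a CSV file (`jay_comments.csv`). It requests 10 pages of 20 comments each, so at most 200 comments are collected. Note that the URL points to a Douban Movies page, not to a comment page for the song 花海 itself; to analyze the song's comments, the URL and the CSS selector would need to target a different source.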
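To make the paging logic explicit, here is a minimal sketch that only builds the page URLs and sends no requests. The helper name `build_page_urls` and the constant `BASE` are hypothetical and introduced for illustration; the query parameters are the ones that appear in the URL above.

```python
from urllib.parse import urlencode

# Base address taken from the scraping code above
BASE = 'https://movie.douban.com/subject/33420285/comments'


def build_page_urls(total=200, page_size=20):
    """Return one URL per page; `start` advances by `page_size` each time."""
    urls = []
    for start in range(0, total, page_size):
        query = urlencode({
            'start': start,
            'limit': page_size,
            'sort': 'new_score',
            'status': 'P',
        })
        urls.append(f'{BASE}?{query}')
    return urls


urls = build_page_urls()
print(len(urls))   # number of pages that will be requested
print(urls[0])     # first page, start=0
print(urls[-1])    # last page, start=180
```

Keeping URL construction separate from the network calls makes it easy to check the offsets before scraping, and to change `total` or `page_size` in one place.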
Next, we can use jieba for word segmentation and Matplotlib to visualize the data. Here is a code example:
```python
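import jieba
import jieba.analyse
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: the SimHei font is installed; any font with CJK glyphs works here
plt.rcParams['font.sans-serif'] = ['SimHei']

df = pd.read_csv('jay_comments.csv')
comments = df['comments'].dropna().astype(str).tolist()

# Segment the comments with jieba and extract the top keywords by TF-IDF weight
words = []
for comment in comments:
    words.extend(jieba.cut(comment, cut_all=False))
keywords = jieba.analyse.extract_tags(' '.join(words), topK=20, withWeight=True)

# Draw a simple word-cloud-style plot: one keyword per grid cell, font size scaled by weight
plt.figure(figsize=(10, 6))
max_weight = max((weight for _, weight in keywords), default=1.0)
for idx, (word, weight) in enumerate(keywords):
    x, y = (idx % 5 + 0.5) / 5, 1 - (idx // 5 + 0.5) / 4
    plt.text(x, y, word, fontsize=10 + 30 * weight / max_weight, ha='center', va='center')
plt.axis('off')
plt.show()

# Plot the sentiment distribution (a crude keyword heuristic, not a sentiment model)
sentiments = []
for comment in comments:
    if '好看' in comment:
        sentiments.append('positive')
    elif '难看' in comment:
        sentiments.append('negative')
    else:
        sentiments.append('neutral')
sentiments_counts = pd.Series(sentiments).value_counts()
plt.pie(sentiments_counts, labels=sentiments_counts.index, autopct='%1.1f%%')
plt.title('Sentiment Distribution')
plt.show()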
```
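This code produces two charts: a word-cloud-style keyword plot and a sentiment distribution pie chart. You can adjust the code and the chart styles as needed.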
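The pie chart labels a comment by checking for the substrings '好看' and '难看', so a comment such as '不好看' is counted as positive. The sketch below shows one possible refinement that flips the label when a keyword is directly preceded by a negation character. The function names and the extra keywords ('好听', '喜欢', '难听', '失望') are illustrative assumptions, not part of the original code, and this remains a rough heuristic rather than a real sentiment model.

```python
from collections import Counter

# Keyword lists: '好看' and '难看' come from the code above; the rest are illustrative
POSITIVE = ('好看', '好听', '喜欢')
NEGATIVE = ('难看', '难听', '失望')
NEGATIONS = ('不', '没')


def count_hits(comment, word):
    """Return (plain, negated) occurrence counts of `word` in `comment`."""
    plain = negated = 0
    start = comment.find(word)
    while start != -1:
        # A keyword directly preceded by a negation character counts as negated
        if start > 0 and comment[start - 1] in NEGATIONS:
            negated += 1
        else:
            plain += 1
        start = comment.find(word, start + len(word))
    return plain, negated


def classify_sentiment(comment):
    """Label a comment as 'positive', 'negative', or 'neutral' by keyword score."""
    score = 0
    for word in POSITIVE:
        plain, negated = count_hits(comment, word)
        score += plain - negated
    for word in NEGATIVE:
        plain, negated = count_hits(comment, word)
        score -= plain - negated
    if score > 0:
        return 'positive'
    if score < 0:
        return 'negative'
    return 'neutral'


sample = ['真好看', '不好看', '太难看了', '一般般']
print(Counter(classify_sentiment(c) for c in sample))
```

To use it with the chart, replace the `if`/`elif`/`else` block with `sentiments = [classify_sentiment(c) for c in comments]`; the pie-chart code can stay the same.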