Write a piece of Python code that scrapes Douban movie information, performs a simple data analysis, and visualizes the results
Time: 2024-02-19 07:00:43 Views: 78
Below is example code that uses Python to scrape Douban movie information, run a simple data analysis, and visualize the results:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt


# Scrape movie information from the Douban Top 250 list
def get_movie_info():
    url = 'https://movie.douban.com/top250'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    movie_info = []
    # The list is paginated: 10 pages of 25 movies each
    for i in range(0, 250, 25):
        params = {
            'start': str(i),
            'filter': ''
        }
        response = requests.get(url, headers=headers, params=params, timeout=10)
        response.raise_for_status()  # stop early on an HTTP error
        soup = BeautifulSoup(response.text, 'html.parser')
        movies = soup.find_all('div', class_='item')
        for movie in movies:
            name = movie.find('span', class_='title').text
            rating_num = movie.find('span', class_='rating_num').text
            # The last span is expected to read like '123456人评价';
            # [:-3] drops that 3-character suffix and keeps the digits
            comment_num = movie.find('div', class_='star').find_all('span')[-1].text[:-3]
            movie_info.append((name, float(rating_num), int(comment_num)))
    return movie_info


# Analyze and plot the data
def data_analysis(movie_info):
    df = pd.DataFrame(movie_info, columns=['name', 'rating_num', 'comment_num'])
    # Histogram of ratings
    df['rating_num'].plot(kind='hist', bins=20)
    plt.title('Rating Distribution')
    plt.xlabel('Rating')
    plt.ylabel('Frequency')
    plt.show()
    # Scatter plot of rating against comment count
    df.plot(kind='scatter', x='rating_num', y='comment_num')
    plt.title('Rating vs. Comment')
    plt.xlabel('Rating')
    plt.ylabel('Comment')
    plt.show()


if __name__ == '__main__':
    movie_info = get_movie_info()
    data_analysis(movie_info)
```
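This code uses the requests and BeautifulSoup libraries to scrape the Douban Movie Top 250 list, collecting each movie's name, rating, and comment count into a list of tuples. It then uses pandas and matplotlib for analysis and visualization: a histogram shows the distribution of ratings, and a scatter plot shows the relationship between rating and comment count.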
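The two plots show the data only visually, so a small numeric summary may be a useful complement. The following is a minimal sketch, not part of the original answer. It assumes `movie_info` is a list of `(name, rating, comment_count)` tuples like the one `get_movie_info()` returns. The helper names `parse_comment_count`, `pearson`, and `summarize` are hypothetical, and the sample rows are made up rather than real Douban data. It uses only the standard library, so it runs without network access.

```python
import re
from statistics import mean, median


def parse_comment_count(text):
    """Return the first run of digits in text such as '3000人评价' as an int."""
    match = re.search(r'\d+', text)
    if match is None:
        raise ValueError(f'no digits found in {text!r}')
    return int(match.group())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def summarize(movie_info):
    """Summarize (name, rating, comment_count) tuples in a small dict."""
    ratings = [rating for _, rating, _ in movie_info]
    counts = [count for _, _, count in movie_info]
    return {
        'mean_rating': mean(ratings),
        'median_rating': median(ratings),
        'rating_comment_corr': pearson(ratings, counts),
    }


if __name__ == '__main__':
    # Made-up sample rows for illustration only
    sample = [
        ('Film A', 9.0, parse_comment_count('3000人评价')),
        ('Film B', 8.5, 2000),
        ('Film C', 8.0, 1000),
    ]
    print(summarize(sample))
```

Matching the digits with a regular expression is one way to avoid depending on the suffix being exactly three characters long, as the `[:-3]` slice does. If the data is already in a pandas DataFrame, `df['rating_num'].corr(df['comment_num'])` should give the same Pearson coefficient.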