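Scrape Douban Top 100 movie information and posters with Python and visualize the data; comment every line, describe the program design, and provide complete, runnable code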
Posted: 2023-06-24 08:07:18
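Program design:
1. Import the required libraries: requests, BeautifulSoup, pandas, matplotlib, os
2. Build the request headers and URL, then fetch the page while presenting as a browser
3. Parse the page with BeautifulSoup and extract each movie's details and poster link
4. Create a folder to hold the movie posters
5. Download the posters into that folder with requests
6. Save the movie details to a CSV file with pandas
7. Plot the movie ratings as a bar chart with matplotlib
Implementation: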
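```python
# Import the required libraries
import requests                    # HTTP client for fetching pages and images
from bs4 import BeautifulSoup      # HTML parser
import pandas as pd                # tabular data and CSV export
import matplotlib.pyplot as plt    # plotting
import os                          # file-system paths and folders

# Build the request URL and headers
url = 'https://movie.douban.com/top250'  # first page of the list only
headers = {
    # Browser-like User-Agent; the site may reject the library default
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Fetch the page while presenting as a browser
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, 'html.parser')  # parse the HTML

# Extract movie details and poster links
movies = soup.find_all('div', class_='info')  # one block per film
movie_list = []
for movie in movies:
    title = movie.find('span', class_='title').text.strip()  # primary title
    # Director/cast paragraph, with runs of whitespace collapsed
    actors = ' '.join(movie.find('div', class_='bd').find('p').text.split())
    rating = movie.find('span', class_='rating_num').text.strip()  # rating as text
    quote_tag = movie.find('span', class_='inq')  # find() returns None if absent
    quote = quote_tag.text.strip() if quote_tag else ''
    img_url = movie.parent.find('a').find('img')['src']  # poster URL
    movie_dict = {'title': title, 'actors': actors, 'rating': rating, 'quote': quote, 'img_url': img_url}
    movie_list.append(movie_dict)

# Create a folder to hold the posters (no error if it already exists)
os.makedirs('movie_images', exist_ok=True)

# Download each poster into the folder
for movie in movie_list:
    img_name = movie['title'].replace('/', '_') + '.jpg'  # '/' is not valid in a file name
    img_path = os.path.join('movie_images', img_name)
    img_url = movie['img_url']
    response = requests.get(img_url, headers=headers, timeout=10)
    if not response.ok:  # skip posters that fail to download
        continue
    with open(img_path, 'wb') as f:
        f.write(response.content)  # write the raw image bytes

# Save the movie details to a CSV file
df = pd.DataFrame(movie_list)
df.to_csv('movie_top250.csv', index=False, encoding='utf-8-sig')

# Plot the ratings as a bar chart
# Assumes the SimHei font is installed; substitute any CJK-capable font
plt.rcParams['font.sans-serif'] = ['SimHei', 'DejaVu Sans']
plt.figure(figsize=(15, 6))
# Convert ratings to numbers so bar heights are drawn to scale
plt.bar(df['title'], df['rating'].astype(float))
plt.xticks(rotation=45, ha='right')
plt.title('Douban Movie Top 250 Ratings')
plt.xlabel('Movie title')
plt.ylabel('Rating')
plt.tight_layout()  # keep the rotated labels inside the figure
plt.show()
```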
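The script requests a single list page, so it does not yet reach the 100 films the question asks for. The following is a minimal sketch of one way to cover more pages. It assumes each page shows 25 films and that later pages are selected with a `start` offset in the query string; both are assumptions about the site, not something the code above confirms. `parse_page` is a hypothetical callable standing in for the download-and-extract loop wrapped in a function.

```python
import time
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'
PER_PAGE = 25  # assumption: each list page shows 25 films


def page_urls(total=100):
    """Return the list-page URLs that cover the first `total` films."""
    # Assumption: pages are selected with a `start` offset query parameter.
    return [f"{BASE_URL}?{urlencode({'start': start})}"
            for start in range(0, total, PER_PAGE)]


def collect(total, parse_page, delay=1.0):
    """Gather up to `total` films by calling `parse_page(url)` on each page.

    `parse_page` is hypothetical: it should download one URL and return
    a list of movie dicts, like the extraction loop in the script.
    """
    movies = []
    for url in page_urls(total):
        movies.extend(parse_page(url))
        time.sleep(delay)  # pause between requests
    return movies[:total]  # trim any surplus from the last page


print(page_urls(100))
```

Passing the fetcher in as an argument keeps the paging logic separate from the network code, so it can be checked with a stub before any real request is sent.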
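Annotations:
1. Import the required libraries
```python
import requests                    # HTTP client for fetching pages and images
from bs4 import BeautifulSoup      # HTML parser
import pandas as pd                # tabular data and CSV export
import matplotlib.pyplot as plt    # plotting
import os                          # file-system paths and folders
```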
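2. Build the request headers and URL, then fetch the page while presenting as a browser
```python
url = 'https://movie.douban.com/top250'  # first page of the list only
headers = {
    # Browser-like User-Agent; the site may reject the library default
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, 'html.parser')  # parse the HTML
```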
3. Extract the movie details and poster links
```python
movies = soup.find_all('div', class_='info')  # one block per film
movie_list = []
for movie in movies:
    title = movie.find('span', class_='title').text.strip()  # primary title
    # Director/cast paragraph, with runs of whitespace collapsed
    actors = ' '.join(movie.find('div', class_='bd').find('p').text.split())
    rating = movie.find('span', class_='rating_num').text.strip()  # rating as text
    quote_tag = movie.find('span', class_='inq')  # find() returns None if absent
    quote = quote_tag.text.strip() if quote_tag else ''
    img_url = movie.parent.find('a').find('img')['src']  # poster URL
    movie_dict = {'title': title, 'actors': actors, 'rating': rating, 'quote': quote, 'img_url': img_url}
    movie_list.append(movie_dict)
```
4. Create a folder to hold the movie posters
```python
# Create the folder; exist_ok avoids an error if it is already there
os.makedirs('movie_images', exist_ok=True)
```
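5. Download the posters into the folder
```python
for movie in movie_list:
    img_name = movie['title'].replace('/', '_') + '.jpg'  # '/' is not valid in a file name
    img_path = os.path.join('movie_images', img_name)
    img_url = movie['img_url']
    response = requests.get(img_url, headers=headers, timeout=10)
    if not response.ok:  # skip posters that fail to download
        continue
    with open(img_path, 'wb') as f:
        f.write(response.content)  # write the raw image bytes
```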
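Because each title becomes a file name in this step, a title containing a character such as `/` or `:` can make `open` fail on some systems. A minimal sketch of a sanitizing helper follows; `safe_filename` is a hypothetical name, and the set of replaced characters is an assumption based on what Windows file systems reject.

```python
import re


def safe_filename(title, ext='.jpg'):
    """Turn a movie title into a file name that common file systems accept."""
    # Replace characters that Windows (and '/' on POSIX) do not allow in names.
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    # Fall back to a placeholder if nothing usable is left.
    return (cleaned or 'untitled') + ext


print(safe_filename('Face/Off'))
```

In the download loop, `img_name = safe_filename(movie['title'])` would take the place of the plain string concatenation.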
6. Save the movie details to a CSV file
```python
df = pd.DataFrame(movie_list)  # one row per film
df.to_csv('movie_top250.csv', index=False, encoding='utf-8-sig')  # omit the index column
```
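The `utf-8-sig` encoding writes a byte order mark at the start of the file, which helps spreadsheet programs such as Excel recognize the content as UTF-8 so Chinese titles display correctly. The sketch below shows the effect using only the standard library; the sample row and the temporary file name are made up for illustration.

```python
import csv
import os
import tempfile

# Illustrative sample row, not scraped data
rows = [{'title': '肖申克的救赎', 'rating': '9.7'}]
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# Write with utf-8-sig, as the script does through pandas
with open(path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.DictWriter(f, fieldnames=['title', 'rating'])
    writer.writeheader()
    writer.writerows(rows)

# The first three bytes of the file are the UTF-8 byte order mark
with open(path, 'rb') as f:
    has_bom = f.read(3) == b'\xef\xbb\xbf'

# Reading with utf-8-sig strips the mark again
with open(path, newline='', encoding='utf-8-sig') as f:
    loaded = list(csv.DictReader(f))

print(has_bom, loaded)
```

Values read back from a CSV file are strings, so a rating needs converting with `float` before it is used in a calculation or plotted.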
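7. Plot the movie ratings as a bar chart
```python
# Assumes the SimHei font is installed; substitute any CJK-capable font
plt.rcParams['font.sans-serif'] = ['SimHei', 'DejaVu Sans']
plt.figure(figsize=(15, 6))
# Convert ratings to numbers so bar heights are drawn to scale
plt.bar(df['title'], df['rating'].astype(float))
plt.xticks(rotation=45, ha='right')  # tilt long titles so they stay readable
plt.title('Douban Movie Top 250 Ratings')
plt.xlabel('Movie title')
plt.ylabel('Rating')
plt.tight_layout()  # keep the rotated labels inside the figure
plt.show()
```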
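A bar per film can be hard to read when the ratings all fall in a narrow range. One alternative, offered here as a suggestion rather than part of the original answer, is to count how many films share each rating and plot those counts. `rating_counts` is a hypothetical helper, and the sample input is made up.

```python
from collections import Counter


def rating_counts(ratings):
    """Count films per rating value, sorted from lowest to highest rating."""
    # Accept strings or numbers; round to one decimal to match displayed scores.
    counts = Counter(round(float(r), 1) for r in ratings)
    return sorted(counts.items())


# Made-up sample ratings for illustration
print(rating_counts(['9.7', '9.6', '9.6', 9.5]))
```

With the DataFrame from the script, `xs, ys = zip(*rating_counts(df['rating']))` followed by `plt.bar([str(x) for x in xs], ys)` would draw the distribution.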