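Python crawler: fetch the names, directors and actors, ratings, rankings, and images of the top-ranked movies on Douban and store them in a text file named film; save the movie images to the films_pic folder; and visualize each movie's review count and rating with a line chart
Time: 2024-03-15 18:43:31 Views: 62
OK, this is a fairly complex problem, so it needs to be solved step by step. Here is my solution:
1. Import the required libraries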
```
import requests
from bs4 import BeautifulSoup
import os
import matplotlib.pyplot as plt
```
2. Define a function to fetch the movie information
```
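def get_movies():
    # Douban movie chart URL
    url = 'https://movie.douban.com/chart'
    # Send a browser-like User-Agent; the site may reject the default one used by requests
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')
    movie_list = soup.find('div', class_='article').find_all('div', class_='pl2')
    movies = []
    for movie in movie_list:
        # Collapse the line breaks and padding inside the title link
        movie_name = ' '.join(movie.find('a').text.split())
        # The poster is assumed to be the <img> in the same table row as this entry
        row = movie.find_parent('tr')
        img = row.find('img') if row else None
        movie_img = img['src'] if img else ''
        spans = movie.find('div', class_='star').find_all('span')
        movie_rating = spans[1].text.strip() if len(spans) > 1 else ''
        # The third span is assumed to hold the review count; keep only its digits
        movie_votes = ''.join(ch for ch in spans[2].text if ch.isdigit()) if len(spans) > 2 else ''
        # Assumes "label: value" lines as in the original; without a label the whole line is kept
        info = movie.find('p').text.strip().replace(':', ':').split('\n')
        movie_director = info[0].split(':')[-1].strip()
        movie_actors = info[1].split(':')[-1].strip() if len(info) > 1 else ''
        # Record layout: [name, image URL, rating, director, actors, review count]
        movies.append([movie_name, movie_img, movie_rating, movie_director, movie_actors, movie_votes])
    return movies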
```
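The chart in step 4 needs each movie's review count as a number, while a scraped page only gives it as text. Below is a minimal sketch of that conversion. It assumes the count appears inside a string such as `(123456人评价)`; that format and the helper name `parse_review_count` are assumptions for illustration, not something confirmed by the page.

```python
import re


def parse_review_count(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


print(parse_review_count('(123456人评价)'))  # 123456
print(parse_review_count('no count here'))   # None
```

Returning `None` rather than 0 keeps "no count found" distinguishable from a real count of zero, so the caller can decide whether to skip the entry.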
3. Define a function to save the movie information and movie images
```
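def save_movies(movies):
    # Create the image folder if it does not exist yet
    os.makedirs('films_pic', exist_ok=True)
    # Image requests also carry a browser-like User-Agent
    headers = {'User-Agent': 'Mozilla/5.0'}
    with open('film.txt', 'w', encoding='utf-8') as f:
        for i, movie in enumerate(movies):
            # The rank is the 1-based position in the list
            f.write('Rank: {}\nTitle: {}\nDirector: {}\nActors: {}\nRating: {}\n\n'.format(i+1, movie[0], movie[3], movie[4], movie[2]))
            img_url = movie[1]
            if not img_url:
                continue  # no image URL for this entry
            response = requests.get(img_url, headers=headers, timeout=10)
            if response.status_code != 200:
                continue  # skip images that fail to download
            # Path separators in a title would otherwise be read as directories
            safe_name = movie[0].replace('/', '_').replace('\\', '_')
            img_name = 'films_pic/{}_{}.jpg'.format(i+1, safe_name)
            with open(img_name, 'wb') as img_file:
                img_file.write(response.content)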
```
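Movie titles are used directly in the image file names, and a title containing a slash, colon, or similar character can make `open` fail or write to an unexpected path. Here is a hedged sketch of a sanitizer. The helper name `safe_filename` and the `'untitled'` fallback are hypothetical choices, and the character set covers the characters Windows forbids in file names plus whitespace.

```python
import re


def safe_filename(name, replacement='_'):
    """Replace runs of characters that are unsafe in file names with one replacement."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', replacement, name)
    # Trim replacement characters left at the ends; fall back if nothing remains
    return cleaned.strip(replacement) or 'untitled'


print(safe_filename('Title / Alternate: Name'))  # Title_Alternate_Name
print(safe_filename('///'))                      # untitled
```

It could be applied where the image path is built, for example `'films_pic/{}_{}.jpg'.format(i+1, safe_filename(movie[0]))`.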
4. Define a function to draw the line chart
```
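def plot_movies(movies):
    # SimHei is assumed to be installed so Chinese titles render; substitute any CJK font
    plt.rcParams['font.sans-serif'] = ['SimHei']
    # Skip entries without a rating, which cannot be converted to float
    movies = [movie for movie in movies if movie[2]]
    x = [movie[0] for movie in movies]
    y1 = [float(movie[2]) for movie in movies]
    # The review count is expected at index 5 of each record; missing values are plotted as 0
    y2 = [int(movie[5]) if len(movie) > 5 and movie[5] else 0 for movie in movies]
    fig, ax1 = plt.subplots(figsize=(10, 5))
    ax1.plot(x, y1, color='tab:blue', marker='o', label='Rating')
    ax1.set_xlabel('Movie title')
    ax1.set_ylabel('Rating')
    ax1.tick_params(axis='x', labelrotation=45)
    # Review counts are far larger than ratings, so they get their own y-axis
    ax2 = ax1.twinx()
    ax2.plot(x, y2, color='tab:orange', marker='s', label='Review count')
    ax2.set_ylabel('Review count')
    ax1.set_title('Douban Movie Chart')
    fig.legend(loc='upper right')
    fig.tight_layout()
    plt.show()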
```
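Ratings fall between 0 and 10, while review counts can run into the hundreds of thousands, so drawing both against one shared y-axis flattens the rating line. One option is to rescale the counts into the rating range before plotting. The sketch below shows a plain min-max rescale; the function name `rescale` and the sample numbers are made up for illustration.

```python
def rescale(values, new_min=0.0, new_max=10.0):
    """Linearly map values onto [new_min, new_max]; a constant series maps to new_min."""
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        return [new_min for _ in values]
    span = high - low
    return [new_min + (v - low) * (new_max - new_min) / span for v in values]


# Illustrative review counts, not real data
votes = [100000, 550000, 1000000]
print(rescale(votes))  # [0.0, 5.0, 10.0]
```

A rescaled line shows only the relative shape of the counts, so the axis label or legend should say the values are normalized.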
5. Call the functions to fetch the movie information, save the information and images, and draw the line chart
```
movies = get_movies()
save_movies(movies)
plot_movies(movies)
```
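The three functions pass each movie around as a plain list and read fields by position, which makes it easy to pick the wrong index. A possible alternative is a `namedtuple`, sketched below. The type name `Movie`, its field names, and the sample values are all hypothetical placeholders rather than part of the original solution.

```python
from collections import namedtuple

# Hypothetical record type; field order mirrors a positional list layout
Movie = namedtuple('Movie', ['name', 'image_url', 'rating', 'director', 'actors', 'votes'])

# Placeholder values for illustration only
sample = Movie('Example Title', 'https://example.com/poster.jpg', '8.5',
               'Example Director', 'Actor A / Actor B', 12345)

print(sample.name)        # Example Title
print(sample[0])          # Example Title (positional access still works)
print(float(sample.rating))  # 8.5
```

Because a namedtuple is still a tuple, code that indexes by position keeps working while new code can use the field names.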
That is my solution; you can modify and extend it as needed.