Python code to scrape Douban Top 250 movie information and save it, then run a movie genre analysis and save the results
Posted: 2023-06-26 15:09:15
Sure. Here is Python code that scrapes the Douban Top 250 movie list and saves it:
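```python
import csv
import time

import requests
from bs4 import BeautifulSoup


# Scrape title, director, actors, year, rating and rating count for the Douban Top 250
def get_top250_movies():
    url = 'https://movie.douban.com/top250'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    movie_info_list = []
    # The list is paginated 25 movies per page: start=0, 25, ..., 225
    for i in range(0, 250, 25):
        params = {'start': str(i)}
        res = requests.get(url, headers=headers, params=params, timeout=10)
        res.raise_for_status()  # stop on an error status instead of parsing an error page
        soup = BeautifulSoup(res.text, 'html.parser')
        for movie in soup.find_all('div', class_='info'):
            title = movie.find('span', class_='title').text
            # The first <p> has two lines: "导演: ...   主演: ..." and "year / country / genres"
            lines = movie.find('div', class_='bd').p.text.strip().split('\n')
            director, _, actors = lines[0].partition('主演:')
            director = director.replace('导演:', '').strip()
            actors = actors.strip()
            # The first '/'-separated field starts with the year, e.g. "1961(中国大陆)" -> "1961"
            year = lines[1].split('/')[0].strip()[:4] if len(lines) > 1 else ''
            rating = movie.find('span', class_='rating_num').text
            # The 4th <span> in the star block reads "<count>人评价"; drop the 3-character suffix
            comment_num = movie.find('div', class_='star').find_all('span')[3].text[:-3]
            movie_info_list.append([title, director, actors, year, rating, comment_num])
        time.sleep(1)  # pause between pages to avoid hammering the server
    return movie_info_list


# Save the scraped rows to a CSV file (utf-8-sig so Excel displays Chinese text correctly)
def save_to_csv(movie_info_list, path='movie_info.csv'):
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['Title', 'Director', 'Actors', 'Year', 'Rating', 'Number of ratings'])
        writer.writerows(movie_info_list)


if __name__ == '__main__':
    movie_info_list = get_top250_movies()
    save_to_csv(movie_info_list)
    print(f'Saved {len(movie_info_list)} movies to movie_info.csv')
```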
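The code above scrapes the title, director, actors, release year, rating, and number of ratings of every movie in the Douban Top 250 and saves them to a CSV file named `movie_info.csv`.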
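The fiddliest step in the scraper is splitting the text of each entry's first `<p>`, which holds a director/actors line and a year/country/genre line. A minimal sketch of that parsing as a standalone, offline-testable function follows; `parse_info_text` is a hypothetical helper, not part of the original script, and it also keeps the genre field that the script discards. It assumes the list-page format at the time of writing (fields padded with non-breaking spaces `\xa0` and separated by `/`); the sample string is modeled on that format and is illustrative only.

```python
def parse_info_text(text):
    """Hypothetical helper: split an entry's info text into
    (director, actors, year, genres).

    Assumed format (Douban list page, may change):
      line 1: "导演: <director>   主演: <actors>"   (padding is \xa0)
      line 2: "<year> / <country> / <genres>"     (genres space-separated)
    """
    # str.strip() also removes \xa0, so padding disappears here
    lines = [line.strip() for line in text.strip().split('\n') if line.strip()]
    director, _, actors = lines[0].partition('主演:')
    director = director.replace('导演:', '').strip()
    actors = actors.strip()  # empty if the entry lists no actors
    if len(lines) < 2:
        return director, actors, '', []
    fields = [field.strip() for field in lines[1].split('/')]
    year = fields[0][:4]          # "1961(中国大陆)" -> "1961"
    genres = fields[-1].split()   # "犯罪 剧情" -> ["犯罪", "剧情"]
    return director, actors, year, genres


sample = ("\n  导演: 弗兰克·德拉邦特 Frank Darabont\xa0\xa0\xa0主演: 蒂姆·罗宾斯 Tim Robbins /...\n"
          "  1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情\n  ")
print(parse_info_text(sample))
# → ('弗兰克·德拉邦特 Frank Darabont', '蒂姆·罗宾斯 Tim Robbins /...', '1994', ['犯罪', '剧情'])
```

Inside the scraper loop, the same function could be called on `movie.find('div', class_='bd').p.text` to get all four fields at once.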
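Next, you can run a genre analysis on top of this data, for example counting how many movies fall under each genre and computing each genre's average rating. Note that the CSV does not yet contain genres: Douban lists them at the end of each entry's year/country line, so the scraper has to extract that field as well.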
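A minimal sketch of that analysis, assuming each movie's genre string (space-separated, as on the list page, e.g. `犯罪 剧情`) has been collected alongside its rating. The function names `genre_stats` and `save_genre_stats` and the output file `genre_stats.csv` are hypothetical. A movie listed under several genres is counted once in each of them, so the per-genre counts add up to more than 250.

```python
import csv
from collections import defaultdict


def genre_stats(movies):
    """movies: iterable of (genres, rating) pairs, where genres is a
    space-separated string such as '犯罪 剧情' and rating is a number or
    numeric string. Returns {genre: (movie_count, average_rating)}."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for genres, rating in movies:
        for genre in genres.split():
            counts[genre] += 1
            totals[genre] += float(rating)
    # Average rating per genre, rounded to 2 decimal places
    return {g: (counts[g], round(totals[g] / counts[g], 2)) for g in counts}


def save_genre_stats(stats, path='genre_stats.csv'):
    """Write the stats to CSV, most common genre first."""
    rows = sorted(stats.items(), key=lambda item: item[1][0], reverse=True)
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['Genre', 'Movie count', 'Average rating'])
        for genre, (count, avg) in rows:
            writer.writerow([genre, count, avg])
```

For example, `save_genre_stats(genre_stats(pairs))` with `pairs` built from the scraped genre and rating columns writes one row per genre. Plain `collections` keeps the sketch dependency-free; with pandas, splitting the genre column and using `explode` followed by `groupby` would be an equivalent route.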