Write Python code that uses the requests library to scrape movie and TV-series information from Douban — name, alias, rating, director, actors, scriptwriter, release/air date, genre, and number of ratings — and save it locally as a CSV file.
Posted: 2024-06-09 07:05:24
Below is Python code that scrapes Douban movie and TV-series information using the requests and BeautifulSoup libraries. Note that the list pages do not expose every requested field (scriptwriters, for example, only appear on per-title detail pages), and the TV listing page is rendered by JavaScript, so the code falls back to the JSON endpoint that page calls:
```python
import csv
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

def parse_credit(credit):
    """Split a '导演: X ... 主演: Y ...' credit line into (director, actors).

    str.strip('导演: ') would remove a *character set* from both ends, not the
    prefix, so partition on the '主演:' marker and drop the labels instead.
    """
    director, _, actor = credit.partition('主演:')
    return director.replace('导演:', '').strip(), actor.strip()

file_path = 'douban.csv'
with open(file_path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['名称', '别名', '评分', '导演', '演员', '编剧',
                     '上映/播出时间', '类型', '观看人数'])

    # Movies: the Top 250 list page is plain HTML and can be parsed directly.
    movie_response = requests.get('https://movie.douban.com/top250', headers=headers)
    movie_soup = BeautifulSoup(movie_response.text, 'html.parser')
    for item in movie_soup.find_all('div', class_='info'):
        name = item.find('span', class_='title').get_text()
        other = item.find('span', class_='other')
        alias = other.get_text().strip(' /') if other else ''
        rating = item.find('span', class_='rating_num').get_text()
        # The first <p> holds two lines: "导演: ... 主演: ..." and "year / country / genres".
        lines = [l.strip() for l in item.find('p').get_text().split('\n') if l.strip()]
        director, actor = parse_credit(lines[0])
        meta = [p.strip() for p in lines[1].split('/')]
        year, genre = meta[0], meta[-1].replace(' ', '/')
        # Scriptwriters are not shown on the list page; filling this column
        # would require requesting each movie's detail page.
        scriptwriter = ''
        # The last <span> inside div.star reads like "2868456人评价".
        watch_num = item.find('div', class_='star').find_all('span')[-1].get_text().replace('人评价', '')
        writer.writerow([name, alias, rating, director, actor, scriptwriter, year, genre, watch_num])

    # TV: https://movie.douban.com/tv/ is rendered by JavaScript, so requests
    # only receives an empty shell. Fetch the JSON endpoint the page itself
    # calls instead (Douban may still require cookies if it rate-limits you).
    tv_params = {'type': 'tv', 'tag': '热门', 'page_limit': 50, 'page_start': 0}
    tv_data = requests.get('https://movie.douban.com/j/search_subjects',
                           params=tv_params, headers=headers).json()
    for sub in tv_data['subjects']:
        # The listing API exposes only title and rating; the remaining columns
        # would have to be scraped from each detail page (sub['url']).
        writer.writerow([sub['title'], '', sub['rate'], '', '', '', '', '', ''])

print('Scraping finished!')
```
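The credit line scraped from each Top 250 list item looks like `导演: X   主演: Y`, with the two parts separated by non-breaking spaces. A minimal sketch of splitting it, exercised on a sample string rather than the live page (note that `str.strip('导演: ')` would strip a character set, not the prefix, which is why label removal plus `partition` is safer):

```python
# Sample credit line as it appears in the first <p> of a Top 250 list item.
credit = '导演: 弗兰克·德拉邦特 Frank Darabont\xa0\xa0\xa0主演: 蒂姆·罗宾斯 Tim Robbins /...'

# Partition on the '主演:' marker, then drop the '导演:' label; str.strip()
# with no arguments also removes the \xa0 non-breaking spaces.
director, _, actor = credit.partition('主演:')
director = director.replace('导演:', '').strip()
actor = actor.strip()

print(director)  # 弗兰克·德拉邦特 Frank Darabont
print(actor)     # 蒂姆·罗宾斯 Tim Robbins /...
```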
Running the code above generates a file named `douban.csv` in the current directory containing the scraped movie and TV-series information; columns whose data is not available on the list pages are left blank.
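The file is opened with `encoding='utf-8-sig'` so that Excel on Windows detects the encoding from the BOM and renders the Chinese headers correctly. A self-contained sketch of that round trip, using a throwaway file and hypothetical sample rows rather than the real scrape output:

```python
import csv

# Throwaway sample in the same shape as douban.csv (hypothetical rows).
path = 'douban_sample.csv'
with open(path, 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['名称', '评分'])
    writer.writerow(['肖申克的救赎', '9.7'])

# Reading back with utf-8-sig skips the BOM, so the first header is clean.
with open(path, newline='', encoding='utf-8-sig') as f:
    rows = list(csv.DictReader(f))

print(rows[0]['名称'], rows[0]['评分'])  # 肖申克的救赎 9.7
```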