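Write Python code that uses the requests library to scrape Douban movie and TV series information (title, rating, director, cast, screenwriter, release/air date, genre, and number of viewers) and saves it locally as a CSV file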
Posted: 2024-06-09 13:05:25 Views: 185
The following code scrapes movie information:
```python
import csv
import re

import requests

url = 'https://movie.douban.com/j/search_subjects'
params = {
    'type': 'movie',
    'tag': '热门',  # the "popular" tag
    'page_limit': 50,
    'page_start': 0
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, params=params, headers=headers)
data = response.json()['subjects']
filename = 'movie.csv'
# Labels searched for on the detail page, in CSV column order:
# director, cast, screenwriter, release date, genre, number of ratings.
labels = ['导演', '主演', '编剧', '上映日期', '类型', '评价人数']
with open(filename, 'w', newline='', encoding='utf-8') as f:
    # Named csv_writer so the screenwriter field cannot overwrite it
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title', 'Rating', 'Director', 'Cast', 'Screenwriter', 'Release date', 'Genre', 'Number of ratings'])
    for item in data:
        detail_url = item['url']
        response = requests.get(detail_url, headers=headers)
        html = response.text
        name = item['title']
        score = item['rate']
        # A field stays empty when its label is not found on the page
        fields = dict.fromkeys(labels, '')
        # Parse the movie detail page line by line
        for line in html.split('\n'):
            text = re.sub(r'<[^>]+>', '', line).strip()  # drop HTML tags
            for label in labels:
                # Keep the first "label: value" line found for each field
                if not fields[label] and text.startswith(label) and ':' in text:
                    fields[label] = text.split(':', 1)[1].strip()
        csv_writer.writerow([name, score] + [fields[label] for label in labels])
```
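The detail-page parsing step above can be checked offline before any request is sent. Below is a minimal sketch, assuming the page puts each field on its own line as `label: value` wrapped in HTML tags. The helper name `extract_fields` and the sample markup are hypothetical, written only for illustration and not copied from Douban.

```python
import re

TAG_RE = re.compile(r'<[^>]+>')


def extract_fields(html, labels):
    """Return {label: value} using the first 'label: value' line per label."""
    fields = dict.fromkeys(labels, '')
    for line in html.split('\n'):
        text = TAG_RE.sub('', line).strip()  # drop HTML tags
        for label in labels:
            if not fields[label] and text.startswith(label) and ':' in text:
                fields[label] = text.split(':', 1)[1].strip()
    return fields


# Hypothetical sample markup, only to exercise the helper.
sample = """<div id="info">
<span class="pl">导演</span>: <a href="#">Director A</a><br/>
<span class="pl">类型:</span> <span>剧情</span> / <span>犯罪</span><br/>
</div>"""

result = extract_fields(sample, ['导演', '类型', '编剧'])
print(result)
```

A label that never appears in the input (here `编剧`) simply maps to an empty string, so a missing field yields an empty CSV cell rather than an exception.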
The following code scrapes TV series information:
```python
import csv
import re

import requests

url = 'https://movie.douban.com/j/search_subjects'
params = {
    'type': 'tv',
    'tag': '热门',  # the "popular" tag
    'page_limit': 50,
    'page_start': 0
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, params=params, headers=headers)
data = response.json()['subjects']
filename = 'tv.csv'
# Labels searched for on the detail page, in CSV column order:
# director, cast, screenwriter, first air date, genre, number of ratings.
labels = ['导演', '主演', '编剧', '首播', '类型', '评价人数']
with open(filename, 'w', newline='', encoding='utf-8') as f:
    # Named csv_writer so the screenwriter field cannot overwrite it
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title', 'Rating', 'Director', 'Cast', 'Screenwriter', 'Air date', 'Genre', 'Number of ratings'])
    for item in data:
        detail_url = item['url']
        response = requests.get(detail_url, headers=headers)
        html = response.text
        name = item['title']
        score = item['rate']
        # A field stays empty when its label is not found on the page
        fields = dict.fromkeys(labels, '')
        # Parse the TV series detail page line by line
        for line in html.split('\n'):
            text = re.sub(r'<[^>]+>', '', line).strip()  # drop HTML tags
            for label in labels:
                # Keep the first "label: value" line found for each field
                if not fields[label] and text.startswith(label) and ':' in text:
                    fields[label] = text.split(':', 1)[1].strip()
        csv_writer.writerow([name, score] + [fields[label] for label in labels])
```
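The two scripts differ only in the `type` parameter, the output filename, and the date label (`上映日期` for movies, `首播` for TV series). A minimal sketch of how those differences could be collected in one table so a single routine serves both; the names `JOBS` and `build_request` are hypothetical, and no request is sent here.

```python
SEARCH_URL = 'https://movie.douban.com/j/search_subjects'

# What differs between the two scripts, keyed by the `type` parameter.
JOBS = {
    'movie': {'filename': 'movie.csv', 'date_label': '上映日期'},
    'tv': {'filename': 'tv.csv', 'date_label': '首播'},
}


def build_request(kind, page_start=0, page_limit=50):
    """Return (url, params, labels, filename) for one listing request."""
    job = JOBS[kind]
    params = {
        'type': kind,
        'tag': '热门',
        'page_limit': page_limit,
        'page_start': page_start,
    }
    # Same column order as the scripts above; only the date label changes.
    labels = ['导演', '主演', '编剧', job['date_label'], '类型', '评价人数']
    return SEARCH_URL, params, labels, job['filename']


for kind in JOBS:
    url, params, labels, filename = build_request(kind)
    print(kind, filename, params['type'], labels[3])
```

The returned `url` and `params` can be passed to `requests.get` exactly as in the scripts above. Calling `build_request('movie', page_start=50)` would ask for the next batch, assuming the endpoint treats `page_start` as an offset, which the original scripts do not confirm.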