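Scraping the Douban Movie Top 250 with Python: get the movie URLs and parse the data (director, year, release date, number of reviewers, etc.)
Time: 2024-02-14 19:17:47 Views: 17
Sure, you can use the code below as a starting point: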
```python
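import time

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, sent with every request (listing and detail pages).
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}


def get_text(soup, name, attrs):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = soup.find(name, attrs)
    return tag.get_text(strip=True) if tag else None


def get_movie_info(url):
    """Fetch one movie detail page and extract the fields of interest."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    movie_info = {}
    # Title and directors
    movie_info['title'] = get_text(soup, 'span', {'property': 'v:itemreviewed'})
    movie_info['directors'] = [d.text for d in soup.find_all('a', {'rel': 'v:directedBy'})]
    # Year shown next to the title
    movie_info['year'] = get_text(soup, 'span', {'class': 'year'})
    # Release date (first one listed on the page)
    movie_info['release_date'] = get_text(soup, 'span', {'property': 'v:initialReleaseDate'})
    # Number of people who rated the film (v:votes)
    movie_info['rating_num'] = get_text(soup, 'span', {'property': 'v:votes'})
    return movie_info


if __name__ == '__main__':
    # The Top 250 list spans 10 pages of 25 movies each.
    for i in range(10):
        url = 'https://movie.douban.com/top250?start={}&filter='.format(i * 25)
        response = requests.get(url, headers=HEADERS, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # Each movie entry keeps its detail-page link inside <div class="hd">.
        movie_list = soup.find_all('div', {'class': 'hd'})
        for movie in movie_list:
            movie_url = movie.a['href']
            movie_info = get_movie_info(movie_url)
            print(movie_info)
            time.sleep(1)  # short pause between detail-page requests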
```
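This code walks through all 10 pages of the Douban Movie Top 250 list (25 movies per page), collects each movie's URL, and parses the title, directors, year, release date, and number of raters (`v:votes`) from the movie's detail page. You can modify and extend it to suit your needs.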
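As one possible extension, the scraped fields can be normalized before storing them, since they come back as raw strings. The sketch below uses only the standard library. The helper names (`page_urls`, `clean_year`, `clean_votes`, `split_release_date`) are hypothetical, and the input formats (a year like `(1994)`, a release date like `1994-09-10(多伦多电影节)`) are assumptions about how the page text looks, not something the code above guarantees.

```python
import re


def page_urls(pages=10, page_size=25):
    """Build the Top 250 listing URLs, one per page."""
    base = 'https://movie.douban.com/top250?start={}&filter='
    return [base.format(i * page_size) for i in range(pages)]


def clean_year(raw):
    """Extract a four-digit year from text such as '(1994)'; None if missing."""
    match = re.search(r'\d{4}', raw or '')
    return int(match.group()) if match else None


def clean_votes(raw):
    """Convert a vote-count string to int, ignoring non-digit characters."""
    digits = re.sub(r'\D', '', raw or '')
    return int(digits) if digits else None


def split_release_date(raw):
    """Split 'YYYY-MM-DD(region)' into (date, region).

    Assumed format: the region part is optional and may use ASCII or
    full-width parentheses. Returns (None, None) if no date is found.
    """
    pattern = r'\s*(\d{4}(?:-\d{2}){0,2})\s*(?:[((]([^))]*)[))])?'
    match = re.match(pattern, raw or '')
    if not match:
        return None, None
    return match.group(1), match.group(2)
```

These helpers accept `None`, so they can be applied directly to the values in `movie_info` even when a field was not found on the page, for example `clean_year(movie_info['year'])`.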