Python Crawler Code for Douban
Posted: 2023-11-19 15:16:57
Python Douban movie crawler
The following is a simple Python crawler that fetches movie information from the Douban website:
```python
import requests
from bs4 import BeautifulSoup

# Douban tends to reject requests that lack a browser-like User-Agent.
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://movie.douban.com/top250'
response = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the title and detail-page link of every movie on the listing page.
movie_list = []
for movie in soup.find_all('div', class_='hd'):
    title = movie.a.span.text.strip()
    link = movie.a['href']
    movie_list.append({'title': title, 'link': link})

# Visit each detail page and read directors, cast, genres, and release date.
for movie in movie_list:
    response = requests.get(movie['link'], headers=headers, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    info = soup.find('div', id='info')
    directors = [d.text for d in info.find_all('a', rel='v:directedBy')]
    cast = [c.text for c in info.find_all('a', rel='v:starring')]
    genres = [g.text for g in info.find_all('span', property='v:genre')]
    date = info.find('span', property='v:initialReleaseDate').text
    movie.update({'directors': directors, 'cast': cast, 'genres': genres, 'date': date})

print(movie_list)
```
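This crawler first fetches the title and link of each movie from the Douban Movie Top 250 page, then visits each movie's link in turn to collect its directors, cast, genres, and release date. Finally, it stores the information for all movies in a list and prints it. Note that the code requests only the first listing page, so it covers the first 25 movies rather than all 250.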
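The code above requests a single listing URL. To cover the whole list, one option is to generate one URL per page and run the same title/link extraction on each. The sketch below only builds the URLs. It assumes the listing is paginated through a `start` query parameter in steps of 25, which is an assumption about the site's URL scheme and is not stated in the original; `page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'

def page_urls(total=250, page_size=25):
    """Build one listing URL per page.

    Assumption: pages are selected with a `start` offset query parameter.
    """
    return [
        f"{BASE_URL}?{urlencode({'start': start})}"
        for start in range(0, total, page_size)
    ]

urls = page_urls()
print(len(urls))   # number of listing pages to request
print(urls[1])     # URL of the second page
```

Each URL can then be passed to `requests.get` in place of the single `url` used above, ideally with a short pause between requests to avoid hammering the site.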
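The detail-page step assumes every page contains an `info` block and a release-date element; if either is missing, `.text` is called on `None` and the crawl stops with an `AttributeError`. A minimal sketch of a more defensive version follows. `parse_info` is a hypothetical helper, and `SAMPLE_HTML` is a made-up fragment that only imitates the attributes the crawler looks for, not real page content.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; it deliberately has no release date.
SAMPLE_HTML = """
<div id="info">
  <a rel="v:directedBy">Director A</a>
  <a rel="v:starring">Actor B</a>
  <a rel="v:starring">Actor C</a>
  <span property="v:genre">Drama</span>
</div>
"""

def parse_info(html):
    """Extract movie details, tolerating missing elements."""
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find('div', id='info')
    if info is None:
        # No info block at all (for example, a blocked or error page).
        return {'directors': [], 'cast': [], 'genres': [], 'date': None}
    date_tag = info.find('span', property='v:initialReleaseDate')
    return {
        'directors': [d.get_text(strip=True) for d in info.find_all('a', rel='v:directedBy')],
        'cast': [c.get_text(strip=True) for c in info.find_all('a', rel='v:starring')],
        'genres': [g.get_text(strip=True) for g in info.find_all('span', property='v:genre')],
        'date': date_tag.get_text(strip=True) if date_tag is not None else None,
    }

details = parse_info(SAMPLE_HTML)
print(details)
```

In the crawler, `movie.update(parse_info(response.text))` could replace the inline extraction, so a single malformed page yields empty fields instead of aborting the run.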