Write a Douban movie crawler in Python
Posted: 2023-07-12 07:53:26
Sure. Below is a simple Python crawler that collects each movie's title, rating, director, cast, and poster image URL from Douban Movie:
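```python
import time

import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
# Douban usually rejects requests that lack a browser-like User-Agent (HTTP 418)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/114.0 Safari/537.36'}


def get_movie_info(movie_url):
    """Scrape title, rating, director, cast and poster URL from a detail page."""
    res = requests.get(movie_url, headers=headers, timeout=10)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, 'html.parser')
    # CSS selectors assume Douban's current page layout; update them if it changes
    name = soup.select_one('h1 > span').text
    score = soup.select_one('.rating_num').text
    director = soup.select('.attrs > a')[0].text
    actors = [actor.text for actor in soup.select('.actor > .attrs > a')]
    # The poster <img> sits inside <div id="mainpic">, not directly under .subject
    poster = soup.select_one('#mainpic img')['src']
    return {'name': name, 'score': score, 'director': director,
            'actors': actors, 'poster': poster}


def main():
    movie_list = []
    for i in range(10):  # 10 pages x 25 movies = Top 250
        res = requests.get(url, params={'start': i * 25}, headers=headers, timeout=10)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, 'html.parser')
        for movie in soup.select('.grid_view > li'):
            movie_url = movie.select_one('.info > div > a')['href']
            movie_list.append(get_movie_info(movie_url))
            time.sleep(1)  # pause between detail pages to avoid being rate-limited
    print(movie_list)


if __name__ == '__main__':
    main()
```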
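The program loops over the 10 pages of the Douban Movie Top 250 (25 movies per page, selected with the `start` query parameter), collects the link to each movie's detail page, and calls `get_movie_info()` to extract that movie's information. All results are stored in a single list, which is printed at the end.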
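The pagination and link-collection steps can be checked offline, without hitting Douban, with a small sketch like the one below. To avoid the BeautifulSoup dependency it swaps in the standard library's `html.parser.HTMLParser`; it is not the original program. It assumes that each list-page entry keeps its detail link as the first `<a href>` inside `<div class="hd">` (the element the selector `.info > div > a` matches). The names `page_urls`, `DetailLinkParser`, and `extract_detail_links` are hypothetical, and the sample markup is made-up, trimmed HTML used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

TOP250_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25  # the Top 250 list shows 25 movies per page
TOTAL = 250


def page_urls(base=TOP250_URL, page_size=PAGE_SIZE, total=TOTAL):
    """Build the list-page URLs with start=0, 25, ..., 225."""
    return [f'{base}?{urlencode({"start": start})}'
            for start in range(0, total, page_size)]


class DetailLinkParser(HTMLParser):
    """Collect the first <a href> inside each <div class="hd"> block.

    Hypothetical helper; assumes the list-page layout
    <div class="info"><div class="hd"><a href="...">...</a></div>...
    """

    def __init__(self):
        super().__init__()
        self.in_hd = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get('class') or '').split()
        if tag == 'div' and 'hd' in classes:
            self.in_hd = True
        elif tag == 'a' and self.in_hd and attr_map.get('href'):
            self.links.append(attr_map['href'])
            self.in_hd = False  # keep only the first link per .hd block


def extract_detail_links(html):
    """Return the detail-page links found in one list page's HTML."""
    parser = DetailLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == '__main__':
    # Made-up, heavily trimmed list-page markup for an offline check
    sample = '''
    <ol class="grid_view">
      <li><div class="info"><div class="hd">
        <a href="https://example.com/movie-a">A</a></div></div></li>
      <li><div class="info"><div class="hd">
        <a href="https://example.com/movie-b">B</a></div></div></li>
    </ol>
    '''
    urls = page_urls()
    print(len(urls), urls[0], urls[-1])
    print(extract_detail_links(sample))
```

Keeping URL building and HTML parsing separate from the network calls makes it easy to test the parsing against a saved page before running the full crawl.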