
Scraping Douban Movies with Python

Date: 2023-09-30 21:12:08 Views: 76

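To scrape Douban Movies, you can use Python's requests library together with BeautifulSoup. Below is a simple example that fetches a movie list:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
# Send a browser-like User-Agent; the site may reject the default one used by requests.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

# Fetch the page and parse the returned HTML.
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Each movie's title link sits inside a <div class="hd">.
movies = soup.find_all('div', class_='hd')
for movie in movies:
    # The first <span> inside the link holds the movie's primary title.
    print(movie.a.span.text)
```

Code walkthrough:

1. First, we define the URL to scrape and the request headers (headers).
2. Next, we send a GET request with requests and parse the returned HTML page with BeautifulSoup.
3. Inspecting the HTML structure of the Douban Movies page shows that each movie's title sits inside a div tag with class "hd".
4. We call soup.find_all() to collect every div tag with class "hd", then loop over the results and print each movie's name.

Note: in real-world scraping, Douban Movies may block an IP address that requests the same page repeatedly, so set a reasonable interval between requests and, if necessary, route them through proxy IPs.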
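The advice about request intervals and proxies can be sketched as follows. This is a minimal illustration rather than a definitive implementation, and it rests on several assumptions: that the Top 250 list is paginated through a `start` query parameter in steps of 25 (worth verifying against the live site), that a random 1 to 3 second pause is acceptable, and that the helper names `page_urls`, `crawl`, and `fetch_with_requests` are hypothetical names introduced here. The `proxies` and `timeout` arguments are standard parameters of `requests.get`.

```python
import random
import time
from typing import Callable, Iterable, Iterator, List, Optional

BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # assumption: 25 movies per page
TOTAL = 250      # assumption: 250 movies in the whole list


def page_urls(total: int = TOTAL, page_size: int = PAGE_SIZE) -> List[str]:
    """Build one URL per page, assuming pagination via the `start` parameter."""
    return [f"{BASE_URL}?start={offset}" for offset in range(0, total, page_size)]


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 1.0,
    max_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield the result of fetch(url) for each URL, pausing between requests."""
    for index, url in enumerate(urls):
        if index > 0:
            # Wait a random interval so requests are not sent back to back.
            sleep(random.uniform(min_delay, max_delay))
        yield fetch(url)


def fetch_with_requests(url: str, proxies: Optional[dict] = None) -> str:
    """Download one page with requests, optionally through a proxy."""
    import requests  # imported here so the helpers above need only the standard library

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    # proxies, if given, looks like {"https": "http://host:port"} (placeholder value).
    response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text
```

To use it, iterate with `for html in crawl(page_urls(), fetch_with_requests):` and hand each `html` string to BeautifulSoup as in the example above. Passing the fetch function in as an argument keeps the throttling logic separate from the network call, so the delay behaviour can be checked without contacting the site.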
