How to scrape Douban movie data with Python's requests and BeautifulSoup libraries

Time: 2024-04-23 15:25:47 Views: 146

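You can scrape Douban movie data with Python's requests and BeautifulSoup libraries in the following steps:

1. First, import requests and BeautifulSoup:

```python
import requests
from bs4 import BeautifulSoup
```

2. Next, use requests to send an HTTP request to the Douban movie site and get the page's HTML content:

```python
url = 'https://movie.douban.com/top250'
response = requests.get(url)
html_content = response.text
```

3. Then parse the HTML with BeautifulSoup and extract the data you need:

```python
soup = BeautifulSoup(html_content, 'html.parser')
# Each film entry is expected inside a <div class="info"> element
movies = soup.find_all('div', {'class': 'info'})
for movie in movies:
    # find() returns the first matching tag; .text is its visible text
    title = movie.find('span', {'class': 'title'}).text
    rating = movie.find('span', {'class': 'rating_num'}).text
    print(title, rating)
```

The code above extracts and prints the title and rating of each film on the fetched Douban Top250 page. Note that, to avoid being blocked by Douban, the request should imitate a browser, for example by setting the request headers:

```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
# Pass the headers with the request from step 2
response = requests.get(url, headers=headers)
```

In addition, to guard against lost data and program errors, some exception handling is also needed.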
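The exception handling mentioned above is not spelled out in the answer, so the following is only a minimal sketch of one way to do it. The helper names `fetch_html`, `parse_movies`, and `scrape`, the 10-second timeout, and the choice to skip incomplete entries are all assumptions made for illustration. The selectors and the User-Agent string are the ones used in the steps above.

```python
import requests
from bs4 import BeautifulSoup

# Same browser-like header as in the answer above
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails (hypothetical helper)."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # raises HTTPError on 4xx/5xx status codes
    except requests.RequestException as exc:
        # RequestException covers connection errors, timeouts, bad URLs, HTTP errors
        print(f'Request failed: {exc}')
        return None
    return response.text


def parse_movies(html_content):
    """Return (title, rating) tuples, skipping entries with a missing field."""
    soup = BeautifulSoup(html_content, 'html.parser')
    results = []
    for movie in soup.find_all('div', {'class': 'info'}):
        title_tag = movie.find('span', {'class': 'title'})
        rating_tag = movie.find('span', {'class': 'rating_num'})
        if title_tag is None or rating_tag is None:
            # find() returns None when nothing matches; avoid AttributeError on .text
            continue
        results.append((title_tag.text.strip(), rating_tag.text.strip()))
    return results


def scrape(url='https://movie.douban.com/top250'):
    """Fetch and parse one page; return an empty list if the fetch fails."""
    html_content = fetch_html(url)
    if html_content is None:
        return []
    return parse_movies(html_content)
```

Calling `scrape()` returns a list of `(title, rating)` pairs for the page, or an empty list when the request fails. Keeping the parsing in its own function means it can be checked against a saved HTML snippet without touching the network.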
