How to scrape the countries of the Douban Top 250 movies across multiple pages

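You can use the Python libraries requests and BeautifulSoup to scrape the production countries of the Douban Top 250 movies. The steps are:

1. Open the Douban Top 250 page: https://movie.douban.com/top250.
2. Press F12 in the browser to open the developer tools, switch to the Network tab, and reload the page.
3. Select a request to movie.douban.com, for example the document request for `top250`. Under Headers → Request Headers, copy the User-Agent value. The browser sends the same User-Agent with every request.
4. In Python, send the request with requests and set User-Agent in the request headers.
5. Parse the returned HTML with BeautifulSoup and extract each movie's country. In each movie's info block, the last line has the form "year / countries / genres", so the country is the second-to-last field, not the last one.
6. If the page has a next page, repeat the steps above until every page has been scraped.

Example code:

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
countries = set()

while True:
    # Send the request and fail loudly on an HTTP error status
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Each movie sits in <div class="info">; the first <p> in <div class="bd">
    # ends with a "year / countries / genres" line after a <br>
    for item in soup.find_all('div', class_='info'):
        info_p = item.find('div', class_='bd').find('p')
        last_line = list(info_p.stripped_strings)[-1]
        country_field = last_line.split('/')[-2]
        # A movie may list several countries separated by spaces
        countries.update(country_field.split())

    # On the last page the "next" span contains no <a>, so stop there
    next_span = soup.find('span', class_='next')
    next_link = next_span.find('a') if next_span else None
    if not next_link:
        break
    url = urljoin(url, next_link['href'])

print(countries)
```

Running the code above gives you the set of countries for all the movies. Note: if the content you need is rendered by JavaScript rather than included in the HTML the server returns, requests alone will not see it. In that case, use a tool that drives or emulates a browser, such as Selenium or Splash, to get the rendered page.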
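You can test the extraction step without sending any requests to Douban. Move the parsing into a function and run it against a small HTML fragment. The fragment below is a hand-written, simplified imitation of two `div.info` entries. It is an assumption about the page's markup, not a copy of it. `extract_countries` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def extract_countries(html):
    """Return the list of countries for each div.info entry, in page order.

    Assumption: the first <p> inside div.bd ends with a line of the form
    "year / country [country ...] / genres" (hypothetical markup, see SAMPLE_HTML).
    """
    soup = BeautifulSoup(html, 'html.parser')
    result = []
    for item in soup.find_all('div', class_='info'):
        info_p = item.find('div', class_='bd').find('p')
        last_line = list(info_p.stripped_strings)[-1]  # text after the <br>
        country_field = last_line.split('/')[-2]       # middle field holds the countries
        result.append(country_field.split())           # split() also splits on &nbsp; (\xa0)
    return result


# Hand-written sample imitating two list entries (hypothetical data, not fetched from Douban)
SAMPLE_HTML = """
<div class="info">
  <div class="bd">
    <p class="">
      导演: 某导演&nbsp;&nbsp;&nbsp;主演: 某演员 /...<br>
      1994&nbsp;/&nbsp;美国&nbsp;/&nbsp;犯罪 剧情
    </p>
  </div>
</div>
<div class="info">
  <div class="bd">
    <p class="">
      导演: 另一导演&nbsp;&nbsp;&nbsp;主演: 另一演员<br>
      2001&nbsp;/&nbsp;日本 美国&nbsp;/&nbsp;动画
    </p>
  </div>
</div>
"""

print(extract_countries(SAMPLE_HTML))  # [['美国'], ['日本', '美国']]
```

Taking `[-2]` instead of `[-1]` matters because the last field is the genre list. Splitting the field on whitespace turns multi-country entries into separate countries.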

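As an alternative to following the "next" link, you can build the page URLs up front. This sketch assumes that the list shows 25 movies per page and is paged with a `start` offset query parameter (`start=0`, `start=25`, up to `start=225`). That appears to be what the page's own pagination links use, but check it in the browser first. `top250_page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'


def top250_page_urls(total=250, page_size=25):
    """Build one URL per list page, assuming paging via a `start` offset parameter."""
    return [f'{BASE_URL}?{urlencode({"start": start})}'
            for start in range(0, total, page_size)]


urls = top250_page_urls()
print(len(urls))  # 10
print(urls[0])    # https://movie.douban.com/top250?start=0
print(urls[-1])   # https://movie.douban.com/top250?start=225
```

Fetch each URL with `requests.get(url, headers=headers)` and parse it as before. A short pause between requests (for example `time.sleep(1)`) keeps the request rate low.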