
Scraping second-level pages with Python

Date: 2023-09-25 15:12:36 Views: 120

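To scrape a second-level page (a page reached by following a link from a first-level page), you typically need an HTTP request library such as requests together with an HTML parsing library such as Beautiful Soup; a full crawling framework such as Scrapy is an alternative. Below is a simple example that scrapes a second-level page of a website:

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Send an HTTP request and fetch the first-level page
url = 'https://example.com/page1'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses

# Parse the HTML and find the link to the second-level page
soup = BeautifulSoup(response.content, 'html.parser')
anchor = soup.find('a', {'class': 'link-to-page2'})
if anchor is None or not anchor.get('href'):
    raise ValueError('second-level link not found')
# The href may be relative, so resolve it against the first page's URL
link = urljoin(url, anchor['href'])

# Send an HTTP request and fetch the second-level page
response = requests.get(link, timeout=10)
response.raise_for_status()

# Parse the HTML and extract the information you need
soup = BeautifulSoup(response.content, 'html.parser')
# TODO: extraction code
```

This example first sends an HTTP request to fetch the first-level page, then parses the HTML with Beautiful Soup to find the link to the second-level page. It requests that link, then parses the returned HTML with Beautiful Soup again to extract the needed information. Note that the selector used to find the link and the extraction code both depend on the structure of the target site and have to be written for each case.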
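A first-level page often links to several second-level pages rather than one. The sketch below shows one way the link-discovery step could be generalized: it collects every matching link, resolves relative URLs, and drops duplicates. It is not the original author's code. So that it runs offline, it swaps Beautiful Soup for the standard library's `html.parser`, and the names `LinkCollector`, `collect_links`, and `sample_html` are hypothetical, as is the sample markup. Only the class name `link-to-page2` and the URL `https://example.com/page1` come from the example above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, base_url, css_class):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attr_map = dict(attrs)
        # Attribute values can be None (e.g. a bare "class"), so guard them
        classes = (attr_map.get('class') or '').split()
        href = attr_map.get('href')
        if href and self.css_class in classes:
            # Resolve relative hrefs against the first-level page URL
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:  # skip duplicates, keep order
                self.links.append(absolute)


def collect_links(html, base_url, css_class='link-to-page2'):
    """Return the absolute URLs of all matching second-level links."""
    parser = LinkCollector(base_url, css_class)
    parser.feed(html)
    return parser.links


# Hypothetical first-level page markup, used instead of a live request
sample_html = """
<ul>
  <li><a class="link-to-page2" href="/detail/1">First</a></li>
  <li><a class="link-to-page2 featured" href="detail/2">Second</a></li>
  <li><a class="nav" href="/about">About</a></li>
  <li><a class="link-to-page2" href="/detail/1">Duplicate</a></li>
</ul>
"""

links = collect_links(sample_html, 'https://example.com/page1')
print(links)
```

In a real crawl, each URL in `links` would then be fetched and parsed in a loop, ideally with a short delay between requests to avoid overloading the site.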
