
How to use Python to scrape hyperlinks from a web page and download them locally

Time: 2023-06-13 11:05:54 Views: 136

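You can use Python's requests and BeautifulSoup libraries to fetch and parse a web page. The steps are as follows:

1. Use the requests library to send a GET request and retrieve the page content.

```python
import requests

url = 'https://www.example.com'
response = requests.get(url)
response.raise_for_status()  # stop early if the server returned an error status
html = response.text
```

2. Use the BeautifulSoup library to parse the HTML document and collect all hyperlinks.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a')  # every <a> tag on the page
```

3. Loop over the hyperlinks and use the requests library to download the file each one points to.

```python
import os
from urllib.parse import urljoin

os.makedirs('downloads', exist_ok=True)  # create the target directory if it is missing

for link in links:
    href = link.get('href')
    # Skip <a> tags without an href and links that are not PDFs
    if not href or not href.endswith('.pdf'):
        continue
    file_url = urljoin(url, href)  # resolve relative links against the page URL
    filename = href.split('/')[-1]
    filepath = os.path.join('downloads', filename)
    response = requests.get(file_url)
    with open(filepath, 'wb') as f:
        f.write(response.content)
```

The code above first checks whether a link points to a PDF file, then extracts the file name from the link and saves the file locally. `<a>` tags with no `href` are skipped, because `link.get('href')` returns `None` for them, and relative links are resolved against the page URL before the request is sent. Adjust the file type, save path, and other parameters to suit your needs.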
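The `.pdf` check in step 3 looks at the raw `href` string, so a link such as `guide.pdf?version=2` or `report.PDF` would be missed. The following is a minimal sketch of a stricter filter, not a definitive implementation. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without third-party packages, and the names `LinkCollector` and `find_file_links`, as well as the sample HTML, are hypothetical and introduced only for illustration.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                # Ignore <a> tags whose href is missing or empty
                if name == 'href' and value:
                    self.hrefs.append(value)


def find_file_links(html, base_url, extension='.pdf'):
    """Return (absolute_url, filename) pairs for links whose URL path ends with extension."""
    collector = LinkCollector()
    collector.feed(html)
    results = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        path = urlparse(absolute).path      # drop the query string and fragment
        if path.lower().endswith(extension):
            results.append((absolute, posixpath.basename(path)))
    return results


# Hypothetical sample page, used only to exercise the function
sample = '''
<a href="/docs/guide.pdf?version=2">Guide</a>
<a href="report.PDF">Report</a>
<a href="about.html">About</a>
<a>No href</a>
'''
for file_url, filename in find_file_links(sample, 'https://www.example.com/files/'):
    print(file_url, filename)
```

Each returned pair could then be passed to `requests.get` and written to disk as in step 3. Matching on the parsed URL path rather than the raw string is a design choice that keeps query strings out of the saved file name.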
