Scraping hyperlinks from a web page with Python

Posted: 2024-09-17 18:03:41 | Views: 71

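In Python, you can scrape a web page and collect its hyperlinks with the `requests` and `BeautifulSoup` libraries. The basic steps are:

1. **Install the required libraries**: make sure `requests` is installed; if it is not, install it with pip: `pip install requests`. Parsing HTML also requires `beautifulsoup4`: `pip install beautifulsoup4`

2. **Send an HTTP request**: call `requests.get(url)` to send a GET request to the target site and retrieve its HTML.

   ```python
   import requests

   response = requests.get('http://example.com')
   ```

3. **Parse the HTML**: pass the HTML text of the response to `BeautifulSoup`, which handles parsing complex HTML structures.

   ```python
   from bs4 import BeautifulSoup

   soup = BeautifulSoup(response.text, 'html.parser')
   ```

4. **Extract the hyperlinks**: use `find_all('a')` or `.select('a')` to find every `<a>` tag, which usually represents a hyperlink. Both methods are available in BeautifulSoup 4; `find_all` filters by tag name and attributes, while `select` takes a CSS selector. Passing `href=True` (or selecting `a[href]`) skips `<a>` tags that have no `href` attribute.

   ```python
   links = soup.find_all('a', href=True)
   hyperlinks = [link['href'] for link in links]
   ```

Complete example:

```python
import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Get every <a> tag that has an href attribute
all_links = soup.select('a[href]')
for link in all_links:
    print(link['href'])
```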
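The `href` values collected this way are often relative paths (`/about`), in-page anchors (`#top`), or non-web links (`mailto:`), so they usually need cleaning before they can be followed. The following is a minimal sketch of that cleanup using only the standard library's `urllib.parse`. The helper name `absolutize_links` and the sample values are hypothetical and used only for illustration; the function takes a plain list of `href` strings such as the `hyperlinks` list built in step 4.

```python
from urllib.parse import urljoin, urldefrag, urlparse


def absolutize_links(hrefs, base_url):
    """Resolve hrefs against base_url, keep http(s) links, drop duplicates.

    Hypothetical helper for illustration; not part of requests or bs4.
    """
    seen = set()
    result = []
    for href in hrefs:
        # urljoin resolves relative paths such as "/about" against the page URL
        absolute = urljoin(base_url, href.strip())
        # urldefrag removes the "#fragment" part, so in-page anchors
        # collapse to the URL of the page itself
        absolute, _fragment = urldefrag(absolute)
        # Skip mailto:, javascript:, tel: and other non-web schemes
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Preserve first-seen order while removing duplicates
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Sample hrefs (made up for this sketch)
    sample = [
        "/about",
        "contact.html",
        "#top",
        "mailto:someone@example.com",
        "https://example.org/page",
        "/about",
    ]
    for link in absolutize_links(sample, "http://example.com/docs/index.html"):
        print(link)
```

In a real script, `base_url` would typically be `response.url`, which reflects the final address after any redirects, rather than the URL originally requested.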
