Fetching the source code of an already-open web page with Python
Date: 2023-12-29 14:05:34
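You can fetch a page's source with Python's requests library and parse it with BeautifulSoup. Note that requests sends its own HTTP request to the URL, so it returns the HTML the server delivers rather than the state of a tab already open in a browser; content that JavaScript adds after loading will not appear. The steps are as follows: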
1. Install the requests and BeautifulSoup libraries
```shell
pip install requests
pip install beautifulsoup4
```
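2. Fetch the page source with requests
```python
import requests

url = 'https://www.example.com'
# A timeout keeps the call from hanging indefinitely on an unresponsive server
response = requests.get(url, timeout=10)
if response.status_code == requests.codes.ok:
    html = response.text  # response body decoded to str
    print(html)
else:
    print('Request failed with status code:', response.status_code)
```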
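The fetch step can also be wrapped in a small reusable function. The following is a minimal sketch, not a definitive implementation: the function name `fetch_html`, the `User-Agent` string, and the 10-second timeout are all illustrative assumptions. It also re-detects the encoding when the server declares no charset, since requests then falls back to ISO-8859-1 for text responses, which can garble non-Latin pages.

```python
import requests

# Hypothetical User-Agent value; some sites reject the library's default one.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-fetcher/0.1)"}


def fetch_html(url, session=None, timeout=10):
    """Return the decoded HTML of `url`, raising on HTTP errors.

    `session` may be any object with a requests-style `get` method;
    a new requests.Session is created when none is given.
    """
    session = session or requests.Session()
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)

    # Raise requests.HTTPError for 4xx/5xx instead of checking status_code by hand
    response.raise_for_status()

    # With no charset in Content-Type, requests assumes ISO-8859-1 for text;
    # apparent_encoding guesses the encoding from the body instead.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding

    return response.text
```

With this in place, `html = fetch_html('https://www.example.com')` would replace the status-code check above, and a failed request surfaces as an exception rather than leaving `html` undefined.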
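3. Parse the page source with BeautifulSoup
```python
from bs4 import BeautifulSoup

# html is the string obtained in step 2; 'html.parser' is Python's built-in parser
soup = BeautifulSoup(html, 'html.parser')
```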
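You can now use BeautifulSoup's methods to extract content from the page, for example:
```python
# Get the page title (soup.title is None when the page has no <title> tag)
title = soup.title.string if soup.title else None
print(title)

# Get every link; link.get('href') returns None for <a> tags without an href
links = soup.find_all('a')
for link in links:
    print(link.get('href'))
```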
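Links in a page are often relative (for example `/about`), so they usually need to be resolved against the page URL before they can be requested. The sketch below shows one way to do that with the standard library's `urljoin`; the function name `extract_title_and_links` and the sample HTML are illustrative assumptions.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_title_and_links(html, base_url):
    """Return (title, links) where links are absolute URLs."""
    soup = BeautifulSoup(html, "html.parser")

    # soup.title is None when the document has no <title> tag
    title = soup.title.get_text(strip=True) if soup.title else None

    # href=True skips <a> tags that have no href attribute
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    return title, links


# Made-up sample document for illustration
sample_html = (
    "<html><head><title>Demo</title></head><body>"
    '<a href="/about">About</a>'
    "<a>no href</a>"
    '<a href="https://other.example/x">External</a>'
    "</body></html>"
)

page_title, page_links = extract_title_and_links(
    sample_html, "https://www.example.com/index.html"
)
print(page_title)
print(page_links)
```

Passing the same URL that was fetched as `base_url` keeps relative links pointing at the right site.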
That is how to fetch and parse a web page's source code with Python.