
Scraping the source code of an already-open web page with Python

Posted: 2023-12-29 07:05:34 Views: 230

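You can use Python's requests and BeautifulSoup libraries to fetch and parse the source code of a page you already have open in a browser. Note that requests sends its own HTTP request for the URL rather than reading from the browser tab, so a page that depends on a login session or on JavaScript rendering may return different HTML from what the browser shows. The steps are as follows:

1. Install the requests and BeautifulSoup libraries

```bash
pip install requests
pip install beautifulsoup4
```

2. Use requests to fetch the page source

```python
import requests

url = 'https://www.example.com'
# A timeout keeps the call from hanging indefinitely on an unresponsive server
response = requests.get(url, timeout=10)

if response.status_code == requests.codes.ok:
    html = response.text
    print(html)
else:
    # html is not defined in this case, so skip the parsing step below
    print('Request failed with status code:', response.status_code)
```

3. Use BeautifulSoup to parse the page source

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
```

You can now use BeautifulSoup's methods to extract content from the page, for example:

```python
# Get the page title (soup.title is None when the page has no <title> tag)
title = soup.title.string if soup.title else None
print(title)

# Get all links
links = soup.find_all('a')
for link in links:
    print(link.get('href'))
```

With these steps you have the page source and its parsed contents available in Python.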
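The `href` values printed in the last step are returned exactly as written in the page, so they may be relative paths, in-page anchors, or `None` for `<a>` tags without an `href` attribute. As a minimal sketch of one way to clean them up, the helper below resolves them against the page URL using the standard library's `urljoin`. The name `absolute_links` and the list of skipped prefixes are assumptions made for illustration; they are not part of requests or BeautifulSoup.

```python
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve raw href values against base_url, skipping unusable ones.

    Hypothetical helper for illustration only.
    """
    resolved = []
    for href in hrefs:
        # link.get('href') returns None when the <a> tag has no href attribute
        if not href:
            continue
        # Skip in-page anchors and non-HTTP schemes (assumed filter list)
        if href.startswith(('#', 'javascript:', 'mailto:')):
            continue
        resolved.append(urljoin(base_url, href))
    return resolved


if __name__ == '__main__':
    sample = ['/about', 'contact.html', None, '#top', 'https://www.example.org/']
    for link_url in absolute_links('https://www.example.com/docs/', sample):
        print(link_url)
```

With the parsed page from the steps above, it could be called as `absolute_links(url, [link.get('href') for link in links])`.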
