Writing a Python crawler function to fetch a web page

Time: 2024-05-09 10:20:06 Views: 110

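Below is a simple Python crawler that fetches a web page with the requests library and parses it with BeautifulSoup:

```python
import requests
from bs4 import BeautifulSoup


def get_web_page(url):
    # Send a GET request; the timeout keeps the call from hanging indefinitely
    response = requests.get(url, timeout=10)
    # If the request succeeded, return the page content
    if response.status_code == requests.codes.ok:
        return response.text
    # Otherwise return None
    return None


def parse_web_page(web_page):
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(web_page, 'html.parser')
    # Collect every <a> tag on the page
    links = soup.find_all('a')
    # Print the address and text of each link
    for link in links:
        print(link.get('href'), link.text)


# Test
if __name__ == '__main__':
    web_page = get_web_page('https://www.baidu.com')
    if web_page:
        parse_web_page(web_page)
```

The code first sends a GET request to fetch the page content (`get_web_page`), then parses that content with BeautifulSoup, extracts all the links, and prints each link's address and text (`parse_web_page`). If the server responds with a status other than 200, `get_web_page` returns `None` and nothing is parsed.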
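Printing the links is fine for a quick check, but a crawler usually needs them as data, and many `href` values are relative (for example `/news`). The following is a minimal sketch of one way to return the links as a list and resolve them against the page URL. It is an illustration, not part of the original answer: the names `LinkCollector` and `extract_links` are hypothetical, and it deliberately uses the standard library's `html.parser` and `urllib.parse.urljoin` instead of BeautifulSoup so that it runs without third-party packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (absolute_url, text) pairs for every <a> tag that has an href.

    Hypothetical helper for illustration; not from the original answer.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative addresses against the page URL
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(web_page, base_url):
    """Return a list of (absolute_url, link_text) tuples found in web_page."""
    collector = LinkCollector(base_url)
    collector.feed(web_page)
    collector.close()
    return collector.links
```

The HTML string returned by `get_web_page(url)` can be passed in as `web_page`, with the same `url` as `base_url`. Anchors without an `href` attribute are skipped.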
