
Matching web page content with Python

Date: 2023-09-05 08:14:20 Views: 89

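To match content in a web page, you can use Python regular expressions or the BeautifulSoup library. For regular expressions, use Python's built-in re module, for example:

```python
import re
import requests

url = 'https://www.example.com'
response = requests.get(url)
html = response.text

# Match all links (assumes href is the first attribute and is double-quoted)
links = re.findall(r'<a href="(.*?)".*?</a>', html)
print(links)

# Match all image links
img_links = re.findall(r'<img.*?src="(.*?)".*?>', html)
print(img_links)
```

The BeautifulSoup library makes it easy to parse an HTML document, for example:

```python
from bs4 import BeautifulSoup
import requests

url = 'https://www.example.com'
response = requests.get(url)
html = response.text

soup = BeautifulSoup(html, 'html.parser')

# Get all links (a.get returns None for <a> tags without an href)
links = [a.get('href') for a in soup.find_all('a')]
print(links)

# Get all image links
img_links = [img.get('src') for img in soup.find_all('img')]
print(img_links)
```

The code above is for reference only; the exact matching approach has to be adjusted to the format of the page you are working with.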
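As an illustration of adjusting the pattern to the page format, here is a minimal sketch. It assumes a made-up HTML snippet (`SAMPLE_HTML`) and two hypothetical pattern names, `STRICT_LINK` and `TOLERANT_LINK`; none of these come from the original answer. The strict pattern only matches when `href` directly follows `<a ` and uses double quotes, while the tolerant one allows other attributes before `href` and either quote style.

```python
import re

# Hypothetical sample markup for illustration; real pages will differ.
SAMPLE_HTML = """
<a href="/docs">Docs</a>
<a class="nav" href='https://www.example.com/about'>About</a>
<img alt="logo" src="/static/logo.png">
"""

# Strict pattern: href must be the first attribute and double-quoted.
STRICT_LINK = re.compile(r'<a href="(.*?)"')

# Tolerant pattern: other attributes may precede href, and the value may be
# wrapped in single or double quotes (the backreference \1 closes the quote).
TOLERANT_LINK = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html):
    """Return the href value of every <a> tag matched by TOLERANT_LINK."""
    return [match.group(2) for match in TOLERANT_LINK.finditer(html)]


strict_links = STRICT_LINK.findall(SAMPLE_HTML)
tolerant_links = extract_links(SAMPLE_HTML)

print(strict_links)    # only the first link
print(tolerant_links)  # both links
```

On this sample the strict pattern misses the second link because `class` comes before `href` and the value is single-quoted. Even the tolerant pattern is still only a regular expression; for irregular or nested markup, an HTML parser such as BeautifulSoup is usually the more robust choice.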
