How to save `href` values scraped with Python's `re` library to a local file


Date: 2024-09-22 21:07:52

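In Python, you can extract the `href` attribute values of `<a>` tags from a page's HTML source either with regular expressions (the `re` library) or with an HTML parser. The example below uses `requests` to download the page and BeautifulSoup (rather than `re`) to parse it, then writes each extracted link to a local file. The steps are:

1. Import the required libraries (install them with `pip install requests beautifulsoup4` if needed):

```python
import requests
from bs4 import BeautifulSoup
```

2. Define a function that fetches the page and saves the links:

```python
def save_links_to_file(url, output_filename):
    # Download the page content with requests.get()
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on HTTP errors such as 404
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find every <a> tag that has an href attribute
    links = soup.find_all('a', href=True)
    # Collect the href values into a list
    hrefs = [link['href'] for link in links]
    # Write the links to the file, one per line
    with open(output_filename, 'w', encoding='utf-8') as f:
        for href in hrefs:
            f.write(f'{href}\n')

# Call the function with the target URL and the output file name
save_links_to_file('http://example.com', 'output.txt')
```

This function writes the `href` value of every `<a>` tag to a text file named `output.txt`, one link per line. Because the file is opened in `'w'` mode, any existing contents are overwritten; open it in `'a'` mode instead if you want to append across runs.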
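Since the question asks specifically about `re`, here is a minimal sketch of the same task done with a regular expression instead of BeautifulSoup. The pattern, the function names `extract_hrefs` and `save_hrefs`, and the sample HTML are illustrative assumptions, not part of the original answer. The regex only handles quoted `href` values inside `<a>` tags and can miss unusual markup, so an HTML parser remains the safer choice for real pages.

```python
import re

# Hypothetical pattern: matches href="..." or href='...' inside an <a ...> tag.
# Group 1 captures the opening quote; \1 requires the same quote to close it.
# Assumption: the HTML is reasonably well-formed (regexes cannot parse all HTML).
HREF_PATTERN = re.compile(
    r'<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def extract_hrefs(html):
    """Return the href values of all <a> tags found in the HTML text."""
    return [match.group(2) for match in HREF_PATTERN.finditer(html)]


def save_hrefs(hrefs, output_filename):
    """Write one href per line, overwriting any existing file."""
    with open(output_filename, 'w', encoding='utf-8') as f:
        for href in hrefs:
            f.write(href + '\n')


if __name__ == '__main__':
    sample_html = '''
    <a href="https://example.com/a">A</a>
    <A class="nav" HREF='/relative/b'>B</A>
    <link href="style.css">
    '''
    links = extract_hrefs(sample_html)
    print(links)  # ['https://example.com/a', '/relative/b']
    save_hrefs(links, 'output.txt')
```

To run it against a live page, pass in the downloaded HTML, for example `extract_hrefs(requests.get(url, timeout=10).text)`. Relative links such as `/relative/b` can be turned into absolute URLs with `urllib.parse.urljoin(url, href)` before saving.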
