Open the HTML files in the subfolders of C:/Users/test/Desktop/DIDItest one by one and extract their source code


Posted: 2023-09-07 15:13:01

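The example below uses `os.walk` to recursively traverse the given folder and all of its subfolders, opens every HTML file it finds, and reads the file's source code:

```python
import os

# Folder to search
folder_path = 'C:/Users/test/Desktop/DIDItest'

# Walk the folder and all of its subfolders
for root, dirs, files in os.walk(folder_path):
    for file_name in files:
        if file_name.endswith('.html'):
            file_path = os.path.join(root, file_name)

            # Open the HTML file and read its source code
            with open(file_path, 'r', encoding='utf-8') as file:
                html_content = file.read()

            # Further processing can go here, for example parsing the HTML
            # with BeautifulSoup or extracting specific information with
            # regular expressions

            # Print the source code
            print(html_content)
```

As written, the example prints the source code of each HTML file. It assumes the files are encoded as UTF-8; change the `encoding` argument if yours are not. You can build on it as needed, for example by parsing the source with the `BeautifulSoup` library or by extracting specific information with regular expressions.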
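If you would rather keep the sources than print them, the same walk can be wrapped in a function that returns them. The following is only a minimal sketch using the standard library. The names `collect_html_sources`, `extract_title`, and `TITLE_RE` are hypothetical and not part of the answer above, and pulling out the `<title>` text with a regular expression is just one illustration of "extracting specific information". For anything beyond simple cases, a real HTML parser is likely to be more robust.

```python
import os
import re

# Hypothetical helper names, introduced only for this illustration.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def collect_html_sources(folder_path):
    """Return {file path: source text} for every .html file under folder_path."""
    sources = {}
    for root, _dirs, files in os.walk(folder_path):
        for file_name in files:
            # Compare in lowercase so that names such as PAGE.HTML also match
            if file_name.lower().endswith('.html'):
                file_path = os.path.join(root, file_name)
                # errors='replace' keeps reading if a file is not valid UTF-8
                with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
                    sources[file_path] = f.read()
    return sources


def extract_title(html_content):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html_content)
    return match.group(1).strip() if match else None


if __name__ == '__main__':
    for path, html in collect_html_sources('C:/Users/test/Desktop/DIDItest').items():
        print(path, extract_title(html))
```

Returning a dictionary keyed by file path keeps each source tied to the file it came from, which helps when the same file name appears in several subfolders.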
