
Scraping HTML data from multiple local folders and writing it to a MySQL database

Time: 2023-08-11 11:56:31 Views: 109

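You can parse the HTML files with Python's BeautifulSoup library and use MySQL Connector/Python (the `mysql.connector` module) to connect and write to the MySQL database. The following example shows one way to do this:

```python
import os

from bs4 import BeautifulSoup
import mysql.connector

# Create the database connection
db = mysql.connector.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    database="your_database",
)
cursor = db.cursor()

# Root folder to scan; os.walk also descends into every subfolder
folder_path = "/path/to/folder"

insert_query = "INSERT INTO your_table (title, content) VALUES (%s, %s)"

# Walk the folder tree and process each HTML file
for root, dirs, files in os.walk(folder_path):
    for file in files:
        if file.endswith(".html"):
            file_path = os.path.join(root, file)

            # Open and parse the HTML file
            with open(file_path, "r", encoding="utf-8") as f:
                html_content = f.read()
            soup = BeautifulSoup(html_content, "html.parser")

            # Extract the data; use an empty title if <title> is missing or empty
            title = soup.title.string.strip() if soup.title and soup.title.string else ""
            content = soup.get_text().strip()

            # Insert the row using a parameterized query
            cursor.execute(insert_query, (title, content))

# Commit all inserts, then close the connection
db.commit()
cursor.close()
db.close()
```

Replace `your_username`, `your_password`, `your_database`, and `your_table` with your own MySQL connection details and target table name; the table must already exist with `title` and `content` columns. You will also need to adapt the extraction step to the structure of your HTML files. As written, the example walks every HTML file under the specified folder (including its subfolders), extracts the page title and the text content, and inserts them into the MySQL database.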
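The example above starts from a single root folder. If the HTML files live under several unrelated folders, the file discovery step can be factored into a small generator and fed a list of roots. The following is a minimal sketch using only the standard library; the function name `iter_html_files` and the case-insensitive `.html` match are assumptions for illustration, not part of the original answer.

```python
import os
from typing import Iterable, Iterator


def iter_html_files(folders: Iterable[str]) -> Iterator[str]:
    """Yield the path of every .html file under each folder, recursively.

    Hypothetical helper: folders that do not exist are silently skipped,
    because os.walk yields nothing for them.
    """
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            # Sort names so the order within one directory is predictable
            for name in sorted(files):
                if name.lower().endswith(".html"):
                    yield os.path.join(root, name)
```

A loop such as `for file_path in iter_html_files(["/path/to/folder_a", "/path/to/folder_b"]):` could then replace the nested `os.walk` loops, with the parsing and insert steps left unchanged. The two folder paths here are placeholders.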
