
Scraping Data into a MySQL Database with a Python Crawler

Date: 2024-11-15 14:22:57

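Scraping data with a Python crawler and storing it in a MySQL database typically involves the following steps:

1. **Install the required libraries**: install `requests` to send HTTP requests and fetch page content, `BeautifulSoup` or `lxml` to parse the HTML, and `pymysql` or `mysql-connector-python` to connect to the MySQL database.
2. **Write the crawler**: the skeleton below outlines the process.

```python
import requests
from bs4 import BeautifulSoup
import pymysql

# URL and target database settings (placeholders, replace with real values)
url = 'https://example.com'
db_host = 'localhost'
db_user = 'your_username'
db_password = 'your_password'
db_name = 'your_database'

# Send the GET request and fail early on an HTTP error status
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Parse the data
data_to_store = []  # fill with (column1, column2) tuples based on the page structure

# Connect to the database
connection = pymysql.connect(host=db_host, user=db_user,
                             password=db_password, database=db_name)
try:
    with connection.cursor() as cursor:
        # Create the table if it does not exist (column types are examples only)
        create_table_query = (
            "CREATE TABLE IF NOT EXISTS your_table "
            "(column1 VARCHAR(255), column2 TEXT)"
        )
        cursor.execute(create_table_query)

        # Insert the data; the %s placeholders let the driver escape the
        # values, which prevents SQL injection
        insert_query = "INSERT INTO your_table (column1, column2) VALUES (%s, %s)"
        for item in data_to_store:
            cursor.execute(insert_query, tuple(item))
    # Commit only after every statement has succeeded
    connection.commit()
finally:
    # Always close the connection, even if an error occurred
    connection.close()
```

Remember to replace the URL, database credentials, table name, and column names in the code above with your actual values.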
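For larger crawls, the insert step above can be moved into a small helper that sends rows in batches and rolls back on failure. The following is a minimal sketch, not a definitive implementation: `save_rows`, `INSERT_SQL`, and `batch_size` are hypothetical names introduced here, and the helper assumes any DB-API style connection (such as the one returned by `pymysql.connect`) whose cursor supports `executemany`.

```python
from contextlib import closing

# Assumed table and column names, matching the placeholders used above.
INSERT_SQL = "INSERT INTO your_table (column1, column2) VALUES (%s, %s)"


def save_rows(connection, rows, batch_size=500):
    """Insert rows in batches inside one transaction.

    Returns the number of rows sent to the database. If any batch fails,
    the transaction is rolled back and the exception is re-raised.
    """
    rows = [tuple(row) for row in rows]
    if not rows:
        return 0
    try:
        # closing() only requires cursor.close(), so any DB-API cursor works.
        with closing(connection.cursor()) as cursor:
            for start in range(0, len(rows), batch_size):
                batch = rows[start:start + batch_size]
                # Placeholders keep values out of the SQL text itself.
                cursor.executemany(INSERT_SQL, batch)
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    return len(rows)
```

With a real connection this would be called as `save_rows(connection, data_to_store)`. The rollback only undoes earlier batches when the table uses a transactional storage engine such as InnoDB.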
