Storing Scrapy crawler data in MySQL
You can write a Scrapy spider in Python to crawl data and store it in a MySQL database. First, install a MySQL connector library such as mysql-connector-python. Then add a MySQL pipeline to the project's pipelines.py file so that scraped items are written to MySQL. You also need to provide the database connection settings and the target table structure so the pipeline can connect to the right database and write to the correct table and columns. Here is an example:
``` python
import mysql.connector

class MySQLPipeline(object):
    def __init__(self, mysql_host, mysql_username, mysql_password, mysql_database):
        self.mysql_host = mysql_host
        self.mysql_username = mysql_username
        self.mysql_password = mysql_password
        self.mysql_database = mysql_database

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection settings from settings.py, with defaults
        mysql_host = crawler.settings.get('MYSQL_HOST', 'localhost')
        mysql_username = crawler.settings.get('MYSQL_USERNAME', 'root')
        mysql_password = crawler.settings.get('MYSQL_PASSWORD', '')
        mysql_database = crawler.settings.get('MYSQL_DATABASE', 'test')
        return cls(mysql_host, mysql_username, mysql_password, mysql_database)

    def open_spider(self, spider):
        # Open the database connection when the spider starts
        self.connection = mysql.connector.connect(
            host=self.mysql_host,
            user=self.mysql_username,
            password=self.mysql_password,
            database=self.mysql_database
        )
        self.cursor = self.connection.cursor()

    def close_spider(self, spider):
        # Close the cursor and connection when the spider finishes
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):
        # Insert each item with a parameterized query to avoid SQL injection
        sql = "INSERT INTO table_name (column1, column2, column3) VALUES (%s, %s, %s)"
        values = (item['column1'], item['column2'], item['column3'])
        self.cursor.execute(sql, values)
        self.connection.commit()
        return item
```
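The pipeline above assumes the target table already exists. Below is a minimal sketch for creating it with mysql-connector-python; `table_name`, `column1`–`column3`, and the connection values are placeholders taken from the example and should be adapted to your own schema.
``` python
# One-off helper to create the table used by the pipeline above.
# Table and column names, plus the connection details, are placeholders.
import mysql.connector

connection = mysql.connector.connect(
    host='localhost', user='root', password='', database='test'
)
cursor = connection.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS table_name (
        id INT AUTO_INCREMENT PRIMARY KEY,
        column1 VARCHAR(255),
        column2 VARCHAR(255),
        column3 VARCHAR(255)
    )
""")
connection.commit()
cursor.close()
connection.close()
```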
Here, MYSQL_HOST, MYSQL_USERNAME, MYSQL_PASSWORD, and MYSQL_DATABASE are variables defined in Scrapy's settings.py file that configure the MySQL connection.
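For the pipeline to run, it also has to be enabled via ITEM_PIPELINES. A minimal settings.py sketch is shown below; the module path `myproject.pipelines` and the priority value 300 are placeholders for your own project:
``` python
# settings.py (connection values are examples; adjust for your database)
MYSQL_HOST = 'localhost'
MYSQL_USERNAME = 'root'
MYSQL_PASSWORD = ''
MYSQL_DATABASE = 'test'

# Enable the pipeline; 'myproject' is a placeholder for your project's package name
ITEM_PIPELINES = {
    'myproject.pipelines.MySQLPipeline': 300,
}
```
In the Spider, items are sent to the MySQL pipeline via yield. For example: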
``` python
from scrapy import Spider

class MySpider(Spider):
    name = 'my_spider'
    start_urls = ['http://www.example.com']

    def parse(self, response):
        items = response.xpath('//div[contains(@class, "item")]')
        for item in items:
            # Extract the fields from each item node
            column1 = item.xpath('a/text()').extract_first().strip()
            column2 = item.xpath('p/text()').extract_first().strip()
            column3 = item.xpath('span/text()').extract_first().strip()
            # Yield the data so it is passed to the MySQL pipeline
            yield {
                'column1': column1,
                'column2': column2,
                'column3': column3
            }
```
With this in place, the data scraped by the Scrapy spider is stored in the MySQL database.