How to save scraped data to MySQL in Scrapy
Date: 2024-05-10 21:21:08
The steps are as follows:
1. Install the mysql-connector-python package
```
pip install mysql-connector-python
```
2. Configure the MySQL connection settings in settings.py
```
MYSQL_HOST = 'localhost'
MYSQL_PORT = 3306
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'your_password'
MYSQL_DATABASE = 'your_database_name'
```
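The pipeline in step 3 assumes the target table already exists. A minimal DDL sketch for creating it (the table name `table_name` and columns `field1`–`field3` are placeholders matching the example below; adjust types and lengths to your data):
```
CREATE TABLE IF NOT EXISTS table_name (
    id INT AUTO_INCREMENT PRIMARY KEY,
    field1 VARCHAR(255),
    field2 VARCHAR(255),
    field3 VARCHAR(255)
);
```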
3. Write the MySQL storage pipeline in pipelines.py
```
import mysql.connector


class MysqlPipeline(object):

    def __init__(self, mysql_host, mysql_port, mysql_user, mysql_password, mysql_database):
        self.mysql_host = mysql_host
        self.mysql_port = mysql_port
        self.mysql_user = mysql_user
        self.mysql_password = mysql_password
        self.mysql_database = mysql_database

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection settings defined in settings.py
        return cls(
            mysql_host=crawler.settings.get('MYSQL_HOST'),
            mysql_port=crawler.settings.get('MYSQL_PORT'),
            mysql_user=crawler.settings.get('MYSQL_USER'),
            mysql_password=crawler.settings.get('MYSQL_PASSWORD'),
            mysql_database=crawler.settings.get('MYSQL_DATABASE')
        )

    def open_spider(self, spider):
        # Open the connection once when the spider starts
        self.conn = mysql.connector.connect(
            host=self.mysql_host,
            port=self.mysql_port,
            user=self.mysql_user,
            password=self.mysql_password,
            database=self.mysql_database
        )
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        # Release the connection when the spider finishes
        self.cursor.close()
        self.conn.close()

    def process_item(self, item, spider):
        insert_sql = '''
            INSERT INTO table_name (field1, field2, field3)
            VALUES (%s, %s, %s)
        '''
        self.cursor.execute(insert_sql, (item['field1'], item['field2'], item['field3']))
        # Commit per item so rows are not lost if the spider crashes mid-run
        self.conn.commit()
        return item
```
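The write path of `process_item` can be sanity-checked without a live MySQL server by substituting a fake cursor. This is a sketch; `FakeCursor` is an illustrative stand-in that only records calls, not part of mysql-connector:

```python
# Sketch: verify the INSERT issued by the pipeline logic without a database.
# FakeCursor is a hypothetical stand-in that records execute() calls.
class FakeCursor:
    def __init__(self):
        self.executed = []  # list of (sql, params) pairs

    def execute(self, sql, params=None):
        self.executed.append((sql, params))


def process_item(cursor, item):
    # Same parameterized INSERT as in MysqlPipeline.process_item above.
    insert_sql = '''
        INSERT INTO table_name (field1, field2, field3)
        VALUES (%s, %s, %s)
    '''
    cursor.execute(insert_sql, (item['field1'], item['field2'], item['field3']))
    return item


cursor = FakeCursor()
item = {'field1': 'a', 'field2': 'b', 'field3': 'c'}
process_item(cursor, item)
sql, params = cursor.executed[0]
print(params)  # ('a', 'b', 'c')
```

Note that the values are passed as parameters (`%s` placeholders) rather than interpolated into the SQL string, which lets the driver escape them and prevents SQL injection.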
4. Enable the MySQL storage pipeline in settings.py
```
ITEM_PIPELINES = {
'your_project_name.pipelines.MysqlPipeline': 300,
}
```
Here, 300 is the pipeline's priority; the lower the number, the earlier the pipeline runs (valid values range from 0 to 1000).
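The effect of the priority numbers can be illustrated in plain Python: Scrapy sorts the enabled pipelines by value ascending, so an item flows through lower numbers first. The `ValidationPipeline` entry here is a hypothetical second pipeline added for illustration:

```python
# Sketch: pipelines run in ascending priority order (lower number = earlier).
ITEM_PIPELINES = {
    'your_project_name.pipelines.ValidationPipeline': 100,  # hypothetical earlier stage
    'your_project_name.pipelines.MysqlPipeline': 300,
}

# The order in which Scrapy would invoke them:
order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)
print(order[0])  # the 100-priority pipeline processes each item first
```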
5. Yield item data in the Spider
```
import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'

    def start_requests(self):
        yield scrapy.Request(url='http://example.com', callback=self.parse)

    def parse(self, response):
        item = {}
        item['field1'] = response.xpath('//h1/text()').get()
        item['field2'] = response.xpath('//h2/text()').get()
        item['field3'] = response.xpath('//h3/text()').get()
        yield item
```
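The item-building logic in `parse` can be exercised offline with a minimal stand-in for the `response.xpath(...).get()` chain. `FakeResponse` and `FakeValue` below are illustrative stand-ins, not Scrapy APIs:

```python
# Sketch: exercise the item-building logic with a stand-in response object.
# FakeValue mimics a SelectorList exposing only .get().
class FakeValue:
    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value


# FakeResponse mimics only the response.xpath(...).get() chain used in parse().
class FakeResponse:
    def __init__(self, texts):
        self.texts = texts  # maps an XPath string to its first text node

    def xpath(self, query):
        return FakeValue(self.texts.get(query))


def parse(response):
    # Same field extraction as in MySpider.parse above.
    item = {}
    item['field1'] = response.xpath('//h1/text()').get()
    item['field2'] = response.xpath('//h2/text()').get()
    item['field3'] = response.xpath('//h3/text()').get()
    return item


response = FakeResponse({
    '//h1/text()': 'Title',
    '//h2/text()': 'Subtitle',
    '//h3/text()': 'Heading',
})
item = parse(response)
print(item['field1'])  # Title
```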
With this in place, every item the Spider yields is automatically passed to MysqlPipeline and written to MySQL.