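Scrape the Maoyan Movies TOP100 board for the name (name), leading actors (starts), release time (releasetime), rating (score), and other details of its 100 movies, and save them to the local MySQL database mydb, in a table mymovies (created by your own code in pipelines.py) that holds this data
Time: 2024-02-11 21:05:25 Views: 163
python, scraping Maoyan top100 movie information
Sure. First, install the required libraries: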
```shell
pip install scrapy pymysql
```
Next, configure the MySQL connection details in `settings.py`:
```python
MYSQL_HOST = 'localhost'  # database host
MYSQL_PORT = 3306  # database port
MYSQL_USER = 'your_username'  # database user name
MYSQL_PASSWORD = 'your_password'  # database password
MYSQL_DATABASE = 'mydb'  # database name
```
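The pipeline also has to be registered in `settings.py`, otherwise Scrapy never calls it. A minimal sketch, assuming the Scrapy project package is named `mymovies` (the original does not give the project name, so adjust the dotted path to your own):

```python
# settings.py (sketch): enable the pipeline so Scrapy passes items to it.
# "mymovies" is an assumed project package name; replace it with your own.
ITEM_PIPELINES = {
    # Values are conventionally in the 0-1000 range; lower values run earlier.
    "mymovies.pipelines.MymoviesPipeline": 300,
}
```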
Then, in `pipelines.py`, write the code that connects to the database, creates the table, and inserts the data:
```python
import pymysql


class MymoviesPipeline:
    """Store scraped movie items in the MySQL table `mymovies`."""

    # Run once per crawl (in open_spider) instead of once per item.
    # DECIMAL(3, 1) replaces FLOAT(2, 1), whose (M, D) form is deprecated in MySQL 8.
    CREATE_TABLE_SQL = '''
        CREATE TABLE IF NOT EXISTS mymovies (
            id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(255) NOT NULL,
            starts VARCHAR(255) NOT NULL,
            releasetime VARCHAR(255) NOT NULL,
            score DECIMAL(3, 1) NOT NULL
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
    '''
    INSERT_SQL = '''
        INSERT INTO mymovies (name, starts, releasetime, score)
        VALUES (%s, %s, %s, %s)
    '''

    def __init__(self, mysql_host, mysql_port, mysql_user, mysql_password, mysql_database):
        self.mysql_host = mysql_host
        self.mysql_port = mysql_port
        self.mysql_user = mysql_user
        self.mysql_password = mysql_password
        self.mysql_database = mysql_database

    @classmethod
    def from_crawler(cls, crawler):
        # Read the connection settings defined in settings.py.
        return cls(
            mysql_host=crawler.settings.get('MYSQL_HOST'),
            mysql_port=crawler.settings.getint('MYSQL_PORT', 3306),  # pymysql expects an int
            mysql_user=crawler.settings.get('MYSQL_USER'),
            mysql_password=crawler.settings.get('MYSQL_PASSWORD'),
            mysql_database=crawler.settings.get('MYSQL_DATABASE'),
        )

    def open_spider(self, spider):
        # Open one connection for the whole crawl and make sure the table exists.
        self.conn = pymysql.connect(
            host=self.mysql_host,
            port=self.mysql_port,
            user=self.mysql_user,
            password=self.mysql_password,
            database=self.mysql_database,
            charset='utf8mb4',
        )
        self.cur = self.conn.cursor()
        self.cur.execute(self.CREATE_TABLE_SQL)

    def close_spider(self, spider):
        self.cur.close()
        self.conn.close()

    def process_item(self, item, spider):
        # Parameterized insert; roll back if the statement fails.
        try:
            self.cur.execute(self.INSERT_SQL, (
                item['name'],
                item['starts'],
                item['releasetime'],
                item['score'],
            ))
            self.conn.commit()
        except pymysql.MySQLError:
            self.conn.rollback()
            raise
        return item
```
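Every column in `mymovies` is declared `NOT NULL`, so an item with a missing field would make the `INSERT` fail. A minimal sketch of a check that could run before the insert is shown here; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names introduced for illustration, not part of Scrapy or of the pipeline above, and the sample values are placeholders:

```python
# Hypothetical helper: the four columns the pipeline inserts.
REQUIRED_FIELDS = ("name", "starts", "releasetime", "score")


def missing_fields(item):
    """Return the required field names that are absent, None, or blank."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = item.get(field)  # works for dicts and scrapy.Item alike
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


# Placeholder items for illustration only.
complete = {"name": "Example Movie", "starts": "Actor A,Actor B",
            "releasetime": "1993-01-01", "score": 9.5}
partial = {"name": "Example Movie", "starts": "  ", "score": None}

print(missing_fields(complete))  # []
print(missing_fields(partial))   # ['starts', 'releasetime', 'score']
```

Inside `process_item`, one option would be to raise `scrapy.exceptions.DropItem` when the returned list is not empty, so incomplete rows are skipped rather than aborting the insert.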
Finally, define the scraped data fields in `items.py`:
```python
import scrapy


class MymoviesItem(scrapy.Item):
    name = scrapy.Field()         # movie title
    starts = scrapy.Field()       # leading actors
    releasetime = scrapy.Field()  # release time
    score = scrapy.Field()        # rating
```
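The answer does not show the spider that fills these fields. As a rough sketch of the data preparation such a spider might need, the helpers below build the ten page URLs and normalize the raw text before it goes into a `MymoviesItem`. Everything here is an assumption to verify against the live site: the URL pattern `https://maoyan.com/board/4?offset=N`, the 10-movies-per-page layout, and the `主演:` / `上映时间:` label prefixes. `board_urls` and `clean_fields` are hypothetical names.

```python
import re

# Assumed URL pattern for the TOP100 board; check it against the real site.
BOARD_URL = "https://maoyan.com/board/4?offset={offset}"


def board_urls(total=100, page_size=10):
    """Return one URL per page, assuming page_size movies per page."""
    return [BOARD_URL.format(offset=o) for o in range(0, total, page_size)]


def clean_fields(name, starts, releasetime, score):
    """Strip whitespace and the assumed label prefixes; convert score to float."""
    starts = re.sub(r"^\s*主演[::]\s*", "", starts).strip()
    releasetime = re.sub(r"^\s*上映时间[::]\s*", "", releasetime).strip()
    return {
        "name": name.strip(),
        "starts": starts,
        "releasetime": releasetime,
        "score": float(score),
    }


urls = board_urls()
# Placeholder input for illustration only, not real scraped data.
row = clean_fields(" Example Movie ", "\n 主演:Actor A,Actor B \n",
                   "上映时间:1993-01-01", "9.5")
print(len(urls), urls[0], urls[-1])
print(row)
```

In a spider, `board_urls()` could serve as `start_urls`, and the dictionary returned by `clean_fields` could be passed to `MymoviesItem(**row)` before yielding it.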
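These pieces cover configuration, storage, and the item definition. Combined with a spider that yields `MymoviesItem` objects, they let you scrape the name, leading actors, release time, and rating of the 100 movies on the Maoyan TOP100 board and save them to the local MySQL database mydb.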