Using regular expressions in PyCharm to scrape data and save it to a database
Posted: 2024-03-12 21:49:22
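First, make sure the required libraries are available. pymysql is a third-party package that must be installed (for example with pip install pymysql), while re is part of the Python standard library and only needs to be imported.
Next, open a database connection with pymysql's connect() function, for example: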
```
import pymysql
# Open the database connection (replace the credentials with your own)
conn = pymysql.connect(host='localhost', user='root', password='password', database='test')
```
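The INSERT statement used later in this answer presupposes that a movies table already exists. A minimal sketch of creating it follows. The column names come from that INSERT statement, but the column types and the helper name ensure_movies_table are assumptions made for illustration, not something the original answer specifies.

```python
# Hypothetical schema: the column names follow the INSERT statement in this
# answer; the column types are assumptions.
CREATE_MOVIES_SQL = """
CREATE TABLE IF NOT EXISTS movies (
    `rank` INT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    rating DECIMAL(3, 1)
) DEFAULT CHARSET = utf8mb4
"""


def ensure_movies_table(conn):
    """Create the movies table if it does not exist yet.

    conn is expected to be a connection returned by pymysql.connect().
    """
    with conn.cursor() as cursor:
        cursor.execute(CREATE_MOVIES_SQL)
    conn.commit()
```

Calling ensure_movies_table(conn) once right after connecting should be enough; because of IF NOT EXISTS, running it again leaves an existing table untouched.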
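Then write a regular expression that captures the fields you need. The pattern below has three capture groups (rank, title, and rating), for example: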
```
import re
# Compile the pattern; re.S lets . match newlines, so one match can span several lines
pattern = re.compile(r'<div class="item">.*?<em class="">(.*?)</em>.*?<span class="title">(.*?)</span>.*?<span class="rating_num" property="v:average">(.*?)</span>.*?</div>', re.S)
```
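To see what this pattern returns, it can be tried on a small HTML fragment. The fragment below is made up for illustration (it is not real Douban markup, and the movie titles are placeholders); the pattern itself is the one from this answer, only split across lines for readability.

```python
import re

# Same pattern as above, split into adjacent string literals for readability.
pattern = re.compile(
    r'<div class="item">.*?<em class="">(.*?)</em>'
    r'.*?<span class="title">(.*?)</span>'
    r'.*?<span class="rating_num" property="v:average">(.*?)</span>'
    r'.*?</div>',
    re.S,
)

# Hypothetical sample input with two items.
sample_html = '''
<div class="item">
  <em class="">1</em>
  <span class="title">Movie A</span>
  <span class="rating_num" property="v:average">9.7</span>
</div>
<div class="item">
  <em class="">2</em>
  <span class="title">Movie B</span>
  <span class="rating_num" property="v:average">9.6</span>
</div>
'''

# findall() returns one tuple per match, with one string per capture group.
matches = pattern.findall(sample_html)
print(matches)
# [('1', 'Movie A', '9.7'), ('2', 'Movie B', '9.6')]
```

Every captured value is a string, so the rank and rating may need converting to numbers before they are stored.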
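Finally, insert the matched data into the database. This is done with the execute() method of a cursor obtained from the connection, passing the values as parameters rather than formatting them into the SQL string, for example: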
```
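# Insert one record; rank, title and rating are the values captured by the regex.
# `rank` is a reserved word in MySQL 8.0+, so the column name is quoted with backticks.
cursor = conn.cursor()
cursor.execute("INSERT INTO movies (`rank`, title, rating) VALUES (%s, %s, %s)", (rank, title, rating))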
conn.commit()
```
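When there are many matches, the rows can also be converted and inserted in one batch with the cursor's executemany() method. The following is a minimal sketch, not the original author's code: the helper names to_rows and save_rows are hypothetical, the numeric conversions assume the rank is an integer and the rating a decimal number, and conn is assumed to be a connection returned by pymysql.connect().

```python
# `rank` is quoted with backticks because RANK is a reserved word in MySQL 8.0+.
INSERT_SQL = "INSERT INTO movies (`rank`, title, rating) VALUES (%s, %s, %s)"


def to_rows(matches):
    """Convert (rank, title, rating) string tuples from findall() into typed rows."""
    return [
        (int(rank), title.strip(), float(rating))
        for rank, title, rating in matches
    ]


def save_rows(conn, rows):
    """Insert all rows in one batch; roll back if anything fails."""
    try:
        with conn.cursor() as cursor:
            cursor.executemany(INSERT_SQL, rows)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return len(rows)
```

A typical call would look like save_rows(conn, to_rows(pattern.findall(html))). Committing once per batch means that a failure partway through leaves no partially inserted data behind.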
Complete example:
```
import re

import pymysql

# Open the database connection
conn = pymysql.connect(host='localhost', user='root', password='password', database='test')
# Three capture groups: rank, title, rating; re.S lets . match newlines
pattern = re.compile(r'<div class="item">.*?<em class="">(.*?)</em>.*?<span class="title">(.*?)</span>.*?<span class="rating_num" property="v:average">(.*?)</span>.*?</div>', re.S)
# Read the saved page and extract every (rank, title, rating) tuple
with open('douban.html', 'r', encoding='utf-8') as f:
    html = f.read()
results = pattern.findall(html)
try:
    # Reuse one cursor for all inserts
    with conn.cursor() as cursor:
        for rank, title, rating in results:
            # `rank` is a reserved word in MySQL 8.0+, so it is quoted with backticks
            cursor.execute("INSERT INTO movies (`rank`, title, rating) VALUES (%s, %s, %s)", (rank, title, rating))
    # Commit once after all rows have been inserted
    conn.commit()
finally:
    # Close the database connection even if an insert fails
    conn.close()
```
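The complete example reads a page that has already been saved as douban.html. If the page still has to be downloaded, a minimal sketch using only the standard library could look like the following. The helper name fetch_html, the User-Agent value, and the timeout are assumptions for illustration; the URL to fetch is not given in the original answer, and some sites reject or throttle automated requests.

```python
import urllib.request


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Download a page and return its body decoded as UTF-8 text.

    The User-Agent header is an assumption: some sites refuse requests
    that carry the default urllib user agent.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The returned string can be passed straight to pattern.findall(), or written to douban.html so that the complete example above works unchanged.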