```
import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

comments = []
for comment in soup.select(".comment-item"):
    content = comment.select_one(".comment-content").get_text().strip()
    author = comment.select_one(".comment-info a").get_text().strip()
    likes = comment.select_one(".votes").get_text().strip()
    time = comment.select_one(".comment-time").get_text().strip()
    location = comment.select_one(".comment-location").contents[-1].strip()
    comments.append({
        "content": content,
        "author": author,
        "likes": likes,
        "time": time,
        "location": location
    })
pprint(comments)
```
I want to store the scraped content into a database, field by field.
Posted: 2024-04-03 16:36:41
OK. First, install a database driver such as pymysql (`pip install pymysql`), or use the standard-library sqlite3. Then add code that connects to the database and inserts the data.
Taking pymysql as an example, first connect to the database:
```
import pymysql
db = pymysql.connect(host="localhost", user="root", password="password", database="test")
```
Here `host` is the database server address, `user` the username, `password` the password, and `database` the name of the database to connect to.
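Since the scraped comments contain Chinese text, it is also worth setting the connection charset explicitly. A hedged variant; `utf8mb4` is an assumption and must match your table's charset:
```
db = pymysql.connect(host="localhost", user="root", password="password",
                     database="test", charset="utf8mb4")
```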
Next, create a cursor object and use it to execute the INSERT statements:
```
cursor = db.cursor()
for comment in comments:
    sql = "INSERT INTO comments (content, author, likes, time, location) VALUES (%s, %s, %s, %s, %s)"
    values = (comment["content"], comment["author"], comment["likes"], comment["time"], comment["location"])
    cursor.execute(sql, values)
db.commit()
```
Here `comments` holds the data to insert, `sql` is the parameterized INSERT template, and `values` supplies the parameters for each row; passing them separately lets pymysql escape the values and protects against SQL injection. After executing the inserts, call `db.commit()` to commit the transaction. This assumes a `comments` table already exists; a possible schema is sketched below.
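A minimal sketch of that table, run once before inserting. The column names match the code above, but the types and lengths are assumptions you should adapt:
```
cursor = db.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS comments (
        id INT AUTO_INCREMENT PRIMARY KEY,  -- surrogate key
        content TEXT,                       -- comment body
        author VARCHAR(100),                -- commenter name
        likes VARCHAR(20),                  -- vote count (scraped as a string)
        time VARCHAR(50),                   -- comment timestamp (scraped as a string)
        location VARCHAR(50)                -- commenter location
    ) DEFAULT CHARSET=utf8mb4
""")
```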
The complete code is as follows:
```
import requests
from bs4 import BeautifulSoup
from pprint import pprint
import pymysql

url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}

# Fetch the comments page and parse it
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

# Extract the fields of interest from each comment block
comments = []
for comment in soup.select(".comment-item"):
    content = comment.select_one(".comment-content").get_text().strip()
    author = comment.select_one(".comment-info a").get_text().strip()
    likes = comment.select_one(".votes").get_text().strip()
    time = comment.select_one(".comment-time").get_text().strip()
    location = comment.select_one(".comment-location").contents[-1].strip()
    comments.append({
        "content": content,
        "author": author,
        "likes": likes,
        "time": time,
        "location": location
    })
pprint(comments)

# Insert the scraped rows into the database
db = pymysql.connect(host="localhost", user="root", password="password", database="test")
cursor = db.cursor()
for comment in comments:
    sql = "INSERT INTO comments (content, author, likes, time, location) VALUES (%s, %s, %s, %s, %s)"
    values = (comment["content"], comment["author"], comment["likes"], comment["time"], comment["location"])
    cursor.execute(sql, values)
db.commit()
cursor.close()
db.close()
```
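If an insert fails partway through, the rows already executed stay uncommitted but the connection is left open. A minimal sketch of rollback-and-close handling, using the same `db`, `cursor`, and `comments` as above:
```
sql = ("INSERT INTO comments (content, author, likes, time, location) "
       "VALUES (%s, %s, %s, %s, %s)")
try:
    for comment in comments:
        cursor.execute(sql, (comment["content"], comment["author"],
                             comment["likes"], comment["time"], comment["location"]))
    db.commit()                  # commit only if every insert succeeded
except pymysql.MySQLError:
    db.rollback()                # undo the partial batch on any database error
    raise
finally:
    cursor.close()
    db.close()                   # always release the connection
```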
Note that the code above is for reference only; you will need to adjust it to your own environment.
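If you prefer the standard-library sqlite3 mentioned earlier (no database server required), the same insert looks like this. The file name `comments.db` is an assumption, and sqlite3 uses `?` placeholders instead of `%s`:
```
import sqlite3

db = sqlite3.connect("comments.db")  # hypothetical database file
cursor = db.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS comments (
        content TEXT, author TEXT, likes TEXT, time TEXT, location TEXT
    )
""")
# executemany inserts all scraped rows in one call
cursor.executemany(
    "INSERT INTO comments (content, author, likes, time, location) VALUES (?, ?, ?, ?, ?)",
    [(c["content"], c["author"], c["likes"], c["time"], c["location"]) for c in comments],
)
db.commit()
db.close()
```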