from pprint import pprint
`from pprint import pprint` is a Python import statement that brings in the `pprint()` function from the `pprint` module, whose name is short for "pretty-print". The function formats complex data structures (dictionaries, lists, tuples, and so on) in a more readable way: nested objects are indented and long lines are wrapped, so the hierarchy is visible at a glance, which makes debugging and inspecting data much easier.
For example:
```python
data = {
    "name": "Alice",
    "age": [25, 26],
    "hobbies": ["reading", "gaming"]
}
# width=40 forces wrapping; with the default width of 80 this small
# dict would fit on a single line.
pprint(data, width=40)
```
The output is nicely formatted (note that `pprint` sorts dictionary keys by default):
```
{'age': [25, 26],
 'hobbies': ['reading', 'gaming'],
 'name': 'Alice'}
```
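Beyond the defaults, `pprint()` also accepts `indent`, `width`, `depth`, `compact` and, on Python 3.8+, `sort_dicts` keyword arguments, and `pformat()` returns the formatted string instead of printing it. The snippet below is a small illustrative sketch of those options; the sample data is made up for demonstration:
```python
from pprint import pprint, pformat

nested = {"users": [{"name": "Alice", "tags": ["admin", "dev"]},
                    {"name": "Bob", "tags": ["dev"]}]}

# Limit nesting: anything deeper than `depth` levels is abbreviated to '...'
pprint(nested, depth=2)

# Keep insertion order instead of sorting keys (Python 3.8+)
pprint(nested, sort_dicts=False, width=50)

# pformat() returns the formatted text, which is handy for logging
text = pformat(nested, width=50)
print(text)
```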
Related questions
```python
import requests
from bs4 import BeautifulSoup
from pprint import pprint
import pymysql

url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

comments = []
for comment in soup.select(".comment-item"):
    content = comment.select_one(".comment-content").get_text().strip()
    author = comment.select_one(".comment-info a").get_text().strip()
    likes = comment.select_one(".votes").get_text().strip()
    time = comment.select_one(".comment-time").get_text().strip()
    location = comment.select_one(".comment-location").contents[-1].strip()
    comments.append({
        "content": content,
        "author": author,
        "likes": likes,
        "time": time,
        "location": location
    })
pprint(comments)

db = pymysql.connect(host="localhost", user="root", password="password", database="test")
cursor = db.cursor()
for comment in comments:
    sql = "INSERT INTO comments (content, author, likes, time, location) VALUES (%s, %s, %s, %s, %s)"
    values = (comment["content"], comment["author"], comment["likes"], comment["time"], comment["location"])
    cursor.execute(sql, values)
db.commit()
```
I want to crawl the first five pages of comments.
Sure. You can wrap the scraping and insertion logic in a loop to fetch the first five pages:
```python
import requests
from bs4 import BeautifulSoup
from pprint import pprint
import pymysql

db = pymysql.connect(host="localhost", user="root", password="password", database="test")
cursor = db.cursor()

for page in range(5):
    # Each page shows 20 comments, so the offset is page * 20
    url = f"https://movie.douban.com/subject/30391186/comments?start={page*20}&limit=20&sort=new_score&status=P"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    }
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    comments = []
    for comment in soup.select(".comment-item"):
        content = comment.select_one(".comment-content").get_text().strip()
        author = comment.select_one(".comment-info a").get_text().strip()
        likes = comment.select_one(".votes").get_text().strip()
        time = comment.select_one(".comment-time").get_text().strip()
        location = comment.select_one(".comment-location").contents[-1].strip()
        comments.append({
            "content": content,
            "author": author,
            "likes": likes,
            "time": time,
            "location": location
        })
    pprint(comments)

    # Insert every comment from this page, not just the last one
    for comment in comments:
        sql = "INSERT INTO comments (content, author, likes, time, location) VALUES (%s, %s, %s, %s, %s)"
        values = (comment["content"], comment["author"], comment["likes"], comment["time"], comment["location"])
        cursor.execute(sql, values)
    db.commit()

db.close()
```
This code loops five times; on each pass it fetches the 20 comments on one page and inserts all of them into the database before moving on to the next page.
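The INSERT statement assumes a `comments` table already exists in the `test` database. Its schema is not shown anywhere above, so the following is only a minimal sketch with assumed column types (adjust them to your actual needs):
```python
import pymysql

db = pymysql.connect(host="localhost", user="root", password="password", database="test")
cursor = db.cursor()

# Column names match the INSERT above; the types are assumptions.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS comments (
        id INT AUTO_INCREMENT PRIMARY KEY,
        content TEXT,
        author VARCHAR(255),
        likes VARCHAR(32),
        `time` VARCHAR(64),
        location VARCHAR(255)
    ) DEFAULT CHARSET=utf8mb4
""")
db.commit()
db.close()
```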
```python
import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")

comments = []
for comment in soup.select(".comment-item"):
    content = comment.select_one(".comment-content").get_text().strip()
    author = comment.select_one(".comment-info a").get_text().strip()
    likes = comment.select_one(".votes").get_text().strip()
    time = comment.select_one(".comment-time").get_text().strip()
    location = comment.select_one(".comment-location").contents[-1].strip()
    comments.append({
        "content": content,
        "author": author,
        "likes": likes,
        "time": time,
        "location": location
    })
pprint(comments)
```
I want to crawl the first five pages of data.
Sure. You can use the following code to fetch the first five pages:
```python
import requests
from bs4 import BeautifulSoup
from pprint import pprint

for page in range(5):
    # Each page shows 20 comments, so the offset is page * 20
    url = "https://movie.douban.com/subject/30391186/comments?start={}&limit=20&sort=new_score&status=P".format(page*20)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
    }
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")

    comments = []
    for comment in soup.select(".comment-item"):
        content = comment.select_one(".comment-content").get_text().strip()
        author = comment.select_one(".comment-info a").get_text().strip()
        likes = comment.select_one(".votes").get_text().strip()
        time = comment.select_one(".comment-time").get_text().strip()
        location = comment.select_one(".comment-location").contents[-1].strip()
        comments.append({
            "content": content,
            "author": author,
            "likes": likes,
            "time": time,
            "location": location
        })
    pprint(comments)
```
In this code, we use a loop to fetch the first five pages. On each iteration we build a new URL with the page offset as a query parameter and send a GET request with the requests library. We then parse the HTML response with BeautifulSoup, collect each comment's fields into a dictionary, and finally print the page's list of comments with pprint.
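One caveat: `select_one()` returns `None` when a selector matches nothing (for example, `.comment-location` may be absent on some comments), and the chained `.get_text()` call then raises an `AttributeError`. The helper below is an optional sketch of one way to guard against that; the `safe_text` name and the fallback value are illustrative assumptions, not part of the original code:
```python
def safe_text(parent, selector, default=""):
    """Return the stripped text of the first match, or `default` if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text().strip() if node else default

# Possible usage inside the comment loop:
# content = safe_text(comment, ".comment-content")
# author  = safe_text(comment, ".comment-info a")
# likes   = safe_text(comment, ".votes", default="0")
```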