select方法中.get_text跟text有什么区别
时间: 2023-03-25 20:01:33 浏览: 153
在select方法中,.get_text和text都是用来获取选定元素的文本内容。它们的区别在于,.get_text是BeautifulSoup库中的方法,而text是元素对象的属性。.get_text可以指定分隔符来获取文本内容,而text只能获取元素内的所有文本内容。
相关问题
import requests from bs4 import BeautifulSoup url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P" response = requests.get(url) soup = BeautifulSoup(response.text, "html.parser") comments = [] for comment in soup.select(".comment-item"): content = comment.select_one(".comment-content").get_text().strip() # 评论内容 author = comment.select_one(".comment-info a").get_text().strip() # 发布人 likes = comment.select_one(".votes").get_text().strip() # 点赞数 time = comment.select_one(".comment-time").get_text().strip() # 时间 location = comment.select_one(".comment-info").contents[-1].strip() # 地点 comments.append({ "content": content, "author": author, "likes": likes, "time": time, "location": location }) print(comments),设置请求头
好的,设置请求头的方法如下,可以在请求中添加"User-Agent"等信息:
```python
import requests
from bs4 import BeautifulSoup
url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
comments = []
for comment in soup.select(".comment-item"):
content = comment.select_one(".comment-content").get_text().strip() # 评论内容
author = comment.select_one(".comment-info a").get_text().strip() # 发布人
likes = comment.select_one(".votes").get_text().strip() # 点赞数
time = comment.select_one(".comment-time").get_text().strip() # 时间
location = comment.select_one(".comment-info").contents[-1].strip() # 地点
comments.append({
"content": content,
"author": author,
"likes": likes,
"time": time,
"location": location
})
print(comments)
```
在这个例子中,我们添加了一个名为"User-Agent"的请求头,值为一个模拟Chrome浏览器的User-Agent字符串,这样可以模拟浏览器访问网站,绕过反爬虫机制。
import requests from bs4 import BeautifulSoup url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P" headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" } response = requests.get(url, headers=headers) soup = BeautifulSoup(response.text, "html.parser") comments = [] for comment in soup.select(".comment-item"): content = comment.select_one(".comment-content").get_text().strip() # 评论内容 author = comment.select_one(".comment-info a").get_text().strip() # 发布人 likes = comment.select_one(".votes").get_text().strip() # 点赞数 time = comment.select_one(".comment-time").get_text().strip() # 时间 location = comment.select_one(".comment-info").contents[-1].strip() # 地点 comments.append({ "content": content, "author": author, "likes": likes, "time": time, "location": location }) print(comments),这段代码,我要输出的格式好看一点
你可以尝试使用 `pprint` 模块来美化输出格式,如下所示:
```python
import requests
from bs4 import BeautifulSoup
from pprint import pprint
url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3"
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
comments = []
for comment in soup.select(".comment-item"):
content = comment.select_one(".comment-content").get_text().strip()
author = comment.select_one(".comment-info a").get_text().strip()
likes = comment.select_one(".votes").get_text().strip()
time = comment.select_one(".comment-time").get_text().strip()
location = comment.select_one(".comment-info").contents[-1].strip()
comments.append({
"content": content,
"author": author,
"likes": likes,
"time": time,
"location": location
})
pprint(comments)
```
这样输出的格式会更加清晰易读。
阅读全文