https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P: scrape the comment content, author, like count, time, and location from this page
Posted: 2023-12-27 19:03:35 Views: 137
Sure. As before, this uses Python's requests and BeautifulSoup libraries. Here is a code example:
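```python
import requests
from bs4 import BeautifulSoup

url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
# Assumption: the site may reject requests without a browser-like User-Agent;
# the exact string here is only illustrative.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(response.text, "html.parser")


def text_of(parent, selector):
    """Return the stripped text of the first match, or "" if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text().strip() if node else ""


comments = []
for comment in soup.select(".comment-item"):
    comments.append({
        "content": text_of(comment, ".comment-content"),  # comment text
        "author": text_of(comment, ".comment-info a"),    # poster
        "likes": text_of(comment, ".votes"),              # like count
        "time": text_of(comment, ".comment-time"),        # time
        # Assumption: the location sits in a .comment-location element.
        # The original contents[-1] lookup can land on a whitespace node or a tag.
        "location": text_of(comment, ".comment-location"),
    })
print(comments)
```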
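This code prints the content, author, like count, time, and location of the comments on the page returned by that URL, which is a single page of results rather than every comment on the site. As before, keep the site's anti-scraping measures in mind: you may need to set request headers such as User-Agent, or use a proxy, for the request to succeed.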
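The snippet above fetches a single URL, so it only sees one page of comments. Below is a minimal sketch for generating further page URLs. It assumes the comments page accepts `start` and `limit` query parameters with 20 comments per page; that is an assumption and is not stated in the original answer. The helper name `build_page_urls` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(base_url, pages, page_size=20):
    """Return one URL per page, setting start/limit on the existing query.

    Assumption: the site paginates with `start` (offset) and `limit`
    (page size) query parameters.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keep existing parameters such as sort/status
    urls = []
    for page in range(pages):
        query["start"] = str(page * page_size)
        query["limit"] = str(page_size)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


urls = build_page_urls(
    "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P",
    pages=3,
)
for page_url in urls:
    print(page_url)
```

Each generated URL could then be fetched and parsed the same way as the single URL in the example, ideally with a short pause between requests.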