Scrape the comment text, author, like count, time, and location from https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P


Posted: 2023-12-27 19:03:35 Views: 137

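Sure. This again uses Python's requests and BeautifulSoup libraries. Here is a code example:

```python
import requests
from bs4 import BeautifulSoup

url = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"
# The site may reject requests that lack a browser-like User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail early on a blocked or failed request
soup = BeautifulSoup(response.text, "html.parser")


def text_of(node, selector):
    """Return the stripped text of the first match, or "" if nothing matches."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found else ""


comments = []
for comment in soup.select(".comment-item"):
    comments.append({
        "content": text_of(comment, ".comment-content"),  # comment text
        "author": text_of(comment, ".comment-info a"),    # poster
        "likes": text_of(comment, ".votes"),              # like count
        "time": text_of(comment, ".comment-time"),        # timestamp
        # Assumption: the location is rendered in its own `.comment-location` element.
        "location": text_of(comment, ".comment-location"),
    })

print(comments)
```

This code prints the content, author, like count, time, and location of each comment on the page it fetches. That is one page of results, not every comment for the film. The CSS selectors reflect the page markup as assumed here and may need adjusting if the site changes; a field whose selector matches nothing comes back as an empty string instead of raising an error. Also keep the site's anti-scraping measures in mind: you may need to set request headers (as above) or use a proxy to get past its restrictions.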
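A single request returns only one page of comments, so collecting more means requesting further pages. The sketch below shows one way to build those page URLs and to tidy the scraped like counts. It assumes the site accepts `start` and `limit` query parameters for paging, which the original answer does not confirm; `page_url` and `normalize` are hypothetical helper names introduced for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = "https://movie.douban.com/subject/30391186/comments?sort=new_score&status=P"


def page_url(base_url, page, page_size=20):
    """Return base_url with `start`/`limit` set (assumed paging parameters)."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))   # keep the existing sort/status values
    query["start"] = str(page * page_size)  # zero-based offset of the first comment
    query["limit"] = str(page_size)
    return urlunsplit(parts._replace(query=urlencode(query)))


def normalize(record):
    """Copy a scraped record, converting the like count to an int (0 if not numeric)."""
    cleaned = dict(record)
    likes = str(record.get("likes", "")).strip()
    cleaned["likes"] = int(likes) if likes.isdecimal() else 0
    return cleaned


if __name__ == "__main__":
    for page in range(3):
        print(page_url(BASE_URL, page))
    print(normalize({"author": "someone", "likes": "128"}))
```

Each generated URL can be passed to `requests.get` in place of the fixed `url`. Pausing between requests (for example with `time.sleep`) keeps the load on the site low and makes blocking less likely.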
