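Python code to scrape the nicknames of post authors and of commenting users from Xiaomi Community posts of the past year; posts with no comments should be skipped, posts with comments should be scraped
Time: 2023-10-31 15:04:21 Views: 39
Sure. Here is the Python code; it requires the requests and BeautifulSoup libraries: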
```python
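import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Forum listing URL from the original answer. Note: dateline=86400 looks like
# one day expressed in seconds, not one year; adjust it if the forum supports
# a longer window.
url = "https://www.mi.com/bbs/forum.php?mod=forumdisplay&fid=38&filter=author&orderby=dateline&dateline=86400"

# Send the request and get the HTML content
response = requests.get(url, timeout=10)
html = response.content

# Parse the listing page with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

# Collect the links to all posts; urljoin resolves relative hrefs against the listing URL
post_links = [urljoin(url, a["href"]) for a in soup.select(".xst") if a.get("href")]

for link in post_links:
    # Fetch and parse the post page
    post_response = requests.get(link, timeout=10)
    post_soup = BeautifulSoup(post_response.content, "html.parser")

    # Get the poster's nickname; skip the post if the selector matches nothing
    author_tag = post_soup.select_one(".authi a")
    if author_tag is None:
        continue
    author = author_tag.get_text(strip=True)

    # Collect commenter nicknames, excluding the original poster
    comment_users = []
    for comment in post_soup.select(".p_t"):
        user_tag = comment.select_one(".xw1")
        if user_tag is None:
            continue
        user = user_tag.get_text(strip=True)
        if user != author:
            comment_users.append(user)

    # Only print posts that have at least one comment
    if comment_users:
        print("Poster:", author)
        print("Commenters:", comment_users)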
```
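The question asks for posts from the past year, while the code above relies on a server-side `dateline` parameter in the listing URL. A possible alternative is to filter on each post's timestamp on the client side. The sketch below assumes timestamps are formatted like `2023-10-31 15:04:21`; the helper name `is_within_last_year` and the 365-day window are illustrative assumptions, and locating the timestamp in the page is not covered here.

```python
from datetime import datetime, timedelta

# Assumed timestamp format; the real forum may use a different one.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def is_within_last_year(posted_at: str, now: datetime) -> bool:
    """Return True if posted_at falls within the 365 days before now."""
    posted = datetime.strptime(posted_at, TIMESTAMP_FORMAT)
    return now - timedelta(days=365) <= posted <= now


if __name__ == "__main__":
    reference = datetime(2023, 10, 31, 15, 4, 21)
    print(is_within_last_year("2023-01-15 08:00:00", reference))
    print(is_within_last_year("2022-06-01 08:00:00", reference))
```

Inside the loop over posts, such a check could be used to `continue` past posts older than one year before fetching their comments.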
Note that if Xiaomi Community has anti-scraping measures in place, this code may stop working.
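One common way to make a scraper less fragile is to send an explicit User-Agent, set a timeout, and retry failed requests with a growing delay. The sketch below is only an illustration of that idea: `HEADERS`, `fetch_with_retry`, and `make_requests_fetcher` are hypothetical names, the User-Agent string is a placeholder, and none of this is guaranteed to get past any particular site's protections.

```python
import time
from typing import Callable, Optional

# Placeholder header value; an assumption, not something the site requires.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}


def fetch_with_retry(
    fetch: Callable[[str], bytes],
    url: str,
    retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch(url), retrying on any exception with a linearly growing delay.

    Returns the page body, or None if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception:
            # Broad catch because the fetcher is injected; narrow it in real code.
            if attempt == retries:
                return None
            sleep(delay * attempt)
    return None


def make_requests_fetcher(timeout: float = 10.0) -> Callable[[str], bytes]:
    """Build a fetcher backed by a requests.Session that sends HEADERS."""
    import requests  # imported lazily so the retry helper works without it

    session = requests.Session()
    session.headers.update(HEADERS)

    def fetch(url: str) -> bytes:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # treat HTTP error codes as failures
        return response.content

    return fetch
```

With these helpers, `requests.get(link).content` in the loop could be replaced by `fetch_with_retry(make_requests_fetcher(), link)`, skipping the post when the result is `None`.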