Scraping Weibo comment counts, like counts, post times, repost counts, post content, and author information with Python
Date: 2023-10-03 21:07:37
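To scrape the comment count, like count, post time, repost count, post content, and author information from Weibo, you can use the third-party Python libraries requests and BeautifulSoup to fetch the page source and parse the HTML.
First, get the URL of the Weibo page, then send an HTTP request with requests to retrieve the page source. For example: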
```python
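import requests

# Example profile URL; replace 123456789 with the target user's ID
url = 'https://weibo.com/123456789'
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on HTTP error statuses
html = response.text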
```
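A bare request like the one above may not return the feed HTML, for example if the site answers unauthenticated clients with a login or visitor page. The following is a minimal sketch of one way to send browser-like headers and an optional cookie through a `requests.Session`. The helper names `build_session` and `fetch_html`, the User-Agent string, and the idea that a cookie copied from a logged-in browser session is sufficient are all assumptions for illustration, not something the original answer specifies.

```python
import requests

# Hypothetical header values for illustration; adjust to your own needs.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)",
}


def build_session(cookie=None):
    """Return a requests.Session with default headers and an optional Cookie header."""
    session = requests.Session()
    session.headers.update(DEFAULT_HEADERS)
    if cookie:
        # Assumption: the cookie string was copied from a logged-in browser session.
        session.headers["Cookie"] = cookie
    return session


def fetch_html(session, url, timeout=10):
    """Fetch a page and return its HTML text, raising on HTTP error statuses."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

A call would then look like `html = fetch_html(build_session(cookie="..."), url)`, with the cookie value supplied by you. Reusing one session also keeps the underlying connection open across several page requests.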
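Next, use BeautifulSoup to parse the HTML and locate the tags that hold the comment count, like count, post time, repost count, post content, and author information. The class names and attributes used here depend on Weibo's page markup and may need adjusting if the markup differs. For example: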
```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Each matching div is treated as one post
weibo = soup.find_all('div', {'class': 'WB_feed_detail clearfix'})
for w in weibo:
    # Comment count
    comments = w.find('span', {'class': 'line S_line1', 'node-type': 'comment_btn_text'})
    if comments is not None:
        comments_num = comments.text.strip()
    else:
        comments_num = '0'
    # Like count (guard against a missing <em> child as well)
    likes = w.find('span', {'class': 'line S_line1', 'node-type': 'like_status'})
    if likes is not None and likes.em is not None:
        likes_num = likes.em.text.strip()
    else:
        likes_num = '0'
    # Post time
    time_tag = w.find('a', {'class': 'S_txt2', 'node-type': 'feed_list_item_date'})
    if time_tag is not None:
        post_time = time_tag.text.strip()
    else:
        post_time = 'N/A'
    # Repost count
    reposts = w.find('a', {'class': 'S_txt2', 'action-type': 'feed_list_forward'})
    if reposts is not None:
        reposts_num = reposts.text.strip()
    else:
        reposts_num = '0'
    # Post content
    content = w.find('div', {'class': 'WB_text W_f14'})
    if content is not None:
        post_content = content.text.strip()
    else:
        post_content = 'N/A'
    # Author information
    user_name = 'N/A'
    user_id = 'N/A'
    user_info = w.find('div', {'class': 'WB_info'})
    if user_info is not None:
        name_tag = user_info.find('a', {'class': 'W_f14 W_fb S_txt1'})
        if name_tag is not None:
            user_name = name_tag.text.strip()
        # Note: this reads the link text of the first S_txt1 anchor, which may
        # be the same anchor as the name; adjust if the ID is stored elsewhere.
        id_tag = user_info.find('a', {'class': 'S_txt1'})
        if id_tag is not None:
            user_id = id_tag.text.strip()
    # Print the results
    print('Comments:', comments_num)
    print('Likes:', likes_num)
    print('Posted at:', post_time)
    print('Reposts:', reposts_num)
    print('Content:', post_content)
    print('Author name:', user_name)
    print('Author ID:', user_id)
    print('------------------------')
```
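The counts extracted above are plain strings, and the button text may carry a label or a unit rather than a bare number. As a rough sketch, a hypothetical helper `parse_count` could normalize them to integers. The input formats it handles (a label followed by digits, a bare label with no number, and the units 万 for ten thousand and 亿 for one hundred million) are assumptions about what the page might display, not something confirmed by the original answer.

```python
import re


def parse_count(text):
    """Convert a counter label such as '评论 12' or '1.2万' into an int.

    Returns 0 when the text contains no number (for example a bare label).
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*(万|亿)?", text or "")
    if match is None:
        return 0
    value = float(match.group(1))
    unit = match.group(2)
    if unit == "万":      # ten thousand
        value *= 10_000
    elif unit == "亿":    # one hundred million
        value *= 100_000_000
    return int(round(value))
```

For instance, `parse_count(comments_num)` could replace the raw string before the values are stored or compared. Values shown with a unit are rounded on the page, so the result is only approximate in those cases.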
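With this, you can use Python to scrape Weibo comment counts, like counts, post times, repost counts, post content, and author information. Note, however, that scraping other people's data may infringe on their privacy and rights, so scrape only in compliance with the applicable laws and regulations.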