I want to scrape data from the Weibo hot search list. URL: https://s.weibo.com/top/summary/, User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.1.4031 SLBChan/30, Cookie: SINAGLOBAL=5471196458768.1.1650350231825; UOR=m.ali213.net,widget.weibo.com,graph.qq.com; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5kN8GkjYf-fWMuj49XKco_; _s_tentry=passport.weibo.com; Apache=4338083455173.212.1685923900636; ULV=1685923900684:4:1:1:4338083455173.212.1685923900636:1662901020857; SUB=_2AkMTIafsf8NxqwJRmPASym_ia4V1zA_EieKlfVY3JRMxHRl-yj9kqmEQtRB6OKGJA2r3wurgWm0tD7mB5Sg3sjQyYFaX
Posted: 2024-04-27 16:21:39
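Sure. You can scrape the Weibo hot search list with Python's requests library to fetch the page and Beautiful Soup to parse it (install both with `pip install requests beautifulsoup4`). Here is example code: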
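```python
import requests
from bs4 import BeautifulSoup

url = 'https://s.weibo.com/top/summary/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.1.4031 SLBChan/30',
    'Cookie': 'SINAGLOBAL=5471196458768.1.1650350231825; UOR=m.ali213.net,widget.weibo.com,graph.qq.com; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5kN8GkjYf-fWMuj49XKco_; _s_tentry=passport.weibo.com; Apache=4338083455173.212.1685923900636; ULV=1685923900684:4:1:1:4338083455173.212.1685923900636:1662901020857; SUB=_2AkMTIafsf8NxqwJRmPASym_ia4V1zA_EieKlfVY3JRMxHRl-yj9kqmEQtRB6OKGJA2r3wurgWm0tD7mB5Sg3sjQyYFaX'
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
soup = BeautifulSoup(response.text, 'html.parser')

# Assumed page layout (verify in your browser's dev tools, it may change):
# each entry is a <tr>; <td class="td-01"> holds the rank and
# <td class="td-02"> holds the title link <a> and the heat value <span>.
for row in soup.find_all('tr'):
    td_title = row.find('td', class_='td-02')
    link = td_title.find('a') if td_title else None
    if link is None:
        continue  # header row or unexpected markup
    td_rank = row.find('td', class_='td-01')
    heat_tag = td_title.find('span')
    # The pinned entry at the top has no numeric rank and no heat value
    rank = td_rank.get_text(strip=True) if td_rank else ''
    title = link.get_text(strip=True)
    heat = heat_tag.get_text(strip=True) if heat_tag else ''
    print(rank or 'pinned', title, heat)
```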
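Because the live page needs a valid cookie and its markup may change, it can help to test the parsing step offline against a saved snippet. The sketch below swaps Beautiful Soup for the standard library's `html.parser.HTMLParser` so it runs without extra packages. It assumes each entry is a `<tr>` whose `td-01` cell holds the rank and whose `td-02` cell holds an `<a>` title and a `<span>` heat value; `HotListParser`, `parse_hot_list`, and the sample HTML are hypothetical and made up for illustration, not captured from Weibo.

```python
from html.parser import HTMLParser


class HotListParser(HTMLParser):
    """Hypothetical parser collecting (rank, title, heat) tuples from <tr> rows.

    Assumed layout: <td class="td-01">rank</td>
    <td class="td-02"><a>title</a><span>heat</span></td>
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # fields of the row currently being parsed
        self._td = None     # class of the current cell we care about
        self._field = None  # which field incoming text belongs to

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {"rank": "", "title": "", "heat": ""}
            return
        if self._row is None:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td":
            if "td-01" in classes:
                self._td, self._field = "td-01", "rank"
            elif "td-02" in classes:
                self._td, self._field = "td-02", None
            else:
                self._td, self._field = None, None
        elif self._td == "td-02" and tag == "a":
            self._field = "title"
        elif self._td == "td-02" and tag == "span":
            self._field = "heat"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "td":
            self._td, self._field = None, None
        elif tag == "tr" and self._row is not None:
            if self._row["title"]:  # skip rows without a title link
                r = self._row
                self.rows.append((r["rank"], r["title"], r["heat"]))
            self._row = None

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] += data.strip()


def parse_hot_list(html):
    parser = HotListParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample: a pinned row (no rank, no heat) and one ranked row.
SAMPLE_HTML = """
<table><tbody>
<tr><td class="td-01"><i class="icon-top"></i></td>
    <td class="td-02"><a href="#">Pinned topic</a></td><td class="td-03"></td></tr>
<tr><td class="td-01 ranktop">1</td>
    <td class="td-02"><a href="#">Example topic A</a> <span>1234567</span></td>
    <td class="td-03"><i>hot</i></td></tr>
</tbody></table>
"""

if __name__ == "__main__":
    for rank, title, heat in parse_hot_list(SAMPLE_HTML):
        print(rank or "pinned", title, heat)
```

Once the parser gives the expected tuples on saved HTML, the same function can be applied to `response.text` from the live request.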
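Replace the values in the `headers` dictionary (at least the `Cookie`) with the request headers from your own browser session, then run the script to scrape the Weibo hot search list. Cookies expire, so if the script raises an error or prints nothing, copy a fresh cookie from your browser.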
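Since the cookie is a login credential that expires, one option is to keep it out of the source file and read it from an environment variable. This is a minimal sketch; `WEIBO_COOKIE` and `build_headers` are hypothetical names chosen for illustration.

```python
import os

# Same User-Agent string as in the example above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 "
    "SLBrowser/8.0.1.4031 SLBChan/30"
)


def build_headers(env=None):
    """Build request headers, reading the Cookie from the hypothetical WEIBO_COOKIE variable."""
    env = os.environ if env is None else env
    cookie = env.get("WEIBO_COOKIE", "").strip()
    if not cookie:
        raise RuntimeError(
            "Set WEIBO_COOKIE to the Cookie header copied from your browser"
        )
    return {"User-Agent": USER_AGENT, "Cookie": cookie}
```

Usage: set the variable in your shell, then call `requests.get(url, headers=build_headers(), timeout=10)`. The optional `env` parameter exists only so the function can be tested with a plain dictionary.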