
Weibo Hot Search and Detail Page Crawler Code

Time: 2024-12-18 08:12:23

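A crawler for Weibo's hot search list and its detail pages is a web-scraping task, typically built with Python libraries such as Requests, BeautifulSoup, or Scrapy. Below is a simplified example that fetches the Weibo hot search list:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def weibo_hot_search():
    url = "https://s.weibo.com/top/summary"  # hot search list page URL
    # Browser-like User-Agent; the page may also require a logged-in Cookie header
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'lxml')  # parse the HTML (needs lxml installed)
        # Topic links sit in the list's "td-02" cells; this selector depends on
        # Weibo's current page structure and will break if the layout changes
        hot_topics = soup.select('td.td-02 a')
        for topic in hot_topics:
            href = topic.get('href')
            if not href:
                continue  # skip anchors that carry no link
            title = topic.get_text(strip=True)  # topic title
            link = urljoin(url, href)  # hrefs are relative, e.g. "/weibo?q=..."
            print(f"{title}\n{link}\n")  # print the result
    else:
        print(f"Request failed: HTTP {response.status_code}")


# Start crawling
weibo_hot_search()
```

For the detail pages, if each topic has its own link, you can fetch and parse each link in the same way, one after another. Be aware, however, that frequent crawling can violate Weibo's robots.txt rules and terms of service and may get your IP banned, so check the relevant rules before running a crawler and comply with them.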
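The answer leaves the detail-page step at "process each link the same way". A minimal sketch of that loop follows, under stated assumptions: `crawl_details` is a hypothetical helper, the caller supplies a `fetch` function that returns a page's HTML (so the loop does not depend on Weibo's markup), and the 2-second default pause is an arbitrary illustrative value, not a documented Weibo limit.

```python
import time
from typing import Callable, Dict, Iterable


def crawl_details(
    links: Iterable[str],
    fetch: Callable[[str], str],
    delay: float = 2.0,
) -> Dict[str, str]:
    """Fetch each detail page in turn and return {link: html}.

    Hypothetical helper. Pages are fetched one after another (a plain loop,
    no recursion), pausing `delay` seconds before every request but the first.
    """
    pages: Dict[str, str] = {}
    for i, link in enumerate(links):
        if i > 0:
            time.sleep(delay)  # throttle requests to lower the risk of an IP ban
        try:
            pages[link] = fetch(link)
        except Exception as exc:  # one failed page should not abort the whole run
            print(f"Failed: {link} ({exc})")
    return pages
```

With Requests, `fetch` could be `lambda u: requests.get(u, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10).text`, and `links` the topic URLs from the hot list (the `weibo_hot_search()` above prints them, so it would need to return them instead). Extracting content from each detail page still requires selectors that match Weibo's current layout.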

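Checking robots.txt before crawling can be automated with Python's standard-library `urllib.robotparser`. The sketch below evaluates robots.txt text that has already been downloaded; `allowed_by_robots` is a hypothetical helper, and the rules in the test are made up, not Weibo's actual file. robots.txt only expresses crawler access rules; it does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)
```

To check the live file instead, `RobotFileParser("https://s.weibo.com/robots.txt")` followed by `.read()` downloads and parses it, after which `can_fetch(user_agent, url)` can be called directly.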