
Scraping Weibo Hot Search Titles, Times, Content, and Links with Python

Date: 2024-12-20 12:30:17 Views: 27

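Scraping Weibo's hot search list with Python typically uses the `requests` library to fetch the page and BeautifulSoup (with a parser such as `lxml`) to parse the HTML and extract the fields you need. A simplified walkthrough:

1. **Install dependencies**:
   - `requests` for sending HTTP requests
   - `beautifulsoup4` for HTML parsing
   - `lxml` as the parser backend, since the script passes `'lxml'` to BeautifulSoup

   ```bash
   pip install requests beautifulsoup4 lxml
   ```

2. **Write the scraper script**:

   ```python
   import requests
   from bs4 import BeautifulSoup

   def get_weibo_hot():
       url = "https://s.weibo.com/top/summary"  # Weibo hot search page URL
       headers = {
           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
       }
       # A timeout keeps the script from hanging if the server never responds
       response = requests.get(url, headers=headers, timeout=10)

       if response.status_code == 200:
           soup = BeautifulSoup(response.text, 'lxml')

           # Parse and collect the data with CSS selectors. These selectors are
           # illustrative; verify them against the page's current markup.
           title_spans = soup.select('.weibo-hot li a span')
           titles = [span.get_text(strip=True) for span in title_spans]         # hot search titles
           times = [tag.get_text(strip=True) for tag in soup.select('.time')]   # times
           content = [tag.get_text(strip=True) for tag in soup.select('.txt')]  # content
           # The href lives on the enclosing <a>, not on the <span> itself
           links = [span.find_parent('a').get('href') for span in title_spans]  # links

           # Return the results as a dictionary
           return {'titles': titles, 'times': times, 'content': content, 'links': links}
       else:
           print(f"Failed to fetch data with status code {response.status_code}")
           return None

   # Call the function and process the result
   hot_data = get_weibo_hot()
   if hot_data:
       # zip stops at the shortest list, so mismatched lengths cannot raise IndexError
       rows = zip(hot_data['titles'], hot_data['times'], hot_data['links'])
       for i, (title, time_text, link) in enumerate(rows, start=1):
           print(f"{i}. Title: {title}, Time: {time_text}, Link: {link}")
   else:
       print("Failed to fetch the hot search list")
   ```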
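
Once the four fields have been extracted as text, they usually need to be combined into one record per entry before they are printed or saved. The following is a minimal sketch of that post-processing step using only the standard library. The helper names `build_records` and `to_csv`, the `BASE_URL` constant, and the sample values are all hypothetical, and it assumes that relative links on the page resolve against `https://s.weibo.com`.

```python
import csv
import io
from itertools import zip_longest
from urllib.parse import urljoin

# Assumption: relative hrefs on the hot search page resolve against this host.
BASE_URL = "https://s.weibo.com"


def build_records(titles, times, contents, links, base_url=BASE_URL):
    """Combine parallel lists of strings into one dict per hot search entry."""
    records = []
    # zip_longest pads the shorter lists with "" so a missing time or content
    # value neither raises IndexError nor drops an entry. Pairing is purely
    # positional, so a gap in the middle of a list would still misalign rows.
    for title, time_text, content, link in zip_longest(
        titles, times, contents, links, fillvalue=""
    ):
        records.append({
            "title": title.strip(),
            "time": time_text.strip(),
            "content": content.strip(),
            # urljoin resolves relative hrefs and leaves absolute URLs unchanged.
            "link": urljoin(base_url, link) if link else "",
        })
    return records


def to_csv(records):
    """Serialize the records to CSV text, which can then be written to a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "time", "content", "link"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample values standing in for scraped text.
sample = build_records(
    titles=[" Example topic A ", "Example topic B"],
    times=["10:00"],
    contents=["first summary", "second summary"],
    links=["/weibo?q=a", "https://example.com/b"],
)
print(to_csv(sample))
```

Keeping this step separate from the network request means it can be checked against fixed sample data without contacting the site.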
