Code to scrape all Weibo hot searches and save them to Excel

Date: 2024-12-17 15:30:07

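Scraping Weibo's hot searches and saving them to an Excel file usually calls for web-scraping tools: in Python, the requests library can fetch the page, and BeautifulSoup or lxml can parse the HTML. Because a site like Weibo applies anti-scraping measures, direct requests may be throttled or blocked, so in practice you often also need to simulate a login, route traffic through proxy IPs, and so on. Below is a simplified Python example that uses `selenium` together with BeautifulSoup and `pandas`. It is only a basic skeleton; the URL and the selectors still have to be adjusted to Weibo's actual API and page structure:

```python
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

# Set up the browser driver
driver = webdriver.Chrome()

# Log in to Weibo (if required)
# driver.get('https://weibo.com/login')
# login_code = input('请输入登录后的cookies：')  # replace with the cookies obtained after logging in


def get_hot_searches():
    driver.get('http://hotsearch.weibo.com/')
    time.sleep(3)  # give the page time to load

    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # Locate the hot-search list; the 'ul.list' selector is a placeholder
    hot_search_list = soup.find('ul', class_='list')
    if hot_search_list is None:
        return []  # list not found: the URL or selector needs adjusting

    searches = []
    for item in hot_search_list.find_all('li'):
        link_tag = item.find('a')
        if link_tag is None:
            continue  # skip entries that carry no link
        searches.append({
            '热搜': link_tag.get_text(strip=True),  # hot-search title
            '链接': link_tag.get('href', ''),       # link
        })
    return searches


try:
    # Fetch the hot searches
    hot_searches = get_hot_searches()
    df = pd.DataFrame(hot_searches)
    # Save to Excel (writing .xlsx requires the openpyxl package)
    df.to_excel('微博热搜.xlsx', index=False)
finally:
    # Close the browser even if scraping fails
    driver.quit()
```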
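Since the page structure has to be adjusted by hand, it can help to test the HTML-parsing step on a saved snippet before pointing a browser at the live site. The following is a minimal sketch under stated assumptions, not Weibo's actual markup: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra installs, the `HotSearchParser` class and `parse_hot_searches` function are hypothetical names, and the `ul.list` structure and the default base URL are simply carried over from the example above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HotSearchParser(HTMLParser):
    """Collect the first <a> of each <li> inside a <ul class="list">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []           # finished {'热搜': title, '链接': link} dicts
        self._ul_depth = 0       # > 0 while inside the target <ul>
        self._want_link = False  # True until the current <li> yields a link
        self._href = None        # href of the <a> currently being read
        self._text = []          # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            if self._ul_depth or "list" in classes:
                self._ul_depth += 1
        elif tag == "li" and self._ul_depth:
            self._want_link = True
        elif tag == "a" and self._want_link:
            self._want_link = False
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "ul" and self._ul_depth:
            self._ul_depth -= 1
            if self._ul_depth == 0:
                self._want_link = False
        elif tag == "li":
            self._want_link = False
        elif tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                # Resolve relative hrefs against the page URL
                link = urljoin(self.base_url, self._href)
                self.rows.append({"热搜": title, "链接": link})
            self._href = None


def parse_hot_searches(html, base_url="http://hotsearch.weibo.com/"):
    """Return a list of title/link dicts extracted from the given HTML."""
    parser = HotSearchParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup used only to exercise the parser
sample = """
<ul class="list">
  <li><a href="/weibo?q=topic1">Topic one</a></li>
  <li><span>no link here</span></li>
  <li><a href="http://example.com/t2"> Topic two </a><a href="/x">extra</a></li>
</ul>
<ul><li><a href="/other">outside</a></li></ul>
"""
rows = parse_hot_searches(sample)
for row in rows:
    print(row)
```

The returned list has the same shape as the one built by `get_hot_searches()`, so it could be passed to `pd.DataFrame(...)` in the same way. Keeping the parser separate from the browser session means a selector change can be checked against saved HTML without triggering the site's anti-scraping measures.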
