
Python code for scraping Weibo

Posted: 2023-07-03 22:28:56

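The following is example code for scraping Weibo with Python:

```python
import requests
from lxml import etree

# Request headers: a browser-like User-Agent so the requests look like a normal visit
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

session = requests.Session()
session.headers.update(headers)  # send these headers with every request in the session

# Log in to the Weibo account through the mobile SSO endpoint
login_url = 'https://passport.weibo.cn/sso/login'
data = {
    'username': 'your_weibo_account',
    'password': 'your_weibo_password',
    'savestate': '1',
    'r': 'https://weibo.cn/',
    'ec': '0',
    'pagerefer': '',
    'entry': 'mweibo',
    'wentry': '',
    'loginfrom': '',
    'client_id': '',
    'code': '',
    'qq': '',
    'mainpageflag': '1',
    'hff': '',
    'hfp': '',
}
login_resp = session.post(login_url, data=data, timeout=10)
login_resp.raise_for_status()
# A 200 status only means the request went through, not that the login succeeded;
# inspect login_resp.text if the pages below come back without any posts

# Scrape Weibo posts
url = 'https://weibo.cn/u/xxxxxx?page=1'  # replace with the profile URL of the user you want to scrape
response = session.get(url, timeout=10)
response.raise_for_status()
html = etree.HTML(response.content)

# Each post is a <div class="c"> that carries an id attribute
weibos = html.xpath('//div[@class="c" and @id]')
for weibo in weibos:
    # The post text lives in <span class="ctt">; skip blocks that lack it instead of raising IndexError
    spans = weibo.xpath('.//span[@class="ctt"]')
    if not spans:
        continue
    # string(.) joins all descendant text, including text inside links and hashtags
    weibo_content = spans[0].xpath('string(.)').strip()
    print(weibo_content)
```

Note that scraping Weibo requires logging in, which is why the code needs your account name and password. Weibo's page structure is also fairly complex, so the page content has to be parsed with XPath or a similar method. The code uses the requests library to send the HTTP requests and the lxml library to parse the HTML pages.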
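Because the login step can fail for reasons unrelated to parsing, it can help to test the XPath extraction on its own. The sketch below isolates that step in a hypothetical helper, `parse_weibos`, and runs it against a hand-written HTML fragment. The fragment only imitates the `div class="c"` / `span class="ctt"` structure that the XPath expressions above assume; it is not real weibo.cn markup, and the post IDs in it are made up.

```python
from lxml import etree


def parse_weibos(html_bytes):
    """Return (post_id, text) pairs for each post block in a weibo.cn-style page.

    Hypothetical helper: assumes posts are <div class="c" id="..."> elements whose
    text is inside <span class="ctt">, the structure targeted by the scraper's XPath.
    """
    html = etree.HTML(html_bytes)
    results = []
    for weibo in html.xpath('//div[@class="c" and @id]'):
        spans = weibo.xpath('.//span[@class="ctt"]')
        if not spans:
            continue  # e.g. a block with an id but no post text
        # string(.) concatenates all descendant text nodes, so link text is kept
        text = spans[0].xpath('string(.)').strip()
        results.append((weibo.get('id'), text))
    return results


# Hand-written fragment imitating the assumed structure (not real Weibo markup)
sample = b'''
<html><body>
  <div class="c" id="M_abc123"><div><span class="ctt">Hello <a href="#">#topic#</a> world</span></div></div>
  <div class="c" id="M_def456"><div><span class="ctt">Second post</span></div></div>
  <div class="c">navigation block without an id</div>
  <div class="c" id="M_noctt"><div>block with no ctt span</div></div>
</body></html>
'''

for post_id, text in parse_weibos(sample):
    print(post_id, text)
```

Using `string(.)` rather than `text()[0]` matters here: in the first post, `text()` would return only `"Hello "`, because the rest of the text sits inside and after the `<a>` element.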
