```python
import requests
from bs4 import BeautifulSoup

url = 'https://weibo.com/u/1670659923'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.41'
}
cookie = {
    'SUB': '_2A25JhpdQDeRhGeBM61MQ8CjOzzqIHXVq9Y-YrDV8PUNbmtAGLWvxkW9NRQ-ugE6rVY1cMvmezl4YyGeJ1ClvO4dN',
    'SUBP': '0033WrSXqPxfM725Ws9jqgMF55529P9D9Wh2o78rH.vjBjZo6khos3mY5JpX5KzhUgL.FoqEeh2pehqEShq2dJLoIpjLxK.L1h2LB--LxKBLBonL12BLxKnL1hzL1h.t;ALF=1717836416; SSOLoginState=1686300416;XSRF-TOKEN=LdwKRX-Gczu2_lX9TQQ4O9EY'
}
response = requests.get(url, headers=headers, cookies=cookie)
soup = BeautifulSoup(response.text, 'html.parser')
# Get the profile name
name = soup.find('div', {'class': 'ProfileHeader_name_1KbBs'})
print(name)
```
Please modify the way the element is located.
If you want to get the user's post count, following count, follower count, and similar information, you can use the following code:
```python
# Get the post count, following count, and follower count
info_list = soup.select('div.tb_counter > div > a')
weibo_num = info_list[0].text.strip()
following_num = info_list[1].text.strip()
follower_num = info_list[2].text.strip()
print(f'Posts: {weibo_num}, Following: {following_num}, Followers: {follower_num}')
```
This uses the CSS selector `div.tb_counter > div > a` to find the `a` tags that hold the post, following, and follower counts, and reads each tag's text with the `.text` attribute. The `.strip()` call removes stray whitespace and line breaks, and the result is printed with an f-string.
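Note that this snippet reuses the `soup` object from the question's code, and the `tb_counter` class comes from Weibo's older page layout, so the selector may match fewer elements than expected. A minimal defensive sketch under that assumption:
```python
# Defensive version of the counter lookup: the selector is an assumption about
# Weibo's (older) page layout, so check how many matches came back before indexing.
info_list = soup.select('div.tb_counter > div > a')
if len(info_list) >= 3:
    weibo_num, following_num, follower_num = (a.text.strip() for a in info_list[:3])
    print(f'Posts: {weibo_num}, Following: {following_num}, Followers: {follower_num}')
else:
    # Fewer than three matches usually means the page structure changed,
    # or the response was a login/redirect page instead of the profile.
    print('Counter elements not found; the page structure may have changed.')
```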
Related questions
```python
import requests
from bs4 import BeautifulSoup

url = 'https://weibo.com/u/1670659923'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}
cookies = {
    'Cookie': '1mN2r5kExwviijjaUAMJq3Ks'
}
response = requests.get(url, headers=headers, cookies=cookies)
soup = BeautifulSoup(response.text, 'html.parser')
# Get the name
name = soup.find('div', {'class': 'detail_wbtext_4CRf9'}).text
print(name)
```
Running the above code raises `AttributeError: 'NoneType' object has no attribute 'text'` on the line `name = soup.find('div', {'class': 'detail_wbtext_4CRf9'}).text`. Please modify the code based on this error.
This error means the corresponding div tag could not be found, so `soup.find()` returned `None`. That usually happens because the page structure has changed or because the class name used to locate the div is wrong. You can try the following code instead:
```python
import requests
from bs4 import BeautifulSoup
url = 'https://weibo.com/u/1670659923'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}
cookies = {
'Cookie': '1mN2r5kExwviijjaUAMJq3Ks'
}
response = requests.get(url, headers=headers, cookies=cookies)
soup = BeautifulSoup(response.text, 'html.parser')
# Instead, get the text from the a tag inside the div with the pf_username class
name = soup.find('div', {'class': 'pf_username'}).find('a').text
print(name)
```
This version reads the text of the `a` tag inside the div with the `pf_username` class, avoiding the failed lookup of the div with the `detail_wbtext_4CRf9` class.
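More generally, whenever a class name might be out of date it is safer to check the result of `find()` before touching `.text`, which is exactly what the traceback above is complaining about. A minimal sketch of that pattern, reusing `soup` from the code above (the two class names are just the candidates mentioned in this thread and may themselves be outdated):
```python
# Try a list of candidate class names and use the first one that matches.
candidates = ['pf_username', 'detail_wbtext_4CRf9']
name = None
for cls in candidates:
    tag = soup.find('div', {'class': cls})
    if tag is not None:  # guard against the AttributeError from the question
        name = tag.get_text(strip=True)
        break

if name is None:
    print('Name element not found; check the page structure or your cookies.')
else:
    print(name)
```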
I want to scrape the Weibo hot-search list. The URL is https://s.weibo.com/top/summary/, the User-Agent is `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.1.4031 SLBChan/30`, and the cookie is `SINAGLOBAL=5471196458768.1.1650350231825; UOR=m.ali213.net,widget.weibo.com,graph.qq.com; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5kN8GkjYf-fWMuj49XKco_; _s_tentry=passport.weibo.com; Apache=4338083455173.212.1685923900636; ULV=1685923900684:4:1:1:4338083455173.212.1685923900636:1662901020857; SUB=_2AkMTIafsf8NxqwJRmPASym_ia4V1zA_EieKlfVY3JRMxHRl-yj9kqmEQtRB6OKGJA2r3wurgWm0tD7mB5Sg3sjQyYFaX`.
Sure. You can use Python's requests library and Beautiful Soup to scrape the Weibo hot-search list. Here is example code:
```python
import requests
from bs4 import BeautifulSoup
url = 'https://s.weibo.com/top/summary/'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 SLBrowser/8.0.1.4031 SLBChan/30',
'Cookie': 'SINAGLOBAL=5471196458768.1.1650350231825; UOR=m.ali213.net,widget.weibo.com,graph.qq.com; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5kN8GkjYf-fWMuj49XKco_; _s_tentry=passport.weibo.com; Apache=4338083455173.212.1685923900636; ULV=1685923900684:4:1:1:4338083455173.212.1685923900636:1662901020857; SUB=_2AkMTIafsf8NxqwJRmPASym_ia4V1zA_EieKlfVY3JRMxHRl-yj9kqmEQtRB6OKGJA2r3wurgWm0tD7mB5Sg3sjQyYFaX'
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
# Find the table cells that hold the hot-search entries
hot_list = soup.find_all('td', class_='td-02')
# Loop over the entries and pull out each one's details
for hot in hot_list:
    # Get the hot-search rank marker (stripping the literal '热' character);
    # guard against a missing span to avoid the AttributeError seen above
    rank_tag = hot.find('span', class_='hot')
    rank = rank_tag.text.strip('热') if rank_tag else ''
    # Get the hot-search title
    title = hot.find('a').text
    # Get the heat value
    heat_tag = hot.find('span', class_='s')
    heat = heat_tag.text if heat_tag else ''
    print(rank, title, heat)
```
Replace the headers in the code with your own request-header information, then run the code to scrape the Weibo hot-search data.
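If you also want to keep the results, here is a small follow-up sketch that writes the same fields to a CSV file with the standard library; the filename and column names are only illustrative, and it assumes `hot_list` from the block above is available:
```python
import csv

# Collect (rank, title, heat) tuples from the cells found above, using the same
# guards against missing spans, then write them to an illustrative CSV file.
rows = []
for hot in hot_list:
    rank_tag = hot.find('span', class_='hot')
    heat_tag = hot.find('span', class_='s')
    rows.append((
        rank_tag.text.strip('热') if rank_tag else '',
        hot.find('a').text,
        heat_tag.text if heat_tag else '',
    ))

# utf-8-sig keeps the Chinese titles readable when the file is opened in Excel
with open('weibo_hot.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['rank', 'title', 'heat'])
    writer.writerows(rows)
```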