Detailed Python code for scraping Weibo comments and saving them to a CSV file
Date: 2023-10-25 18:07:26
Below is example code that uses Python to scrape Weibo comments and save them to a CSV file.
First, install the requests and BeautifulSoup libraries with the following commands:
```
pip install requests
pip install beautifulsoup4
```
Next, the following code scrapes the comments and writes them to a CSV file:
```python
import csv
import random
import time

import requests
from bs4 import BeautifulSoup

# Request headers: identify as a desktop browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
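# Simulate a login on the mobile Weibo site so the session holds the login cookies.
# Weibo changes its login flow from time to time and may ask for extra verification
# (e.g. a CAPTCHA), so this step is fragile and may need adjusting.
session = requests.Session()
url = 'https://passport.weibo.cn/signin/login'
# Open the login page first so the session picks up any initial cookies
response = session.get(url, headers=headers)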
data = {
    'username': 'your_username',
    'password': 'your_password',
    'savestate': '1',
    'r': '',
    'ec': '0',
    'pagerefer': '',
    'entry': 'mweibo',
    'wentry': '',
    'loginfrom': '',
    'client_id': '',
    'code': '',
    'qq': '',
    'mainpageflag': '1',
    'hff': '',
    'hfp': '',
    'vt': '4',
    'backURL': 'https%3A%2F%2Fm.weibo.cn%2F',
    'mainpageparam': '',
    'testcookie': '1',
    'sr': '1920*1080',
    'nonce': '',
    'rsakv': '',
    'ua': '',
    'callback': 'jsonpcallback' + str(int(time.time() * 1000) + random.randint(1, 100)),
}
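login_url = 'https://passport.weibo.cn/sso/login'
headers['Referer'] = 'https://passport.weibo.cn/signin/login'
# On success, the login cookies are stored in the session's cookie jar
response = session.post(login_url, data=data, headers=headers)

# Scrape the comments of one post; replace xxxxx with the post's ID and MID
weibo_id = 'xxxxx'
weibo_mid = 'xxxxx'
comments_url = 'https://m.weibo.cn/comments/hotflow'
headers['Referer'] = 'https://m.weibo.cn/detail/' + weibo_id
# No manual Cookie header: an explicit Cookie header would replace the
# cookies that the session already sends automatically
comments = []
max_id = 0
max_id_type = 0
while True:
    params = {'id': weibo_id, 'mid': weibo_mid}
    if max_id:
        params['max_id'] = max_id
    params['max_id_type'] = max_id_type
    response = session.get(comments_url, params=params, headers=headers)
    try:
        data = response.json()
    except ValueError:
        # Not JSON (e.g. redirected to a login page or rate-limited): stop
        break
    if data.get('ok') != 1:
        break
    for comment in data['data']['data']:
        # The comment text may contain HTML (links, emoji images); keep only the text
        text = BeautifulSoup(comment['text'], 'html.parser').get_text()
        comments.append([comment['user']['screen_name'], comment['created_at'], text])
    # max_id is the cursor for the next page; 0 means there are no more pages
    max_id = data['data'].get('max_id', 0)
    max_id_type = data['data'].get('max_id_type', 0)
    if max_id == 0:
        break
    # Pause between requests to lower the risk of being rate-limited
    time.sleep(random.uniform(1, 3))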
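# Write the comments to a CSV file. 'utf-8-sig' adds a BOM so that Excel
# shows the Chinese comment text correctly; other tools read it as UTF-8 too.
with open('comments.csv', 'w', newline='', encoding='utf-8-sig') as file:
    writer = csv.writer(file)
    writer.writerow(['Username', 'Comment time', 'Comment text'])
    writer.writerows(comments)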
```
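Replace `your_username` and `your_password` in the code with your own Weibo account name and password, and replace `xxxxx` with the ID and MID of the Weibo post whose comments you want to scrape.
The code first simulates a Weibo login to obtain cookies, then keeps requesting comment pages, using the `max_id` returned by each page as the cursor for the next one until it is 0, and finally writes the comments to a CSV file.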
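If you start from a post's mobile URL, the numeric ID can be pulled out of it instead of being copied by hand. The sketch below assumes the post URL has the form `https://m.weibo.cn/detail/<id>` (the same form the code uses as its `Referer`) and, as a further assumption, passes that one number as both `id` and `mid`. `extract_post_id` and `build_hotflow_url` are hypothetical helper names, not part of any Weibo API, and the ID in the example is made up.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern: assumes mobile post URLs look like https://m.weibo.cn/detail/<digits>
DETAIL_URL = re.compile(r'^https?://m\.weibo\.cn/detail/(\d+)')


def extract_post_id(post_url):
    """Return the numeric post ID from an m.weibo.cn detail URL, or None if it does not match."""
    match = DETAIL_URL.match(post_url)
    return match.group(1) if match else None


def build_hotflow_url(post_id, max_id=0, max_id_type=0):
    """Build the comments URL for one page, in the same shape as the article's URLs.

    Assumption: the same number is used for both id and mid.
    """
    params = {'id': post_id, 'mid': post_id}
    if max_id:
        params['max_id'] = max_id  # only sent after the first page
    params['max_id_type'] = max_id_type
    return 'https://m.weibo.cn/comments/hotflow?' + urlencode(params)


post_id = extract_post_id('https://m.weibo.cn/detail/1234567890123456')  # made-up ID
print(build_hotflow_url(post_id))
# https://m.weibo.cn/comments/hotflow?id=1234567890123456&mid=1234567890123456&max_id_type=0
```

If a post's ID and MID turn out to differ, pass them separately instead of reusing `post_id`.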
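The paging and CSV steps can be checked offline by separating them from the network call. In this sketch `collect_comments` takes a hypothetical `fetch_page(max_id, max_id_type)` callable that returns the parsed JSON of one page; it assumes the response shape the code above reads (`ok`, `data.data`, `data.max_id`) and reads `max_id_type` with a default of 0. The fake pages are made-up test data, not real Weibo responses.

```python
import csv
import io


def collect_comments(fetch_page, max_pages=100):
    """Follow the max_id cursor until it is 0, ok != 1, or max_pages is reached."""
    rows = []
    max_id, max_id_type = 0, 0
    for _ in range(max_pages):
        page = fetch_page(max_id, max_id_type)
        if page.get('ok') != 1:
            break  # error or no more data
        for comment in page['data']['data']:
            rows.append([comment['user']['screen_name'], comment['created_at'], comment['text']])
        max_id = page['data'].get('max_id', 0)
        max_id_type = page['data'].get('max_id_type', 0)
        if max_id == 0:
            break  # last page reached
    return rows


def write_comments_csv(rows, file):
    """Write a header row followed by one row per comment."""
    writer = csv.writer(file)
    writer.writerow(['Username', 'Comment time', 'Comment text'])
    writer.writerows(rows)


# Made-up pages, keyed by the max_id used to request them
FAKE_PAGES = {
    0: {'ok': 1, 'data': {'max_id': 42, 'max_id_type': 0,
                          'data': [{'user': {'screen_name': 'alice'}, 'created_at': 't1', 'text': 'first'}]}},
    42: {'ok': 1, 'data': {'max_id': 0, 'max_id_type': 0,
                           'data': [{'user': {'screen_name': 'bob'}, 'created_at': 't2', 'text': 'second'}]}},
}

rows = collect_comments(lambda max_id, max_id_type: FAKE_PAGES[max_id])
buffer = io.StringIO()
write_comments_csv(rows, buffer)
print(buffer.getvalue())
```

The `max_pages` cap is an added safeguard so that a cursor which never reaches 0 cannot loop forever; the `while True` loop in the main code relies on the server eventually returning `max_id == 0` or `ok != 1`.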