Code to scrape all comment usernames and comment times from https://www.bilibili.com/video/BV1SQ4y1V7do/?spm_id_from=333.999.0.0&vd_source=6e9299676612960e729 and save them to a CSV file
Posted: 2024-11-20 08:37:40
Scraping the comments of a Bilibili video typically involves network requests, HTML parsing, and data export. Below is a basic Python example that uses the `requests` and `BeautifulSoup` libraries to fetch the page content and `pandas` to save the data as CSV. Note that in practice you may have to deal with anti-scraping measures such as login verification and dynamically loaded content.
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Fetch the comments on a single page
def get_page_comments(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # fail early on HTTP errors
    soup = BeautifulSoup(response.text, 'lxml')
    comments = []
    for comment in soup.find_all('li', class_='comment-item'):
        user_info = comment.find('span', class_='author').text.strip()
        time_info = comment.find('time')['datetime']
        comments.append({'username': user_info, 'comment_time': time_info})
    return comments

# Main routine: crawl the comments and save them as CSV
def main():
    url = 'https://www.bilibili.com/video/BV1SQ4y1V7do/?spm_id_from=333.999.0.0&vd_source=6e9299676612960e729'
    total_pages = 1  # assumed known, or computed from the page count on the site
    all_comments = []  # accumulator for every scraped comment
    for _ in range(total_pages):
        page_comments = get_page_comments(url)
        all_comments.extend(page_comments)
    df = pd.DataFrame(all_comments)
    # utf-8-sig keeps Chinese text readable when the CSV is opened in Excel
    df.to_csv('bilibili_comments.csv', index=False, encoding='utf-8-sig')

if __name__ == '__main__':
    main()
```
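In practice, parsing the static HTML as above often yields an empty list, because Bilibili renders its comment section with JavaScript: the comments arrive as JSON from a background (XHR) request rather than in the initial page. Whatever endpoint you capture from the browser's network tab, the extraction step then becomes JSON parsing instead of HTML parsing. The sketch below shows that step only; the field names (`member.uname`, `ctime` as a Unix timestamp) are an assumption based on the commonly observed shape of Bilibili's reply responses, and should be verified against a real capture before use.

```python
from datetime import datetime, timezone

def parse_reply_json(payload):
    """Extract usernames and comment times from a reply-style JSON payload.

    Assumes each reply holds the author under member.uname and a Unix
    timestamp under ctime; the schema is undocumented and may change.
    """
    rows = []
    for reply in payload.get('data', {}).get('replies') or []:
        rows.append({
            'username': reply['member']['uname'],
            # convert the Unix timestamp to a readable UTC string
            'comment_time': datetime.fromtimestamp(
                reply['ctime'], tz=timezone.utc
            ).strftime('%Y-%m-%d %H:%M:%S'),
        })
    return rows

# A tiny stand-in payload mimicking the assumed response structure
sample = {'data': {'replies': [
    {'member': {'uname': 'alice'}, 'ctime': 1700000000},
]}}
print(parse_reply_json(sample))
```

The resulting list of dicts can be fed straight into `pd.DataFrame(...)` and saved with `to_csv` exactly as in the example above.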