Scraping Weibo data with Python — post content, author, publish time, like count, repost count, favorite count, etc. — and exporting the scraped data to CSV
Posted: 2024-10-23 12:08:35
Scraping Weibo data in Python usually involves web-crawling techniques, using libraries such as `requests`, `beautifulsoup4`, or `selenium`. Here is a brief outline of the basic steps:
1. **Install the required libraries**:
First install `requests` for sending HTTP requests and fetching the page source, plus `lxml` and `beautifulsoup4` for parsing the HTML.
```bash
pip install requests beautifulsoup4 lxml
```
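After installing, a quick smoke test confirms that `beautifulsoup4` can use the `lxml` parser. The HTML snippet below is made up purely for the check; it is not real Weibo markup:

```python
from bs4 import BeautifulSoup

# Made-up HTML, only to verify the beautifulsoup4 + lxml install works
soup = BeautifulSoup('<div class="weibo-content">hello</div>', 'lxml')
text = soup.find('div', {'class': 'weibo-content'}).text
print(text)  # hello
```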
2. **Write the scraper script**:
Use `requests.get()` to fetch the Weibo page source, then parse it with BeautifulSoup and extract the fields you need, e.g. post content (`content`), author (`author`), publish time (`pub_date`), like count (`likes`), repost count (`reposts`), and favorite count (`favorites`).
```python
import requests
from bs4 import BeautifulSoup
import csv

def get_weibo_info(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    # Assumes the post sits in tags like <div class="weibo-content">;
    # inspect the real page and adjust these selectors accordingly
    content = soup.find('div', {'class': 'weibo-content'}).text
    author = soup.find('span', {'class': 'username'}).text
    pub_date = soup.find('time')['datetime']
    # Assumes the like/repost/favorite counts are the text of spans
    # tagged with a data-stat attribute (again, adjust to the real markup)
    likes = int(soup.find('span', {'data-stat': 'like-count'}).text)
    reposts = int(soup.find('span', {'data-stat': 'repost-count'}).text)
    favorites = int(soup.find('span', {'data-stat': 'collect-count'}).text)
    return {
        'content': content,
        'author': author,
        'pub_date': pub_date,
        'likes': likes,
        'reposts': reposts,
        'favorites': favorites
    }

def save_to_csv(data, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=data.keys())
        writer.writeheader()
        writer.writerow(data)

# Example Weibo URL -- replace with a real post address
url = "https://weibo.com/your_username/status_id"
info = get_weibo_info(url)
save_to_csv(info, 'weibo_data.csv')
```
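The `save_to_csv` above writes a single post. To store several posts in one file, you can pass a list of dicts and use `writerows` instead — a minimal stdlib-only sketch, with made-up sample rows standing in for scraped data:

```python
import csv

def save_rows_to_csv(rows, filename):
    # rows: a list of dicts that share the same keys (one dict per post)
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

# Made-up sample data standing in for real scraped posts
posts = [
    {'content': 'first post', 'author': 'alice', 'likes': 10},
    {'content': 'second post', 'author': 'bob', 'likes': 3},
]
save_rows_to_csv(posts, 'weibo_data.csv')
```

The `encoding='utf-8'` and `newline=''` arguments matter: the former keeps Chinese text intact, and the latter prevents blank lines between rows on Windows.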
Remember to replace `url` with the address of the actual Weibo page you want to scrape.
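Note also that Weibo renders much of its content with JavaScript and usually requires a logged-in session, so a bare `requests.get()` may return little useful HTML; this is where `selenium` becomes the heavier alternative. Short of that, reusing a `requests.Session` with a browser-like `User-Agent` (and, if needed, your own cookie string — not shown here) improves the odds:

```python
import requests

# A session reuses connections and sends these headers on every request.
# The User-Agent value is an ordinary browser-style string, not Weibo-specific.
session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    # 'Cookie': '...',  # paste your own logged-in cookie here if required
})
# response = session.get(url)  # use the session wherever requests.get() was used
```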