Write a Python program that, using "小牛" as the keyword, crawls 50 Weibo posts about it, including like, repost, and comment counts as well as each post's images, and saves the results to an Excel file at C:\Users\wangshiwei\Desktop\xiaoniu.xlsx
### Answer 1:
Hello! Here is a program that can help you complete this task.
First, install the required libraries:
```python
# Install the HTTP library requests
!pip install requests
# Install the HTML parser BeautifulSoup
!pip install beautifulsoup4
# Install openpyxl to write .xlsx files (xlwt only produces the legacy
# .xls format, so it cannot create the requested .xlsx file; the `!pip`
# prefix is Jupyter syntax, drop the `!` in a plain shell)
!pip install openpyxl
```
Then the following code crawls the Weibo posts and saves them to the Excel file:
```python
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Target path for the Excel file
save_path = r'C:\Users\wangshiwei\Desktop\xiaoniu.xlsx'

# Create a workbook and write the header row
workbook = Workbook()
worksheet = workbook.active
worksheet.append(['标题', '点赞数', '转发数', '评论数', '图片地址'])

# Search keyword and number of posts to collect
keyword = '小牛'
target_count = 50
collected = 0

# Weibo search pages normally require a logged-in cookie; without
# one the result list may come back empty
headers = {
    'User-Agent': 'Mozilla/5.0',
    # 'Cookie': '...',  # paste your own cookie string here
}

page = 1
while collected < target_count:
    # Search results URL for the current page
    url = f'https://s.weibo.com/weibo?q={keyword}&Refer=weibo_search&page={page}'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Each search result is wrapped in a div.card-wrap
    weibo_divs = soup.find_all('div', class_='card-wrap')
    if not weibo_divs:
        break  # no more results
    for weibo_div in weibo_divs:
        # The class names below reflect Weibo's search markup at the
        # time of writing and may change; treat them as assumptions
        text_p = weibo_div.find('p', class_='txt')
        if text_p is None:
            continue  # skip notice/ad cards that carry no post text
        title = text_p.get_text(strip=True)
        # The action bar lists repost / comment / like counts (it may
        # also start with a "collect" item, so take the last three)
        act_links = weibo_div.select('div.card-act ul li a')
        counts = [a.get_text(strip=True) for a in act_links][-3:]
        repost, comment, like = (counts + ['', '', ''])[:3]
        # Collect image URLs, if the post has any
        imgs = weibo_div.select('div[node-type="feed_list_media_prev"] img')
        pic_urls = ', '.join(img.get('src', '') for img in imgs)
        worksheet.append([title, like, repost, comment, pic_urls])
        collected += 1
        if collected >= target_count:
            break
    page += 1

# Save the workbook to the requested path
workbook.save(save_path)
print(f'Saved {collected} posts to {save_path}')
```
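Note that the code above saves image URLs rather than the image files themselves. If the task's "微博的图片" is meant literally, a download step can be bolted on. Below is a minimal sketch, assuming the collected URLs are directly fetchable; the folder name `xiaoniu_pics` is an arbitrary choice, and Weibo's image CDN may reject requests that lack a Referer header:
```python
import os
import requests

def download_images(pic_urls, out_dir=r'C:\Users\wangshiwei\Desktop\xiaoniu_pics'):
    """Download each image URL into out_dir and return the saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    # Weibo's image CDN often rejects requests without a Referer
    headers = {'User-Agent': 'Mozilla/5.0', 'Referer': 'https://weibo.com/'}
    saved = []
    for i, url in enumerate(pic_urls):
        if not url:
            continue
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code == 200:
            path = os.path.join(out_dir, f'img_{i}.jpg')
            with open(path, 'wb') as f:
                f.write(resp.content)
            saved.append(path)
    return saved
```
Calling `download_images(pic_urls.split(', '))` right after each row is written would store the files alongside the spreadsheet.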
### Answer 2:
Below is a Python program that crawls 50 Weibo posts related to the keyword "小牛", including like, repost, and comment counts as well as the posts' image links, and saves the result to the Excel file "C:\Users\wangshiwei\Desktop\xiaoniu.xlsx".
```python
import requests
import openpyxl
# Request headers: a browser User-Agent (a cookie may also be needed
# if the API starts returning login prompts instead of data)
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'
}
# Output path and file name for the Excel file
save_path = 'C:\\Users\\wangshiwei\\Desktop\\xiaoniu.xlsx'
# The container API that returns a user's timeline as JSON
# (note: this endpoint lives on m.weibo.cn, not weibo.com)
base_url = ('https://m.weibo.cn/api/container/getIndex'
            '?type=uid&value=123456789&containerid=107603123456789&page={}')
# Create the Excel workbook and write the header row
workbook = openpyxl.Workbook()
sheet = workbook.active
sheet['A1'] = '微博内容'
sheet['B1'] = '点赞数'
sheet['C1'] = '转发数'
sheet['D1'] = '评论数'
sheet['E1'] = '微博图片链接'
# One API page only holds about ten posts, so keep paging
# until 50 posts have been collected
row_num = 2
page = 1
while row_num <= 51:
    response = requests.get(base_url.format(page), headers=headers)
    data = response.json()
    cards = data.get('data', {}).get('cards', [])
    if not cards:
        break  # no more posts
    for card in cards:
        # Only card_type 9 entries are actual posts
        if card.get('card_type') != 9:
            continue
        weibo = card['mblog']
        content = weibo['text']  # note: contains HTML markup
        attitudes_count = weibo['attitudes_count']
        reposts_count = weibo['reposts_count']
        comments_count = weibo['comments_count']
        # Image entries sit under the 'pics' key in this API's responses
        # (absent when a post has no images); verify against live data
        pics = weibo.get('pics', [])
        # Write one row per post
        sheet.cell(row=row_num, column=1).value = content
        sheet.cell(row=row_num, column=2).value = attitudes_count
        sheet.cell(row=row_num, column=3).value = reposts_count
        sheet.cell(row=row_num, column=4).value = comments_count
        sheet.cell(row=row_num, column=5).value = ', '.join(pic['url'] for pic in pics)
        row_num += 1
        if row_num > 51:
            break
    page += 1
# Save the Excel file
workbook.save(save_path)
print('微博信息已保存到{}'.format(save_path))
```
Replace `123456789` in the code with 小牛's Weibo user ID; it appears both as the `value` parameter and as the digits after `107603` in the `containerid` (for a user's post timeline the containerid is typically `107603` followed by the user ID). The program will then crawl the posts as required and save them to the Excel file at the specified path.
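If only the user ID is known, the containerid does not have to be guessed: querying the same API with just `type=uid&value=<uid>` returns the profile's tab list, which includes the post timeline's containerid. A minimal sketch, assuming the `data.tabsInfo.tabs` layout commonly seen in this API's responses; the field names should be verified against a live response:
```python
import requests

def get_weibo_containerid(uid, headers=None):
    """Look up the containerid of a user's post timeline via m.weibo.cn."""
    url = f'https://m.weibo.cn/api/container/getIndex?type=uid&value={uid}'
    data = requests.get(url, headers=headers, timeout=10).json()
    # tabsInfo lists the profile tabs; the tab whose tab_type is
    # 'weibo' carries the containerid for the post timeline
    for tab in data.get('data', {}).get('tabsInfo', {}).get('tabs', []):
        if tab.get('tab_type') == 'weibo':
            return tab.get('containerid')
    return None
```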
### Answer 3:
```python
import requests
from bs4 import BeautifulSoup
import openpyxl

# Create the Excel workbook and write the header row
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["微博内容", "点赞数", "转发数", "评论数", "图片链接"])

# Fetch a Weibo feed page. Note: this desktop feed (and the WB_*
# class names below) belongs to Weibo's legacy markup and normally
# requires a logged-in cookie; for a keyword search, the
# s.weibo.com/weibo?q=小牛 URL from Answer 1 is the better fit
headers = {
    "User-Agent": "Mozilla/5.0",
    # "Cookie": "...",  # paste a logged-in cookie here
}
url = "https://weibo.com/?category=10007"
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
weibo_list = soup.find_all("div", class_="WB_feed_detail")

count = 0
for weibo in weibo_list:
    # Post text
    text_div = weibo.find("div", class_="WB_text")
    if text_div is None:
        continue  # not a normal post
    content = text_div.get_text(strip=True)
    # Repost / comment / like counts sit in the feed's handle bar;
    # the selector below is an assumption against the legacy markup
    # and may need adjusting
    handle_spans = weibo.find_all("span", class_="line")
    texts = [s.get_text(strip=True) for s in handle_spans][-3:]
    repost_count, comment_count, like_count = (texts + ["", "", ""])[:3]
    # Image links, if the post carries media
    image_url = ""
    if weibo.find("div", class_="WB_media_wrap"):
        images = weibo.find_all("img")
        image_url = "\n".join(img.get("src", "") for img in images)
    # Append one row per post
    ws.append([content, like_count, repost_count, comment_count, image_url])
    count += 1
    if count >= 50:
        break

# Save the Excel file to the requested path
wb.save("C:\\Users\\wangshiwei\\Desktop\\xiaoniu.xlsx")
```
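Whichever answer is used, fifty back-to-back requests can easily trip Weibo's rate limiting. A small helper that spaces out and retries requests can be dropped into any of the loops above; the delay and retry values here are arbitrary choices, not documented Weibo limits:
```python
import time
import requests

def polite_get(url, headers=None, retries=3, delay=2.0):
    """requests.get with pauses between attempts, to avoid hammering the site."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, headers=headers, timeout=10)
            if resp.status_code == 200:
                return resp
        except requests.RequestException:
            pass  # network hiccup: wait and retry
        time.sleep(delay * (attempt + 1))  # back off a little more each time
    return None  # caller should handle a failed fetch
```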