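Complete source code for a crawler that scrapes CSDN blog information and saves it locally as an .xls spreadsheet
Date: 2023-10-31 22:30:14 Views: 110
The following Python crawler scrapes the article list from a CSDN blog and saves it as an Excel file: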
```python
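import requests
from bs4 import BeautifulSoup
import xlwt

url = "https://blog.csdn.net/"
user = "your_csdn_username"  # replace with your CSDN username
user_url = url + user

# Fetch the blog home page. A browser-like User-Agent is sent on the
# assumption that the site may reject the default one; adjust as needed.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(user_url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(response.text, "html.parser")
articles = soup.find_all("div", class_="article-item-box csdn-tracking-statistics")

# Create the Excel workbook
workbook = xlwt.Workbook()
worksheet = workbook.add_sheet('CSDN Blog')

# Write the header row
worksheet.write(0, 0, 'Title')
worksheet.write(0, 1, 'Link')
worksheet.write(0, 2, 'Summary')

# Write one row per article
row = 1
for article in articles:
    title = article.h4.a.text.strip()
    link = article.h4.a['href']
    # The summary block may be missing, so fall back to an empty string
    desc = article.find("div", class_="article-desc")
    summary = desc.text.strip() if desc else ""
    worksheet.write(row, 0, title)
    worksheet.write(row, 1, link)
    worksheet.write(row, 2, summary)
    row += 1

# Save the Excel file
workbook.save('csdn_blog.xls')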
```
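Replace `your_csdn_username` with your own CSDN username, and make sure the `requests`, `beautifulsoup4`, and `xlwt` libraries are installed (for example, with `pip install requests beautifulsoup4 xlwt`).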
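The row-writing step can also be factored into a small helper, which makes it testable without any network access. The following is a minimal sketch, not part of the original answer: `write_rows`, `XLS_MAX_ROWS`, and `FakeSheet` are hypothetical names introduced here for illustration. The helper only assumes that the worksheet object has a `write(row, col, value)` method, as the `xlwt` worksheet used in the script does, and it stops before the 65,536-row limit of the legacy `.xls` format.

```python
XLS_MAX_ROWS = 65536  # row limit of the legacy .xls format


def write_rows(worksheet, header, records):
    """Write a header row plus one row per record.

    `records` is a list of dicts keyed by the header names; missing
    keys are written as empty strings. Returns the number of data
    rows written.
    """
    for col, name in enumerate(header):
        worksheet.write(0, col, name)
    written = 0
    for row, record in enumerate(records, start=1):
        if row >= XLS_MAX_ROWS:
            break  # stop before exceeding the sheet's capacity
        for col, name in enumerate(header):
            worksheet.write(row, col, record.get(name, ""))
        written += 1
    return written


class FakeSheet:
    """Hypothetical in-memory stand-in for a worksheet, for testing only."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


sheet = FakeSheet()
count = write_rows(
    sheet,
    ["Title", "Link", "Summary"],
    [{"Title": "Hello", "Link": "https://example.com/1"}],
)
```

In the script, the same helper could be called with the real `worksheet` and a list of dicts built from the parsed articles, in place of the manual `row` counter.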