```python
import requests
from bs4 import BeautifulSoup
url="https://www.shu.edu.cn/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
response.encoding = 'utf-8'
html=response.text
soup=BeautifulSoup(html,"lxml")
content_all=soup.find_all("a")
for content in content_all:
    contentstring=content.text
    if contentstring!=None:
        print(contentstring)
```
Please improve this code. I want to save the scraped content to an Excel file.
Posted: 2024-04-04 19:35:18
Sure, I can help you improve it. We can use `DataFrame` and its `to_excel` method from the `pandas` library to save the scraped content to an Excel file. The code is as follows:
```python
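import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.shu.edu.cn/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Fetch the page; fail fast on a network stall or an HTTP error status
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = 'utf-8'
html = response.text

# Parse the HTML and collect every <a> tag
soup = BeautifulSoup(html, "lxml")
content_all = soup.find_all("a")

# Gather the non-empty link texts in a plain list.
# (DataFrame.append was removed in pandas 2.0, so the DataFrame is built once at the end.)
rows = []
for content in content_all:
    # Get the link text without surrounding whitespace
    contentstring = content.text.strip()
    if contentstring != "":
        rows.append(contentstring)

# Build the DataFrame and save it to an Excel file
# (writing .xlsx requires the openpyxl package to be installed)
df = pd.DataFrame({"content": rows})
df.to_excel("content.xlsx", index=False)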
```
This code saves the scraped content to an Excel file named `content.xlsx`. You can change the file name and save path as needed.
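If you do want to change the file name or save path, something like the following minimal sketch may help. The helper names `clean_link_texts` and `save_to_excel` are hypothetical and not part of the answer above, and removing duplicate link texts is an extra assumption about what you might want in the spreadsheet. The sketch also creates the target folder if it does not exist yet.

```python
from pathlib import Path


def clean_link_texts(texts):
    """Strip whitespace, drop empty strings, and remove duplicates (first-seen order)."""
    seen = set()
    cleaned = []
    for text in texts:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def save_to_excel(texts, path="content.xlsx"):
    """Write the cleaned texts to a single-column Excel sheet (hypothetical helper)."""
    import pandas as pd  # imported here so the cleaning helper works without pandas

    path = Path(path)
    # Create the target folder if it is missing
    path.parent.mkdir(parents=True, exist_ok=True)
    pd.DataFrame({"content": clean_link_texts(texts)}).to_excel(path, index=False)
    return path


# Small self-contained demonstration of the cleaning step
sample = ["  Home ", "", "News", "Home", "\n"]
print(clean_link_texts(sample))  # ['Home', 'News']
```

In the script above, this could be called as `save_to_excel([a.text for a in content_all], "output/content.xlsx")`, where the `output` folder name is only an example.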