Python crawler code to scrape the content of https://www.shanghairanking.cn/institution and write it to a CSV file
Sure, here is an implementation:
```python
import csv

import requests
from bs4 import BeautifulSoup

# Fetch the page (a User-Agent header makes the request look like a normal browser)
url = "https://www.shanghairanking.cn/institution"
request_headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=request_headers, timeout=10)
response.raise_for_status()
response.encoding = "utf-8"
soup = BeautifulSoup(response.text, "html.parser")

# Extract the header cells (<th>) from <thead>
thead = soup.find("thead")
if thead is None:
    raise RuntimeError("No <thead> found; the table may be rendered by JavaScript")
header_row = [th.get_text(strip=True) for th in thead.find_all("th")]

# Extract one list of cell texts per <tr> in <tbody>
data = []
for tr in soup.find("tbody").find_all("tr"):
    row = [td.get_text(strip=True) for td in tr.find_all("td")]
    data.append(row)

# Write the header and all rows to a CSV file
with open("shanghai_ranking.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header_row)
    writer.writerows(data)
```
This script fetches the target page with requests, parses it with BeautifulSoup, extracts the table header and data rows, and writes them to shanghai_ranking.csv with the specified file name and encoding. Note that it assumes the ranking table is present in the static HTML; if the site builds the table with JavaScript, requests will only receive the empty page shell and the `<thead>` lookup will fail.
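If the table does turn out to be rendered client-side, a headless browser can load the page first and hand the rendered HTML to the same parsing code. The sketch below is one possible fallback, assuming Selenium 4 with a locally installed Chrome; the 10-second wait and the tag-name selectors are assumptions about the page, not verified against the live site.

```python
import csv

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.shanghairanking.cn/institution")
    # Wait (up to 10 s) until a <tbody> appears, i.e. the table has been rendered
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "tbody"))
    )
    html = driver.page_source
finally:
    driver.quit()

# From here on, parsing and CSV output work the same as in the requests version
soup = BeautifulSoup(html, "html.parser")
header_row = [th.get_text(strip=True) for th in soup.find("thead").find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in soup.find("tbody").find_all("tr")
]
with open("shanghai_ranking.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header_row)
    writer.writerows(rows)
```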