Code to scrape the 软科 (ShanghaiRanking) higher-education evaluation site in PyCharm, parse each school into five fields (校名, 地区, 管理部门, 类别, 双一流), and save the results to a CSV file
Date: 2024-01-26 09:02:18
Sure, here is the code:
```python
import requests
from bs4 import BeautifulSoup
import csv

url = "http://www.zuihaodaxue.com/zuihaodaxuepaiming2021.html"

# Set the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Send the request and get the response
response = requests.get(url, headers=headers)
response.raise_for_status()

# Parse the page content
soup = BeautifulSoup(response.content.decode('utf-8'), 'lxml')

# Get the list of universities (one table row per school)
universities = soup.select('#main > div > table > tbody > tr')

# Open (create) the CSV file, write the header row, then one row per school
with open('universities.csv', 'w', newline='', encoding='utf-8-sig') as file:
    writer = csv.writer(file)
    writer.writerow(['校名', '地区', '管理部门', '类别', '双一流'])
    for university in universities:
        # Extract each field from its table cell
        name = university.select_one('td:nth-child(2)').text.strip()
        location = university.select_one('td:nth-child(3)').text.strip()
        department = university.select_one('td:nth-child(4)').text.strip()
        category = university.select_one('td:nth-child(5)').text.strip()
        double_first_class = university.select_one('td:nth-child(6)').text.strip()
        # Write the record to the CSV file
        writer.writerow([name, location, department, category, double_first_class])
```
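One caveat: if the page layout changes or a row has fewer cells than expected, `select_one` returns `None` and `.text` raises `AttributeError`. A defensive variant of the per-row extraction, sketched here against a hypothetical sample table rather than the live site:

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows mimicking the table structure assumed above;
# the second row is deliberately incomplete
html = """
<table><tbody>
  <tr><td>1</td><td>清华大学</td><td>北京</td><td>教育部</td><td>综合</td><td>双一流</td></tr>
  <tr><td>2</td><td>某学院</td><td>上海</td></tr>
</tbody></table>
"""
soup = BeautifulSoup(html, 'html.parser')

def cell_text(row, n):
    """Return the stripped text of the n-th <td>, or '' if the cell is missing."""
    cell = row.select_one(f'td:nth-child({n})')
    return cell.text.strip() if cell else ''

for row in soup.select('tbody > tr'):
    # Columns 2 through 6 hold the five fields; missing cells become ''
    record = [cell_text(row, n) for n in range(2, 7)]
    print(record)
```

This way an incomplete row produces empty strings instead of crashing the whole scrape.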
This code fetches the https://www.zuihaodaxue.com/zuihaodaxuepaiming2021.html page of the 软科 (ShanghaiRanking) site, parses out the five fields 校名, 地区, 管理部门, 类别 and 双一流 for each school, and saves them to a CSV file named "universities.csv".
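Two file arguments in the code deserve a note: `newline=''` stops the `csv` module from inserting blank lines between rows on Windows, and `utf-8-sig` prepends a BOM so that Excel detects the encoding of the Chinese text correctly. A self-contained round-trip demonstrating both, using made-up sample rows rather than scraped data:

```python
import csv

# Hypothetical sample rows standing in for scraped data
rows = [
    ['清华大学', '北京', '教育部', '综合', '双一流'],
    ['北京大学', '北京', '教育部', '综合', '双一流'],
]

# newline='' avoids blank lines between rows on Windows;
# utf-8-sig writes a BOM so Excel opens the Chinese text correctly
with open('sample.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['校名', '地区', '管理部门', '类别', '双一流'])
    writer.writerows(rows)

# Read the file back to confirm the header and rows survived the round trip
with open('sample.csv', 'r', newline='', encoding='utf-8-sig') as f:
    data = list(csv.reader(f))

print(data[0])       # header row
print(len(data) - 1) # number of data rows
```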