Add a way to fetch the next page of data to the code below:

```python
import time
from selenium import webdriver
import csv

driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.get("https://www.shanghairanking.cn/institution")

name = driver.find_elements_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[1]/span')
address = driver.find_elements_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[1]')
manage = driver.find_elements_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[2]')
category = driver.find_elements_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[3]')
syl = driver.find_elements_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[4]')
jbw = driver.find_elements_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[5]')
eyy = driver.find_elements_by_xpath('//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[6]')

rows = []
for name_1, address_name, manage_name, category_name, syl_name, jbw_name, eyy_name in zip(name, address, manage, category, syl, jbw, eyy):
    rows.append([name_1.text, address_name.text, manage_name.text, category_name.text, syl_name.text, jbw_name.text, eyy_name.text])

with open('universities.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['大学名称', '地址', '管理部门', '类别', '双一流', '985', '211'])
    writer.writerows(rows)

driver.quit()
```
Posted: 2024-01-21 11:03:03
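You can fetch multiple pages with a loop. The steps are:
1. Use a variable to track the current page number, starting at 1;
2. Inside the loop, read the data on the current page and append it to the `rows` list;
3. Locate the next-page button and check whether a next page exists; if it does, click the button and increment the page number;
4. If there is no next page, exit the loop.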
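The four steps above can be sketched without a browser, which makes the loop's exit condition easy to check. In this minimal sketch, `collect_all_pages`, `read_rows`, `go_next`, and `max_pages` are hypothetical names, not part of Selenium; `go_next` is assumed to return `False` when there is no next page.

```python
from typing import Callable, List


def collect_all_pages(
    read_rows: Callable[[], List[list]],
    go_next: Callable[[], bool],
    max_pages: int = 1000,
) -> List[list]:
    """Collect rows page by page until go_next() reports no next page.

    read_rows and go_next are hypothetical callbacks: with Selenium they
    would wrap the element lookups and the click on the next-page button.
    max_pages is a safety cap (an assumption) to avoid looping forever.
    """
    rows: List[list] = []
    page = 1  # step 1: track the current page number
    while page <= max_pages:
        rows.extend(read_rows())  # step 2: read the current page
        if not go_next():         # steps 3-4: stop when there is no next page
            break
        page += 1
    return rows


# Usage with an in-memory stand-in for the website (three fake pages).
fake_pages = [[["A"], ["B"]], [["C"]], [["D"], ["E"]]]
state = {"index": 0}


def read_fake_rows() -> List[list]:
    return fake_pages[state["index"]]


def go_to_next_fake_page() -> bool:
    if state["index"] + 1 >= len(fake_pages):
        return False
    state["index"] += 1
    return True


all_rows = collect_all_pages(read_fake_rows, go_to_next_fake_page)
print(all_rows)
```

Keeping the loop separate from the page-specific lookups means the stopping logic can be exercised with fake data before it is pointed at a real site.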
Here is the modified code:
```python
import csv
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# find_elements_by_xpath was removed in Selenium 4.3;
# find_elements(By.XPATH, ...) is the current form.
driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.get("https://www.shanghairanking.cn/institution")

rows = []
page = 1
while True:
    # Read each field's elements on the current page
    name = driver.find_elements(By.XPATH, '//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[1]/span')
    address = driver.find_elements(By.XPATH, '//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[1]')
    manage = driver.find_elements(By.XPATH, '//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[2]')
    category = driver.find_elements(By.XPATH, '//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[3]')
    syl = driver.find_elements(By.XPATH, '//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[4]')
    jbw = driver.find_elements(By.XPATH, '//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[5]')
    eyy = driver.find_elements(By.XPATH, '//*[@id="__layout"]/div/div[2]/div[2]/div[1]/div/div/div[2]/div[3]/span[6]')
    # zip stops at the shortest list, so a missing field on one row shortens the page's output
    for name_1, address_name, manage_name, category_name, syl_name, jbw_name, eyy_name in zip(name, address, manage, category, syl, jbw, eyy):
        rows.append([name_1.text, address_name.text, manage_name.text, category_name.text, syl_name.text, jbw_name.text, eyy_name.text])

    # Locate the next-page button
    next_page = driver.find_element(By.CSS_SELECTOR, '.ant-pagination-next')
    # A disabled button means this is the last page
    if 'ant-pagination-disabled' in next_page.get_attribute('class'):
        break
    # Click through to the next page
    next_page.click()
    page += 1
    print(f'Scraping page {page}')
    time.sleep(1)  # give the new page time to render

# Write the header row and all collected rows once the loop has finished
with open('universities.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['大学名称', '地址', '管理部门', '类别', '双一流', '985', '211'])
    writer.writerows(rows)

driver.quit()
```
When run, the program scrapes every page in turn and saves the results to `universities.csv`.
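One way to sanity-check the saved file is to read it back and count its rows and exact duplicates. This is only a minimal sketch: `summarize_csv` is a hypothetical helper, and it assumes the first row of the file is a header.

```python
import csv
from typing import Tuple


def summarize_csv(path: str) -> Tuple[int, int]:
    """Return (data_row_count, duplicate_row_count) for a CSV with a header row.

    Hypothetical helper for checking scraper output; not part of the csv module.
    """
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if any
        rows = [tuple(row) for row in reader]
    # Duplicates are rows whose every field matches an earlier row
    return len(rows), len(rows) - len(set(rows))
```

Calling `summarize_csv('universities.csv')` after the script finishes returns the number of data rows and how many of them are exact duplicates. A nonzero duplicate count may mean a page was read again before it had finished refreshing, in which case a longer wait after the click could help.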