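Write some code using Python's requests, lxml, pandas and selenium. Start from https://www.sciencedirect.com/journal/the-lancet and wait 10 seconds. Use selenium to click the link on that page (class="anchor js-volume volume-issue-text anchor-default"), wait 10 seconds after the page changes, then use selenium to click the link on the new page (class="switch-check switch-small js-previews-switch") and wait another 10 seconds. Collect the title, abstract and authors of the articles on that page, create an Excel sheet, and write the results into it.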

Time: 2023-03-20 16:01:55

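```python
import time

import pandas as pd
import requests  # listed in the question; not needed here because Selenium loads the JavaScript-rendered page
from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.sciencedirect.com/journal/the-lancet'
driver = webdriver.Chrome()
driver.get(url)
# Wait 10 seconds on the start page. time.sleep gives a fixed pause;
# implicitly_wait() only sets a timeout for element lookups and does not pause.
time.sleep(10)

# Click the link with class="anchor js-volume volume-issue-text anchor-default", then wait 10 seconds after the page changes.
# find_element_by_css_selector was removed in Selenium 4.3, so use find_element(By.CSS_SELECTOR, ...).
link = driver.find_element(By.CSS_SELECTOR, 'a.anchor.js-volume.volume-issue-text.anchor-default')
link.click()
time.sleep(10)

# Click the element with class="switch-check switch-small js-previews-switch", then wait 10 seconds.
# The question does not give the tag name, so the selector matches on the classes only.
switch = driver.find_element(By.CSS_SELECTOR, '.switch-check.switch-small.js-previews-switch')
switch.click()
time.sleep(10)

# Get the titles, abstracts and authors of the articles on this page.
# These XPath expressions are assumptions about the page markup; check them in the browser's dev tools.
html = etree.HTML(driver.page_source)
driver.quit()
titles = html.xpath('//h2[@class="title"]/a/@title')
abstracts = html.xpath('//div[@class="Para"]/p/text()')
authors = html.xpath('//p[@class="author"]//span/a/text()')

# Create the Excel sheet and write the results (to_excel needs openpyxl installed).
# With plain lists of different lengths pd.DataFrame raises ValueError; wrapping them in
# pd.Series pads the shorter columns with NaN instead. Rows are matched by position only.
df = pd.DataFrame({'title': pd.Series(titles), 'abstract': pd.Series(abstracts), 'author': pd.Series(authors)})
df.to_excel('result.xlsx', index=False)
```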
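Collecting three page-wide lists cannot keep an article's title, abstract and authors on the same row (an article with several authors, or one without an abstract, shifts every later row). One option is to parse per article. The sketch below is a minimal version under stated assumptions: the container XPath `//li[@class="article-item"]` and the function name `parse_articles` are hypothetical placeholders, not taken from ScienceDirect's real markup, and the field XPaths are the answer's own, made relative with `.//` so they are evaluated inside each article node.

```python
import pandas as pd
from lxml import etree

# HYPOTHETICAL container XPath: one node per article on the issue page.
# Verify it (and the field XPaths) against the real page in the browser's dev tools.
ARTICLE_XPATH = '//li[@class="article-item"]'
# Field XPaths from the answer, made relative to each article node.
TITLE_XPATH = './/h2[@class="title"]/a/@title'
ABSTRACT_XPATH = './/div[@class="Para"]/p/text()'
AUTHOR_XPATH = './/p[@class="author"]//span/a/text()'


def parse_articles(page_source: str) -> pd.DataFrame:
    """Return one row per article so title, abstract and authors stay aligned."""
    html = etree.HTML(page_source)
    rows = []
    for node in html.xpath(ARTICLE_XPATH):
        titles = node.xpath(TITLE_XPATH)
        abstract_parts = node.xpath(ABSTRACT_XPATH)
        authors = node.xpath(AUTHOR_XPATH)
        rows.append({
            # First title attribute found, or an empty cell if there is none
            'title': str(titles[0]).strip() if titles else '',
            # All paragraph text nodes of the abstract joined into one cell
            'abstract': ' '.join(str(p).strip() for p in abstract_parts),
            # Several authors of one article joined into a single cell
            'author': ', '.join(str(a).strip() for a in authors),
        })
    # Fixed column order; an empty page still yields the three headers
    return pd.DataFrame(rows, columns=['title', 'abstract', 'author'])
```

Usage after the last `time.sleep(10)`: `df = parse_articles(driver.page_source)` followed by `df.to_excel('result.xlsx', index=False)`. Articles with no abstract or no authors get an empty cell instead of shifting the other columns.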
