Python crawler: using selenium to scrape a URL and page elements together

Time: 2023-05-28 14:02:01 Views: 155

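Selenium can drive a real browser, which lets a crawler read both a page's content and its individual elements. The following example program uses selenium to fetch a URL and a page element in the same run:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Configure Chrome to run in headless mode
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')

# Create the Chrome browser object (Selenium 4 takes these via `options=`)
browser = webdriver.Chrome(options=options)

# Open the URL
url = 'https://www.baidu.com'
browser.get(url)

# Get the page source
html = browser.page_source
print(html)

# Locate the search box element by its id (Selenium 4 locator style)
input_box = browser.find_element(By.ID, 'kw')
print(input_box)

# Close the browser
browser.quit()
```

This program runs Chrome in headless mode, meaning no browser window pops up and everything runs in the background. It first opens the Baidu URL, reads the page source, and prints it. It then calls `find_element(By.ID, 'kw')` to locate the search box and prints the resulting `WebElement` object. Finally, it closes the browser. In this way we can scrape both the content of a page and its elements, such as input boxes, buttons, and links. With those elements in hand, we can go on to interact with the page more deeply, for example by filling in forms, clicking buttons, or following links.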
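Since links are among the elements mentioned above, one way to collect the URLs on a page is to parse the string returned by `browser.page_source`. The sketch below is only an illustration and uses the standard library rather than further selenium calls. The names `LinkCollector`, `extract_links`, and `sample_html` are hypothetical, and `sample_html` is a small stand-in for a real page source, not Baidu's actual markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip <a> tags that have no usable href
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical stand-in for the value of browser.page_source
sample_html = """
<html><body>
  <a href="/more">More</a>
  <a href="https://example.com/about">About</a>
  <a name="anchor-without-href">skip me</a>
</body></html>
"""

for link in extract_links(sample_html, "https://www.baidu.com"):
    print(link)
```

In a real run, the `html` variable from the program above would be passed in place of `sample_html`, with the page's own URL as `base_url` so that relative links resolve correctly.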