Clearing the browser cache before scraping a web page with Selenium in Python (with code)

Posted: 2024-12-24 19:31:17 Views: 48

Clearing the browser cache before scraping with Selenium in Python keeps data left over from earlier sessions from skewing the results. The example below does this for Chrome through Selenium WebDriver: it points the disk cache at `/dev/null` (Linux/macOS) and clears the HTTP cache and cookies through the Chrome DevTools Protocol before loading the page.

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service


# Open a page in a Chrome session whose cache has been cleared
def clear_cache():
    chrome_options = webdriver.ChromeOptions()
    # Keep the disk cache out of any persistent location (Linux/macOS path)
    chrome_options.add_argument('--disk-cache-dir=/dev/null')
    # Uncomment to run in headless mode
    # chrome_options.add_argument('--headless=new')

    # Replace with the path to your chromedriver
    service = Service('path_to_your_chromedriver')

    # The context manager quits the browser when the block exits
    with webdriver.Chrome(service=service, options=chrome_options) as driver:
        # Clear the HTTP cache and cookies via the Chrome DevTools Protocol
        driver.execute_cdp_cmd('Network.clearBrowserCache', {})
        driver.execute_cdp_cmd('Network.clearBrowserCookies', {})
        driver.get('http://example.com')  # Replace with the URL you want to visit


clear_cache()
```

In this example, replace `path_to_your_chromedriver` with the actual path to your ChromeDriver executable. Each run starts a new Chrome session with an empty cache.
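When one browser session is reused across many pages, or Chrome is started with a persistent profile, a fresh session alone does not guarantee an empty cache, so it may help to clear it explicitly before each request. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `clear_browser_state` and `scrape_pages` are hypothetical, and the driver is assumed to be Chromium-based (for example `webdriver.Chrome`), since `execute_cdp_cmd` is only available on those drivers.

```python
def clear_browser_state(driver):
    """Clear the HTTP cache and cookies of a live Chromium-based driver.

    Hypothetical helper: it assumes `driver` exposes `execute_cdp_cmd`,
    which Selenium provides for Chrome and other Chromium-based browsers.
    """
    # Chrome DevTools Protocol commands from the Network domain
    driver.execute_cdp_cmd("Network.clearBrowserCache", {})
    driver.execute_cdp_cmd("Network.clearBrowserCookies", {})


def scrape_pages(driver, urls):
    """Load each URL with a cleared cache and return {url: page source}."""
    results = {}
    for url in urls:
        # Clear state before every navigation so no page sees data
        # cached while loading an earlier one
        clear_browser_state(driver)
        driver.get(url)
        results[url] = driver.page_source
    return results


# Usage with a real browser (requires Chrome and the selenium package):
#   from selenium import webdriver
#   with webdriver.Chrome() as driver:
#       pages = scrape_pages(driver, ["http://example.com"])
```

Passing the driver in as a parameter keeps the browser lifetime under the caller's control, so the same helpers work whether the session is short-lived or shared across a whole crawl.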
