Give me a runnable piece of code that scrapes the data from this page: https://zhuanlan.zhihu.com/p/687972531
Posted: 2024-09-28 11:03:01
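For Zhihu column (zhuanlan) articles, a plain HTTP request is often not enough: the page may use anti-scraping measures or load its content dynamically with JavaScript. A common approach is therefore to drive a real browser from Python with an automation tool such as Selenium. Below is a simple Selenium + ChromeDriver example (Selenium 4 API). It assumes the `selenium` package and a ChromeDriver matching your Chrome version are already installed: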
```python
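from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def fetch_zhihu_article(url):
    chrome_options = Options()
    # Headless mode runs Chrome without a visible window. It does NOT make the
    # browser harder to detect as automated; it only saves screen/GPU resources.
    chrome_options.add_argument('--headless')
    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.get(url)
        # Wait up to 10 seconds for the article title to appear in the DOM
        wait = WebDriverWait(driver, 10)
        # NOTE: "Post-Title" / "Post-RichText" are assumptions about Zhihu's
        # current article markup (the old "zm-item-title" / "zm-editable-content"
        # classes belong to a retired layout). Inspect the page and adjust if needed.
        title_element = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'h1.Post-Title'))
        )
        article_title = title_element.text
        # find_element_by_class_name() was removed in Selenium 4;
        # use find_element(By.CLASS_NAME, ...) instead
        article_content = driver.find_element(By.CLASS_NAME, 'Post-RichText').text
        print(f"Title: {article_title}")
        print(f"Content: {article_content}")
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()


if __name__ == "__main__":
    fetch_zhihu_article("https://zhuanlan.zhihu.com/p/687972531")
```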
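Separating fetching from parsing can make scraping easier to debug. Selenium then only produces the page HTML (via `driver.page_source`), and the title and body are extracted offline, so you can re-test selectors against a saved copy of the page without reopening the browser. Below is a minimal sketch using only the standard library's `html.parser`. The `Post-Title` / `Post-RichText` class names are assumptions about Zhihu's markup rather than a documented interface, and `ZhihuArticleParser` / `parse_article` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}
# Closing one of these inside the article body starts a new line of text
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "blockquote", "pre"}


class ZhihuArticleParser(HTMLParser):
    """Hypothetical helper: collects the text inside <h1 class="...Post-Title...">
    and <div class="...Post-RichText..."> (class names are assumptions)."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.content_parts = []
        self._target = None  # list receiving text, or None when outside both elements
        self._depth = 0      # nesting depth inside the current target element

    def handle_starttag(self, tag, attrs):
        if self._target is not None:
            if tag in VOID_TAGS:
                if tag == "br":
                    self._target.append("\n")
                return
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h1" and "Post-Title" in classes:
            self._target, self._depth = self.title_parts, 1
        elif tag == "div" and "Post-RichText" in classes:
            self._target, self._depth = self.content_parts, 1

    def handle_endtag(self, tag):
        if self._target is None or tag in VOID_TAGS:
            return
        if tag in BLOCK_TAGS:
            self._target.append("\n")
        self._depth -= 1
        if self._depth == 0:
            self._target = None

    def handle_data(self, data):
        if self._target is not None:
            self._target.append(data)


def parse_article(html):
    """Return (title, content) extracted from an article page's HTML."""
    parser = ZhihuArticleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    raw = "".join(parser.content_parts)
    # Drop blank lines and surrounding whitespace from each line of body text
    content = "\n".join(line.strip() for line in raw.splitlines() if line.strip())
    return title, content


if __name__ == "__main__":
    sample = ('<h1 class="Post-Title">Hello <span>Zhihu</span></h1>'
              '<div class="RichText ztext Post-RichText"><p>First paragraph.</p>'
              '<p>Second<br>line</p></div><div class="Comments">ignored</div>')
    print(parse_article(sample))
```

In the Selenium function, you could call `parse_article(driver.page_source)` after the wait succeeds instead of locating elements one by one. The depth counter ensures that nested tags (such as `<span>` inside the title) do not end the capture early.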