Python Web Scraper Code Collection

Time: 2023-07-22 13:12:34

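Sure! Here are some common Python web scraping code examples:

1. Use the requests library to send an HTTP request and fetch the page content:

```python
import requests

url = "http://example.com"
# A timeout keeps the call from hanging indefinitely on an unresponsive server.
response = requests.get(url, timeout=10)
# Raise an exception for 4xx/5xx responses instead of printing an error page.
response.raise_for_status()
content = response.text
print(content)
```

2. Use the BeautifulSoup library to parse an HTML page:

```python
from bs4 import BeautifulSoup

html = "<html><body><h1>Hello, World!</h1></body></html>"
# "html.parser" is the parser bundled with Python's standard library.
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").text
print(title)
```

3. Use a regular expression to extract specific information from a page:

```python
import re

text = "Hello, my email is example@example.com"
# The top-level domain is matched with [A-Za-z]; a "|" inside a character
# class would match a literal pipe character, not act as alternation.
pattern = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
emails = re.findall(pattern, text)
print(emails)
```

4. Use the Selenium library to simulate browser behavior:

```python
from selenium import webdriver

# Requires a local Chrome installation.
driver = webdriver.Chrome()
try:
    driver.get("http://example.com")
    content = driver.page_source
    print(content)
finally:
    # Close the browser even if loading the page fails.
    driver.quit()
```

These examples cover only a small part of what a scraper can do; you can study and build on them further according to your specific needs.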
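As one possible next step, the parsing idea from the examples above can be extended to collect the links on a page, which is usually what a crawler needs in order to decide where to go next. The following is only a minimal sketch under some assumptions: it uses the standard library's `html.parser` and `urllib.parse` instead of BeautifulSoup so that it runs without third-party packages, and the names `LinkExtractor`, `extract_links`, and `sample_html` are hypothetical, chosen just for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in html, resolved against base_url."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page; in practice this would be response.text.
sample_html = (
    '<html><body>'
    '<a href="/about">About</a> '
    '<a href="http://example.com/contact">Contact</a>'
    '</body></html>'
)
links = extract_links(sample_html, "http://example.com")
print(links)
```

In a real scraper, the HTML string would typically come from `response.text` in the requests example, and the returned list could feed a queue of pages still to be visited.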